eprintid: 2476
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/00/24/76
datestamp: 2023-11-09 15:50:42
lastmod: 2023-11-09 15:50:42
status_changed: 2023-11-09 15:43:37
type: conference_item
metadata_visibility: show
creators_name: Zamin, N.
creators_name: Oxley, A.
creators_name: Bakar, Z.A.
creators_name: Farhan, S.A.
title: Characteristics of a Malay journalistic corpus
ispublished: pub
keywords: corpus; Empirical analysis; Information Extraction; Linguistic patterns; Malaysia; Named entities; Natural language processing; word distribution, Natural language processing systems; Terrorism, Linguistics
note: cited By 1; Conference of 2012 1st IEEE Conference on Control, Systems and Industrial Informatics, ICCSII 2012 ; Conference Date: 23 September 2012 Through 26 September 2012; Conference Code:96023
abstract: This paper presents in detail a linguistic study of a Malay journalistic corpus describing Indonesian terrorism. The initial raw text was manually annotated for its parts of speech. It is the first corpus of its nature ever established in Malaysia. The objective of this research is to conduct an empirical analysis of the actual patterns of use in journalistic texts. This paper presents the characteristics of the Malay terrorism corpus, which include its properties, word classes, named entities and word occurrences. The results of this work are given purely in terms of the characteristics of a Malay terrorism corpus. The results are highly useful for solving larger tasks in the Natural Language Processing area, such as Information Retrieval and Information Extraction, in the area of terrorism. © 2012 IEEE.
date: 2012
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84875103783&doi=10.1109%2fCCSII.2012.6470503&partnerID=40&md5=6776ad8b5bae336d40307290eb6d1aff
id_number: 10.1109/CCSII.2012.6470503
full_text_status: none
publication: Proceedings of 2012 IEEE Conference on Control, Systems and Industrial Informatics, ICCSII 2012
place_of_pub: Bandung
pagerange: 214-218
refereed: TRUE
isbn: 9781467310239
citation: Zamin, N. and Oxley, A. and Bakar, Z.A. and Farhan, S.A. (2012) Characteristics of a Malay journalistic corpus. In: UNSPECIFIED.