eprintid: 3836
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/00/38/36
datestamp: 2023-11-09 15:52:06
lastmod: 2023-11-09 15:52:06
status_changed: 2023-11-09 15:47:44
type: conference_item
metadata_visibility: show
creators_name: Fuddoly, A.
creators_name: Jaafar, J.
creators_name: Zamin, N.
title: Keywords similarity based topic identification for Indonesian news documents
ispublished: pub
keywords: Information retrieval; Statistical tests, Computation problems; Computational time; Large data volumes; News domain; Similarity algorithm; Text document; Textual documents; Topic identification, Algorithms
note: cited By 0; Conference of UKSim-AMSS 7th European Modelling Symposium on Computer Modelling and Simulation, EMS 2013 ; Conference Date: 20 November 2013 Through 22 November 2013; Conference Code:104642
abstract: Topic identification (TID) is a technique for labelling a set of textual documents with a meaningful label that represents its content. TID for online news presents different problems from TID for other corpora, such as large data volumes and frequently updated topics. Moreover, the number of methods developed for Indonesian corpora is rather small. Bracewell's algorithm has been proven effective in identifying topics in English and Japanese corpora with high accuracy. This paper implements a method for TID of Indonesian news documents based on Bracewell's keywords similarity algorithm and top-n keywords selection. The top-n method is used to improve the performance of Bracewell's algorithm on an Indonesian corpus and to reduce the dimensionality of the dataset during training. The combination aims to reduce the heavy computation problem and to explore the possibility of newly emerging topics that may need to be created. The method consists of two stages: training and classification. It learns the keywords of the training dataset and then calculates the similarity between the keywords of the testing and training articles.
The algorithm produced accuracy as high as 95.22% in the online and 95.26% in the offline environment, 84% against human evaluation, and an average computational time of 2.96 seconds. © 2013 IEEE.
date: 2013
publisher: IEEE Computer Society
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84899499787&doi=10.1109%2fEMS.2013.3&partnerID=40&md5=06d01b7b09d0aa96f6eaea6ca6481fef
id_number: 10.1109/EMS.2013.3
full_text_status: none
publication: Proceedings - UKSim-AMSS 7th European Modelling Symposium on Computer Modelling and Simulation, EMS 2013
place_of_pub: Manchester
pagerange: 14-20
refereed: TRUE
citation: Fuddoly, A. and Jaafar, J. and Zamin, N. (2013) Keywords similarity based topic identification for Indonesian news documents. In: UNSPECIFIED.