Keywords similarity based topic identification for Indonesian news documents

Fuddoly, A. and Jaafar, J. and Zamin, N. (2013) Keywords similarity based topic identification for Indonesian news documents. In: UNSPECIFIED.

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Topic identification (TID) is a technique for labelling a set of textual documents with a meaningful label that represents their content. TID for online news presents different problems from TID for other corpora, such as large data volumes and frequently updated topics. Moreover, the number of methods developed for Indonesian corpora is rather small. Bracewell's algorithm has been proven effective in identifying topics in English and Japanese corpora with high accuracy. This paper implements a method for TID based on Bracewell's keywords similarity algorithm and top-n keywords selection for Indonesian news documents. The top-n method is used to improve the performance of Bracewell's algorithm on an Indonesian corpus and to reduce the dimensionality of the dataset during training. The combination aims to reduce the heavy computation involved and to explore the possibility that a newly emerging topic has to be created. The method consists of two stages: training and classification. It learns the keywords of the training dataset, then calculates the similarity between the keywords of the testing articles and those of the training articles. The algorithm achieved accuracy as high as 95.22% in the online and 95.26% in the offline environment, 84% against human evaluation, and an average computational time of 2.96 seconds. © 2013 IEEE.
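
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and not Bracewell's algorithm itself: raw term frequency stands in for keyword extraction, Jaccard overlap stands in for the keyword similarity measure, and the function names, the `n` and `threshold` parameters, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_n_keywords(text, n=5, stopwords=frozenset()):
    """Return the n most frequent non-stopword tokens as a set.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:n]}


def train(labelled_docs, n=5, stopwords=frozenset()):
    """Training stage: keep only the top-n keywords for each topic label."""
    texts_by_topic = {}
    for label, text in labelled_docs:
        texts_by_topic.setdefault(label, []).append(text)
    return {
        label: top_n_keywords(" ".join(texts), n, stopwords)
        for label, texts in texts_by_topic.items()
    }


def jaccard(a, b):
    """Assumed similarity measure: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(text, model, n=5, threshold=0.2, stopwords=frozenset()):
    """Classification stage: pick the topic with the most similar keywords.

    Returns (None, score) when no topic reaches the threshold, which a
    caller could treat as a signal that a new topic may be needed.
    """
    keywords = top_n_keywords(text, n, stopwords)
    best_label, best_score = None, 0.0
    for label in sorted(model):
        score = jaccard(keywords, model[label])
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


if __name__ == "__main__":
    # Hypothetical two-topic training set (made-up sentences).
    docs = [
        ("olahraga", "tim sepak bola menang pertandingan sepak bola liga"),
        ("ekonomi", "harga saham naik pasar saham bank rupiah"),
    ]
    model = train(docs, n=4)
    print(classify("liga sepak bola dimulai", model, n=4))
    print(classify("cuaca hujan deras", model, n=4))
```

Keeping only the top-n keywords per topic shrinks the sets that must be compared at classification time, which is the dimensionality reduction the abstract refers to; the threshold is one simple way to flag an article that matches no known topic.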

Item Type: Conference or Workshop Item (UNSPECIFIED)
Additional Information: Cited by 0; Conference of UKSim-AMSS 7th European Modelling Symposium on Computer Modelling and Simulation, EMS 2013; Conference Date: 20 November 2013 through 22 November 2013; Conference Code: 104642
Uncontrolled Keywords: Information retrieval; Statistical tests, Computation problems; Computational time; Large data volumes; News domain; Similarity algorithm; Text document; Textual documents; Topic identification, Algorithms
Depositing User: Mr Ahmad Suhairi UTP
Date Deposited: 09 Nov 2023 15:52
Last Modified: 09 Nov 2023 15:52
URI: https://khub.utp.edu.my/scholars/id/eprint/3836
