Using unsupervised clustering approach to train the Support Vector Machine for text classification

Shafiabady, N. and Lee, L.H. and Rajkumar, R. and Kallimani, V.P. and Akram, N.A. and Isa, D. (2016) Using unsupervised clustering approach to train the Support Vector Machine for text classification. Neurocomputing, 211. pp. 4-10. ISSN 09252312

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

The use of learning algorithms for text classification assumes the availability of a large number of documents that have been organized and labeled correctly by human experts for use in the training phase. Unless the text documents in question have been in existence for some time, using an expert system is inevitable because manually organizing and labeling thousands of groups of text documents can be a very labor-intensive and intellectually challenging activity. Also, in some new domains, the knowledge to organize and label different classes might not be available. Therefore, unsupervised learning schemes for automatically clustering data in the training phase are needed. Furthermore, even when knowledge exists, variation is high when the subject under classification depends on personal opinions and is open to different interpretations. This paper describes a methodology that performs the automatic clustering using either Self-Organizing Maps (SOM) or the Correlation Coefficient (CorrCoef). The resulting clusters are then used as the labels to train the Support Vector Machine (SVM). Experiments and results are presented based on applying the methodology to some standard text datasets in order to verify the accuracy of the proposed scheme. We also present results that evaluate the effect of dimensionality reduction and of changes in the clustering schemes on the accuracy of the SVM. Results show that the proposed combination has better accuracy compared to training the learning machine using the expert knowledge. © 2016 Elsevier B.V.
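The cluster-then-train idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the correlation coefficient is turned into clusters, so the greedy seed-based grouping, the `threshold` value, the toy term-count vectors, and the simple subgradient-descent linear SVM below are all assumptions made for illustration only.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant vectors
    return cov / math.sqrt(var_a * var_b)

def cluster_by_correlation(docs, threshold=0.5):
    """Assumed greedy scheme: a document joins the first cluster whose
    seed document correlates with it at or above `threshold`;
    otherwise it becomes the seed of a new cluster."""
    seeds, labels = [], []
    for doc in docs:
        for idx, seed in enumerate(seeds):
            if pearson(doc, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(doc)
            labels.append(len(seeds) - 1)
    return labels

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via subgradient descent on the L2-regularised
    hinge loss; labels y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # inside the margin: regulariser plus hinge subgradient
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # outside the margin: only the regulariser acts
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical term-count vectors for four unlabeled documents.
docs = [
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]

# Step 1: cluster without any expert labels.
cluster_ids = cluster_by_correlation(docs)

# Step 2: use the cluster ids as training labels (cluster 0 -> +1, others -> -1).
svm_labels = [1 if c == 0 else -1 for c in cluster_ids]
w, b = train_linear_svm(docs, svm_labels)

# Step 3: classify an unseen document.
prediction = predict(w, b, [4, 1, 0, 0])
```

In this sketch the clustering step replaces the expert labeling step, and the SVM never sees a human-assigned class. A SOM could be substituted for `cluster_by_correlation` without changing the training code, since only the cluster ids are passed on.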

Item Type: Article
Additional Information: Cited by 46
Uncontrolled Keywords: Algorithms; Classification (of information); Conformal mapping; Expert systems; Information retrieval systems; Learning algorithms; Learning systems; Self organizing maps; Text processing; Unsupervised learning; Automatic clustering; Clustering scheme; Correlation coefficient; Dimensionality reduction; Expert knowledge; Learning machines; Text classification; Unsupervised clustering; Support vector machines; accuracy; algorithm; Article; calculation; classification; linear system; mathematical analysis; mathematical model; prediction; priority journal; probability; support vector machine
Depositing User: Mr Ahmad Suhairi UTP
Date Deposited: 09 Nov 2023 16:18
Last Modified: 09 Nov 2023 16:18
URI: https://khub.utp.edu.my/scholars/id/eprint/6728
