Muflikhah, L. and Baharudin, B. (2009) Document clustering using concept space and cosine similarity measurement. In: UNSPECIFIED.
Full text not available from this repository.

Abstract
Document clustering is a form of data clustering, which is a data mining task and a type of unsupervised classification. It is often applied to large data collections to partition them by similarity. It was initially used in information retrieval to improve the precision and recall of queries. Clustering is easiest when the data has a small number of attributes that capture the important items. Document clustering is also useful in information retrieval applications for reducing processing time while achieving high precision and recall. We therefore propose integrating an information retrieval method with document clustering through a concept space approach. The method is Latent Semantic Indexing (LSI), which uses Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). Its aim is to reduce the dimension of the matrix by finding patterns in the document collection based on the co-occurrence of terms. Each method is applied to the term-document weights in the vector space model (VSM), and the documents are then clustered with the fuzzy c-means algorithm. In addition to reducing the term-document matrix, this research uses cosine similarity in place of Euclidean distance within fuzzy c-means. As a result, the proposed method performs better than the existing method, with an F-measure of around 0.91 and an entropy of around 0.51. © 2009 IEEE.
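The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`lsi_reduce`, `cosine_distance`, `fuzzy_c_means_cosine`), the fuzzifier `m=2.0`, the random initialisation, and the toy matrix are all assumptions made for the example. It also assumes that "cosine similarity as a replacement for Euclidean distance" means using `1 - cosine similarity` where standard fuzzy c-means uses squared Euclidean distance, with document vectors scaled to unit length before the centres are averaged.

```python
import numpy as np


def lsi_reduce(term_doc, k):
    """Project documents into a k-dimensional concept space via truncated SVD."""
    # term_doc has shape (n_terms, n_docs); each column is one document.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; rows of the result are documents.
    return (s[:k, None] * vt[:k]).T  # shape (n_docs, k)


def cosine_distance(x, centers, eps=1e-12):
    """Return 1 - cosine similarity between every row of x and every center."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    cn = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + eps)
    return np.clip(1.0 - xn @ cn.T, 0.0, 2.0)


def fuzzy_c_means_cosine(x, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means using cosine distance instead of Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Assumption: scale documents to unit length so averaging gives a direction.
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    # Random initial membership matrix; each row sums to 1.
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Centers are membership-weighted means of the document vectors.
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Small offset avoids division by zero when a point sits on a center.
        d = cosine_distance(x, centers) + 1e-12
        inv = d ** (-1.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return u, centers


if __name__ == "__main__":
    # Toy term-document weights: documents 0-2 share terms 0-1,
    # documents 3-5 share terms 2-3 (made-up values for illustration).
    term_doc = np.array([
        [2, 3, 1, 0, 0, 0],
        [1, 2, 2, 0, 0, 0],
        [0, 0, 0, 3, 1, 2],
        [0, 0, 0, 1, 2, 2],
    ], dtype=float)
    docs = lsi_reduce(term_doc, k=2)
    memberships, _ = fuzzy_c_means_cosine(docs, c=2)
    print(memberships.argmax(axis=1))
```

Each row of `memberships` holds one document's degree of membership in every cluster, so taking the `argmax` per row gives a hard assignment that can be scored with measures such as F-measure or entropy.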
Item Type: | Conference or Workshop Item (UNSPECIFIED) |
---|---|
Additional Information: | Cited by 81; Conference of 2009 International Conference on Computer Technology and Development, ICCTD 2009; Conference Date: 13 November 2009 through 15 November 2009; Conference Code: 79880 |
Uncontrolled Keywords: | Clustering; Concept space; Cosine similarity; Data clustering; Data mining tasks; Document Clustering; Document collection; Document matrices; Euclidean distance; Existing method; F-measure; Fuzzy C mean; Fuzzy C-means algorithms; High precision; Latent semantic indices; matrix; Precision and recall; Principle component analysis; Singular vectors; Small data; Unsupervised classification; Vector space models, Clustering algorithms; Copying; Fuzzy clustering; Fuzzy systems; Geometry; Information retrieval; Information retrieval systems; LSI circuits; Singular value decomposition; Vector spaces, Cluster analysis |
Depositing User: | Mr Ahmad Suhairi UTP |
Date Deposited: | 09 Nov 2023 15:48 |
Last Modified: | 09 Nov 2023 15:48 |
URI: | https://khub.utp.edu.my/scholars/id/eprint/581 |