eprintid: 12648
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/26/48
datestamp: 2023-11-10 03:27:12
lastmod: 2023-11-10 03:27:12
status_changed: 2023-11-10 01:49:11
type: conference_item
metadata_visibility: show
creators_name: Phua, Y.-T.
creators_name: Yew, K.-H.
creators_name: Foong, O.-M.
creators_name: Teow, M.Y.-W.
title: Assessing Suitable Word Embedding Model for Malay Language through Intrinsic Evaluation
ispublished: pub
keywords: Correlation methods; Embeddings; Intelligent computing; Semantics, Gram models; Malay languages; Malay texts; Natural language processing; Pearson correlation coefficients; Pre-processing; Semantic similarity; Word similarity, Natural language processing systems
note: cited By 1; Conference of 2020 International Conference on Computational Intelligence, ICCI 2020; Conference Date: 8 October 2020 Through 9 October 2020; Conference Code:164916
abstract: Word embeddings were created to form meaningful representations of words in an efficient manner. This is an essential step in most Natural Language Processing tasks. In this paper, different Malay word embedding models were trained on a Malay text corpus using Word2Vec and fastText, each with both the CBOW and Skip-gram architectures, as well as GloVe. The trained models were evaluated intrinsically on semantic similarity and word analogies. In the experiment, the custom-trained fastText Skip-gram model achieved a Pearson correlation coefficient of 0.5509 on the word similarity evaluation and an accuracy of 36.80 on the word analogy evaluation. These results outperformed the pre-trained fastText models, which achieved only 0.477 and 22.96 on the word similarity and word analogy evaluations, respectively. The results show that there is still room for improvement in both the pre-processing tasks and the evaluation datasets. © 2020 IEEE.
date: 2020
publisher: Institute of Electrical and Electronics Engineers Inc.
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85097520726&doi=10.1109%2fICCI51257.2020.9247707&partnerID=40&md5=68e584984f71f741dbb5c313d6dcf19e
id_number: 10.1109/ICCI51257.2020.9247707
full_text_status: none
publication: 2020 International Conference on Computational Intelligence, ICCI 2020
pagerange: 202-210
refereed: TRUE
isbn: 9781728154473
citation: Phua, Y.-T. and Yew, K.-H. and Foong, O.-M. and Teow, M.Y.-W. (2020) Assessing Suitable Word Embedding Model for Malay Language through Intrinsic Evaluation. In: 2020 International Conference on Computational Intelligence, ICCI 2020, pp. 202-210.