eprintid: 6736
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/00/67/36
datestamp: 2023-11-09 16:18:32
lastmod: 2023-11-09 16:18:32
status_changed: 2023-11-09 16:07:32
type: conference_item
metadata_visibility: show
creators_name: Indra, Z.
creators_name: Jaafar, J.
creators_name: Zamin, N.
creators_name: Bakar, Z.A.
title: A language identifier for Indonesian and Malay text document
ispublished: pub
keywords: Algorithms; Computer programming; Computer science, Asian languages; Digital Documents; European languages; Indonesian languages; Language identification; N-grams; Natural language processing; nocv1; Text document, Natural language processing systems
note: cited By 2; Conference of 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015; Conference Date: 19 May 2015 Through 20 May 2015; Conference Code:124374
abstract: The number of online text documents on the Internet is growing rapidly. Documents written in languages from all over the world can be found with a single click. This growth increases the availability of information on the Internet. However, no one can understand all the languages in which these digital documents are written. Hence, there is a significant need for a language identifier to help users understand the information. To date, language identification research has focused mainly on European languages and remains limited for Asian languages. Meanwhile, identifying similar languages among popular languages has attracted the attention of many researchers. In this research, a new language identifier for two languages with similar topology, Malay and Indonesian, is proposed. The algorithm is evaluated on a set of Indonesian and Malay text documents to support the limited research on language identification for Asian languages. An experiment on 100 Indonesian and Malay text documents produced satisfactorily accurate results. © 2015 IEEE.
date: 2016
publisher: Institute of Electrical and Electronics Engineers Inc.
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995701431&doi=10.1109%2fISMSC.2015.7594040&partnerID=40&md5=d9715785f362d63c5eefd4f58185acc8
id_number: 10.1109/ISMSC.2015.7594040
full_text_status: none
publication: 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015 - Proceedings
pagerange: 127-131
refereed: TRUE
isbn: 9781479978946
citation: Indra, Z. and Jaafar, J. and Zamin, N. and Bakar, Z.A. (2016) A language identifier for Indonesian and Malay text document. In: UNSPECIFIED.