eprintid: 13942
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/39/42
datestamp: 2023-11-10 03:28:30
lastmod: 2023-11-10 03:28:30
status_changed: 2023-11-10 01:52:21
type: article
metadata_visibility: show
creators_name: Ali, M.
creators_name: Jung, L.T.
creators_name: Hosam, O.
creators_name: Wagan, A.A.
creators_name: Shah, R.A.
creators_name: Khayyat, M.
title: A new text-based w-distance metric to find the perfect match between words
ispublished: pub
keywords: Data mining; Genetic algorithms; Nearest neighbor search; Pattern recognition; Personnel, Cosine similarity; Data mining applications; distance/similarity metric; Euclidean distance; Instance based learning; k-NN algorithm; Similarity distance; text match, Text mining
note: cited By 1
abstract: The k-NN algorithm is an instance-based learning algorithm which is widely used in the data mining applications. The core engine of the k-NN algorithm is the distance/similarity function. The performance of the k-NN algorithm varies with the selection of distance function. The traditional distance/similarity functions in k-NN do not perfectly handle the mix-mode words such as when one string has multiple substrings/words. For example, a two-word string of 'Employee Name', a one-word string of 'Name' or more than one word such as, 'Name of Employee'. This ambiguity is faced by different distance/similarity functions causing difficulties in finding the perfect match of words. To improve the perfect-match calculation functionality in the traditional k-NN algorithm, a new similarity distance metric is developed and named as word-distance (w-distance). The perfect match will help us to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity in nature because it is to handle dissimilarity and similarity features of strings at the same time. The simulation results showed that w-distance has a better impact on the performance of the k-NN algorithm as compared to the Euclidean distance and the cosine similarity. © 2020-IOS Press and the authors. All rights reserved.
date: 2020
publisher: IOS Press
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b
id_number: 10.3233/JIFS-179552
full_text_status: none
publication: Journal of Intelligent and Fuzzy Systems
volume: 38
number: 3
pagerange: 2661-2672
refereed: TRUE
issn: 10641246
citation: Ali, M. and Jung, L.T. and Hosam, O. and Wagan, A.A. and Shah, R.A. and Khayyat, M. (2020) A new text-based w-distance metric to find the perfect match between words. Journal of Intelligent and Fuzzy Systems, 38 (3). pp. 2661-2672. ISSN 10641246