Data harmonization for heterogeneous datasets: A systematic literature review

Kumar, G. and Basri, S. and Imam, A.A. and Khowaja, S.A. and Capretz, L.F. and Balogun, A.O. (2021) Data harmonization for heterogeneous datasets: A systematic literature review. Applied Sciences (Switzerland), 11 (17). ISSN 2076-3417

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

As data size increases drastically, its variety also increases. Investigating such heterogeneous data is one of the most challenging tasks in information management and data analytics. The heterogeneity and decentralization of data sources affect data visualization and prediction, thereby influencing analytical results. Data harmonization (DH) is a field that unifies the representation of such disparate data. Over the years, multiple solutions have been developed to minimize the heterogeneity aspects and disparity in formats of big-data types. In this study, a systematic review of the literature was conducted to assess the state-of-the-art DH techniques. This study aimed to understand the issues faced due to heterogeneity, the need for DH, and the techniques that deal with substantial heterogeneous textual datasets. The process produced 1355 articles, of which only 70 were found to be relevant after applying inclusion and exclusion criteria. The results show that the heterogeneity of structured, semi-structured, and unstructured (SSU) data can be managed by using DH and its core techniques, such as text preprocessing, natural language processing (NLP), machine learning (ML), and deep learning (DL). These techniques are applied to many real-world applications centered on the information-retrieval domain. Several assessment criteria were used to measure the efficiency of these techniques, such as precision, recall, F1, accuracy, and time. A detailed explanation of each research question, common techniques, and performance measures is also provided. Lastly, we present readers with a detailed discussion of the existing work, contributions, and managerial and academic implications, along with the conclusion, limitations, and future research directions. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

Item Type: Article
Additional Information: cited By 14
Depositing User: Mr Ahmad Suhairi UTP
Date Deposited: 10 Nov 2023 03:29
Last Modified: 10 Nov 2023 03:29
URI: https://khub.utp.edu.my/scholars/id/eprint/14514
