eprintid: 13467
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/34/67
datestamp: 2023-11-10 03:28:01
lastmod: 2023-11-10 03:28:01
status_changed: 2023-11-10 01:51:14
type: conference_item
metadata_visibility: show
creators_name: Masrom, S.
creators_name: Rahman, R.A.
creators_name: Baharun, N.
creators_name: Rahman, A.S.A.
title: Automated Machine Learning with Genetic Programming on Real Dataset of Tax Avoidance Classification Problem
ispublished: pub
keywords: Classification (of information); Genetic algorithms; Genetic programming; Machine learning; Open source software, Automated machines; Benchmark datasets; Empirical studies; Future improvements; Missing values; Parameters setting; Real applications; Stumbling blocks, Learning algorithms
note: cited By 3; Conference of 9th International Conference on Educational and Information Technology, ICEIT 2020; Conference Date: 11 February 2020 Through 13 February 2020; Conference Code:168617
abstract: Real application datasets are often a stumbling block for machine learning algorithms, which struggle to produce good results on them in either prediction or classification problems. Imbalanced data is the major reason for this problem, along with missing values, small data size and highly skewed data distributions. This paper presents an empirical study that used Automated Machine Learning (AML) based on Genetic Programming (GP), named AML TPOT. This is a very recent AML developed as an open source Python library and reported as a promising model by a few researchers who have tested the algorithm. Nevertheless, most of the work on AML TPOT was conducted on common or benchmark datasets for machine learning testing. In this paper, the focus is on a real and deviant dataset, which was collected on the tax avoidance of Government-Linked Companies in Malaysia.
A comparison of the AML performance on the dataset under different GP parameter settings is provided. Thus, this paper provides fundamental knowledge on the experimental design and findings that will be useful for the future improvement of GP-based AML. © 2020 ACM.
date: 2020
publisher: Association for Computing Machinery
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85105597002&doi=10.1145%2f3383923.3383942&partnerID=40&md5=50a5b5a65d49f614bcedc31e94b19228
id_number: 10.1145/3383923.3383942
full_text_status: none
publication: ACM International Conference Proceeding Series
pagerange: 139-143
refereed: TRUE
isbn: 9781450375085
citation: Masrom, S. and Rahman, R.A. and Baharun, N. and Rahman, A.S.A. (2020) Automated Machine Learning with Genetic Programming on Real Dataset of Tax Avoidance Classification Problem. In: UNSPECIFIED.