Automated Machine Learning with Genetic Programming on Real Dataset of Tax Avoidance Classification Problem

Masrom, S. and Rahman, R.A. and Baharun, N. and Rahman, A.S.A. (2020) Automated Machine Learning with Genetic Programming on Real Dataset of Tax Avoidance Classification Problem. In: UNSPECIFIED.

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Real application datasets are often a stumbling block for machine learning algorithms, which struggle to produce good results on prediction or classification problems. Imbalanced data is the major reason for this problem, associated with missing values, small data size and highly skewed data distributions. This paper presents an empirical study of Automated Machine Learning (AML) based on Genetic Programming (GP), named AML TPOT. It is a very recent AML tool, developed as an open source Python library and reported as a promising model by the few researchers who have tested the algorithm. Nevertheless, most work on AML TPOT has been conducted on common or benchmark datasets for machine learning testing. This paper focuses instead on a real and deviant dataset, collected on tax avoidance among Government-Linked Companies in Malaysia. It compares the performance of the AML on this dataset under different GP parameter settings. The paper thus provides fundamental knowledge on experimental design, and findings that will be useful for the future improvement of GP-based AML. © 2020 ACM.
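
The comparison across GP parameter settings described in the abstract can be pictured as a small experiment grid. The sketch below is only an illustration: the parameter values are hypothetical (the abstract does not list the settings used), and the names `generations`, `population_size`, `mutation_rate` and `crossover_rate` are assumed to match the keyword arguments of `TPOTClassifier` in classic TPOT releases, where the mutation and crossover rates together may not exceed 1.0.

```python
from itertools import product

# Hypothetical GP parameter values, chosen only for illustration.
# The settings actually used in the paper are not given in the abstract.
GENERATIONS = [5, 10]
POPULATION_SIZES = [20, 50]
MUTATION_RATES = [0.1, 0.5, 0.9]
CROSSOVER_RATES = [0.1, 0.5]


def gp_settings():
    """Yield one dict per valid combination of GP parameters."""
    grid = product(GENERATIONS, POPULATION_SIZES, MUTATION_RATES, CROSSOVER_RATES)
    for generations, population_size, mutation_rate, crossover_rate in grid:
        # Skip combinations whose rates sum to more than 1.0
        # (small tolerance guards against floating-point rounding).
        if mutation_rate + crossover_rate <= 1.0 + 1e-9:
            yield {
                "generations": generations,
                "population_size": population_size,
                "mutation_rate": mutation_rate,
                "crossover_rate": crossover_rate,
            }


settings = list(gp_settings())
for setting in settings:
    print(setting)
```

Each dictionary could then be unpacked into the classifier constructor for one experimental run, with the same train/test split and scoring metric kept fixed across runs so that differences in performance can be attributed to the GP settings alone.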

Item Type: Conference or Workshop Item (UNSPECIFIED)
Additional Information: Cited by 3; Conference: 9th International Conference on Educational and Information Technology, ICEIT 2020; Conference Date: 11 February 2020 through 13 February 2020; Conference Code: 168617
Uncontrolled Keywords: Classification (of information); Genetic algorithms; Genetic programming; Machine learning; Open source software, Automated machines; Benchmark datasets; Empirical studies; Future improvements; Missing values; Parameters setting; Real applications; Stumbling blocks, Learning algorithms
Depositing User: Mr Ahmad Suhairi UTP
Date Deposited: 10 Nov 2023 03:28
Last Modified: 10 Nov 2023 03:28
URI: https://khub.utp.edu.my/scholars/id/eprint/13467
