EMPIRICAL ANALYSIS OF THRESHOLD VALUES FOR RANK-BASED FILTER FEATURE SELECTION METHODS IN SOFTWARE DEFECT PREDICTION

Almomani, M. and Balogun, A.O. and Basri, S. and Imam, A.A. and Alazzawi, A.K. and Adeyemo, V.E. and Kumar, G. (2023) EMPIRICAL ANALYSIS OF THRESHOLD VALUES FOR RANK-BASED FILTER FEATURE SELECTION METHODS IN SOFTWARE DEFECT PREDICTION. Journal of Engineering Science and Technology, 18 (1). pp. 187-209. ISSN 18234690

Full text not available from this repository.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Many studies have been conducted to explore the influence of feature selection (FS) techniques on software defect prediction (SDP) models, with conflicting empirical results and research outcomes. These reported contradictions may be due to relative research limitations, such as the types of FS techniques used or the size of the defect datasets. In the case of FS methods, it was discovered that the choice of threshold value for picking top-ranked features might be a cause of discrepancies in reported findings on SDP. Investigating and assessing the impact of threshold values for rank-based filter (RBF) FS techniques, as done in this work, therefore becomes critical. Four RBF methods (Chi-square, Correlation, Information Gain, and Relief) with five threshold values (No FS, log2N, Top20, Top30, and Top50) were investigated with two prediction models (Naïve Bayes (NB) and Decision Tree (DT)) on 25 software defect datasets. The RBF techniques were selected based on their distinct computational characteristics, to ensure heterogeneity, as well as their performance in current SDP research. The developed SDP models were evaluated using accuracy and area under the curve (AUC) values, while the Scott-Knott ESD statistical rank test was employed to rank the RBF methods with the applied threshold values. According to the experimental results, selecting top-ranked features with the Top20 threshold had a greater (positive) impact on the prediction performance of SDP models than the other applied threshold values. Furthermore, the outcomes of this study corroborate previous research on the capacity of FS techniques to improve the prediction efficacy of SDP models. Consequently, we urge that FS methods be utilized in SDP tasks. In the case of RBF methods, the Top20 threshold value should be used, since it outperforms the de facto log2N and the other threshold values. Moreover, findings from this study can be a guide to subsequent SDP studies and further strengthen the tenacity of experimental findings and conclusions in SDP studies. © School of Engineering, Taylor's University.
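
As an illustration of how the threshold values named in the abstract translate into a feature count, the following is a minimal Python sketch and not the authors' implementation. It assumes that Top20, Top30, and Top50 denote the top 20%, 30%, and 50% of ranked features, that log2N is rounded up to a whole number, and that a higher score means a better rank; the abstract does not state these details. The function names and the example metric names are hypothetical.

```python
import math

# Assumed meaning of the percentage thresholds (not confirmed by the abstract).
PERCENT_THRESHOLDS = {"top20": 20, "top30": 30, "top50": 50}


def features_to_keep(n_features: int, threshold: str) -> int:
    """Return how many top-ranked features a threshold keeps out of n_features."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    if threshold == "no_fs":
        # No feature selection: every feature is kept.
        return n_features
    if threshold == "log2n":
        # Assumption: round log2(N) up to the next whole feature.
        k = math.ceil(math.log2(n_features))
    elif threshold in PERCENT_THRESHOLDS:
        # Integer ceiling of N * p / 100, avoiding floating-point rounding.
        k = -(-n_features * PERCENT_THRESHOLDS[threshold] // 100)
    else:
        raise ValueError(f"unknown threshold: {threshold}")
    # Always keep at least one feature and never more than exist.
    return max(1, min(k, n_features))


def select_top_ranked(scores: dict, threshold: str) -> list:
    """Rank features by descending filter score and keep the top ones."""
    k = features_to_keep(len(scores), threshold)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical filter scores for five software metrics.
    example_scores = {"loc": 0.9, "cc": 0.5, "fanout": 0.1, "depth": 0.7, "cbo": 0.3}
    for name in ("no_fs", "log2n", "top20", "top30", "top50"):
        print(name, select_top_ranked(example_scores, name))
```

Under these assumptions, a dataset with 20 features keeps 5 features with log2N and 4 with Top20, so the two thresholds can select different subsets even on small feature sets.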

Item Type:	Article
Additional Information:	cited By 0
Depositing User:	Mr Ahmad Suhairi UTP
Date Deposited:	04 Jun 2024 14:11
Last Modified:	04 Jun 2024 14:11
URI:	https://khub.utp.edu.my/scholars/id/eprint/18862
