relation: https://khub.utp.edu.my/scholars/17568/ title: Empirical Analysis of Data Sampling-Based Ensemble Methods in Software Defect Prediction creator: Balogun, A.O. creator: Odejide, B.J. creator: Bajeh, A.O. creator: Alanamu, Z.O. creator: Usman-Hamza, F.E. creator: Adeleke, H.O. creator: Mabayoje, M.A. creator: Yusuff, S.R. description: This research investigates the use of data sampling and ensemble techniques to alleviate the class imbalance problem in software defect prediction (SDP). Specifically, it examines the effect of data sampling techniques on the performance of ensemble methods. The experiments were conducted using software defect datasets from the NASA software archives. Five data sampling methods, namely over-sampling techniques (SMOTE, ADASYN, and ROS) and under-sampling techniques (RUS and NearMiss), were combined with bagging and boosting ensemble methods based on Naïve Bayes (NB) and Decision Tree (DT) classifiers. The predictive performance of the developed models was assessed using the area under the curve (AUC) and Matthews correlation coefficient (MCC) values. The experimental findings showed that applying data sampling methods further enhanced the predictive performance of the ensemble methods studied. Specifically, BoostedDT on the ROS-balanced datasets recorded the highest average AUC (0.995) and MCC (0.918) values. Apart from the NearMiss method, which worked best with the Bagging ensemble method, the other data sampling methods studied worked well with the Boosting ensemble technique. In addition, some of the developed models, particularly BoostedDT, showed better prediction performance than existing SDP models. Hence, combining data sampling techniques with ensemble methods may not only improve SDP model prediction performance but also offer a plausible solution to the latent class imbalance issue in SDP processes. 
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG. publisher: Springer Science and Business Media Deutschland GmbH date: 2022 type: Article type: PeerReviewed identifier: Balogun, A.O. and Odejide, B.J. and Bajeh, A.O. and Alanamu, Z.O. and Usman-Hamza, F.E. and Adeleke, H.O. and Mabayoje, M.A. and Yusuff, S.R. (2022) Empirical Analysis of Data Sampling-Based Ensemble Methods in Software Defect Prediction. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 13381 . pp. 363-379. ISSN 03029743 relation: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85135913774&doi=10.1007%2f978-3-031-10548-7_27&partnerID=40&md5=482de431af6c972ed54b22c36335dfb6 relation: 10.1007/978-3-031-10548-7_27 identifier: 10.1007/978-3-031-10548-7_27
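As an illustration only (this is not the authors' code or experimental setup), the minimal Python sketch below shows two ideas named in the abstract: random over-sampling (ROS), which duplicates randomly chosen minority-class (defective) instances until every class matches the majority-class count, and the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), computed from binary confusion-matrix counts. The function names random_oversample and mcc, the fixed seed, and the label convention 1 = defective, 0 = clean are assumptions made for this sketch.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels
    (assumed convention: 1 = defective, 0 = clean)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.9]]
    y = [0, 0, 0, 1]  # three clean modules, one defective module
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
    print(round(mcc([1, 0, 1, 0], [1, 0, 0, 0]), 3))
```

In this toy run the single defective row is duplicated twice so both classes end up with three rows. In practice, over-sampling would normally be applied only to the training split so that duplicated instances do not leak into the evaluation data.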