Software Fault Prediction: Using Machine Learning Algorithms
2024 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Background: Software fault prediction (SFP) is a critical task in software engineering, enabling early identification of faulty modules to improve software quality and reduce maintenance costs. SFP datasets, such as those from the PROMISE repository, are often characterized by high-dimensional metrics and multicollinearity, which complicate model training. This research investigates the combined effects of feature selection and parameter tuning on the performance of machine learning models for SFP.
Method: This study evaluates the interaction between feature selection methods, including Correlation-Based Feature Selection (CFS), Recursive Feature Elimination (RFE), Mutual Information (MI), and L1 Regularization, and hyperparameter tuning techniques such as Grid Search, Randomized Search, and Genetic Algorithm. Widely used machine learning algorithms, including Random Forest, Logistic Regression, and Support Vector Machines (SVM), are employed to optimize fault prediction performance.
Results: The combined application of CFS and Genetic Algorithm yielded the highest accuracy, achieving 88.40% with Random Forest, representing an 18% improvement over baseline models without feature selection or tuning. Feature selection reduced dimensionality and identified critical attributes such as Weighted Methods per Class (wmc) and Coupling Between Objects (cbo), while iterative parameter tuning optimized model alignment to these feature sets. Notably, the proposed methods demonstrated robustness, with minimal cross-validation variability (±1.0%), and efficiency, reducing training times in univariate methods like L1 Regularization.
Conclusions: This study concludes that integrating multivariate feature selection with iterative hyperparameter tuning significantly improves the accuracy, robustness, and computational efficiency of software fault prediction models. The findings establish a framework for optimizing fault prediction models: for example, combining CFS with a Genetic Algorithm for high-stakes scenarios, or Randomized Search with sparse feature selection for resource-constrained environments. These findings bridge critical gaps in SFP optimization, offering a structured approach to achieving scalable and high-performing prediction models.
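Illustration (not part of the thesis record): the abstract names Correlation-Based Feature Selection (CFS) but does not say how it was implemented. The sketch below assumes the commonly cited CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), with Pearson correlation and a greedy forward search; all of these are assumptions rather than details taken from the thesis. The feature names wmc and cbo come from the abstract, while `noise`, the function names, and every data value are hypothetical toy inputs chosen only to show how a redundant metric is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features, target, subset):
    """CFS merit: high feature-class correlation, low feature-feature correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each selected feature and the class.
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Mean absolute correlation over all pairs of selected features.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features, target):
    """Greedy forward search: keep adding the feature that raises the merit most."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, feat = max((cfs_merit(features, target, selected + [f]), f) for f in remaining)
        if merit <= best:
            break  # no candidate improves the subset; stop
        selected.append(feat)
        remaining.remove(feat)
        best = merit
    return selected, best


# Synthetic toy data (NOT from the thesis or from PROMISE): 8 modules, 1 = faulty.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "wmc": [3, 5, 4, 6, 12, 15, 11, 14],  # strongly related to the class
    "cbo": [2, 4, 3, 6, 5, 9, 7, 8],      # related, but largely redundant with wmc
    "noise": [7, 2, 9, 4, 8, 3, 6, 5],    # hypothetical metric unrelated to the class
}

selected, merit = cfs_forward_select(features, target)
print(selected, round(merit, 4))
```

On this toy input the search keeps a single metric, because adding a feature that is highly correlated with one already selected lowers the merit. This is the mechanism by which CFS addresses the multicollinearity described in the Background.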
Place, publisher, year, edition, pages
2024. p. 75
Keywords [en]
Software Fault Prediction, Machine Learning, Parameter Tuning, Feature Selection, Search Optimization
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-27169 OAI: oai:DiVA.org:bth-27169 DiVA, id: diva2:1916628
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAAPT Master of Science Programme in Software Engineering
Supervisors
2024-12-02 2024-11-28 2025-09-30 Bibliographically approved