A Comparative Analysis of RIPPER and CN2 Algorithms in Personal Loan Credit Scoring
2024 (English) Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
Background: Credit scoring is a mechanism used by lenders and financial organizations to determine the creditworthiness of individuals or businesses requesting credit or loans. It involves assessing various indicators to determine the likelihood of a borrower repaying their loans on time. The primary goal of credit scoring is to evaluate the risk involved in lending money to a particular borrower. Additionally, rule-based machine learning algorithms are employed to determine whether an individual is eligible to receive a loan.
Objectives: The thesis compared the RIPPER and CN2 algorithms to determine their usefulness in predicting personal loan credit scores. The primary objective was to assess model performance using accuracy, precision, recall, and F1 score metrics. The study aimed to determine which of these two rule-based machine learning algorithms gave more accurate and dependable predictions in the context of personal loan credit scores.
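The four metrics named above can be computed directly from true and predicted labels. The following is a minimal sketch, not the evaluation code used in the thesis: the function name `classification_metrics`, the choice of macro-averaging over classes, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an assumption here) computes each metric per class
    and then takes the unweighted mean over all classes.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    classes = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # true positives per class
    fp = Counter()  # false positives per class
    fn = Counter()  # false negatives per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1_scores = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f)

    n_classes = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


# Toy labels invented for illustration only (not thesis data).
y_true = ["good", "good", "poor", "poor", "standard", "standard"]
y_pred = ["good", "poor", "poor", "poor", "standard", "good"]
metrics = classification_metrics(y_true, y_pred)
```

Reporting all four values together matters because accuracy alone can hide a model that favours one class, while precision and recall show the two kinds of error separately.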
Method: The thesis used an experimental methodology to achieve its objectives and answer the research questions. The Credit Score Classification dataset from Kaggle was chosen because it provides data relevant to determining creditworthiness, which is an important part of the banking business. Using this dataset, a comparison was made between two rule-based machine learning (ML) algorithms, RIPPER and CN2. The experimentation step involved training, validation, and testing of these algorithms on the chosen dataset. Performance evaluation was then carried out, concentrating on accuracy, precision, recall, and F1 score. This allowed for a complete comparison of the algorithms’ performance in predicting credit scores for personal loans. Hyperparameter tuning optimized model performance through testing with different parameter values. Model validation was carried out using k-fold cross-validation. Finally, the study aimed to identify the superior algorithm based on these performance metrics.
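The k-fold cross-validation mentioned above can be sketched as follows. This is an illustrative outline under stated assumptions, not the thesis pipeline: the helper names `k_fold_indices` and `cross_validate`, the choice of k = 5, and the majority-class stand-in learner are hypothetical, and a real experiment would plug in a RIPPER or CN2 implementation instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return the mean held-out accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Stand-in learner (hypothetical): always predicts the most common
# label seen during training. It only exists to make the sketch runnable.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)


# Toy data invented for illustration only (not thesis data).
X = [[i] for i in range(10)]
y = ["a"] * 7 + ["b"] * 3
mean_accuracy = cross_validate(X, y, fit_majority, predict_majority, k=5)
```

A hyperparameter search in this style would call `cross_validate` once per candidate parameter setting and keep the setting with the best mean score, so that every candidate is judged on data it was not trained on.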
Results: According to the experimental results, CN2 outperformed RIPPER, with an accuracy of 47.51% against RIPPER’s 38.24%. CN2 also had a higher precision of 42%, compared to RIPPER’s 34%. However, RIPPER had a higher recall of 58%, compared to CN2’s 43%. Despite this, CN2 achieved a higher F1 score of 42%, above RIPPER’s 38%, indicating that it performed better overall on the testing dataset.
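For a single class, the F1 score is the harmonic mean of precision and recall. The sketch below applies that formula to the rounded aggregate figures quoted above. It is an illustration only: the helper name `f1_from` is hypothetical, and the abstract does not state how the metrics were averaged across classes. If they were averaged per class (an assumption), a reported F1 score need not equal the harmonic mean of the reported precision and recall, which may be why the RIPPER value computed here differs from the reported 38%.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded aggregate values taken from the abstract.
cn2_f1 = f1_from(0.42, 0.43)     # approximately 0.425
ripper_f1 = f1_from(0.34, 0.58)  # approximately 0.429
```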
Conclusions: CN2 outperformed RIPPER in accuracy, precision, and F1 score, indicating its ability to make accurate predictions when identifying creditworthy loan applicants. While RIPPER had a higher recall, suggesting its effectiveness in identifying positive cases, CN2 achieved a better balance between precision and recall. CN2’s more balanced and consistent performance across the measures makes it the preferred option for personal loan credit scoring applications.
Place, publisher, year, edition, pages
2024. p. 63
Keywords [en]
Machine Learning, RIPPER, CN2, Rule-Based Models, Credit Scoring.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-26688
OAI: oai:DiVA.org:bth-26688
DiVA, id: diva2:1882627
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Presentation
(English)
Supervisors
Examiners
2024-08-06 2024-07-05 2024-08-06 Bibliographically approved