Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models
2022 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
Background: In every organization, employees are an essential resource. For several reasons, employees are neglected by the organizations, which leads to employee turnover. Employee turnover causes considerable losses to the organization. Using machine learning algorithms and with the data in hand, a prediction of an employee’s future in an organization is made.
Objectives: The aim of this thesis is to conduct a comparison study utilizing supervised machine learning algorithms such as Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost to predict an employee’s future in a company. Using evaluation metrics models are assessed in order to discover the best efficient model for the data in hand.
Methods: The quantitative research approach is used in this thesis, and data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company. The data set is used to train algorithms. The created models will be evaluated on the test set using evaluation measures including Accuracy, Precision, Recall, F1 Score, and ROC curve to determine which model performs the best at predicting employee turnover.
Results: Among the studied features in the data set, there is no feature that has a significant impact on turnover. Upon analyzing the results, the XGBoost classifier has better mean accuracy with 85.3%, followed by the Random Forest classifier with 83% accuracy than the other two algorithms. XGBoost classifier has better precision with 0.88, followed by Random Forest Classifier with 0.82. Both the Random Forest classifier and XGBoost classifier showed a 0.69 Recall score. XGBoost classifier had the highest F1 Score with 0.77, followed by the Random Forest classifier with 0.75. In the ROC curve, the XGBoost classifier had a higher area under the curve(AUC) with 0.88.
Conclusions: Among the studied four machine learning algorithms, Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, the XGBoost classifier is the most optimal with a good performance score respective to the tested performance metrics. No feature is found majorly affect employee turnover.
Place, publisher, year, edition, pages
2022.
Keywords [en]
Machine Learning, Employee Turnover Prediction, Supervised Learn- ing Models, Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, XGBoost
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-23399OAI: oai:DiVA.org:bth-23399DiVA, id: diva2:1679511
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Presentation
2022-05-23, J1640, campus grasvik, Karlskrona, 09:45 (English)
Supervisors
Examiners
2022-07-012022-07-012022-07-01Bibliographically approved