Prediction of Employee Turnover with Imbalance Dataset Using Machine Learning Methods
Keywords: Employee Turnover, Machine Learning, Imbalanced Data, Cross Validation, Classification Algorithm
Employee turnover can significantly affect an organisation's productivity, culture and profitability. Accurately predicting turnover helps organisations identify and address issues before they become major problems. In this paper, an employee turnover and attrition dataset was analysed with traditional machine learning methods. The analyses revealed an imbalanced class distribution in the dataset. To address this problem, oversampling and undersampling methods were used to balance the data. After balancing, k-fold cross-validation was applied to avoid overfitting. The Random Forest classifier was selected and used together with random oversampling (ROS), the combination that showed the higher performance. GridSearchCV, a hyperparameter tuning technique, was applied to the selected model to choose the best parameters. Data pre-processing and post-processing were also performed. The experiments showed that balancing the dataset with the proposed method improved classification performance compared with both the raw dataset and the other sampling methods.
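As an illustration of the balancing and cross-validation steps described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes ROS means random oversampling, that is, duplicating randomly chosen minority-class rows until the classes are equal in size. The function names `random_oversample` and `k_fold_indices` and the toy data are hypothetical, and the Random Forest and GridSearchCV steps are left out so that the example needs only the Python standard library.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold
    cross-validation (hypothetical helper)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Toy data (assumed for illustration): 90 "stayed" rows (label 0)
# and 10 "left" rows (label 1).
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10

# Balance first, then split, following the order given in the abstract.
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)

fold_sizes = [
    (len(train), len(test))
    for train, test in k_fold_indices(len(balanced_rows), k=5)
]
print(class_counts, fold_sizes)
```

In this sketch each fold's training indices would be passed to a classifier and its test indices used for scoring. Balancing before splitting, as done here, lets copies of the same row fall on both sides of a split, so resampling only the training portion of each fold is a common alternative.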