Enhancing Diabetes Diagnosis through the Investigation of Cost-Sensitive Learning with Ensemble Techniques

Yasmine Khedimi; Nadjette Dendani; Hana Khemisa

Authors

Yasmine Khedimi, Badji Mokhtar University
Nadjette Dendani, Badji Mokhtar University
Hana Khemisa, Badji Mokhtar University

Keywords:

Classification problem, Imbalanced datasets, Algorithm-level solutions, Ensemble techniques, Cost-sensitive learning, Diabetes mellitus diagnosis

Abstract

Diabetes mellitus is a prevalent chronic disease characterized by ineffective insulin action in the body, which leads to elevated blood glucose levels and potential damage to many body systems, and in turn to increased mortality rates. Early diagnosis is important for managing this illness, and machine learning algorithms play a crucial role, offering various methodologies for diabetes detection and, in particular, for handling imbalanced data. Applying diverse classification algorithms (Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Convolutional Neural Network) to diabetes diagnosis and classification reveals the dominance of one class and the resulting underrepresentation of the minority class. To address this issue, this study investigates cost-sensitive learning and resampling techniques. The proposed approach aims to build robust cost-sensitive classifiers by modifying the objective functions of well-known algorithms. Additionally, hybrid approaches that combine our improved cost-sensitive models with widely used ensemble techniques, such as Cost-sensitive XGBoost, Cost-sensitive Random Forest, and Cost-sensitive Logistic Regression, are analyzed for their ability to handle imbalanced classes. The proposed models are validated on two imbalanced medical datasets (the PIMA Indian and BASEDIABET datasets). The results show that the approach improves the accuracy and sensitivity of diabetes prediction models by reducing costly classification errors.
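
To illustrate what modifying the objective function of a well-known algorithm can look like, the following is a minimal sketch of a cost-weighted logistic regression. It is not the authors' implementation: the function names, the cost parameters `cost_fn` and `cost_fp`, and the single-feature synthetic data are all assumptions made for this example, and neither PIMA Indian nor BASEDIABET is used. Setting the positive-class cost to the negative-to-positive ratio is a common heuristic, not a value taken from the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_cost_sensitive_logreg(X, y, cost_fn=1.0, cost_fp=1.0, lr=0.1, epochs=500):
    """Batch gradient descent on a cost-weighted log-loss (hypothetical helper).

    cost_fn scales the loss on positive examples (a missed diabetic case),
    cost_fp scales the loss on negative examples (a false alarm).
    With cost_fn == cost_fp == 1 this reduces to ordinary logistic regression.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            cost = cost_fn if yi == 1 else cost_fp
            err = cost * (p - yi)  # gradient of the weighted loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    # Label 1 when the predicted probability is at least 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def sensitivity(y_true, y_pred):
    # Recall on the positive class: TP / (TP + FN).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pos = sum(1 for t in y_true if t == 1)
    return tp / pos if pos else 0.0


def make_synthetic_data(n_neg=270, n_pos=30, seed=0):
    # Synthetic imbalanced data with one feature; an assumption for illustration only.
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0)] for _ in range(n_neg)]
    X += [[rng.gauss(1.5, 1.0)] for _ in range(n_pos)]
    y = [0] * n_neg + [1] * n_pos
    return X, y


X, y = make_synthetic_data()
ratio = y.count(0) / y.count(1)  # negative-to-positive ratio used as the positive cost

w_plain, b_plain = fit_cost_sensitive_logreg(X, y)
w_cs, b_cs = fit_cost_sensitive_logreg(X, y, cost_fn=ratio)

print("sensitivity, equal costs:   ", sensitivity(y, predict(X, w_plain, b_plain)))
print("sensitivity, cost-sensitive:", sensitivity(y, predict(X, w_cs, b_cs)))
```

Raising the cost of a missed positive moves the decision boundary toward the majority class, which typically trades some false alarms for higher sensitivity. Ensemble libraries expose comparable controls, for example class weights in Random Forest implementations or the `scale_pos_weight` parameter in XGBoost.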

Author Biographies

Yasmine Khedimi, Badji Mokhtar University

Dept. of Computer Science, Annaba, Algeria

Nadjette Dendani, Badji Mokhtar University

Dept. of Computer Science, Annaba, Algeria

Hana Khemisa, Badji Mokhtar University

Dept. of Computer Science, Annaba, Algeria
