Enhancing Diabetes Diagnosis through the Investigation of Cost-Sensitive Learning with Ensemble Techniques
Keywords: Classification problem, Imbalanced datasets, Algorithm-level solutions, Ensemble techniques, Cost-sensitive learning, Diabetes mellitus diagnosis

Abstract
Diabetes mellitus is a prevalent chronic disease characterized by the body's ineffective use of insulin, which manifests as elevated blood glucose levels and potential damage to many body systems, increasing mortality rates. Early diagnosis is essential for managing this illness, and machine learning algorithms play a crucial role, offering various methodologies for diabetes detection and, in particular, for handling imbalanced data.
Applying diverse classification algorithms (Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Convolutional Neural Network) to diabetes diagnosis and classification reveals the dominance of one class and the resulting underrepresentation of the minority class. To address this issue, this study investigates cost-sensitive learning and resampling techniques.
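As a point of contrast with the cost-sensitive route developed next, the minimal sketch below illustrates the resampling alternative mentioned here, assuming SMOTE oversampling from the imbalanced-learn package; the abstract does not name a specific resampling method, so this choice is purely illustrative.

```python
# Minimal sketch of the resampling route: oversample the minority (diabetic)
# class with SMOTE so that both classes are equally represented in training.
# imbalanced-learn / SMOTE is an assumed choice, not the paper's stated method.
from collections import Counter
from imblearn.over_sampling import SMOTE

def rebalance(X_train, y_train, random_state=0):
    """Return a training set where synthetic minority samples restore class balance."""
    X_res, y_res = SMOTE(random_state=random_state).fit_resample(X_train, y_train)
    print("class counts before:", Counter(y_train), "after:", Counter(y_res))
    return X_res, y_res
```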
The proposed approach builds robust cost-sensitive classifiers by modifying the objective functions of well-known algorithms. Additionally, hybrid approaches that combine the improved cost-sensitive models with widely used ensemble techniques, such as cost-sensitive XGBoost, cost-sensitive Random Forest, and cost-sensitive Logistic Regression, are analyzed to effectively address imbalanced classes.
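To illustrate the idea, the sketch below shows one way class-dependent misclassification costs can be injected into off-the-shelf implementations of the three classifiers named above; the 5:1 cost ratio and the scikit-learn/XGBoost parameterization are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of cost-sensitive variants of Logistic Regression, Random
# Forest and XGBoost, assuming scikit-learn and the xgboost package.
# The 5:1 misclassification-cost ratio is illustrative only.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

def build_cost_sensitive_models(cost_fn=5.0, cost_fp=1.0):
    """Return classifiers whose training losses penalize false negatives
    (missed diabetic cases) more heavily than false positives."""
    class_weight = {0: cost_fp, 1: cost_fn}
    return {
        # class_weight rescales each sample's contribution to the loss.
        "cs_logistic_regression": LogisticRegression(
            class_weight=class_weight, max_iter=1000),
        "cs_random_forest": RandomForestClassifier(
            n_estimators=200, class_weight=class_weight, random_state=0),
        # XGBoost expresses the cost asymmetry through scale_pos_weight,
        # which scales the gradient contribution of positive examples.
        "cs_xgboost": XGBClassifier(
            n_estimators=200, scale_pos_weight=cost_fn / cost_fp,
            eval_metric="logloss", random_state=0),
    }
```

A common heuristic, likewise only an assumption here, is to set the positive-class weight to the ratio of negative to positive training samples so that both classes contribute comparably to the objective.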
To validate the proposed models, two imbalanced medical datasets (the PIMA Indian and BASEDIABET datasets) are used. The obtained results show that the accuracy and sensitivity of the diabetes prediction models are enhanced by reducing costly classification errors.
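A hedged sketch of how accuracy and sensitivity could be measured on a held-out split of the PIMA Indian dataset follows; the local file name diabetes.csv, the Outcome label column (as in the public Kaggle release), and the reuse of the build_cost_sensitive_models helper sketched above are all assumptions, and the BASEDIABET dataset would be evaluated analogously.

```python
# Illustrative evaluation on the PIMA Indian dataset (assumed to be stored
# locally as diabetes.csv with the Kaggle 'Outcome' label column); reuses the
# build_cost_sensitive_models sketch above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

df = pd.read_csv("diabetes.csv")
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

for name, model in build_cost_sensitive_models().items():
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    # Sensitivity (recall on the diabetic class) tracks the costly false
    # negatives that the cost-sensitive objectives are meant to reduce.
    print(f"{name}: accuracy={accuracy_score(y_te, y_pred):.3f} "
          f"sensitivity={recall_score(y_te, y_pred):.3f} "
          f"(fn={fn}, fp={fp}, tp={tp}, tn={tn})")
```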