Evaluating Optimizable Machine Learning Models for Anemia Type Prediction from Complete Blood Count Data
Keywords:
Data Exploration, Machine Learning, Multiclass Classification, Anemia Types, Complete Blood Count, CBC Test
Abstract
This paper compares different optimizable machine learning classification models to predict
eight types of anemia from complete blood count (CBC) data. For the research, we used a publicly
available Kaggle dataset containing 1281 observations, 14 predictors, and the diagnosis as the categorical
target variable with nine categories (eight types of anemia and the healthy category). First, we examined
the dataset and observed the histograms of some of the predictors. We compared the values of predictors
of observations with no anemia to the observations where any anemia was diagnosed. Next, we used
MATLAB R2024a to train and test nine optimizable machine learning classification models: Ensemble,
Tree, SVM, Efficient Linear, Neural Network, Kernel, KNN, Naïve Bayes, and Discriminant. Bayesian
optimization was used to tune the hyperparameters of all these models. We used 90% of the observations
for training and 10% for testing. During training, 10-fold cross-validation was used to reduce the
risk of overfitting. The results showed the best accuracy was reached
with the Ensemble classification model using the bag ensemble method (validation accuracy: 99.22%, test
accuracy: 100%). Finally, we inspected our best classification model in more detail. We calculated the
permutation feature importance to determine the contribution of each predictor to the final model. The
results showed six to seven important predictors, of which the hemoglobin level was the most important.
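
The permutation feature importance step described above can be illustrated with a minimal sketch in Python. This is not the authors' MATLAB workflow: the data are synthetic, the two feature names (`HGB`, `noise`) and two class labels are hypothetical, and a simple nearest-centroid classifier stands in for the optimized bagged ensemble. The sketch only shows the idea: hold out 10% of the observations, shuffle one predictor at a time in the held-out set, and record how much the accuracy drops.

```python
import random


def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


def permutation_importance(centroids, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic stand-in data (assumption): "HGB" separates the classes, "noise" does not.
feature_names = ["HGB", "noise"]
rng = random.Random(42)
data = []
for _ in range(200):
    data.append(([rng.gauss(14.0, 1.0), rng.gauss(0.0, 1.0)], "healthy"))
    data.append(([rng.gauss(9.0, 1.0), rng.gauss(0.0, 1.0)], "anemia"))
rng.shuffle(data)

# 90% of observations for training, 10% for testing, as in the abstract.
split = int(0.9 * len(data))
X_train = [x for x, _ in data[:split]]
y_train = [label for _, label in data[:split]]
X_test = [x for x, _ in data[split:]]
y_test = [label for _, label in data[split:]]

model = train_centroids(X_train, y_train)
baseline, importances = permutation_importance(model, X_test, y_test)

print(f"test accuracy: {baseline:.3f}")
for name, score in zip(feature_names, importances):
    print(f"{name}: mean accuracy drop {score:.3f}")
```

In this toy setting the informative predictor shows a large accuracy drop when shuffled, while the noise predictor shows almost none. Ranking predictors by that drop is the same idea the abstract uses to identify hemoglobin as the most important feature.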