Novel SMOTE-Based Ensemble Model for Medical Multi-Class Imbalanced Data Set
Abstract
Class imbalance, defined as a difference in the number of occurrences of the various classes in a
problem, is present in many real-world classification datasets. Standard classifiers suffer from it because
of their accuracy-oriented design, which tends to neglect the minority class. To overcome this issue, a
number of balancing approaches have been widely adopted.
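To make the accuracy argument concrete, the sketch below uses hypothetical three-class labels (90 "healthy", 7 "stage_1", 3 "stage_2") and a degenerate classifier that always predicts the majority class. Accuracy looks high while the minority classes are never recognized. Per-class and macro-averaged recall are shown only as one example of an imbalance-aware measure; they are an assumption here, not necessarily the metrics used in the study.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions of the class / true occurrences of it."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}

# Hypothetical toy labels for a 3-class medical problem.
y_true = ["healthy"] * 90 + ["stage_1"] * 7 + ["stage_2"] * 3
# A degenerate classifier that always predicts the majority class.
y_pred = ["healthy"] * len(y_true)

recalls = per_class_recall(y_true, y_pred)
macro_recall = sum(recalls.values()) / len(recalls)
print(accuracy(y_true, y_pred))  # 0.9
print(recalls)                   # {'healthy': 1.0, 'stage_1': 0.0, 'stage_2': 0.0}
print(round(macro_recall, 3))    # 0.333
```

A 90% accuracy hides the fact that two of the three classes are never detected, which is exactly the bias that rebalancing tries to correct.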
In this study, we propose a classification system that incorporates dynamic resampling approaches for the
classification of imbalanced medical datasets. The main goal of our master's thesis is to shed light on
multi-class imbalanced data issues by adopting resampling techniques from the family of SMOTE extensions
to classify imbalanced multi-class data.
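All SMOTE extensions build on the same core step: a synthetic sample is placed on the segment between a minority sample and one of its k nearest same-class neighbours. The sketch below shows only that basic SMOTE interpolation, applied class by class until every class matches the largest one; it is not any of the specific extensions (such as Borderline-SMOTE or ADASYN) evaluated in the study. The function names, k = 3, and the toy data are hypothetical; in practice a library such as imbalanced-learn would normally be used.

```python
import math
import random
from collections import Counter

def smote_class(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points for one class by SMOTE-style interpolation:
    x_new = x_i + lam * (x_nn - x_i), with lam drawn uniformly from [0, 1)."""
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        x_i = samples[i]
        # Indices of the k nearest same-class neighbours of x_i (Euclidean distance).
        neighbours = sorted((j for j in range(len(samples)) if j != i),
                            key=lambda j: math.dist(x_i, samples[j]))[:k]
        x_nn = samples[rng.choice(neighbours)]
        lam = rng.random()
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x_i, x_nn)))
    return synthetic

def smote_multiclass(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(tuple(x))
    target = max(len(v) for v in by_class.values())
    X_out, y_out = [tuple(x) for x in X], list(y)
    for label, samples in by_class.items():
        n_new = target - len(samples)
        if n_new > 0:
            X_out += smote_class(samples, n_new, k=k, seed=seed)
            y_out += [label] * n_new
    return X_out, y_out

# Hypothetical toy data: 6 samples of class "A", 3 of "B", 2 of "C".
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.1), (0.2, 0.2),
     (5.0, 5.0), (5.2, 5.1), (5.1, 5.3),
     (9.0, 0.0), (9.1, 0.2)]
y = ["A"] * 6 + ["B"] * 3 + ["C"] * 2

X_res, y_res = smote_multiclass(X, y)
print(Counter(y_res))  # Counter({'A': 6, 'B': 6, 'C': 6})
```

Because each new point is a convex combination of two real samples of the same class, synthetic samples stay inside the region already occupied by that class; the extensions differ mainly in *which* samples they choose to interpolate from.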
To benefit from the different synthetic samples generated, our contribution consists of combining all
resampled datasets without duplication. The final decision is then made by a hybrid ensemble approach
called SMOTEBagging.
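The abstract does not say how duplicates are detected when the resampled datasets are combined. A minimal sketch, assuming each resampled dataset is a list of `(features, label)` pairs and that two rows are duplicates only when features and label match exactly, could look like this; the function name and the toy data are hypothetical.

```python
def fuse_without_duplicates(*datasets):
    """Merge several resampled datasets, each a list of (features, label) pairs,
    keeping only the first occurrence of every identical (features, label) pair."""
    seen = set()
    fused = []
    for data in datasets:
        for features, label in data:
            key = (tuple(features), label)
            if key not in seen:
                seen.add(key)
                fused.append(key)
    return fused

# Hypothetical outputs of two SMOTE variants: both keep the original rows,
# and they happen to share one synthetic sample.
original = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((1.2, 0.9), "B")]
from_variant_1 = original + [((1.1, 0.95), "B")]
from_variant_2 = original + [((1.1, 0.95), "B"), ((1.05, 1.0), "B")]

fused = fuse_without_duplicates(from_variant_1, from_variant_2)
print(len(fused))  # 5
```

Exact matching is enough to remove the original rows that every resampling method carries over; treating nearly identical synthetic points as duplicates would require a distance tolerance, which this sketch does not attempt.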
To attain this objective, an empirical investigation was carried out on five imbalanced datasets, evaluating
the performance of the SMOTE extension methods before and after rebalancing the data. In the first stage,
a variety of resampling techniques were used to rebalance the data. Next, the results of each approach were
integrated in a non-repetitive data fusion procedure. Finally, the resulting dataset was used with a hybrid
ensemble method called SMOTEBagging.
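The last stage can be illustrated with a simplified SMOTEBagging sketch: each bag bootstraps every class, tops the smaller classes up to the majority size with SMOTE interpolation, trains a base learner, and the ensemble decides by majority vote. This is an illustration under stated assumptions, not the study's implementation: it omits details of the published algorithm, and the nearest-centroid base learner, the number of bags, all function names, and the toy data are hypothetical. In the pipeline described above, `X` and `y` would be the fused dataset.

```python
import math
import random
from collections import Counter

def smote_fill(samples, n_new, rng, k=3):
    """Create n_new points by interpolating a sample toward one of its k nearest neighbours."""
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        nn = sorted((j for j in range(len(samples)) if j != i),
                    key=lambda j: math.dist(samples[i], samples[j]))[:k]
        if not nn:  # only one sample available: fall back to duplicating it
            out.append(samples[i])
            continue
        x_nn = samples[rng.choice(nn)]
        lam = rng.random()
        out.append(tuple(a + lam * (b - a) for a, b in zip(samples[i], x_nn)))
    return out

def fit_centroids(X, y):
    """Hypothetical base learner: a nearest-centroid classifier (mean vector per class)."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: tuple(sum(col) / len(col) for col in zip(*xs))
            for label, xs in groups.items()}

def predict_centroids(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def smote_bagging(X, y, n_bags=5, seed=0):
    """Train n_bags base learners, each on a bootstrap sample balanced with SMOTE."""
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(tuple(x))
    target = max(len(v) for v in by_class.values())
    models = []
    for _ in range(n_bags):
        Xb, yb = [], []
        for label, samples in by_class.items():
            # Bootstrap the class, then top it up to the majority size with SMOTE.
            boot = [rng.choice(samples) for _ in samples]
            boot += smote_fill(boot, target - len(boot), rng)
            Xb += boot
            yb += [label] * len(boot)
        models.append(fit_centroids(Xb, yb))
    return models

def predict_vote(models, x):
    """Majority vote over the base learners' predictions."""
    return Counter(predict_centroids(m, x) for m in models).most_common(1)[0][0]

# Hypothetical toy data: 6 samples of class "A", 3 of "B", 2 of "C".
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.1), (0.2, 0.2),
     (5.0, 5.0), (5.2, 5.1), (5.1, 5.3),
     (9.0, 0.0), (9.1, 0.2)]
y = ["A"] * 6 + ["B"] * 3 + ["C"] * 2

models = smote_bagging(X, y, n_bags=5, seed=1)
print(predict_vote(models, (0.15, 0.15)),
      predict_vote(models, (5.1, 5.1)),
      predict_vote(models, (9.05, 0.1)))  # A B C
```

Bootstrapping gives each bag a different view of the data, and re-running SMOTE inside every bag produces different synthetic points per learner; this diversity is what the majority vote relies on.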
The many tests we conducted on diverse datasets show that the suggested approach performs well.