Breast Cancer Prediction with Hybrid Filter-Wrapper Feature Selection
Keywords:
Feature Selection, Filter Methods, Wrapper Methods, Hybrid Feature Selection, Hybrid Filter-Wrapper

Abstract
Feature selection, the process of selecting a subset of relevant features for model construction,
plays a pivotal role in machine learning tasks, particularly in enhancing model efficiency and performance.
It aids in mitigating the curse of dimensionality, reducing computational costs, and improving the
generalization of models. Among the various methods employed in feature selection, both filter and
wrapper methods stand out for their effectiveness. However, combining them in a hybrid approach holds
promise for further improving model performance. On a breast cancer dataset comprising 30 features,
traditional methods yielded an ROC AUC score of 0.943. With the hybrid feature selection technique
proposed here, the ROC AUC score rose to 0.954 using a reduced subset of 10 features. This improvement
demonstrates the efficacy of the proposed method in enhancing predictive accuracy and robustness.
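To illustrate the general hybrid filter-wrapper idea described above, the minimal sketch below chains a filter stage (ANOVA F-test ranking) with a wrapper stage (forward sequential selection scored by ROC AUC) on scikit-learn's built-in 30-feature breast cancer dataset, selecting 10 features. The choice of filter criterion, wrapper search, and classifier here are illustrative assumptions; the paper's actual pipeline and reported scores are not reproduced by this sketch.

```python
# Hedged sketch of a hybrid filter-wrapper feature selection pipeline.
# Filter stage: ANOVA F-test pre-selects a candidate subset.
# Wrapper stage: forward sequential selection searches that subset,
# scored by ROC AUC with an inner cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 30-feature binary classification dataset (malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # Filter: keep the 15 features with the highest ANOVA F-scores (assumed cutoff).
    ("filter", SelectKBest(score_func=f_classif, k=15)),
    # Wrapper: forward selection down to 10 features from the filtered set.
    ("wrapper", SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=10,
        direction="forward",
        scoring="roc_auc",
        cv=5,
    )),
    # Final classifier trained on the 10 selected features.
    ("clf", LogisticRegression(max_iter=1000)),
])

# Outer cross-validation estimates the ROC AUC of the whole pipeline.
scores = cross_val_score(pipeline, X, y, scoring="roc_auc", cv=5)
print(f"Mean ROC AUC with hybrid selection: {scores.mean():.3f}")
```

Scores from such a sketch will differ from those reported in the study, since the abstract does not specify the exact filter criterion, wrapper search strategy, or classifier used.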