Towards Hybrid Strategies for Classifying Subjects in Scientific Publications: Comparing Traditional, ML and Model-Driven Methods


Keywords:
Review, Scientific Document Analysis, Subject Classification, Traditional Model, Machine Learning-Based Subject Classification, Model-Driven Subject Classification
Abstract
In today’s rapidly evolving research landscape, the exponential growth of digital content has
rendered the subject classification of academic publications a critical and increasingly complex task. This
study systematically examines three predominant methodologies employed in subject classification for
scholarly articles: traditional, machine learning-based, and model-driven methods. Traditional methods,
grounded in manual curation and expert-driven evaluation, offer a nuanced understanding of complex subject
matter but face limitations in scalability and objectivity. Machine learning-based methods provide
automated processing and efficient handling of large-scale datasets, enabling greater consistency and
adaptability. Meanwhile, model-driven methods, particularly those leveraging natural language
processing and deep neural architectures, offer an enhanced capability for detecting latent patterns within
high-dimensional text data.
By integrating these methods, the study proposes a comprehensive and balanced classification
framework that synthesizes the interpretive depth of manual systems, the algorithmic efficiency of machine
learning, and the advanced analytical potential of deep learning models. This synergy facilitates broader
thematic coverage and improved classification accuracy, highlighting how each methodology uniquely
contributes to the overall effectiveness of content analysis. Ultimately, the study not only guides researchers
in selecting suitable classification strategies for academic literature, but also emphasizes how
methodological integration can enhance both the precision and productivity of research workflows. Such
hybrid approaches are essential for navigating the increasing complexity of academic knowledge in a
digitally driven era.