A Comparative Study of Transformer-Based Text Classification Models Using Hugging Face
Keywords: Transformers, BERT Model, DistilBERT, Text Classification, Hugging Face

Abstract
Transformer-based language models have transformed the landscape of text classification by
leveraging large-scale pretraining and self-attention mechanisms to capture deep contextual relationships
in text. This study presents a comparative evaluation of three prominent transformer architectures—
BERT-base-uncased, DistilBERT-base-uncased, and RoBERTa-base—applied to sentiment classification
using the IMDb dataset. A unified fine-tuning pipeline was implemented in Google Colab using the
Hugging Face ecosystem to ensure fairness and reproducibility across models. Experimental results
demonstrate that RoBERTa-base achieved the highest performance, with an accuracy of 95.33% and an
F1-score of 0.9536, followed by BERT-base and DistilBERT. While BERT delivered strong and balanced
results, DistilBERT provided comparable performance with significantly fewer parameters, making it
suitable for resource-constrained environments. The findings highlight the impact of pretraining strategies
on downstream performance and provide practical insights for selecting transformer architectures in real-world text classification scenarios.
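The accuracy and F1-score figures reported above can be made concrete with a short metric computation. The sketch below is a simplified, dependency-light stand-in for the kind of `compute_metrics` hook commonly passed to the Hugging Face `Trainer`; it assumes binary sentiment labels with 1 = positive (the function name and example values are illustrative, not taken from the paper's code):

```python
import numpy as np

def compute_metrics(predictions, labels):
    """Accuracy and binary F1 from predicted vs. true labels (1 = positive)."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    accuracy = float((predictions == labels).mean())
    # Confusion-matrix counts for the positive class
    tp = int(((predictions == 1) & (labels == 1)).sum())
    fp = int(((predictions == 1) & (labels == 0)).sum())
    fn = int(((predictions == 0) & (labels == 1)).sum())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "f1": f1}

# Toy example: 4 of 5 predictions correct
print(compute_metrics([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]))  # {'accuracy': 0.8, 'f1': 0.8}
```

Because F1 balances precision and recall, it can diverge from raw accuracy on imbalanced data; on the roughly balanced IMDb test split the two metrics track each other closely, which matches the near-identical accuracy (95.33%) and F1 (0.9536) reported for RoBERTa-base.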