Sentiment Analysis and Rating Prediction for App Reviews Using Transformer-based Models



  • Gokberk ESER Gazi University
  • Cagri SAHIN Gazi University


Keywords: Sentiment Analysis, NLP, User App Review, Transformer Models, Classification, Rating Prediction


In this study, we analyze the sentiment of Spotify app reviews using Natural Language Processing (NLP) methods and transformer-based models, including BERT, DistilBERT, RoBERTa, and XLM-RoBERTa. The reviews, collected from the Google Play Store, went through comprehensive preprocessing, including emoji removal, typo correction, and tokenization. Sentiments were analyzed with the VADER Sentiment Intensity Analyzer and categorized as positive, neutral, or negative. The models were assessed on accuracy, precision, recall, and F1-score. In predicting Spotify app ratings, DistilBERT achieved the highest accuracy and recall (both 71.68%), while XLM-RoBERTa showed the best balance between precision and recall, with an F1-score of 69.24%.
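The four evaluation measures named above can be illustrated with a minimal sketch in plain Python. It assumes support-weighted averaging across rating classes, which the abstract does not state; under that assumption, weighted recall is mathematically equal to accuracy, which would be consistent with the two figures coinciding. The function name `weighted_metrics` and the sample ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged with weights proportional
    to the number of true samples in each class (the class "support").
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = len(y_true)
    support = Counter(y_true)    # true samples per class
    predicted = Counter(y_pred)  # predictions per class
    # Correct predictions (true positives) per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = hits[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(hits.values()) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical star ratings (1-5); illustrative only, not data from the study.
y_true = [5, 5, 4, 1, 1, 3, 2, 5]
y_pred = [5, 4, 4, 1, 2, 3, 1, 5]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With other averaging schemes, such as an unweighted (macro) mean over classes, recall and accuracy generally differ, so the averaging choice should be reported alongside the scores.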



Author Biographies

Gokberk ESER, Gazi University

Department of Computer Engineering, Ankara, Turkiye

Cagri SAHIN, Gazi University

Department of Computer Engineering, Ankara, Turkiye






How to Cite

ESER, G., & SAHIN, C. (2024). Sentiment Analysis and Rating Prediction for App Reviews Using Transformer-based Models. International Journal of Advanced Natural Sciences and Engineering Researches, 8(4), 372–379. Retrieved from


