Comparative Analysis of Machine Learning Techniques for Hate Speech Identification on Social Media
Keywords:
Machine Learning, Stochastic Gradient Descent (SGD), Decision Tree (C4.5), K-Nearest Neighbors (KNN), Hate Speech, Offensive Language, NLP and Social Media
Abstract
The identification of hate speech on social media has become a critical challenge because of its detrimental impact on individuals and communities, and machine learning models have emerged as a potential means of detecting and mitigating it. This research conducts a comparative analysis of several Machine Learning (ML) techniques for hate speech identification. Its primary objective is to identify an algorithmic combination that is efficient, simple, and easy to implement while still yielding optimal results. Stochastic Gradient Descent (SGD), Decision Tree (C4.5), and K-Nearest Neighbors (KNN) models were implemented for the task, using a labelled dataset of 49,159 tweets. Accuracy, precision, recall, and F1-score were used to evaluate the models' performance, that is, how well each model distinguishes instances of hate speech from those that are not. On the test dataset, the SGD algorithm achieved high accuracy (96%), precision (94%), and recall (96%), highlighting its efficacy in hate speech detection compared with Decision Tree (DT) and KNN. These results pave the way for developing robust solutions that contribute to a safer and more inclusive digital environment.
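The four evaluation measures named in the abstract have standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The following is a minimal plain-Python sketch of those definitions, not the study's own evaluation code. The label strings `"hate"` and `"not"` and the function names `binary_metrics` and `weighted_metrics` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="hate"):
    """Accuracy, precision, recall and F1, treating `positive` as the target class.

    The label values (e.g. "hate") are hypothetical placeholders.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def weighted_metrics(y_true, y_pred):
    """Average per-class precision, recall and F1, weighted by each class's true count."""
    support = Counter(y_true)
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        per_class = binary_metrics(y_true, y_pred, positive=label)
        for key in totals:
            totals[key] += per_class[key] * count / n
    totals["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return totals


# Toy example with made-up labels (not data from the study).
y_true = ["hate", "hate", "not", "not", "not", "hate"]
y_pred = ["hate", "not", "not", "hate", "not", "hate"]
print(binary_metrics(y_true, y_pred, positive="hate"))
print(weighted_metrics(y_true, y_pred))
```

When there are more than two label classes, `weighted_metrics` averages the per-class scores by class frequency, similar in spirit to the `average="weighted"` option of scikit-learn's `precision_recall_fscore_support`. By construction, support-weighted recall always equals accuracy, so the two figures coincide whenever this averaging is used.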