Comparative Analysis of Machine Learning Techniques for Hate Speech Identification on Social Media

Sajida Perveen

Authors

Sajida Perveen, National Textile University

Keywords:

Machine Learning, Stochastic Gradient Descent (SGD), Decision Tree (C4.5), K-Nearest Neighbors (KNN), Hate Speech, Offensive Language, NLP, Social Media

Abstract

Identifying hate speech on social media has become a critical challenge because of its detrimental impact on individuals and communities, and machine learning models have emerged as a potential way to identify and mitigate it. This research presents a comparative analysis of Machine Learning (ML) techniques for hate speech identification, with the primary objective of finding an algorithmic combination that is efficient, simple, and easy to implement while still yielding strong results. Three models were implemented: Stochastic Gradient Descent (SGD), Decision Tree (C4.5), and K-Nearest Neighbors (KNN). The study uses a labelled dataset of 49,159 tweets. Accuracy, precision, recall, and F1-score were used to evaluate how well each model differentiates hate speech from other content. On the test dataset, the SGD algorithm achieved an accuracy of 96%, a precision of 94%, and a recall of 96%, indicating that it is more effective for hate speech detection than the Decision Tree (DT) and KNN models. These results pave the way for developing robust solutions, contributing to a safer and more inclusive digital environment.
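The four evaluation measures named in the abstract can be illustrated with a short, self-contained sketch. It is not the study's code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are all hypothetical, and it assumes a binary labelling in which 1 marks hate speech and 0 marks everything else.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels (hypothetical, not from the study): 1 = hate speech, 0 = not.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when one class is rare, because a classifier can then score a high accuracy while still missing many positive examples.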

Author Biography

Sajida Perveen, National Textile University

Department of Computer Science, Faisalabad, Pakistan
