Exploring Somali Sentiment Analysis: A Resource-Light Approach for Small-scale Text Classification
DOI:
https://doi.org/10.59287/icaens.1069
Keywords:
Somali Language, Sentiment Analysis, NLP, Under-Resourced Languages, Resource-Light Approach, Tokenization, Stemming, Stopwords Removal, Negation Handling, Sentiment Classification Model
Abstract
Sentiment analysis, a fundamental task in natural language processing (NLP), plays a crucial role in understanding people's opinions and emotions expressed in textual data. While sentiment analysis has been extensively studied for major languages, under-resourced languages like Somali have received limited attention in this domain. This paper aims to address this research gap by proposing a resource-light approach for sentiment analysis in Somali, which is tailored to the language's unique characteristics and limited linguistic resources. We present a methodology that combines lexicon-based methods and feature engineering techniques to effectively extract sentiment information from Somali text. A sentiment-annotated dataset was created through crowdsourcing, enabling the training and evaluation of a sentiment classification model specifically designed for Somali. Experimental results demonstrate the competitive performance of our approach compared to existing sentiment analysis techniques for under-resourced languages. The findings highlight the feasibility of sentiment analysis in Somali, even with a small-scale dataset, and shed light on the implications for sentiment analysis in other under-resourced languages. This research contributes to the advancement of sentiment analysis capabilities for under-resourced languages, empowering researchers and practitioners to gain insights from sentiment information in diverse linguistic contexts.
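As a rough illustration of what a lexicon-based pipeline with tokenization, stopword removal, and negation handling can look like, here is a minimal Python sketch. It is not the paper's implementation: the three-word lexicon, the stopword and negator lists, the English glosses, and the function names `tokenize`, `score`, and `classify` are all assumptions made for this example, and a real system would rely on a curated Somali lexicon and a proper stemmer.

```python
import re

# Hypothetical toy resources, assumed for illustration only (not from the paper).
LEXICON = {"wanaagsan": 1, "fiican": 1, "xun": -1}  # assumed glosses: good, good, bad
STOPWORDS = {"iyo", "waa"}                          # assumed function words
NEGATORS = {"ma"}                                   # assumed negation particle


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text):
    """Sum lexicon polarities, flipping the next sentiment word after a negator."""
    total = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
            continue
        if token in STOPWORDS:
            continue
        polarity = LEXICON.get(token, 0)
        if polarity:
            total += -polarity if negate else polarity
            negate = False  # a negator affects only one sentiment word
    return total


def classify(text):
    """Map the summed score to a coarse sentiment label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ["waa wanaagsan", "ma wanaagsan", "xun"]:
        print(sample, "->", classify(sample))
```

In a feature-engineering setup, the integer returned by `score` could serve as one input feature to a trained classifier alongside others, rather than being thresholded directly as it is here.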