Parameswari_faith_nagaraju@Dravidian-CodeMixFIRE: A machine-learning approach using n-grams in sentiment analysis for code-mixed texts: A case study in Tamil and Malayalam

Date
2020-01-01
Authors
Krishnamurthy, Parameswari
Varghese, Faith
Vuppala, Nagaraju
Abstract
Sentiment analysis is a fast-growing research area that aims to uncover the underlying meaning of a text by assigning it to one of several categories. This paper attempts to decode deeply entangled code-mixed Malayalam and Tamil datasets and classify their underlying sentiment into five levels. Along with creating the corpus, [1] propose a five-level classification for Malayalam and Tamil code-mixed datasets. In this paper, we follow the five-level annotated datasets and address the classification problem by using unigram and bigram features with a Multinomial Naive Bayes model. Our model achieves an F1-score of 0.55 for Tamil and 0.48 for Malayalam.
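As an illustration of the general technique named in the abstract, the following is a minimal sketch of a Multinomial Naive Bayes classifier over unigram and bigram counts. It is not the authors' implementation: the names `ngram_features` and `TinyMultinomialNB`, the whitespace tokenization, the Laplace smoothing value, the two-label setup, and the toy sentences are all assumptions made for this example, and the paper's actual datasets use five labels.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_values=(1, 2)):
    """Return unigram and bigram strings from a whitespace-tokenized text."""
    tokens = text.lower().split()
    feats = []
    for n in n_values:
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class TinyMultinomialNB:
    """Hypothetical helper: Multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        # Total number of n-gram occurrences seen in each class.
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            for f in ngram_features(text):
                if f not in self.vocab:
                    continue  # ignore n-grams never seen in training
                count = self.feature_counts[c][f]
                score += math.log((count + self.alpha) / (self.totals[c] + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy, invented training data (not from the paper's corpus).
train_texts = [
    "super movie semma mass",
    "very nice trailer semma",
    "worst movie waste of time",
    "boring trailer total waste",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
prediction = model.predict("semma mass trailer")
```

Bigrams let the model score short word sequences such as "semma mass" in addition to individual words, which is the kind of n-gram knowledge the abstract refers to. A real system would need a tokenizer suited to romanized code-mixed text and evaluation on held-out data.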
Keywords
A Multinomial Naive Bayes model, Code-mixed texts, Malayalam, N-gram, Sentiment Analysis, Tamil
Citation
CEUR Workshop Proceedings. v.2826