A Performance Study of Applications of AI in Toxic Comments Classification

Keywords

Toxic Comments
Machine Learning
Deep Learning
Word Embedding

How to Cite

Pavithra, K., and S. Rathi. 2023. “A Performance Study of Applications of AI in Toxic Comments Classification”. Journal of Artificial Intelligence and Capsule Networks 5 (2): 96-109. https://doi.org/10.36548/jaicn.2023.2.002.

Abstract

Social media websites and tweeting apps have seen a sharp rise in popularity in recent years. These platforms let people express their opinions and sentiments about things, people, and events. Discussions and debates on social media frequently turn into arguments that involve toxic comments: unpleasant, disrespectful, and hurtful statements. Many argue that social networking sites must be able to identify such harmful comments. This research analyses several deep learning and machine learning methods for toxic comment classification, including Convolutional Neural Network, Long Short-Term Memory, Support Vector Machine, Random Forest, and Naive Bayes. It also examines how word embedding methods, including Word2Vec, Bag of Words, Global Vectors (GloVe), Bidirectional Encoder Representations from Transformers (BERT), and Embeddings from Language Models (ELMo), affect the classification of toxic comments, and it outlines the future scope of the research.
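To make the kind of pipeline surveyed here concrete, the following is a minimal sketch of one of the simplest combinations the abstract names: a Bag of Words representation fed to a Naive Bayes classifier. It is not the authors' implementation. The function names, the whitespace tokenizer, the add-one smoothing, and the four toy comments are all assumptions made for illustration, and the toy data is invented rather than drawn from any dataset used in the study.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. Real pipelines
    # typically add punctuation stripping, stop-word removal, etc.
    return text.lower().split()


def train_naive_bayes(comments, labels):
    """Collect Bag of Words counts per label for multinomial Naive Bayes."""
    word_counts = {}               # label -> Counter of word frequencies
    doc_counts = Counter(labels)   # label -> number of training comments
    vocab = set()
    for text, label in zip(comments, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training comments carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, for illustration only.
TOY_COMMENTS = [
    "you are an idiot",
    "what a stupid idea",
    "thanks for the helpful answer",
    "great point well explained",
]
TOY_LABELS = ["toxic", "toxic", "clean", "clean"]

model = train_naive_bayes(TOY_COMMENTS, TOY_LABELS)
```

Calling `predict(model, "some new comment")` returns either `"toxic"` or `"clean"`. Bag of Words ignores word order and context, which is one reason studies such as this one also compare learned embeddings like Word2Vec, GloVe, ELMo, and BERT.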

