Identification of “toxicity” in social networks based on the semantic proximity criterion

Ekaterina Vladimirovna Kurganskaia; Natalia Valentinovna Stepanova

doi:10.30853/phil20240231

Original research article
May 27, 2024
Open access


Abstract

This study evaluates the effectiveness of a method for automatically identifying “toxic” user comments in social networks on the basis of semantic proximity. The article presents a linguistic analysis of examples of “toxic” behavior and defines the criteria of “toxicity” together with the main lexical and stylistic features of “toxic” texts. A review of recent work on the topic outlines current methods of identifying “toxicity”. The tested solution rests on the idea that a “toxic” comment lacks semantic proximity to the text of the post it responds to. The scientific novelty lies in proposing, for the first time, the semantic proximity criterion as a means of identifying “toxic” comments, which is a fairly simple and effective solution. Moreover, no such studies have previously been conducted on VKontakte, the most popular Russian-language social network. The research found that measuring the semantic proximity between a post and a comment is a fairly effective way to determine the relevance of the comment and, consequently, its probable “toxic” connotation. It also found that the cosine similarity metric is suitable for experiments on identifying “toxicity”, although the results can be improved by supplementing it with other machine learning methods.
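The idea of scoring a comment by its cosine similarity to the post can be illustrated with a minimal sketch. The abstract does not describe how the texts are vectorized, so everything specific here is an assumption made for illustration: the toy three-dimensional vectors in `TOY_EMBEDDINGS`, the averaging of word vectors into a text vector, the helper names, and the `threshold` value of 0.5. It is not the authors' implementation.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real experiment would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "concert": [0.9, 0.1, 0.0],
    "music": [0.8, 0.2, 0.1],
    "band": [0.85, 0.15, 0.05],
    "great": [0.5, 0.5, 0.1],
    "idiot": [0.0, 0.1, 0.9],
    "stupid": [0.05, 0.1, 0.95],
}


def text_vector(text):
    """Average the vectors of known words; return None if no word is known.

    Tokenization is a plain whitespace split (an assumption); punctuation
    is not stripped.
    """
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_possibly_toxic(post, comment, threshold=0.5):
    """Flag a comment whose similarity to the post falls below the threshold.

    The threshold is a hypothetical value, not one taken from the article.
    """
    post_vec = text_vector(post)
    comment_vec = text_vector(comment)
    if post_vec is None or comment_vec is None:
        return False  # no known words, so no evidence either way
    return cosine_similarity(post_vec, comment_vec) < threshold


post = "great concert music band"
print(is_possibly_toxic(post, "great band"))    # on-topic comment
print(is_possibly_toxic(post, "stupid idiot"))  # off-topic, abusive comment
```

Low similarity signals only that a comment is irrelevant to the post, which is why a score like this works as a first filter that other machine learning methods can then refine.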

Author information

Ekaterina Vladimirovna Kurganskaia

Saint Petersburg Electrotechnical University “LETI”

https://orcid.org/0009-0007-9084-597X

Natalia Valentinovna Stepanova

PhD

Saint Petersburg Electrotechnical University “LETI”

https://orcid.org/0000-0002-0920-753X

About this article

Publication history

Received: February 26, 2024.
Published: May 27, 2024.

Keywords

toxicity in social networks
relevance of comments
semantic proximity
word vector embeddings

Copyright

User license

Creative Commons Attribution 4.0 International (CC BY 4.0)