Development of a digital model for identifying fake news: Analysis of linguistic markers and contextual features
Abstract
The purpose of the study is to establish criteria for distinguishing fake news from related linguistic phenomena on the basis of word-frequency analysis. These criteria are intended to serve as the basis for a digital model for the primary identification of fake news. An analysis of a set of N-grams and content words in KWIC (keyword-in-context) format, with their contexts taken into account, showed that foreign policy was the predominant topic of fake news in 2014-2021. The scientific novelty of the study is as follows. The analysis revealed coinciding contexts and no unique lexemes, which led to the conclusion that fake and non-fake texts are similar because fake texts are disguised as typical media-discourse texts, and this disguise complicates their processing. The study confirms the hypothesis that function words play a differentiating and identifying role. The presence of such words helps hide the false nature of a message and block the reader’s critical thinking. Fake news texts are characterized by author anonymity and by the semantic component “uncertainty”, which at the verbal level is expressed by a lower proportion of personal pronouns and the predominance of impersonal and indefinite-personal constructions.
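The abstract names three corpus measurements: N-gram frequencies, keyword-in-context (KWIC) concordance lines, and the proportion of a word class such as personal pronouns. A minimal Python sketch of these measurements is given below. It assumes a naive regular-expression tokenizer with no lemmatization; the function names, the toy sentence, and the tiny English pronoun list are hypothetical illustrations, not the study's actual tooling, data, or word lists.

```python
import re
from collections import Counter

# Assumption: a naive tokenizer that lowercases text and keeps runs of word
# characters. Punctuation (and therefore sentence boundaries) is discarded,
# and no lemmatization is performed.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def ngram_frequencies(tokens, n):
    """Count contiguous n-grams, represented as tuples of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every keyword hit.

    Contexts are up to `window` tokens on each side; because the tokenizer
    drops punctuation, a context may cross a sentence boundary.
    """
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def proportion(tokens, vocabulary):
    """Share of tokens that belong to a word class (e.g. personal pronouns)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


# Hypothetical, deliberately tiny pronoun list used only for illustration.
PERSONAL_PRONOUNS = {"i", "we", "you", "he", "she", "they"}

if __name__ == "__main__":
    # Toy sentence invented for illustration; not data from the study.
    text = "It is reported that sanctions will be extended. We know sanctions hurt."
    tokens = tokenize(text)
    print(ngram_frequencies(tokens, 2).most_common(3))
    for left, kw, right in kwic(tokens, "sanctions", window=2):
        print(f"{left:>20} [{kw}] {right}")
    print(round(proportion(tokens, PERSONAL_PRONOUNS), 3))
```

For an inflected language such as Russian, frequencies would normally be computed over lemmas rather than surface forms, so in practice a morphological analyser would take the place of the naive `tokenize` function.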
Publication history
- Received: October 14, 2024.
- Published: November 25, 2024.
Keywords
- fake news
- semantic component “uncertainty”
- corpus linguistics
- speech impact
- functional lexemes
Copyright
© 2024 The Author(s)
© 2024 Gramota Publishing, LLC