• Original research article
  • October 25, 2023
  • Open access

Software tools for creating and analyzing a text data bank of short electronic messages from social network users


The research aims to develop an algorithm for creating and analyzing a text data bank of short electronic messages (posts) from social networks using free software tools. Its scientific novelty lies in the interdisciplinary approach taken to this problem, which draws on recent advances in applied and mathematical linguistics and in information security and takes the current regulatory framework into account. Following the proposed graphical model, approximately 1.5 MB of textual research material was collected with the Web Scraper plug-in. A text data bank of short electronic messages was then compiled and converted into CSV format for further processing. Finally, a basic analysis of the data bank was carried out with the free PolyAnalyst software package, covering the extraction of terms, entities and keywords, sentiment analysis, and identification of the texts' subject matter. The results demonstrate that the proposed algorithm works as intended and point to directions for further research: processing big text data and analyzing it to detect destructive content.
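PolyAnalyst is a GUI package, so its internals are not reproduced here. As a rough illustration of the pipeline's middle step, the sketch below assumes a Web Scraper CSV export in which each post sits in a column named `post_text`. That column name is hypothetical, because real headers depend on how the sitemap selectors were named. The sketch loads the posts and ranks candidate keywords by raw term frequency. This is a deliberately simple stand-in technique, not PolyAnalyst's keyword-extraction method, and the tiny stop-word list is likewise an assumption.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical column name: depends on the Web Scraper sitemap selectors.
TEXT_COLUMN = "post_text"

# Illustrative stop-word list (assumption); a real study would use a full
# list for the language of the posts (e.g. Russian).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on"}


def load_posts(csv_file, column=TEXT_COLUMN):
    """Read post texts from a CSV file-like object, skipping empty cells."""
    reader = csv.DictReader(csv_file)
    posts = []
    for row in reader:
        # Short rows yield None for missing fields, so guard before strip().
        text = (row.get(column) or "").strip()
        if text:
            posts.append(text)
    return posts


def tokenize(text):
    """Lower-case the text and keep runs of Unicode letters (works for Cyrillic)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_keywords(posts, n=5):
    """Rank non-stop-word tokens longer than two letters by frequency across all posts."""
    counts = Counter(
        token
        for post in posts
        for token in tokenize(post)
        if token not in STOP_WORDS and len(token) > 2
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Small in-memory stand-in for an exported CSV file.
    sample = io.StringIO(
        "web-scraper-order,post_text\n"
        "1,Data security matters for every user\n"
        "2,Users share data in short posts\n"
        "3,\n"
    )
    posts = load_posts(sample)
    print(len(posts), "posts loaded")
    print(top_keywords(posts, 3))
```

Raw frequency is used only because it is transparent. A tf-idf weighting or a lemmatizer would be the natural next step for inflected languages such as Russian, where "user" and "users" forms would otherwise be counted separately.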


The reported study was carried out as part of state assignment for scientific research No. FSFU-2020-0020 “Promising technologies for implementing the information function of the state and ensuring digital sovereignty”.

Author information

Alina Olegovna Loginova

Moscow State Linguistic University

Alexey Ivanovich Gorozhanov

Moscow State Linguistic University

Darya Viktorovna Aleynikova

Moscow State Linguistic University; Peoples’ Friendship University of Russia, Moscow

About this article

Publication history

  • Received: September 12, 2023.
  • Published: October 25, 2023.


Keywords

  • corpus linguistics
  • text data bank
  • information security
  • texts of short electronic messages
  • destructive content


© 2023 The Author(s)
© 2023 Gramota Publishing, LLC

User license

Creative Commons Attribution 4.0 International (CC BY 4.0)