• Original research article
  • June 2, 2025
  • Open access

Contextual features of linguistic polysemous terms for their identification in Chinese, English and Russian

Abstract

The study aims to develop effective methods for building a corpus that identifies the linguistic meanings of polysemous terms in Chinese, English and Russian from the patterns by which term phrases are formed. Dictionary descriptions of polysemous linguistic terms were analyzed by continuous sampling to determine which ways of forming such terms are significant for identifying their contextual features. The narrow and wide contexts in which the terms carry their linguistic meanings were then studied in a selected corpus of Chinese, English and Russian. The novelty of the study lies in a fundamentally new methodology for creating a corpus designed to recognize linguistic terms in multiple languages; similar studies have not been conducted before. The results show that phrase formation in linguistics follows various patterns across the three languages, the most typical structure being “Adj. + N.”. Two situations were observed for contextual features: for some polysemous linguistic terms, analysis of their fixed phrases is enough to recognize the meaning immediately, while for others the term phrases must be combined with the terms’ features in the wide context.
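
As a rough illustration only, the two situations named in the abstract (a fixed phrase that settles the meaning at once, and a fallback to the wide context) might be sketched as follows. This is not the author's method: the term "case", the phrase list, the cue words, the `min_cues` threshold and the function names are all hypothetical, and the sketch handles English only (Chinese would need word segmentation first).

```python
import re

# Hypothetical toy lexicon; none of these entries are taken from the article.
FIXED_PHRASES = {"case": {"nominative case", "genitive case", "accusative case"}}
CONTEXT_CUES = {"case": {"noun", "declension", "inflection", "grammar"}}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_linguistic_sense(term, sentence, wide_context="", min_cues=1):
    """Return True if `term` appears to carry its linguistic meaning."""
    tokens = tokenize(sentence)
    if term not in tokens:
        return False
    # Narrow context: a fixed "Adj. + N." phrase settles the meaning at once.
    bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    if bigrams & FIXED_PHRASES.get(term, set()):
        return True
    # Wide context: otherwise count cue words in the surrounding text.
    surrounding = set(tokens) | set(tokenize(wide_context))
    return len(surrounding & CONTEXT_CUES.get(term, set())) >= min_cues
```

Under these assumptions, "The genitive case marks possession." is accepted from the phrase alone, whereas "Each case has its own ending." is accepted only when the surrounding text supplies a cue such as "noun" or "declension".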

Author information

Shanglong Huang

Novosibirsk State University

About this article

Publication history

  • Received: November 14, 2024.
  • Published: June 2, 2025.

Keywords

  • linguistic polysemous terms
  • comparison of languages
  • patterns of term formation
  • contextual features
  • identification of terms

Copyright

© 2025 The Author(s)
© 2025 Gramota Publishing, LLC

User license

Creative Commons Attribution 4.0 International (CC BY 4.0)