• Original research article
  • September 16, 2024
  • Open access

Inter-rater agreement in annotating text world elements in the TextWorlds corpus


From the perspective of Text World Theory, narratives contain elements (indications of time, place, characters, etc.) that can be identified automatically and compared in order to establish versions of events and to find similar plots. We annotated TextWorlds, a corpus of fairy tales and short stories, and found that raters do not always agree on whether a particular word refers to a character, a time, or a place of action. The aim of the research is to determine the degree of inter-rater agreement on the position of these narrative categories in the text. Its practical task is to assess the reliability of the annotation, which will be used to train algorithms for automatically identifying text worlds. The scientific novelty lies in the fact that we study the degree of agreement itself, whereas other works take agreement for granted and treat disagreement between raters as an error on the part of one rater or a flaw in the annotation procedure. In this paper, we report results for two inter-rater agreement metrics: percent agreement and Krippendorff’s alpha. The results show that agreement on different elements varies from one work to another and sometimes reaches a moderate level, sufficient to speak of the reliability of the annotation.
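The two metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes token-level nominal labels, the same raters for every token, and no missing ratings. The label names (`CHARACTER`, `TIME`, `PLACE`, `O`) and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(units):
    """Share of units on which all raters chose the same label."""
    agreed = sum(1 for labels in units if len(set(labels)) == 1)
    return agreed / len(units)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (assumes no missing ratings)."""
    # Coincidence matrix: every ordered pair of labels given to the same
    # unit by two different raters contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit rated once carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (common factor 1/n cancelled).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one label in use: alpha is undefined
    return 1 - observed / expected


# Hypothetical ratings: one tuple per token, one label per rater.
sample = [
    ("CHARACTER", "CHARACTER"),
    ("TIME", "TIME"),
    ("PLACE", "CHARACTER"),  # the raters disagree on this token
    ("O", "O"),
    ("PLACE", "PLACE"),
]

print(percent_agreement(sample))                     # 0.8
print(round(krippendorff_alpha_nominal(sample), 3))  # 0.757
```

On this toy sample, alpha is lower than percent agreement because it discounts the agreement that would be expected by chance given how often each label is used.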



Author information

Elena Vladimirovna Mikhalkova


European University at Saint Petersburg

About this article

Publication history

  • Received: August 3, 2024.
  • Published: September 16, 2024.


  • narrative categories
  • Text World Theory
  • inter-rater agreement
  • annotation of a literary text
  • agreement metric
  • annotation reliability


© 2024 The Author(s)
© 2024 Gramota Publishing, LLC

User license

Creative Commons Attribution 4.0 International (CC BY 4.0)