Digital studies of spoken speech: history, methodology, modern tools
Abstract
This research aims to establish how the historical stages in the development of speech data processing technologies relate to the formation of the modern methodological base of instrumental phonetics. The paper traces the evolution of instrumental approaches to the study of spoken language, from the first mechanical devices of the 18th century to modern neural network architectures, and analyzes current software and hardware solutions, from general-purpose platforms for acoustic analysis to specialized systems for automatic alignment of speech data. The study systematizes the methodological principles of each historical period and identifies their conceptual limitations and their potential for solving current linguistic problems. The scientific novelty lies in a periodization of the development of instrumental phonetics based on the interaction between the technological capabilities and the methodological concepts of each stage. As a result, three types of conceptual gaps (technological, semantic, and cognitive) were identified that hinder the effective integration of modern digital technologies with traditional linguistic categories. The paper substantiates the need for hybrid analytical platforms capable of overcoming the fragmentation between the quantitative parameters of automatic processing and the qualitative characteristics of phonological description.
Research materials
- Articulate Instruments. https://www.articulateinstruments.com
- Boersma P., Weenink D. Praat: doing phonetics by computer (Version 6.1.03) (Computer software). 2025. http://www.praat.org/
- Max Planck Institute for Psycholinguistics. ELAN (EUDICO Linguistic Annotator) (Computer software). https://archive.mpi.nl/tla/elan
- PySound. https://github.com/brainteaser-ov/PySound
- Sound and Science. https://soundandscience.net/
- Speech Filing System (SFS). 1998. https://www.phon.ucl.ac.uk/resource/sfs/
- Rousselot P. J. Principes de phonétique expérimentale. P.: H. Welter, 1897.
References
- Bondarko L. V. Phonetics of the Modern Russian Language. St. Petersburg: St. Petersburg University Press, 1998. (In Russian)
- Goncharova O. V. Pysound: a digital service for processing and analysing spoken speech // Phonetics Today: abstracts of the IX International Scientific Conference (Moscow, December 5-7, 2024). Moscow, 2024. (In Russian)
- Cater J. Computers as Speech Synthesizers. Moscow: Mir, 1985. (In Russian)
- Solomennik A. I. Speech synthesis technology: history and research methodology // Moscow University Bulletin. Series 9: Philology. 2013. No. 6. (In Russian)
- Trubetzkoy N. S. Principles of Phonology. Moscow: Aspekt Press, 2000. (In Russian)
- Fant G. Acoustic Theory of Speech Production / transl. from English by L. A. Varshavsky, V. I. Medvedev; ed. by V. S. Grigoriev. Moscow: Nauka, 1964. (In Russian)
- Flanagan J. L. Speech Analysis, Synthesis and Perception / transl. from English; ed. by A. A. Pirogov. Moscow: Svyaz, 1968. (In Russian)
- Shcherba L. V. On the threefold aspect of linguistic phenomena and on experiment in linguistics // Proceedings of the Department of Russian Language and Literature of the USSR Academy of Sciences. 1931. No. 1. (In Russian)
- Gafni Ch. Phonetics and Phonology: An Introduction to the Science of Speech. 2025. https://www.researchgate.net/publication/388791051_Phonetics_and_Phonology_An_Introduction_to_the_Science_of_Speech
- Galazzi E. Pierre Jean Rousselot: la phonétique expérimentale au service de l’homme. Dossiers d’HEL // Linguistiques d’intervention. Des usages socio-politiques des savoirs sur le langage et les langues. 2014. https://shs.hal.science/halshs-01115159v1/document
- Gósy M. From stomatoscopy to BEA: the history of Hungarian experimental phonetics // Proceedings of 17th International Congress of Phonetic Sciences (Hong Kong, City University of Hong Kong). Hong Kong, 2011.
- Juang B., Rabiner L. Automatic Speech Recognition – A Brief History of the Technology Development. 2005. https://www.researchgate.net/publication/249888949_Automatic_Speech_Recognition_-_A_Brief_History_of_the_Technology_Development
- Kisler T., Reichel U. D., Schiel F. Multilingual processing of speech via web services // Computer Speech & Language. 2017. Vol. 45.
- Mattingly I. G. Speech Synthesis for Phonetic and Phonological Models // Current Trends in Linguistics / ed. by T. S. Sebeok. The Hague: Mouton, 1974. Vol. 12.
- McAuliffe M., Socolof M., Mihuc S., Wagner A., Sonderegger M. Montreal Forced Aligner: trainable text-speech alignment using Kaldi // Interspeech 2017: Conference Proceedings. 2017. https://doi.org/10.21437/Interspeech.2017-1386
- Oppenheim A. V., Schafer R., Yuen C. Digital Signal Processing // Systems, Man and Cybernetics. 1978. № 2.
- Peterson G. E., Barney H. L. Control methods used in a study of the vowels // Journal of the Acoustical Society of America. 1952. Vol. 24 (2).
- Stevens K. N. Acoustic phonetics. Cambridge: MIT Press, 1998.
- Stone M. A guide to analysing tongue motion from ultrasound images // Clinical linguistics and phonetics. 2005. № 19 (6-7).
- Taylor P. Text-to-Speech Synthesis. Cambridge: Cambridge University Press, 2009.
- Tillmann H. G. Experimental and Instrumental Phonetics: History // Encyclopedia of Language and Linguistics / ed. by K. Brown. Amsterdam, 2006.
About this article
Publication history
- Received: July 12, 2025.
- Published: August 19, 2025.
Keywords
- digital speech signal processing
- automatic speech recognition
- instrumental methods of phonetics
- deep neural networks
- speech synthesis and analysis
Copyright
© 2025 The Author(s)
© 2025 Gramota Publishing, LLC