Natural Language Processing of A. Chekhov’s literary texts and of their English-language translation versions based on the methods of lemmatization and part-of-speech tagging

Doklady Bashkirskogo Universiteta. 2023. Volume 8. No. 2. pp. 59-69.

Authors


Morozkina E. A.
Ufa University of Science and Technology
32 Zaki Validi st., 450076 Ufa, Republic of Bashkortostan, Russia
Kornilova A. D.*
Ufa University of Science and Technology
32 Zaki Validi st., 450076 Ufa, Republic of Bashkortostan, Russia

Abstract


The article presents a comparative analysis of original literary texts and their English-language translations, using Natural Language Processing techniques to determine the coefficient of lexical diversity. A. Chekhov's play “The Cherry Orchard” and a number of his short stories are found to be lexically more varied than their English-language translations. In “The Cherry Orchard”, verbs are used more often than the other major parts of speech, whereas in the short stories nouns prevail. An attempt is made to model the dependence of the coefficient of lexical diversity on the length of the text as a linear regression.
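
As a rough illustration of the quantities named in the abstract, the sketch below computes a lexical diversity coefficient, a part-of-speech distribution, and a least-squares line. It rests on several assumptions that the abstract does not confirm: the coefficient is taken to be the ratio of unique lemmas to total lemmas, the input is assumed to be already lemmatized and tagged by some external tool, and the function names (`lexical_diversity`, `pos_distribution`, `fit_line`) and the toy data are invented for this example.

```python
from collections import Counter


def lexical_diversity(lemmas):
    """Unique lemmas divided by total lemmas (assumed definition of the coefficient)."""
    if not lemmas:
        raise ValueError("the text contains no lemmas")
    return len(set(lemmas)) / len(lemmas)


def pos_distribution(tags):
    """Relative frequency of each part-of-speech tag in the text."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy, made-up (lemma, tag) pairs standing in for the output of a tagger.
tagged = [("sell", "VERB"), ("orchard", "NOUN"), ("sell", "VERB"), ("go", "VERB")]
lemmas = [lemma for lemma, _ in tagged]
tags = [tag for _, tag in tagged]

print(lexical_diversity(lemmas))  # 3 unique lemmas out of 4 -> 0.75
print(pos_distribution(tags))
```

With one (text length, coefficient) pair per text, `fit_line(lengths, coefficients)` would return the slope and intercept of the kind of regression line the abstract mentions.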

Keywords


  • lemmatization
  • part-of-speech tagging
  • natural language processing