HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages

An installation at the National Library of the Republic of Tatarstan celebrating the history of Tatar writing, featuring symbols from various alphabets

© Wikimedia Commons

Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.

According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities—for example, around 80,000 people speak Adyghe, while 250,000 to 350,000 people speak Buryat, Ossetian, and Udmurt. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science. 

In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.

A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.

The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and frequency based on data from language corpora; the percentage of vocabulary covered by the frequency list (that is, the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers factors such as lexical density and diversity, as well as the text's narrativity and descriptiveness.
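To make some of these parameters concrete, the following Python sketch computes three of them: mean word length, frequency-list coverage, and lexical diversity. It is an illustration only, not the calculator's actual code. The function names, the regex tokeniser, and the use of the type-token ratio as the diversity measure are all assumptions, and the frequency list in the usage lines is a toy stand-in for a real 5,000-word corpus list.

```python
import re


def tokenize(text):
    """Split text into lowercased word tokens.

    Assumption: a simple regex tokeniser. In Python 3, \\w is
    Unicode-aware, so Cyrillic-based alphabets are matched as well.
    """
    return re.findall(r"\w+", text.lower())


def mean_word_length(tokens):
    """Average number of characters per token."""
    if not tokens:
        return 0.0
    return sum(len(t) for t in tokens) / len(tokens)


def frequency_coverage(tokens, frequency_list):
    """Share of tokens that appear in the given frequency list."""
    if not tokens:
        return 0.0
    known = set(frequency_list)
    return sum(1 for t in tokens if t in known) / len(tokens)


def type_token_ratio(tokens):
    """Lexical diversity as unique tokens divided by all tokens.

    Assumption: the article does not say which diversity measure
    the calculator uses; the type-token ratio is a common choice.
    """
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy usage: a hypothetical two-word "frequency list".
tokens = tokenize("The cat saw the dog")
coverage = frequency_coverage(tokens, ["the", "cat"])
diversity = type_token_ratio(tokens)
avg_length = mean_word_length(tokens)
```

Measures such as lexical density and the distribution of parts of speech need a part-of-speech tagger for the language in question, so they are left out of this sketch.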

The key innovation is the use of the Flesch Reading Ease formula, adapted separately for each language, making it possible to assess text complexity and readability more accurately. 

The Flesch score is based on the number of words, sentences, and syllables in a text. However, the original coefficients were developed for English and do not work well for structurally different languages, such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
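As a rough illustration of why the coefficients matter, the sketch below implements the Flesch Reading Ease formula with its coefficients as parameters. The default values are the standard English ones (206.835, 1.015, and 84.6). The recalculated Adyghe coefficients are not given in this article, so none are shown here. The class and function names are hypothetical, and word, sentence, and syllable counts are passed in directly because syllable counting is itself language-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleschCoefficients:
    """Coefficients of the Flesch Reading Ease formula.

    Defaults are the original English values. Per-language values
    would replace them; none are invented here.
    """
    base: float = 206.835
    sentence_weight: float = 1.015
    syllable_weight: float = 84.6


def flesch_reading_ease(words, sentences, syllables,
                        coef=FleschCoefficients()):
    """Score = base - a * (words / sentences) - b * (syllables / words)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return (coef.base
            - coef.sentence_weight * words_per_sentence
            - coef.syllable_weight * syllables_per_word)


# Same sentence length, different average word length (in syllables).
short_words = flesch_reading_ease(words=100, sentences=5, syllables=150)
long_words = flesch_reading_ease(words=100, sentences=5, syllables=300)
```

With the English coefficients, doubling the average number of syllables per word from 1.5 to 3 pushes the score below zero, outside the scale's intended range of roughly 0 to 100. This is the kind of distortion that recalibrating the coefficients for a language with long words is meant to correct.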


'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.

The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity. 


'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.

Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.
