HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages
Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.
According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities: around 80,000 people speak Adyghe, for example, while Buryat, Ossetian, and Udmurt each have between 250,000 and 350,000 speakers. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All six of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science.
In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.
A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.
The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and word frequency based on data from language corpora; the percentage of vocabulary covered by the frequency list (i.e., the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers lexical density and diversity, as well as the text's narrativity and descriptiveness.
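To make some of these parameters concrete, here is a minimal Python sketch of three of them: mean word length, frequency-list coverage, and lexical diversity. It is not the calculator's actual code, which the article does not describe. The function names, the regex tokenizer, and the use of the type-token ratio as the diversity measure are all assumptions made for illustration. Part-of-speech distribution and lexical density are left out because they require a morphological tagger for the language in question.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: a naive tokenizer that treats any run of Unicode letters
    as a word. A real tool would use language-specific tokenization.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def lexical_profile(text, frequent_words):
    """Return simple lexical measures for a text.

    frequent_words is a set of lowercase word forms standing in for a
    corpus-derived frequency list (e.g. the 5,000 most frequent words).
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")

    covered = sum(1 for token in tokens if token in frequent_words)
    return {
        # Average number of characters per word.
        "mean_word_length": sum(len(token) for token in tokens) / len(tokens),
        # Share of running words found in the frequency list.
        "frequency_coverage": covered / len(tokens),
        # Type-token ratio: distinct words divided by running words.
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


# Toy English input with a two-word "frequency list", for illustration only.
profile = lexical_profile("The cat saw the dog", {"the", "cat"})
print(profile)
```

In this sketch a higher mean word length and a lower frequency coverage would both point towards a harder text. How the real calculator weighs such measures is not stated in the article.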
The key innovation is the use of the Flesch Reading Ease formula, adapted separately for each language, making it possible to assess text complexity and readability more accurately.
The Flesch score is based on the number of words, sentences, and syllables, but the original coefficients were developed for English and do not work well for structurally different languages—such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
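For reference, the original English Flesch Reading Ease score is 206.835 minus 1.015 times the average number of words per sentence, minus 84.6 times the average number of syllables per word. The Python sketch below shows how such a formula can be parameterised so that the three coefficients can be swapped per language. The class and function names are hypothetical, and only the English coefficients are included: the article does not give the recalculated values for Adyghe or the other languages. Word, sentence, and syllable counts are passed in directly, since syllable counting is language-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleschCoefficients:
    """Coefficients of a Flesch-style formula.

    The defaults are the original English values. Language-specific values
    would have to come from a study such as the one described above.
    """
    base: float = 206.835
    sentence_length_weight: float = 1.015
    syllable_weight: float = 84.6


def flesch_reading_ease(words, sentences, syllables, coef=FleschCoefficients()):
    """Compute a Flesch-style readability score from raw counts.

    Higher scores indicate easier text.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")

    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return (
        coef.base
        - coef.sentence_length_weight * words_per_sentence
        - coef.syllable_weight * syllables_per_word
    )


# English defaults: a 100-word text with 5 sentences and 150 syllables.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3))
```

Because the syllable term is multiplied by 84.6, a language whose words average several syllables more than English words will be pushed towards very low or even negative scores under the English defaults. This is why the coefficients need to be re-estimated for a polysynthetic language.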
'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.
The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity.
'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.
Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.


