
HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages

An installation at the National Library of the Republic of Tatarstan celebrating the history of Tatar writing, featuring symbols from various alphabets
© Wikimedia Commons

Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.

According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities: Adyghe has around 80,000 speakers, while Buryat, Ossetian, and Udmurt each have between 250,000 and 350,000. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All six of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science.

In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.

A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.

The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and frequency based on data from language corpora; the percentage of vocabulary covered by the frequency list (i.e. the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers factors such as lexical density and diversity, as well as the text's narrativity and descriptiveness.
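To give a sense of how some of these parameters might be computed, here is a minimal Python sketch. It is not the calculator's actual code: the function names, the naive regex tokenizer, and the use of a type-token ratio as the diversity measure are all assumptions made for illustration, and the frequency list is passed in as a plain set of words.

```python
import re


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def frequency_coverage(tokens, frequent_words):
    """Share of tokens found in the frequency list, from 0.0 to 1.0."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in frequent_words)
    return covered / len(tokens)


def type_token_ratio(tokens):
    """Simple lexical diversity measure: unique tokens divided by all tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_word_length(tokens):
    """Average token length in characters."""
    if not tokens:
        return 0.0
    return sum(len(token) for token in tokens) / len(tokens)
```

In practice, `frequent_words` would hold the 5,000 most frequent words of the language in question, and a real pipeline would likely need language-specific tokenization and lemmatization, which this sketch leaves out.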

The key innovation is the use of the Flesch Reading Ease formula, adapted separately for each language, making it possible to assess text complexity and readability more accurately. 

The Flesch score is based on the number of words, sentences, and syllables, but the original coefficients were developed for English and do not work well for structurally different languages—such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
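The role of the coefficients can be illustrated with a short Python sketch. The default values below are the standard Flesch Reading Ease constants for English; the function name and parameter names are hypothetical, and the coefficients recalculated for Adyghe in the 2025 study are not reproduced here.

```python
def flesch_reading_ease(words, sentences, syllables,
                        base=206.835, sentence_coef=1.015, syllable_coef=84.6):
    """Flesch Reading Ease from raw counts.

    Defaults are the original English coefficients. Passing other values
    for base, sentence_coef, and syllable_coef is how a language-specific
    adaptation could be plugged in (an assumption about the design, not
    the calculator's actual interface).
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return (base
            - sentence_coef * words_per_sentence
            - syllable_coef * syllables_per_word)


# A 100-word text with 5 sentences and 150 syllables, English coefficients.
english_score = flesch_reading_ease(words=100, sentences=5, syllables=150)
```

Because the syllable term is multiplied by 84.6, a language whose words average several syllables more than English words will quickly push the score far below the usual range, which is why the coefficients have to be re-estimated for each language.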

Uliana Petrunina

'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.

The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity. 

Nina Zdorova

'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.

Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.
