- Issue
- Journal of Siberian Federal University. Humanities & Social Sciences. 2025 18 (10)
- Authors
- Epimakhova, Aleksandra S.; Kokanova, Elena S.
- Contact information
- Epimakhova, Aleksandra S.: Northern (Arctic) Federal University, Arkhangelsk, Russian Federation; ORCID: 0009-0001-0282-5239; Kokanova, Elena S.: Northern (Arctic) Federal University, Arkhangelsk, Russian Federation; ORCID: 0000-0001-6623-5636
- Keywords
- Tundra Nenets language; low-resource language; digital resources; parallel corpus
- Abstract
The development of information technology has led to the classification of languages as low-resourced or high-resourced. Today, this classification depends on a language's presence in the digital environment and on the existence of natural language processing (NLP) tools for it. This paper examines the Nenets resources found while assembling a parallel Tundra Nenets corpus (Forest Nenets being classified as a separate language). The analysis shows that the data are becoming more diverse and include translated laws and religious texts, as well as media texts published online. Original Nenets texts are works of fiction, but it is not always possible to determine the source and target language in a bilingual publication. The Online Nenets-Russian Dictionary is the first digital service for this language in Russia. At present, no NLP libraries exist for the Nenets language, and their development depends on harvesting, structuring and labelling data, which requires collaboration between the NLP community and the community of language speakers.
- Pages
- 1924–1931
- EDN
- AJZMDG
- Paper at repository of SibFU
- https://elib.sfu-kras.ru/handle/2311/157509
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).