MORPHOLOGICALITY AND CORPUS-BASED TAGGING MODELS IN TURKIC LANGUAGES: A PROJECT FOR THE CREATION OF A CORPUS FOR THE KARAKALPAK LANGUAGE

Authors

  • Abdimurat Yesemuratov, PhD in Philological Sciences, Independent Researcher
  • Raushan Aymbetova, Teacher of Karakalpak Language and Literature, Karakalpak Academic Lyceum, Ministry of Internal Affairs, Republic of Uzbekistan

DOI:

https://doi.org/10.5281/zenodo.16757516

Keywords:

Karakalpak language; morphological analysis; corpus linguistics; Turkic languages; agglutinative languages; morphological tagging; low-resource languages; NLP; transfer learning; morphological tags.

Abstract

In the context of the digital transformation of linguistics and the rapid advancement of natural language processing (NLP) technologies, the development of morphological resources for low-resource languages has become a crucial task in applied linguistics. This study explores the possibilities of designing morphological corpora and tagging models for the Karakalpak language, an agglutinative Turkic language that remains underrepresented in digital linguistic repositories.

Published

2025-08-06

How to Cite

Yesemuratov, A., & Aymbetova, R. (2025). MORPHOLOGICALITY AND CORPUS-BASED TAGGING MODELS IN TURKIC LANGUAGES: A PROJECT FOR THE CREATION OF A CORPUS FOR THE KARAKALPAK LANGUAGE. Development and Innovations in Science, 4(9), 5–13. https://doi.org/10.5281/zenodo.16757516