Paper Title
Improving NER Accuracy in Kazakh Texts Using Hybrid Machine Learning Models
Abstract
This study evaluates machine learning models for Named Entity Recognition (NER) in Kazakh, a low-resource language with complex linguistic patterns. We compare Long Short-Term Memory (LSTM) networks, Conditional Random Fields (CRF), and a hybrid LSTM+TF-IDF model on the KazNERD dataset. The LSTM model achieved an accuracy of 57%, while the CRF model performed substantially better at 91%. The hybrid LSTM+TF-IDF model surpassed both, reaching 98% accuracy. We also examine the limitations of NER in Kazakh, particularly for ambiguous entities such as "Nur-Sultan," and show how the hybrid model distinguishes between the different contexts in which such entities appear. These findings demonstrate the potential of hybrid techniques for improving NER systems in low-resource languages.
Keywords - Named Entity Recognition (NER), Kazakh language, machine learning, LSTM, CRF, hybrid models, low-resource languages, NLP, TF-IDF.