A Comprehensive Survey on Parts of Speech Tagging Approaches in Dravidian Languages
Parts of speech tagging is the task of assigning each word in a document a tag that corresponds to its grammatical role in the particular context. It is an important basic step in many natural language processing applications, from word sense disambiguation to speech recognition. Because languages differ in their grammatical constructs and morphology, the approaches to tagging vary widely from one language to another. The theoretical approaches include supervised learning approaches, such as HMM-based models, maximum entropy models, SVM-based taggers and CRF-based taggers, and unsupervised approaches, such as rule-based taggers. The languages considered are the morphologically rich Dravidian languages Telugu, Malayalam, Tamil and Kannada. The different approaches are compared on their accuracy, and the results are analysed.
Keywords- Parts of speech tagging, Dravidian languages, Morphological analysis, Hidden Markov Model (HMM), Maximum Entropy (MaxEnt), Support Vector Machine (SVM), Conditional Random Fields (CRFs)
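Since the approaches are compared on accuracy, it may help to make the metric concrete. The following is a minimal sketch, assuming accuracy means the fraction of tokens whose predicted tag matches the gold-standard tag. The function name `tagging_accuracy`, the tag labels, and the two toy tagger outputs are hypothetical illustrations and are not taken from any of the surveyed systems.

```python
from typing import Sequence


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must have the same number of tags")
        for gold_tag, pred_tag in zip(gold_sent, pred_sent):
            correct += int(gold_tag == pred_tag)
            total += 1
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Hypothetical gold tags for two short sentences (tag names are illustrative only).
gold = [["NN", "VB", "NN"], ["PRP", "VB"]]
# Hypothetical outputs of two taggers on the same sentences.
tagger_a = [["NN", "VB", "NN"], ["PRP", "NN"]]
tagger_b = [["NN", "NN", "VB"], ["PRP", "VB"]]

print(f"tagger A: {tagging_accuracy(gold, tagger_a):.2f}")
print(f"tagger B: {tagging_accuracy(gold, tagger_b):.2f}")
```

Token-level accuracy is only meaningful for comparison when the taggers are scored against the same tagset and the same test data, which is worth keeping in mind when figures from different studies are placed side by side.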