Testing and Applying Tools to Develop Dogri-Hindi SMT System
The foundation of statistical analysis of any language is access to a substantial corpus. Because we are building a Statistical Machine Translation (SMT) system, we require an extensive sentence-aligned parallel corpus. Various parallel corpora exist, but because of privacy rights or other legal issues their developers do not share them. We are therefore building our own Dogri-Hindi sentence-aligned parallel corpus. In this paper we examine the methodologies that various researchers have used to create monolingual and bilingual parallel corpora, their advantages and limitations, and the tools and techniques used in corpus development. We have automated part of the corpus development, and the rest of the work is being done manually. We take written content from different sources, translate it, and align it. Hindi text is translated into Dogri using an existing machine translation system. We also discuss the approach we applied in developing the Dogri-Hindi sentence-aligned parallel corpus.
Keywords: Parallel Corpus, Spell Checker, Translator