Developing TF IDF Vector Space Model (VSM) Algorithm for Information Retrieval from Indonesia Translation Version of Al Qur’an
Information Retrieval (IR) is the task of finding relevant information, usually within text documents. This study applies IR to the Indonesian translation of the Al Qur'an, which consists of 6236 verses and serves as the guide for Muslims, so the information it contains is of great importance to them. A synonym corpus (thesaurus) was built to support retrieval and broaden the search results. The method used is the TF-IDF Vector Space Model (VSM), extended with keyword weighting and a modified query process in which the top-ranked result of one search is used as the query for the next search. Cosine similarity is used to compute document similarity. The synonym corpus (thesaurus) database is built automatically by a system developed for that purpose. Testing used one-word queries and queries of two or more words (a sentence). The success rate for one-word queries reached 100%. For queries of more than one word or a sentence, the success rate within the top 10 ranked documents reached 95.6%. These results show that using a synonym corpus (thesaurus), together with adding word weights from the first keyword searched, raises the level of relevance, because it significantly expands the search results and eliminates irrelevant documents.
Keywords - Al Qur'an, Corpus, Information Retrieval, TF-IDF, VSM, Cosine Similarity, Thesaurus
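
The pipeline summarized in the abstract (synonym expansion, TF-IDF weighting, cosine ranking, and a second pass seeded by the top-ranked result) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the smoothed IDF formula, the way the first-pass query weights are added to the top-ranked document vector, and the toy English documents (placeholders, not verse translations) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy placeholder documents and thesaurus (assumptions, not real verse text).
DOCS = [
    "be patient and pray",
    "give charity to the poor",
    "steadfast people are rewarded",
]
THESAURUS = {"patient": ["steadfast"]}


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming or stop-word removal)."""
    return text.lower().split()


def expand(tokens, thesaurus):
    """Append the synonyms of each query token, if any are listed."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(thesaurus.get(tok, []))
    return expanded


def tfidf(tokens, idf):
    """Raw term frequency times IDF; terms unknown to the index are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def build_index(docs):
    """Return (idf table, list of sparse TF-IDF document vectors)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    # Assumed variant: idf = ln(N / df) + 1, so no term gets weight zero.
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    return idf, [tfidf(toks, idf) for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_vec, doc_vecs):
    """Documents with non-zero similarity, highest score first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])


def search(query, docs, thesaurus, top_k=10):
    """Two-pass search: the top first-pass document seeds the second query."""
    idf, doc_vecs = build_index(docs)
    q_vec = tfidf(expand(tokenize(query), thesaurus), idf)
    first_pass = rank(q_vec, doc_vecs)
    if not first_pass:
        return []
    # Assumed combination: top document vector plus the first query's weights.
    second_query = dict(doc_vecs[first_pass[0][0]])
    for term, weight in q_vec.items():
        second_query[term] = second_query.get(term, 0.0) + weight
    return rank(second_query, doc_vecs)[:top_k]


print([i for i, _ in search("patient", DOCS, {})])         # doc indices: [0]
print([i for i, _ in search("patient", DOCS, THESAURUS)])  # doc indices: [0, 2]
```

In this toy run the thesaurus entry lets the one-word query also reach the third document, while the unrelated second document is never returned, which mirrors the widening effect the abstract attributes to the synonym corpus.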