Fragmenting Queries in Search Engines to Improve Performance
In search engines, documents are retrieved using a query composed of terms. The more relevant documents appear at the top of the result list, the better the performance, or precision, of the search engine. In this paper, we present a new technique, the Fragmenting Queries Technique (FQT), that improves precision by fragmenting the original query and generating additional queries from the fragments. The generated queries are composed of the original terms combined in different ways. The documents retrieved by the generated queries are assessed using a heuristic to identify the non-relevant ones. These non-relevant documents are then moved down the list to improve precision. The experiments are carried out on a test collection comprising a set of documents, a set of queries, and a list of relevance judgments for those queries. The test collection used is WT2g, which contains nearly 250,000 documents. The retrieval models used are the vector and probabilistic models, with the TF-IDF, BM25, and DFR-BM25 weighting techniques. The performance of the new technique is compared against that of the original query and assessed using recall and precision ratios.
Keywords - Information Retrieval, Vector Model, Probabilistic Model.
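
The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state how the terms are combined or what the heuristic is, so everything here is an assumption made for illustration: `fragment_query` enumerates every combination of at least `min_terms` terms, and the `flagged` set stands in for whatever documents the heuristic judges non-relevant. The names `fragment_query`, `demote`, `precision_at_k`, and `recall_at_k` are hypothetical.

```python
from itertools import combinations


def fragment_query(query, min_terms=2):
    """Build sub-queries from the query terms (assumed: all proper combinations)."""
    terms = query.split()
    fragments = []
    # Sizes stop below len(terms) so the original query is not regenerated.
    for size in range(min_terms, len(terms)):
        for combo in combinations(terms, size):
            fragments.append(" ".join(combo))
    return fragments


def demote(ranked, flagged):
    """Move flagged documents to the end of the list, keeping relative order."""
    kept = [doc for doc in ranked if doc not in flagged]
    moved = [doc for doc in ranked if doc in flagged]
    return kept + moved


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


# Toy data (invented for illustration, not taken from WT2g).
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
flagged = {"d2"}  # stand-in for the heuristic's output

reranked = demote(ranked, flagged)
before = precision_at_k(ranked, relevant, 2)
after = precision_at_k(reranked, relevant, 2)
```

In this toy case, demoting the single flagged document raises precision at rank 2 from 0.5 to 1.0, which is the effect the technique aims for on the top of the list.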