Detecting Context Similarity Over Multiple Documents Using Linguistic Features
We present a method for comparing multiple documents and determining whether they share the same context. Context is defined by the setting of the events, statements, and ideas through which each document can be understood and assessed. This study handles documents written in the Indonesian language. Three approaches are used to determine context similarity: two are adapted from the TF*IDF method, while the third extracts information by evaluating the forms of the Indonesian language. With these methods, the algorithm generates keywords automatically and requires minimal human participation to obtain the desired result. Evaluation of the algorithm shows that its results distinguish matching documents from nonmatching ones.
Index Terms- Context Similarity, Linguistic, Similarity of Documents