Semantic String Similarity for Quora Question Pairs
Quora is a question-and-answer site where questions are asked, answered, edited and organized by its community of users. Users can collaborate by editing questions and suggesting edits to answers that have been submitted by other users. This collaboration is displayed as a thread on a single question with a list of similar/related question so that users would not had to answer similar questions once again.
Quora wanted to improve their similarity recognition system. So they released their dataset publically so that a particular solution can be found out for increasing the already existing solution. The main aim of this work was to apply various Natural Language Processing (NLP) concepts for feature engineering from the given dataset and apply and compare some machine learning models such as K- Nearest neighbor, Decision Tree, Random Forest, Extra Trees, AdaBoost and Xgboost to predict the similarity. We acquired a highest accuracy of 86.26% with Extra Trees.
Keywords - Machine Learning, Natural Language Processing, Ensemble Models, Semantic String Matching