Dimensionality Reduction for Classification of Filipino Text Documents Based on Improved Bayesian Vectorization Technique
Reducing the dimensionality of the feature vector plays a vital role in text processing: a smaller feature vector lowers the cost of mining tasks and can yield higher classification accuracy. While dimensionality reduction for text classification is an active area of research in many languages, Filipino documents have received little or no attention from researchers. This paper therefore addresses dimensionality reduction in representing relevant data from Filipino texts using an improved Bayesian vectorization technique. To validate its effectiveness, the improved Bayesian vectorization was compared with the Term Frequency-Inverse Document Frequency (TF-IDF) method. The outcomes are reported using the standard measures of precision, recall, F-score, and accuracy. The results show that the improved Bayesian vectorization performed significantly better, achieving 98% classification accuracy compared with 76% for the TF-IDF vectorization technique.
Keywords - Dimensionality Reduction, Bayesian Vectorization, Filipino Text Document, OPM Songs, Lyrics, Text Classification
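For readers unfamiliar with the TF-IDF baseline named in the abstract, the following is a minimal sketch of one common TF-IDF variant (relative term frequency multiplied by log(N / df)). It is not the implementation used in this paper: the function name `tfidf_vectors`, the whitespace tokenizer, and the toy corpus are all illustrative assumptions. It shows why the feature vector grows with the vocabulary, which is the motivation for dimensionality reduction.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using tf * log(N / df).

    This is one common TF-IDF variant, assumed here for illustration;
    the paper does not specify which weighting scheme it used.
    """
    # Naive whitespace tokenization (an assumption, not the paper's preprocessing).
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term]) if total else 0.0
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical toy corpus, not data from the paper.
docs = ["mahal kita", "mahal ko ang bayan", "ang bayan ko"]
vocab, vecs = tfidf_vectors(docs)

# Each document becomes a vector with one dimension per vocabulary term.
print(len(vocab), [len(v) for v in vecs])
```

Every distinct term adds one dimension to each document vector, so a realistic corpus produces very wide and sparse vectors.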