Querying Compressed Bioinformatics XML Documents

Bioinformatics data describe the chemical interactions and protein structures associated with the genes of living organisms. A common way to make these data easy to store, transmit, retrieve, and unify is to represent them in XML. However, the structure of XML documents is highly redundant. This paper introduces BioXC, a new XML compressor that compresses bioinformatics XML documents and retrieves information from the compressed documents without decompressing them. BioXC achieves a compression ratio of 68.7% and supports information retrieval through several kinds of XQuery queries.

Keywords— XML, Bioinformatics, information retrieval, XML compression, XQuery language.
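The abstract does not describe how BioXC answers queries on compressed data, so the following is only a minimal sketch of the general idea, not BioXC's algorithm. It assumes a hypothetical scheme in which each distinct tag name is replaced by an integer code, and a simple absolute path query is translated into codes once and then matched against the encoded structure, so tag names are never restored. The functions `compress` and `query` and the sample protein document are illustrative inventions; element text is kept uncompressed here for brevity.

```python
import xml.etree.ElementTree as ET


def compress(xml_text):
    """Hypothetical scheme: replace each distinct tag name with an integer code."""
    root = ET.fromstring(xml_text)
    dictionary = {}  # tag name -> integer code
    records = []     # (tuple of tag codes from the root, element text)

    def walk(elem, path):
        # Assign the next free code the first time a tag name is seen.
        code = dictionary.setdefault(elem.tag, len(dictionary))
        code_path = path + (code,)
        records.append((code_path, (elem.text or "").strip()))
        for child in elem:
            walk(child, code_path)

    walk(root, ())
    return dictionary, records


def query(dictionary, records, path):
    """Evaluate a simple absolute path such as '/proteins/protein/name'
    directly on the encoded records, without rebuilding the XML."""
    tags = [t for t in path.split("/") if t]
    if any(t not in dictionary for t in tags):
        return []  # a tag absent from the dictionary cannot match
    code_path = tuple(dictionary[t] for t in tags)
    return [text for p, text in records if p == code_path]


doc = """<proteins>
  <protein><name>Hemoglobin</name><organism>Human</organism></protein>
  <protein><name>Insulin</name><organism>Human</organism></protein>
</proteins>"""
d, r = compress(doc)
print(query(d, r, "/proteins/protein/name"))  # ['Hemoglobin', 'Insulin']
```

Translating the query into codes, rather than decoding the document, is what lets the path match run on the compressed form; a real compressor would also encode element text and support richer XQuery expressions.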