Paper Title
Improved File-Based De-Duplication Using Block-Level Chunking Techniques
Abstract
Data de-duplication is an efficient technique that removes duplicate data to reduce storage cost and increase data processing speed. In general, the data de-duplication process consists of four operations: chunking, fingerprinting (hashing), indexing, and storing. Chunking is the principal component of this process and plays a vital role in improving de-duplication performance. This paper proposes two new combined approaches, FFC and FVC. The proposed algorithms improve the performance of file-based de-duplication through the use of fixed-length and variable-length chunking techniques. The algorithms are evaluated on our own generated datasets, and their performance is measured in terms of de-duplication ratio, de-duplication time, total number of chunks generated, and throughput. The results obtained are verified against several open-source tools available for data de-duplication.
Keywords - Data De-Duplication, Data Reduction, Chunking, Fast Information Retrieval, Storage Management
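
As an illustration of the four operations named in the abstract (chunking, fingerprinting, indexing, and storing), the following minimal Python sketch applies plain fixed-length chunking to an in-memory byte string. It is not the proposed FFC or FVC algorithm. The function names, the 4096-byte chunk size, the choice of SHA-256 as the fingerprint, and the definition of de-duplication ratio as original size divided by stored size are all assumptions made for this example.

```python
import hashlib


def fixed_chunks(data: bytes, size: int = 4096):
    """Chunking: split data into fixed-length chunks (the last may be shorter)."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def deduplicate(data: bytes, size: int = 4096):
    """Return (store, recipe) for the given data.

    store  maps fingerprint -> chunk bytes; it serves as both index and storage.
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for chunk in fixed_chunks(data, size):
        # Fingerprinting: SHA-256 is an assumed choice for this sketch.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # Indexing and storing: keep only the first copy of each unique chunk.
        if fingerprint not in store:
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the stored chunks and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


def dedup_ratio(original_size: int, store) -> float:
    """Assumed definition: original size divided by size of unique chunks stored."""
    stored_size = sum(len(chunk) for chunk in store.values())
    return original_size / stored_size if stored_size else 0.0
```

For example, calling `deduplicate` on a 16384-byte input made of four 4096-byte blocks, three of them identical, yields four recipe entries but only two stored chunks, and `restore` returns the original bytes. A variable-length (content-defined) chunker could replace `fixed_chunks` without changing the rest of the sketch.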