May 18, 2011

Data Compression for Linux


Data compression works so well that popular backup and networking tools have it built in. Linux offers many compression tools, and most of them also let you choose a compression level, so there is a utility suited to almost any task.
Linux compression tools are relatively simple to use and quite flexible, though each command-line utility handles compression and decompression a little differently. Before looking at the individual utilities, consider the distinction between compression and archiving. Archiving is the process of combining a number of files into one file; the archive acts as a briefcase in which all your files are kept. Compression is the process of storing information in fewer bits by means of an encoding scheme.
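To make the distinction concrete, here is a minimal Python sketch using only the standard library. The file names and contents are made up for illustration. It first bundles two files into a tar archive (archiving), then compresses that single stream with gzip (compression).

```python
import gzip
import io
import tarfile

# Hypothetical file names and contents, used only for illustration.
files = {
    "notes.txt": b"compression and archiving are different steps\n" * 200,
    "todo.txt": b"back up the home directory\n" * 200,
}

# Step 1: archiving. Bundle the files into one uncompressed tar stream.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
tar_bytes = tar_buffer.getvalue()

# Step 2: compression. Encode that single stream in fewer bytes.
tgz_bytes = gzip.compress(tar_bytes)

raw_size = sum(len(data) for data in files.values())
print("raw files:  ", raw_size, "bytes")
print("tar archive:", len(tar_bytes), "bytes")
print("tar + gzip: ", len(tgz_bytes), "bytes")
```

The archive comes out larger than the files it holds, because tar adds headers and padding. Only the compression step makes the result smaller.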

The bzip2 compression utility uses the Burrows-Wheeler Transform (BWT), which takes a block of data and rearranges it by sorting. The transform itself does not shrink anything: the only difference between the transformed block and the original block is the order of the data. The reordering tends to group identical bytes together, and the later encoding stages of bzip2 (ending in Huffman coding) exploit those runs to do the actual compression. The biggest difference between BWT and the other popular methods is that BWT acts on an entire block of data at once, whereas other compression utilities act on a few bytes at a time. The block is limited in size because the transform is carried out in memory; bzip2 works in blocks of 100 kB to 900 kB depending on the compression level, and with less memory available the block it can handle is smaller. Because of this limitation, bzip2 is best for compressing small to midsize data, such as images, e-mail attachments, and other smaller compression needs.
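The rearrangement can be sketched in a few lines of Python. This is a naive textbook version for illustration, not the way bzip2 implements it. The function names and the "$" end marker are assumptions of this sketch (bzip2 itself records an index rather than adding a marker).

```python
def bwt(text: str, end: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    assert end not in text, "the end marker must not occur in the input"
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, end: str = "$") -> str:
    """Undo bwt() by repeatedly prepending the last column and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the marker is the original text plus the marker.
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


transformed = bwt("banana")
print(transformed)               # prints annb$aa
print(inverse_bwt(transformed))  # prints banana
```

The output has the same characters as the input, only reordered, and the repeated letters now sit next to each other ("nn", "aa"). Runs like these are what the later stages compress well.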
Unlike bzip2, the gzip compression utility uses Lempel-Ziv coding (LZ77). This technique looks for strings that have already appeared earlier in the data and replaces each later occurrence with a short numeric reference (a distance back and a length) to the earlier copy. The resulting reduction in file size is moderate rather than dramatic. gzip compresses much faster than bzip2, although it usually does not reach quite the same compression ratio, so gzip is best suited for on-the-fly compression where size is not the main concern. Besides speed, gzip can decompress several formats, including .gz, .Z, .tgz, and .zip files that contain a single member, while bzip2 handles only its own format (.bz2, and the older .bz extension).
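The speed and ratio trade-off is easy to measure on your own data. The sketch below uses Python's standard gzip and bz2 modules, which implement the same formats as the command-line tools. The sample data is invented for illustration, and the numbers depend heavily on the input and the machine, so treat the output as a rough indication only.

```python
import bz2
import gzip
import time

# Hypothetical sample data: repetitive log-like text, chosen only for illustration.
data = b"".join(
    b"2011-05-18 12:00:%02d host%d service restarted ok\n" % (i % 60, i % 7)
    for i in range(20000)
)


def measure(name, compress, decompress, level):
    """Compress data at the given level, check the round trip, report size and time."""
    start = time.perf_counter()
    packed = compress(data, level)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data  # both formats are lossless
    print(f"{name} level {level}: {len(packed)} bytes in {elapsed:.4f} s")
    return len(packed)


print("original:", len(data), "bytes")
sizes = {}
for level in (1, 9):  # 1 = fastest, 9 = best compression
    sizes["gzip", level] = measure("gzip", gzip.compress, gzip.decompress, level)
    sizes["bz2", level] = measure("bz2", bz2.compress, bz2.decompress, level)
```

Levels 1 and 9 correspond to the -1 and -9 options of the gzip and bzip2 commands.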
Like the familiar DOS and Windows command-line utilities, zip is compatible with MS-DOS zip and PKZIP. Flexibility is one aspect that makes zip compelling to use: on Linux, zip is not only a compression utility but also an archiving utility, and it can protect archives with a password. The main reason to use zip is cross-platform compatibility; its compression is nearly identical to that of gzip.
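Because zip both archives and compresses, a single step produces one file holding several compressed members. A minimal sketch with Python's standard zipfile module follows; the member names and contents are hypothetical.

```python
import io
import zipfile

# Hypothetical member names and contents, used only for illustration.
members = {
    "report.txt": b"quarterly numbers\n" * 100,
    "readme.txt": b"unpack with any zip-compatible tool\n" * 100,
}

# Write an archive in memory; each member is compressed with deflate.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in members.items():
        archive.writestr(name, data)

# Read it back: list the members and restore their contents.
with zipfile.ZipFile(buffer) as archive:
    for info in archive.infolist():
        print(info.filename, info.file_size, "->", info.compress_size, "bytes")
    restored = {name: archive.read(name) for name in archive.namelist()}
```

Note that Python's zipfile module cannot create password-protected archives, so for encryption you would still use the zip command itself.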
Three tools with three different uses: that goes to show how flexible Linux is. Remember that each compression utility is best suited to a specific task: bzip2 for small to mid-size files, gzip for larger files and on-the-fly compression, and zip for cross-platform compatibility.
