5 software utilities that can compress large files in Linux

There are several data compression programs that you can use on Linux. Your choice will depend on the data: when you compress large files, you might want a different program than when you compress a large number of smaller files. Some of these programs are more popular than others, so it is quite possible that only a few of them are available out of the box on your Linux distribution. You can always install your favorite one later.

Each of these programs has its own strengths and weaknesses. Different programs support different sets of algorithms, so if you are particular about which compression algorithm to use, verify that your software supports it. Also, some algorithms perform better on some file types than on others.

As there are many programs for compressing files and directories, I will cover just a few of the more popular ones. Some of them do only compression, some do only archiving (bundling), and others do both. Which one you should use depends very much on your requirements.

Gzip/Deflate

Gzip is the GNU Project's implementation of the popular DEFLATE algorithm. It is one of the most popular and widely used compression programs and should be available on all Linux distributions. It is a pure compressor and decompressor, not an archiver, which means it is usually used in conjunction with the tar command to bundle files.

Gzip wraps the Deflate stream in a header and a footer that includes a checksum, so it is slightly slower than plain Deflate compression. It is also one of the preferred compression methods for HTTP transfers, the other being Deflate.
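The header and footer described above can be inspected directly. The following is a minimal sketch using Python's standard gzip module rather than the gzip command itself; the sample data is made up for illustration.

```python
import gzip
import struct
import zlib

# Made-up sample data, used only for illustration.
data = b"hello, compression! " * 1000

# gzip.compress wraps a Deflate stream in the gzip container.
gz = gzip.compress(data)

# The first two bytes are the gzip magic number; the third byte is the
# compression method (8 means Deflate).
magic, method = gz[:2], gz[2]

# The last 8 bytes are the footer: the CRC-32 of the original data and
# its length modulo 2**32, both stored little-endian.
crc32, isize = struct.unpack("<II", gz[-8:])

print("magic:", magic.hex(), "method:", method)
print("checksum matches:", crc32 == zlib.crc32(data))
print("length matches:", isize == len(data))
```

The checksum is what lets the decompressor detect a corrupted file, which is the main thing the extra framing buys you over a bare Deflate stream.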

Bzip2

bzip2 is an open-source implementation of the Burrows-Wheeler algorithm. Like gzip, it is a pure compressor, so you can use it with tar to get archiving capabilities. It compresses better than gzip in most cases but needs more time and CPU.

This is a good alternative to gzip when a smaller compressed file size is a requirement, especially for long-term storage and backup of files.
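The size versus time trade-off is easy to measure on your own data. Here is a rough sketch using Python's standard gzip and bz2 modules; the compare function and the sample input are hypothetical, and the numbers you get will depend heavily on the kind of file you feed in.

```python
import bz2
import gzip
import time


def compare(data: bytes) -> dict:
    """Return compressed size and elapsed seconds for gzip and bzip2."""
    results = {}
    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {"bytes": len(packed), "seconds": elapsed}
    return results


# Hypothetical sample input; replace it with the contents of a real file.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000

for name, stats in compare(sample).items():
    print(f"{name}: {stats['bytes']} bytes in {stats['seconds']:.4f} s")
```

Highly repetitive input like this sample is not representative, so run the comparison on the files you actually intend to store before settling on one tool.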

Zip

Zip is probably one of the most popular formats, because it supports both compression and archiving. Its popularity also stems from its acceptance and implementation on the MS Windows platform. It supports several compression algorithms, though the most commonly used one is Deflate.

If you want to uncompress the resulting archive on a Windows machine, then zip might be a better choice than gzip or bzip2, because more people are familiar with zip than with the other formats.
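Because zip bundles and compresses in one step, a single call is enough to build an archive. This is a small sketch with Python's standard zipfile module; the file names and contents are invented, and an in-memory buffer stands in for a real .zip file on disk.

```python
import io
import zipfile

# Hypothetical file names and contents, used only for illustration.
files = {
    "notes.txt": b"some text " * 500,
    "docs/readme.txt": b"more text " * 500,
}

buffer = io.BytesIO()  # stands in for a file such as "bundle.zip"

# ZIP_DEFLATED selects Deflate; the default, ZIP_STORED, does not compress.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

# Reopen the archive, list its members and read them back.
with zipfile.ZipFile(buffer) as archive:
    for info in archive.infolist():
        print(info.filename, info.file_size, "->", info.compress_size)
    restored = {name: archive.read(name) for name in archive.namelist()}
```

Each member is compressed separately, which is why you can extract a single file from a zip archive without unpacking everything else.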

p7zip

p7zip is an archiver that supports many compression algorithms, along the lines of the zip command. It creates 7z format archives with the .7z extension, which can use a variety of compression algorithms such as bzip2, LZMA and LZMA2.
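Python's standard library cannot write .7z archives, but its lzma module exposes the LZMA and LZMA2 algorithms mentioned above. The sketch below therefore produces .xz and .lzma streams, not 7z archives, and is only meant to show the two algorithms side by side on made-up data.

```python
import lzma

# Made-up sample data, used only for illustration.
data = b"abcdefgh" * 50000

# FORMAT_XZ uses the LZMA2 filter; FORMAT_ALONE is the legacy .lzma
# container, which uses the original LZMA filter.
xz_bytes = lzma.compress(data, format=lzma.FORMAT_XZ)
lzma_bytes = lzma.compress(data, format=lzma.FORMAT_ALONE)

print("LZMA2 (.xz):  ", len(xz_bytes), "bytes")
print("LZMA (.lzma): ", len(lzma_bytes), "bytes")

# lzma.decompress detects either container automatically by default.
restored_xz = lzma.decompress(xz_bytes)
restored_lzma = lzma.decompress(lzma_bytes)
```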

Rzip

rzip is a data compression program designed for large files, where it can often achieve better compression than any of the previous ones. It is built on bzip2 (the Burrows-Wheeler algorithm). Rzip uses a far larger history buffer than either gzip or bzip2, up to 900MB, to find redundancy in the files. This helps it achieve a higher compression ratio, but at the expense of more resources, especially the RAM (memory) available to the process.

Lrzip is a variant of rzip, though its file format is different from rzip's. This variant does not have a buffer limit and supports different algorithms such as Deflate, bzip2 and LZO.

When you compress large files, rzip and lrzip are usually your best options in terms of compression ratio.
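To see why a larger history buffer matters, consider data that repeats itself only after a long gap. The sketch below does not use rzip (Python has no standard binding for it); as a stand-in, it compares Deflate, whose window is 32 KB, with LZMA, whose default dictionary is several megabytes. The data is generated just for this demonstration.

```python
import lzma
import random
import zlib

# 100 KB of pseudo-random bytes: incompressible on its own.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(100_000))

# The same block twice. The second copy starts 100 KB after the first,
# far beyond the 32 KB window that Deflate can look back over.
data = block + block

deflate_size = len(zlib.compress(data, 9))
# The default LZMA preset uses a dictionary of several megabytes,
# so it can still "see" the first copy when it reaches the second.
lzma_size = len(lzma.compress(data))

print("original:", len(data))
print("deflate: ", deflate_size)
print("lzma:    ", lzma_size)
```

Deflate cannot see the first copy by the time it reaches the second, so it saves almost nothing, while the large-window compressor stores the second copy as references to the first. rzip applies the same idea with an even longer reach.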

Tar

Tar, which stands for tape archive, does not compress files; it bundles a set of files and folders into one file. Tar can be combined with the previous gzip, bzip2 or rzip commands to create a compressed archive, denoted by an extension such as .tar.gz, .tgz or .tar.bz2.
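The bundle-then-compress combination can be sketched with Python's standard tarfile module, which pairs the tar format with gzip or bzip2 in one step. The make_tarball helper and the file names are hypothetical, and in-memory buffers stand in for real archive files.

```python
import io
import tarfile

# Hypothetical file names and contents, used only for illustration.
files = {
    "project/a.txt": b"alpha " * 1000,
    "project/b.txt": b"beta " * 1000,
}


def make_tarball(mode: str) -> bytes:
    """Bundle `files` with tar; `mode` picks the compressor, e.g. "w:gz"."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


plain = make_tarball("w")         # .tar: bundled but not compressed
gzipped = make_tarball("w:gz")    # .tar.gz or .tgz
bzipped = make_tarball("w:bz2")   # .tar.bz2

for label, blob in (("tar", plain), ("tar.gz", gzipped), ("tar.bz2", bzipped)):
    print(label, len(blob), "bytes")

# Reading back: "r:*" lets tarfile detect the compression automatically.
with tarfile.open(fileobj=io.BytesIO(gzipped), mode="r:*") as tar:
    names = tar.getnames()
```

Unlike zip, the compressor here runs over the whole bundle at once, so redundancy shared between files can be exploited, but you have to decompress the stream to reach any single member.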