Dec 10

Multiple-thread gzip

I was recently backing up some very large datasets and became impatient with how long it takes to create *.tar.gz archives. During one backup session I started searching for a way to use multiple threads and came across pigz, which is basically a parallel implementation of gzip. I also found an interesting thread on linuxquestions.org discussing how to use pigz with tar in place of gzip (http://bit.ly/1zPFbHS).

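Here is one of the solutions recommended by user dru8274. The -I option (short for --use-compress-program in GNU tar) tells tar to filter the archive through the named program, here pigz, in place of gzip.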
To compress:
tar -I pigz -cf out.tgz in/

To extract:
tar -I pigz -xf out.tgz in/
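
If the installed tar has no -I option, the same result can be had by piping tar through the compressor. The following is a minimal sketch, not part of the linuxquestions.org solution: the helper names compress_dir and extract_archive are hypothetical, and the fallback to gzip when pigz is not installed is an assumption added so the script still runs on machines without pigz.

```shell
#!/bin/sh
# Use pigz when it is installed, otherwise fall back to gzip.
# Both programs accept -c (write to stdout) and -d (decompress).
if command -v pigz >/dev/null 2>&1; then
    COMPRESSOR=pigz
else
    COMPRESSOR=gzip
fi

# compress_dir ARCHIVE DIR  (hypothetical helper)
# Streams a tar archive of DIR through the compressor into ARCHIVE.
compress_dir() {
    tar -cf - "$2" | "$COMPRESSOR" -c > "$1"
}

# extract_archive ARCHIVE DEST  (hypothetical helper)
# Decompresses ARCHIVE and unpacks it into the existing directory DEST.
extract_archive() {
    "$COMPRESSOR" -dc "$1" | tar -xf - -C "$2"
}
```

For example, compress_dir out.tgz in/ followed by extract_archive out.tgz /tmp/restore (the destination must already exist). pigz writes standard gzip output, so the resulting archive can also be read with plain gzip or tar -xzf.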

I ran a non-scientific benchmark on a 5.0 GB directory using a 4-core (8-thread) 3.7 GHz Intel Xeon. The directory contains a good mix of MS documents, ASCII text files and images. The typical tar -xzf command took 6 minutes, while using pigz instead of gzip brought the wall time down to 3 minutes, cutting it in half. This is a significant time saver!
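
A rough comparison like the one above can be repeated with a small timing wrapper. This is only a sketch of one way to measure wall time, not the procedure used for the numbers in this post: elapsed_seconds is a hypothetical helper, and the archive and directory names in the usage comments are placeholders.

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]  (hypothetical helper)
# Runs the command with its output discarded and prints the wall time
# in whole seconds, measured with date +%s.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Usage (placeholder paths; run each a few times and compare):
#   elapsed_seconds tar -czf gzip.tgz in/
#   elapsed_seconds tar -I pigz -cf pigz.tgz in/
```

Whole-second resolution is coarse, but it is enough for runs that take minutes. Disk caching can favor whichever command runs second, so it is worth repeating each measurement.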

Permanent link to this article: http://www.mattbilskie.com/multiple-thread-gzip/