NAME

Gzip::Faster - simple and fast gzip and gunzip

SYNOPSIS

    # Make a random input string
    my $input = join '', map {int (rand (10))} 0..0x1000;
    use Gzip::Faster;
    my $gzipped = gzip ($input);
    my $roundtrip = gunzip ($gzipped);
    if ($roundtrip ne $input) { die; }
    gzip_to_file ($input, 'file.gz');
    $roundtrip = gunzip_file ('file.gz');
    if ($roundtrip ne $input) { die; }

VERSION

This documents version 0.16 of Gzip::Faster corresponding to git commit 64e25a9b5082f20ef87be42c06988fd3fc0ba02e made on Sat Dec 10 22:56:52 2016 +0900.

DESCRIPTION

This module compresses to and decompresses from the gzip format.

The module offers two basic functions, "gzip" and "gunzip", which convert scalars to and from gzip format, and three convenience functions: "gzip_file" reads a file then compresses it; "gunzip_file" reads a file then uncompresses it; and "gzip_to_file" compresses a scalar and writes it to a file.

FUNCTIONS

gzip

    my $zipped = gzip ($plain);

This compresses $plain into the gzip format. The return value is the compressed version of $plain.

gunzip

    my $plain = gunzip ($zipped);

This uncompresses $zipped and returns the result of the uncompression. It returns the undefined value if $zipped is the undefined value or an empty string. Otherwise, it throws a fatal error if $zipped is not in the gzip format.
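Because a failed "gunzip" is fatal, callers who handle untrusted input may want to trap the error. The following is a minimal sketch rather than code from the distribution; the input string is made up for illustration, and nothing is assumed about the exact text of the error message.

```perl
use strict;
use warnings;
use Gzip::Faster;

# Made-up input which is not in the gzip format.
my $bad = 'this is not gzip data';
# gunzip dies on malformed input, so trap the error with eval.
my $plain = eval { gunzip ($bad) };
if (! defined $plain) {
    print "Could not gunzip the input: $@\n";
}
# An empty string gives the undefined value without dying.
my $nothing = gunzip ('');
if (! defined $nothing) {
    print "Empty input gave the undefined value.\n";
}
```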

gzip_file

    my $zipped = gzip_file ('file');

This reads the contents of file into memory and then runs "gzip" on the file's contents. The return value and the possible errors are the same as "gzip", plus this may also throw an error if open fails.

gunzip_file

    my $plain = gunzip_file ('file.gz');

This reads the contents of file.gz into memory and then runs "gunzip" on the file's contents. The return value and the possible errors are the same as "gunzip", plus this may also throw an error if open fails.

gzip_to_file

    gzip_to_file ($plain, 'file.gz');

This compresses $plain in memory using "gzip" and writes the compressed content to 'file.gz'. There is no return value. The errors are the same as "gzip", plus this may also throw an error if open fails. As of this version, it does not write any gzip header information to file.gz.

deflate

    my $deflated = deflate ($plain);

This is similar to "gzip" except that it doesn't write the gzip header information. The output can be inflated either with "inflate" or with "gunzip".

There is an example of using "deflate" to write a PNG in the file t/png.t in the distribution.

This was added to the module in version 0.16.
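As a minimal sketch of the statement above that either "inflate" or "gunzip" can read the output of "deflate", the following round trip uses a made-up input string. "deflate" and "inflate" are requested explicitly because they are not exported by default.

```perl
use strict;
use warnings;
use Gzip::Faster qw/deflate inflate gunzip/;

# Made-up, repetitive input.
my $plain = 'some text to compress ' x 100;
my $deflated = deflate ($plain);
# Read the deflated data back with both functions.
my $via_inflate = inflate ($deflated);
my $via_gunzip = gunzip ($deflated);
if ($via_inflate ne $plain || $via_gunzip ne $plain) {
    die "Round trip failed";
}
print "Round trip OK\n";
```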

inflate

    my $inflated = inflate ($deflated);

Inflate the output of "deflate". Although the code is slightly different, for all practical purposes this is identical to "gunzip", and it's included only for completeness.

The following example demonstrates using inflate with a PNG image.

    use File::Slurper 'read_binary';
    use FindBin '$Bin';
    use Gzip::Faster 'inflate';
    my $pngfile = "$Bin/larry-wall.png";
    my $pngdata = read_binary ($pngfile);
    # The IHDR chunk holds 13 bytes: width, height, then five one-byte fields.
    if ($pngdata !~ /IHDR(.{13})/s) {
        die "No header";
    }
    my ($width, $height, $bits) = unpack ("NNCCCCC", $1);
    # The four bytes before "IDAT" give the length of the compressed data.
    if ($pngdata !~ /(....)IDAT(.*)$/s) {
        die "No image data";
    }
    my $length = unpack ("N", $1);
    my $data = substr ($2, 0, $length);
    my $idat = inflate ($data);
    # This assumes one byte per pixel; each row starts with a filter byte.
    for my $y (0..$height - 1) {
        my $row = substr ($idat, $y * ($width + 1), $width + 1);
        for my $x (1..$width - 1) {
            my $pixel = substr ($row, $x, 1);
            if (ord ($pixel) < 128) {
                print "#";
                next;
            }
            print " ";
        }
        print "\n";
    }

produces output

               ######              
             #########             
           #############           
          ###############          
          ################         
         ##################        
         ########   ########       
        #######      #######       
        ####          ######       
        ###           ######       
        ###           #######      
       ########    ##########      
       ####  ###    #  ######      
       #### # ##   #  ######       
       ####       #     ###        
        ###       #    ####        
                  ##   ###         
                  ##   ###         
              ######## ###         
             ##############        
            ##### #########        
            ## ## ##########       
             #   ##  ########      
             #       ##########    
          #####    ########### ### 
        ######     ################
      #########  ######  ##########
     ##########    ###   # ########
    # # #######    #     ##########
    #  ###### #          ##########

This was added to the module in version 0.16.

deflate_raw

This is similar to "deflate" except that it doesn't write the checksum value into the data. The output must be inflated with "inflate_raw".

This was added to the module in version 0.16.

inflate_raw

This inflates data output by "deflate_raw". Although the code is basically similar to "inflate" and "gunzip", it won't work on the output of "gzip" or "deflate".

This was added to the module in version 0.16.
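The following is a minimal sketch of a round trip through "deflate_raw" and "inflate_raw". It assumes that both functions take a single scalar and return the result, in the same way as "deflate" and "inflate"; the input string is made up.

```perl
use strict;
use warnings;
# These two functions are exported on demand only.
use Gzip::Faster qw/deflate_raw inflate_raw/;

# Made-up input.
my $plain = 'raw deflate data has no header and no checksum ' x 20;
# Assumption: the calling convention matches deflate and inflate.
my $raw = deflate_raw ($plain);
my $roundtrip = inflate_raw ($raw);
if ($roundtrip ne $plain) {
    die "Round trip failed";
}
print "Round trip OK\n";
```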

METHODS

new

    my $gf = Gzip::Faster->new ();

Create a new Gzip::Faster object. The object defaults to gzip compression.

This was added to the module in version 0.16.

zip

    my $zipped = $gf->zip ($plain);

Compress $plain. The type of compression can be set with "gzip_format" and "raw".

This was added to the module in version 0.16.

unzip

    my $plain = $gf->unzip ($zipped);

Uncompress $zipped. The type of uncompression can be set with "gzip_format" and "raw".

This was added to the module in version 0.16.
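As a minimal sketch, a round trip through "zip" and "unzip" with the object's default settings might look like the following. The input string is made up for illustration.

```perl
use strict;
use warnings;
use Gzip::Faster;

# A new object defaults to the gzip format.
my $gf = Gzip::Faster->new ();
# Made-up input.
my $plain = 'object-oriented round trip ' x 50;
my $zipped = $gf->zip ($plain);
my $roundtrip = $gf->unzip ($zipped);
if ($roundtrip ne $plain) {
    die "Round trip failed";
}
printf "%d bytes compressed to %d bytes\n",
    length ($plain), length ($zipped);
```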

copy_perl_flags

    $gf->copy_perl_flags (1);

Copy the Perl flags like the utf8 flag into the header of the gzipped data.

This feature of the module was restored in version 0.16.

file_name

    my $filename = $gf->file_name ();
    $gf->file_name ('this.gz');

Get or set the file name. This only applies to the gzip format, since the deflate format has no header to store a name into. When you set a file name, then use "zip", the file name is subsequently deleted from the object, so it needs to be set each time "zip" is called.

The following example demonstrates storing and then retrieving the name:

    use utf8;
    use FindBin '$Bin';
    use Gzip::Faster;
    my $gf = Gzip::Faster->new ();
    $gf->file_name ("blash.gz");
    my $something = $gf->zip ("stuff");
    my $no = $gf->file_name ();
    if ($no) {
        print "WHAT?\n";
    }
    else {
        print "The file name has been deleted by the call to zip.\n";
    }
    my $gf2 = Gzip::Faster->new ();
    $gf2->unzip ($something);
    my $file_name = $gf2->file_name ();
    print "Got back file name $file_name\n";

produces output

    The file name has been deleted by the call to zip.
    Got back file name blash.gz

The module currently has a hard-coded limit of 1024 bytes as the maximum length of file name it can read back.

This was added to the module in version 0.16.

gzip_format

    $gf->gzip_format (1);

Switch between gzip and deflate formats. The default is gzip format.

This was added to the module in version 0.16.

raw

    $gf->raw (1);

Switch between raw inflate and inflate formats. Switching this on automatically switches off "gzip_format", since these are not compatible.

The sequence

    $gf->gzip_format (1);
    $gf->raw (1);
    $gf->raw (0);

ends up with $gf in the non-raw inflate format.

This was added to the module in version 0.16.
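The following sketch compresses the same made-up input with one object, first in the default gzip format and then in the raw format, and reads the raw data back with the same object. It assumes that an object left in raw mode uses that mode for both "zip" and "unzip".

```perl
use strict;
use warnings;
use Gzip::Faster;

# Made-up input.
my $plain = 'raw or not raw ' x 50;
my $gf = Gzip::Faster->new ();
# The object starts in the gzip format.
my $gzipped = $gf->zip ($plain);
# Switching to raw also switches off gzip_format.
$gf->raw (1);
my $raw = $gf->zip ($plain);
# The same object, still in raw mode, reads the raw data back.
my $roundtrip = $gf->unzip ($raw);
if ($roundtrip ne $plain) {
    die "Raw round trip failed";
}
printf "gzip format: %d bytes, raw: %d bytes\n",
    length ($gzipped), length ($raw);
```

The raw output is expected to be the shorter of the two, since it carries neither a header nor a checksum.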

level

    $gf->level (9);

Set the compression level, from 0 (no compression) to 9 (best compression). Values outside this range cause a warning, and the level is set to the nearest valid value. For example, a value of 100 causes the level to be set to 9.

This was added to the module in version 0.16.
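To see the effect of the level, the following sketch compresses the same made-up, highly repetitive input at three levels and prints the resulting sizes. The sizes depend on the input and on the zlib version, so no particular numbers are claimed here.

```perl
use strict;
use warnings;
use Gzip::Faster;

# Made-up, highly repetitive input.
my $plain = 'abcdefghij' x 1000;
my %size;
for my $level (0, 1, 9) {
    my $gf = Gzip::Faster->new ();
    $gf->level ($level);
    my $zipped = $gf->zip ($plain);
    # Check that the data survives whatever the level.
    if ($gf->unzip ($zipped) ne $plain) {
        die "Round trip failed at level $level";
    }
    $size{$level} = length ($zipped);
    printf "Level %d: %d bytes\n", $level, $size{$level};
}
```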

PERFORMANCE

This section compares the performance of Gzip::Faster with IO::Compress::Gzip / IO::Uncompress::Gunzip and Compress::Raw::Zlib.

Short text

This section compares the performance of Gzip::Faster and other modules on a short piece of English text. These results are produced by the file bench/benchmarks.pl in the distribution.

According to these results, Gzip::Faster is about four times faster to load, seven times faster to compress, and twenty-five times faster to uncompress than IO::Compress::Gzip and IO::Uncompress::Gunzip. Round trips are about ten times faster with Gzip::Faster.

Compared to Compress::Raw::Zlib, load times are about one and a half times faster, round trips are about three times faster, compression is nearly three times faster, and decompression is about six times faster.

The versions used in this test are as follows:

    $IO::Compress::Gzip::VERSION = 2.069
    $IO::Uncompress::Gunzip::VERSION = 2.069
    $Compress::Raw::Zlib::VERSION = 2.069
    $Gzip::Faster::VERSION = 0.16

The size after compression is as follows:

    IO::Compress::Gzip size is 830 bytes.
    Compress::Raw::Zlib size is 830 bytes.
    Gzip::Faster size is 830 bytes.

Here is a comparison of load times:

                Rate Load IOUG Load IOCG  Load CRZ   Load GF
    Load IOUG 25.2/s        --       -4%      -66%      -77%
    Load IOCG 26.4/s        5%        --      -65%      -76%
    Load CRZ  74.5/s      195%      182%        --      -32%
    Load GF    110/s      337%      318%       48%        --

Here is a comparison of a round-trip:

                           Rate IO::Compress::Gzip Compress::Raw::Zlib  Gzip::Faster
    IO::Compress::Gzip   1310/s                 --                -66%          -90%
    Compress::Raw::Zlib  3867/s               195%                  --          -70%
    Gzip::Faster        12877/s               883%                233%            --

Here is a comparison of gzip (compression) only:

                                    Rate IO::Compress::Gzip Compress::Raw::Zlib::Deflate Gzip::Faster
    IO::Compress::Gzip            2564/s                 --                         -60%         -86%
    Compress::Raw::Zlib::Deflate  6452/s               152%                           --         -65%
    Gzip::Faster                 18182/s               609%                         182%           --

Here is a comparison of gunzip (decompression) only:

                                    Rate IO::Uncompress::Gunzip Compress::Raw::Zlib::Inflate Gzip::Faster
    IO::Uncompress::Gunzip        2844/s                     --                         -74%         -96%
    Compress::Raw::Zlib::Inflate 10884/s                   283%                           --         -84%
    Gzip::Faster                 69565/s                  2346%                         539%           --

The test file is in bench/benchmark.pl in the distribution.
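For readers who want to try a comparison of their own, the following is a rough sketch using the core Benchmark module. It is not the script from the distribution: the input text and the subroutine names gf_round_trip and io_round_trip are made up for illustration, and it only times a round trip.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';
# Import nothing, since both modules have functions called gzip.
use Gzip::Faster ();
use IO::Compress::Gzip ();
use IO::Uncompress::Gunzip ();

# Made-up short English text.
my $input = 'The quick brown fox jumps over the lazy dog. ' x 20;

sub gf_round_trip
{
    return Gzip::Faster::gunzip (Gzip::Faster::gzip ($input));
}

sub io_round_trip
{
    my ($zipped, $plain);
    IO::Compress::Gzip::gzip (\$input, \$zipped)
        or die $IO::Compress::Gzip::GzipError;
    IO::Uncompress::Gunzip::gunzip (\$zipped, \$plain)
        or die $IO::Uncompress::Gunzip::GunzipError;
    return $plain;
}

# Check that both give back the input before timing them.
if (gf_round_trip () ne $input || io_round_trip () ne $input) {
    die "Round trip failed";
}
# Run each subroutine for at least one CPU second and print a table.
cmpthese (-1, {
    'Gzip::Faster' => \&gf_round_trip,
    'IO::Compress::Gzip' => \&io_round_trip,
});
```

The rates printed depend on the machine and on the module versions installed, so they will not match the tables above exactly.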

Long text

This section compares the compression on a 2.2 megabyte file of Chinese text, which is the Project Gutenberg version of Journey to the West, http://www.gutenberg.org/files/23962/23962-0.txt, with the header and footer text removed.

The versions used in this test are as above.

The sizes are as follows:

    IO::Compress::Gzip size is 995387 bytes.
    Compress::Raw::Zlib size is 995387 bytes.
    Gzip::Faster size is 995823 bytes.

Note that the size of the file compressed with the command-line gzip, with the default compression, is identical to the size with Gzip::Faster::gzip, except for the 12 bytes in the file version used to store the file name:

    $ gzip --keep chinese.txt
    $ ls -l chinese.txt.gz 
    -rw-r--r--  1 ben  ben  995835 Oct 20 18:52 chinese.txt.gz

Here is a comparison of a round-trip:

                          Rate IO::Compress::Gzip Compress::Raw::Zlib   Gzip::Faster
    IO::Compress::Gzip  4.44/s                 --                 -2%            -7%
    Compress::Raw::Zlib 4.55/s                 2%                  --            -5%
    Gzip::Faster        4.80/s                 8%                  6%             --

Here is a comparison of gzip (compression) only:

                                   Rate IO::Compress::Gzip Compress::Raw::Zlib::Deflate Gzip::Faster
    IO::Compress::Gzip           5.05/s                 --                          -0%          -6%
    Compress::Raw::Zlib::Deflate 5.06/s                 0%                           --          -6%
    Gzip::Faster                 5.36/s                 6%                           6%           --

Here is a comparison of gunzip (decompression) only:

                                   Rate IO::Uncompress::Gunzip Compress::Raw::Zlib::Inflate Gzip::Faster
    IO::Uncompress::Gunzip       36.8/s                     --                         -18%         -20%
    Compress::Raw::Zlib::Inflate 45.1/s                    23%                           --          -2%
    Gzip::Faster                 46.0/s                    25%                           2%           --

For longer files, Gzip::Faster is not much faster and the underlying library's speed is the main factor.

BUGS

The functions such as "gzip" offer no way to select the level of compression. They use the zlib default level, which is what you get if you run the command-line program gzip on a file without options like --best or --fast. To choose a different level, use the "level" method of the object interface.

The module doesn't check whether the input of "gzip" is already gzipped, and it doesn't check whether the compression was effective. That is, it doesn't check whether the output of "gzip" is actually smaller than the input.
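Callers who care about this can make the check themselves. The following is a minimal sketch; maybe_gzip is a hypothetical helper, not part of the module, which returns the compressed data only if it is smaller than the input.

```perl
use strict;
use warnings;
use Gzip::Faster;

# Hypothetical helper: return the data to use and a flag which is
# true if the data was gzipped.
sub maybe_gzip
{
    my ($input) = @_;
    my $zipped = gzip ($input);
    if (length ($zipped) < length ($input)) {
        return ($zipped, 1);
    }
    return ($input, 0);
}

# A very short input grows, because of the gzip header and trailer.
my (undef, $short_zipped) = maybe_gzip ('a');
# A long repetitive input shrinks.
my (undef, $long_zipped) = maybe_gzip ('a' x 10000);
print "Short input gzipped: ", ($short_zipped ? 'yes' : 'no'), "\n";
print "Long input gzipped: ", ($long_zipped ? 'yes' : 'no'), "\n";
```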

Browser bugs and Gzip::Faster

Some web browsers have bugs which may affect users of this module.

Using "copy_perl_flags" with utf8-encoded text trips a browser bug in the Firefox web browser where it produces a content encoding error message.

Using deflate rather than gzip compression on web pages trips browser bugs in some versions of Internet Explorer.

EXPORTS

The module exports "gzip", "gunzip", "gzip_file", "gunzip_file", and "gzip_to_file" by default. You can switch this blanket exporting off with

    use Gzip::Faster ();

or

    use Gzip::Faster 'gunzip';

whereby you only get gunzip and not the other functions exported. The functions "inflate", "deflate", "inflate_raw" and "deflate_raw" are exported on demand only. You can export all the functions from the module using

    use Gzip::Faster ':all';
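With exporting switched off, the functions can still be called by their full names. The following is a minimal sketch with a made-up input string:

```perl
use strict;
use warnings;
# The empty list switches off all exporting.
use Gzip::Faster ();

# Made-up input.
my $plain = 'nothing was imported';
my $zipped = Gzip::Faster::gzip ($plain);
my $roundtrip = Gzip::Faster::gunzip ($zipped);
if ($roundtrip ne $plain) {
    die "Round trip failed";
}
print "Round trip OK without importing anything\n";
```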

DIAGNOSTICS

All errors are fatal.

Data input to inflate is not in libz format

The data given to "gunzip", "inflate", or "inflate_raw" was not in the expected format.

Error opening '$file': $!

This may be produced by "gunzip_file", "gzip_file", or "gzip_to_file".

Error closing '$file': $!

This may be produced by "gunzip_file", "gzip_file", or "gzip_to_file".

There are a number of other diagnostics, but these are meant to detect bugs. A complete list of the others can be obtained by running the parse-diagnostics script which comes with Parse::Diagnostics on the files gzip-faster-perl.c and lib/Gzip/Faster.pm in the distribution.

INSTALLATION

Installation follows the standard Perl methods. If you do not know what the standard Perl module install methods are, detailed instructions can be found in the file README in the distribution. The following are some extra notes for people who get stuck.

Gzip::Faster requires the compression library zlib (also called libz) to be installed. The following message printed during perl Makefile.PL:

    You don't seem to have zlib available on your system.

or

    Warning (mostly harmless): No library found for -lz

or the following message at run-time:

    undefined symbol: inflate

indicate that Gzip::Faster was unable to link to libz.

Ubuntu Linux

On Ubuntu Linux, you may need to install zlib1g-dev using the following command:

    sudo apt-get install zlib1g-dev

Windows

Unfortunately at this time the module doesn't seem to install on ActiveState Perl. You can check the current status at http://code.activestate.com/ppm/Gzip-Faster/. However, the module seems to install without problems on Strawberry Perl, so if you cannot install via ActiveState, you could try that instead.

ACKNOWLEDGEMENTS

zgrim reported an important bug related to zlib.

Aristotle Pagaltzis contributed the benchmarking code for Compress::Raw::Zlib.

The tests in t/png.t use material taken from Image::PNG::Write::BW by Andrea Nall (<ANALL>).

SEE ALSO

gzip

Even within Perl, sometimes it's a lot easier to use the command line utility gzip as in

    system ("gzip file");

or `gzip file` than it is to try to figure out how to use some module or another.

mod_deflate and mod_gzip

These are Apache web server modules which compress web outputs on the fly.

PerlIO::gzip

This is a Perl extension to provide a PerlIO layer to gzip/gunzip. That means you can just add :gzip when you open a file to read or write compressed files:

    use PerlIO::gzip;
    open my $in, "<:gzip", 'file.gz' or die $!;

    open my $out, ">:gzip", 'file.gz' or die $!;

and you never have to deal with the gzip format.

IO::Zlib
Compress::Zlib
Compress::Raw::Zlib
CGI::Compress::Gzip
IO::Compress::Gzip and IO::Uncompress::Gunzip

HISTORY

This module started as an experimental benchmark against IO::Compress::Gzip when profiling revealed that some web programs were spending the majority of their time in IO::Compress::Gzip. Because I also had some web programs in C, which use the raw zlib itself, I was aware that zlib itself was very fast, and I was surprised by the amount of time the Perl code was taking. I wrote this module to test IO::Compress::Gzip against a simplistic C wrapper. I released the module to CPAN because the results were very striking.

The code's ancestor is the example program zpipe supplied with zlib. See http://zlib.net/zpipe.c. Gzip::Faster is little more than zpipe reading from and writing to Perl scalars.

Version 0.16 added "deflate" and related functions.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2014-2016 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.