NAME

Gzip::RandomAccess - extract arbitrary bits of a gzip stream

SYNOPSIS

  use Gzip::RandomAccess;

  my $gzip = Gzip::RandomAccess->new($filename);  # short version

  # or, with explicit options:
  $gzip = Gzip::RandomAccess->new(
    file => 'foo.gz',
    index_file => '.foo.gz.idx',
    cleanup => 1,  # delete index when out of scope
  );

  # Extract 1024 bytes starting at the 128th byte (offset 127)
  print $gzip->extract(127, 1024), "\n";

DESCRIPTION

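This module allows you to randomly access a gzip deflate stream as if it were a regular file, even though gzip is not designed for random access. This is achieved by decompressing the whole gzip file once in advance and building an index of access points, each mapping a position in the compressed stream to the corresponding uncompressed offset. At each access point the index also stores the preceding 32KB of uncompressed data (the deflate sliding window) that zlib needs to resume decompression from that point. To extract data, decompression starts from the nearest access point at or before the requested offset rather than from the beginning of the file.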
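The lookup step can be sketched in plain Perl. This is an illustration only: it assumes a hypothetical in-memory list of access points whose keys (out and in) are invented here, and it is not the module's index format, which also has to record the 32KB window for each point. Given an uncompressed offset, a binary search finds the last access point at or before it, and the difference is the amount of decompressed output to discard before the requested bytes begin.

```perl
use strict;
use warnings;

# Hypothetical index: one entry per access point, sorted by uncompressed
# offset. 'out' is the uncompressed offset of the point and 'in' the
# compressed offset where decompression can resume. A real zran-style index
# also stores a bit offset and the 32KB window for each point.
my @points = (
    { out => 0,         in => 10 },
    { out => 1_048_576, in => 350_000 },
    { out => 2_097_152, in => 700_123 },
);

# Binary-search for the last access point whose uncompressed offset is
# <= $offset. Returns that point and the number of decompressed bytes to
# skip after resuming there.
sub find_access_point {
    my ($points, $offset) = @_;
    die "empty index\n" unless @$points;
    my ($lo, $hi) = (0, $#$points);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);   # round up so the loop terminates
        if ($points->[$mid]{out} <= $offset) {
            $lo = $mid;
        }
        else {
            $hi = $mid - 1;
        }
    }
    my $point = $points->[$lo];
    return ($point, $offset - $point->{out});
}

my ($point, $skip) = find_access_point(\@points, 1_500_000);
printf "resume at compressed byte %d, discard %d bytes\n", $point->{in}, $skip;
```

A binary search keeps each lookup logarithmic in the number of access points, which matters when a small span produces a large index.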

The mechanism is taken from zran.c, an example in the zlib distribution; this module wraps it up in a nice XS Perl API and provides index creation and cleanup mechanisms.

METHODS

new ($filename)

new (%args)

Create a new Gzip::RandomAccess object. Accepts either a single filename or the following options as a hash or hashref:

file (required)

Path to the gzip file you want to access.

index_file (default: "$file.idx")

Path to the index file to use, or to create if it does not already exist. Defaults to the gzip filename with '.idx' appended.

index_span (default: 1024*1024)

Override the number of bytes between indexing points. A smaller span creates more indexing points and therefore a larger index, but less data has to be decompressed to reach any given offset, so random access is faster.
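The index_span trade-off can be made concrete with a back-of-the-envelope estimate. This is not the module's own accounting: it assumes the span is measured in uncompressed bytes (as in zran.c), one access point at the start of the stream plus about one per span, and a full 32KB window stored uncompressed per point, ignoring per-point bookkeeping. The function name estimate_index is hypothetical.

```perl
use strict;
use warnings;

use constant WINDOW_SIZE => 32 * 1024;   # deflate window assumed stored per point

# Rough estimate only: one point at the start plus about one every $span
# uncompressed bytes, each carrying a full uncompressed 32KB window.
sub estimate_index {
    my ($uncompressed_size, $span) = @_;
    my $points = 1 + int($uncompressed_size / $span);
    return ($points, $points * WINDOW_SIZE);
}

for my $span (1024 * 1024, 256 * 1024) {
    my ($points, $bytes) = estimate_index(4 * 1024**3, $span);   # 4GB of data
    printf "span %7d: ~%5d points, ~%4d MB of windows\n",
        $span, $points, $bytes / 1024**2;
}
```

Under those assumptions, 4GB of uncompressed data needs roughly 128MB of window data at the default span and roughly 512MB at a quarter of it, while the amount decompressed to reach an arbitrary offset drops from about 1MB to about 256KB.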

cleanup (default: 0)

If set to a true value, automatically deletes the index file when the object is destroyed.
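The constructor conventions above (a single filename, a hash or a hashref, with defaults filled in) and the cleanup option can be sketched with a small hypothetical class, Sketch::IndexedFile. It only illustrates the documented behaviour with ordinary Perl idioms and is not Gzip::RandomAccess's implementation. The cleanup part relies on Perl calling DESTROY when the last reference to an object goes away.

```perl
use strict;
use warnings;

# Hypothetical class mirroring the documented constructor arguments;
# not the real Gzip::RandomAccess code.
package Sketch::IndexedFile;

sub new {
    my $class = shift;
    my %args =
        @_ == 1 && ref $_[0] eq 'HASH' ? %{ $_[0] }          # hashref of options
      : @_ == 1                        ? ( file => $_[0] )   # single filename
      :                                  @_;                 # key/value list
    die "file is required\n" unless defined $args{file};
    my $self = {
        file       => $args{file},
        index_file => $args{index_file} // "$args{file}.idx",
        index_span => $args{index_span} // 1024 * 1024,
        cleanup    => $args{cleanup}    // 0,
    };
    return bless $self, $class;
}

# Called by Perl when the object is destroyed; this is where a 'cleanup'
# flag can delete the index file.
sub DESTROY {
    my $self = shift;
    unlink $self->{index_file} if $self->{cleanup} && -e $self->{index_file};
}

package main;

my $obj = Sketch::IndexedFile->new('foo.gz');
print "$obj->{index_file}\n";   # prints "foo.gz.idx"
```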

extract ($offset, $length)

Returns $length bytes of uncompressed content from the gzip stream, starting at offset $offset in the uncompressed data (offsets start at 0).
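Because extract takes an offset and a length, a large stream can be read in fixed-size pieces rather than all at once. The helper below, chunk_ranges (a hypothetical name, not part of the module), only computes the (offset, length) pairs and does not call the module itself.

```perl
use strict;
use warnings;

# Split the range [0, $total) into [offset, length] pairs of at most
# $chunk bytes each; the last pair may be shorter.
sub chunk_ranges {
    my ($total, $chunk) = @_;
    my @ranges;
    for (my $offset = 0; $offset < $total; $offset += $chunk) {
        my $len = $total - $offset < $chunk ? $total - $offset : $chunk;
        push @ranges, [ $offset, $len ];
    }
    return @ranges;
}

print join(' ', map { "$_->[0]+$_->[1]" } chunk_ranges(10, 4)), "\n";   # prints "0+4 4+4 8+2"
```

With an object built as in the SYNOPSIS, the pairs could then be passed to the documented methods, for example:

  print $gzip->extract(@$_) for chunk_ranges($gzip->uncompressed_size, 1024 * 1024);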

build_index

Builds the gzip index, rebuilding it if necessary. This decompresses the whole file, so it may be slow for large inputs.

index_available

Returns a boolean indicating whether the gzip index has been created.

uncompressed_size

Returns the total number of uncompressed bytes in the gzip stream. Unlike the size shown by zcat --list, which comes from the 32-bit ISIZE field in the gzip trailer, this value is not truncated modulo 4GB.
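The modulo-4GB limit comes from the gzip format itself: per RFC 1952, the last 4 bytes of a gzip member hold ISIZE, the uncompressed length modulo 2**32, stored little-endian. The sketch below reads that field from gzip data held in memory, assuming a single-member file (for multi-member files it only reflects the last member); trailer_isize is a hypothetical helper name. It also shows the wrap-around arithmetic for a 5GB stream, which would be reported as 1GB.

```perl
use strict;
use warnings;

# Read ISIZE (uncompressed length mod 2**32, little-endian) from the last
# 4 bytes of a complete single-member gzip file held in memory.
sub trailer_isize {
    my ($gzip_data) = @_;
    die "too short to be gzip data\n" if length($gzip_data) < 18;
    return unpack 'V', substr($gzip_data, -4);
}

# Why ISIZE-based tools wrap around.
my $real_size = 5 * 2**30;              # 5GB of uncompressed data
my $reported  = $real_size % 2**32;     # all that ISIZE can represent
printf "%.0f bytes would be reported as %.0f\n", $real_size, $reported;
```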

file

index_file

cleanup

Accessors for constructor arguments.

CAVEATS

Not tested on Windows, or with any compression method other than deflate.

AUTHOR

Richard Harris <richardjharris@gmail.com>

The libzran library included in this distribution is based on work by Iain Wade, which is in turn based on zran.c by Mark Adler.

ZLIB LICENSE

 This software is provided 'as-is', without any express or implied
 warranty.  In no event will the authors be held liable for any damages
 arising from the use of this software.

 Permission is granted to anyone to use this software for any purpose,
 including commercial applications, and to alter it and redistribute it
 freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.