NAME

Compress::Bzip2 - Interface to Bzip2 compression library

SYNOPSIS

    use Compress::Bzip2 qw(:all :constants :utilities :gzip);

    ($bz, $status) = bzdeflateInit( [PARAMS] ) ;
    ($out, $status) = $bz->bzdeflate($buffer) ;

    ($bz, $status) = bzinflateInit( [PARAMS] ) ;
    ($out, $status) = $bz->bzinflate($buffer) ;

    ($out, $status) = $bz->bzflush() ;
    ($out, $status) = $bz->bzclose() ;

    $dest = memBzip($source);
        alias compress
    $dest = memBunzip($source);
        alias uncompress

    $bz = Compress::Bzip2->new( [PARAMS] );

    $bz = bzopen($filename or filehandle, $mode);
        alternate, with $bz created by new():
    $bz->bzopen($filename or filehandle, $mode);

    $bytesread = $bz->bzread($buffer [,$size]) ;
    $bytesread = $bz->bzreadline($line);
    $byteswritten = $bz->bzwrite($buffer);
    $errstring = $bz->bzerror(); 
    $status = $bz->bzeof();
    $status = $bz->bzflush();
    $status = $bz->bzclose() ;

    $status = $bz->bzsetparams( $param => $setting );

    $bz->total_in() ;
    $bz->total_out() ;

    $verstring = $bz->bzversion();

    $Compress::Bzip2::bzerrno

DESCRIPTION

The Compress::Bzip2 module provides a Perl interface to the Bzip2 compression library (see "AUTHOR" for details about where to get Bzip2). A relevant subset of the functionality provided by Bzip2 is available in Compress::Bzip2.

All string parameters can either be a scalar or a scalar reference.

The module can be split into two general areas of functionality, namely in-memory compression/decompression and read/write access to bzip2 files. Each of these areas will be discussed separately below.

DEFLATE

The Perl interface will always consume the complete input buffer before returning. Also the output buffer returned will be automatically grown to fit the amount of output available.

Here is a definition of the interface available:

($d, $status) = bzdeflateInit( [OPT] )

Initialises a deflation stream.

If successful, it will return the initialised deflation stream, $d and $status of BZ_OK in a list context. In scalar context it returns the deflation stream, $d, only.

If not successful, the returned deflation stream ($d) will be undef and $status will hold the exact bzip2 error code.

The function optionally takes a number of named options specified as -Name=>value pairs. This allows individual options to be tailored without having to specify them all in the parameter list.

Here is a list of the valid options:

-verbosity

Defines the verbosity level. Valid values are 0 through 4.

The default is -verbosity => 0.

-blockSize100k

Defines the buffering factor of the compression method. The algorithm buffers all data until the buffer is full, then it flushes all the data out. Use -blockSize100k to specify the size of the buffer.

Valid settings are 1 through 9, representing a blocking in multiples of 100k.

Note that each such block has an overhead of leading and trailing synchronization bytes. bzip2 recovery uses this information to pull useable data out of a corrupted file.

A streaming application would probably want to set the blocking low.

-workFactor

The workFactor setting tells the deflation algorithm how much work to invest to compensate for repetitive data.

workFactor may be a number from 0 to 250 inclusive. The default setting is 30.

See the bzip documentation for more information.

Here is an example of using the bzdeflateInit optional parameter list to override the default block size and verbosity level. All other options will take their default values.

    bzdeflateInit( -blockSize100k => 1, -verbosity => 1 );

($out, $status) = $d->bzdeflate($buffer)

Deflates the contents of $buffer. The buffer can either be a scalar or a scalar reference. When finished, $buffer will be completely processed (assuming there were no errors). If the deflation was successful it returns deflated output, $out, and a status value, $status, of BZ_OK.

On error, $out will be undef and $status will contain the bzlib error code.

In a scalar context bzdeflate will return $out only.

As with the internal buffering of the deflate function in bzip2, it is not necessarily the case that any output will be produced by this method. So don't treat an empty $out as a sign of an error. In fact, given the size of bzdeflate's internal buffer, with most files it's likely you won't see any output at all until flush or close.
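The buffering behaviour described above can be illustrated with a minimal sketch. It assumes the default exports used by the other examples in this document; the chunk contents and variable names are sample values only. The point is that output has to be accumulated across calls, and that most of it usually arrives when the stream is closed.

```perl
use strict;
use warnings;

use Compress::Bzip2;

# Smallest block size; the input below is sample data only.
my ($d, $status) = bzdeflateInit( -blockSize100k => 1 );
defined $d
    or die "Cannot create a deflation stream (status $status)\n";

my $compressed        = '';
my $calls_with_output = 0;

for my $chunk ("first chunk\n" x 50, "second chunk\n" x 50) {
    my ($out, $st) = $d->bzdeflate($chunk);
    $st == BZ_OK
        or die "deflation failed\n";

    # $out is often empty at this point; that is not an error.
    if (defined $out && length $out) {
        $calls_with_output++;
        $compressed .= $out;
    }
}

# Whatever is still buffered is returned when the stream is closed.
my ($tail, $close_status) = $d->bzclose();
$close_status == BZ_OK
    or die "closing the deflation stream failed\n";
$compressed .= $tail if defined $tail;

printf "%d bzdeflate call(s) produced output; %d bytes in total\n",
    $calls_with_output, length $compressed;
```

With input this small, it is likely that no bzdeflate call reports any output and that everything is delivered by bzclose.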

($out, $status) = $d->bzflush([flush_type])

Typically used to finish the deflation. Any pending output will be returned via $out. $status will have a value BZ_OK if successful.

In a scalar context bzflush will return $out only.

Note that flushing can seriously degrade the compression ratio, so it should only be used to terminate a deflation (using BZ_FLUSH) or when you want to create a full flush point (using BZ_FINISH).

The allowable values for flush_type are BZ_FLUSH and BZ_FINISH.

For a handle opened for "w" (bzwrite), the default is BZ_FLUSH. For a stream, the default for flush_type is BZ_FINISH (which is essentially a close and reopen).

It is strongly recommended that you only set the flush_type parameter if you fully understand the implications of what it does. See the bzip2 documentation for details.

Example

Here is a trivial example of using bzdeflate. It simply reads standard input, deflates it and writes it to standard output.

    use strict ;
    use warnings ;

    use Compress::Bzip2 ;

    binmode STDIN;
    binmode STDOUT;
    my $x = bzdeflateInit()
       or die "Cannot create a deflation stream\n" ;

    my ($output, $status) ;
    while (<>)
    {
        ($output, $status) = $x->bzdeflate($_) ;
    
        $status == BZ_OK
            or die "deflation failed\n" ;
    
        print $output ;
    }
    
    ($output, $status) = $x->bzclose() ;
    
    $status == BZ_OK
        or die "deflation failed\n" ;
    
    print $output ;

INFLATE

Here is a definition of the interface:

($i, $status) = bzinflateInit( [OPT] )

Initialises an inflation stream.

In a list context it returns the inflation stream, $i, and the bzlib status code ($status). In a scalar context it returns the inflation stream only.

If successful, $i will hold the inflation stream and $status will be BZ_OK.

If not successful, $i will be undef and $status will hold the bzlib.h error code.

The function optionally takes a number of named options specified as -Name=>value pairs. This allows individual options to be tailored without having to specify them all in the parameter list.

For backward compatibility, it is also possible to pass the parameters as a single reference to a hash containing the name=>value pairs. The contents of the hash allow the inflation interface to be tailored.

Here is a list of the valid options:

-small

small may be 0 or 1. Set small to one to use a slower, less memory intensive algorithm.

-verbosity

Defines the verbosity level. Valid values are 0 through 4.

The default is -verbosity => 0.

Here is an example of using the bzinflateInit optional parameter.

    bzinflateInit( -small => 1, -verbosity => 1 );

($out, $status) = $i->bzinflate($buffer)

Inflates the complete contents of $buffer. The buffer can either be a scalar or a scalar reference.

$status will be BZ_OK if successful and BZ_STREAM_END if the end of the compressed data has been successfully reached. If not successful, $out will be undef and $status will hold the bzlib error code.

The $buffer parameter is modified by bzinflate. On completion it will contain what remains of the input buffer after inflation. This means that $buffer will be an empty string when the return status is BZ_OK. When the return status is BZ_STREAM_END the $buffer parameter will contain what (if anything) was stored in the input buffer after the deflated data stream.

This feature is useful when processing a file format that encapsulates a compressed data stream.

Example

Here is an example of using bzinflate.

    use strict ;
    use warnings ;
    
    use Compress::Bzip2;
    
    my $x = bzinflateInit()
       or die "Cannot create an inflation stream\n" ;
    
    my $input = '' ;
    binmode STDIN;
    binmode STDOUT;
    
    my ($output, $status) ;
    while (read(STDIN, $input, 4096))
    {
        ($output, $status) = $x->bzinflate(\$input) ;
    
        print $output 
            if $status == BZ_OK or $status == BZ_STREAM_END ;
    
        last if $status != BZ_OK ;
    }
    
    die "inflation failed\n"
        unless $status == BZ_STREAM_END ;

COMPRESS/UNCOMPRESS

bzlib provides two high-level functions for in-memory compression and decompression. This module offers similar functionality through two Perl subs, memBzip and memBunzip (also available under the aliases compress and uncompress).

$dest = memBzip($source);

Compresses $source. If successful it returns the compressed data. Otherwise it returns undef.

The source buffer can either be a scalar or a scalar reference.

$dest = memBunzip($source);

Uncompresses $source. If successful it returns the uncompressed data. Otherwise it returns undef.

The source buffer can either be a scalar or a scalar reference.
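As a minimal sketch of the two subs above, the following round trip compresses a string and restores it. The sample text and variable names are illustrative only, and the default exports are assumed, as in the other examples in this document.

```perl
use strict;
use warnings;

use Compress::Bzip2;

# Sample data only; any byte string will do.
my $text = "The quick brown fox jumps over the lazy dog\n" x 100;

# The source can be passed as a plain scalar ...
my $packed = memBzip($text);
defined $packed
    or die "memBzip failed\n";

# ... or as a scalar reference.
my $packed_from_ref = memBzip(\$text);
defined $packed_from_ref
    or die "memBzip (scalar reference) failed\n";

printf "%d bytes compressed to %d bytes\n", length $text, length $packed;

# undef signals failure, so test with defined rather than for truth.
my $restored = memBunzip($packed);
defined $restored
    or die "memBunzip failed\n";

print "round trip ", ($restored eq $text ? "ok" : "FAILED"), "\n";
```

Testing the result with defined rather than for truth matters because a legitimately empty result would otherwise be mistaken for a failure.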

BZIP INTERFACE

A number of functions are supplied in bzlib for reading and writing bzip2 files. Unfortunately, most of them are not suitable. So, this module provides another interface, built on top of the low level bzlib methods.

$bz = bzopen(filename or filehandle, mode)

This function returns an object which is used to access the other bzip2 methods.

The mode parameter specifies whether the file is opened for reading or writing, with "r" or "w" respectively.

If a reference to an open filehandle is passed in place of the filename, it must be positioned at the start of a compression sequence.

$bytesread = $bz->bzread($buffer [, $size]) ;

Reads $size bytes from the compressed file into $buffer. If $size is not specified, it will default to 4096. If the scalar $buffer is not large enough, it will be extended automatically.

Returns the number of bytes actually read. On EOF it returns 0 and in the case of an error, -1.

$bytesread = $bz->bzreadline($line) ;

Reads the next line from the compressed file into $line.

Returns the number of bytes actually read. On EOF it returns 0 and in the case of an error, -1.

It IS legal to intermix calls to bzread and bzreadline.

At this time bzreadline ignores the variable $/ ($INPUT_RECORD_SEPARATOR or $RS when English is in use). The end of a line is denoted by the C character '\n'.
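The return values of bzreadline can be used to tell end of file apart from a read error. The sketch below is illustrative only: the scratch file (created with File::Temp), the sample lines and the variable names are assumptions made for the example, and the file is first written with bzwrite so that the example is self-contained.

```perl
use strict;
use warnings;

use Compress::Bzip2;
use File::Temp qw(tempfile);

# Hypothetical scratch file and sample lines, used only for this sketch.
my ($fh, $file) = tempfile( SUFFIX => '.bz2', UNLINK => 1 );
close $fh;
my @expected = ("alpha\n", "beta\n", "gamma\n");

my $w = bzopen($file, "w")
    or die "Cannot open $file for writing: $bzerrno\n";
for my $text (@expected) {
    $w->bzwrite($text)
        or die "error writing: $bzerrno\n";
}
$w->bzclose();

my $r = bzopen($file, "r")
    or die "Cannot open $file for reading: $bzerrno\n";

my (@lines, $line, $n);
while (($n = $r->bzreadline($line)) > 0) {
    # Each line is printed as read, prefixed with its line number.
    push @lines, $line;
    printf "%3d: %s", scalar @lines, $line;
}

# 0 means end of file; a negative value means the read failed.
die "Error reading from $file: $bzerrno\n" if $n < 0;
$r->bzclose();
```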

$byteswritten = $bz->bzwrite($buffer) ;

Writes the contents of $buffer to the compressed file. Returns the number of bytes actually written, or 0 on error.

$status = $bz->bzflush($flush) ;

Flushes all pending output to the compressed file. Works identically to the bzlib function it interfaces to. Note that the use of bzflush can degrade compression.

Returns BZ_OK if $flush is BZ_FINISH and all output could be flushed. Otherwise the bzlib error code is returned.

Refer to the bzlib documentation for the valid values of $flush.

$status = $bz->bzeof() ;

Returns 1 if the end of file has been detected while reading the input file, otherwise returns 0.

$bz->bzclose

Closes the compressed file. Any pending data is flushed to the file before it is closed.
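The file methods described so far can be combined into a write-then-read round trip. This is a minimal sketch rather than a reference implementation: the scratch file (created with File::Temp), the sample data and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

use Compress::Bzip2;
use File::Temp qw(tempfile);

# Hypothetical scratch file, used only for this sketch.
my ($fh, $file) = tempfile( SUFFIX => '.bz2', UNLINK => 1 );
close $fh;

# Sample data, several kilobytes so that more than one read is needed.
my $text = join '', map { "line $_\n" } 1 .. 1000;

# Write the data through the compressed-file interface.
my $out = bzopen($file, "w")
    or die "Cannot open $file for writing: $bzerrno\n";
$out->bzwrite($text)
    or die "error writing: $bzerrno\n";
$out->bzclose();

# Read it back in chunks of the default size.
my $in = bzopen($file, "r")
    or die "Cannot open $file for reading: $bzerrno\n";

my $restored = '';
my ($buffer, $n);
while (($n = $in->bzread($buffer)) > 0) {
    $restored .= $buffer;
}

# 0 means end of file; a negative value means the read failed.
die "Error reading from $file: $bzerrno\n" if $n < 0;
$in->bzclose();

print "round trip ", ($restored eq $text ? "ok" : "FAILED"), "\n";
```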

$bz->bzsetparams( [OPTS] );

Change settings for the deflate stream $bz.

The list of the valid options is shown below. Options not specified will remain unchanged.

-verbosity

Defines the verbosity level. Valid values are 0 through 4.

The default is -verbosity => 0.

-blockSize100k

For a bzip object opened for stream deflation or write.

Defines the buffering factor of the compression method. The algorithm buffers all data until the buffer is full, then it flushes all the data out. Use -blockSize100k to specify the size of the buffer.

Valid settings are 1 through 9, representing a blocking in multiples of 100k.

Note that each such block has an overhead of leading and trailing synchronization bytes. bzip2 recovery uses this information to pull useable data out of a corrupted file.

A streaming application would probably want to set the blocking low.

-workFactor

For a bzip object opened for stream deflation or write.

The workFactor setting tells the deflation algorithm how much work to invest to compensate for repetitive data.

workFactor may be a number from 0 to 250 inclusive. The default setting is 30.

See the bzip documentation for more information.

-small

For a bzip object opened for stream inflation or read.

small may be 0 or 1. Set small to one to use a slower, less memory intensive algorithm.

$bz->bzerror

Returns the bzlib error message or number for the last operation associated with $bz. The return value will be the bzlib error number when used in a numeric context and the bzlib error message when used in a string context. The bzlib error number constants, shown below, are available for use.

  BZ_CONFIG_ERROR
  BZ_DATA_ERROR
  BZ_DATA_ERROR_MAGIC
  BZ_FINISH
  BZ_FINISH_OK
  BZ_FLUSH
  BZ_FLUSH_OK
  BZ_IO_ERROR
  BZ_MAX_UNUSED
  BZ_MEM_ERROR
  BZ_OK
  BZ_OUTBUFF_FULL
  BZ_PARAM_ERROR
  BZ_RUN
  BZ_RUN_OK
  BZ_SEQUENCE_ERROR
  BZ_STREAM_END
  BZ_UNEXPECTED_EOF

$bzerrno

The $bzerrno scalar holds the error code associated with the most recent bzip2 routine. Note that unlike bzerror(), the error is not associated with a particular file.

As with bzerror() it returns an error number in numeric context and an error message in string context. Unlike bzerror() though, the error message will correspond to the bzlib message when the error is associated with bzlib itself, or the UNIX error message when it is not (i.e. bzlib returned BZ_IO_ERROR).

As there is an overlap between the error numbers used by bzlib and UNIX, $bzerrno should only be used to check for the presence of an error in numeric context. Use bzerror() to check for specific bzlib errors. The bzcat example below shows how the variable can be used safely.
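A minimal sketch of this usage pattern follows. The path is a hypothetical one that is assumed not to exist, so that bzopen fails; the variable names are illustrative only.

```perl
use strict;
use warnings;

use Compress::Bzip2;

# Hypothetical path that is assumed not to exist.
my $file = "/nonexistent/directory/example.bz2";

my $bz = bzopen($file, "r");

if (!$bz) {
    # Numeric context: only ask whether some error was recorded,
    # not which one, because bzlib and UNIX error numbers overlap.
    my $had_error = ($bzerrno != 0);

    # String context: a message intended for a human reader.
    print STDERR "Cannot open $file: $bzerrno\n" if $had_error;
}
else {
    $bz->bzclose();
}
```

Once a handle has been opened successfully, $bz->bzerror() is the safer choice for comparing against specific bzlib constants.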

Examples

Here is an example script which uses the interface. It implements a bzcat function.

    use strict ;
    use warnings ;
    
    use Compress::Bzip2 ;
    
    die "Usage: bzcat file...\n"
        unless @ARGV ;
    
    my $file ;
    
    foreach $file (@ARGV) {
        my $buffer ;
    
        my $bz = bzopen($file, "rb") 
             or die "Cannot open $file: $bzerrno\n" ;
    
        print $buffer while $bz->bzread($buffer) > 0 ;
    
        die "Error reading from $file: $bzerrno (" . ($bzerrno+0) . ")\n" 
            if $bzerrno != BZ_STREAM_END ;
        
        $bz->bzclose() ;
    }

Below is a script which makes use of bzreadline. It implements a very simple grep like script.

    use strict ;
    use warnings ;
    
    use Compress::Bzip2 ;
    
    die "Usage: bzgrep pattern file...\n"
        unless @ARGV >= 2;
    
    my $pattern = shift ;
    
    my $file ;
    
    foreach $file (@ARGV) {
        my $bz = bzopen($file, "rb") 
             or die "Cannot open $file: $bzerrno\n" ;
    
        while ($bz->bzreadline($_) > 0) {
            print if /$pattern/ ;
        }
    
        die "Error reading from $file: $bzerrno\n" 
            if $bzerrno != BZ_STREAM_END ;
        
        $bz->bzclose() ;
    }

This script, bzstream, does the opposite of the bzcat script above. It reads from standard input and writes a bzip file to standard output.

    use strict ;
    use warnings ;
    
    use Compress::Bzip2 ;
    
    binmode STDOUT;     # bzopen only sets it on the fd
    
    my $bz = bzopen(\*STDOUT, "wb")
          or die "Cannot open stdout: $bzerrno\n" ;
    
    while (<>) {
        $bz->bzwrite($_)
            or die "error writing: $bzerrno\n" ;
    }

    $bz->bzclose ;

memBzip

This function is used to create an in-memory bzip file. It creates a minimal bzip header.

    $dest = memBzip($buffer) ;

If successful, it returns the in-memory bzip file, otherwise it returns undef.

The buffer parameter can either be a scalar or a scalar reference.

memBunzip

This function is used to uncompress an in-memory bzip file.

    $dest = memBunzip($buffer) ;

If successful, it returns the uncompressed bzip file, otherwise it returns undef.

The buffer parameter can either be a scalar or a scalar reference. The contents of the buffer parameter are destroyed after calling this function.

EXPORT

Use the tags :all, :utilities, :constants, and :gzip.

Export tag :utilities

This gives an interface to the bzip2 methods.

    bzopen
    bzinflateInit
    bzdeflateInit
    memBzip
    memBunzip
    bzip2
    bunzip2
    bzlibversion
    $bzerrno

Export tag :gzip

This gives compatibility with Compress::Zlib.

    gzopen
    gzinflateInit
    gzdeflateInit
    memGzip
    memGunzip
    $gzerrno

Exportable constants

All the bzlib constants are automatically imported when you make use of Compress::Bzip2.

  BZ_CONFIG_ERROR
  BZ_DATA_ERROR
  BZ_DATA_ERROR_MAGIC
  BZ_FINISH
  BZ_FINISH_OK
  BZ_FLUSH
  BZ_FLUSH_OK
  BZ_IO_ERROR
  BZ_MAX_UNUSED
  BZ_MEM_ERROR
  BZ_OK
  BZ_OUTBUFF_FULL
  BZ_PARAM_ERROR
  BZ_RUN
  BZ_RUN_OK
  BZ_SEQUENCE_ERROR
  BZ_STREAM_END
  BZ_UNEXPECTED_EOF

SEE ALSO

The documentation for zlib, bzip2 and Compress::Zlib.

AUTHOR

Rob Janes, <rwjanes at primus.ca>

COPYRIGHT AND LICENSE

Copyright (C) 2005 by Rob Janes

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.3 or, at your option, any later version of Perl 5 you may have available.

AUTHOR

The Compress::Bzip2 module was originally written by Gawdi Azem azemgi@rupert.informatik.uni-stuttgart.de.

MODIFICATION HISTORY

See the Changes file.

2.00 Second public release of Compress::Bzip2.