NAME

Archive::Zip::StreamedUnzip - Read Zip Archives in streaming mode

SYNOPSIS

    use Archive::Zip::StreamedUnzip qw($StreamedUnzipError) ;

    my $z = new Archive::Zip::StreamedUnzip "my.zip"
        or die "Cannot open zip file: $StreamedUnzipError\n" ;


    # Iterate through a zip archive
    while (my $member = $z->next)
    {
        print $member->name() . "\n" ;
    }

    # Archive::Zip::StreamedUnzip::Member

    my $name = $member->name();
    my $content = $member->content();
    my $comment = $member->comment();

    # open a filehandle to read from a zip member
    $fh = $member->open();

    # read blocks of data
    read($fh, $buffer, 1234) ;

    # or a line at a time
    $line = <$fh> ;

    close $fh;

    $z->close();

DESCRIPTION

Archive::Zip::StreamedUnzip is a module that allows reading of Zip archives in streaming mode. This is useful when you are processing a zip archive coming directly off a socket and do not want to read the complete file into memory and/or store it on disk first. Similarly, it can be handy when working with a pipelined command.
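As a rough illustration of the pipeline case, the sketch below reads an archive from an already-open filehandle and collects the member names as they arrive. The helper name member_names and the convention of passing "-" on the command line to read STDIN are hypothetical, chosen only for this example. It also assumes that a piped, non-seekable handle is acceptable, as the description above suggests.

```perl
use strict;
use warnings;
use Archive::Zip::StreamedUnzip qw($StreamedUnzipError);

# Hypothetical helper: return the names of all members read from an
# already-open filehandle, one member at a time.
sub member_names
{
    my $fh = shift;
    binmode $fh;    # zip data is binary

    my $z = Archive::Zip::StreamedUnzip->new($fh)
        or die "Cannot read zip stream: $StreamedUnzipError\n";

    my @names;
    while (my $member = $z->next())
    {
        push @names, $member->name();
    }
    $z->close();

    return @names;
}

# Hypothetical convention: "cat my.zip | perl list.pl -" reads from STDIN.
if (@ARGV && $ARGV[0] eq '-')
{
    print "$_\n" for member_names(\*STDIN);
}
```

The same helper can be handed any other open handle, such as a socket, since it only relies on the filehandle form of the constructor.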

Working with a streamed zip file does have limitations, so most of the time Archive::Zip::SimpleUnzip and/or Archive::Zip are a better choice of module for reading zip files.

For writing Zip archives, there is a companion module, called Archive::Zip::SimpleZip, that can create Zip archives.

Features

  • Read zip archive from a file, a filehandle or from an in-memory buffer.

  • Perl Filehandle interface for reading a zip member.

  • Supports deflate, store, bzip2, Zstandard (Zstd), Xz and lzma compression.

  • Supports Zip64, so it can read archives that are larger than 4 GB and/or have more than 64K members.

Constructor

     $z = new Archive::Zip::StreamedUnzip "myzipfile.zip" [, OPTIONS] ;
     $z = new Archive::Zip::StreamedUnzip \$buffer [, OPTIONS] ;
     $z = new Archive::Zip::StreamedUnzip $filehandle [, OPTIONS] ;

The constructor takes one mandatory parameter along with zero or more optional parameters.

The mandatory parameter controls where the zip archive is read from. It can be any one of the following:

  • Input from a Filename

    When StreamedUnzip is passed a string, it will read the zip archive from the filename stored in the string.

  • Input from a String

    When StreamedUnzip is passed a string reference, like \$buffer, it will read the in-memory zip archive from that string.

  • Input from a Filehandle

    When StreamedUnzip is passed a filehandle, it will read the zip archive from that filehandle. Note the filehandle must be seekable.

See "Options" for a list of the optional parameters that can be specified when calling the constructor.
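The in-memory form of the constructor can be sketched as follows. The archive is built with the functional zip interface from IO::Compress::Zip purely so the example is self-contained. The member name hello.txt and the %content_of hash are illustrative, not part of the module.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw(zip $ZipError);
use Archive::Zip::StreamedUnzip qw($StreamedUnzipError);

# Build a one-member zip archive in memory (member name is illustrative).
my $text = "hello world\n";
my $buffer;
zip \$text => \$buffer, Name => "hello.txt"
    or die "zip failed: $ZipError\n";

# A string reference tells the constructor to read from the buffer.
my $z = Archive::Zip::StreamedUnzip->new(\$buffer)
    or die "Cannot open zip buffer: $StreamedUnzipError\n";

# Collect each member's uncompressed content, keyed by member name.
my %content_of;
while (my $member = $z->next())
{
    $content_of{ $member->name() } = $member->content();
}
$z->close();

print "$_: $content_of{$_}" for sort keys %content_of;
```

Passing "my.zip" or an open filehandle in place of \$buffer should select the other two input forms without any other change to the loop.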

Options

None yet.

Methods

$z->next()

Returns the next member from the zip archive as an Archive::Zip::StreamedUnzip::Member object. See "Archive::Zip::StreamedUnzip::Member".

Standard usage is

    use Archive::Zip::StreamedUnzip qw($StreamedUnzipError) ;

    my $match = "hello";
    my $zipfile = "my.zip";

    my $z = new Archive::Zip::StreamedUnzip $zipfile
        or die "Cannot open zip file: $StreamedUnzipError\n" ;

    while (my $member = $z->next())
    {
        my $name = $member->name();
        my $fh = $member->open();
        while (<$fh>)
        {
            print "$name, line $.\n" if /$match/;
        }
    }

$z->close()

Closes the zip file.

Archive::Zip::StreamedUnzip::Member

The next method returns a member object of type Archive::Zip::StreamedUnzip::Member that has the following methods.

$string = $m->name()

Returns the name of the member.

$data = $m->content()

Returns the uncompressed content.

$fh = $m->open()

Returns a filehandle that can be used to read the uncompressed content.
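When a member may be too large to hold in memory with content(), the filehandle from open() can be read in fixed-size blocks instead. The sketch below totals the uncompressed size of each member that way. The two-member test archive, the 4096-byte block size and the %size_of hash are all assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw($ZipError);
use Archive::Zip::StreamedUnzip qw($StreamedUnzipError);

# Build a small two-member archive in memory so the sketch is self-contained.
my $buffer;
my $zip = IO::Compress::Zip->new(\$buffer, Name => "a.txt")
    or die "Cannot create zip: $ZipError\n";
$zip->print("alpha\n");
$zip->newStream(Name => "b.txt");
$zip->print("beta beta\n");
$zip->close();

my $z = Archive::Zip::StreamedUnzip->new(\$buffer)
    or die "Cannot open zip buffer: $StreamedUnzipError\n";

# Total the uncompressed bytes of each member, one block at a time.
my %size_of;
while (my $m = $z->next())
{
    my $fh    = $m->open();
    my $total = 0;
    while (1)
    {
        my $chunk = '';
        my $got = read($fh, $chunk, 4096);
        last unless defined $got && $got > 0;   # stop at end of member or on error
        $total += length $chunk;
    }
    $size_of{ $m->name() } = $total;
}
$z->close();

printf "%s: %d bytes\n", $_, $size_of{$_} for sort keys %size_of;
```

For small members, calling content() is simpler. The block loop is only worth the extra code when memory use matters.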

Examples

Iterate through a Zip file

    use Archive::Zip::StreamedUnzip qw($StreamedUnzipError) ;

    my $zipfile = "my.zip";
    my $z = new Archive::Zip::StreamedUnzip $zipfile
        or die "Cannot open zip file: $StreamedUnzipError\n" ;

    while (my $member = $z->next())
    {
        # Method calls are not interpolated inside double quotes
        print $member->name() . "\n";
    }

Filehandle interface

Here is a simple grep that walks through a zip file and prints the name and line number of each matching line.

    use Archive::Zip::StreamedUnzip qw($StreamedUnzipError) ;

    my $match = "hello";
    my $zipfile = "my.zip";

    my $z = new Archive::Zip::StreamedUnzip $zipfile
        or die "Cannot open zip file: $StreamedUnzipError\n" ;

    while (my $member = $z->next())
    {
        my $name = $member->name();
        my $fh = $member->open();
        while (<$fh>)
        {
            print "$name, line $.\n" if /$match/;
        }
    }

Nested Zip

Here is a script that will list the contents of a zip file along with any zip files that are embedded in it. In fact it will work with any level of nesting.

    use Archive::Zip::StreamedUnzip qw($StreamedUnzipError) ;

    my $zipfile = "my.zip";

    sub walk
    {
        my $unzip = shift ;
        my $depth = shift // 1;

        while (my $member = $unzip->next())
        {
            my $name = $member->name();
            print "  " x $depth . "$name\n" ;

            if ($name =~ /\.zip$/i)
            {
                # Read the embedded zip through the member's filehandle
                my $fh = $member->open();
                my $newunzip = new Archive::Zip::StreamedUnzip $fh
                    or die "Cannot open '$name': $StreamedUnzipError";
                walk($newunzip, $depth + 1);
            }
        }
    }

    my $unzip = new Archive::Zip::StreamedUnzip $zipfile
                or die "Cannot open '$zipfile': $StreamedUnzipError";

    print "$zipfile\n" ;
    walk($unzip) ;

Zip File Interoperability

The intention is to be interoperable with zip archives created by other programs, like pkzip or WinZip, but the majority of testing carried out used the Info-Zip zip/unzip programs running on Linux.

This doesn't necessarily mean that there is no interoperability with other zip programs like pkzip and WinZip - it just means that I haven't tested them. Please report any issues you find.

Compression Methods Supported

The following compression methods are supported:

deflate (8)

This is the most common compression used in zip archives.

store (0)

This is used when no compression has been carried out.

bzip2 (12)

Supported only if the IO-Compress-Bzip2 module is available.

lzma (14)

Supported only if the IO-Compress-Lzma module is available.

Xz (95)

To read Xz content, the module IO::Uncompress::UnXz must be installed.

Zstandard (93)

To read Zstandard content, the module IO::Uncompress::UnZstd must be installed.
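Because several methods depend on optional modules, it can help to check what is installed before processing an archive. The sketch below probes for one decompressor module per method. IO::Uncompress::UnXz and IO::Uncompress::UnZstd are the modules named above. IO::Uncompress::Bunzip2 and IO::Uncompress::UnLzma are assumed to be the relevant modules from the IO-Compress-Bzip2 and IO-Compress-Lzma distributions. The have_module helper is hypothetical, and this is not necessarily how the module itself detects support.

```perl
use strict;
use warnings;

# Compression method => module assumed to provide its decompressor.
my %decoder_for = (
    bzip2     => 'IO::Uncompress::Bunzip2',
    lzma      => 'IO::Uncompress::UnLzma',
    Xz        => 'IO::Uncompress::UnXz',
    Zstandard => 'IO::Uncompress::UnZstd',
);

# Hypothetical helper: return 1 if the named module can be loaded, else 0.
sub have_module
{
    my $module = shift;
    return eval "require $module; 1" ? 1 : 0;
}

for my $method (sort keys %decoder_for)
{
    my $module = $decoder_for{$method};
    printf "%-10s %-26s %s\n", $method, $module,
        have_module($module) ? "available" : "missing";
}
```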

Zip64 Support

This module supports Zip64, so it can read archives that are larger than 4 GB and/or have more than 64K members.

Limitations

The following features are not currently supported.

SUPPORT

General feedback/questions/bug reports should be sent to https://github.com/pmqs/Archive-Zip-SimpleZip/issues (preferred) or https://rt.cpan.org/Public/Dist/Display.html?Name=Archive-Zip-SimpleZip.

SEE ALSO

Archive::Zip::SimpleUnzip, Archive::Zip::SimpleZip, Archive::Zip, IO::Compress::Zip, IO::Uncompress::UnZip

AUTHOR

This module was written by Paul Marquess, pmqs@cpan.org.

MODIFICATION HISTORY

See the Changes file.

COPYRIGHT AND LICENSE

Copyright (c) 2019 Paul Marquess. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.