
NAME

PAR::Intro - Introduction to Perl Archive Toolkit

SYNOPSIS

# This is a presentation, not a module.

Note that a more extensive tutorial is now available online at http://aut.dyndns.org/par-tutorial/ and has superseded the materials in this introduction.

DESCRIPTION

What is PAR (Perl Archive Toolkit)?

  • Does for Perl what JAR (Java Archive) does for Java

    • Platform-independent, compressed file format (zip)

    • Aggregates modules, scripts and other files into one file

    • Easy to generate, update and extract

  • Benefits of using PAR:

    • Reduced download and deployment time

    • Saves disk space by compression and selective packaging

    • Version consistency: solves forward-compatibility problems

    • Community support: par@perl.org

  • You can also turn a PAR file into a self-contained script

    • Bundles all necessary 3rd-party modules with it

    • Requires only core Perl to run on the target machine

    • If you use pp to compile the script...

    • ...you get an executable not even needing core perl

Getting Started

  • First, generate a PAR file with modules in it:

        % zip foo.par Hello.pm
        % zip -r foo.par lib/       # grab all modules in lib/
  • Using modules stored inside a PAR file:

        % perl -MPAR=./foo.par -MHello
        % perl -MPAR=./foo -MHello  # the .par part is optional
  • Or put it in @INC and use it just like a directory:

        % perl -MPAR -Ifoo.par -MHello
        % perl -MPAR -Ifoo -MHello  # ditto

Command-line Tools

  • Use pp to scan scripts and store dependencies as a PAR file:

        % pp -p source.pl           # makes 'source.par'
        % pp -B -p source.pl        # bundles core modules too
  • Use par.pl to run files from a Perl Archive:

        % par.pl foo.par            # looks for 'main.pl' by default
        % par.pl foo.par test.pl    # runs script/test.pl in foo.par
  • Use parl or parl.exe to run files from a Perl Archive:

        % parl foo.par
        % parl foo.par test.pl

Making Binary Executables

  • The pp utility can also generate binary executables:

        % pp -o packed.exe source.pl    # self-contained .exe
        % packed.exe                    # runs anywhere with the same OS
  • You can also bundle additional modules:

        # packs CGI + its dependencies, too
        % pp -o packed.exe -M CGI source.pl
  • Or pack one-liners:

        # turns one-liner into executable
        % pp -o packed.exe -e 'print "Hi!"'
  • Some notes:

    • The command-line options of pp are almost identical to perlcc's

    • Modules are read directly from the PAR file, not extracted

    • Shared object files (aka dll) are extracted with File::Temp

    • Tested on Win32, FreeBSD, Linux, AIX, Solaris, Darwin and Cygwin.

The Anatomy of a PAR file

  • Modules can reside in different directories in a PAR file:

        /lib/                       # standard location
        /arch/                      # for creating from blib/ 
        /i386-freebsd/              # i.e. $Config{archname}
        /5.8.0/                     # i.e. Perl version number
        /5.8.0/i386-freebsd/        # combination of the two above
        /                           # casual packaging only
  • Scripts are stored in one of the two locations:

        /script/                    # standard location
        /                           # casual packaging only
  • Shared libraries may be architecture- or perl-version-specific:

        /shlib/(5.8.0/)?(i386-freebsd/)?
  • PAR files may recursively contain other PAR files:

        /par/(5.8.0/)?(i386-freebsd/)?
     
  • Special files:

        /MANIFEST                   # index of the PAR's contents
        /SIGNATURE                  # digital signature(s)
        /META.yml                   # dependency, license info, etc.
        /Build.PL                   # self-contained installer
  • Programs can use PAR::read_file($filename) to read the contents of a file inside a PAR

  • Programs can use PAR::reload_libs() to reload modules within changed PARs
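
The directory layout above can be sketched as a small lookup helper. This is an illustration only: par_candidates is a hypothetical name, the search order is an assumption rather than PAR's documented behaviour, and member names are written without the leading slash, as zip archives store them.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: list the archive-internal paths where a module file
# such as "Hello.pm" could live, following the layout described above.
# The order of the directories is an assumption made for this sketch.
sub par_candidates {
    my ($file, $version, $arch) = @_;
    $version = $Config{version}  unless defined $version;  # e.g. "5.8.0"
    $arch    = $Config{archname} unless defined $arch;     # e.g. "i386-freebsd"
    my @dirs = ('lib', 'arch', $arch, $version, "$version/$arch", '');
    return map { length($_) ? "$_/$file" : $file } @dirs;
}

print "$_\n" for par_candidates('Hello.pm', '5.8.0', 'i386-freebsd');
```

Called without the last two arguments, the helper falls back to the running perl's own version and architecture name from %Config.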

Derived Modules

  • Apache::PAR

    • Nathan Byrd's attempt to make self-contained Perl Handlers

    • Same as the WAR files for Java Servlets

    • Includes PerlRun and Registry handlers

  • App::Packer::Backend::PAR

    • Support module of Mattia Barbon's App::Packer suite

    • Makes it easy to pick-and-choose dependency scanners and packers

    • Fine-tuned distribution and packaging controls

  • CPANPLUS::Dist::PAR

    • Cross-platform PPM: Auto-generate PAR out of CPAN distributions

    • Use the bundled Build.PL to install PAR modules into system

Apache::PAR Demo

  • In httpd.conf:

        <VirtualHost *>
            <IfDefine MODPERL2>
            PerlModule Apache::ServerUtil
            </IfDefine>
            PerlModule Apache::PAR
            PARDir /opt/myapp
            PARFile /opt/myapp/myapp.par
        </VirtualHost>
  • In web.conf inside myapp.par:

        Alias /myapp/static/ ##PARFILE##/
        <Location /myapp/static>
            SetHandler perl-script
            PerlHandler Apache::PAR::Static
            PerlAddVar PARStaticDirectoryIndex index.html
            PerlSetVar PARStaticDefaultMIME text/html
        </Location>
    
        Alias /myapp/cgi-perl/ ##PARFILE##/
        <Location /myapp/cgi-perl>
            Options +ExecCGI
            SetHandler perl-script
            PerlHandler Apache::PAR::Registry
        </Location>

Future Development

  • Polish pp's features

    • Handle corner-case dependencies for LWP, Tk, DBI...

    • Optional encryption support (but *not* obscuring)

    • Become a worthy competitor to PerlApp and Perl2Exe

  • Learning from JAR

    • Making par.pl's command line interface in sync with jar's

    • Digital signatures for PAR packages using Module::Signature

    • File layout compatibility?

  • Learning from FreeBSD Bento

    • Smoke test and make PAR automatically for each CPAN upload

    • Provide binary packages for users without a compiler

Overview of PAR.pm's Implementation

  • Here begins the scary part

    • Grues, Dragons and Jabberwocks abound...

    • You are going to learn unpleasant things about Perl internals

    • Go home now if you have a heart condition or digestive problems

  • PAR invokes five areas of Perl arcana:

    • @INC code references

    • On-the-fly source filtering

    • Faking <DATA> filehandle with PerlIO::scalar and IO::Scalar

    • Overriding DynaLoader::bootstrap to handle XS modules

    • Making self-bootstrapping binary executables

  • The first two only work on 5.6 or later

    • PerlIO::scalar is 5.8-specific; IO::Scalar only needs 5.005

    • DynaLoader and %INC are there since Perl 5 was born

    • PAR currently needs 5.6, but a 5.005 port is possible

Code References in @INC

  • On 1999-07-19, Ken Fox submitted a patch to P5P

    • To "enable using remote modules" by putting hooks in @INC

    • It was accepted into Perl 5.6, but was only documented as of 5.8

    • Type 'perldoc -f require' to read the nitty-gritty details

  • Code references in @INC may return a filehandle, or undef to 'pass':

        push @INC, \&my_sub;
        sub my_sub {
            my ($coderef, $filename) = @_;  # $coderef is \&my_sub
            # -O - makes wget write the file to stdout so the pipe can read it
            open my $fh, "wget -q -O - http://example.com/$filename |" or return;
            return $fh;     # using remote modules, indeed!
        }
  • Perl 5.8 lets you open a file handle on a string, so we just use that:

        open my $fh, '<', \($zip->memberNamed($filename)->contents);
        return $fh;
  • But Perl 5.6 does not have that, and I don't want to use temp files...
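
A minimal, self-contained sketch of the in-memory approach on Perl 5.8 or later. The %archive hash is a hypothetical stand-in for the zip file, and Hello::Mem and archive_hook are names invented for this example; the real PAR.pm reads member contents from the archive instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the zip archive: file name => source text.
my %archive = (
    'Hello/Mem.pm' =>
        "package Hello::Mem;\nsub greet { return 'hello from memory' }\n1;\n",
);

# @INC hook: called with the hook itself and the requested file name.
sub archive_hook {
    my ($self, $filename) = @_;
    return unless exists $archive{$filename};   # empty return means "pass"
    open my $fh, '<', \$archive{$filename}      # in-memory handle (5.8+)
        or die "cannot open in-memory handle: $!";
    return $fh;
}

unshift @INC, \&archive_hook;

require Hello::Mem;
print Hello::Mem::greet(), "\n";
```

Putting the hook at the front of @INC means it is consulted before any directory on disk; pushing it to the end instead would let installed modules win.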

Source Filtering without Filter::* Modules

  • ... Undocumented features to the rescue!

    • It turns out that @INC hooks can return *two* values

    • The first is still the file handle

    • The second is a code reference for line-by-line source filtering!

  • This is how Acme::use::strict::with::pride works:

        # Force all modules used to use strict and warnings
        open my $fh, "<", $filename or return;
        my @lines = ("use strict; use warnings;\n", "#line 1 \"$full\"\n");
        return ($fh, sub {
            return 0 unless @lines; 
            push @lines, $_; $_ = shift @lines; return length $_;
        });
  • But we don't really have a filehandle for anything

    • Another undocumented feature to the rescue

    • We can actually omit the first return value altogether:

          # Return all contents line-by-line from the file inside PAR
          my @lines = split /(?<=\n)/, $zip->memberNamed($filename)->contents;
          return (sub { $_ = shift(@lines); return length $_ });
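
The handle-free form can be tried in isolation with the sketch below. As before, %archive and Hello::Lines are hypothetical stand-ins for the PAR contents. Recent perls describe this generator form under require in perlfunc: the sub puts one line into $_ and returns 1, then returns 0 at end of file, which is the convention the sketch follows.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the zip archive: file name => source text.
my %archive = (
    'Hello/Lines.pm' => "package Hello::Lines;\nsub answer { 42 }\n1;\n",
);

unshift @INC, sub {
    my ($self, $filename) = @_;
    return unless exists $archive{$filename};   # let other @INC entries try

    # Split after each newline so every element is one complete line.
    my @lines = split /(?<=\n)/, $archive{$filename};

    # No file handle is returned: the sub acts as a line generator.
    return sub {
        return 0 unless @lines;   # 0 signals end of file
        $_ = shift @lines;        # hand the next source line back in $_
        return 1;
    };
};

require Hello::Lines;
print Hello::Lines::answer(), "\n";
```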

Faking the <DATA> Handle

  • The @INC filter stops when it sees __END__ or __DATA__

    • All contents below are lost

    • Breaks modules that read from the <DATA> filehandle

    • The same problem appears when we eval the main.pl script

  • Therefore, we insert a line before the final token to fake *DATA

    • It has to be the final line to belong to the correct package

    • It has to happen at compile time but not inside a BEGIN block

    • Here is what I came up with (but no longer needed in recent versions):

          $DATACache{$file} = $1 if ($program =~ s/^__DATA__\n?(.*)//ms);
          if (eval {require PerlIO::scalar; 1}) {
              "use PerlIO::scalar".
              "  ( open(*DATA, '<:scalar', \\\$PAR::DATACache{'$key'}) ? () : () )";
          }
          elsif (eval {require IO::Scalar; 1}) {
              # This will first load IO::Scalar, *then* tie the handles.
              "use IO::Scalar".
              "  ( tie(*DATA, 'IO::Scalar', \\\$PAR::DATACache{'$key'}) ? () : () )";
          }
          else {
              # only dies when it's used
              "use PAR (tie(*DATA, 'PAR::_data') ? () : ())\n";
          }
          sub PAR::_data::TIEHANDLE { return bless({}, shift) }
          sub PAR::_data::AUTOLOAD { die "Please install IO::Scalar first!\n" }
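
The core idea can be shown with a simpler variant that needs Perl 5.8 or later. Instead of injecting a use line into the source, this sketch opens the package's DATA handle from outside before compiling the code, so it is not the technique PAR itself used. load_with_data and Demo::Data are hypothetical names.

```perl
use strict;
use warnings;

my %DATACache;

# Hypothetical helper: compile $source into $package with a working <DATA>.
sub load_with_data {
    my ($package, $source) = @_;

    # Cut off everything after __DATA__ and remember it.
    $DATACache{$package} = $1 if $source =~ s/^__DATA__\n(.*)//ms;

    if (defined $DATACache{$package}) {
        no strict 'refs';
        # Point the package's DATA handle at the cached text.
        open(*{"${package}::DATA"}, '<', \$DATACache{$package})
            or die "cannot open in-memory DATA: $!";
    }

    eval "package $package;\n$source" or die $@;
    return 1;
}

load_with_data('Demo::Data', <<'END_SRC');
sub slurp { local $/; return scalar <DATA> }
1;
__DATA__
line one
line two
END_SRC

print Demo::Data::slurp();
```

Keeping the text in a hash keyed by package name mirrors the %DATACache idea above: the scalar must stay alive for as long as the handle is open on it.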

Overriding DynaLoader::bootstrap

  • XS modules have dynamically loaded libraries (.so or .dll)

    • They cannot be loaded as part of a zip file, so we extract them out

    • But I don't want to make any temporary auto/ directories

    • So we have to intercept DynaLoader's library-finding process

  • Module names are passed to bootstrap for XS loading

    • During the process, it calls dl_findfile to locate the file

    • So we wrap around both functions:

          no strict 'refs'; no warnings 'redefine';
          $bootstrap   = \&DynaLoader::bootstrap;
          $dl_findfile = \&DynaLoader::dl_findfile;
          *{'DynaLoader::bootstrap'}   = \&_bootstrap;
          *{'DynaLoader::dl_findfile'} = \&_dl_findfile;
  • Our _bootstrap just checks if the library is in PARs

    • If yes, extract it to a File::Temp temp file

      • The file will be automatically cleaned up when the program ends

    • It then passes the arguments to the original $bootstrap

    • Finally, our _dl_findfile intercepts known filenames and returns them
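
The wrapping pattern for dl_findfile can be sketched on its own. The %extracted cache and the /tmp/par-demo path are hypothetical, and treating the last argument as the file name is an assumption of this sketch; PAR's actual wrapper may differ in detail.

```perl
use strict;
use warnings;
use DynaLoader ();

# Hypothetical cache: library file name => path it was extracted to.
my %extracted = ('Demo.so' => '/tmp/par-demo/Demo.so');

# Keep a reference to the original, then install a wrapper in its place.
my $orig_findfile = \&DynaLoader::dl_findfile;
{
    no strict 'refs';
    no warnings 'redefine';
    *{'DynaLoader::dl_findfile'} = sub {
        my $wanted = $_[-1];                 # assume the file name comes last
        return $extracted{$wanted} if exists $extracted{$wanted};
        return $orig_findfile->(@_);         # otherwise defer to the original
    };
}

print DynaLoader::dl_findfile('Demo.so'), "\n";
```

Because the replacement is installed in the DynaLoader::dl_findfile glob, calls made from inside DynaLoader::bootstrap reach the wrapper as well.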

Anatomy of a Self-Contained PAR executable

  • The par script ($0) itself

    • May be in plain-text (par.pl)

    • Or native executable format (par or par.exe)

  • Any number of embedded files

    • Typically used for bootstrapping PAR's various XS dependencies

    • Each section begins with the magic string "FILE"

    • Length of filename in pack('N') format and the filename (auto/.../)

    • File length in pack('N') and the file's content (not compressed)

  • One PAR file

    • This is just a zip file as usual

    • Beginning with the magic string "PK\003\004"

  • Ending section

    • A pack('N') number of the total length of FILE and PAR sections

    • Finally, there must be an 8-byte magic string: "\012PAR.pm\012"
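
The layout above can be exercised with pack and unpack on an in-memory string. This sketch follows only the description on this slide; build_image and parse_image are hypothetical helpers, and the format used by current PAR releases may differ.

```perl
use strict;
use warnings;
use constant MAGIC => "\012PAR.pm\012";   # 8-byte trailer named above

# One embedded-file record: "FILE", name length, name, content length, content.
sub file_record {
    my ($name, $content) = @_;
    return 'FILE' . pack('N', length $name)    . $name
                  . pack('N', length $content) . $content;
}

# Assemble: loader stub, FILE records, zip payload, length field, magic.
sub build_image {
    my ($stub, $files, $zip) = @_;
    my $body = join '', map { file_record($_, $files->{$_}) } sort keys %$files;
    $body .= $zip;
    return $stub . $body . pack('N', length $body) . MAGIC;
}

# Read the trailer first, then walk forwards through the FILE records.
sub parse_image {
    my ($image) = @_;
    substr($image, -8) eq MAGIC or die "missing PAR.pm magic\n";
    my $len  = unpack 'N', substr($image, -12, 4);
    my $body = substr($image, -12 - $len, $len);
    my %files;
    my $pos = 0;
    while (substr($body, $pos, 4) eq 'FILE') {
        my $nlen = unpack 'N', substr($body, $pos + 4, 4);
        my $name = substr($body, $pos + 8, $nlen);
        my $clen = unpack 'N', substr($body, $pos + 8 + $nlen, 4);
        $files{$name} = substr($body, $pos + 12 + $nlen, $clen);
        $pos += 12 + $nlen + $clen;
    }
    return (\%files, substr($body, $pos));   # the rest is the zip section
}

my $image = build_image("#!perl\n", { 'auto/Demo/Demo.so' => "\0\1\2" },
                        "PK\003\004demo");
my ($found, $par) = parse_image($image);
print join(', ', sort keys %$found), "\n";
```

Storing the combined length just before the magic string lets a reader locate the start of the FILE section from the end of the file, without knowing how long the leading script or executable is.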

Self-Bootstrapping Tricks

  • All we can expect is a working perl interpreter

    • The self-contained script *must not* use any modules at all

    • Not even strict.pm or DynaLoader.pm

    • But to process PAR files, we need XS modules like Compress::Zlib

    • A chicken-and-egg problem

  • Solution: bundle all module and object files needed by PAR.pm

    • That's what the FILE section in the previous slide is for

    • Load modules to memory, and write object files to disk

    • Then use a local @INC hook to load them on demand

  • We want to minimize the number of temporary files

    • First, try getting PerlIO::scalar loaded

      • So everything else can be in-memory

    • Next, try getting File::Temp loaded for better tempfile()

    • Set up an END hook to unlink all temp files up to this point

    • Load all other bundled files

    • Finally we are able to look in the compressed PAR section

  • This would be much easier if we had a pure-perl inflate()

    • Patches welcome!

SEE ALSO

http://www.autrijus.org/par-tutorial/

http://www.autrijus.org/par-intro/ (English version)

http://www.autrijus.org/par-intro.zh/ (Chinese version)

PAR, pp, par.pl, parl

ex::lib::zip, Acme::use::strict::with::pride

App::Packer, Apache::PAR, CPANPLUS, Module::Install

AUTHORS

Autrijus Tang <autrijus@autrijus.org>

http://par.perl.org/ is the official PAR website. You can write to the mailing list at <par@perl.org>, or send an empty mail to <par-subscribe@perl.org> to participate in the discussion.

Please submit bug reports to <bug-par@rt.cpan.org>.

COPYRIGHT

Copyright 2002, 2003 by Autrijus Tang <autrijus@autrijus.org>.

This document is free documentation; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html