NAME
PAR::Intro - Introduction to Perl Archive Toolkit
SYNOPSIS
# This is a presentation, not a module.
DESCRIPTION
The following presentation is a POD version of the presentation Introduction to Perl Archive Toolkit, available online as http://www.autrijus.org/par-intro/ (English version) and http://www.autrijus.org/par-intro.zh/ (Chinese version).
What is PAR (Perl Archive Toolkit)?
- Does for Perl what JAR (Java Archive) does for Java
    - Platform-independent, compressed file format (zip)
    - Aggregates modules, scripts and other files into one file
    - Easy to generate, update and extract
- Benefits of using PAR:
    - Decreased download and deployment time
    - Saves disk space through compression and selective packaging
    - Version consistency: solves forward-compatibility problems
    - Community support: par@perl.org
- You can also turn a PAR file into a self-contained script
    - Bundles all necessary 3rd-party modules with it
    - Requires only core Perl to run on the target machine
    - If you use pp to compile the script...
    - ...you get an executable that does not even need core Perl
Getting Started
- First, generate a PAR file with modules in it:
% zip foo.par Hello.pm
% zip -r foo.par lib/ # grab all modules in lib/
- Using modules stored inside a PAR file:
% perl -MPAR=./foo.par -MHello
% perl -MPAR=./foo -MHello # the .par part is optional
- Or put it in @INC and use it just like a directory:
% perl -MPAR -Ifoo.par -MHello
% perl -MPAR -Ifoo -MHello # ditto
Command-line Tools
- Use pp to scan scripts and store dependencies as a PAR file:
% pp -p source.pl # makes 'source.par'
% pp -B -p source.pl # bundles core modules too
- Use par.pl to run files from a Perl Archive:
% par.pl foo.par # looks for 'main.pl' by default
% par.pl foo.par test.pl # runs script/test.pl in foo.par
- Use parl or parl.exe to run files from a Perl Archive:
% parl foo.par
% parl foo.par test.pl
Making Binary Executables
- The pp utility can also generate binary executables:
% pp -o packed.exe source.pl # self-contained .exe
% packed.exe # runs anywhere with the same OS
- You can also bundle additional modules:
# packs CGI + its dependencies, too
% pp -o packed.exe -M CGI source.pl
- Or pack one-liners:
# turns one-liner into executable
% pp -o packed.exe -e 'print "Hi!"'
- Some notes:
    - The command-line options of pp are almost identical to perlcc's
    - Modules are read directly from the PAR file, not extracted
    - Shared object files (aka dll) are extracted with File::Temp
    - Tested on Win32, FreeBSD, Linux, AIX, Solaris and Darwin
    - Unfortunately, Cygwin is currently known to fail
The Anatomy of a PAR file
- Modules can reside in different directories in a PAR file:
/lib/ # standard location
/arch/ # for creating from blib/
/i386-freebsd/ # i.e. $Config{archname}
/5.8.0/ # i.e. Perl version number
/5.8.0/i386-freebsd/ # combination of the two above
/ # casual packaging only
- Scripts are stored in one of the two locations:
/script/ # standard location
/ # casual packaging only
- Special files:
/MANIFEST # index of the PAR's contents
/SIGNATURE # digital signature(s)
/META.yml # dependency, license info, etc.
/Build.PL # self-contained installer
- Programs can use PAR::read_file($filename) to read the contents of a file inside the PAR
Derived Modules
- Apache::PAR
    - Nathan Byrd's attempt to make self-contained Perl Handlers
    - Same as the WAR files for Java Servlets
    - Includes PerlRun and Registry handlers
- App::Packer::Backend::PAR
    - Support module of Mattia Barbon's App::Packer suite
    - Makes it easy to pick and choose dependency scanners and packers
    - Fine-tuned distribution and packaging controls
- CPANPLUS::Dist::PAR
    - Cross-platform PPM: auto-generates PAR files out of CPAN distributions
    - Uses the bundled Build.PL to install PAR modules into the system
Apache::PAR Demo
- In httpd.conf:
<VirtualHost *>
<IfDefine MODPERL2>
PerlModule Apache::ServerUtil
</IfDefine>
PerlModule Apache::PAR
PARDir /opt/myapp
PARFile /opt/myapp/myapp.par
</VirtualHost>
- In web.conf inside myapp.par:
Alias /myapp/static/ ##PARFILE##/
<Location /myapp/static>
SetHandler perl-script
PerlHandler Apache::PAR::Static
PerlAddVar PARStaticDirectoryIndex index.html
PerlSetVar PARStaticDefaultMIME text/html
</Location>
Alias /myapp/cgi-perl/ ##PARFILE##/
<Location /myapp/cgi-perl>
Options +ExecCGI
SetHandler perl-script
PerlHandler Apache::PAR::Registry
</Location>
Future Development
- Polish pp's features
    - Handle corner dependency cases for LWP, Tk, DBI...
    - Optional encryption support (but *not* obscuring)
    - Become a worthy competitor to PerlApp and Perl2Exe
- Learning from JAR
    - Bringing par.pl's command-line interface in sync with jar's
    - Digital signatures for PAR packages using Module::Signature
    - File layout compatibility?
- Learning from FreeBSD Bento
    - Smoke test and make a PAR automatically for each CPAN upload
    - Provide binary packages for users without a compiler
Overview of PAR.pm's Implementation
- Here begins the scary part
    - Grues, Dragons and Jabberwocks abound...
    - You are going to learn unpleasant things about Perl internals
    - Go home now if you have a heart condition or digestive problems
- PAR invokes five areas of Perl arcana:
    - @INC code references
    - On-the-fly source filtering
    - Faking the <DATA> filehandle with PerlIO::scalar and IO::Scalar
    - Overriding DynaLoader::bootstrap to handle XS modules
    - Making self-bootstrapping binary executables
- The first two only work on 5.6 or later
    - PerlIO::scalar is 5.8-specific; IO::Scalar only needs 5.005
    - DynaLoader and %INC have been there since Perl 5 was born
    - PAR currently needs 5.6, but a 5.005 port is possible
Code References in @INC
- On 1999-07-19, Ken Fox submitted a patch to P5P
    - To "enable using remote modules" by putting hooks in @INC
    - It was accepted into Perl 5.6, but only got documented in 5.8
    - Type 'perldoc -f require' to read the nitty-gritty details
- Code references in @INC may return a filehandle, or undef to 'pass':
push @INC, \&my_sub;
sub my_sub {
my ($coderef, $filename) = @_; # $coderef is \&my_sub
open my $fh, "wget -q -O - http://example.com/$filename |" or return; # -O - sends the download to the pipe
return $fh; # using remote modules, indeed!
}
- Perl 5.8 lets you open a file handle to a string, so we just use that:
open my $fh, '<', \($zip->memberNamed($filename)->contents);
return $fh;
- But Perl 5.6 does not have that, and I don't want to use temp files...
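The filehandle approach above can be tried without any zip file. This is only a minimal sketch, assuming Perl 5.8 or later: the %archive hash and the PARIntroDemo::Hello module are hypothetical stand-ins for a PAR file and its contents, not part of PAR itself.

```perl
use strict;
use warnings;

# Hypothetical in-memory "archive": file name => module source.
my %archive = (
    'PARIntroDemo/Hello.pm' =>
        "package PARIntroDemo::Hello;\nsub greet { return 'Hello, ' . shift }\n1;\n",
);

# The hook receives itself and the file name being required.
unshift @INC, sub {
    my ($self, $filename) = @_;
    my $source = $archive{$filename};
    return unless defined $source;          # "pass": let other @INC entries try

    # Open a read-only filehandle on the string (needs Perl 5.8+)
    open my $fh, '<', \$source or return;
    return $fh;
};

require PARIntroDemo::Hello;
my $greeting = PARIntroDemo::Hello::greet('PAR');
print "$greeting\n";
```

Returning an empty list for unknown names matters: it lets the rest of @INC handle every module the hook does not own.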
Source Filtering without Filter::* Modules
- ... Undocumented features to the rescue!
    - It turns out that @INC hooks can return *two* values
    - The first is still the file handle
    - The second is a code reference for line-by-line source filtering!
- This is how Acme::use::strict::with::pride works:
# Force all modules used to use strict and warnings
open my $fh, "<", $filename or return;
my @lines = ("use strict; use warnings;\n", "#line 1 \"$full\"\n");
return ($fh, sub {
return 0 unless @lines;
push @lines, $_; $_ = shift @lines; return length $_;
});
- But we don't really have a filehandle for anything
    - Another undocumented feature to the rescue
    - We can actually omit the first return value altogether:
# Return all contents line-by-line from the file inside PAR
my @lines = split /(?<=\n)/, $zip->memberNamed($filename)->contents;
return (sub { $_ = shift(@lines); return length $_ });
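The filehandle-free variant can be exercised on its own. The following is a hedged sketch, not PAR's actual code: %archive and PARIntroDemo::Lines are hypothetical stand-ins for the zip member, and the generator returns 1 or 0 as later versions of 'perldoc -f require' describe, rather than a line length.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of a zip member.
my %archive = (
    'PARIntroDemo/Lines.pm' =>
        "package PARIntroDemo::Lines;\nsub answer { return 42 }\n1;\n",
);

unshift @INC, sub {
    my ($self, $filename) = @_;
    return unless exists $archive{$filename};

    # Split after each newline so every line keeps its terminator
    my @lines = split /(?<=\n)/, $archive{$filename};

    # No filehandle is returned: perl calls this sub once per line,
    # expects the line in $_, and stops when it returns false
    return sub {
        return 0 unless @lines;
        $_ = shift @lines;
        return 1;
    };
};

require PARIntroDemo::Lines;
my $answer = PARIntroDemo::Lines::answer();
print "answer: $answer\n";
```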
Faking the <DATA> Handle
- The @INC filter stops when it sees __END__ or __DATA__
    - All contents below are lost
    - Breaks modules that read from the <DATA> filehandle
    - The same problem appears when we eval the main.pl script
- Therefore, we insert a line before the final token to fake *DATA
    - It has to be the final line to belong to the correct package
    - It has to happen at compile time, but not inside a BEGIN block
    - Here is what I came up with (no longer needed in recent versions):
$DATACache{$file} = $1 if ($program =~ s/^__DATA__\n?(.*)//ms);
if (eval {require PerlIO::scalar; 1}) {
"use PerlIO::scalar".
" ( open(*DATA, '<:scalar', \\\$PAR::DATACache{'$key'}) ? () : () )";
}
elsif (eval {require IO::Scalar; 1}) {
# This will first load IO::Scalar, *then* tie the handles.
"use IO::Scalar".
" ( tie(*DATA, 'IO::Scalar', \\\$PAR::DATACache{'$key'}) ? () : () )";
}
else {
# only dies when it's used
"use PAR (tie(*DATA, 'PAR::_data') ? () : ())\n";
}
sub PAR::_data::TIEHANDLE { return bless({}, shift) }
sub PAR::_data::AUTOLOAD { die "Please install IO::Scalar first!\n" }
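Stripped of the compile-time trickery, the core idea is to cut the __DATA__ section out of the source and reopen *DATA on the saved string. A minimal sketch, assuming Perl 5.8 or later for in-memory filehandles; %data_cache, the 'demo' key and the sample $program are hypothetical:

```perl
use strict;
use warnings;

my %data_cache;    # plays the role of %PAR::DATACache
my $program = "print scalar <DATA>;\n__DATA__\nfirst line\nsecond line\n";

# Remove everything from __DATA__ onward and remember it
$data_cache{demo} = $1 if $program =~ s/^__DATA__\n?(.*)//ms;

# Point *DATA at the cached string instead of a real file
open(*DATA, '<', \$data_cache{demo})
    or die "cannot open in-memory DATA: $!";

my @lines = <DATA>;
close DATA;

print "read ", scalar(@lines), " lines from the fake DATA handle\n";
```

This runs at run time in package main only; the generated 'use' line in the slide exists to make the same open happen at compile time, in the package of the code being loaded.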
Overriding DynaLoader::bootstrap
- XS modules have dynamically loaded libraries (.so or .dll)
    - They cannot be loaded as part of a zip file, so we extract them
    - But I don't want to make any temporary auto/ directories
    - So we have to intercept DynaLoader's library-finding process
- Module names are passed to bootstrap for XS loading
    - During the process, it calls dl_findfile to locate the file
    - So we wrap around both functions:
no strict 'refs'; no warnings 'redefine';
$bootstrap = \&DynaLoader::bootstrap;
$dl_findfile = \&DynaLoader::dl_findfile;
*{'DynaLoader::bootstrap'} = \&_bootstrap;
*{'DynaLoader::dl_findfile'} = \&_dl_findfile;
- Our _bootstrap just checks if the library is in PARs
    - If yes, extract it to a File::Temp temp file
    - The file will be automatically cleaned up when the program ends
    - It then passes the arguments to the original $bootstrap
- Finally, our _dl_findfile intercepts known filenames and returns them
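The wrapping technique itself does not depend on DynaLoader. The sketch below applies it to a hypothetical Demo::Loader package so it runs without any XS code; the package, its two functions and the %extracted table are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for DynaLoader: bootstrap() asks find_file()
# where a module's shared library lives.
package Demo::Loader;
sub find_file { my ($name)   = @_; return "/usr/lib/$name" }
sub bootstrap { my ($module) = @_; return find_file("$module.so") }

package main;

# Pretend these libraries were already extracted to temp files
my %extracted = ('Foo.so' => '/tmp/par-demo/Foo.so');

my ($orig_bootstrap, $orig_find_file);
{
    no strict 'refs';
    no warnings 'redefine';

    # Keep the originals so the wrappers can fall back to them
    $orig_bootstrap = \&Demo::Loader::bootstrap;
    $orig_find_file = \&Demo::Loader::find_file;

    *{'Demo::Loader::bootstrap'} = sub { return $orig_bootstrap->(@_) };
    *{'Demo::Loader::find_file'} = sub {
        my ($name) = @_;
        return $extracted{$name} if exists $extracted{$name};
        return $orig_find_file->(@_);
    };
}

my $foo = Demo::Loader::bootstrap('Foo');   # served from %extracted
my $bar = Demo::Loader::bootstrap('Bar');   # falls through to the original
print "$foo\n$bar\n";
```

The original bootstrap picks up the replaced find_file because named sub calls are resolved through the symbol table at call time.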
Anatomy of a Self-Contained PAR executable
- The par script ($0) itself
    - May be in plain text (par.pl)
    - Or native executable format (par or par.exe)
- Any number of embedded files
    - Typically used for bootstrapping PAR's various XS dependencies
    - Each section begins with the magic string "FILE"
    - Length of filename in pack('N') format and the filename (auto/.../)
    - File length in pack('N') and the file's content (not compressed)
- One PAR file
    - This is just a zip file as usual
    - Beginning with the magic string "PK\003\004"
- Ending section
    - A pack('N') number of the total length of FILE and PAR sections
    - Finally, there must be an 8-byte magic string: "\012PAR.pm\012"
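The layout above can be sketched as a pair of pack/unpack routines. This is an illustration built only from the slide's description, not PAR's actual reader or writer; the function names, the sample file name and the fake zip payload are hypothetical.

```perl
use strict;
use warnings;

my $MAGIC = "\012PAR.pm\012";    # 8-byte trailer

# One FILE section: magic, name length + name, content length + content
sub file_section {
    my ($name, $content) = @_;
    return 'FILE'
         . pack('N', length $name)    . $name
         . pack('N', length $content) . $content;
}

# Loader script, then FILE sections, then the PAR (zip), then the trailer
sub build_image {
    my ($loader, $files, $par) = @_;
    my $body = join('', map { file_section(@$_) } @$files) . $par;
    return $loader . $body . pack('N', length $body) . $MAGIC;
}

# Walk backwards from the trailer, then forwards through FILE sections
sub parse_image {
    my ($image) = @_;
    return unless substr($image, -8) eq $MAGIC;

    my $body_len = unpack 'N', substr($image, -12, 4);
    my $body     = substr($image, -12 - $body_len, $body_len);

    my (%files, $pos);
    $pos = 0;
    while (substr($body, $pos, 4) eq 'FILE') {
        $pos += 4;
        my $name_len = unpack 'N', substr($body, $pos, 4);  $pos += 4;
        my $name     = substr($body, $pos, $name_len);      $pos += $name_len;
        my $len      = unpack 'N', substr($body, $pos, 4);  $pos += 4;
        $files{$name} = substr($body, $pos, $len);          $pos += $len;
    }
    return (\%files, substr($body, $pos));   # what is left is the zip
}

my $image = build_image(
    "#!perl\n# loader goes here\n",
    [ [ 'auto/Demo/Demo.so', "not really\0a library" ] ],
    "PK\003\004fake zip payload",
);
my ($files, $par) = parse_image($image);
print join(', ', sort keys %$files), "\n";
```

Storing the combined length at the very end lets a reader find the embedded sections without knowing how long the loader in front of them is.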
Self-Bootstrapping Tricks
- All we can expect is a working perl interpreter
    - The self-contained script *must not* use any modules at all
    - Not even strict.pm or DynaLoader.pm
    - But to process PAR files, we need XS modules like Compress::Zlib
    - A chicken-and-egg problem
- Solution: bundle all module and object files needed by PAR.pm
    - That's what the FILE section in the previous slide is for
    - Load modules to memory, and write object files to disk
    - Then use a local @INC hook to load them on demand
- We want to minimize the number of temporary files
    - First, try getting PerlIO::scalar loaded
    - So everything else can be in-memory
    - Next, try getting File::Temp loaded for better tempfile()
- Set up an END hook to unlink all temp files up to this point
    - Load all other bundled files
    - Finally we are able to look in the compressed PAR section
- This would be so much easier if we had a pure-perl inflate()
    - Patches welcome!
SEE ALSO
ex::lib::zip, Acme::use::strict::with::pride
App::Packer, Apache::PAR, CPANPLUS, Module::Install
AUTHORS
Autrijus Tang <autrijus@autrijus.org>
PAR has a mailing list, <par@perl.org>, that you can write to; send an empty mail to <par-subscribe@perl.org> to join the list and participate in the discussion.
Please send bug reports to <bug-par@rt.cpan.org>.
COPYRIGHT
Copyright 2002, 2003 by Autrijus Tang <autrijus@autrijus.org>.
This document is free documentation; you can redistribute it and/or modify it under the same terms as Perl itself.