NAME

mod_perl - Embed a Perl interpreter in the Apache server

DESCRIPTION

The Apache/Perl integration project brings together the full power of the Perl programming language and the Apache HTTP server. This is achieved by linking the Perl runtime library into the server and providing an object oriented Perl interface to the server's C language API. These pieces are seamlessly glued together by the `mod_perl' server plugin, making it possible to write Apache modules entirely in Perl. In addition, the persistent interpreter embedded in the server avoids the overhead of starting an external interpreter and the penalty of Perl start-up (compile) time.

Without question, the most popular Apache/Perl module is the Apache::Registry module. This module emulates the CGI environment, allowing programmers to write scripts that run under CGI or mod_perl without change. Existing CGI scripts may require some changes, simply because a CGI script has a very short lifetime of one HTTP request, allowing you to get away with "quick and dirty" scripting. Using mod_perl and Apache::Registry requires you to be more careful, but it also gives new meaning to the word "quick"! Apache::Registry maintains a cache of compiled scripts. A script is compiled the first time it is accessed by a child server, and once again if the file is updated on disk.

Although it may be all you need, a speedy CGI replacement is only a small part of this project. Callback hooks are in place for each stage of a request. Apache-Perl modules may step in during the handler, header parser, uri translate, authentication, authorization, access, type check, fixup and logger stages of a request.

FAQ

Patrick Kane <modus@enews.com> maintains the mod_perl FAQ available at: http://chaos.dc.enews.com/mod_perl/

Apache/Perl API

See 'perldoc Apache' for info on how to use the Perl-Apache API.

See the lib/ directory for example modules and apache-modlist.html for a comprehensive list.

See the eg/ directory for example scripts.

mod_perl

For using mod_perl as a CGI replacement, the recommended configuration is as follows:

 Alias /perl/  /real/path/to/perl-scripts/

 <Location /perl>
 SetHandler  perl-script
 PerlHandler Apache::Registry
 Options ExecCGI
 </Location>

Now, any file accessed under /perl will be handled by mod_perl and the Apache::Registry module. The file must exist and be executable. In addition, 'Options ExecCGI' must be turned on. See the Apache::Registry module for details.

By default, mod_perl does not send any headers by itself. However, you may wish to change this:

    PerlSendHeader On   

With the recommended configuration, these options and Perl version 5.003_93 or higher (or 5.003_xx version with sfio), scripts running under Apache::Registry will look just like "normal" CGI scripts. See eg/perlio.pl as an example.

You may load additional modules via:

    PerlModule Apache::SSI SomeOther::Module

There is a limit of 10 PerlModule directives. If you need more modules to be loaded when the server starts, use one PerlModule to pull in many or use the PerlScript directive described below.

Optionally:

    PerlScript  /full/path/to/script_to_load_at_startup.pl

This script will be loaded when the server starts. See eg/startup.pl for an example to start with.

In an access.conf <Directory /foo> or .htaccess you need:

    PerlHandler sub_routine_name

This is the name of the subroutine to call to handle each request. e.g. in the PerlModule Apache::Registry this is "Apache::Registry::handler".

If PerlHandler is not a defined subroutine, mod_perl assumes it is a package name which defines a subroutine named "handler".

    PerlHandler   Apache::Registry

This would load Registry.pm (if it is not already loaded) and call its subroutine "handler".
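The lookup rule described above can be sketched in plain Perl. This is only an illustration that runs outside the server: resolve_handler() and the My::Module package are hypothetical names, and the step where mod_perl loads a package that is not yet in memory is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a handler package; not part of mod_perl.
package My::Module;
sub handler { return "My::Module::handler called" }

package main;

# Hypothetical helper illustrating the lookup rule for PerlHandler values.
sub resolve_handler {
    my ($name) = @_;
    no strict 'refs';
    # A value that names a defined subroutine is used as-is.
    return \&{$name} if defined &{$name};
    # Otherwise treat the value as a package and look for its "handler".
    return \&{"${name}::handler"} if defined &{"${name}::handler"};
    return;
}

my $by_sub     = resolve_handler("My::Module::handler");
my $by_package = resolve_handler("My::Module");
print $by_package->(), "\n";
```

Both calls end up at the same subroutine, which is why 'PerlHandler Apache::Registry' and 'PerlHandler Apache::Registry::handler' behave alike.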

There are several stages of a request where the Apache API allows a module to step in and do something. The Apache documentation will tell you all about those stages and what your modules can do. By default, these hooks are disabled at compile time; see the INSTALL document for information on enabling these hooks. The following configuration directives take one argument, which is the name of the subroutine to call. If the value is not a subroutine name, mod_perl assumes it is a class name which implements a 'handler' subroutine.

    PerlInitHandler
    PerlTransHandler    
    PerlAuthenHandler
    PerlAuthzHandler
    PerlAccessHandler
    PerlTypeHandler
    PerlFixupHandler
    PerlLogHandler
    PerlHeaderParser (requires apache1.2b5 or higher)
    PerlCleanupHandler

I/O

Apache's i/o is not stream oriented. So, unless you have Perl version 5.003_93 or higher, you cannot print() to STDOUT from your script by default; use $r->print() instead. Nor can you read() from STDIN; use $r->read() or the $r->content methods to read POST data. In post-5.003 versions of Perl, two mechanisms have been introduced which allow redirecting the STDIN and STDOUT streams.

One mechanism takes advantage of the PerlIO abstraction and sfio discipline structures, such that STDIN and STDOUT are hooked up to the client by default if you configured perl with -Dusesfio (see Perl's INSTALL doc).

Otherwise, mod_perl will tie() STDOUT and STDIN to the client. In order for this to work, you must have Perl version 5.003_93 or higher.

Using CGI.pm and CGI::*

CGI.pm users must have version 2.32 of the package or higher; earlier versions will not work under mod_perl. If you have Perl version 5.003_93 or higher (or a 5.003_xx+ version with sfio), scripts may 'use CGI'. Otherwise, scripts need to 'use CGI::Switch' so i/o goes through Apache-> methods; this will also work with later versions of Perl.

The CGI::* modules (CGI::Request et al.) can be used untouched if your Perl is configured to use sfio and the following directive is present in the directory configuration:

    PerlSendHeader On   

If you use the SendHeaders() function, be sure to call $req_obj->cgi->done when you are done with a request, just as you would under CGI::MiniSrv.

MEMORY CONSUMPTION

No matter what, your httpd will be larger than normal to start, simply because you've linked with perl's runtime.

Here I'm just running

 % /usr/bin/perl -e '1 while 1'

   PID USERNAME PRI NICE   SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 10214 dougm     67    0   668K  212K run     0:04 71.55% 21.13% perl

Now with a few random modules:

 % /usr/bin/perl -MDBI -MDBD::mSQL -MLWP::UserAgent -MFileHandle -MIO -MPOSIX -e '1 while 1'

 10545 dougm     49    0  3732K 3340K run     0:05 54.59% 21.48% perl

Here's my httpd linked with libperl.a, not having served a single request:

 10386 dougm      5    0  1032K  324K sleep   0:00  0.12%  0.11% httpd-a

You can reduce this if you configure perl 5.003_xx+ with -Duseshrplib. Here's my httpd linked with libperl.sl, not having served a single request:

 10393 dougm      5    0   476K  368K sleep   0:00  0.12%  0.10% httpd-s

Now, once the server starts receiving requests, the embedded interpreter will compile code for each 'require' file it has not seen yet and each new Apache::Registry subroutine, along with whatever modules it's use'ing or require'ing. Not to mention AUTOLOADing. (Modules that you 'use' will be compiled when the server starts unless they are inside an eval block.) httpd will grow just as big as our /usr/bin/perl would, or a CGI process for that matter. It all depends on your setup.

Newer Perl versions also have other options to reduce runtime memory consumption. See Perl's INSTALL file for details on -DPACK_MALLOC and -DTWO_POT_OPTIMIZE. With these options, my httpd shrinks down ~150K.

For me, once everything is compiled, the processes no longer grow, and I can live with the size at that point. For others, this size might be too big, or they might be using a module that leaks or have code of their own that leaks. In any case, using the Apache configuration directive 'MaxRequestsPerChild' is your best bet to keep the size down, but at the same time, you'll be slowing things down when Apache::Registry scripts have to recompile. Tradeoffs...

SWITCHES

Normally when you run perl from the command line or have the shell invoke it with `#!', you may choose to pass perl switch arguments such as -w or -T. Since the command line is only parsed once, when the server starts, these switches are unavailable to mod_perl scripts. However, most command line arguments have a perl special variable equivalent. For example, the $^W variable corresponds to the -w switch. Consult perlvar for more details. The switch which enables taint checks does not have a special variable, so mod_perl provides the PerlTaintCheck directive to turn on taint checks. In httpd.conf, enable with:

 PerlTaintCheck On

Now, any and all code compiled inside httpd will be checked.
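As a small illustration of using a special variable in place of a switch, the sketch below turns on run-time warnings with $^W for a single block. It is plain Perl that runs outside the server. The @warnings array and the __WARN__ hook are only there to make the effect visible, and the code deliberately avoids 'use warnings' so that $^W is what controls the warning.

```perl
use strict;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

{
    # Run-time equivalent of the -w switch, limited to this block.
    local $^W = 1;
    my $undefined;
    my $text = "value: " . $undefined;   # "uninitialized value" warning
}

print scalar(@warnings), " warning(s) caught\n";
```

Note that $^W set this way only affects warnings raised at run time. Code that has already been compiled is not checked again for compile-time warnings.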

PERSISTENT DATABASE CONNECTIONS

Another popular use of mod_perl is to take advantage of its persistence to maintain open database connections. The basic idea goes like so:

 #Apache::Registry script
 use strict;
 use vars qw($dbh);

 $dbh ||= SomeDbPackage->connect(...);

Since $dbh is a global variable, it will not go out of scope, keeping the connection open for the lifetime of a server process, establishing it during the script's first request for that process.
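The effect of the '||=' idiom can be seen outside the server with a stand-in package. In this sketch SomeDbPackage is a hypothetical mock that only counts how often connect() is called, and handle_request() stands in for the body of a script that runs once per request within one server process.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database package; it only counts connects.
package SomeDbPackage;
my $connects = 0;
sub connect       { $connects++; return bless {}, shift }
sub connect_count { return $connects }

package main;
our $dbh;   # package global, like "use vars qw($dbh)"

# Stand-in for the script body that runs once per request.
sub handle_request {
    # Connect only if this process has no handle yet.
    $dbh ||= SomeDbPackage->connect();
    return $dbh;
}

handle_request() for 1 .. 3;
print "connects: ", SomeDbPackage->connect_count, "\n";
```

Three simulated requests result in a single connect, since the global keeps its value between calls.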

It's recommended that you use one of the Apache::* database connection wrappers. Currently for DBI users there is Apache::DBI and for Sybase users Apache::Sybase::DBlib. These modules hide the peculiar code example above. In addition, different scripts may share a connection, minimizing resource consumption. Example:

 use strict;
 my $dbh = Apache::DBI->connect(...);

Although $dbh shown here will go out of scope when the script ends, the Apache::DBI module's reference to it does not, keeping the connection open.

STACKED HANDLERS

With the mod_perl stacked handlers mechanism, it is possible for more than one Perl*Handler to be defined and run during each stage of a request.

Perl*Handler directives can define any number of subroutines, e.g. (in config files)

 PerlTransHandler OneTrans TwoTrans RedTrans BlueTrans

With the Apache->push_handlers method, callbacks can be added to the stack at runtime by mod_perl scripts.

Apache->push_handlers takes the callback hook name as its first argument and a subroutine name or reference as its second. e.g.:

 Apache->push_handlers("PerlLogHandler", \&first_one);

 $r->push_handlers("PerlLogHandler", sub {
     print STDERR "__ANON__ called\n";
     return 0;
 });

After each request, this stack is cleared out.

All handlers will be called unless a handler returns a status other than OK or DECLINED; this needs to be considered more. Post apache-1.2 will have a DONE return code to signal termination of a stage, which Rob and I came up with a while back when first discussing the idea of stacked handlers. 2.0 won't come for quite some time, so mod_perl will most likely handle this before then.
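The stopping rule can be sketched in plain Perl. This is not mod_perl's implementation, which lives in C inside the server. run_stack() is a hypothetical dispatcher, and returning the status of the last handler that ran is an assumption made for the illustration. OK and DECLINED use the values from Apache's C API (0 and -1).

```perl
use strict;
use warnings;

# Status values as defined by Apache's C API.
use constant { OK => 0, DECLINED => -1 };

# Hypothetical dispatcher: run handlers in order and stop at the
# first status that is neither OK nor DECLINED.
sub run_stack {
    my ($r, @handlers) = @_;
    my $status = DECLINED;
    for my $handler (@handlers) {
        $status = $handler->($r);
        last unless $status == OK || $status == DECLINED;
    }
    return $status;
}

my @called;
my $status = run_stack(
    {},                                       # stand-in for the request object
    sub { push @called, "one";   return OK  },
    sub { push @called, "two";   return 500 }, # an error status ends the stage
    sub { push @called, "three"; return OK  },
);
print "called: @called, final status: $status\n";
```

Here the third handler never runs, because the second one returned a status other than OK or DECLINED.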

Example uses:

CGI.pm maintains a global object for its plain function interface. Since the object is global, it does not go out of scope, so DESTROY is never called. CGI->new can call:

 Apache->push_handlers("PerlCleanupHandler", \&CGI::_reset_globals);

This function will be called during the final stage of a request, refreshing CGI.pm's globals before the next request comes in.

Apache::DCELogin establishes a DCE login context which must exist for the lifetime of a request, so the DCE::Login object is stored in a global variable. Without stacked handlers, users must set

 PerlCleanupHandler Apache::DCELogin::purge

in the configuration files to destroy the context. This is not "user-friendly". Now, Apache::DCELogin::handler can call:

 Apache->push_handlers("PerlCleanupHandler", \&purge);

Persistent database connection modules such as Apache::DBI could push a PerlCleanupHandler handler that iterates over %Connected, refreshing connections or just checking that they have not gone stale. Remember, by the time we get to PerlCleanupHandler, the client has what it wants and has gone away, so we can spend as much time as we want here without slowing down response time to the client.

PerlTransHandlers may decide, based on uri or other condition, whether or not to handle a request, e.g. Apache::MsqlProxy. Without stacked handlers, users must configure:

 PerlTransHandler Apache::MsqlProxy::translate
 PerlHandler      Apache::MsqlProxy

PerlHandler is never actually invoked unless translate() sees that the request is a proxy request ($r->proxyreq). If it is a proxy request, translate() sets $r->handler("perl-script"); only then will PerlHandler handle the request. Now, users do not have to specify 'PerlHandler Apache::MsqlProxy'; the translate() function can set it with push_handlers().

Piecing together a document from includes, headers, footers, etc. Imagine (no need for SSI parsing!):

 PerlHandler My::Header Some::Body A::Footer

This was my first test:

 #My.pm
 package My;

 sub header {
     my $r = shift;
     $r->content_type("text/plain");
     $r->send_http_header;
     $r->print("header text\n");
 }
 sub body   { shift->print("body text\n")   }
 sub footer { shift->print("footer text\n") }
 1;
 __END__ 
 #in config
 <Location /foo>
 SetHandler "perl-script"
 PerlHandler My::header My::body My::footer
 </Location>

Parsing the output of another PerlHandler? This is a little more tricky, but consider:

 <Location /foo>
   SetHandler "perl-script"
   PerlHandler OutputParser SomeApp 
 </Location>
 <Location /bar>
   SetHandler "perl-script"
   PerlHandler OutputParser AnotherApp
 </Location>

Now, OutputParser goes first, but it untie's *STDOUT and re-tie's to its own package like so:

 package OutputParser;

 sub handler {
     my $r = shift; 
     untie *STDOUT;     
     tie *STDOUT => 'OutputParser', $r;
 }

 sub TIEHANDLE {
     my($class, $r) = @_;
     bless { r => $r}, $class;
 }

 sub PRINT {
     my $self = shift;
     for (@_) {
         #do whatever you want to $_
         $self->{r}->print($_ . "[insert stuff]");
     }
 }

 1;
 __END__

tie of *STDOUT has worked since perl5.003_02 or so, no matter if sfio is configured.
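The same tie technique can be tried outside the server. The sketch below is a standalone variation on OutputParser, not code from mod_perl: UpperCaseOut is a hypothetical package that collects everything printed to STDOUT in an array, upper-casing it on the way, in place of passing it on to $r->print.

```perl
use strict;
use warnings;

package UpperCaseOut;

# Called by tie(); keeps a reference to the array that collects output.
sub TIEHANDLE {
    my ($class, $sink) = @_;
    return bless { sink => $sink }, $class;
}

# Called for every print() on the tied handle.
sub PRINT {
    my $self = shift;
    push @{ $self->{sink} }, map { uc($_) } @_;
    return 1;
}

package main;

my @captured;
tie *STDOUT, 'UpperCaseOut', \@captured;
print "hello ", "world\n";      # goes to UpperCaseOut::PRINT
untie *STDOUT;                  # restore normal output

print "captured: ", join("", @captured);
```

Only PRINT is implemented here. A script that also calls printf() or write() on the tied handle would need PRINTF and WRITE methods as well.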

To build in this feature, configure with:

 % perl Makefile.PL PERL_STACKED_HANDLERS=1 [PERL_FOO_HOOK=1,etc]

Another method 'Apache->can_stack_handlers' will return TRUE if mod_perl was configured with PERL_STACKED_HANDLERS=1, FALSE otherwise.

PERL METHOD HANDLERS

If a Perl*Handler is prototyped with '$$', this handler will be invoked as a method. e.g.

 package My;
 @ISA = qw(BaseClass);

 sub handler ($$) {
     my($class, $r) = @_;
     ...;
 }

 package BaseClass;

 sub method ($$) {
     my($class, $r) = @_;
     ...;
 }

 __END__

Configuration:

 PerlHandler My

or

 PerlHandler My->handler

Since the handler is invoked as a method, it may inherit from other classes:

 PerlHandler My->method

In this case, the 'My' package inherits this method from 'BaseClass'.
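How a '$$' prototype can select method invocation is sketched below in plain Perl. The invoke() dispatcher is hypothetical and only mirrors the behaviour described above. The real check happens inside mod_perl. The My and BaseClass packages repeat the example, with return values added so the result can be printed.

```perl
use strict;
use warnings;

package BaseClass;
sub method ($$) {
    my ($class, $r) = @_;
    return "method invoked on class $class";
}

package My;
our @ISA = ('BaseClass');
sub handler ($$) {
    my ($class, $r) = @_;
    return "handler invoked on class $class";
}

package main;

# Hypothetical dispatcher: call as a method when the prototype is '$$',
# otherwise call as a plain subroutine with the request object only.
sub invoke {
    my ($class, $name, $r) = @_;
    my $code = $class->can($name) or die "no $name in $class";
    my $proto = prototype($code);
    return (defined $proto && $proto eq '$$')
        ? $class->$name($r)
        : $code->($r);
}

my $inherited = invoke('My', 'method', {});   # {} stands in for $r
print "$inherited\n";
```

Because the call goes through normal method resolution, 'My' finds method() in BaseClass and receives its own class name as the first argument.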

To build in this feature, configure with:

 % perl Makefile.PL PERL_METHOD_HANDLERS=1 [PERL_FOO_HOOK=1,etc]

WARNINGS

Your scripts *will not* run from the command line (yet) unless you use CGI::Switch and make no direct calls to Apache->methods.

SUPPORT

For comments, questions, bug-reports, announcements, etc., send mail to majordomo@listproc.itribe.net with the string "subscribe modperl" in the body.

AUTHOR

Doug MacEachern <dougm@osf.org>
