NAME

ModPerl2::Tools - a few hopefully useful tools

SYNOPSIS

 use ModPerl2::Tools;

 ModPerl2::Tools::spawn +{keep_fd=>[3,4,7], survive=>1}, sub {...};
 ModPerl2::Tools::spawn +{keep_fd=>[3,4,7], survive=>1}, qw/bash -c .../;

 ModPerl2::Tools::safe_die $status;
 $r->safe_die($status);
 $f->safe_die($status);

 $content=ModPerl2::Tools::fetch_url $url;
 $content=$r->fetch_url($url);

DESCRIPTION

This module is a collection of functions and methods that I found useful when working with mod_perl. I work mostly under Linux. So, I don't expect all of these functions to work on other operating systems.

Forking off long running processes

Sometimes one needs to spawn a long-running process as the result of a request. Under mod_perl this is not as simple as calling fork, because the child would inherit all open file descriptors and, more subtly, the long-running process would be killed when the administrator shuts down the web server. The former is usually considered a security issue, the latter a design decision.

There is already $r->spawn_proc_prog, which serves a similar purpose to the spawn function. However, spawn_proc_prog is not usable for long-running processes because it kills the children after a certain timeout.

Solution

 $pid=ModPerl2::Tools::spawn \%options, $subroutine, @parameters;

or

 $pid=ModPerl2::Tools::spawn \%options, @command_line;

spawn expects an options hash reference as its first parameter. The second parameter may be a code reference or a string.

In the case of a code reference no other program is executed; the subroutine is called instead. The remaining parameters are passed to it.

Note that the Perl environment under mod_perl differs in certain ways from a normal Perl environment. For example, %ENV is not bound to the C-level environ. These modifications are not undone by this module. So, it's generally better to execute another perl interpreter instead of using the $subroutine feature.

The options parameter accepts these options:

keep_fd => \@fds

An array of file descriptor numbers (not file handles) is expected here. All file descriptors except the listed ones and file descriptor 2 (STDERR) are closed before calling $subroutine or executing @command_line.

survive => $boolean

If false, the created process will be killed when Apache shuts down. If true, it will survive an Apache restart.

On success spawn returns the PID of the process. On failure it returns undef or an empty string.

The created process is not a child process of the current Apache child.

Serving ErrorDocuments

Triggering ErrorDocuments from a registry script, or even more so from an output filter, is not simple. The normal way in a handler is

  return Apache2::Const::STATUS;

This does not work for registry scripts. An output filter, even if it returns a status, can only trigger a SERVER_ERROR.

The main interface to enter standard error processing in Apache is ap_die() at C-level. Its Perl interface is hidden in Apache2::HookRun.

There is one case when an error message cannot be sent to the user. This happens if the HTTP headers are already on the wire. Then it is too late.

The various flavors of safe_die() take this into account.

safe_die won't return. Instead it calls ModPerl::Util::exit(0) which raises an exception.

ModPerl2::Tools::safe_die $status

This function is designed to be called from registry scripts. It uses Apache2::RequestUtil->request to fetch the current request object. So,

 PerlOption +GlobalRequest

must be enabled.

Usage example:

 ModPerl2::Tools::safe_die 401;

$r->safe_die($status)

$f->safe_die($status)

These two methods are to be used if a request object or a filter object is available.

Usage from within a filter:

 package My::Filter;
 use strict;
 use warnings;

 use ModPerl2::Tools;
 use base 'Apache2::Filter';

 sub handler : FilterRequestHandler {
   my ($f, $bb)=@_;
   $f->safe_die(410);
 }

The filter flavor removes the current filter from the request's output filter chain.

$r->headers_sent

This function checks whether the HTTP_HEADER output filter is still present. If so, it returns an empty list; otherwise it returns true.

The presence of this filter means no output has yet been written to the client. The HTTP status code and header fields can still be modified.

Fetching the content of another document

Sometimes a handler or a filter needs the content of another document in the web server's realm. Apache provides subrequests for this purpose.

The 2 fetch_url variants use a subrequest to fetch the content of another document. The document can even be fetched via mod_proxy from another server.

ModPerl2::Tools::fetch_url needs

 PerlOption +GlobalRequest

Usage:

 $content=ModPerl2::Tools::fetch_url '/some/where?else=42';

 $content=$r->fetch_url('/some/where?else=42');

 ($content, $headers)=
     $r->fetch_url('http://what.is/the/meaning/of?life=42');

If mod_proxy is available, fetch_url can use it to fetch a document from another web server. If mod_ssl is configured to allow proxying SSL (see SSLProxyEngine), even the https scheme works. Another subtle point: ProxyErrorOverride may affect the output in case of an error.

Further, if fetch_url is passed a subroutine as the last argument, the content is not accumulated in a single variable but passed brigade-wise to the function:

 ($content, $headers)=
     $r->fetch_url('http://what.is/the/meaning/of?life=42', sub {
         my ($subr, @brigade)=@_;
         ...
     });

The subroutine is called with the subrequest as the first parameter, followed by a list of non-empty strings. The list itself may be empty if none of the buckets of the brigade contains data.

On success the resulting $content will be the empty string in this case.
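
As a minimal sketch of the callback form, the following builds a subroutine that digests and counts the data brigade by brigade instead of keeping it in memory. The helper name make_digest_callback is hypothetical and not part of ModPerl2::Tools, and Digest::MD5 merely serves as an example of incremental processing. The text above does not say how the callback's return value is used, so the sketch returns an empty list.

```perl
use strict;
use warnings;
use Digest::MD5 ();

# Hypothetical helper (not part of ModPerl2::Tools): returns a callback
# suitable as the last argument of fetch_url, plus a subroutine that
# reports what has been seen so far.
sub make_digest_callback {
    my $md5=Digest::MD5->new;
    my $length=0;

    my $callback=sub {
        my ($subr, @brigade)=@_;    # subrequest, then the data strings
        for my $chunk (@brigade) {  # @brigade may be empty
            $md5->add($chunk);
            $length+=length $chunk;
        }
        return;
    };

    # hexdigest resets the digest object, hence the clone
    my $result=sub { return ($md5->clone->hexdigest, $length) };

    return ($callback, $result);
}

# Usage inside a handler where $r is the request object:
#
#   my ($callback, $result)=make_digest_callback();
#   my ($content, $headers)=$r->fetch_url('/some/where?else=42', $callback);
#   my ($hexdigest, $length)=$result->();
```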

fetch_url() normally strips almost all input HTTP header fields from the subrequest before running it. However, if the $r request object has a Host header field it is passed on. Also, a User-Agent header is set for the subrequest containing ModPerl2::Tools/$VERSION where $VERSION is the module's version.

If you need to pass more fields, pass an array reference as the second parameter to fetch_url():

 ($content, $headers)=
     $r->fetch_url('http://what.is/the/meaning/of?life=42', [qw/
         X-MyHeader my-value
         X-MyNextHeader my-next-value
     /]);

or even:

 ($content, $headers)=
     $r->fetch_url('http://what.is/the/meaning/of?life=42', [qw/
         X-MyHeader my-value
         X-MyNextHeader my-next-value
     /], sub {
         my ($subr, @brigade)=@_;
         ...
     });

If Host or User-Agent headers are passed this way they overwrite the default ones.
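
Because qw// splits on whitespace, it cannot express header values that contain spaces. The following sketch builds the array reference from a hash instead. It assumes, as the examples above suggest, that fetch_url expects a flat list of alternating names and values. header_list is a hypothetical helper and the header values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a hash of header fields into the
# name/value list form shown above. Sorting only makes the order stable.
sub header_list {
    my ($fields)=@_;
    return [map { ($_ => $fields->{$_}) } sort keys %$fields];
}

my $extra=header_list({
    'X-MyHeader'      => 'a value with spaces',
    'Accept-Language' => 'en',
});

# Usage inside a handler where $r is the request object:
#
#   my ($content, $headers)=$r->fetch_url('/some/where?else=42', $extra);
```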

Note, though, the header fields are assigned to the subrequest just before the response handler is run. Earlier phases will see a copy of the main request's headers.

How does it work?

If the passed URL starts with https:// or http:// a subrequest for the URI / is initiated via $r->lookup_uri('/'). Before the subrequest is run it is changed into a proxy request for the passed URL. One precondition for this to work is that there are no access restrictions for the URL /.

Otherwise it is simply a subrequest for the passed URL.

Then ModPerl2::Tools::Filter::fetch_content_filter is installed as output filter for the subrequest. After that the subrequest is run.

The output filter collects all output.

When the request is done its $r->headers_out is copied into a normal hash and in list context the output string and this hash are returned. In scalar context only the string is returned.

HTTP header names are case insensitive. Their names are all converted to lower case in the $headers hash. There are 2 hash members in upper case. STATUS contains the HTTP status code of the subrequest and STATUSLINE the status line.
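
A sketch of how the returned pair might be inspected, following the key conventions described above. summarize_fetch is a hypothetical helper, and the sample data at the end is made up so that the snippet runs outside of mod_perl.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize the ($content, $headers) pair
# returned by fetch_url in list context.
sub summarize_fetch {
    my ($content, $headers)=@_;

    # header names are lower case, STATUS and STATUSLINE are upper case
    my $type=$headers->{'content-type'};
    $type='unknown' unless defined $type;

    return sprintf '%s: length %d, type %s',
        $headers->{STATUS}, length($content), $type;
}

# Under mod_perl the pair would come from
#   my ($content, $headers)=$r->fetch_url('/some/where?else=42');
# Here made-up sample data stands in for it.
print summarize_fetch('<p>42</p>', {
    STATUS         => 200,
    STATUSLINE     => '200 OK',
    'content-type' => 'text/html',
}), "\n";
```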

Useful functions for similar cases

Note, it is always better to process data one chunk at a time. Try hard to do that! Collecting data in memory should only be a last resort.

ModPerl2::Tools::Filter::read_bb $bucket_brigade, \@buffer

read_bb collects the data of a bucket brigade in the @buffer array. It returns true if an EOS bucket has been seen, otherwise false.

A simple output filter that collects all data could look like:

 sub filter {
     my ($f, $bb)=@_;

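     # Keep the buffer in the filter context so that data collected
     # during earlier invocations of the filter is not lost.
     my $buffer=$f->ctx;
     $f->ctx($buffer=[]) unless $buffer;
     do_something(join '', @$buffer)
         if ModPerl2::Tools::Filter::read_bb $bb, $buffer;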

     return Apache2::Const::OK;
 }

ModPerl2::Tools::Filter::fetch_content_filter

This function is a FilterRequestHandler. It is controlled by two elements of $r->pnotes, out and force_fetch_content. out must be an array reference. It is passed to read_bb to collect the output. force_fetch_content is a flag. If it is false and $r->status is not HTTP_OK on the first invocation of the filter, the filter does nothing and removes itself.

Usage:

 my $subr=$r->lookup_uri(...);

 my $output=[];
 @{$subr->pnotes}{qw/out force_fetch_content/}=($output,1);
 $subr->add_output_filter
     (\&ModPerl2::Tools::Filter::fetch_content_filter);
 $subr->run;

 do_something(join '', @$output);

EXPORTS

None.

TODO

Look at APR to see what it provides to make things easier, for example apr_proc_create().

SEE ALSO

http://perl.apache.org

AUTHOR

Torsten Förtsch, <torsten.foertsch@gmx.net>

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Torsten Förtsch

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.