NAME

Using Perl*Handlers

Description

This chapter discusses mod_perl's Perl*Handlers and presents examples of their use.

mod_perl Handlers

Apache distinguishes between numerous phases for which it provides hooks (so called because the C functions are named ap_hook_<phase_name>) where modules can plug in various callbacks to extend and alter the default behaviour of the webserver. mod_perl provides a Perl interface for most of the available hooks, so mod_perl module writers can change the Apache behavior in Perl. These callbacks are usually referred to as handlers, and therefore the configuration directives for the mod_perl handlers look like PerlFooHandler, where Foo is one of the handler names. For example, PerlResponseHandler configures the response callback.

A typical handler is simply a Perl package with a handler subroutine. For example:

  package MyApache::CurrentTime;
  
  use strict;
  use warnings;
  
  use Apache::RequestRec ();
  use Apache::RequestIO ();
  
  use Apache::Const -compile => qw(OK);
  
  sub handler {
      my $r = shift;
  
      $r->content_type('text/plain');
      $r->print("Now is: " . scalar(localtime) . "\n");
  
      return Apache::OK;
  }
  1;

This handler simply returns the current date and time as a response.

Since this is a response handler, we configure it as such in httpd.conf:

  PerlResponseHandler MyApache::CurrentTime

Since the response handler should be configured for a specific location, let's write a complete configuration section:

  PerlModule MyApache::CurrentTime
  <Location /time>
      SetHandler modperl
      PerlResponseHandler MyApache::CurrentTime
  </Location>

Now when a request is issued to /time this response handler will be executed.

Server Life Cycle

The following diagram depicts the Apache 2.0 server life cycle and highlights which handlers are available to mod_perl 2.0:

server life cycle

Apache 2.0 starts by parsing the configuration file. After the configuration file is parsed, the PerlOpenLogsHandler handlers are executed, if any. Next it's the turn of the PerlPostConfigHandler handlers to run. When the post_config phase is finished the server immediately restarts, to make sure that it can survive graceful restarts after starting to serve the clients.

When the restart is completed, Apache 2.0 spawns the workers that will do the actual work. Depending on the MPM in use, these can be threads, processes, or a mixture of both. For example, the worker MPM spawns a number of processes, each running a number of threads. When each child process is started, the PerlChildInitHandler handlers are executed. Notice that they are run once for each starting process, not for each thread.

From that moment on each working thread processes connections until it's killed by the server or the server is shut down.

Now let's discuss each of the mentioned startup handlers in detail.

PerlOpenLogsHandler

The open_logs phase happens just before the post_config phase.

Handlers registered by PerlOpenLogsHandler are usually used for opening module-specific log files.

At this stage the STDERR stream is not yet redirected to error_log, and therefore any messages to that stream will be printed to the console the server is starting from (if one exists).

This phase is of type RUN_ALL.

The handler's configuration scope is SRV.
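The phase types that this chapter attaches to each handler (RUN_ALL, RUN_FIRST and VOID) describe how Apache walks through the handlers registered for one phase. The plain-Perl sketch below models the commonly documented semantics outside of Apache. The run_phase() function and the numeric constants defined here are illustrative stand-ins, not part of the mod_perl API.

```perl
use strict;
use warnings;

# Stand-ins for Apache::OK and Apache::DECLINED, defined locally so the
# sketch runs without Apache.
use constant OK       => 0;
use constant DECLINED => -1;

# Run a list of handlers (code refs) according to the phase type.
#   VOID      - run every handler, ignore the return values
#   RUN_FIRST - stop at the first handler that returns anything but DECLINED
#   RUN_ALL   - stop at the first handler that returns anything but
#               OK or DECLINED
# Returns the list ($status, $number_of_handlers_called).
sub run_phase {
    my ($type, @handlers) = @_;
    my $status = DECLINED;
    my $called = 0;
    for my $handler (@handlers) {
        my $rc = $handler->();
        $called++;
        next if $type eq 'VOID';
        $status = $rc;
        last if $type eq 'RUN_FIRST' && $rc != DECLINED;
        last if $type eq 'RUN_ALL'   && $rc != OK && $rc != DECLINED;
    }
    $status = OK if $type eq 'VOID';
    return ($status, $called);
}

# Three hypothetical handlers: one declines, one succeeds, one fails.
my @handlers = (sub { DECLINED }, sub { OK }, sub { 500 });

for my $type (qw(VOID RUN_FIRST RUN_ALL)) {
    my ($status, $called) = run_phase($type, @handlers);
    print "$type: status=$status, handlers called=$called\n";
}
```

With these three handlers, the RUN_FIRST phase stops after the second handler returns OK, while the RUN_ALL phase goes on to call the third handler and ends with its error status.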

For example here is the MyApache::OpenLogs handler that opens a custom log file:

  file:MyApache/OpenLogs.pm
  -----------------------
  package MyApache::OpenLogs;
  
  use strict;
  use warnings;
  
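  use File::Spec::Functions;
  
  use Apache::Log ();         # provides $s->warn
  use Apache::ServerUtil ();  # provides Apache::server_root_relative
  
  use Apache::Const -compile => qw(OK);
  
  my $log_file = catfile "logs", "mylog";
  
  sub handler {
      my ($conf_pool, $log_pool, $temp_pool, $s) = @_;
  
      my $log_path = Apache::server_root_relative($conf_pool, $log_file);
      $s->warn("opening the log file: $log_path");
      # three-argument open, appending to the log file
      open my $log, ">>", $log_path or die "can't open $log_path: $!";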
  
      return Apache::OK;
  }
  1;

The open_logs phase handlers accept four arguments: the configuration pool, the logging streams pool, the temporary pool and the server object. In our example the handler uses the function Apache::server_root_relative() to build the full path to the log file, which is then opened. Of course, in a real-world handler the module needs to be extended to provide an accessor that can write to this log file.

To configure this handler add to httpd.conf:

  PerlOpenLogsHandler MyApache::OpenLogs

PerlPostConfigHandler

The post_config phase happens right after Apache has processed the configuration files, before any child processes are spawned (which happens at the child_init phase).

This phase can be used for initializing things to be shared between all child processes. You can do the same in the startup file, but in the post_config phase you have access to the complete configuration tree.

This phase is of type RUN_ALL.

The handler's configuration scope is SRV.

Example:

PerlChildInitHandler

META: PerlChildExitHandler?

The child_init phase happens immediately after the child process is spawned. Each child process runs the hooks of this phase only once in its lifetime.

In the prefork MPM this phase is useful for pre-opening database connections (similar to Apache::DBI in mod_perl 1.0).

This phase is of type VOID.

The handler's configuration scope is SRV.

Example:

Bucket Brigades

Apache 2.0 allows multiple modules to filter both the request and the response. One module can now pipe its output as input to another module, as if the other module were receiving the data directly from the TCP stream. The same mechanism works with the generated response.

With I/O filtering in place, simple filters, like data compression and decompression, can be easily implemented, and complex filters, like SSL, are now possible without modifying the server code, which was necessary with Apache 1.3.

In order to make the filtering mechanism efficient and avoid unnecessary copying, the Bucket Brigades technology was introduced.

A bucket represents a chunk of data. Buckets linked together comprise a brigade. Each bucket in a brigade can be modified, removed and replaced with another bucket. The goal is to minimize the data copying where possible. Buckets come in different types, such as files, data blocks, end of stream indicators, pools, etc. To manipulate a bucket one doesn't need to know its internal representation.

The stream of data is represented by bucket brigades. When a filter is called it gets passed the brigade that was the output of the previous filter. This brigade is then manipulated by the filter (e.g., by modifying some buckets) and passed to the next filter in the stack.
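To make the idea concrete, here is a toy model in plain Perl. It is only a sketch of the concept: a bucket is modelled as a hash and a brigade as a list of such hashes, which is not how APR::Bucket and APR::Brigade are implemented, and the filter names (uppercase, add_trailer, drop_empty) and the run_filters() helper are made up for this illustration.

```perl
use strict;
use warnings;

# Toy brigade: a list of buckets, each with a type and some data.
my @brigade = (
    { type => 'HEAP', data => 'Hello, ' },
    { type => 'HEAP', data => ''        },   # an empty data bucket
    { type => 'HEAP', data => 'World'   },
    { type => 'EOS',  data => ''        },   # end-of-stream marker
);

# Filter that modifies buckets: uppercase the data of every bucket.
sub uppercase {
    my @out;
    for my $bucket (@_) {
        push @out, { type => $bucket->{type}, data => uc $bucket->{data} };
    }
    return @out;
}

# Filter that adds a bucket just before the end-of-stream marker.
sub add_trailer {
    my @out;
    for my $bucket (@_) {
        push @out, { type => 'HEAP', data => "!\n" }
            if $bucket->{type} eq 'EOS';
        push @out, $bucket;
    }
    return @out;
}

# Filter that removes empty data buckets but keeps the EOS bucket.
sub drop_empty {
    return grep { $_->{type} eq 'EOS' || length $_->{data} } @_;
}

# Pass the brigade through the filters in order: each filter gets the
# output of the previous one.
sub run_filters {
    my ($brigade, @filters) = @_;
    my @bb = @$brigade;
    @bb = $_->(@bb) for @filters;
    return @bb;
}

my @result = run_filters(\@brigade, \&uppercase, \&add_trailer, \&drop_empty);
print join('', map { $_->{data} } @result);
printf "%d buckets, last one is %s\n", scalar @result, $result[-1]{type};
```

Each filter sees only the brigade handed to it, so the last filter cannot tell which buckets were modified, added or removed earlier in the chain.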

The following diagram depicts an imaginary bucket brigade:

bucket brigades

The diagram tries to show that after the presented bucket brigade has passed through several filters some buckets were removed, some modified and some added. Of course, the handler that gets the brigade cannot tell the history of the brigade; it can only see the buckets that currently exist in the brigade.

We will discuss this topic in more detail when we talk about connection protocol and filter implementations.

Connection Cycle Phases

As we saw earlier, each child server (be it a thread or a process) is engaged in processing connections. Each connection may be served by different connection protocols, e.g., HTTP, POP3, SMTP, etc. Each connection may include more than one request, e.g., several HTTP requests can be served over a single connection when a response includes several images.

The following diagram depicts the connection life cycle and highlights which handlers are available to mod_perl 2.0:

connection cycle

When a connection is issued by a client, it's first run through PerlPreConnectionHandler and then passed to the PerlProcessConnectionHandler, which generates the response. When PerlProcessConnectionHandler is reading data from the client, it can be filtered by connection input filters. The generated response can also be filtered through connection output filters. Filters are usually used for modifying the data flowing through them, but can be used for other purposes as well (e.g., logging interesting information).

Now let's discuss each of the mentioned handlers in detail.

PerlPreConnectionHandler

The pre_connection phase happens just after the server accepts the connection, but before it is handed off to a protocol module to be served. It gives modules an opportunity to modify the connection as soon as possible. The core server uses this phase to set up the connection record based on the type of connection that is being used. mod_perl itself uses this phase to register the connection input and output filters.

In mod_perl 1.0, Apache::Reload was used during code development to automatically reload Perl modules modified since the last request. It was invoked during post_read_request, the first phase of an HTTP request. In mod_perl 2.0 pre_connection is the earliest phase, so if we want to make sure that all modified Perl modules are reloaded for any protocol and its phases, it's best to set the scope of the Perl interpreter to the lifetime of the connection and invoke the Apache::Reload handler during the pre_connection phase. However, this development-time advantage can become a disadvantage in production. For example, if a connection handled by the HTTP protocol is configured as KeepAlive, and several requests arrive on the same connection with only one handled by mod_perl and the others by the default images handler, the Perl interpreter won't be available to other threads while the images are being served.

This phase is of type RUN_ALL.

The handler's configuration scope is SRV, because it's not known yet which resource the request will be mapped to.

XXX: As of this moment PerlPreConnectionHandler is not being executed by mod_perl. Stay tuned.

Example:

A pre_connection handler accepts connection record and socket objects as its arguments:

  sub handler {
      my ($c, $socket) = @_;
      # ...
      return Apache::OK;
  }

PerlProcessConnectionHandler

The process_connection phase is used to actually process the connection that was received. Only protocol modules should assign handlers for this phase, as it gives them an opportunity to replace the standard HTTP processing with processing for some other protocols (e.g., POP3, FTP, etc.).

This phase is of type RUN_FIRST.

The handler's configuration scope is SRV. Therefore the only way to run protocol servers different than the core HTTP is inside dedicated virtual hosts.

Example:

META: echo example comes here

A process_connection handler accepts a connection record object as its only argument. A socket object can be retrieved from the connection record object:

  sub handler {
      my ($c) = @_;
      my $socket = $c->client_socket;
      # ...
      return Apache::OK;
  }

META: the echo example doesn't work with filter, because it reads and writes directly from/to the socket. Here comes the echo_filter example. But may be echo is not so good, use something like eliza/'lc' to show the retrieval of the data, here is some eliza protocol code plus an output lc filter.

  package Apache::Eliza2;
  
  use strict;
  use warnings FATAL => 'all';
  
  use Apache::Connection ();
  use APR::Bucket ();
  use APR::Brigade ();
  use APR::Util ();
  
  require Chatbot::Eliza;
  
  use APR::Const -compile => qw(SUCCESS EOF);
  use Apache::Const -compile => qw(OK MODE_GETLINE);
  
  my $eliza = Chatbot::Eliza->new;
  
  sub handler {
      my Apache::Connection $c = shift;
  
      my $bb_in  = APR::Brigade->new($c->pool, $c->bucket_alloc);
      my $bb_out = APR::Brigade->new($c->pool, $c->bucket_alloc);
      my $last = 0;
  
      while (1) {
          my $rv = $c->input_filters->get_brigade($bb_in, Apache::MODE_GETLINE);
  
          if ($rv != APR::SUCCESS or $bb_in->empty) {
              my $error = APR::strerror($rv);
              unless ($rv == APR::EOF) {
                  warn "[eliza] get_brigade: $error\n";
              }
              $bb_in->destroy;
              last;
          }
  
          while (!$bb_in->empty) {
              my $bucket = $bb_in->first;
  
              $bucket->remove;
  
              if ($bucket->is_eos) {
                  $bb_out->insert_tail($bucket);
                  last;
              }
  
              my $data;
              my $status = $bucket->read($data);
              return $status unless $status == APR::SUCCESS;
  
              if ($data) {
                  $data =~ s/[\r\n]*$//;
                  $last++ if $data =~ /good bye/i;
                  $data = $eliza->transform( $data ) . "\n\n";
                  $bucket = APR::Bucket->new($data);
              }
  
              $bb_out->insert_tail($bucket);
          }
  
          my $b = APR::Bucket::flush_create($c->bucket_alloc);
          $bb_out->insert_tail($b);
          $c->output_filters->pass_brigade($bb_out);
          last if $last;
      }
  
      Apache::OK;
  }
  
  use base qw(Apache::Filter);
  use constant BUFF_LEN => 1024;
  
  sub lowercase : FilterConnectionHandler {
      my $filter = shift;
    
      while ($filter->read(my $buffer, BUFF_LEN)) {
          $filter->print(lc $buffer);
      }
    
      return Apache::OK;
  }
  
  1;

XXX: mention that we will talk about filters later, just show that it works here.

HTTP Request Cycle Phases

Those familiar with mod_perl 1.0 will find the HTTP request cycle in mod_perl 2.0 to be almost identical to mod_perl 1.0's model. The only difference is in the response phase, which now includes filtering.

The following diagram depicts the HTTP request life cycle and highlights which handlers are available to mod_perl 2.0:

HTTP cycle

From the diagram it can be seen that an HTTP request is processed in 11 phases, executed in the following order:

1 PerlPostReadRequestHandler (PerlInitHandler)
2 PerlTransHandler
3 PerlHeaderParserHandler (PerlInitHandler)
4 PerlAccessHandler
5 PerlAuthenHandler
6 PerlAuthzHandler
7 PerlTypeHandler
8 PerlFixupHandler
9 PerlResponseHandler
10 PerlLogHandler
11 PerlCleanupHandler

It's possible that the cycle will not be completed if any of the phases terminates it, usually when an error happens.

Notice that when the response handler is reading the input data it can be filtered through request input filters, which are preceded by connection input filters, if any. Similarly the generated response is first run through request output filters and eventually through connection output filters before it's sent to the client. We will talk about filters in detail later in this chapter.

Now let's discuss each of the mentioned handlers in detail.

PerlPostReadRequestHandler

The post_read_request phase is the first request phase and happens immediately after the request has been read and HTTP headers were parsed.

This phase is usually used for processing that must happen once per request.

This phase is of type RUN_ALL.

The handler's configuration scope is SRV, because at this phase the request has not yet been associated with a particular filename or directory.

Example:

PerlTransHandler

The translate phase provides an opportunity to translate the request's URI into a corresponding filename.

In addition to doing the translation, this stage can be used to modify the URI itself and the request method. This is also a good place to register new handlers for the following phases based on the URI.

If no custom handler is provided, the server's default rules (Alias directives and the like) will continue to be followed.

This phase is of type RUN_FIRST.

The handler's configuration scope is SRV, because at this phase the request has not yet been associated with a particular filename or directory.

Example:

PerlInitHandler

When configured inside any section except <VirtualHost>, this handler is an alias for PerlHeaderParserHandler, described later. Otherwise it acts as an alias for PerlPostReadRequestHandler, described earlier.

It is the first handler to be invoked when serving a request.

This phase is of type RUN_ALL.

Example:

PerlHeaderParserHandler

The header_parser phase is the first phase to happen after the request has been mapped to its <Location> (or equivalent). At this phase the handler can examine the request headers and take a special action based on them. For example, this phase can be used to block evil clients before many resources have been wasted on them.

This phase is of type RUN_ALL.

The handler's configuration scope is DIR.

Example:

PerlAccessHandler

The access_checker phase is the first of three phases that are involved in authentication and authorization, and is used for access control.

This phase can be used to restrict access from a certain IP address, time of the day or any other rule not connected to the user's identity.

This phase is of type RUN_ALL.

The handler's configuration scope is DIR.

Example:

PerlAuthenHandler

The check_user_id (authen) phase is called whenever the requested file or directory is password protected. This, in turn, requires that the directory be associated with AuthName, AuthType and at least one require directive.

This phase is usually used to verify a user's identification credentials. If the credentials are verified to be correct, the handler should return OK. Otherwise the handler returns AUTH_REQUIRED to indicate that the user has not authenticated successfully. When Apache sends the HTTP header with this code, the browser will normally pop up a dialog box that prompts the user for login information.

This phase is of type RUN_FIRST.

The handler's configuration scope is DIR.

PerlAuthzHandler

The auth_checker (authz) phase is used for authorization control. This phase requires a successful authentication from the previous phase, because a username is needed in order to decide whether a user is authorized to access the requested resource.

As this phase is tightly connected to the authentication phase, the handlers registered for this phase are only called when the requested resource is password protected, similar to the auth phase. The handler is expected to return DECLINED to defer the decision, OK to indicate its acceptance of the user's authorization, or AUTH_REQUIRED to indicate that the user is not authorized to access the requested document.

This phase is of type RUN_FIRST.

The handler's configuration scope is DIR.

Example:

PerlTypeHandler

The type_checker phase is used to set the response MIME type (Content-type) and sometimes other bits of document type information like the document language.

For example mod_autoindex, which performs automatic directory indexing, uses this phase to map the filename extensions to the corresponding icons which will be later used in the listing of files.

Of course later phases may override the MIME type set in this phase.

This phase is of type RUN_FIRST.

The handler's configuration scope is DIR.

Example:

PerlFixupHandler

The fixups phase happens just before the content handling phase. It gives the last chance to do things before the response is generated. For example, in this phase mod_env populates the environment with variables configured with SetEnv and PassEnv directives.

This phase is of type RUN_ALL.

The handler's configuration scope is DIR.

Example:

PerlResponseHandler

The handler (response) phase is used for generating the response. This is probably the most important phase and most of the existing Apache modules do most of their work at this phase.

This is the only phase that requires two directives under mod_perl. For example:

  <Location /perl>
     SetHandler  perl-script
     PerlResponseHandler Apache::Registry
  </Location>

SetHandler tells Apache that mod_perl is going to handle the response generation. PerlResponseHandler tells mod_perl which handler is going to do the job.

This phase is of type RUN_FIRST.

The handler's configuration scope is DIR.

Example:

PerlLogHandler

The log_transaction phase happens no matter how the previous phases have ended. If one of the earlier phases has aborted a request, e.g., because of failed authentication or a 404 (file not found) error, the rest of the phases up to and including the response phase are skipped. But this phase is always executed.

By this phase all the information about the request and the response is known, therefore the logging handlers usually record this information in various ways (e.g., logging to a flat file or a database).
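The control flow described above can be sketched in plain Perl. This is a simplified model under stated assumptions: each phase is a code ref that returns a status, anything other than OK aborts the request, and the cleanup phase is left out. The run_request() helper, the short phase names used as hash keys and the constants are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

use constant OK        => 0;     # stand-in for Apache::OK
use constant FORBIDDEN => 403;   # stand-in for an HTTP error status

# Phases up to and including the response phase, in execution order.
my @phases = qw(post_read_request translate header_parser access
                authen authz type fixup response);

# Run the registered phases in order and stop at the first one that does
# not return OK. The log phase runs at the end in either case.
sub run_request {
    my (%handlers) = @_;
    my @ran;
    my $status = OK;
    for my $phase (@phases) {
        my $handler = $handlers{$phase} or next;   # nothing registered
        push @ran, $phase;
        $status = $handler->();
        last if $status != OK;                     # request aborted
    }
    push @ran, 'log';
    $handlers{log}->($status, @ran) if $handlers{log};
    return ($status, @ran);
}

my ($status, @ran) = run_request(
    access   => sub { FORBIDDEN },   # abort during access control
    response => sub { OK },          # skipped because of the abort
    log      => sub {
        my ($final_status, @names) = @_;
        print "status=$final_status phases=@names\n";
    },
);
```

In this run the response handler is never called, yet the log handler still receives the final status (403) and the list of phases that were actually executed.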

This phase is of type RUN_ALL.

The handler's configuration scope is DIR.

Example:

PerlCleanupHandler

META: not implemented yet

This phase is of type XXX.

The handler's configuration scope is XXX.

I/O Filtering

Apache 2.0 considers all incoming and outgoing data as chunks of information, disregarding their kind and source or storage methods. These data chunks are stored in buckets, which form bucket brigades. Both input and output filters work on these bucket brigades and modify them if necessary.

Currently the mod_perl filters allow connection and request level filtering. Apache supports several other types, which mod_perl 2.0 will probably support in the future. mod_perl filter handlers specify the type of the filter using the method attributes.

Request filter handlers are declared using the FilterRequestHandler attribute. Consider the following request input and output filters skeleton:

  package MyApache::FilterRequestFoo;
  use base qw(Apache::Filter);
  
  sub input  : FilterRequestHandler {
      my($filter, $bb, $mode, $block, $readbytes) = @_;
      #...
  }
  
  sub output : FilterRequestHandler {
      my($filter, $bb) = @_;
      #...
  }
  
  1;

If the attribute is not specified, the default FilterRequestHandler attribute is assumed. Filters specifying subroutine attributes must subclass Apache::Filter, others only need to:

  use Apache::Filter ();

The request filters are usually configured in the <Location> or equivalent sections:

  PerlModule MyApache::FilterRequestFoo
  PerlModule MyApache::NiceResponse
  <Location /filter_foo>
      SetHandler modperl
      PerlResponseHandler     MyApache::NiceResponse
      PerlInputFilterHandler  MyApache::FilterRequestFoo::input
      PerlOutputFilterHandler MyApache::FilterRequestFoo::output
  </Location>

Now we have the request input and output filters configured.

The connection filter handler uses the FilterConnectionHandler attribute. Here is a similar example for the connection input and output filters.

  package MyApache::FilterConnectionBar;
  use base qw(Apache::Filter);
  
  sub input  : FilterConnectionHandler {
      my($filter, $bb, $mode, $block, $readbytes) = @_;
      #...
  }
  
  sub output : FilterConnectionHandler {
      my($filter, $bb) = @_;
      #...
  }
  
  1;

This time the configuration must be done outside the <Location> or equivalent sections, usually within the <VirtualHost> or the global server configuration:

  Listen 8005
  <VirtualHost _default_:8005>
      PerlModule MyApache::FilterConnectionBar
      PerlModule MyApache::NiceResponse
   
      PerlInputFilterHandler  MyApache::FilterConnectionBar::input
      PerlOutputFilterHandler MyApache::FilterConnectionBar::output
      <Location />
          SetHandler modperl
          PerlResponseHandler MyApache::NiceResponse
      </Location>
   
  </VirtualHost>

This accomplishes the configuration of the connection input and output filters.

[META:

Inside a connection filter the current connection object can be retrieved with:

  my $c = $filter->c;

Inside a request filter the current request object can be retrieved with:

  my $r = $filter->r;

This belongs to the Apache::Filter manpage and should be moved there when this page is created.

]

mod_perl provides two interfaces to filtering: a direct bucket brigades manipulation interface and a simpler, stream-oriented interface (XXX: as of this writing the latter is available only for the output filtering). The examples in the following sections will help you to understand the difference between the two interfaces.

PerlInputFilterHandler

The PerlInputFilterHandler handler registers a filter for input filtering.

This handler is of type VOID.

The handler's configuration scope is DIR.

The following sections include several examples of the PerlInputFilterHandler handler.

PerlOutputFilterHandler

The PerlOutputFilterHandler handler registers and configures output filters.

This handler is of type VOID.

The handler's configuration scope is DIR.

The following sections include several examples of the PerlOutputFilterHandler handler.

All-in-One Filter

Before we delve into the details of how to write filters that do something with the data, let's first write a simple filter that does nothing but snoop on the data that goes through it. We are going to develop the MyApache::FilterSnoop handler, which can snoop on request and connection filters, in input and output modes.

But first let's develop a simple response handler that simply dumps the request's args and content as strings:

  file:MyApache/Dump.pm
  ---------------------
  package MyApache::Dump;
  
  use strict;
  use warnings;
  
  use Apache::RequestRec ();
  use Apache::RequestIO ();
  
  use Apache::Const -compile => qw(OK M_POST);
  
  sub handler {
      my $r = shift;
      $r->content_type('text/plain');
  
      $r->print("args:\n", $r->args, "\n");
  
      if ($r->method_number == Apache::M_POST) {
          my $data = content($r);
          $r->print("content:\n$data\n");
      }
  
      return Apache::OK;
  }
  
  sub content {
      my $r = shift;
  
      $r->setup_client_block;
  
      return '' unless $r->should_client_block;
  
      my $len = $r->headers_in->get('content-length');
      my $buf;
      $r->get_client_block($buf, $len);
  
      return $buf;
  }
  
  1;

which is configured as:

  PerlModule MyApache::Dump
  <Location /dump>
      SetHandler modperl
      PerlResponseHandler MyApache::Dump
  </Location>

If we issue the following request:

  % echo "mod_perl rules" | POST 'http://localhost:8002/dump?foo=1&bar=2'

the response will be:

  args:
  foo=1&bar=2
  content:
  mod_perl rules

As you can see it simply dumped the query string and the posted data.

Now let's write the snooping filter:

  file:MyApache/FilterSnoop.pm
  ----------------------------
  package MyApache::FilterSnoop;
  
  use strict;
  use warnings;
  
  use base qw(Apache::Filter);
  use Apache::FilterRec ();
  use APR::Brigade ();
  
  use Apache::Const -compile => qw(OK DECLINED);
  use APR::Const -compile => ':common';
  
  sub connection : FilterConnectionHandler { snoop("connection", @_) }
  sub request    : FilterRequestHandler    { snoop("request",    @_) }
  
  sub snoop {
      my $type = shift;
      my($filter, $bb, $mode, $block, $readbytes) = @_; # filter args
  
      # $mode, $block, $readbytes are passed only for input filters
      my $stream = defined $mode ? "input" : "output";
  
      # read the data and pass-through the bucket brigades unchanged
      my $ra_data = '';
      if (defined $mode) {
          # input filter
          my $rv = $filter->next->get_brigade($bb, $mode, $block, $readbytes);
          return $rv unless $rv == APR::SUCCESS;
          $ra_data = bb_sniff($bb);
      }
      else {
          # output filter
          $ra_data = bb_sniff($bb);
          my $rv = $filter->next->pass_brigade($bb);
          return $rv unless $rv == APR::SUCCESS;
      }
  
      # send the sniffed info to STDERR so as not to interfere with
      # normal output
      my $direction = $stream eq 'output' ? ">>>" : "<<<";
      print STDERR "\n$direction $type $stream filter\n";
      my $c = 1;
      while (my($btype, $data) = splice @$ra_data, 0, 2) {
          print STDERR "    o bucket $c: $btype\n";
          print STDERR "[$data]\n";
          $c++;
      }
  
      return Apache::OK;
  }
  
  sub bb_sniff {
      my $bb = shift;
      my @data;
      for (my $b = $bb->first; $b; $b = $bb->next($b)) {
          $b->read(my $bdata);
          $bdata = '' unless defined $bdata;
          push @data, $b->type->name, $bdata;
      }
      return \@data;
  }
  
  1;

This package provides two filter handlers, one for connection and another for request filtering:

  sub connection : FilterConnectionHandler { snoop("connection", @_) }
  sub request    : FilterRequestHandler    { snoop("request",    @_) }

Both handlers forward their arguments to the snoop() function that does the real job. We needed to add these two subroutines in order to assign the two different attributes. In addition, the functions pass the filter type to snoop() as the first argument, which gets shifted off @_; the rest of @_ holds the arguments that were originally passed to the filter handler.

It's easy to know whether a filter handler is running in the input or the output mode. The arguments $filter and $bb are always passed, whereas the arguments $mode, $block, and $readbytes are passed only to input filter handlers.

If we are in the input mode, we retrieve the bucket brigade and immediately link it to $bb which makes the brigade available to the next filter. When this filter handler returns, the next filter on the stack will get the brigade. If we forget to perform this linking our filter will become a black hole in which data simply disappears. Next we call bb_sniff() which returns the type and the content of the buckets in the brigade.

If we are in the output mode, $bb already points to the current bucket brigade. Therefore we can read the contents of the brigade right away. After that we pass the brigade to the next filter.

Finally we dump to STDERR the information about the type of the current mode, and the content of the bucket brigade.

Let's snoop on connection and request filter levels in both directions by applying the following configuration:

  Listen 8008
  <VirtualHost _default_:8008>
      PerlModule MyApache::FilterSnoop
      PerlModule MyApache::Dump
  
      # Connection filters
      PerlInputFilterHandler  MyApache::FilterSnoop::connection
      PerlOutputFilterHandler MyApache::FilterSnoop::connection
  
      <Location /dump>
          SetHandler modperl
          PerlResponseHandler MyApache::Dump
          # Request filters
          PerlInputFilterHandler  MyApache::FilterSnoop::request
          PerlOutputFilterHandler MyApache::FilterSnoop::request
      </Location>
  
  </VirtualHost>

Notice that we use a virtual host because we want to install connection filters.

If we issue the following request:

  % echo "mod_perl rules" | POST 'http://localhost:8008/dump?foo=1&bar=2'

We get the same response, because our snooping filter didn't change anything, though a lot of output was printed to error_log. We present it all here, since it helps a lot to understand how filters work.

First we can see the connection input filter at work, as it processes the HTTP headers. We can see that for this request each header is put into a separate brigade with a single bucket. The data is conveniently enclosed by [] so you can see the newline characters as well.

  <<< connection input filter
      o bucket 1: HEAP
  [POST /dump?foo=1&bar=2 HTTP/1.1
  ]
  
  <<< connection input filter
      o bucket 1: HEAP
  [TE: deflate,gzip;q=0.3
  ]
  
  <<< connection input filter
      o bucket 1: HEAP
  [Connection: TE, close
  ]
  
  <<< connection input filter
      o bucket 1: HEAP
  [Host: localhost:8008
  ]
  
  <<< connection input filter
      o bucket 1: HEAP
  [User-Agent: lwp-request/2.01
  ]
  
  <<< connection input filter
      o bucket 1: HEAP
  [Content-Length: 14
  ]
  
  <<< connection input filter
      o bucket 1: HEAP
  [Content-Type: application/x-www-form-urlencoded
  ]
  
  <<< connection input filter
      o bucket 1: HEAP
  [
  ]

Here the HTTP header has been terminated by a double newline. So far all the buckets were of the HEAP type, meaning that they were allocated from the heap memory. Notice that the request input filters will never see the bucket brigade with the HTTP header; it has been consumed by Apache's last core connection handler.

The following two entries are generated when MyApache::Dump::handler reads the POSTed content:

  <<< connection input filter
      o bucket 1: HEAP
  [mod_perl rules]
  
  <<< request input filter
      o bucket 1: HEAP
  [mod_perl rules]
      o bucket 2: EOS
  []

As we saw earlier in the diagram, the connection input filter is run before the request input filter. Since our connection input filter was passing the data through unmodified and no other connection input filter was configured, the request input filter sees the same data. The last bucket in the brigade received by the request input filter is of type EOS, meaning that all the input data from the current request has been received.

Next we can see that MyApache::Dump::handler has generated its response. However only the request output filter is filtering it at this point:

  >>> request output filter
      o bucket 1: TRANSIENT
  [args:
  foo=1&bar=2
  content:
  mod_perl rules
  ]

This happens because Apache hasn't yet sent the response HTTP headers to the client. Apache postpones sending the headers so it can calculate and set the Content-Length header. This time the brigade consists of a single bucket of type TRANSIENT, which is allocated from stack memory and which will eventually be converted to the HEAP type before the body of the response is sent to the client.

When the content handler returns, Apache sends the HTTP headers through the connection output filters (notice that the request output filters don't see them):

  >>> connection output filter
      o bucket 1: HEAP
  [HTTP/1.1 200 OK
  Date: Wed, 14 Aug 2002 07:31:53 GMT
  Server: Apache/2.0.41-dev (Unix) mod_perl/1.99_05-dev 
  Perl/v5.8.0 mod_ssl/2.0.41-dev OpenSSL/0.9.6d DAV/2
  Content-Length: 42
  Connection: close
  Content-Type: text/plain; charset=ISO-8859-1
  
  ]

Now the response body in the bucket of type HEAP is passed through the connection output filter, followed by the EOS bucket to mark the end of the request:

  >>> connection output filter
      o bucket 1: HEAP
  [args:
  foo=1&bar=2
  content:
  mod_perl rules
  ]
      o bucket 2: EOS
  []
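The Content-Length: 42 header seen in the previous brigade can be checked against the body shown here. The following is a minimal sketch in plain Perl (no mod_perl required); the $body variable is just a copy of the bucket contents from the dump above:

```perl
use strict;
use warnings;

# The response body exactly as it appears between [ and ] in the dump.
my $body = "args:\nfoo=1&bar=2\ncontent:\nmod_perl rules\n";

# Four lines, each terminated by a new line: 6 + 12 + 9 + 15 bytes.
print length($body), "\n";
```

The printed length agrees with the header that Apache calculated before sending the response.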

Finally the output is flushed, to make sure that any buffered output is sent to the client:

  >>> connection output filter
      o bucket 1: FLUSH
  []

This module helps to understand that each filter handler can be called many times during each request and connection. It's called once for each bucket brigade.

It's also important to notice that the request input filter is called only if there is some POSTed data to read. If you run the same request without POSTing any data, or simply run a GET request, the request input filter won't be called.

Connection Input Filter

Let's say that we want to test how our handlers behave when they are requested as HEAD requests, rather than GET. We can alter the request headers at the incoming connection level transparently to all handlers. So here is the input filter handler that does that by directly manipulating the bucket brigades:

  file:MyApache/InputFilterGET2HEAD.pm
  -----------------------------------
  package MyApache::InputFilterGET2HEAD;
  
  use strict;
  use warnings;
  
  use base qw(Apache::Filter);
  
  use Apache::RequestRec ();
  use Apache::RequestIO ();
  use APR::Brigade ();
  use APR::Bucket ();
  
  use Apache::Const -compile => 'OK';
  use APR::Const -compile => ':common';
  
  sub handler : FilterConnectionHandler {
      my($filter, $bb, $mode, $block, $readbytes) = @_;
  
      my $c = $filter->c;
      my $ctx_bb = APR::Brigade->new($c->pool, $c->bucket_alloc);
      my $rv = $filter->next->get_brigade($ctx_bb, $mode, $block, $readbytes);
      return $rv unless $rv == APR::SUCCESS;
  
      while (!$ctx_bb->empty) {
          my $bucket = $ctx_bb->first;
  
          $bucket->remove;
  
          if ($bucket->is_eos) {
              $bb->insert_tail($bucket);
              last;
          }
  
          my $data;
          my $status = $bucket->read($data);
          return $status unless $status == APR::SUCCESS;
  
          if ($data and $data =~ s|^GET|HEAD|) {
              $bucket = APR::Bucket->new($data);
          }
  
          $bb->insert_tail($bucket);
      }
  
      Apache::OK;
  }
  
  1;

The filter handler is called for each bucket brigade, which in turn includes buckets with data. The gist of any filter handler is to retrieve the bucket brigade sent from the previous filter, prepare a new empty brigade, and move buckets from the former brigade to the latter optionally modifying the buckets on the way, which may include removing or adding new buckets. Of course if the filter doesn't want to modify any of the buckets it may decide to pass through the original brigade without doing any work.

In our example the handler first removes the bucket at the head of the brigade and looks at its type. If it is an end of stream bucket, the handler links it to the tail of the bucket brigade that will go to the next filter and doesn't attempt to read any more buckets. Otherwise the handler reads the data from that bucket. If the data is of interest to us, it modifies the data, creates a new bucket from the modified data and links it to the tail of the outgoing brigade, discarding the original bucket. In our case the interesting data is anything that matches the regex /^GET/. If the data is not interesting to the handler, it simply links the unmodified bucket to the outgoing brigade.

The handler looks for data like:

  GET /perl/test.pl HTTP/1.1

and turns it into:

  HEAD /perl/test.pl HTTP/1.1
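To see which bucket contents the substitution touches, here is a small sketch in plain Perl that applies the same s|^GET|HEAD| expression to a few sample lines. The get2head() helper and the sample strings are hypothetical and exist only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the filter's substitution to one chunk of data.
sub get2head {
    my ($data) = @_;
    $data =~ s|^GET|HEAD|;    # anchored: only a leading GET is rewritten
    return $data;
}

print get2head("GET /perl/test.pl HTTP/1.1\r\n");    # rewritten to HEAD
print get2head("POST /perl/test.pl HTTP/1.1\r\n");   # left alone
print get2head("X-Note: GET\r\n");                   # left alone, GET is not at the start
```

Because each header line arrives in its own bucket, the anchor is enough to restrict the change to the request line. Note that the pattern doesn't require a space after GET, so data starting with any word that begins with GET would be rewritten as well.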

For example, consider the following response handler:

  file:MyApache/RequestType.pm
  ---------------------------
  package MyApache::RequestType;
  
  use strict;
  use warnings;
  
  # load the methods used below: content_type(), method() and print()
  use Apache::RequestRec ();
  use Apache::RequestIO ();
  
  use Apache::Const -compile => 'OK';
  
  sub handler {
      my $r = shift;
      $r->content_type('text/plain');
      $r->print("the request type was " . $r->method);
      Apache::OK;
  }
  1;

which returns to the client the type of the request it has issued. In the case of a HEAD request Apache will discard the response body, but it will still set the correct Content-Length header, which will be 24 for a GET request and 25 for HEAD. Therefore if this response handler is configured as:

  Listen 8005
  <VirtualHost _default_:8005>
      <Location />
          SetHandler modperl
          PerlResponseHandler +MyApache::RequestType
      </Location>
  </VirtualHost>

and a GET request is issued to /:

  panic% perl -MLWP::UserAgent -le \
  '$r = LWP::UserAgent->new()->get("http://localhost:8005/"); \
  print $r->headers->content_length . ": ".  $r->content'
  24: the request type was GET

where the response's body is:

  the request type was GET

And the Content-Length header is set to 24.

However if we enable the MyApache::InputFilterGET2HEAD input connection filter:

  Listen 8005
  <VirtualHost _default_:8005>
      PerlInputFilterHandler +MyApache::InputFilterGET2HEAD
  
      <Location />
          SetHandler modperl
          PerlResponseHandler +MyApache::RequestType
      </Location>
  </VirtualHost>

and issue the same GET request, we get only:

  25: 

which means that the body was discarded by Apache, because our filter turned the GET request into a HEAD request. If Apache hadn't discarded the body on HEAD, the response would be:

  the request type was HEAD

That's why the content length is reported as 25 and not 24 as in the real GET request.
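The two Content-Length values can be reproduced without a server. This is a minimal sketch in plain Perl; the body_for() helper is hypothetical and simply rebuilds the string that MyApache::RequestType prints for a given method:

```perl
use strict;
use warnings;

# Hypothetical helper: the body the response handler would generate.
sub body_for {
    my ($method) = @_;
    return "the request type was " . $method;
}

for my $method (qw(GET HEAD)) {
    my $body = body_for($method);
    print length($body), ": $body\n";
}
```

The one byte difference between the two bodies is what reveals, in the Content-Length header alone, that the handler saw HEAD rather than GET.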

Request Input Filter

Bucket Brigades and Stream-Oriented Request Output Filters

As mentioned earlier, output filters can be written using either bucket brigade manipulation or the simplified stream-oriented interface.

First let's develop a response handler that sends two lines of output: the digits 1234567890 and the English alphabet:

  file:MyApache/SendAlphaNum.pm
  -------------------------------
  package MyApache::SendAlphaNum;
  
  use strict;
  use warnings;
  
  use Apache::RequestRec ();
  use Apache::RequestIO ();
  
  use Apache::Const -compile => qw(OK);
  
  sub handler {
      my $r = shift;
  
      $r->content_type('text/plain');
  
      $r->print(1..9, "0\n");
      $r->print('a'..'z', "\n");
  
      Apache::OK;
  }
  1;

The purpose of our request output filter is to reverse every line of the response, preserving the new line characters in their places.

Stream-oriented Output Filter

The first filter that we are going to implement uses the stream-oriented interface:

  file:MyApache/FilterReverse1.pm
  ----------------------------
  package MyApache::FilterReverse1;
  
  use strict;
  use warnings;
  
  use Apache::Filter ();
  
  use Apache::Const -compile => qw(OK);
  
  use constant BUFF_LEN => 1024;
  
  sub handler {
      my $filter = shift;
  
      while ($filter->read(my $buffer, BUFF_LEN)) {
          for (split "\n", $buffer) {
              $filter->print(scalar reverse $_);
              $filter->print("\n");
          }
      }
  
      Apache::OK;
  }
  1;

Next, we add the following configuration to httpd.conf:

  PerlModule MyApache::FilterReverse1
  PerlModule MyApache::SendAlphaNum
  <Location /reverse1>
      SetHandler modperl
      PerlResponseHandler     MyApache::SendAlphaNum
      PerlOutputFilterHandler MyApache::FilterReverse1
  </Location>

Now when a request to /reverse1 is made, the response handler MyApache::SendAlphaNum::handler() sends:

  1234567890
  abcdefghijklmnopqrstuvwxyz

as a response and the output filter handler MyApache::FilterReverse1::handler reverses the lines, so the client gets:

  0987654321
  zyxwvutsrqponmlkjihgfedcba

The Apache::Filter module loads the read() and print() methods which encapsulate the stream-oriented filtering interface.

The reversing filter is quite simple: in the loop it reads the data in chunks of up to the buffer length (1024 in our example), splits each chunk into lines, and prints each line reversed, followed by a new line character. Behind the scenes $filter->read() retrieves the incoming brigade and gets the data from it, whereas $filter->print() appends to the new brigade, which is then sent to the next filter in the stack. read() breaks the while loop when the brigade is emptied or the end of stream is received.

In order not to distract the reader from the purpose of the example, the code above is oversimplified: it won't correctly handle input lines longer than 1024 characters, nor input that uses a different line termination pattern. So here is an example of a more complete handler, which does take care of these issues:

  sub handler {
      my $filter = shift;
  
      my $left_over = '';
      while ($filter->read(my $buffer, BUFF_LEN)) {
          $buffer = $left_over . $buffer;
          $left_over = '';
          while ($buffer =~ /([^\r\n]*)([\r\n]*)/g) {
              $left_over = $1, last unless $2;
              $filter->print(scalar(reverse $1), $2);
          }
      }
      $filter->print(scalar reverse $left_over) if length $left_over;
  
      Apache::OK;
  }

In this handler, lines longer than the buffer's length are buffered up in $left_over and processed only when the whole line has been read in. If there is no more input, the buffered text is flushed before the handler returns.
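The $left_over logic can be exercised outside of Apache. The following is a minimal sketch in plain Perl, under the assumption that chunked reads from an in-memory string behave like successive $filter->read() calls. The reverse_lines() function and its arguments are hypothetical; the output is collected in a string instead of being printed to the next filter:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filter: substr() plays the role of
# $filter->read() and $out collects what $filter->print() would send.
sub reverse_lines {
    my ($input, $buff_len) = @_;

    my $out       = '';
    my $left_over = '';
    my $pos       = 0;

    while ($pos < length $input) {
        my $buffer = $left_over . substr($input, $pos, $buff_len);
        $pos += $buff_len;
        $left_over = '';
        while ($buffer =~ /([^\r\n]*)([\r\n]*)/g) {
            # no line terminator yet: keep the text for the next chunk
            $left_over = $1, last unless $2;
            $out .= scalar(reverse $1) . $2;
        }
    }
    # flush whatever is left when the input ends without a terminator
    $out .= scalar reverse $left_over if length $left_over;

    return $out;
}

# A deliberately tiny buffer forces both lines to span several chunks.
print reverse_lines("1234567890\nabcdefghijklmnopqrstuvwxyz\n", 8);
```

With a buffer of 8 bytes neither line fits into a single chunk, yet each one is reversed as a whole, which is the behavior the simpler handler cannot provide.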

Bucket Brigades Output Filter

The second filter that we are going to implement uses the bucket brigades interface to accomplish exactly the same task as the first filter.

  package MyApache::FilterReverse2;
  
  use strict;
  use warnings;
  
  use Apache::Filter;
  
  use APR::Brigade ();
  use APR::Bucket ();
  
  use Apache::Const -compile => 'OK';
  use APR::Const -compile => ':common';
  
  sub handler  {
      my($filter, $bb) = @_;
  
      my $c = $filter->c;
      my $new_bb = APR::Brigade->new($c->pool, $c->bucket_alloc);
  
      while (!$bb->empty) {
          my $bucket = $bb->first;
  
          $bucket->remove;
  
          if ($bucket->is_eos) {
              $new_bb->insert_tail($bucket);
              last;
          }
  
          my $data;
          my $status = $bucket->read($data);
          return $status unless $status == APR::SUCCESS;
  
          if ($data) {
              $data = join "",
                  map {scalar(reverse $_), "\n"} split "\n", $data;
              $bucket = APR::Bucket->new($data);
          }
  
          $new_bb->insert_tail($bucket);
      }
  
      my $rv = $filter->next->pass_brigade($new_bb);
      return $rv unless $rv == APR::SUCCESS;
  
      Apache::OK;
  }
  1;

and the corresponding configuration:

  PerlModule MyApache::FilterReverse2
  PerlModule MyApache::SendAlphaNum
  <Location /reverse2>
      SetHandler modperl
      PerlResponseHandler     MyApache::SendAlphaNum
      PerlOutputFilterHandler MyApache::FilterReverse2
  </Location>

Now when a request to /reverse2 is made, the client gets:

  0987654321
  zyxwvutsrqponmlkjihgfedcba

as expected.

The bucket brigades output filter version is just a bit more complicated than the stream-oriented one. The handler receives the incoming bucket brigade $bb as its second argument. Since the handler must pass a brigade to the next filter in the stack when it completes, we create a new bucket brigade, put the modified buckets into it, and eventually pass it to the next filter.

The core of the handler removes buckets from the head of the bucket brigade $bb while there are any left, reads the data from each bucket, reverses it and puts it into a newly created bucket, which is inserted at the end of the new bucket brigade. If we see a bucket which designates the end of stream, we insert that bucket at the tail of the new bucket brigade and break the loop. Finally we pass the created brigade with the modified data to the next filter and return.
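The same loop can be modelled with ordinary Perl data structures, which may help when reasoning about it without a running server. This is only a sketch: a brigade is assumed to be an array reference and a bucket a hash with type and data keys, none of which is part of the real APR::Brigade or APR::Bucket API. The reverse_brigade() function is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical model: brigade = array ref, bucket = { type, data }.
sub reverse_brigade {
    my ($bb) = @_;
    my @new_bb;

    while (@$bb) {
        my $bucket = shift @$bb;            # remove the first bucket

        if ($bucket->{type} eq 'EOS') {     # end of stream: pass on and stop
            push @new_bb, $bucket;
            last;
        }

        my $data = $bucket->{data};
        if (length $data) {
            # same transformation as in MyApache::FilterReverse2
            $data = join "", map { scalar(reverse $_), "\n" } split "\n", $data;
            $bucket = { type => 'HEAP', data => $data };
        }

        push @new_bb, $bucket;
    }

    return \@new_bb;    # what would be handed to pass_brigade()
}

my $in = [
    { type => 'TRANSIENT', data => "1234567890\n" },
    { type => 'TRANSIENT', data => "abcdefghijklmnopqrstuvwxyz\n" },
    { type => 'EOS',       data => '' },
];
my $out = reverse_brigade($in);
print $_->{data} for @$out;
```

As in the real handler, the incoming brigade is emptied bucket by bucket, and the end of stream bucket is moved across unmodified.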

Filter Tips

Various tips to use in filters.

Altering the Content-Type

Let's say that you want to modify the Content-Type header in the request output filter:

  sub handler : FilterRequestHandler {
      my $filter = shift;
      ...
      $filter->r->content_type("text/html; charset=$charset");
      ...

Request filters have access to the request object, so we simply modify it.

Handler (Hook) Types

For each phase there can be more than one handler assigned (also known as hooks, because the C functions are called ap_hook_<phase_name>). The behavior of a phase varies when more than one handler is registered to run for it. The following table specifies that behavior for each handler:

  Directive                    Type
  --------------------------------------
  PerlOpenLogsHandler          RUN_ALL
  PerlPostConfigHandler        RUN_ALL
  PerlChildInitHandler         VOID

  PerlPreConnectionHandler     RUN_ALL
  PerlProcessConnectionHandler RUN_FIRST

  PerlPostReadRequestHandler   RUN_ALL
  PerlTransHandler             RUN_FIRST
  PerlInitHandler              RUN_ALL
  PerlHeaderParserHandler      RUN_ALL
  PerlAccessHandler            RUN_ALL
  PerlAuthenHandler            RUN_FIRST
  PerlAuthzHandler             RUN_FIRST
  PerlTypeHandler              RUN_FIRST
  PerlFixupHandler             RUN_ALL
  PerlResponseHandler          RUN_FIRST
  PerlLogHandler               RUN_ALL
  PerlCleanupHandler           XXX

  PerlInputFilterHandler       VOID
  PerlOutputFilterHandler      VOID

And here is the description of the possible types (for the C API declarations see include/ap_config.h, which includes other types that aren't exposed by mod_perl):

  • VOID

    Handlers of the type VOID will all be executed in the order they have been registered, disregarding their return values. In mod_perl, though, they are expected to return Apache::OK.

  • RUN_FIRST

    Handlers of the type RUN_FIRST will be executed in the order they have been registered until the first handler that returns something other than Apache::DECLINED. If the return value is Apache::DECLINED, the next handler in the chain will be run. If the return value is Apache::OK the next phase will start. In all other cases the execution will be aborted.

  • RUN_ALL

    Handlers of the type RUN_ALL will be executed in the order they have been registered until the first handler that returns something other than Apache::OK or Apache::DECLINED.
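The three types can be illustrated with a small dispatcher written in plain Perl. This is a sketch of the rules described above, not mod_perl's implementation: run_void(), run_first(), run_all(), trace() and the sample handler chain are all hypothetical names, and the OK and DECLINED constants are local stand-ins for Apache::OK and Apache::DECLINED:

```perl
use strict;
use warnings;

# Local stand-ins for Apache::OK and Apache::DECLINED.
use constant { OK => 0, DECLINED => -1 };

# VOID: run every handler, ignoring the return values.
sub run_void {
    my @handlers = @_;
    $_->() for @handlers;
    return OK;
}

# RUN_FIRST: stop at the first handler returning anything but DECLINED.
sub run_first {
    my @handlers = @_;
    for my $h (@handlers) {
        my $rc = $h->();
        return $rc unless $rc == DECLINED;
    }
    return DECLINED;
}

# RUN_ALL: stop at the first handler returning neither OK nor DECLINED.
sub run_all {
    my @handlers = @_;
    for my $h (@handlers) {
        my $rc = $h->();
        return $rc unless $rc == OK || $rc == DECLINED;
    }
    return OK;
}

# A sample chain; each handler records that it ran.
my @log;
my @chain = (
    sub { push @log, 'first';  DECLINED },
    sub { push @log, 'second'; OK },
    sub { push @log, 'third';  500 },     # an error status
    sub { push @log, 'fourth'; OK },
);

# Run the chain with the given dispatcher and report "status: handlers run".
sub trace {
    my ($runner) = @_;
    @log = ();
    my $rc = $runner->(@chain);
    return "$rc: @log";
}

print trace(\&run_void),  "\n";
print trace(\&run_first), "\n";
print trace(\&run_all),   "\n";
```

With this chain, the VOID dispatcher runs all four handlers, RUN_FIRST stops after the second handler returns OK, and RUN_ALL stops at the third handler, whose error status becomes the result of the phase.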

Also see mod_perl Directives Argument Types and Allowed Location

Hook Ordering (Position)

The following constants specify how the new hooks (handlers) are inserted into the list of hooks when there is at least one hook already registered for the same phase.

META: need to verify the following:

  • APR::HOOK_REALLY_FIRST

    run this hook first, before ANYTHING.

  • APR::HOOK_FIRST

    run this hook first.

  • APR::HOOK_MIDDLE

    run this hook somewhere.

  • APR::HOOK_LAST

    run this hook after every other hook which is defined.

  • APR::HOOK_REALLY_LAST

    run this hook last, after EVERYTHING.
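As a rough illustration of how these positions could order a list of hooks, here is a sketch in plain Perl. The numeric values are the ones defined for the corresponding APR_HOOK_* macros in APR's apr_hooks.h; everything else (register_hook(), ordered_hooks(), the hook names, and breaking ties by registration order) is an assumption of this model rather than a description of what Apache does:

```perl
use strict;
use warnings;

# Position values as defined in apr_hooks.h.
use constant {
    HOOK_REALLY_FIRST => -10,
    HOOK_FIRST        => 0,
    HOOK_MIDDLE       => 10,
    HOOK_LAST         => 20,
    HOOK_REALLY_LAST  => 30,
};

# Hypothetical registry: name, position and registration sequence number.
my @hooks;

sub register_hook {
    my ($name, $position) = @_;
    push @hooks, { name => $name, pos => $position, seq => scalar @hooks };
}

# Lower positions run earlier; equal positions keep registration order
# (an assumption of this model).
sub ordered_hooks {
    return map  { $_->{name} }
           sort { $a->{pos} <=> $b->{pos} or $a->{seq} <=> $b->{seq} } @hooks;
}

register_hook(logger   => HOOK_LAST);
register_hook(auth     => HOOK_FIRST);
register_hook(rewrite  => HOOK_MIDDLE);
register_hook(sanity   => HOOK_REALLY_FIRST);
register_hook(compress => HOOK_MIDDLE);

print join(" ", ordered_hooks()), "\n";
```

The hooks come out ordered by position regardless of the order in which they were registered.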

META: more information in mod_example.c talking about position/predecessors, etc.

Maintainers

The maintainer is the person you should contact with updates, corrections and patches.

  • Stas Bekman <stas (at) stason.org>

Authors

Only the major authors are listed above. For contributors see the Changes file.