
NAME

HTML::Mason::Admin - Mason Administrator's Guide

DESCRIPTION

This guide is intended for the sysadmin/webmaster in charge of installing, configuring, or tuning a Mason system.

SITE CONFIGURATION METHODS

There are three ways to configure a Mason site:

  • Minimal configuration, relying on default Mason behavior. Simplest and least flexible.

  • Configuration via httpd.conf directives. Medium complexity and flexibility.

  • Configuration via a handler script (handler.pl). Most complex and most flexible.

The next three sections discuss these methods in detail. We recommend that you start with the simplest method and work your way forward as the need for flexibility arises.

It is important to note that you cannot mix httpd.conf configuration directives with a handler script. Depending on how you declare your PerlHandler, one or the other will always take precedence and the other will be ignored.

Mason is very flexible, and you can replace parts of it by creating your own classes. This documentation assumes that you are simply using the classes provided in the Mason distribution. Customizing and subclassing are covered in the Subclassing document.

MINIMAL CONFIGURATION

The absolutely most minimal configuration looks like this:

    PerlModule HTML::Mason::ApacheHandler

    <FilesMatch "\.html$">
        SetHandler perl-script
        PerlHandler HTML::Mason::ApacheHandler
    </FilesMatch>

This configuration tells Apache to serve all .html files under your document root through Mason. The PerlModule line tells mod_perl to load Mason once at startup time, saving time and memory.

CONFIGURATION VIA httpd.conf DIRECTIVES

Mason's configuration parameters are set via mod_perl's PerlSetVar and PerlAddVar directives (the latter is only available in mod_perl version 1.24 and greater). Though these parameters are all strings in your httpd.conf file, Mason treats them as if they were several different types:

  • string

    The variable's value is simply taken literally and used. The string should be surrounded by quotes if the string contains whitespace, and these quotes will be automatically removed by Apache before Mason sees the variable.

  • boolean

    The variable's value is used as a boolean, and is subject to Perl's rules on truth/falseness. It is recommended that you use 0 (false) or 1 (true) for these arguments.

  • code

    The string is treated as a piece of code and eval'ed. This is used for parameters that expect subroutine references. For example, an anonymous subroutine might look like:

     PerlSetVar  MasonOutMode  "sub { ... }"

    A named subroutine call would look like this:

     PerlSetVar  MasonOutMode  "\&handle_output"

  • list

    To set a list parameter, use PerlAddVar for the values, like this:

     PerlAddVar  MasonPreloads  /foo/bar/baz.comp
     PerlAddVar  MasonPreloads  /foo/bar/quux.comp

    As noted above, PerlAddVar is only available in mod_perl 1.24 and up. This means that it is only possible to assign a single value (using PerlSetVar) to list parameters if you are using a mod_perl older than 1.24.

See HTML::Mason::Params for a full list of parameters.
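
As a rough illustration of how the string, boolean, and list types look side by side in httpd.conf, here is a sketch. The paths and component names are placeholders, and the directive names should be checked against HTML::Mason::Params for your Mason version.

    # string: quoting is required only when the value contains whitespace
    PerlSetVar  MasonDataDir      "/usr/local/mason"

    # boolean: prefer a plain 0 or 1
    PerlSetVar  MasonDeclineDirs  0

    # list: one PerlAddVar line per value (mod_perl 1.24 or later)
    PerlAddVar  MasonPreloads     /shared/header
    PerlAddVar  MasonPreloads     /shared/footer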

CONFIGURING VIA HANDLER SCRIPT

For maximum flexibility, you may choose to write a custom script to create your Mason objects and handle requests. In our documentation and examples we call this script handler.pl and place it in the Apache conf/ subdirectory, though you may name it and place it wherever you like.

The handler.pl file is responsible for creating the ApacheHandler object and supplying the many parameters that control how your components are parsed and executed. It also provides the opportunity to execute arbitrary code at three junctures: the server initialization, the beginning of a request, and the end of a request.

Here is a simple handler.pl, also provided in the eg/ directory:

   #!/usr/bin/perl
   #
   # A basic, functional Mason handler.pl.
   #
   package MyMason::MyApp;
   
   # Bring in Mason with Apache support.
   use HTML::Mason::ApacheHandler;
   use strict;
   
   # List of modules that you want to use within components.
   { package HTML::Mason::Commands;
     use Data::Dumper;
   }
   
   # Create ApacheHandler object at startup.
   my $ah = HTML::Mason::ApacheHandler->new();
   
   sub handler
   {
       my ($r) = @_;
   
       my $status = $ah->handle_request($r);
       return $status;
   }
   
   1;

Copy this file into your Apache conf/ subdirectory, and place the following into your httpd.conf:

    PerlRequire conf/handler.pl

    <FilesMatch "\.html$">
        SetHandler perl-script
        PerlHandler MyMason::MyApp    # notice - no ::ApacheHandler!
    </FilesMatch>

replacing MyMason::MyApp with a package name of your choosing.

At this point, your configuration should act identically to a minimal httpd configuration. You can now configure your server by:

  • Adding parameters to the ApacheHandler constructor. e.g.

        HTML::Mason::ApacheHandler->new( ... );

  • Adding use statements for modules that you want to use within components, as the sample handler.pl above does with Data::Dumper.

  • Adding code before the handler subroutine, to be executed once by the parent httpd process.

  • Adding code inside the handler subroutine, to be executed before or after each request.
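
The four extension points listed above might sit in a handler.pl roughly as follows. This is a sketch only: the comp_root and data_dir paths are placeholders, and the comments merely mark where your own code would go.

    package MyMason::MyApp;

    use strict;
    use HTML::Mason::ApacheHandler;

    # Modules made available to all components.
    { package HTML::Mason::Commands;
      use Data::Dumper;
    }

    # Code at this level runs once, in the parent httpd process.
    # Constructor parameters go here; these paths are placeholders.
    my $ah = HTML::Mason::ApacheHandler->new(
        comp_root => '/usr/local/www/htdocs',
        data_dir  => '/usr/local/mason',
    );

    sub handler
    {
        my ($r) = @_;

        # Code here runs before each request.
        my $status = $ah->handle_request($r);
        # Code here runs after each request.

        return $status;
    }

    1;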

SERVER CONFIGURATION

Component root

The component root marks the top of your component hierarchy. When running Mason with the CGIHandler or ApacheHandler modules, this defaults to your document root.

The component root defines how component paths are translated into real file paths. If your component root is /usr/local/httpd/docs, a component path of /products/index.html translates to the file /usr/local/httpd/docs/products/index.html.

One cannot call a component outside the component root. If Apache passes a file through Mason that is outside the component root (say, as the result of an Alias) you will get a 404 and a warning in the logs.

You may also specify multiple component roots in the spirit of Perl's @INC. Each root is assigned a key that identifies the root mnemonically to a component developer. For example, in httpd.conf:

    PerlAddVar  MasonCompRoot "private => /usr/home/joe/comps"
    PerlAddVar  MasonCompRoot "main => /usr/local/www/htdocs"

or in handler.pl:

    comp_root => [ [ private => '/usr/home/joe/comps' ],
                   [ main    => '/usr/local/www/htdocs' ] ]

This specifies two component roots, a main component tree and a private tree which overrides certain components. The order is respected à la @INC, so private is searched first and main second. (We chose the => notation because it looks cleaner, but note that this is a list of lists, not a hash.)

Keys must be unique in a case-insensitive comparison.
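
To make the search order concrete, the following sketch resolves a component path against an ordered list of roots. It is an illustration of the @INC-style lookup described above, not Mason's own resolver; find_component is a hypothetical helper.

    use strict;
    use warnings;

    # Illustration only: return the key and file path of the first root
    # that contains the component, searching the roots in order.
    sub find_component {
        my ($roots, $comp_path) = @_;
        foreach my $root (@$roots) {
            my ($key, $dir) = @$root;
            my $file = $dir . $comp_path;   # component paths begin with "/"
            return ($key, $file) if -f $file;
        }
        return;
    }

    my @roots = ( [ private => '/usr/home/joe/comps' ],
                  [ main    => '/usr/local/www/htdocs' ] );
    my ($key, $file) = find_component(\@roots, '/products/index.html');
    print defined $key ? "$key: $file\n" : "not found\n";

A component present in both trees is found under private, which is how the private tree overrides main.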

Data directory

The data directory is a writable directory that Mason uses for various features and optimizations. By default, it is a directory called "mason" under your Apache server root.

Mason will create the directory on startup, if necessary, and set its permissions according to the web server User/Group.

External modules

Components will often need access to external Perl modules. There are three basic ways to bring them in.

  1. The httpd PerlModule directive:

        PerlModule CGI
        PerlModule LWP

  2. In the <%once> section of the component(s) that use the module.

        <%once>
        use CGI ':standard';
        use LWP;
        </%once>

  3. In a handler.pl:

        { package HTML::Mason::Commands;
          use CGI ':standard';
          use LWP;
          ... }

Each method has its own trade-offs:

  • The first and third method ensure that the module will be loaded by the Apache parent process at startup time, saving time and memory. The second method, in contrast, will cause the modules to be loaded by each server child. On the other hand this could save memory if the component and module are rarely used. See the mod_perl guide's tuning section and Vivek Khera's mod_perl tuning guide for more details on this issue.

  • The second and third method use the modules from inside the package used by components (HTML::Mason::Commands), meaning that exported method names and other symbols will be usable from components. The first method, in contrast, will import symbols into the main package. The significance of this depends on whether the modules export symbols and whether you want to use them from components.

  • The first and second methods work with an Apache-only configuration, while the third method obviously requires a handler.pl. On the other hand, you can approximate the effect of a handler.pl using a preloaded, top-level autohandler.

Declining image requests

Mason should be prevented from serving images, tarballs, and other binary files as regular components. Performance will suffer, and such a file may inadvertently contain a Mason character sequence such as "<%".

There are several ways to restrict which file types are handled by Mason.

One way is to specify a filename pattern in the Apache configuration, e.g.:

    <FilesMatch "(\.html|\.txt|^[^\.]+)$">
     SetHandler perl-script
     PerlHandler HTML::Mason::ApacheHandler
    </FilesMatch>

This directs Mason to handle only files with .html or .txt extension, as well as those files with no extension.

Another way, if you are using a handler.pl script, is to include a line like the following at the top of your handler() subroutine:

    return -1 if $r->content_type && $r->content_type !~ m|^text/|i;

This line handles requests for text/* MIME types, such as text/html and text/plain, and declines others.
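
In context, the check might sit in handler() as sketched below. The sketch assumes mod_perl 1.x, where Apache::Constants exports DECLINED (whose value is -1), and assumes $ah is the ApacheHandler object created earlier in handler.pl.

    use Apache::Constants qw(DECLINED);

    sub handler
    {
        my ($r) = @_;

        # Let Apache serve anything whose type is known and is not text/*.
        return DECLINED
            if $r->content_type && $r->content_type !~ m|^text/|i;

        return $ah->handle_request($r);
    }

Using the named constant instead of a literal -1 changes nothing functionally; it only makes the intent easier to read.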

Securing top-level components

Users may exploit a server-side scripting environment by invoking scripts with malicious or unintended arguments. Mason administrators need to be particularly wary of this because of the tendency to break out "subroutines" into individually accessible file components.

For example, a Mason developer might create a helpful shared component for performing sql queries:

    $m->comp('sql_select', table=>'employee', where=>'id=315');

This is a perfectly reasonable component to create and call internally, but clearly presents a security risk if accessible via URL:

    http://www.foo.com/sql_select?table=credit_cards&where=*

Of course a web user would have to obtain the name of this component through guesswork or other means, but obscurity alone does not properly secure a system. Rather, you should choose a site-wide policy for distinguishing top-level components from private components, and make sure your developers stick to this policy. You can then prevent private components from being served.

One solution is to place all private components inside a directory, say /private, that lies under the component root but outside the document root.

Another solution is to decide on a naming convention, for example, that all private components begin with "_", or that all top-level components must end in ".html". Then turn all private requests away with a 404 NOT_FOUND (rather than, say, a 403 FORBIDDEN which would provide more information than necessary). Use either an Apache directive

    PerlModule Apache::Constants
    <FilesMatch "^_">
    SetHandler perl-script
    PerlInitHandler Apache::Constants::NOT_FOUND
    </FilesMatch>

or a handler.pl directive:

    return 404 if $r->filename =~ m{/_[^/]+$};

Even after you've safely protected internal components, top-level components that process arguments (such as form handlers) still present a risk. Users can invoke such a component with arbitrary argument values via a handcrafted query string. Always check incoming arguments for validity and never place argument values directly into SQL, shell commands, etc.
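
One way to follow this advice is to validate each argument against a strict pattern and pass it to the database through a placeholder. The sketch below assumes a DBI handle and an employee table with id and name columns; valid_employee_id and employee_name are hypothetical helpers.

    use strict;
    use warnings;

    # Hypothetical helper: accept only a plain positive integer id.
    sub valid_employee_id {
        my ($id) = @_;
        return (defined $id && $id =~ /\A[1-9][0-9]{0,8}\z/) ? 1 : 0;
    }

    # $dbh is assumed to be a connected DBI database handle.
    sub employee_name {
        my ($dbh, $id) = @_;
        die "invalid employee id\n" unless valid_employee_id($id);

        # The placeholder keeps the argument value out of the SQL text.
        my ($name) = $dbh->selectrow_array(
            'SELECT name FROM employee WHERE id = ?', undef, $id);
        return $name;
    }

With this check, an argument such as "1 OR 1=1" is rejected before it reaches the database.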

Allowing directory requests

By default Mason will decline requests for directories, leaving Apache to serve up a directory index or a FORBIDDEN as appropriate. Unfortunately this rule applies even if there is a dhandler in the directory: /foo/bar/dhandler does not get a chance to handle a request for /foo/bar/.

If you would like Mason to handle directory requests, do the following:

1. Set the decline_dirs parameter to 0.

2. If you are using a handler.pl and it contains a "return -1" line to decline non-text requests (as given in the previous section), add a clause allowing directory types:

    return -1 if $r->content_type && $r->content_type !~ m|^text/|i
                 && $r->content_type !~ m|directory$|i;

The dhandler that catches a directory request is responsible for setting a reasonable content type.
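
As a sketch of both halves, step 1 in an httpd.conf configuration could read as follows, assuming the usual Mason prefix form of the decline_dirs parameter name:

    PerlSetVar  MasonDeclineDirs  0

A hypothetical /foo/bar/dhandler could then set the content type before producing its output:

    <%init>
    # Answering a request for the directory itself, so pick a type.
    $r->content_type('text/html');
    </%init>
    <p>Directory index goes here.</p>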

DEVELOPMENT

Global variables

Global variables can make programs harder to read, maintain, and debug, and this is no less true for Mason components. Due to the persistent mod_perl environment, globals require extra initialization and cleanup care.

That said, there are times when it is very useful to make a value available to all Mason components: a DBI database handle, a hash of user session information, the server root for forming absolute URLs.

If you are using a handler.pl script you can initialize the global there, either outside the handler() subroutine (if you only need to set it once) or inside (if you need to set it every request). Because Mason by default parses components in strict mode, you'll need to invoke use vars to avoid a fatal globals warning.

    { package HTML::Mason::Commands;
      use vars qw($server_root);
    }

    ...

    $HTML::Mason::Commands::server_root = "http://www.mysite.com/";

Alternatively, you can initialize the global in the <%once> or <%init> section of a top-level autohandler:

    <%once>
    use vars qw($server_root);
    $server_root = "http://www.mysite.com/";
    </%once>

Sessions

Mason does not have a built-in session mechanism. However, with a page or so of code in your handler.pl, you can integrate Jeffrey Baker's Apache::Session into your application and make a tied global session variable available to all components.

The Mason Sessions How-To, at ..., is the best source of information about this surprisingly tricky subject.

Data caching

Data caching is implemented with DeWitt Clinton's Cache::Cache module. For full understanding of this section you should read the documentation for Cache::Cache as well as for relevant subclasses (e.g. Cache::FileCache).

Cache files

By default, Cache::FileCache is the subclass used for data caching, although this may be overridden by the developer. Cache::FileCache creates a separate subdirectory for every component that uses caching, and one file some number of levels underneath that subdirectory for each cached item. The root of the cache tree is data_dir/cache. The name of the cache subdirectory for a component is determined by the function HTML::Mason::Utils::data_cache_namespace.

Default constructor options

Ordinarily, when $m->cache is called, Mason passes to the cache constructor the namespace, username, and cache_root options, along with any other options given in the $m->cache method.

You may specify other default constructor options with the data_cache_defaults parameter. For example,

    data_cache_defaults =>
       { cache_class => 'SizeAwareFileCache',
         cache_depth => 2,
         default_expires_in => '1 hour' }

Any options passed to individual $m->cache calls override these defaults.
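
For orientation, a component that uses the cache might look roughly like this. The key name and compute_report() are hypothetical; get and set are the standard Cache::Cache methods, and the expiration given to set applies to that one item.

    <%init>
    # 'sales_report' and compute_report() are made up for this sketch.
    my $report = $m->cache->get('sales_report');
    unless (defined $report) {
        $report = compute_report();
        $m->cache->set('sales_report', $report, '10 minutes');
    }
    </%init>
    <% $report %>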

Disabling data caching

If for some reason you want to disable data caching entirely, use

    data_cache_defaults => {cache_class => 'NullCache'}

This subclass faithfully implements the cache API but never stores data.

PERFORMANCE

This section explains Mason's various performance enhancements and how to administer them.

Code cache

When Mason loads a component, it places it in a memory cache.

The maximum size of the cache is specified with the Interp's code_cache_max_size parameter; default is 10MB. When the cache fills up, Mason frees up space by discarding a number of components. The discard algorithm is least frequently used (LFU), with a periodic decay to gradually eliminate old frequency information. In a nutshell, the components called most often in recent history should remain in the cache. Very large components (over 20% of the maximum cache size) never get cached, on the theory that they would force out too many other components.
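
The idea of LFU with periodic decay can be sketched in a few lines. This is an illustration of the technique only, not Mason's bookkeeping; the halving factor and the helper names are assumptions.

    use strict;
    use warnings;

    # Illustration only: component path => decayed access count.
    my %hits;

    sub record_hit { my ($path) = @_; $hits{$path}++ }

    # Periodic decay: halve every count so that old popularity fades.
    sub decay { $_ *= 0.5 for values %hits }

    # The least frequently used entry is the next one to discard.
    sub discard_candidate {
        my ($victim) =
            sort { $hits{$a} <=> $hits{$b} || $a cmp $b } keys %hits;
        return $victim;
    }

    record_hit('/old') for 1 .. 4;   # popular long ago: count 4
    decay() for 1 .. 3;              # three decays: 4 becomes 0.5
    record_hit('/new');              # one recent hit: count 1
    print discard_candidate(), "\n"; # prints /old

Without the decay step, /old would keep its count of 4 and /new would be discarded first, even though /old has not been called recently.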

Note that the "size" of a component in memory cannot literally be measured. It is estimated by the length of the source text plus some overhead. Your process growth will not match the code cache size exactly.

You can prepopulate the cache with components that you know will be accessed often; see Preloading. Note that preloaded components possess no special status in the cache and can be discarded like any others.

Naturally, a cache entry is invalidated if the corresponding component source file changes.

To turn off code caching completely, set Interp's code_cache_max_size to 0.

Object files

The in-memory code cache is only useful on a per-process basis. Each process must build and maintain its own cache. Shared memory caches are conceivable in the future, but even those will not survive between web server restarts.

As a secondary, longer-term cache mechanism, Mason stores a compiled form of each component in an object file under data_dir/obj/component-path. Any server process can eval the object file and save time on parsing the component source file. The object file is recreated whenever the source file changes.

Besides improving performance, object files can be useful for debugging. If you feel the need to see what your source has been translated into, you can peek inside an object file to see exactly how Mason converted a given component to a Perl object. This is crucial for pre-1.10 Mason, in which error line numbers are based on the object file rather than the source file.

If you change any Compiler or Lexer parameters, you must remove object files previously created under that compiler or lexer for the changes to take effect.

If for some reason you don't want Mason to create object files, set the Interp's use_object_files parameter to 0.

Preloading

You can tell Mason to preload a set of components in the parent process, rather than loading them on demand, using the Interp's preloads parameter. Each child server will start with those components loaded in the memory cache. The trade-offs are:

  • Time: a small one-time startup cost, but children save time by not having to load the components.

  • Memory: a fatter initial server, but the memory for preloaded components is shared by all children. This is similar to the advantage of using modules only in the parent process.

Try to preload components that are used frequently and do not change often. (If a preloaded component changes, all the children will have to reload it from scratch.)

Static source mode

As described above, Mason checks the timestamp of a component source file every time that component is called. This can add up to a lot of file stats.

If you have a live site with infrequent and well-controlled updates, you may choose to use static_source mode. In this mode Mason will not check source timestamps when it uses an in-memory cache or object file. The disadvantage is that you must remove object files and restart the server whenever you change component source; however this process can be easily automated.
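
A deployment script could automate the cleanup step along these lines. This is a sketch under stated assumptions: the data directory path is a placeholder, clear_object_files is a hypothetical helper, and restarting the server afterwards is left to whatever mechanism your site already uses.

    use strict;
    use warnings;
    use File::Path qw(rmtree);

    # Remove every object file under data_dir/obj and report success.
    # The caller is expected to restart the server afterwards.
    sub clear_object_files {
        my ($data_dir) = @_;
        my $obj_dir = "$data_dir/obj";
        rmtree($obj_dir) if -d $obj_dir;
        return -d $obj_dir ? 0 : 1;
    }

    clear_object_files('/usr/local/mason')
        or die "could not remove object files\n";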

ERROR REPORTING

When an error occurs, Mason can respond by:

  • showing a detailed error message in the browser in HTML.

  • die'ing, which sends a 500 to the browser and lets the error message go to the error logs.

The first behavior is ideal for development, where you want immediate feedback on the error. The second behavior is usually desired for production so that users are not exposed to messy error messages. You choose the behavior by setting error_mode to "output" or "fatal" respectively.
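
In an httpd.conf configuration the two choices might be written as below, assuming the usual Mason prefix form of the error_mode parameter name. Only one of the two lines would appear on a given server.

    # development server: show the error in the browser
    PerlSetVar  MasonErrorMode  output

    # production server: log the error and return a server error
    PerlSetVar  MasonErrorMode  fatal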

CONFIGURING VIRTUAL SITES

These examples extend the single site configurations given so far.

Multiple sites, one component root

If you want to share some components between your sites, arrange your httpd.conf so that all DocumentRoots live under a single component space:

    # Web site #1
    <VirtualHost www.site1.com>
        DocumentRoot /usr/local/www/htdocs/site1
        <Location />
            SetHandler perl-script
            PerlHandler HTML::Mason::ApacheHandler
        </Location>
    </VirtualHost>

    # Web site #2
    <VirtualHost www.site2.com>
        DocumentRoot /usr/local/www/htdocs/site2
        <Location />
            SetHandler perl-script
            PerlHandler HTML::Mason::ApacheHandler
        </Location>
    </VirtualHost>

    # Mason configuration
    PerlSetVar MasonCompRoot "/usr/local/www/htdocs"
    PerlSetVar MasonDataDir "/usr/local/mason"
    PerlModule HTML::Mason::ApacheHandler

The directory structure for this scenario might look like:

    /usr/local/www/htdocs/  # component root
        +- shared/          # shared components
        +- site1/           # DocumentRoot for first site
        +- site2/           # DocumentRoot for second site

Incoming URLs for each site can only request components in their respective DocumentRoots, while components internally can call other components anywhere in the component space. The shared/ directory is a private directory for use by components, inaccessible from the Web.

Multiple sites, multiple component roots

Sometimes your sites need to have completely distinct component hierarchies, e.g. if you are providing Mason ISP services for multiple users. In this case the component root must change depending on the site requested. Since you can't change an interpreter's component root dynamically, you need to maintain separate ApacheHandler objects for each site in your handler.pl:

    my %ah;
    foreach my $site (qw(site1 site2 site3)) {
        $ah{$site} = new HTML::Mason::ApacheHandler
            (comp_root => "/usr/local/www/$site",
             data_dir => "/usr/local/mason/$site");
    }

    ...

    sub handler {
        my ($r) = @_;
        my $site = $r->dir_config('site');
        $ah{$site}->handle_request($r);
    }

We assume each virtual server configuration section has a

    PerlSetVar site <site_name>
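
For example, a virtual server section for the first site could look like the following sketch. The host name and handler package are placeholders, and the DocumentRoot simply mirrors the comp_root used in the handler.pl above.

    <VirtualHost www.site1.com>
        DocumentRoot /usr/local/www/site1
        PerlSetVar site site1
        <FilesMatch "\.html$">
            SetHandler perl-script
            PerlHandler MyMason::MyApp
        </FilesMatch>
    </VirtualHost>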

Above we pre-create all Mason objects in the parent. Another scheme is to create objects on demand in the child:

    my %ah;

    ...

    sub handler {
        my ($r) = @_;
        my $site = $r->dir_config('site');
        unless (exists $ah{$site}) {
            # get comp_root from PerlSetVar as well
            my $comp_root = $r->dir_config('comp_root');
            $ah{$site} = new HTML::Mason::ApacheHandler(comp_root=>$comp_root,...);
        }
        $ah{$site}->handle_request($r);
    }

The advantage of the second scheme is that you don't have to hardcode as much information in the handler.pl. The disadvantage is a slight memory and performance impact. On development servers this shouldn't matter; on production servers you may wish to profile the two schemes.

RUNNING OUTSIDE OF MOD_PERL

Although Mason is most commonly used in conjunction with mod_perl, the APIs are flexible enough to use in any environment. Below we describe the two most common alternative environments, CGI and standalone scripts.

Using Mason from a CGI script

The easiest way to use Mason via a CGI script is with the CGIHandler module.

Here is a skeleton CGI script that calls a component and sends the output to the browser.

    #!/usr/bin/perl
    use HTML::Mason::CGIHandler;

    my $h = new HTML::Mason::CGIHandler
     (
      data_dir  => '/home/jethro/code/mason_data',
     );

    $h->handle_request;

The relevant portions of the httpd.conf file look like:

    DocumentRoot /path/to/comp/root
    ScriptAlias /cgi-bin/ /path/to/cgi-bin/

    Action html-mason /cgi-bin/mason_handler.cgi
    <FilesMatch "\.html$">
     SetHandler html-mason
    </FilesMatch>

This simply causes Apache to call the mason_handler.cgi script every time a file under the component root is requested. This script uses the CGIHandler class to do most of the heavy lifting. See that class's documentation for more details.

Using Mason from a standalone script

Mason can be used as a pure text templating solution -- like Text::Template and its brethren, but with more power (and of course more complexity).

Here is a bare-bones script that calls a component file and sends the result to standard output:

    my $interp = HTML::Mason::Interp->new ();
    $interp->exec(<absolute-file-path>, <args>...);

Because no component root was specified, the root is set to '/' and any file on the system may be used as a component. If you have a well defined and contained component tree, you'll probably want to specify a component root.

Because no data directory was specified, object files will not be created and data caching will not work in the default manner. If performance is an issue, you will want to specify a data directory.

Here's a slightly fuller script that specifies a component root and data directory, and captures the result in a variable rather than sending to standard output:

    my $outbuf;
    my $interp = HTML::Mason::Interp->new
        (comp_root  => '/path/to/comp_root',
         data_dir   => '/path/to/data_dir',
         out_method => \$outbuf
         );
    $interp->exec(<component-path>, <args>...);
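
To try this end to end, the sketch below writes a one-line component into a temporary component root and runs it. The component name, its name argument, and the use of temporary directories are all assumptions made for the example.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);
    use HTML::Mason::Interp;

    # Throwaway directories; a real script would use fixed paths.
    my $comp_root = tempdir(CLEANUP => 1);
    my $data_dir  = tempdir(CLEANUP => 1);

    # Write a tiny component that takes one argument.
    open my $fh, '>', "$comp_root/hello"
        or die "cannot write component: $!";
    print $fh "Hello, <% \$name %>!\n<%args>\n\$name\n</%args>\n";
    close $fh;

    my $outbuf;
    my $interp = HTML::Mason::Interp->new(
        comp_root  => $comp_root,
        data_dir   => $data_dir,
        out_method => \$outbuf,
    );

    # The component path is relative to the component root.
    $interp->exec('/hello', name => 'world');
    print $outbuf;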

AUTHORS

Jonathan Swartz <swartz@pobox.com>, Dave Rolsky <autarch@urth.org>, Ken Williams <ken@mathforum.org>

SEE ALSO

HTML::Mason, HTML::Mason::Interp, HTML::Mason::ApacheHandler, HTML::Mason::Lexer, HTML::Mason::Compiler