Jochen Wiedmann

NAME

    HTML::EP - a system for embedding Perl into HTML

SYNOPSIS

    <html><head><title>CGI-Env</title></head>
    <ep-comment>
        This is an HTML document. You see. Perhaps you wonder about
        the unknown HTML tags like ep-comment above? They are part
        of the EP system. For example, this comment section will
        be removed and you won't see it in your browser.
    </ep-comment>
    <ep-perl>
        # This is an example of embedding Perl into the page.
        # We create a variable called env, containing our environment
        # variables. This variable will be used below.
        my $self = $_;
        my $env = [];
        my($var, $val);
        while (($var, $val) = each %ENV) {
            push(@$env, { var => $var, val => $val });
        }
        $self->{env} = $env;
        '';
    </ep-perl>
    <body><h1>Your CGI environment</h1>
        Your CGI environment looks as follows:
        <table><tr><th>Variable</th><th>Value</th>
            <ep-list items="env" item="e">
               <tr><td>$e->var$</td><td>$e->val$</td>
            </ep-list>
        </table>
    </body>
    </html>

WARNING

THIS IS ALPHA SOFTWARE. It is *only* 'Alpha' because the interface (API) is not finalised. The Alpha status does not reflect code quality or stability. In particular the following things might change without further notice:

  • The way of passing variables as attributes. For example, currently one can use

        <ep-mail epparse-from="$cgi->email$" to="joe"
                 subject="Test mail">

    to pass the form variable email into method calls. I consider changing this to

        <ep-mail from="$cgi->email$" to="joe" subject="Test mail"
                 parse=1>

    or whatever. The decision mainly depends on performance considerations and readability. See "Variables" below.

  • The C interface for introducing your own methods. This depends on whether I need changes for inserting the module into mod_perl. See "TODO" below.

DESCRIPTION

Have you ever written a CGI binary? Easy thing, isn't it? Was just fun!

Have you written two CGI binaries? Even easier, but not so much fun.

How about the third, fourth or fifth tool? Sometimes you notice that you are always doing the same:

  • Reading and parsing variables

  • Formatting output, in particular building tables

  • Sending mail out from the page

  • Building a database connection, passing CGI input to the database and vice versa

  • Talking to HTML designers about realizing their wishes

You see, it soon becomes a pain. Of course there are lots of little helpers around, for example the CGI module, the mod_perl suite and lots more. Using them makes life a lot easier, but not as much as you would like. See CGI(3), mod_perl(3).

On the other hand, there are tools like PHP/FI or WebHTML. Incredibly easy to use, but not as powerful as Perl. Why not get the best from both worlds? This is what EP wants to give you, similar to ePerl or HTML::Embperl. I personally believe that EP is simpler and more extensible than the latter two. See ePerl(1), HTML::Embperl(3).

In short, EP is a single but extensible program that scans an HTML document for certain special HTML tags. These tags are replaced by appropriate output generated by EP. What remains is passed to the browser. It's just like writing HTML for an enhanced browser!

Prerequisites

As far as I know EP depends on no system dependent features. However, it relies on some other Perl modules:

CGI

The CGI module should be a part of your Perl core's installation. If not, you should definitely upgrade to Perl 5.004. :-) My thanks to Lincoln D. Stein <lstein@genome.wi.mit.edu>.

HTML::Parser

This module is used for parsing the HTML templates. My thanks to Gisle Aas <aas@sn.no>.

libwww

The LWP library contains a lot of utility functions, for example HTML and URL encoding and decoding. Again, my thanks to Gisle Aas <aas@sn.no>. :-)

Perl itself and the above modules are available from any CPAN mirror, for example

       ftp://ftp.funet.fi/pub/languages/perl/CPAN/modules/by-module

Installation

Installing this module (and the prerequisites from above) is quite simple. You just fetch the archive, extract it with

    gzip -cd HTML-EP-0.1000.tar.gz | tar xf -

(this is for Unix users, Windows users would prefer WinZip or something similar) and then enter the following:

    cd HTML-EP-0.1000
    perl Makefile.PL
    make
    make test

If any tests fail, let me know. Otherwise go on with

    make install

This will put the required Perl modules into a destination where Perl finds them by default. Additionally it will install a single CGI binary, called ep.cgi.

The docs are available online with

    perldoc HTML::EP

If you prefer an HTML version of the docs, try

    pod2html lib/HTML/EP.pm

in the source directory.

Using the CGI binary

I suggest that you choose an extension and configure your WWW server for feeding files with this extension into ep.cgi. For example, with Apache, you can add the following lines to your srm.conf:

    ScriptAlias /cgi-bin/ep.cgi /usr/bin/ep.cgi
    AddHandler x-ep-script .ep
    Action x-ep-script /cgi-bin/ep.cgi

This tells Apache that files with extension .ep are handled by the CGI binary /usr/bin/ep.cgi. Make sure that the ScriptAlias line is entered *before* any other ScriptAlias instruction!

From now on your server will never return files with extension .ep directly! Verify your installation by creating the following file:

    <ep-perl>
    print "content-type: text/plain\n\n";
    print "It worked! Your EP system is up and running.\n";
    </ep-perl>

Store it as /test.ep on your web server and retrieve the file via your Web server. You should see neither the ep-perl tags nor the print instructions.

Available methods

All EP tags start with the prefix ep-. Some available tags are:

ep-comment

This is a multi-line tag for embedding comments into your HTML page. But why use this tag, instead of the usual HTML comment, <!--? The difference is, that the user will never see the former.

Example:

    <html>
        <!-- This is a comment. I like comments. -->
        <ep-comment>
            This is another comment, but you won't see it
            in your browser. The HTML editor will show it
            to you, however!
        </ep-comment>
    </html>

Do not try to embed EP instructions into the comment section! They won't produce output, but they will be executed anyways.

ep-perl

This is for embedding Perl into your script. There are two versions of it: A multiline version is for embedding the Perl code immediately into your script. Example:

    <html>
        <head><title>The Date</title></head>
        <body>
            <h1>The Date</h1>
            <p>Hello, today it's the</p>
            <p align=center>
            <ep-perl>
                # This little piece of Perl code will be executed
                # while scanning the page.
                #
                # Let's calculate the date!
                #
                my($sec,$min,$hour,$mday,$mon,$year)
                    = localtime(time);
                # Leave a string with the date as result. Will be
                # inserted into the HTML stream:
                sprintf("%02d.%02d.%04d", $mday, $mon+1, $year+1900);
            </ep-perl> 
            </p>
        </body>
    </html>

If you don't like to embed Perl code, you may store it into a different file. That's what the single-line version of ep-perl is for:

    <html>
        <head><title>The Date</title></head>
        <body>
            <h1>The Date</h1>
            <p>Hello, today it's the</p>
            <p align=center>
            <ep-perl src="date.pl">
            </p>
        </body>
    </html>

You have noticed that the little script's result was inserted into the HTML page, haven't you? It returned a date, in other words a string consisting of letters, digits and dots. There's no problem with inserting such a string into an HTML stream.

But that's not always the case! Say you have a string like

    Use </html> for terminating the HTML page.

This cannot be inserted as a raw string, for obvious reasons. Thus the ep-perl command has an attribute output. Use it like this:

    <ep-perl output=html>
        'Use </html> for terminating the HTML page.';
    </ep-perl>

Possible values of the output attribute are raw (default), html (HTML encoded) and url (URL encoded).
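The three output modes can be pictured with a small plain-Perl sketch. The helper names html_encode, url_encode and encode_output are made up for illustration and are not part of HTML::EP; which characters EP actually escapes is an assumption here.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub html_encode {
    my($str) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $str =~ s/([&<>"])/$ent{$1}/g;
    $str;
}

# Hypothetical helper: percent-encode everything except unreserved
# characters (letters, digits, "-", "_", "." and "~").
sub url_encode {
    my($str) = @_;
    $str =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    $str;
}

# Dispatch on the value of the output attribute; raw is the default.
sub encode_output {
    my($str, $mode) = @_;
    $mode = 'raw' unless defined($mode);
    return html_encode($str) if $mode eq 'html';
    return url_encode($str)  if $mode eq 'url';
    $str;
}

# Prints: Use &lt;/html&gt; for terminating the HTML page.
print encode_output('Use </html> for terminating the HTML page.', 'html'), "\n";
```

With output=html the closing tag inside the string reaches the browser as text and no longer terminates the page.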

It's a common mistake to use the Perl command return in embedded Perl. Never do that! If you need return (there are of course situations where returning can help), wrap the code in a subroutine like this:

    <ep-perl>
        sub eval_me {
            if ($this) {
                return 'foo';
            } elsif ($that) {
                return 'bar';
            }
            '';
        }
        eval_me();
    </ep-perl>

See "Variables" below for interactions between Perl variables and EP variables.

ep-mail

This command will send an e-mail. The attributes will be used for creating the e-mail header; in particular the subject, from and to attributes should be used. Example:

    <ep-mail subject="Howdy!" from="joe@ispsoft.de"
             to="bill@whitehouse.gov">
        Hello, Bill, old chap. How are you?

        Yours sincerely,
        Jochen
    </ep-mail>

Note that you can still use EP variables in the e-mail body, for example the following works:

    <ep-mail subject="Test" epparse-from="$cgi->email$"
             to="joe@ispsoft.de">
        Hello, Joe,
        this e-mail was sent to you by $@cgi->name$.
    </ep-mail>

Note that we suppress conversion into HTML format in the mail body! See "Variables" below for details.

ep-errhandler

This command advises EP what to do in case of errors. See "Error handling" below. Example:

    <ep-comment>
        Set the template being used for system errors.
    </ep-comment>
    <ep-errhandler type=system src=/templates/syserr.html>
    <ep-comment>
        Likewise, set the template for user errors.
    </ep-comment>
    <ep-errhandler type=user src=/templates/usererr.html>

If an error occurs, the given scripts are loaded and used as templates instead of the current one. You don't need external files! Instead you can use

    <ep-errhandler type=user>
        <HTML><HEAD><TITLE>User error</TITLE></HEAD>
        <BODY><H1>User error</H1>
        <P>Replace user and continue. :-)</P>
        <P>To be serious, the following problem happened:</P>
        <PRE>$errmsg$</PRE>
        <P>Please return to the calling page, fix the problem
        and retry.</P>
        </BODY></HTML>
    </ep-errhandler>

However, you might prefer to use a single error template and of course it's faster to use external error templates than parsing builtin templates. (At least, if no error occurs. :-)

ep-error

This command forces an error message. See "Error handling" below. You can trigger user or system errors by setting the type attribute to the values system (default) or user. The msg attribute is for setting the error message.

Example:

    <ep-comment>
        If no email address was entered, force a user error.
    </ep-comment>
    <ep-if epparse-eval="$cgi->email$">
        <ep-error msg="Missing email address" type=user>
    </ep-if>

ep-database

This command connects to a database. Its attributes are dsn, user and password corresponding to the same attributes of the DBI connect method. See DBI(3) for details on DBI.

Example:

    <ep-database dsn="DBI:mysql:test" user="joe"
                 password="Authorized?Me?">

You can use different database connections by using the dbh attribute:

    <ep-database dbh="dbh2" dsn="DBI:mSQL:test">

The dbh attribute advises EP to store the DBI handle in the given variable. (Default: dbh) See "Variables" below.

ep-query

This command executes an SQL statement. The query attribute will be used for passing the SQL statement. Of course a multiline version is available, thus

    <ep-query query="INSERT INTO foo VALUES (1, 'bar')">

is the same as

    <ep-query>
        INSERT INTO foo VALUES (1, 'bar')
    </ep-query>

If your query retrieves a result, use the result attribute to store it in a variable, for example like this:

    <ep-query query="SELECT * FROM employees" result="employees">

This will create a variable employees, an array ref of hash refs. You can use the ep-list command for displaying the output. See "Variables" below.

When using multiple database connections, use the dbh attribute for choosing the connection. (See the ep-database method above.)

ep-list

This command is used to display an array of refs. Let's assume that the variable employees contains an array ref of hash refs with the attributes name and department. Then you could create a table of employees as follows:

    <table><tr><th>Name</th><th>Department</th>
    <ep-list items="employees" item="e">
           <tr><td>$e->name$</td><td>$e->department$</td>
    </ep-list>
    </table>

This will be processed as follows: For any item in the array, retrieved from the variable employees, create a variable e and display the text between ep-list and /ep-list for it by replacing the patterns $e->name$ and $e->department$ with the corresponding values.
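The loop described above can be sketched in plain Perl. The function expand_list is a hypothetical stand-in, not EP's actual implementation, and it leaves out the HTML encoding that EP applies to variable values.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the ep-list loop: expand $body once per
# element of @$items, replacing $<item>->key$ with the element's value.
sub expand_list {
    my($items, $item, $body) = @_;
    my $out = '';
    foreach my $ref (@$items) {
        my $text = $body;
        $text =~ s/\$\Q$item\E->(\w+)\$/defined($ref->{$1}) ? $ref->{$1} : ''/ge;
        $out .= $text;
    }
    $out;
}

my $employees = [
    { name => 'Joe',  department => 'Sales' },
    { name => 'Anne', department => 'Research' },
];
# Prints one table row per employee.
print expand_list($employees, 'e',
                  "<tr><td>\$e->name\$</td><td>\$e->department\$</td>\n");
```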

ep-input

This is useful for reading an object's data out of CGI variables. Say you have a form with input fields describing an address, the field names being address_t_name, address_t_street, address_n_zip and address_t_city. By using the command

    <ep-input prefix="address_" dest="address">

the EP program will create a variable "address" for you which is a hash ref as follows:

    $cgi = $_->{cgi};
    $_->{address} = {
        name =>   { col => 'name',
                    val => $cgi->param("address_t_name"),
                    type => 't',
                  },
        street => { col => 'street',
                    val => $cgi->param("address_t_street"),
                    type => 't',
                  },
        zip =>    { col => 'zip',
                    val => $cgi->param("address_n_zip"),
                    type => 'n',
                  },
        city =>   { col => 'city',
                    val => $cgi->param("address_t_city"),
                    type => 't',
                  },
    };

In general, field names beginning with the given prefix will be split into prefix, type and suffix, the type being either 't' for text or 'n' for number. The idea is to generate SQL queries automatically out of the address variable.
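As a rough sketch of this idea, the following plain-Perl code splits field names and builds an INSERT statement from the result. The names read_input, sql_value and insert_statement, the table name and the sample data are all hypothetical; text values are quoted naively here, whereas real code should use the quote method of the DBI handle.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ep-input: collect all parameters named
# <prefix><type>_<column> into a hash ref keyed by column name.
sub read_input {
    my($params, $prefix) = @_;
    my %dest;
    foreach my $key (keys %$params) {
        next unless $key =~ /^\Q$prefix\E([tn])_(\w+)$/;
        my($type, $col) = ($1, $2);
        $dest{$col} = { col => $col, val => $params->{$key}, type => $type };
    }
    \%dest;
}

# Format one field for SQL: numbers as they are, text in quotes.
sub sql_value {
    my($field) = @_;
    my $val = $field->{val};
    return $val + 0 if $field->{type} eq 'n';
    $val =~ s/'/''/g;
    "'$val'";
}

# Build an INSERT statement with the columns in sorted order.
sub insert_statement {
    my($table, $obj) = @_;
    my @cols = sort keys %$obj;
    my @vals = map { sql_value($obj->{$_}) } @cols;
    sprintf("INSERT INTO %s (%s) VALUES (%s)", $table,
            join(", ", @cols), join(", ", @vals));
}

my $address = read_input({ address_t_name   => 'Joe',
                           address_t_street => 'Main Street 1',
                           address_n_zip    => '12345',
                           address_t_city   => 'Springfield',
                           submit           => 'OK' }, 'address_');
# Prints: INSERT INTO addresses (city, name, street, zip)
#         VALUES ('Springfield', 'Joe', 'Main Street 1', 12345)
# (on a single line)
print insert_statement('addresses', $address), "\n";
```

Fields that do not match the prefix and type pattern, such as submit, are ignored.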

Error handling

Error handling with EP is quite simple: All you do in case of errors is throw a Perl exception. For example, DBI handles are created with the RaiseError attribute set to 1, so that SQL errors trigger a Perl exception. You never have to check for errors yourself!

However, what happens in case of errors? In that case, EP will use the template that you have set with ep-errhandler and treat it like an ordinary EP document, by setting the variables errmsg and admin.
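A minimal sketch of this mechanism in plain Perl, assuming that the page is produced by a code ref and that the error template contains only simple $name$ patterns. The function run_page and the sample address are hypothetical, and HTML encoding of the values is left out.

```perl
use strict;
use warnings;

# Hypothetical driver: run $code; if it dies, fill the error template
# with the variables errmsg and admin instead of returning the page.
sub run_page {
    my($code, $err_template, $admin) = @_;
    my $page;
    return $page if eval { $page = $code->(); 1 };
    my %vars = (errmsg => $@, admin => $admin);
    $vars{errmsg} =~ s/\s+\z//;    # drop the trailing newline
    (my $out = $err_template) =~
        s/\$(\w+)\$/exists($vars{$1}) ? $vars{$1} : ''/ge;
    $out;
}

# Prints: <pre>Missing email address</pre> Contact webmaster@example.com.
print run_page(sub { die "Missing email address\n" },
               '<pre>$errmsg$</pre> Contact $admin$.',
               'webmaster@example.com'), "\n";
```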

If you don't set an error handler, the following template will be used, which is well suited as a starting point for your own error template:

    <html><head><title>Internal error</title></head>
    <body><h1>Internal error</h1>
    <p>An internal error occurred. The server has not been able to
    fulfill your request. The error message is:</p>
    <pre>
        $errmsg$
    </pre>
    <p>Please contact the Webmaster,
    <a href="mailto:$admin$">$admin$</a>,
    tell him the URL, the time and error message.
    </p>
    <p>We apologize for any inconvenience, please try again later!</p>
    <br><br><br><p>Yours sincerely,</p><p>The Webmaster</p>
    </body>
    </html>

Variables

It is important to understand, how EP variables work, in particular when working with ep-perl.

You always have an object $_, which is an instance of the HTML::EP class (a subclass of HTML::Parser). This object has certain attributes, in particular $_->{cgi}, a CGI object and $_->{dbh}, the DBI handle. (Of course valid after ep-database only.) If you want to set or modify a variable, you have to set $_->{varname}. If you want to retrieve the value, use the same. Note that you cannot use $_ for a long time, as it will be changed by Perl loops and the like, thus your Perl code typically starts with

    my $self = $_;

But how do you access the variable from within EP documents? You just write

        $varname$

This will be replaced automatically by the parser with the value of $_->{varname}. Even more, the value will be converted into HTML source!

If varname is a structured variable, for example a hash or array ref, you may as well use

        $varname->attrname$

or

        $varname->0$

to access $_->{varname}->{attrname} or $_->{varname}->[0], respectively. A special value of varname is cgi: This will access the CGI variable of the same name, thus the following are equivalent:

        $cgi->email$

and

        $_->{cgi}->param('email');
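The lookup rule for structured variables can be sketched as follows. The function lookup is a hypothetical illustration, not EP's parser code, and it leaves out the special handling of cgi.

```perl
use strict;
use warnings;

# Hypothetical resolver: follow a path such as "varname->attrname" or
# "varname->0" through nested hash and array refs stored in $self.
sub lookup {
    my($self, $path) = @_;
    my($var, @rest) = split(/->/, $path);
    my $val = $self->{$var};
    foreach my $key (@rest) {
        if (ref($val) eq 'ARRAY') {
            $val = $val->[$key];
        } elsif (ref($val) eq 'HASH') {
            $val = $val->{$key};
        } else {
            return undef;    # cannot descend into a plain scalar
        }
    }
    $val;
}

my $self = { user => { name => 'Joe' }, colors => ['red', 'green'] };
print lookup($self, 'user->name'), "\n";    # prints: Joe
print lookup($self, 'colors->1'), "\n";     # prints: green
```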

But what if you don't want your variable to be HTML encoded? You may as well use

        $@varname$      (Raw)
        $#varname$      (URL encoded)
        $~varname$      (SQL encoded)

The latter uses the $_->{dbh}->quote() method. In particular this implies that you have to be connected to a database, before using this tag!

You can even use these symbols in attributes of EP commands. For example, the following will be useful when sending a mail:

    <ep-mail subject="Howdy!" epparse-from="$@cgi->email$"
             to="bill@whitehouse.gov">

By prefixing the attribute name with epparse- you tell the EP module that the attribute must be parsed before processing it. (Note that one doesn't use an HTML encoded value in that case!) Another prefix is epperl-: In this case the attribute will be evaluated by the Perl interpreter, much like the ep-perl method. The Perl code's result will be used as the final attribute value.
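The effect of the two prefixes can be pictured with the following sketch. The function prepare_attrs is hypothetical, it understands only simple $@varname$ patterns without a -> path, and it evaluates the Perl code directly rather than in a Safe compartment.

```perl
use strict;
use warnings;

# Hypothetical preprocessing step: strip the epparse- or epperl- prefix
# and compute the final attribute value before a handler sees it.
sub prepare_attrs {
    my($vars, $attr) = @_;
    my %final;
    foreach my $name (keys %$attr) {
        my $val = $attr->{$name};
        if ($name =~ /^epparse-(.+)/) {
            my $plain = $1;
            # Insert raw variable values for $@varname$ patterns.
            $val =~ s/\$\@(\w+)\$/defined($vars->{$1}) ? $vars->{$1} : ''/ge;
            $final{$plain} = $val;
        } elsif ($name =~ /^epperl-(.+)/) {
            my $plain = $1;
            # Evaluate the value as Perl code and keep its result.
            my $result = eval $val;
            die $@ if $@;
            $final{$plain} = $result;
        } else {
            $final{$name} = $val;
        }
    }
    \%final;
}

my $attr = prepare_attrs({ email => 'joe@example.com' },
                         { 'epparse-from'   => '$@email$',
                           'epperl-subject' => '"Test " . (6 * 7)',
                           to               => 'bill@example.com' });
# Prints: joe@example.com / Test 42 / bill@example.com
print "$attr->{from} / $attr->{subject} / $attr->{to}\n";
```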

WARNING: As already said, use of epparse- and epperl- is dangerous, as these things are likely to be changed in the future.

EXTENSIONS

It is quite easy to write your own methods.

Single-line extensions

For example, suppose you want a method for accessing environment variables:

    <ep-env var="e">

The idea is to create a variable e, which is a hash ref of the current environment variables, so that you can use

    $e->HTTP_USER_AGENT$

for accessing the name of the user's browser.

This can be done like this:

    <ep-perl package="HTML::EP">
    my $self = $_;

    # Write a handler for ep-env:
    sub env ($$) {
        my($self, $attr) = @_;
        my $var = $attr->{var};
        $self->{$var} = {%ENV};
        '';
    }

    # Register the handler in the list of handlers:
    $self->{_ep_funcs}->{'ep-env'} = { method => 'env' };
    # Return an empty string:
    '';
    </ep-perl>

The method attribute of the handler tells the EP module to call

    $_->env($attr);

if the ep-env tag is used. The argument $attr is a hash ref of the tag's attributes. Note the use of the package attribute: By default the ep-perl code is executed in a Safe compartment. See Safe(3).
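The dispatch step can be sketched in plain Perl. The package My::Parser and its dispatch method are hypothetical stand-ins for HTML::EP; only the _ep_funcs table and the env handler follow the text above.

```perl
use strict;
use warnings;

package My::Parser;    # hypothetical stand-in for HTML::EP

sub new { bless({ _ep_funcs => {} }, shift) }

# The handler from the text above: store %ENV under the given name.
sub env {
    my($self, $attr) = @_;
    $self->{ $attr->{var} } = {%ENV};
    '';
}

# Look up the handler registered for a tag and call it as a method.
sub dispatch {
    my($self, $tag, $attr) = @_;
    my $func = $self->{_ep_funcs}->{$tag}
        or die "Unknown tag: $tag\n";
    my $method = $func->{method};
    $self->$method($attr);
}

package main;

my $parser = My::Parser->new;
$parser->{_ep_funcs}->{'ep-env'} = { method => 'env' };
$parser->dispatch('ep-env', { var => 'e' });
print "PATH is $parser->{e}->{PATH}\n" if defined $parser->{e}->{PATH};
```

Storing the method name rather than a code ref means a subclass can override the handler without registering it again.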

Multi-line extensions

But how do you write methods that use a <tag> .. </tag> syntax? As an example, we write a method for creating external files. The method receives two attributes, a file attribute for the file's name and a contents attribute for the file's contents. The method can be used in two ways:

    <ep-file file="test.dat" contents="Hi!">

or like this, in multiline mode:

    <ep-file file="test.dat">
        Hi!
    </ep-file>

Here it is:

    <ep-perl package="HTML::EP">
    my $self = $_;

    # Write a handler for ep-file:
    sub file ($$) {
        my($self, $attr) = @_;
        my $contents = $attr->{contents};
        if (!defined($contents)) { # Multiline method, no "contents"
            return undef;          # attribute given; return undef
        }                          # until we are called again.
        my $file = $attr->{file};
        require Symbol;
        my $fh = Symbol::gensym();
        if (!open($fh, ">$file")  ||  !(print $fh ($contents))  ||
            !close($fh)) {
            die "Error while creating $file: $!";
        }
        '';
    }
    # Register the handler in the list of handlers
    # Note the use of the "default" attribute:
    $self->{_ep_funcs}->{'ep-file'} = { method => 'file',
                                        default => 'contents' };
    # Return an empty string:
    '';
    </ep-perl>

In other words: The method gets called twice, once for <ep-file> and once for </ep-file>. If it thinks it should enter multi-line mode (that is, if the contents attribute is not set), it returns undef. In that case EP looks at the default attribute of the handler, which tells it that the lines between <ep-file> and </ep-file> ought to be written into the contents attribute. Thus this attribute exists when the method is called a second time.
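The two-call protocol can be sketched like this. The driver run_tag and the handler shout are hypothetical; shout follows the same convention as the file method, except that it returns the contents in upper case instead of writing a file.

```perl
use strict;
use warnings;

# Hypothetical driver for one tag. $handler is the extension method,
# $default the attribute that receives the enclosed text.
sub run_tag {
    my($handler, $default, $attr, $body) = @_;
    # First call, made when the opening tag is seen.
    my $result = $handler->($attr);
    return $result if defined($result);
    # undef means multi-line mode: store the text up to the closing
    # tag in the default attribute and call the handler again.
    $attr->{$default} = $body;
    $handler->($attr);
}

# Sample handler: returns undef as long as "contents" is missing.
sub shout {
    my($attr) = @_;
    return undef unless defined($attr->{contents});
    uc($attr->{contents});
}

print run_tag(\&shout, 'contents', { contents => 'hi!' }), "\n";   # prints: HI!
print run_tag(\&shout, 'contents', {}, 'hello'), "\n";             # prints: HELLO
```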

Note the use of the Symbol package when accessing files: *Never* use global handles like

    open(FILE, ...)

as this might break future multithreading code!

Selfloaded methods

In the above examples the extension methods have been compiled immediately. This is not always a good idea: For example the ep-mail method is loading big external packages like Mail::Internet for sending the mail. In such cases you might wish to use HTML::EP's builtin self loader, which is quite similar to that of CGI. We choose ep-mail as an example:

    <ep-perl package="HTML::EP">
    my $self = $_;
    # Create a string that can be compiled for loading the method:
    $AUTOLOADED_SUBS{_ep_mail} = <<'end_of__ep_mail';
        require Mail::Internet;
        sub _ep_mail ($$) {
            my($self, $attr) = @_;
            ...
        }
    end_of__ep_mail

    $self->{_ep_funcs}->{'ep-mail'} = { method => '_ep_mail',
                                        default => 'body' };
    # Return an empty string:
    '';
    </ep-perl>

The advantage is that you have the method available, but the performance penalty of loading it is almost entirely avoided if the method is not used.
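How such a self loader can work is sketched below with Perl's AUTOLOAD mechanism. The package My::Lazy and the method greet are hypothetical, and HTML::EP's own loader may differ in detail; only the %AUTOLOADED_SUBS convention is taken from the example above.

```perl
use strict;
use warnings;

package My::Lazy;    # hypothetical package, not part of HTML::EP

our($AUTOLOAD, %AUTOLOADED_SUBS);

# Source code of methods that are compiled on first use only.
$AUTOLOADED_SUBS{greet} = <<'end_of_greet';
    sub greet {
        my($self, $attr) = @_;
        "Hello, $attr->{name}!";
    }
end_of_greet

sub new { bless({}, shift) }

# Called by Perl for unknown methods: compile the stored source,
# then jump to the freshly defined method with the same arguments.
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    my $code = delete $AUTOLOADED_SUBS{$name}
        or die "No such method: $name\n";
    eval "package My::Lazy; $code; 1" or die $@;
    no strict 'refs';
    goto &{"My::Lazy::$name"};
}

package main;

my $obj = My::Lazy->new;
print $obj->greet({ name => 'Joe' }), "\n";    # prints: Hello, Joe!
```

The source string is compiled only on the first call; later calls find the method directly and never reach AUTOLOAD.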

Namespace pollution

So far extensions are inserted at run time only, usually by loading them from external files. For example you might create extensions for building a WWW shop and put them in, say, shop.lib.

As soon as it is possible to make extensions permanent (with mod_perl or an HTML::EP server) and extensions will be loaded at startup, there will be more and more of such extension files. Experience shows, that namespace pollution will finally become a problem. For example, any virtual Web server might have a completely different method ep-buttons for inserting an automatically generated button frame.

Thus I propose to use a package model inherited from Perl's package model, and start using it now:

If you create a shop extension, put it into a separate package, HTML::EP::Shop, say. When working with the extension, use

    <ep-package name="HTML::EP::Shop">

as the very first line, so that your Parser object becomes blessed into the Shop package. And let your shop extension file look like this:

    use vars qw(@ISA);
    @ISA = qw(HTML::EP);

    # Define first extension method
    ...

    # Define second extension method
    ...

Note that this allows inheritance! Currently the extension file has to be loaded with

    <ep-perl package="HTML::EP::Shop" src="shop.lib">

but this will change in the future.
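The effect of blessing the parser into an extension package can be sketched with plain Perl classes. The packages My::EP and My::EP::Shop and their methods are hypothetical stand-ins for HTML::EP and HTML::EP::Shop.

```perl
use strict;
use warnings;

package My::EP;          # hypothetical stand-in for HTML::EP
sub new     { bless({}, shift) }
sub buttons { 'default buttons' }
sub mail    { 'mail sent' }

package My::EP::Shop;    # hypothetical shop extension
our @ISA = qw(My::EP);
# Overrides the inherited method for shop pages only.
sub buttons { 'shop buttons' }

package main;

my $parser = My::EP->new;
print $parser->buttons, "\n";    # prints: default buttons
# What ep-package is described to do: rebless the parser object.
bless($parser, 'My::EP::Shop');
print $parser->buttons, "\n";    # prints: shop buttons
print $parser->mail, "\n";       # prints: mail sent (inherited)
```

Two virtual servers can thus each define their own buttons method without disturbing each other or the base class.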

TODO

  • mod_perl support

  • Create an EP server that is accessible via a small C wrapper

AUTHOR AND COPYRIGHT

This module is

    Copyright (C) 1998    Jochen Wiedmann
                          Am Eisteich 9
                          72555 Metzingen
                          Germany

                          Phone: +49 7123 14887
                          Email: joe@ispsoft.de

All rights reserved.

You may distribute this module under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.

SEE ALSO

DBI(3), CGI(3), HTML::Parser(3)
