NAME

XML::XPathScript - a Perl framework for XML stylesheets

SYNOPSIS

  use XML::XPathScript;
  my $xps = XML::XPathScript->new(xml => $xml, stylesheet => $stylesheet);

  # The short way:

  $xps->process();

  # The long way (caching the compiled stylesheet for reuse and
  # outputting to multiple files):

  my $compiled = XML::XPathScript->new(stylesheetfile => $filename)
         ->compile('$r');

  foreach my $xml (@xmlfiles) {
     use IO::File;

     # Open the next output file for writing
     my $currentIO = IO::File->new(shift @outputfiles, "w");

     XML::XPathScript->new(xml => $xml, compiledstylesheet=>$compiled)
         ->process(sub {$currentIO->print(shift)});
  };

  # Making extra variables available to the stylesheet dialect:

  my $handler=$xps->compile('$r');

  &$handler($xmltree, \&Apache::print, Apache->request());

DESCRIPTION

This is the XML::XPathScript stylesheet framework, part of the AxKit project at http://axkit.org/.

XPathScript is a stylesheet language similar in many ways to XSLT (in concept, not in appearance), for transforming XML from one format to another format (possibly HTML, but XPathScript also shines for non-XML-like output).

Like XSLT, XPathScript has a dialect for mixing verbatim document portions with code. Also like XSLT, it leverages the powerful ``templates/apply-templates'' and ``cascading stylesheets'' design patterns, which greatly simplify the design of stylesheets for programmers. The availability of the XPath query language inside stylesheets promotes a purely document-dependent, side-effect-free coding style. But unlike XSLT, which uses its own dedicated control language with an XML-compliant syntax, XPathScript uses Perl, which is terse and highly extensible.

The result of the merge is an extremely powerful environment for development tasks that involve rendering complex XML documents to other formats. Stylesheets written in XPathScript are very easy to create, extend and reuse, even if they manage hundreds of different XML tags.

STYLESHEET WRITER DOCUMENTATION

Creating stylesheets

See http://axkit.org/docs/xpathscript/guide.dkb for a head start. There you will learn how to mark up the embedded dialect and fill in the template hash $t.

xpathscript Invocation

This CPAN module is bundled with an "xpathscript" shell tool that is to be invoked like this:

   xpathscript mydocument.xml mystylesheet.xps

It will produce the resulting document on standard output. More options will be added later (select output file, handle multiple output files, pass parameters to the stylesheet etc.).

Functions and global variables available in the stylesheet

A number of callback functions are available from the stylesheet proper. They apply against the current document and template hash, which are transparently passed back and forth as global variables (see "Global variables"). They are defined in the XML::XPathScript::Toys package, which is implicitly imported into all code written in the embedded stylesheet dialect.

    DO_SELF_AND_KIDS

    DO_SELF_ONLY

    DO_NOT_PROCESS

    Symbolic constants evaluating to 1, -1 and 0 respectively, to be used as mnemonic return values in testcode routines instead of the numeric values, which are harder to remember.

    findnodes($path)

    findnodes($path, $context)

    Returns a list of nodes found by XPath expression $path, optionally using $context as the context node (default is the root node of the current document). In scalar context returns a NodeSet object.

    findvalue($path)

    findvalue($path, $context)

    Evaluates XPath expression $path and returns the result, as either an "XML::XPath::Literal", an "XML::XPath::Boolean" or an "XML::XPath::Number" object. If the path returns a NodeSet, $nodeset->to_literal is called automatically for you (and thus an "XML::XPath::Literal" is returned). Note that stringification is overloaded for each of these objects, so you can just print the value found, or manipulate it as you would a normal Perl value (e.g. using regular expressions). Just beware that the result of such stringification will be UTF-8 encoded (see perlunicode), as just about everything under the XML sun is.

    findvalues($path)

    findvalues($path, $context)

    Evaluates XPath expression $path as a nodeset expression, just like "findnodes" would, but returns a list of UTF8-encoded XML strings instead of node objects.

    findnodes_as_string($path)

    findnodes_as_string($path, $context)

    Similar to "findvalues" but concatenates the XML snippets. The result is not guaranteed to be valid XML though.

    matches($node, $path)

    matches($node, $path, $context)

    Returns true if the node matches the path (optionally in context $context).

    apply_templates()

    apply_templates($xpath)

    apply_templates($xpath, $context)

    apply_templates(@nodes)

    This is where the whole magic in XPathScript resides: recursively applies the stylesheet templates to the nodes provided either literally (last invocation form) or through an XPath expression (second and third invocation forms), and returns a string concatenation of all results. If called without arguments at all, renders the whole document.

    Calls to apply_templates() may occur both implicitly (at the top of the document, and for rendering subnodes when the templates choose not to handle that by themselves), and explicitly (from testcode routines).

    If appropriate care is taken in all templates (especially the testcode routines and the text() template), the string result of apply_templates need not be UTF-8 (see perlunicode): it is thus possible using XPathScript to produce output in any character set without an extra translation pass.

    call_template($node, $t, $templatename)

    EXPERIMENTAL - allows testcode routines to invoke a template by name, even if the selectors do not fit (e.g. one can apply template B to an element node of type A). Returns the stylesheeted string computed out of $node just like "apply_templates" would.

TECHNICAL DOCUMENTATION

The rest of this POD documentation is not useful to programmers who just want to write stylesheets; it is of use only to people wanting to call existing stylesheets or, more generally, embed the XPathScript engine into some wider framework.

XML::XPathScript is an object-oriented class with the following features:

  • an embedded Perl dialect that allows the merging of the stylesheet code with snippets of the output document. Don't be afraid, this is exactly the same kind of stuff as in Text::Template, HTML::Mason or other similar packages: instead of having text inside Perl (that one print()s), we have Perl inside text, with a special escaping form that a preprocessor interprets and extracts. For XPathScript, this preprocessor is embodied by the xpathscript shell tool (see "xpathscript Invocation") and also available through this package's API;

  • a templating engine, that does the apply-templates loop, starting from the top XML node and applying templates to it and its subnodes as directed by the stylesheet.

When run, the stylesheet is expected to fill in the template hash $t, which is a lexically-scoped variable made available to it at preprocess time.
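
The apply-templates loop can be pictured with a toy model. The sketch below is not the module's engine: the function name toy_apply_templates and the tree representation (plain strings for text, hashes with name and kids for elements) are assumptions made for illustration, and the constants are redefined locally so that the example runs without XML::XPathScript installed.

```perl
use strict;
use warnings;

# Local stand-ins for the constants exported by XML::XPathScript::Toys.
use constant {
    DO_SELF_AND_KIDS => 1,
    DO_SELF_ONLY     => -1,
    DO_NOT_PROCESS   => 0,
};

# Hypothetical toy model: a node is either a plain string (text) or a
# hash of the form { name => ..., kids => [...] }.
sub toy_apply_templates {
    my ($t, @nodes) = @_;
    my $result = '';
    for my $node (@nodes) {
        if (!ref $node) { $result .= $node; next }       # text node
        # Work on a per-node copy so that testcode may modify pre/post.
        my $tmpl = { %{ $t->{ $node->{name} } || {} } };
        my $action = $tmpl->{testcode}
            ? $tmpl->{testcode}->($node, $tmpl)
            : DO_SELF_AND_KIDS;
        next if $action == DO_NOT_PROCESS;
        $result .= $tmpl->{pre} // '';
        $result .= toy_apply_templates($t, @{ $node->{kids} || [] })
            if $action == DO_SELF_AND_KIDS;
        $result .= $tmpl->{post} // '';
    }
    return $result;
}

my $t = {
    title  => { pre => '<h1>', post => '</h1>' },
    secret => { testcode => sub { return DO_NOT_PROCESS } },
};
my $doc = { name => 'doc', kids => [
    { name => 'title',  kids => ['Hello'] },
    { name => 'secret', kids => ['hidden'] },
] };
my $html = toy_apply_templates($t, $doc);   # '<h1>Hello</h1>'
```

Elements without a template entry are rendered by simply recursing into their children, which is why the doc element above contributes no markup of its own.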

Dependencies

Although XPathScript is a core component of AxKit, which depends on this module to be able to process XPathScript stylesheets, there is plenty of motivation for using stylesheets outside of a WWW application server, and so XML::XPathScript is also distributed as a standalone CPAN module. The AxKit XPathScript component inherits from this class and provides the coupling with the application framework by overriding and adding some methods.

XML::XPathScript requires the following Perl packages:

Symbol

For loading files from anonymous filehandles. Symbol is bundled with Perl.

File::Basename

For fetching stylesheets from system files. One may provide other means of fetching stylesheets through object inheritance (this is what AxKit does). File::Basename is bundled with Perl.

XML::Parser
XML::XPath

For the XML parser and XPath interpreter, obviously needed. Plans are to support the XML::LibXML package as an alternative, which does the same as the above in C (and is hence expected to be an order of magnitude faster).

Methods and class methods

new(key1=>value1,key2=>value2,...)

Creates a new XPathScript translator. The recognized named arguments are

xml => $xml

$xml is a scalar containing XML text, or a reference to a filehandle from which XML input is available, or an XML::XPath or XML::LibXML object (support for the latter object class is very poor for now, as it involves unparsing and parsing back into XML::XPath).

An XML::XPathScript object constructed without an xml argument is only able to compile stylesheets (see "SYNOPSIS").

stylesheet => $stylesheet

$stylesheet is a scalar containing the stylesheet text, or a reference to a filehandle from which the stylesheet text is available. The stylesheet text may contain unresolved <!--#include --> constructs, which will be resolved relative to ".".

stylesheetfile => $filename

Same as stylesheet but let XML::XPathScript do the loading itself. Using this form, relative <!--#include -->s in the stylesheet file will be honored with respect to the dirname of $filename instead of "."; this provides SGML-style behaviour for inclusion (it does not depend on the current directory), which is usually what you want.

compiledstylesheet => $function

Re-uses a previous return value of compile() (see "SYNOPSIS" and "compile"), typically to apply the same stylesheet to several XML documents in a row.

process()
process($printer)
process($printer,@varvalues)

Processes the document and stylesheet set at construction time, and prints the result to STDOUT by default. If $printer is set, it must be either a reference to a filehandle open for output, or a reference to a string, or a reference to a subroutine which does the output, as in

   my $buffer="";
   $xps->process(sub {$buffer.=shift;});

or

   $xps->process(sub {print ANOTHERFD (shift);});

(not that the latter would be any good, since $xps->process(\*ANOTHERFD) would do exactly the same, only faster).

If the stylesheet was compile()d with extra varnames, then the calling code should call process() with a corresponding number of @varvalues. The corresponding lexical variables will be set accordingly, so that the stylesheet code can get at them (looking at "SYNOPSIS" is the easiest way of getting the meaning of this sentence).
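
The three accepted forms of $printer can be reduced to a single calling convention. The following is a minimal sketch under stated assumptions, not the module's own code: make_printer is a hypothetical helper that turns a filehandle reference, a string reference or a subroutine reference into a uniform code reference, falling back to STDOUT when nothing is given.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical helper (not part of XML::XPathScript): normalize any of
# the accepted $printer forms into a code reference.
sub make_printer {
    my ($printer) = @_;
    return sub { print STDOUT @_ } unless defined $printer;   # default
    my $type = reftype($printer) // '';
    return $printer                          if $type eq 'CODE';
    return sub { $$printer .= join '', @_ }  if $type eq 'SCALAR';
    return sub { print {$printer} @_ }       if $type eq 'GLOB';
    die "Unsupported printer type\n";
}

# Collect output in a string buffer.
my $buffer = '';
my $to_string = make_printer(\$buffer);
$to_string->('<p>');
$to_string->('hello</p>');
# $buffer now holds '<p>hello</p>'
```

Normalizing once up front keeps the output path free of type checks on every chunk printed.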

extract($stylesheet)
extract($stylesheet,$filename)
extract($stylesheet,@includestack) # from include_file() only

The embedded dialect parser. Given $stylesheet, which is either a filehandle reference or a string, returns a string that holds all the code in real Perl. Unquoted text and <%= stuff %> constructs in the stylesheet dialect are converted into invocations of XML::XPathScript->current()->print(), while <% stuff %> constructs are copied verbatim.

<!-- #include --> constructs are expanded by passing their filename argument to "include_file" along with @includestack (if any) like this:

   $self->include_file($includefilename,@includestack);

@includestack is not interpreted by extract() (except for the first entry, to create line tags for the debugger). It is only a bandaid for include_file() to pass the inclusion stack to itself across the mutual recursion existing between the two methods (see "include_file"). If extract() is invoked from outside include_file(), the last invocation form should not be used.

This method does a purely syntactic job. No special framework declaration is prepended for isolating the code in its own package, defining $t or the like ("compile" does that). It may be overridden in subclasses to provide different escape forms in the stylesheet dialect.
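
The de-embedding step can be illustrated with a deliberately simplified sketch. It is not the module's extract(): toy_extract and emit are hypothetical names, the generated code calls a plain emit() function rather than XML::XPathScript->current()->print(), and <!--#include --> directives and debugger line tags are ignored.

```perl
use strict;
use warnings;

# Hypothetical, simplified de-embedder: turns stylesheet text into Perl.
sub toy_extract {
    my ($stylesheet) = @_;
    my $perl = '';
    # Split on <% ... %> blocks; the capture group keeps them in the list.
    for my $chunk (split /(<%.*?%>)/s, $stylesheet) {
        next if $chunk eq '';
        if ($chunk =~ /\A<%=(.*)%>\z/s) {        # <%= expression %>
            $perl .= "emit($1);\n";
        }
        elsif ($chunk =~ /\A<%(.*)%>\z/s) {      # <% code %>
            $perl .= "$1\n";
        }
        else {                                   # verbatim text
            (my $quoted = $chunk) =~ s/([\\'])/\\$1/g;
            $perl .= "emit('$quoted');\n";
        }
    }
    return $perl;
}

# Run the generated code, collecting its output in $out.
my $out = '';
sub emit { $out .= join '', @_ }

my $code = toy_extract(
    '<ul><% for my $i (1..3) { %><li><%= $i * 2 %></li><% } %></ul>'
);
eval $code;
die $@ if $@;
# $out now holds '<ul><li>2</li><li>4</li><li>6</li></ul>'
```

Because code blocks are copied as-is, a loop opened in one <% %> block may be closed in a later one, with the text in between becoming the loop body.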

include_file($filename)
include_file($filename,@includestack)

Resolves a <!--#include file="foo" --> directive on behalf of extract(), that is, returns the script contents of $filename. The return value must be de-embedded too, which means that extract() has to be called recursively to expand the contents of $filename (which may contain more <!--#include -->s etc.).

$filename has to be slash-separated, whatever OS you are using (this is the XML way of things). If $filename is relative (i.e. does not begin with "/" or "./"), it is resolved relative to the directory of the stylesheet that includes it (that is, $includestack[0], see below) or "." if we are in the topmost stylesheet. Filenames beginning with "./" are considered absolute; this gives stylesheet writers a way to specify that they really want a stylesheet that lies in the system's current working directory.

@includestack is the include stack currently in use, made up of all values of $filename through the stack, most recently added (innermost) entries first. The toplevel stylesheet is not in @includestack (i.e. the outermost call does not specify an @includestack).

This method may be overridden in subclasses to provide support for alternate namespaces (e.g. ``axkit://'' URIs).
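
The filename resolution rules described above can be sketched as follows. This is an illustration only: resolve_include is a hypothetical helper, not a method of the module, and it assumes that an empty include stack always means "resolve against the current directory".

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: resolve the filename of an #include directive.
sub resolve_include {
    my ($filename, @includestack) = @_;
    # Names starting with "/" or "./" are both treated as absolute.
    return $filename if $filename =~ m{\A\.?/};
    # Otherwise resolve against the directory of the including
    # stylesheet, or "." when included from the topmost stylesheet.
    my $dir = @includestack ? dirname($includestack[0]) : '.';
    return "$dir/$filename";
}

my $nested   = resolve_include('common.xps', 'styles/main.xps');   # 'styles/common.xps'
my $toplevel = resolve_include('common.xps');                      # './common.xps'
my $pinned   = resolve_include('./local.xps', 'styles/main.xps');  # './local.xps'
```

A subclass that supports other namespaces would replace this logic with its own lookup.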

compile()
compile(varname1, varname2,...)

Compiles the stylesheet set at new() time and returns an anonymous CODE reference. $stylesheet shall be written in the unparsed embedded dialect (i.e. ->extract($stylesheet) will be called first inside compile()).

varname1, varname2, etc. are extraneous arguments that will be made available to the stylesheet dialect as lexically scoped variables. "SYNOPSIS" shows a way to use this feature to pass the Apache handler to AxKit XPathScript stylesheets, which explains this feature better than a lengthy paragraph would do.

The return value is an opaque token that encapsulates a compiled stylesheet. It should not be used, except as the compiledstylesheet argument to new() to create new objects and amortize the compilation time. Subclasses may alter the type of the return value, but will then need to override process() accordingly.

The compile() method is idempotent. Subsequent calls to it will return the very same token, and calls to it when a compiledstylesheet argument was set at new() time will return said argument.
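
How extra variable names can become lexical variables of the compiled code is easiest to see in a stripped-down sketch. The function toy_compile below is hypothetical and much simpler than compile(): it does not isolate the code in its own package, does not define $t, and passes the extra values as the only arguments, whereas the real handler shown in "SYNOPSIS" receives the XML tree and the printer first.

```perl
use strict;
use warnings;

# Hypothetical sketch: wrap generated Perl code in an anonymous sub
# whose arguments become lexicals named after @varnames (each name is
# given with its sigil, e.g. '$r').
sub toy_compile {
    my ($perl_body, @varnames) = @_;
    my $decl = @varnames
        ? 'my (' . join(', ', @varnames) . ') = @_;'
        : '';
    my $sub = eval "sub { $decl $perl_body }";
    die "Compilation failed: $@" if $@;
    return $sub;
}

my $handler  = toy_compile('return "Hello, $r->{name}!";', '$r');
my $greeting = $handler->({ name => 'world' });   # 'Hello, world!'
```

The compiled sub can be called any number of times with different values, which is what makes caching the compiled stylesheet worthwhile.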

interpolating()
interpolating($boolean)

Gets (first call form) or sets (second form) the XPath interpolation boolean flag. If true, values set in $template->{pre} and similar may contain expressions within braces, that will be interpreted as XPath expressions and substituted in place: for example, when interpolation is on, the following code

   $t->{'link'}{pre} = '<a href="{@url}">';
   $t->{'link'}{post} = '</a>';

is enough for rendering a <link> element as an HTML hyperlink. The interpolation-less version is slightly more complex as it requires a testcode:

   $t->{'link'}{testcode} = sub {
      my ($currentnode, $t) = @_;
      my $url = findvalue('@url', $currentnode);
      $t->{pre}="<a href=\"$url\">";
      $t->{post}='</a>';
      # Render this node and its children
      return DO_SELF_AND_KIDS();
   };

Interpolation is on by default. A (now undocumented) global variable used to switch the default to off, but it should not be relied upon.
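
The substitution performed when interpolation is on can be approximated as below. This is a sketch under stated assumptions rather than the module's implementation: interpolate is a hypothetical function, and the resolver is a stand-in that looks names up in a hash, where XPathScript would evaluate the expression as XPath against the current node.

```perl
use strict;
use warnings;

# Hypothetical sketch: replace each {expr} in $string with whatever
# $resolver returns for expr.
sub interpolate {
    my ($string, $resolver) = @_;
    $string =~ s/\{(.*?)\}/$resolver->($1)/ge;
    return $string;
}

# Stand-in for an XPath lookup on the current <link> element.
my %attributes = ('@url' => 'http://axkit.org/');
my $pre = interpolate('<a href="{@url}">',
                      sub { $attributes{ $_[0] } // '' });
# $pre is now '<a href="http://axkit.org/">'
```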

print($text)

Outputs a chunk of text on behalf of the stylesheet. The default implementation is to use the first argument to "process" ($printer), which was stashed in $self->{printer} by said method. Overriding this method in a subclass provides yet another way to redirect output.

current()

This class method (e.g. XML::XPathScript->current()) returns the stylesheet object currently being applied. This can be called from anywhere within the stylesheet, except a BEGIN or END block or similar.

Utility functions

The functions below are not methods.

XML::XPath::Function::document

An XPath function made available to XPath expressions in the stylesheet, that takes one parameter (a system filename) and returns a nodeset consisting of the root node of a foreign document that is parsed from said filename. This feature can be used to process several input documents at once with one stylesheet, even if their respective DTDs are unrelated.

gen_package_name()

Generates a fresh package name in which a new stylesheet will be compiled. Never returns the same name twice.

BUGS

Due to the peculiar syntax allowed in the embedded dialect for accessing the template hash, this package is not reentrant and thus cannot transform several documents at once.

AUTHORS

Created by Matt Sergeant <matt@sergeant.org>

Improvements and feature merge with Apache::AxKit::Language::XPathScript by Yanick Champoux <yanick@babyl.dyndns.org> and Dominique Quatravaux <dom@ideax.com>

LICENSE

This is free software. You may distribute it under the same terms as Perl itself.
