NAME

Template::Extract - Extract data structure from TT2-rendered documents

VERSION

This document describes version 0.25 of Template::Extract, released September 6, 2003.

SYNOPSIS

    use Template::Extract;
    use Data::Dumper;

    my $obj = Template::Extract->new;
    my $template = << '.';
    <ul>[% FOREACH record %]
    <li><A HREF="[% url %]">[% title %]</A>: [% rate %] - [% comment %].
    [% ... %]
    [% END %]</ul>
    .

    my $document = << '.';
    <html><head><title>Great links</title></head><body>
    <ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
    this text is ignored.</li>
    <li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah.
    this text is ignored, too.</li></ul>
    .

    print Data::Dumper::Dumper(
        $obj->extract($template, $document)
    );

DESCRIPTION

This module is a subclass of the Template Toolkit, with added template extraction functionality. Given a rendered document and the template that produced it, it recovers the original data structure, effectively reversing the process method.

This module is considered experimental. If you just wish to extract RSS-type information out of an HTML document, WWW::SherlockSearch may be a more robust solution.

METHODS

extract($template, $document, \%values)

This method takes three arguments: the template string, or a reference to it; a document string to match against; and an optional hash reference to store the extracted values into.

Extraction works by transforming the output of Template::Parser into a highly esoteric regular expression, which uses the (?{...}) construct to insert matched parameters into the hash reference.
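To make the (?{...}) mechanism concrete, here is a minimal sketch in plain Perl. The pattern is written by hand for illustration and is not the expression this module generates; the %found hash and the sample document are assumptions made for the example. Each embedded code block copies the most recently closed capture group ($^N) into the hash as the match proceeds.

```perl
use strict;
use warnings;

# Hypothetical storage for the example. A package variable is used
# because closures inside (?{...}) were unreliable before Perl 5.18.
our %found;

my $document = '<a href="http://slashdot.org">News for nerds.</a>';

# Hand-written pattern, for illustration only. After each capture
# group, an embedded code block stores the group that just closed.
my $matched = $document =~ m{
    <a\ href="  ([^"]*)  (?{ $found{url}   = $^N })
    ">          ([^<]*)  (?{ $found{title} = $^N })
    </a>
}x;

print "$_ = $found{$_}\n" for sort keys %found;
```

Note that embedded code blocks run while the engine is still matching, so a block can fire on a path that is later abandoned by backtracking. The character classes above leave no room for that, but patterns built on .*? can; perlre describes using local inside the block so that assignments are undone on backtracking.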

The special [% ... %] directive is taken as /.*?/s in regex terms, i.e. "ignore everything (as short as possible) between this identifier and the next one". For backward compatibility reasons, [% _ %] and [% __ %] are also accepted.
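The effect of the /s modifier can be seen in a small plain-Perl sketch. The regular expression below is a hand-written stand-in for a template fragment such as "[% rate %] - [% comment %].[% ... %]</li>"; it assumes that ordinary variables behave like non-greedy captures, which is an assumption of this example rather than a description of the generated expression.

```perl
use strict;
use warnings;

my $document = <<'END';
<li><a href="http://slashdot.org">News for nerds.</a>: A+ - nice.
this text is ignored.</li>
END

# [% ... %] corresponds to .*? under /s, so "." also matches the
# newline between "nice." and the ignored text.
my ($rate, $comment) = $document =~ m{</a>: (.*?) - (.*?)\..*?</li>}s;

# The same pattern without /s cannot cross the newline and fails.
my $without_s = ($document =~ m{</a>: (.*?) - (.*?)\..*?</li>}) ? 1 : 0;

print "rate=$rate comment=$comment\n";
print "matches without /s: $without_s\n";
```

Because the match is as short as possible, the ignored span ends at the first place where the rest of the pattern can match, not the last.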

You may set $Template::Extract::DEBUG to a true value to display generated regular expressions.

CAVEATS

Currently, the extract method only handles [% GET %], [% SET %] and [% FOREACH %] directives, because [% WHILE %], [% CALL %] and [% SWITCH %] blocks are next to impossible to extract correctly.

There is no internal support for differing PRE_CHOMP and POST_CHOMP settings, so extraction may silently fail, or match in the wrong place, when whitespace around directives differs from what the template implies.

NOTES

This module's companion class, Template::Generate, is still in early experimental stages; it can take data structures and rendered documents, then automagically generate templates to do the transformation. If you are into related research, please mail any ideas to me.

SEE ALSO

Template, Template::Generate, Template::Parser

WWW::SherlockSearch

AUTHORS

Autrijus Tang <autrijus@autrijus.org>

COPYRIGHT

Copyright 2001, 2002, 2003 by Autrijus Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html