NAME

Template::Extract - Extract data structure from TT2-rendered documents

VERSION

This document describes version 0.11 of Template::Extract, released August 31, 2003.

SYNOPSIS

    use Template::Extract;
    use Data::Dumper;

    my $obj = Template::Extract->new;
    my $template = << '.';
    <ul>[% FOREACH record %]
    <li><A HREF="[% url %]">[% title %]</A>: [% rate %] - [% comment %].
    [% ... %]
    [% END %]</ul>
    .

    my $document = << '.';
    <html><head><title>Great links</title></head><body>
    <ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
    this text is ignored.</li>
    <li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah.
    this text is ignored, too.</li></ul>
    .

    print Data::Dumper::Dumper(
        $obj->extract($template, $document)
    );

DESCRIPTION

This module is a subclass of the Template Toolkit with added template extraction functionality. Given a rendered document and the template that produced it, it recovers the original data structure, effectively reversing the process method.

This module is considered experimental. If you just wish to extract RSS-type information out of an HTML document, WWW::SherlockSearch may be a more robust solution.

METHODS

$obj->extract($template, $document, \%values)

This method takes three arguments: the template string, or a reference to it; a document string to match against; and an optional hash reference to store the extracted values into.
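The three-argument form might be used roughly as follows. This is only a sketch: the one-line template and document are made-up values, $status is a helper introduced for the illustration, and the module is loaded inside an eval so the snippet still runs where Template::Extract is not installed.

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only to keep the sketch short.
my $template = '<b>[% name %]</b>';
my $document = '<b>World</b>';

my %values;    # the optional third argument: extracted values are stored here
my $status;

# Guarded load, so the sketch degrades gracefully without the module.
if (eval { require Template::Extract; 1 }) {
    my $obj = Template::Extract->new;
    $obj->extract($template, $document, \%values);
    $status = 'extracted keys: ' . join(', ', sort keys %values);
}
else {
    $status = 'Template::Extract is not installed';
}

print "$status\n";
```

Passing your own hash reference lets the caller keep the hash that receives the values, instead of relying only on the return value shown in the SYNOPSIS.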

Extraction is done by transforming the result from Template::Parser to a highly esoteric regular expression, which utilizes the (?{...}) construct to insert matched parameters into the hash reference.
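To give a feel for the (?{...}) technique, here is a small hand-written pattern. It is not the expression this module generates; the %values hash, the sample document and the pattern itself are assumptions made for illustration, and $^N requires perl 5.8 or later.

```perl
use strict;
use warnings;

# Illustrative only: a hand-written pattern, not one produced by the module.
# %values is a package variable so the embedded code blocks can reach it
# on older perls as well.
our %values;

my $document = '<A HREF="http://slashdot.org">News for nerds.</A>';

# Each embedded code block runs right after the capture group before it
# closes; at that point $^N holds the text of that group.
my $regex = qr{
    <A\ HREF="
    (.*?) (?{ $values{url}   = $^N })
    ">
    (.*?) (?{ $values{title} = $^N })
    </A>
}xs;

if ($document =~ $regex) {
    print "$_ = $values{$_}\n" for sort keys %values;
}
```

The code blocks may run several times while the engine backtracks. Because the successful path is the last one tried, the values left in the hash after a match are the ones from that path.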

The special [% ... %] directive is taken as /.*?/s in regex terms, i.e. "ignore everything (as short as possible) between this identifier and the next one". For backward compatibility, [% _ %] and [% __ %] are also accepted (but deprecated).
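The effect of /.*?/s can be seen with plain regular expressions, independent of this module. In the sketch below the sample document is made up, and the greedy variant is included only for contrast.

```perl
use strict;
use warnings;

# Made-up document with a newline inside the part to be skipped.
my $document = "<li>A+ - nice.\nthis text is ignored.</li>\n<li>Z! - yeah.</li>";

# /.*?/s skips as little as possible, and /s lets the dot cross newlines,
# so the skipped text ends at the first </li>.
my ($shortest) = $document =~ m{nice\.(.*?)</li>}s;

# A greedy .* is shown only for contrast; it runs on to the last </li>.
my ($greedy) = $document =~ m{nice\.(.*)</li>}s;

printf "non-greedy skipped %d characters\n", length $shortest;
printf "greedy skipped %d characters\n",     length $greedy;
```

The non-greedy form stops at the nearest following literal, which matches the "as short as possible" behaviour described above.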

You may set $Template::Extract::DEBUG to a true value to display generated regular expressions.

CAVEATS

Currently, the extract method only handles [% GET %], [% SET %] and [% FOREACH %] directives, because [% WHILE %], [% CALL %] and [% SWITCH %] blocks are next to impossible to extract correctly.

With perl v5.7.1 or earlier, nested capturing may sometimes suffer from off-by-one errors. Later perl versions support the $^N variable and are free from such errors.

There is no internal support for different PRE_CHOMP and POST_CHOMP settings, so extraction may fail silently, or match in the wrong places, when a document was rendered with non-default chomping.

NOTES

This module's companion class, Template::Generate, is still missing; it's supposed to take a data structure and the preferred rendering, and automagically generate a template to do the transformation. If you are into related research, please mail any ideas to me.

SEE ALSO

Template, Template::Parser, WWW::SherlockSearch

AUTHORS

Autrijus Tang <autrijus@autrijus.org>

COPYRIGHT

Copyright 2001, 2002, 2003 by Autrijus Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html