NAME
Template::Extract - Extract data structure from TT2-rendered documents
VERSION
This document describes version 0.11 of Template::Extract, released August 31, 2003.
SYNOPSIS
use Template::Extract;
use Data::Dumper;
my $obj = Template::Extract->new;
my $template = << '.';
<ul>[% FOREACH record %]
<li><A HREF="[% url %]">[% title %]</A>: [% rate %] - [% comment %].
[% ... %]
[% END %]</ul>
.
my $document = << '.';
<html><head><title>Great links</title></head><body>
<ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
this text is ignored.</li>
<li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah.
this text is ignored, too.</li></ul>
.
print Data::Dumper::Dumper(
$obj->extract($template, $document)
);
DESCRIPTION
This module is a subclass of the Template Toolkit, with added template extraction functionality. It can take a rendered document together with its template and recover the original data structure, effectively reversing the process function.
This module is considered experimental. If you just wish to extract RSS-type information out of an HTML document, WWW::SherlockSearch may be a more robust solution.
METHODS
$obj->extract($template, $document, \%values)
This method takes three arguments: the template string, or a reference to it; a document string to match against; and an optional hash reference to store the extracted values into.
Extraction is done by transforming the result from Template::Parser to a highly esoteric regular expression, which utilizes the (?{...}) construct to insert matched parameters into the hash reference.
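As a rough illustration of the (?{...}) technique mentioned above, the sketch below uses a hand-written pattern whose embedded code blocks copy captured text into a hash while the match is in progress. The pattern, the %values hash and its keys are assumptions made up for this example; the regular expression the module actually generates is not shown here and is likely far more involved.

```perl
use strict;
use warnings;

# Hypothetical, hand-written pattern: not the regex the module generates.
my %values;
my $document = '<A HREF="http://slashdot.org">News for nerds.</A>';

if ($document =~ m{
        <A\ HREF="  ([^"]*)  (?{ $values{url}   = $1 })   # runs once group 1 has closed
        ">          (.*?)    (?{ $values{title} = $2 })   # may run several times while backtracking
        </A>
    }x) {
    print "$_ = $values{$_}\n" for sort keys %values;
}
```

Because the code blocks run during matching, a block can fire on paths that are later abandoned by backtracking; in this small sketch the last assignment is the one made on the successful path, but a more careful version would use local inside the blocks, as perlre suggests.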
The special [% ... %] directive is taken as /.*?/s in regex terms, i.e. "ignore everything (as short as possible) between this identifier and the next one". For backward compatibility, [% _ %] and [% __ %] are also accepted (but deprecated).
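To show what the /.*?/s semantics mean in practice, here is a small sketch in plain Perl. The sample document and the pattern are invented for illustration; they only mimic the shape of the SYNOPSIS template and are not taken from the module.

```perl
use strict;
use warnings;

# Invented sample data: the first item has ignored text spanning two lines.
my $document = "A+ - nice.\nthis text\nis ignored.</li>\n"
             . "Z! - yeah.\nthis text is ignored, too.</li>\n";

# .*? with /s skips as little as possible, newlines included,
# until the next literal part of the pattern (</li>) is found.
my @comments;
while ($document =~ m{ - (.*?)\.\n.*?</li>}gs) {
    push @comments, $1;
}

# Without /s, "." cannot cross a newline, so the first item no longer matches.
my $without_s = () = $document =~ m{ - (.*?)\.\n.*?</li>}g;

print "comments: @comments\n";
print "matches without /s: $without_s\n";
```

The non-greedy quantifier is what keeps each match inside a single list item; a greedy .* would run on to the last </li> in the document.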
You may set $Template::Extract::DEBUG to a true value to display generated regular expressions.
CAVEATS
Currently, the extract method only handles [% GET %], [% SET %] and [% FOREACH %] directives, because [% WHILE %], [% CALL %] and [% SWITCH %] blocks are next to impossible to extract correctly.
With perl v5.7.1 or earlier, nested capturing may sometimes suffer from off-by-one errors. Later perl versions support the $^N variable and are free from such errors.
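The following sketch illustrates only what $^N itself does: it holds the text of the most recently closed capture group, so a code block need not know that group's number even when groups are nested. How the module uses $^N internally is not described here; the pattern and the %values keys below are assumptions for the example.

```perl
use strict;
use warnings;

my %values;
my $line = 'A+ - nice.';

# Group 2 is nested inside group 1. When the first code block runs, the
# group that closed most recently is the outer one, so $^N is "A+".
if ($line =~ m{
        ((\w)[+!])  (?{ $values{rate}    = $^N })
        \s-\s
        (\w+)       (?{ $values{comment} = $^N })
        \.
    }x) {
    print "rate = $values{rate}, comment = $values{comment}\n";
}
```

Without $^N, each code block would have to refer to a numbered variable such as $1 or $3, and every added nested group shifts those numbers.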
There is no internal support for different PRE_CHOMP and POST_CHOMP settings, so extraction may silently fail or match at the wrong places.
NOTES
This module's companion class, Template::Generate, is still missing; it's supposed to take a data structure and the preferred rendering, and automagically generate a template to do the transformation. If you are into related research, please mail any ideas to me.
SEE ALSO
Template, Template::Parser, WWW::SherlockSearch
AUTHORS
Autrijus Tang <autrijus@autrijus.org>
COPYRIGHT
Copyright 2001, 2002, 2003 by Autrijus Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.