Template::Extract - Extract data structure from TT2-rendered documents
This document describes version 0.25 of Template::Extract, released September 6, 2003.
    use Template::Extract;
    use Data::Dumper;

    my $obj = Template::Extract->new;
    my $template = << '.';
    <ul>[% FOREACH record %]
    <li><A HREF="[% url %]">[% title %]</A>: [% rate %] - [% comment %].
    [% ... %]
    [% END %]</ul>
    .

    my $document = << '.';
    <html><head><title>Great links</title></head><body>
    <ul><li><A HREF="http://slashdot.org">News for nerds.</A>: A+ - nice.
    this text is ignored.</li>
    <li><A HREF="http://microsoft.com">Where do you want...</A>: Z! - yeah.
    this text is ignored, too.</li></ul>
    .

    print Data::Dumper::Dumper(
        $obj->extract($template, $document)
    );
This module is a subclass of the Template Toolkit, with added template extraction functionality. Given a rendered document and the template that produced it, it recovers the original data structure, effectively reversing the process method.
This module is considered experimental. If you just wish to extract RSS-type information out of an HTML document, WWW::SherlockSearch may be a more robust solution.
The extract method takes three arguments: the template string, or a reference to it; a document string to match against; and an optional hash reference to store the extracted values into.
Extraction is done by transforming the result from Template::Parser to a highly esoteric regular expression, which utilizes the (?{...}) construct to insert matched parameters into the hash reference.
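The mechanism can be illustrated with a minimal hand-written sketch in plain Perl. This is not the regular expression the module generates; the pattern, the %values hash and the sample document are assumptions chosen for illustration. Each capture group is followed by a (?{...}) block that copies the text just matched (available as $^N) into the hash.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash that receives the extracted values.
our %values;

my $document = '<A HREF="http://slashdot.org">News for nerds.</A>';

# Hand-written equivalent of the template fragment
#   <A HREF="[% url %]">[% title %]</A>
# Under /x, whitespace in the pattern is ignored, so the literal space
# is escaped. $^N holds the text of the most recently closed group.
my $regex = qr{
    <A\ HREF="
    (.*?) (?{ $values{url}   = $^N })
    ">
    (.*?) (?{ $values{title} = $^N })
    </A>
}xs;

if ($document =~ $regex) {
    print "$_ => $values{$_}\n" for sort keys %values;
}
```

Because the code blocks run each time the regex engine passes them, a block may fire several times while the engine backtracks; in this sketch the last assignment before a successful match is the one that remains in the hash.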
The special [% ... %] directive is taken as /.*?/s in regex terms, i.e. "ignore everything (as short as possible) between this identifier and the next one". For backward compatibility reasons, [% _ %] and [% __ %] are also accepted.
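As a rough illustration in plain Perl (a hand-written pattern, not the module's generated one), the fragment "[% comment %].[% ... %]</li>" behaves much like a capture followed by a non-greedy skip. The sample string and the $comment variable are assumptions for this sketch.

```perl
use strict;
use warnings;

# Sample text with a line break inside the part that should be ignored.
my $document = "nice. this text\nis ignored.</li>";

# (.*?)  stands in for [% comment %]
# \.     is the literal full stop from the template
# .*?    stands in for [% ... %]; /s lets "." match the newline as well
my ($comment) = $document =~ m{(.*?)\..*?</li>}s;

print "comment: $comment\n";
```

The /s modifier matters here: without it, "." cannot cross the line break, so the skipped region could not span multiple lines.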
You may set $Template::Extract::DEBUG to a true value to display generated regular expressions.
Currently, the extract method only handles [% GET %], [% SET %] and [% FOREACH %] directives, because [% WHILE %], [% CALL %] and [% SWITCH %] blocks are next to impossible to extract correctly.
The module has no internal support for differing PRE_CHOMP and POST_CHOMP settings, so extraction can fail silently, and in unexpected places, when the document was rendered with chomp settings the module does not assume.
This module's companion class, Template::Generate, is still in the early experimental stages; it takes data structures and rendered documents, then automagically generates templates to do the transformation. If you are into related research, please mail any ideas to me.
See also: Template, Template::Generate, Template::Parser, WWW::SherlockSearch
Autrijus Tang <autrijus@autrijus.org>
Copyright 2001, 2002, 2003 by Autrijus Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html