OurNet::Site - Extract web pages via templates
    use LWP::Simple;
    use OurNet::Site;

    my ($query, $hits) = ('autrijus', 10);
    my $found;

    # Create a bot
    my $bot = OurNet::Site->new('google');

    # Parse the result fetched by LWP::Simple
    $bot->callme($self, 0, get($bot->geturl($query, $hits)), \&callmeback);

    print '*** ' . ($found ? $found : 'No') . ' match(es) found.';

    # Callback routine
    sub callmeback {
        my ($self, $himself) = @_;

        foreach my $entry (@{$himself->{response}}) {
            if ($entry->{url}) {
                print "*** [$entry->{title}]" .
                             " ($entry->{score})" .
                           " - [$entry->{id}]\n" .
                      "    URL: [$entry->{url}]\n" .
                      "    $entry->{preview}\n";
                $found++;
                delete($entry->{url});
            }
        }
    }
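The callback pattern in the synopsis can be tried without any network access. The following is a minimal sketch, not part of OurNet::Site: it assumes only what the synopsis shows, namely that the callback's second argument holds an array reference of entry hashes under the `response` key. The name `collect_entries`, the `$mock` structure, and all values in it are made up for illustration.

```perl
use strict;
use warnings;

my $found = 0;
my @lines;

# Same shape as the synopsis callback: the second argument is assumed to
# hold an array reference of entry hashes under the 'response' key.
sub collect_entries {
    my ($self, $himself) = @_;

    foreach my $entry (@{ $himself->{response} }) {
        next unless $entry->{url};
        push @lines, "[$entry->{title}] ($entry->{score}) $entry->{url}";
        $found++;
        # Removing the URL means a repeated call skips this entry.
        delete $entry->{url};
    }
    return;
}

# Hand-built stand-in for parsed results (hypothetical values).
my $mock = {
    response => [
        { id => 1, title => 'First', score => '90%',
          url => 'http://example.com/a', preview => '...' },
        { id => 2, title => 'No URL', score => '10%' },
    ],
};

collect_entries(undef, $mock);
collect_entries(undef, $mock);    # second call: nothing new is counted
print "$found match(es) found.\n";
```

Deleting `url` after reporting an entry, as the synopsis does, keeps the count stable if the callback sees the same entry list again; whether the module actually invokes the callback repeatedly is an assumption here.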
This module emulates a typical search engine by reading an XML script that defines the engine's aspects, and parses results on the fly accordingly.
Note that it also takes Inforia Quest .fmt scripts, available at http://www.inforian.com/. The author of course cannot support this usage.
As of v1.52, Site.pm also accepts Template Toolkit templates with the extension '.tt2' as site descriptors, provided that each template contains at least one [% FOREACH entry %] block and a corresponding [% SET url.start %] directive.
Note that tt2 support is *highly* experimental and should not be relied upon until a more stable release comes.
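The two requirements stated above for a '.tt2' descriptor can be checked before handing a template to the module. This is a rough sketch under stated assumptions: `missing_directives` is a hypothetical helper, not part of OurNet::Site, and the sample template text is made up purely to exercise the check; it is not a working site descriptor.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of OurNet::Site): return the names of the
# required directives that do not appear in the given template text.
sub missing_directives {
    my ($template) = @_;

    my %required = (
        'FOREACH entry' => qr/\[%-?\s*FOREACH\s+entry\s*-?%\]/,
        'SET url.start' => qr/\[%-?\s*SET\s+url\.start\b/,
    );

    my @missing = grep { $template !~ $required{$_} } sort keys %required;
    return @missing;
}

# Made-up template text, used only to exercise the check.
my $template = <<'END_TT2';
[% SET url.start = 'http://search.example/?q=' %]
[% FOREACH entry %]
<a href="[% entry.url %]">[% entry.title %]</a>
[% END %]
END_TT2

my @missing = missing_directives($template);
print @missing
    ? "Missing: @missing\n"
    : "Both required directives are present.\n";
```

A check like this only confirms that the directives are present; it says nothing about whether the module can actually parse results with the template.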
Known bugs: probably lots. Most notably, the 'More' facility is lacking. There is also no template-generating ability. This is a must, but I couldn't find enough motivation to do it. Maybe you could.
Currently, tt2 does not (quite) support incremental parsing in conjunction with OurNet::Query.
See also: OurNet::Query
Autrijus Tang <autrijus@autrijus.org>
Copyright 2001 by Autrijus Tang <autrijus@autrijus.org>.
All rights reserved. You can redistribute and/or modify this module under the same terms as Perl itself.