OurNet::Site - Extract web pages via templates
    use LWP::Simple;
    use OurNet::Site;

    my ($query, $hits) = ('autrijus', 10);
    my $found;

    # Create a bot
    $bot = OurNet::Site->new('google');

    # Parse the result got from LWP::Simple
    $bot->callme($self, 0, get($bot->geturl($query, $hits)), \&callmeback);

    print '*** ' . ($found ? $found : 'No') . ' match(es) found.';

    # Callback routine
    sub callmeback {
        my ($self, $himself) = @_;

        foreach my $entry (@{$himself->{response}}) {
            if ($entry->{url}) {
                print "*** [$entry->{title}]" .
                      " ($entry->{score})" .
                      " - [$entry->{id}]\n" .
                      "    URL: [$entry->{url}]\n" .
                      "    $entry->{preview}\n";
                $found++;
                delete($entry->{url});
            }
        }
    }
This module parses the results returned by a typical search engine. It reads a 'site descriptor' file that defines the engine's result format, then parses results on the fly accordingly.
Since v1.52, OurNet::Site uses site descriptors in Template Toolkit format, with the extension '.tt2', by default. The template should contain at least one [% FOREACH entry %] block, and a corresponding [% SET url.start %].
[% FOREACH entry %]
[% SET url.start %]
Alternatively, you can use a special XML format for site descriptors. See the .xml files in the Site directory for examples.
Finally, it also accepts Inforia Quest .fmt-style site descriptors, available at http://www.pasia.com/. The author of course cannot support this usage.
Note that tt2 support is *highly* experimental and should not be relied upon until a more stable release comes.
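To make the idea of template-driven extraction concrete, here is a minimal, self-contained sketch. It is not OurNet::Site's implementation and does not use its API: the helper names compile_template and extract_entries, the sample template, and the sample HTML are all hypothetical. It only assumes that a descriptor contains a [% FOREACH entry %] ... [% END %] block whose [% name %] placeholders mark the fields to capture, and it turns that block into a regex with named captures.

```perl
use strict;
use warnings;

# Hypothetical mini-descriptor (an assumption for illustration, not a real
# OurNet::Site .tt2 file): one FOREACH block with [% name %] placeholders.
my $template = <<'TT';
[% FOREACH entry %]<li><a href="[% url %]">[% title %]</a> ([% score %])</li>[% END %]
TT

# Compile the FOREACH body into a regex: literal text is escaped,
# each placeholder becomes a non-greedy named capture.
sub compile_template {
    my ($tt) = @_;
    my ($body) = $tt =~ /\[%\s*FOREACH\s+entry\s*%\](.*?)\[%\s*END\s*%\]/s
        or die "template has no FOREACH entry block\n";

    my $pattern = '';
    for my $piece (split /(\[%\s*\w+\s*%\])/, $body) {
        if ($piece =~ /^\[%\s*(\w+)\s*%\]$/) {
            $pattern .= "(?<$1>.*?)";        # placeholder: capture a field
        }
        else {
            $pattern .= quotemeta($piece);   # literal markup: match exactly
        }
    }
    return qr/$pattern/s;
}

# Apply the compiled regex repeatedly; each match yields one entry hash.
sub extract_entries {
    my ($tt, $html) = @_;
    my $re = compile_template($tt);
    my @entries;
    while ($html =~ /$re/g) {
        push @entries, +{ %+ };              # copy the named captures
    }
    return \@entries;
}

# Usage with made-up search results.
my $html = '<ul><li><a href="http://a.example/">First</a> (90)</li>'
         . '<li><a href="http://b.example/">Second</a> (75)</li></ul>';

for my $entry (@{ extract_entries($template, $html) }) {
    print "*** [$entry->{title}] ($entry->{score})\n",
          "    URL: [$entry->{url}]\n";
}
```

A real descriptor also has to describe how the query URL is built, which this sketch leaves out entirely.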
Probably lots. Most notably, the 'More' facility is lacking. Also, there are no template-generating abilities. This is a must, but I couldn't find enough motivation to do it. Maybe you could.
Currently, tt2 does not (quite) support incremental parsing in conjunction with OurNet::Query.
Also, the XML format for site descriptors is not well-formed, let alone described by a complete XML Schema or DTD.
OurNet::Template, OurNet::Query
Autrijus Tang <autrijus@autrijus.org>
Copyright 2001 by Autrijus Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html