OurNet::Query - Scriptable queries with template extraction
    use OurNet::Query;

    # Set query parameters
    my ($query, $hits) = ('autrijus', 10);
    my @sites = ('google', 'google'); # XXX: write more templates!
    my %found;

    # Generate a new Query object
    my $bot = OurNet::Query->new($query, $hits, @sites);

    # Perform a query
    my $found = $bot->begin(\&callback, 30); # Timeout after 30 seconds

    print '*** ' . ($found ? $found : 'No') . ' match(es) found.';

    sub callback {
        my %entry = @_;
        my $entry = \%entry;

        unless ($found{$entry{url}}) {
            print "*** [$entry->{title}]" .
                  " ($entry->{score})" .
                  " - [$entry->{id}]\n" .
                  " URL: [$entry->{url}]\n";
        }

        $found{$entry{url}}++;
    }
This module provides an easy interface for sending queries to multiple internet services at once and wrapping the results in your own format. The results are processed on the fly and returned via callback functions.
Its interface resembles that of WWW::Search, but it is implemented in a different fashion. While WWW::Search relies on additional subclasses to parse returned results, OurNet::Query uses a site descriptor for each search engine, which makes it much easier to add new backends.
Site descriptors may be written in XML, Template Toolkit format, or the .fmt format from the commercial Inforia Quest product.
The only confirmed, working site descriptor currently is google.tt2. The majority of *.xml descriptors are outdated, and need volunteers to either correct them, or convert them to .tt2 format.
This package is supposed to magically turn your web pages built with Template Toolkit into web services overnight, using diff-based induction heuristics; but this is not happening yet. Stay tuned.
There should be instructions on how to write templates in the various formats.
Most Query Toolkit components are independently useful; they rely on several front-end interfaces to glue themselves together.
The indexing module MUST implement an indexing mechanism suitable for handling variable-byte encoding charsets, e.g. big-5 or utf8. Its index file SHOULD NOT require the original data to be present, nor exceed the original data size on average.
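The requirement above does not say which indexing technique is used. As a minimal sketch of one possible approach, the following indexes documents by overlapping character bigrams after decoding the raw bytes, so that a multi-byte character is never split in the middle. The names build_index and search_index are hypothetical and are not part of the toolkit; the sketch also makes no attempt to meet the size constraint.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sketch: build an inverted index keyed by character bigrams.
# %docs maps a document id to its raw bytes in the given encoding.
sub build_index {
    my ($encoding, %docs) = @_;
    my %index;
    for my $id (keys %docs) {
        # Decode first, so that substr() works on characters, not bytes
        my $text = decode($encoding, $docs{$id});
        for my $i (0 .. length($text) - 2) {
            $index{ substr($text, $i, 2) }{$id}++;
        }
    }
    return \%index;
}

# Return the sorted ids of documents that contain every bigram of the query.
sub search_index {
    my ($index, $encoding, $query) = @_;
    my $text = decode($encoding, $query);
    my (%hits, $needed);
    $needed = 0;
    for my $i (0 .. length($text) - 2) {
        $needed++;
        $hits{$_}++ for keys %{ $index->{ substr($text, $i, 2) } || {} };
    }
    my @ids = sort grep { $hits{$_} == $needed } keys %hits;
    return @ids;
}

my $index = build_index('UTF-8',
    a => "\xE4\xB8\xAD\xE6\x96\x87",   # two CJK characters, three bytes each
    b => "\xE6\x96\x87\xE4\xB8\xAD",   # the same two characters, reversed
);
print join(',', search_index($index, 'UTF-8', "\xE4\xB8\xAD\xE6\x96\x87")), "\n";
```

Matching on bigrams means a hit only says that every adjacent pair of query characters occurs somewhere in the document; a real implementation would presumably rank or verify the candidates.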
The interactive query module MUST accept context-free queries against any indexed database generated by the Search Engine, and provide feedback based on the entries contained within. It MUST develop a heuristic to accumulate user input, and build connections between entries based on relevancy.
This component MUST support the Template(3) Toolkit format, and MAY support additional template formats. It MUST be capable of taking a document and the original template used to generate it, and producing the original parameter list.
All simple assignment and loop directives MUST be supported; it SHOULD also accept nested loops and structure elements.
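To illustrate what recovering the parameter list means, here is a minimal sketch that handles only simple [% name %] placeholders by compiling the template into a regular expression. The function extract_params is hypothetical and is not the component's actual interface; loop and nested directives, which the requirement above also covers, are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: given a template with [% name %] placeholders and a
# document generated from it, return a hash reference of the parameters.
sub extract_params {
    my ($template, $document) = @_;
    my @names;
    my $pattern = '';

    # split with a capture group yields literal text and placeholder
    # names alternately: literal, name, literal, name, ...
    my @parts = split /\[%\s*(\w+)\s*%\]/, $template;
    while (@parts) {
        $pattern .= quotemeta shift @parts;   # literal text, escaped
        if (@parts) {
            push @names, shift @parts;        # placeholder name
            $pattern .= '(.*?)';              # capture its value
        }
    }

    # Return nothing if the document does not fit the template
    my @values = $document =~ /\A$pattern\z/s or return;

    my %params;
    @params{@names} = @values;
    return \%params;
}

my $params = extract_params(
    '<a href="[% url %]">[% title %]</a>',
    '<a href="http://example.org/">Example</a>',
);
print "$params->{title} => $params->{url}\n";
```

Because each placeholder becomes a non-greedy capture, two placeholders with no literal text between them cannot be told apart; that ambiguity is inherent in the input rather than a flaw of this particular sketch.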
This includes a collection of oft-used web sites, akin to the WWW::Search or Inforia Quest collection. It SHOULD also support basic validation and variable interpolation within the descriptors.
This module MUST be able to generate the original template, based on two or more distinct outputs. It SHOULD operate without any prompt of original structures, but MAY draw on such information to increase its accuracy.
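As a rough illustration of inducing a template from two outputs, the sketch below keeps the longest common prefix and suffix of two strings and replaces the differing middle with a single placeholder. This is a deliberately naive stand-in, not the diff-based heuristic the toolkit plans to use; induce_template and its optional variable name argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a one-variable template from two outputs.
sub induce_template {
    my ($first, $second, $name) = @_;
    $name = 'var' unless defined $name;

    my $max = length($first) < length($second)
            ? length($first) : length($second);

    # Length of the common prefix
    my $prefix = 0;
    $prefix++ while $prefix < $max
        && substr($first, $prefix, 1) eq substr($second, $prefix, 1);

    # Length of the common suffix, not overlapping the prefix
    my $suffix = 0;
    $suffix++ while $suffix < $max - $prefix
        && substr($first, -1 - $suffix, 1) eq substr($second, -1 - $suffix, 1);

    return substr($first, 0, $prefix)
         . "[% $name %]"
         . substr($first, length($first) - $suffix);
}

print induce_template('Hello, Alice!', 'Hello, Bob!', 'name'), "\n";
```

The sketch breaks down as soon as the variable values happen to share leading or trailing characters, or when the outputs differ in more than one place, which is why more than two outputs and a proper diff are needed in practice.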
All above components MUST come with at least one command-line utility, capable of exporting most of their functions to the normal user. The utilities SHOULD assume a common look-and-feel.
The Query Toolkit Manual MUST contain a tutorial, an overview of functions, and guides on how to embed Query components into existing programs.
This milestone represents the raw, unconnected state of all tools. It provides all basic functionality except for template generation, yet offers only fzindex / fzquery as useful user-accessible interfaces.
    FuzzyIndex   big-5 & latin-1 support
    ChatBot      automatic building of default database
    T::Extract   template toolkit support; nested fetch
    Site         google (as proof-of-concept)
    bin/*        all above interfaces
    pod/*        overview of functions
This milestone aims to export a consistent interface to other developers, by filling in the missing descriptors and documentation.
    FuzzyIndex   gb-2312 support
    Site         all major search engines and news sources
    T::Generate  simple diff-based heuristic framework
    bin/*        a parallel, configurable sitequery coupled with fzindex
    pod/*        embed-howto, including win32 COM+ port
This milestone will be the first feature-complete release of Query Toolkit, capable of being used in more diverse environments.
OurNet::Site
Autrijus Tang <autrijus@autrijus.org>
Copyright 2001 by Autrijus Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html