NAME

OurNet::Query - Scriptable queries with template extraction

SYNOPSIS

    use OurNet::Query;

    # Set query parameters
    my ($query, $hits) = ('autrijus', 10);
    my @sites = ('google', 'google'); # XXX: write more templates!
    my %found;

    # Generate a new Query object
    my $bot = OurNet::Query->new($query, $hits, @sites);

    # Perform a query
    my $found = $bot->begin(\&callback, 30); # Timeout after 30 seconds

    print '*** ' . ($found ? $found : 'No') . " match(es) found.\n";

    sub callback {
        my %entry = @_;
        my $entry = \%entry;

        unless ($found{$entry{url}}) {
            print "*** [$entry->{title}]"
                . " ($entry->{score})"
                . " - [$entry->{id}]\n"
                . "    URL: [$entry->{url}]\n";
        }

        $found{$entry{url}}++;
    }

DESCRIPTION

This module provides an easy interface for querying multiple Internet services at once and wrapping their results in your own format. Results are processed on the fly and returned through callback functions.

Its interface resembles that of WWW::Search, but is implemented differently. While WWW::Search relies on additional subclasses to parse returned results, OurNet::Query uses a site descriptor for each search engine, which makes it much easier to add new backends.

Site descriptors may be written in XML, Template Toolkit format, or the .fmt format from the commercial Inforia Quest product.

CAVEATS

Currently, the only site descriptor confirmed to work is google.tt2. Most of the *.xml descriptors are outdated and need volunteers to either correct them or convert them to .tt2 format.

This package is supposed to magically turn your web pages built with Template Toolkit into web services overnight, using diff-based induction heuristics, but this is not happening yet. Stay tuned.

There should be instructions on how to write templates in the various formats.

COMPONENTS

Most Query Toolkit components are independently useful; they rely on several front-end interfaces to glue themselves together.

Full-Text Search Engine (FuzzyIndex)

The indexing module MUST implement an indexing mechanism suitable for handling variable-width character encodings, e.g. Big5 or UTF-8. Its index file SHOULD NOT require the original data to be present, nor exceed the original data size on average.
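To make the requirement concrete, here is a minimal sketch of an index keyed on pairs of adjacent characters rather than bytes, so that multi-byte text is never split in the middle of a character. It is an illustration only and not FuzzyIndex's actual algorithm or file format; the names build_index and search are hypothetical, and the index is an in-memory hash that stores only bigrams and document ids, not the original text.

```perl
use strict;
use warnings;

# Hypothetical helper: map every pair of adjacent characters to the ids
# of the documents that contain it. Input strings must already be
# decoded character strings, so split // yields characters, not bytes.
sub build_index {
    my ($docs) = @_;                     # { id => character string }
    my %index;
    for my $id (keys %$docs) {
        my @chars = split //, $docs->{$id};
        for my $i (0 .. $#chars - 1) {
            $index{ $chars[$i] . $chars[$i + 1] }{$id} = 1;
        }
    }
    return \%index;
}

# Hypothetical helper: score each document by the number of query
# bigrams it contains. Only the index is consulted, never the documents.
sub search {
    my ($index, $query) = @_;
    my @chars = split //, $query;
    my %score;
    for my $i (0 .. $#chars - 1) {
        my $hits = $index->{ $chars[$i] . $chars[$i + 1] } or next;
        $score{$_}++ for keys %$hits;
    }
    return \%score;
}

# Two short CJK documents, written with \x{...} escapes to keep the
# source ASCII-only.
my $index = build_index({
    doc1 => "\x{5168}\x{6587}\x{7D22}\x{5F15}",
    doc2 => "\x{6587}\x{4EF6}",
});

my $score = search($index, "\x{7D22}\x{5F15}");
print "$_: $score->{$_}\n" for sort keys %$score;
```

A real index would also have to be serialized compactly to meet the size requirement, which this sketch does not attempt.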

Interactive Queries (ChatBot)

The interactive query module MUST accept context-free queries against any indexed database generated by the Search Engine, and provide feedback based on the entries contained within. It MUST develop a heuristic to accumulate user input, and build connections between entries based on relevancy.

Template Extraction (Template::Extract)

This component MUST support the Template(3) Toolkit format, and MAY support additional template formats. It MUST be capable of taking a document and the original template used to generate it, and producing the original parameter list.

All simple assignment and loop directives MUST be supported; it SHOULD also accept nested loops and structure elements.
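The idea of running a template "in reverse" can be sketched for the simplest case, a template containing only plain [% name %] assignments. The sketch below turns such a template into a regular expression with one named capture per placeholder. It is not the Template::Extract implementation and does not handle loops or nested structures; the helper names template_to_regex and extract_params are hypothetical, and named captures require Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a template with simple [% name %]
# placeholders into a regex with one named capture per placeholder.
sub template_to_regex {
    my ($template) = @_;

    # A capturing group in split keeps the placeholder names, giving
    # literal, name, literal, name, ... in order.
    my @parts = split /\[%\s*([A-Za-z_]\w*)\s*%\]/, $template;

    my $pattern = '';
    while (@parts) {
        $pattern .= quotemeta shift @parts;      # literal text
        next unless @parts;
        my $name = shift @parts;                 # placeholder name
        $pattern .= "(?<$name>.*?)";
    }
    return qr/\A$pattern\z/s;
}

# Recover the parameters from a document; returns undef if the document
# does not fit the template.
sub extract_params {
    my ($template, $document) = @_;
    my $re = template_to_regex($template);
    return unless $document =~ $re;
    my %params = %+;                             # copy the named captures
    return \%params;
}

my $template = '<a href="[% url %]">[% title %]</a> ([% score %])';
my $document = '<a href="http://example.org/">Example</a> (0.9)';

my $params = extract_params($template, $document);
print "$_ = $params->{$_}\n" for sort keys %$params;
```

Because each placeholder is matched non-greedily, the literal text between placeholders decides where one value ends and the next begins; two adjacent placeholders with nothing between them cannot be separated this way.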

Site Descriptors (Site)

This includes a collection of oft-used web sites, akin to the WWW::Search or Inforia Quest collection. It SHOULD also support basic validation and variable interpolation within the descriptors.

Template Generation (Template::Generate)

This module MUST be able to generate the original template, based on two or more distinct outputs. It SHOULD operate without any hints about the original structure, but MAY draw on such information to increase its accuracy.
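As a rough illustration of inducing a template from outputs, the sketch below compares two documents, keeps their common prefix and suffix, and replaces the differing middle with a single placeholder. This is a deliberately naive stand-in for the diff-based heuristics mentioned in this document, not the Template::Generate algorithm; it handles only one varying region, and the function name induce_template and its default placeholder name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: induce a one-variable template from two outputs.
sub induce_template {
    my ($first, $second, $name) = @_;
    $name = 'value' unless defined $name;

    my $max = length($first) < length($second)
            ? length($first)
            : length($second);

    # Length of the longest common prefix.
    my $prefix = 0;
    $prefix++ while $prefix < $max
        && substr($first, $prefix, 1) eq substr($second, $prefix, 1);

    # Length of the longest common suffix that does not overlap the prefix.
    my $suffix = 0;
    $suffix++ while $suffix < $max - $prefix
        && substr($first, -1 - $suffix, 1) eq substr($second, -1 - $suffix, 1);

    return substr($first, 0, $prefix)
         . "[% $name %]"
         . substr($first, length($first) - $suffix);
}

print induce_template(
    '<h1>Perl</h1><p>1 hit</p>',
    '<h1>Ruby</h1><p>1 hit</p>',
    'title',
), "\n";
```

Supporting several varying regions or repeated records would require a real diff over the outputs, which is where additional samples and hints about the original structure would help.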

Front-End Interface (bin/*)

All of the above components MUST come with at least one command-line utility that exposes most of their functions to ordinary users. The utilities SHOULD share a common look and feel.

Documentation (pod/*)

The Query Toolkit Manual MUST contain a tutorial, an overview of functions, and guides on how to embed Query components into existing programs.

MILESTONES

Milestone 0 - v1.56 - 2001/09/01

This milestone represents the raw, unconnected state of all tools. It provides all basic functionality except template generation, yet offers only fzindex / fzquery as useful user-accessible interfaces.

    FuzzyIndex  big-5 & latin-1 support
    ChatBot     automatic building of default database
    T::Extract  template toolkit support; nested fetch
    Site        google (as proof-of-concept)
    bin/*       all above interfaces
    pod/*       overview of functions

Milestone 1 - v1.6 - 2001/10/15

This milestone aims to export a consistent interface to other developers, by filling in the missing descriptors and documentation.

    FuzzyIndex  gb-1312 support
    Site        all major search engines and news sources
    T::Generate simple diff-based heuristic framework
    bin/*       a parallel, configurable sitequery coupled with fzindex
    pod/*       embed-howto, including win32 COM+ port

Milestone 2 - v1.7 - 2002/01/01

This milestone will be the first feature-complete release of Query Toolkit, capable of being used in more diverse environments.

SEE ALSO

OurNet::Site

AUTHORS

Autrijus Tang <autrijus@autrijus.org>

COPYRIGHT

Copyright 2001 by Autrijus Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html