
NAME

KinoSearch::Docs::Tutorial::BeyondSimple - A more flexible app structure.

DESCRIPTION

Goal

In this tutorial chapter, we'll rewrite our app from KinoSearch::Docs::Tutorial::Simple so that it behaves exactly the same, but offers greater possibilities for expansion.

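To achieve this, we'll ditch KinoSearch::Simple and replace it with the six classes it uses internally:

* KinoSearch::Schema
* KinoSearch::Schema::FieldSpec
* KinoSearch::Analysis::PolyAnalyzer
* KinoSearch::InvIndexer
* KinoSearch::Searcher
* KinoSearch::Search::Hits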

Schema

The first item we're going to need is a custom subclass of KinoSearch::Schema.

    # USConSchema.pm
    package USConSchema;
    use base 'KinoSearch::Schema';

A Schema subclass is analogous to an SQL table definition. It instructs other entities on how they should interpret the raw data in an inverted index and interact with it. First and foremost, it tells them what fields are available and how they're defined.

Since there's not much you can do with an SQL database before you define any tables, you might wonder how KinoSearch::Simple can add documents to an index without first declaring a Schema. The answer is: Simple modifies the Schema with each call to add_doc. Expanding on our SQL metaphor, it's as if each INSERT were preceded by either CREATE TABLE or ALTER TABLE as needed. (The techniques used by Simple are described in KinoSearch::Docs::Cookbook::DynamicFields.)
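To make the SQL analogy concrete, here is a toy sketch in plain Perl of a schema that grows as documents arrive. It is only an illustration of the idea, not KinoSearch's actual code: the package ToyDynamicSchema and its methods are hypothetical, and the FieldSpec class name is stored as a plain string rather than used.

```perl
use strict;
use warnings;

# Hypothetical toy, not KinoSearch code: a field registry that declares
# unseen fields on the fly, the way Simple's add_doc behaves in the SQL analogy.
package ToyDynamicSchema;

sub new {
    my ($class) = @_;
    return bless { fields => {} }, $class;
}

# The "CREATE TABLE / ALTER TABLE" step: register any field not seen before,
# giving it a default spec class name (just a string here).
sub ensure_fields {
    my ( $self, $doc ) = @_;
    my @added;
    for my $name ( sort keys %$doc ) {
        next if exists $self->{fields}{$name};
        $self->{fields}{$name} = 'KinoSearch::Schema::FieldSpec';
        push @added, $name;
    }
    return @added;    # names of the newly declared fields
}

# The "INSERT" step: make sure the schema covers the doc, then store it.
sub add_doc {
    my ( $self, $doc ) = @_;
    $self->ensure_fields($doc);
    push @{ $self->{docs} }, $doc;
    return;
}

sub field_names { return sort keys %{ $_[0]{fields} } }

package main;

my $schema = ToyDynamicSchema->new;
$schema->add_doc( { title => 'Preamble',  content => 'We the People...' } );
$schema->add_doc( { title => 'Article I', url     => 'art1.html' } );
print join( ', ', $schema->field_names ), "\n";    # content, title, url
```

Declaring every field up front, as USConSchema does, skips this per-document bookkeeping entirely.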

Since we know in advance that we're only going to be using three fields, we don't need to resort to such tricks; we can just declare all of them up front.

    our %fields = (
        title   => 'KinoSearch::Schema::FieldSpec',
        content => 'KinoSearch::Schema::FieldSpec',
        url     => 'KinoSearch::Schema::FieldSpec',
    );

Declaring the %fields hash with our (so that it is a package variable) is the first of two requirements for subclassing KinoSearch::Schema. The other is an analyzer() subroutine, which must return an object that isa KinoSearch::Analysis::Analyzer:

    use KinoSearch::Analysis::PolyAnalyzer;

    sub analyzer { 
        return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
    }

Because the same Schema subclass must (repeat, must) be used at both index time and search time, it should be implemented as a free-standing Perl module that both invindexer.plx and search.cgi can use. Finish USConSchema.pm off with the obligatory true value, put it somewhere both of your scripts will be able to find it, and adjust file system permissions as needed. This tutorial assumes that you have chosen to locate it in the cgi-bin directory.
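Both requirements are ordinary Perl conventions, so you can check a subclass for them without loading an index. The helper check_schema_class below is a hypothetical sketch, not part of KinoSearch: it looks for a non-empty package variable %fields and a callable analyzer() method, but it does not call analyzer() or confirm that the result isa KinoSearch::Analysis::Analyzer. FakeSchema is a stand-in that mimics USConSchema's declarations so the example runs without KinoSearch installed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch: report which of the two
# Schema-subclass requirements a class appears to be missing.
sub check_schema_class {
    my ($class) = @_;
    my @problems;

    # Requirement 1: a package-global %fields hash (declared with "our").
    my $fields = do { no strict 'refs'; \%{"${class}::fields"} };
    push @problems, "no %${class}::fields declared" unless %$fields;

    # Requirement 2: an analyzer() subroutine (only its existence is checked).
    push @problems, "no analyzer() method" unless $class->can('analyzer');

    return @problems;    # an empty list means both requirements look satisfied
}

# Stand-in class with the same shape as USConSchema.
package FakeSchema;
our %fields = ( title => 'KinoSearch::Schema::FieldSpec' );
sub analyzer { return 'stand-in analyzer' }

package main;
my @problems = check_schema_class('FakeSchema');
print @problems ? join( "\n", @problems ) . "\n" : "FakeSchema looks OK\n";
```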

Adaptations to invindexer.plx

In the indexing script, we'll replace our Simple object with a KinoSearch::InvIndexer. For the most part, it's a straight-up swap:

    use USConSchema;
    use KinoSearch::InvIndexer;

    ...

    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => USConSchema->read($index_loc),
    );

    ... 

    foreach my $filename (@filenames) {
        my $doc = slurp_and_parse_file($filename);
        $invindexer->add_doc($doc);
    }

There's only one extra step required: at the end of the script, you must call finish() explicitly. (KinoSearch::Simple calls finish() implicitly upon object destruction.)

    $invindexer->finish;
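The implicit behavior rests on Perl's destructor mechanism: a wrapper object can call finish() from its DESTROY method, so callers never have to. The sketch below illustrates that pattern only; it is not KinoSearch::Simple's source, and AutoFinisher and LoggingIndexer are hypothetical classes invented for the example (the latter stands in for an InvIndexer and just records calls).

```perl
use strict;
use warnings;

# Hypothetical wrapper illustrating "finish() upon object destruction".
package AutoFinisher;

sub new {
    my ( $class, $indexer ) = @_;
    return bless { indexer => $indexer, finished => 0 }, $class;
}

sub add_doc {
    my ( $self, $doc ) = @_;
    $self->{indexer}->add_doc($doc);
}

sub finish {
    my $self = shift;
    return if $self->{finished}++;    # finish at most once
    $self->{indexer}->finish;
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY { $_[0]->finish }

# Stand-in for an InvIndexer that only records which methods were called.
package LoggingIndexer;
sub new     { return bless { calls => [] }, shift }
sub add_doc { push @{ $_[0]{calls} }, 'add_doc' }
sub finish  { push @{ $_[0]{calls} }, 'finish' }

package main;

my $inner = LoggingIndexer->new;
{
    my $wrapper = AutoFinisher->new($inner);
    $wrapper->add_doc( { title => 'Preamble' } );
}    # $wrapper goes out of scope here, so DESTROY runs finish()
print join( ' ', @{ $inner->{calls} } ), "\n";    # add_doc finish
```

With this design the commit happens whenever the last reference disappears, which can be hard to pin down in a larger script; calling $invindexer->finish yourself makes the moment explicit.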

Adaptations to search.cgi

In our search script, KinoSearch::Simple has served as a thin wrapper around Searcher and Hits. Swapping out Simple for these two classes is straightforward, except for the return value of the search() method.

    use USConSchema;
    use KinoSearch::Searcher;

    ...

    my $searcher = KinoSearch::Searcher->new(
        invindex => USConSchema->read($index_loc),
    );
    my $hits = $searcher->search(    # returns a Hits object, not a hit count
        query      => $q,
        offset     => $offset,
        num_wanted => $hits_per_page,
    );
    my $hit_count = $hits->total_hits;  # get the hit count here

    ...
    
    while ( my $hit = $hits->fetch_hit_hashref ) {
        ...
    }

$simple->search returns a hit count; in contrast, $searcher->search returns a Hits object, from which you may obtain a hit count via the total_hits() method.
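Because total_hits() reports the overall count while offset and num_wanted select a single page, the paging arithmetic lives in your script. A minimal sketch, assuming a 1-based page number taken from the query string; page_window is a hypothetical helper, not a KinoSearch function, and in search.cgi you would pass it $hits->total_hits as the third argument.

```perl
use strict;
use warnings;

# Hypothetical helper: given a 1-based page number, the page size, and the
# total hit count, return the offset to pass to search() plus the 1-based
# range of results displayed. If $first > $last, the page is past the end.
sub page_window {
    my ( $page, $hits_per_page, $total_hits ) = @_;
    $page = 1 if !defined $page || $page !~ /^\d+$/ || $page < 1;
    my $offset = ( $page - 1 ) * $hits_per_page;
    my $first  = $offset + 1;
    my $last   = $offset + $hits_per_page;
    $last = $total_hits if $last > $total_hits;
    return ( $offset, $first, $last );
}

my ( $offset, $first, $last ) = page_window( 3, 10, 27 );
print "offset=$offset, showing $first-$last of 27\n";    # offset=20, showing 21-27 of 27
```

Only the offset is needed before calling search(); the displayed range depends on the total, so it is computed after the Hits object comes back.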

Hooray!

Congratulations! Your app does the same thing as before... but now it's a lot easier to customize.

COPYRIGHT

Copyright 2005-2007 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20.