KinoSearch::Docs::Tutorial::BeyondSimple - A more flexible app structure.
In this tutorial chapter, we'll rewrite our app from KinoSearch::Docs::Tutorial::Simple so that it behaves exactly the same, but offers greater possibilities for expansion.
To achieve this, we'll ditch KinoSearch::Simple and replace it with the six classes it uses internally:
KinoSearch::Schema - Plan out your index.
KinoSearch::Schema::FieldSpec - Define the properties of an index field.
KinoSearch::Analysis::PolyAnalyzer - A one-size-fits-all parser/tokenizer.
KinoSearch::InvIndexer - Manipulate index content.
KinoSearch::Searcher - Search an index.
KinoSearch::Search::Hits - Iterate over hits returned by a Searcher.
The first item we're going to need is a custom subclass of KinoSearch::Schema.
    # USConSchema.pm
    package USConSchema;
    use base 'KinoSearch::Schema';
A Schema subclass is analogous to an SQL table definition. It instructs other entities on how they should interpret the raw data in an inverted index and interact with it. First and foremost, it tells them what fields are available and how they're defined.
Since there's not much you can do with an SQL database before you define any tables, you might wonder how KinoSearch::Simple can add documents to an index without first declaring a Schema. The answer is that Simple modifies the Schema with each call to add_doc. Expanding on our SQL metaphor, it's as if each INSERT were preceded by either CREATE TABLE or ALTER TABLE as needed. (The techniques used by Simple are described in KinoSearch::Docs::Cookbook::DynamicFields.)
Since we know in advance that we're only going to be using three fields, we don't need to resort to such tricks; we can just declare all of them up front.
    our %fields = (
        title   => 'KinoSearch::Schema::FieldSpec',
        content => 'KinoSearch::Schema::FieldSpec',
        url     => 'KinoSearch::Schema::FieldSpec',
    );
Declaring a %fields hash with our is the first of two requirements for subclassing KinoSearch::Schema. The other is declaring an analyzer() subroutine, which must return an object which isa KinoSearch::Analysis::Analyzer:
    use KinoSearch::Analysis::PolyAnalyzer;

    sub analyzer {
        return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
    }
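The reason %fields has to be declared with our rather than my is that only package variables are visible from outside the file that declares them. The following is a minimal sketch of that plain-Perl mechanism. The packages My::BaseSchema and My::USConSchema and the method field_names() are hypothetical stand-ins invented for illustration; this is not KinoSearch's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical base class (illustration only, not KinoSearch code).
package My::BaseSchema;

sub field_names {
    my $invocant = shift;
    my $class    = ref($invocant) || $invocant;

    # Look up %fields in the subclass's package symbol table.
    # This only works for variables declared with "our"; a "my"
    # variable is lexical and never appears in the symbol table.
    no strict 'refs';
    my %fields = %{"${class}::fields"};
    return sort keys %fields;
}

# Hypothetical subclass that declares its fields up front.
package My::USConSchema;
our @ISA    = ('My::BaseSchema');
our %fields = (
    title   => 'My::FieldSpec',
    content => 'My::FieldSpec',
    url     => 'My::FieldSpec',
);

package main;

my @names = My::USConSchema->field_names;
print join( ', ', @names ), "\n";    # prints "content, title, url"
```

Had the subclass written my %fields instead, the lookup in the base class would see an empty hash.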
Because the same Schema subclass must (repeat: must) be used at both index time and search time, it should be implemented as a free-standing Perl module that both invindexer.plx and search.cgi can use. Finish USConSchema.pm off with the obligatory true value, put it somewhere both of your scripts will be able to find it, and adjust file system permissions as needed. This tutorial will assume that you have chosen to locate it in the cgi-bin directory.
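If a script cannot find the module, one common remedy is to add the script's own directory to the module search path. Perl 5.26 and later no longer include the current directory in @INC by default, so this may be needed on newer installations. The sketch below uses the core FindBin and lib modules and assumes that USConSchema.pm sits in the same directory as the running script; adjust the path if you placed it elsewhere.

```perl
use strict;
use warnings;
use FindBin qw($Bin);    # core module: directory of the running script
use lib $Bin;            # search that directory first when loading modules

# Assumption: USConSchema.pm lives next to this script.
# Loading inside eval lets the sketch report a failure instead of dying.
my $loaded = eval { require USConSchema; 1 };
if ($loaded) {
    print "USConSchema loaded from $INC{'USConSchema.pm'}\n";
}
else {
    print "USConSchema could not be loaded: $@";
}
```

In a real script you would simply write use USConSchema; after the use lib line and let a missing module be a fatal error.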
In the indexing script, we'll replace our Simple object with a KinoSearch::InvIndexer. For the most part, it's a straight-up swap:
    use USConSchema;
    use KinoSearch::InvIndexer;
    ...
    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => USConSchema->read($index_loc),
    );
    ...
    foreach my $filename (@filenames) {
        my $doc = slurp_and_parse_file($filename);
        $invindexer->add_doc($doc);
    }
There's only one extra step required: at the end of the script, you must call finish() explicitly. (KinoSearch::Simple calls finish() implicitly upon object destruction.)
    $invindexer->finish;
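To make the difference between explicit and implicit finishing concrete, here is a minimal sketch of how a wrapper object can finish on your behalf by using Perl's DESTROY hook. The classes My::Writer and My::AutoFinish are hypothetical stand-ins; they only record what happens in a log and do not reflect how KinoSearch::Simple or KinoSearch::InvIndexer are actually written.

```perl
use strict;
use warnings;

# Hypothetical indexer-like class: nothing is committed until finish().
package My::Writer;

sub new {
    my ( $class, %args ) = @_;
    return bless { log => $args{log}, finished => 0 }, $class;
}

sub add_doc {
    my ( $self, $doc ) = @_;
    push @{ $self->{log} }, "add $doc->{title}";
}

sub finish {
    my $self = shift;
    return if $self->{finished};    # finishing twice is harmless here
    $self->{finished} = 1;
    push @{ $self->{log} }, 'finish';
}

# Hypothetical convenience wrapper that finishes on destruction.
package My::AutoFinish;

sub new {
    my ( $class, %args ) = @_;
    return bless { writer => My::Writer->new(%args) }, $class;
}

sub add_doc {
    my ( $self, $doc ) = @_;
    $self->{writer}->add_doc($doc);
}

sub DESTROY {
    my $self = shift;
    $self->{writer}->finish;        # the implicit finish
}

package main;

my @log;
{
    my $auto = My::AutoFinish->new( log => \@log );
    $auto->add_doc( { title => 'Article I' } );
}    # $auto goes out of scope here, so DESTROY runs

print "$_\n" for @log;    # prints "add Article I", then "finish"
```

When you use the lower-level object directly, there is no wrapper to do this for you, which is why the explicit call is required.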
In our search script, KinoSearch::Simple has served as a thin wrapper around Searcher and Hits. Swapping out Simple for these two classes is straightforward, except for the return value of the search() function.
    use USConSchema;
    use KinoSearch::Searcher;
    ...
    my $searcher = KinoSearch::Searcher->new(
        invindex => USConSchema->read($index_loc),
    );
    my $hits = $searcher->search(    # returns a Hits object, not a hit count
        query      => $q,
        offset     => $offset,
        num_wanted => $hits_per_page,
    );
    my $hit_count = $hits->total_hits;    # get the hit count here
    ...
    while ( my $hit = $hits->fetch_hit_hashref ) {
        ...
    }
$simple->search returns a hit count; in contrast, $searcher->search returns a Hits object, from which you may obtain a hit count via the total_hits() method.
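Once you have the hit count, a typical use is a "Results X-Y of Z" line on the results page. The sketch below shows one way to compute it from the $offset and $hits_per_page values passed to search() and the number returned by total_hits(). The helper name page_summary() is hypothetical and the wording of the message is only an example.

```perl
use strict;
use warnings;

# Hypothetical helper: describe which slice of the results is on this page.
sub page_summary {
    my ( $offset, $hits_per_page, $total_hits ) = @_;

    # Nothing to show if there are no hits or the offset is past the end.
    return 'No matches' if $total_hits == 0 || $offset >= $total_hits;

    my $first = $offset + 1;                 # offsets are zero-based
    my $last  = $offset + $hits_per_page;
    $last = $total_hits if $last > $total_hits;    # clamp the final page

    return "Results $first-$last of $total_hits";
}

print page_summary( 10, 10, 47 ), "\n";    # prints "Results 11-20 of 47"
print page_summary( 40, 10, 47 ), "\n";    # prints "Results 41-47 of 47"
```

In the search script you would call it as page_summary( $offset, $hits_per_page, $hit_count ).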
Congratulations! Your app does the same thing as before... but now it's a lot easier to customize.
Copyright 2005-2007 Marvin Humphrey
See KinoSearch version 0.20.