
NAME

KinoSearch::InvIndexer - Build inverted indexes.

SYNOPSIS

    use KinoSearch::InvIndexer;
    use MySchema;

    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => MySchema->clobber('/path/to/invindex'),
    );

    while ( my ( $title, $content ) = each %source_docs ) {
        $invindexer->add_doc({
            title   => $title,
            content => $content,
        });
    }

    $invindexer->finish;

DESCRIPTION

The InvIndexer class is KinoSearch's primary tool for managing the content of inverted indexes, which may later be searched using KinoSearch::Searcher.

Concurrency

Only one InvIndexer may write to an invindex at a time. If a write lock cannot be secured, new() will throw an exception.

Indexes shared among multiple machines require special handling. First, be sure to read KinoSearch::Docs::NFS if you are considering locating an index on an NFS volume. Second, it is essential that every machine writing to a shared index identify itself with a unique lock_id, or the locking mechanism will malfunction.
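Because new() throws an exception when another InvIndexer holds the write lock, a batch job may want to retry a few times before giving up. The sketch below is an assumption, not part of KinoSearch. new_with_retries is a hypothetical helper that calls a closure (for example, one wrapping KinoSearch::InvIndexer->new) and retries if the closure dies. The sketch also shows one way to obtain a per-machine lock_id using the core Sys::Hostname module.

```perl
use strict;
use warnings;
use Sys::Hostname qw( hostname );

# Hypothetical helper (not part of KinoSearch): run $make, retrying when it
# dies, e.g. because InvIndexer->new could not secure the write lock.
sub new_with_retries {
    my ( $make, %args ) = @_;
    my $tries = defined $args{tries} ? $args{tries} : 5;
    my $delay = defined $args{delay} ? $args{delay} : 1;    # seconds
    my $last_error = '';
    for my $attempt ( 1 .. $tries ) {
        my $obj;
        if ( eval { $obj = $make->(); 1 } ) {
            return $obj;
        }
        $last_error = $@;
        sleep $delay if $attempt < $tries;
    }
    die "gave up after $tries attempts: $last_error";
}

# Assumption: the host name is unique among the machines that write to
# this shared invindex, so it can serve as the lock_id.
my $lock_id = hostname();
```

With $invindex obtained as in the synopsis, the constructor call would become new_with_retries( sub { KinoSearch::InvIndexer->new( invindex => $invindex, lock_id => $lock_id ) } ). Retrying only helps if the competing writer eventually finishes and its lock goes away.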

METHODS

new

    my $invindex = MySchema->clobber('/path/to/invindex');
    my $invindexer = KinoSearch::InvIndexer->new(
        invindex    => $invindex,  # required
        lock_id     => $hostname   # default: ''
    );

Constructor. Takes labeled parameters.

  • invindex - An object of type KinoSearch::InvIndex.

  • lock_id - A string that differentiates this machine from others that may try to write to the same invindex. Defaults to the empty string.

add_doc

    $invindexer->add_doc( { field_name => $field_value } );
    # or ...
    $invindexer->add_doc( { field_name => $field_value }, boost => 2.5 );

Add a document to the invindex. The first argument must be a reference to a hash composed of field_name => field_value pairs. The InvIndexer object assumes ownership of the hash.

After the hashref, labeled parameters are accepted.

  • boost - A scoring multiplier, 1 by default. A boost above 1 makes the document score better against a given query relative to other documents; a boost below 1 makes it score worse.
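Because add_doc() takes ownership of the hash it receives, each document should get its own freshly built hashref rather than a single hash that is refilled and passed repeatedly. This is a minimal sketch, assuming a title => content mapping like %source_docs in the synopsis. prepare_docs and the boost rule are hypothetical; they only build the arguments for add_doc() and do not call KinoSearch.

```perl
use strict;
use warnings;

# Hypothetical helper: turn { title => content } into a list of
# [ $doc_hashref, $boost ] pairs. A new anonymous hash is created for
# every document, since add_doc() assumes ownership of the hash it gets.
sub prepare_docs {
    my ( $source_docs, $boost_for ) = @_;
    my @prepared;
    for my $title ( sort keys %$source_docs ) {
        my $doc = {
            title   => $title,
            content => $source_docs->{$title},
        };
        # A boost of 1 leaves scoring unchanged.
        my $boost = $boost_for ? $boost_for->($doc) : 1;
        push @prepared, [ $doc, $boost ];
    }
    return @prepared;
}

# Example rule (an assumption): favor documents whose title mentions "FAQ".
my %source_docs = (
    'FAQ: Installing' => 'Run make install.',
    'Changelog'       => 'Fixed bugs.',
);
my @prepared = prepare_docs( \%source_docs,
    sub { $_[0]{title} =~ /FAQ/ ? 2.5 : 1 } );
```

Each pair can then be passed on as $invindexer->add_doc( $pair->[0], boost => $pair->[1] ).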

add_invindexes

    $invindexer->add_invindexes( $another_invindex, $yet_another_invindex );

Absorb existing invindexes into this one. The other invindexes must use the same Schema as the invindex which was supplied to new().

delete_by_term

    $invindexer->delete_by_term( $field_name, $term_text );

Mark documents that contain the supplied term as deleted, so that they will be excluded from search results. The change is not visible to search apps until a new Searcher is opened after finish() completes.

If the field is associated with an analyzer, $term_text will be processed automatically (so don't process it yourself).

$field_name must identify an indexed field, or an error will occur.
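The warning against processing $term_text yourself matters because analysis is often not idempotent. Running text through an analyzer twice can produce a different term than running it once, and the doubly processed term will not match what was indexed. The sketch below uses a made-up toy_analyze function (lowercase, then strip one trailing "s"), not one of KinoSearch's analyzers, purely to illustrate the effect.

```perl
use strict;
use warnings;

# Toy analyzer, NOT KinoSearch's: lowercase, then drop one trailing "s".
# Like many real stemmers, applying it twice is not the same as once.
sub toy_analyze {
    my $text = lc shift;
    $text =~ s/s\z//;
    return $text;
}

# The term the index would hold for a field value of "Grass".
my $indexed_term = toy_analyze('Grass');

# Correct: pass the raw text and let delete_by_term() analyze it once.
my $raw_lookup = toy_analyze('Grass');

# Wrong: analyzing by hand first means the text gets analyzed twice.
my $double_lookup = toy_analyze( toy_analyze('Grass') );
```

Here the indexed term and the correct lookup are both "gras", while the doubly analyzed lookup is "gra" and would delete nothing.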

finish

    $invindexer->finish( 
        optimize => 1, # default: 0
    );

Finish processing any changes made to the invindex and commit them. None of the changes made during an indexing session are permanent until the commit, which happens near the end of finish().

Calling finish() invalidates the InvIndexer, so if you want to make more changes you'll need a new one.

Takes one labeled parameter:

  • optimize - If optimize is set to 1, the invindex will be collapsed to its most compact form. This may take a while, but it yields the fastest queries at search time.
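Since finish() both commits and invalidates the InvIndexer, a long indexing run can be split into batches. Each batch gets its own InvIndexer and its own commit, and optimization happens only on the last pass. The batches helper below is hypothetical plain Perl. The commented loop shows how it might drive KinoSearch. In that loop, open_invindex() is a hypothetical placeholder for however your application opens the existing index without clobbering it.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into chunks of at most $size items.
sub batches {
    my ( $size, @items ) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @chunks;
    push @chunks, [ splice @items, 0, $size ] while @items;
    return @chunks;
}

# Sketch of how the chunks might drive KinoSearch (open_invindex() is a
# hypothetical placeholder). One InvIndexer per batch, because finish()
# invalidates the object:
#
#   my @chunks = batches( 1000, @docs );
#   for my $i ( 0 .. $#chunks ) {
#       my $invindexer = KinoSearch::InvIndexer->new(
#           invindex => open_invindex('/path/to/invindex'),
#       );
#       $invindexer->add_doc($_) for @{ $chunks[$i] };
#       # Optimize only after the final batch, since it may take a while.
#       $invindexer->finish( optimize => ( $i == $#chunks ? 1 : 0 ) );
#   }
```

Each batch's commit makes its documents permanent independently, so a failure part-way through loses at most the batch in progress. The cost is more commits overall.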

COPYRIGHT

Copyright 2005-2007 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20_01.