NAME

KinoSearch::InvIndexer - build inverted indexes

WARNING

KinoSearch is alpha test software. The API and the file format are subject to change.

SYNOPSIS

    use KinoSearch::InvIndexer;
    use KinoSearch::Analysis::PolyAnalyzer;

    my $analyzer
        = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );

    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => '/path/to/invindex',
        create   => 1,
        analyzer => $analyzer,
    );

    $invindexer->spec_field( name => 'title' );
    $invindexer->spec_field( name => 'bodytext' );

    # %source_documents is assumed to map each title to its body text.
    while ( my ( $title, $bodytext ) = each %source_documents ) {
        my $doc = $invindexer->new_doc;

        $doc->set_value( title    => $title );
        $doc->set_value( bodytext => $bodytext );

        $invindexer->add_doc($doc);
    }

    $invindexer->finish;

DESCRIPTION

The InvIndexer class is KinoSearch's primary tool for creating and modifying inverted indexes, which may be searched using KinoSearch::Searcher.
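For readers new to the term, the following is a minimal pure-Perl sketch of the idea behind an inverted index: a mapping from each term to the documents that contain it. It is a toy for illustration only and says nothing about KinoSearch's actual implementation or file format; the sample documents, the whitespace tokenization, and the docs_containing helper are all assumptions made for this example.

```perl
use strict;
use warnings;

# Toy documents: id => text. Purely illustrative data.
my %docs = (
    1 => 'the quick brown fox',
    2 => 'the lazy dog',
    3 => 'quick quick dog',
);

# Build the inverted index: term => { doc_id => number of occurrences }.
my %inverted;
for my $doc_id ( sort keys %docs ) {
    for my $term ( split ' ', lc $docs{$doc_id} ) {
        $inverted{$term}{$doc_id}++;
    }
}

# Hypothetical helper: ids of the documents containing a term, ascending.
sub docs_containing {
    my ($term) = @_;
    return sort { $a <=> $b } keys %{ $inverted{ lc $term } || {} };
}

print join( ', ', docs_containing('quick') ), "\n";    # prints "1, 3"
```

Looking up a term is a single hash access, which is why search engines invert the document-to-terms relationship before searching.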

METHODS

new

    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => '/path/to/invindex',  # required
        create   => 1,                    # default: 0
        analyzer => $analyzer,            # default: no-op Analyzer
    );

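Create an InvIndexer object.

  • invindex - the path to the invindex. Required.

  • create - create a new invindex. Defaults to 0.

  • analyzer - the primary analyzer, applied to every analyzed field that does not specify its own. Defaults to a no-op Analyzer.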

spec_field

    $invindexer->spec_field(
        name       => 'url',      # required
        analyzer   => undef,      # default: analyzer spec'd in new()
        indexed    => 1,          # default: 1
        analyzed   => 0,          # default: 1
        stored     => 0,          # default: 1
        compressed => 0,          # default: 0
    );

Define a field. This is analogous to defining a field in a database.

  • name - the field's name.

  • analyzer - By default, all indexed fields are analyzed using the analyzer that was supplied to new(). Supplying an alternate for a given field overrides the primary analyzer.

  • indexed - index the field, so that it can be searched later.

  • analyzed - analyze the field, using the relevant Analyzer. Fields such as "category" or "product_number" might be indexed but not analyzed.

  • stored - store the field, so that it can be retrieved when the document turns up in a search.

  • compressed - compress the stored field, using the zlib compression algorithm.
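As a rough sketch of how these options might combine, the table below describes four fields using only the parameters documented above. The field names, the particular combinations, and the spec_all_fields helper are hypothetical; in particular, whether a stored-only field (indexed => 0) suits your data is an assumption to verify against your own invindex.

```perl
use strict;
use warnings;

# Hypothetical field layout; anything omitted falls back to the documented
# defaults (indexed => 1, analyzed => 1, stored => 1, compressed => 0).
my @field_specs = (
    { name => 'title' },                               # all defaults
    { name => 'bodytext', compressed => 1 },           # large text, compressed
    { name => 'category', analyzed   => 0 },           # indexed as-is
    { name => 'url', indexed => 0, analyzed => 0 },    # stored only (assumption)
);

# Hypothetical helper (not part of KinoSearch): register every spec with
# an object that provides spec_field, and return how many were registered.
sub spec_all_fields {
    my ( $invindexer, @specs ) = @_;
    $invindexer->spec_field(%$_) for @specs;
    return scalar @specs;
}
```

With a real InvIndexer, the call would be spec_all_fields( $invindexer, @field_specs ). Keeping the layout in one table makes it easier to see at a glance which fields deviate from the defaults.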

new_doc

    my $doc = $invindexer->new_doc;

Spawn an empty KinoSearch::Document::Doc object, primed to accept values for the fields spec'd by spec_field.

add_doc

    $invindexer->add_doc($doc);

Add a document to the invindex.
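The new_doc, set_value, and add_doc calls tend to appear together, so they can be wrapped in a small helper. The sketch below assumes each record is a hash reference whose keys are all field names already declared with spec_field; index_record is a hypothetical name, not part of KinoSearch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): copy one record, given as a
# hash reference of field name => value, into a fresh Doc and add it.
# Assumes every key in the record was declared earlier with spec_field.
sub index_record {
    my ( $invindexer, $record ) = @_;

    my $doc = $invindexer->new_doc;
    for my $field ( sort keys %$record ) {
        $doc->set_value( $field => $record->{$field} );
    }
    $invindexer->add_doc($doc);

    return $doc;
}
```

A call might look like index_record( $invindexer, { title => $title, bodytext => $bodytext } ).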

finish

    $invindexer->finish;

Finish the invindex. This invalidates the InvIndexer, so the object must not be used after finish has been called.
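Because the InvIndexer is invalid once finish has been called, one option is to confine the object to a single scope. The following is a sketch of that pattern, not something KinoSearch provides: run_indexing_session and its two code-reference arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch). $make_invindexer is a code
# reference that returns a new InvIndexer; $add_docs is a code reference
# that receives the InvIndexer and adds documents to it.
sub run_indexing_session {
    my ( $make_invindexer, $add_docs ) = @_;

    my $invindexer = $make_invindexer->();
    $add_docs->($invindexer);
    $invindexer->finish;

    # Return nothing: the helper hands no reference to the finished
    # InvIndexer back to the caller.
    return;
}
```

With KinoSearch installed, the first argument might be sub { KinoSearch::InvIndexer->new( invindex => '/path/to/invindex', create => 1, analyzer => $analyzer ) } and the second a code reference that loops over the source documents.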

COPYRIGHT

Copyright 2005-2006 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.05_05.