NAME

OpenInteract2::FullTextIndexer - Base class for OI2 indexers

SYNOPSIS

 my $indexer = CTX->fulltext_indexer;
 
 # Or look up a specific indexer:
 my $indexer = CTX->fulltext_indexer( 'Plucene' );
 
 # Add something to the index
 $indexer->add_to_index( 'page', '/foo/listing.html', \$foo_content );
 
 # Remove all index entries for something
 $indexer->remove_from_index( 'page', '/foo/listing.html' );
 
 # Refresh the index for a particular item
 $indexer->refresh_index( 'page', '/foo/listing.html', \$new_foo_content );
 
 # Search the index with default 'return_type' = 'object'
 my $results = $indexer->search_index({
     search_type => 'all',
     terms       => [ 'ulysses', 'grant' ],
 });
 foreach my $result ( @{ $results } ) {
     my $object = $result->[0];
     my $score  = $result->[1];
     print "Object ", ref( $object ), " with ID ", $object->id, " ",
           "was found with a score of $score\n";
 }
 
 # Search the index with different return types
 
 # return type of 'iterator' returns OpenInteract2::FullTextIterator
 
 my $results = $indexer->search_index({
     search_type => 'all',
     terms       => [ 'ulysses', 'grant' ],
     return_type => 'iterator',
 });
 while ( my $object = $results->get_next ) {
     print "Object ", ref( $object ), " with ID ", $object->id, " ",
           "was found\n";
 }
 
 # get additional information from iterator...
 while ( my ( $object, $item_num, $score ) = $results->get_next ) {
     print "Object $item_num is a ", ref( $object ), " with ID ",
           $object->id, " and a score of $score\n";
 }
 
 # return type of 'raw' returns arrayref of arrayrefs
 
 my $results = $indexer->search_index({
     search_type => 'all',
     terms       => [ 'ulysses', 'grant' ],
     return_type => 'raw',
 });
 foreach my $result ( @{ $results } ) {
     my ( $class, $id, $full_score, $score_info ) = @{ $result };
     print "Object $class with ID $id was found with total score ",
           "$full_score and individual term scores:\n";
     foreach my $term ( keys %{ $score_info } ) {
         print "  * $term: $score_info->{$term}\n";
     }
 }

DESCRIPTION

This is the base class for full-text indexers in OpenInteract2. All objects returned by the OpenInteract2::Context method fulltext_indexer() implement this interface.

METHODS

Public Interface

new( \%params )

Instantiates a new indexer with parameters \%params.

You should not call this directly but instead get an indexer from the OpenInteract2::Context object:

 # get the default indexer
 my $indexer = CTX->fulltext_indexer;
 
 # get a specific indexer
 my $indexer = CTX->fulltext_indexer( 'soundex' );

add_to_index( $content_class, $content_id, \$content_text )

Indexes the text in the scalar reference \$content_text, categorizing it with $content_class and $content_id. The text in \$content_text is not modified by this operation.

While $content_class is typically an SPOPS subclass, it does not have to be. The class merely has to be able to retrieve, identify and describe an object. To do this it must implement:

  • Class method: fetch( $id )

    Returns an object with identifier $id.

  • Object method: id()

    Returns the identifier for an object.

  • Object method: object_description()

    Should return a hashref with the keys as described in SPOPS under object_description().
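The three requirements above can be sketched with a small class that is not an SPOPS subclass. This is only an illustration: the package name My::Note, its in-memory store and its URLs are hypothetical, and the keys returned from object_description() are assumed from the SPOPS documentation, so check them against your SPOPS version.

```perl
use strict;
use warnings;

package My::Note;   # hypothetical content class, not part of OI2

# Toy in-memory store standing in for a database or filesystem
my %STORE = (
    42 => { id => 42, title => 'Meeting notes' },
);

# Class method: return the object with identifier $id, or undef if missing
sub fetch {
    my ( $class, $id ) = @_;
    my $data = $STORE{ $id } or return;
    return bless { %{ $data } }, $class;
}

# Object method: return the identifier for this object
sub id { return $_[0]->{id} }

# Object method: describe the object. The key names are an assumption
# taken from the SPOPS docs for object_description().
sub object_description {
    my ( $self ) = @_;
    return {
        class     => ref $self,
        object_id => $self->id,
        id_field  => 'id',
        name      => 'Note',
        title     => $self->{title},
        url       => '/note/show/?id=' . $self->id,
        url_edit  => '/note/show/?id=' . $self->id . '&edit=1',
    };
}

package main;

my $note = My::Note->fetch( 42 );
my $info = $note->object_description;
print "$info->{name} '$info->{title}' has ID $info->{object_id}\n";
```

A class like this could then be passed as $content_class, for example 'My::Note', with 42 as the $content_id.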

refresh_index( $content_class, $content_id, \$content_ref )

Removes the existing index records for $content_class and $content_id, then indexes the text in \$content_ref.

remove_from_index( $content_class, $content_id )

Deletes all records from the index marked by $content_class and $content_id.

search_index( \%params )

Searches the index given the data in \%params:

  • terms (\@)

    Arrayref of terms to search for.

  • search_type ($): 'all' (default) or 'any'

    Determines if matching records must have all or any of the given terms.

  • return_type ($): 'object' (default), 'iterator' or 'raw'

    Determines what type of data to return.

    Using 'object' means you get back an arrayref of two-item arrayrefs -- the first is the object, the second the match score.

    Using 'iterator' means you get back an OpenInteract2::FullTextIterator object.

    Using 'raw' means you get back an arrayref of four-item arrayrefs -- the first is the class, the second the ID, the third the full score for this match and the fourth a hashref of per-term match scores, with the searched terms as keys and the match score for each term as values. (Generally a term's score is just a count of its occurrences, but implementations are free to do whatever they want.)
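To make the relationship between the 'raw' and 'object' shapes concrete, here is a minimal sketch of how raw records could be turned into object/score pairs using the fetch() class method. It is not the OI2 implementation; the helper raw_to_objects() and the class My::Doc are hypothetical, and skipping records whose object cannot be fetched is an assumption.

```perl
use strict;
use warnings;

package My::Doc;   # hypothetical content class

my %DOCS = ( 1 => 'first', 2 => 'second' );

sub fetch {
    my ( $class, $id ) = @_;
    return unless exists $DOCS{ $id };
    return bless { id => $id }, $class;
}

sub id { return $_[0]->{id} }

package main;

# Convert [ class, id, full-score, { term => score } ] records into
# [ object, score ] pairs (hypothetical helper)
sub raw_to_objects {
    my ( $raw ) = @_;
    my @found = ();
    foreach my $record ( @{ $raw } ) {
        my ( $class, $id, $full_score ) = @{ $record };
        my $object = $class->fetch( $id );
        next unless $object;   # assumption: ignore stale index entries
        push @found, [ $object, $full_score ];
    }
    return \@found;
}

my $raw = [
    [ 'My::Doc', 1, 5, { ulysses => 3, grant => 2 } ],
    [ 'My::Doc', 9, 2, { ulysses => 1, grant => 1 } ],   # ID 9 does not exist
];
my $objects = raw_to_objects( $raw );
foreach my $pair ( @{ $objects } ) {
    print "Found ", ref( $pair->[0] ), " with ID ", $pair->[0]->id,
          " and score $pair->[1]\n";
}
```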

SUBCLASSING

Optional Methods

In addition to overriding the interface method search_index(), subclasses can implement:

init( \%params )

Gives you a chance to set values from \%params in the object.

No return value necessary.

_screen_results( $search_type, $results, @search_terms )

Removes any records from $results -- which is the return value from _run_search(), below -- that do not correspond to $search_type. The default implementation only acts when given a $search_type of 'all', removing records that do not have matches for all the @search_terms.

Return value should be an arrayref of the new results.
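The 'all' filtering described above might look roughly like the following standalone function. It is a sketch rather than the default implementation: it assumes each record carries its per-term scores in the fourth element, and that a missing or zero score means the term did not match.

```perl
use strict;
use warnings;

# Standalone sketch of the screening step; in a subclass this would be
# the method _screen_results() and take $self as its first argument.
sub screen_results {
    my ( $search_type, $results, @search_terms ) = @_;

    # Only 'all' searches need screening
    return $results unless $search_type eq 'all';

    my @kept = grep {
        my $term_scores = $_->[3];
        my @missing = grep { !$term_scores->{ $_ } } @search_terms;
        @missing == 0;   # keep the record only if every term matched
    } @{ $results };
    return \@kept;
}

my $results = [
    [ 'page', '/a', 4, { ulysses => 3, grant => 1 } ],
    [ 'page', '/b', 2, { ulysses => 2 } ],
];
my $all = screen_results( 'all', $results, 'ulysses', 'grant' );
my $any = screen_results( 'any', $results, 'ulysses', 'grant' );
print "all: ", scalar( @{ $all } ), " record(s); ",
      "any: ", scalar( @{ $any } ), " record(s)\n";
```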

Mandatory Methods

Subclasses must implement:

add_to_index( $content_class, $content_id, \$content_ref )

remove_from_index( $content_class, $content_id )

_run_search( $search_type, @search_terms )

The $search_type is either 'any' or 'all'. This should return an arrayref of records, each shaped like this:

 [ $class, $id, full-score, { search-term => term-score, ... } ]
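As an illustration of the three mandatory methods, here is a toy in-memory indexer. It is a sketch under several assumptions: the package name My::MemoryIndexer and its new() are hypothetical, it stands alone so that it runs without OI2 installed (a real subclass would inherit from OpenInteract2::FullTextIndexer and get its constructor there), scores are simple word counts, and _run_search() returns every record matching at least one term on the assumption that _screen_results() narrows 'all' searches afterwards.

```perl
use strict;
use warnings;

package My::MemoryIndexer;   # hypothetical; a real subclass would add:
                             #   use base qw( OpenInteract2::FullTextIndexer );

sub new { my ( $class ) = @_; return bless { index => {} }, $class }

# Store a count of each lowercased word found in the text
sub add_to_index {
    my ( $self, $content_class, $content_id, $content_ref ) = @_;
    my %count = ();
    $count{ lc $_ }++ for ( ${ $content_ref } =~ /(\w+)/g );
    $self->{index}{ $content_class }{ $content_id } = \%count;
    return scalar keys %count;
}

# Drop everything indexed under this class and ID
sub remove_from_index {
    my ( $self, $content_class, $content_id ) = @_;
    return delete $self->{index}{ $content_class }{ $content_id };
}

# Return [ class, id, full-score, { term => score } ] for every item
# matching at least one term; $search_type is deliberately unused here
sub _run_search {
    my ( $self, $search_type, @search_terms ) = @_;
    my @records = ();
    foreach my $class ( sort keys %{ $self->{index} } ) {
        foreach my $id ( sort keys %{ $self->{index}{ $class } } ) {
            my $words = $self->{index}{ $class }{ $id };
            my ( %term_scores, $full_score );
            foreach my $term ( @search_terms ) {
                my $score = $words->{ lc $term } or next;
                $term_scores{ $term } = $score;
                $full_score += $score;
            }
            next unless %term_scores;
            push @records, [ $class, $id, $full_score, \%term_scores ];
        }
    }
    return \@records;
}

package main;

my $indexer = My::MemoryIndexer->new;
my $text    = 'Ulysses Grant met Grant';
$indexer->add_to_index( 'page', '/foo/listing.html', \$text );

my $records = $indexer->_run_search( 'any', 'ulysses', 'grant' );
foreach my $record ( @{ $records } ) {
    my ( $class, $id, $full_score ) = @{ $record };
    print "$class $id matched with total score $full_score\n";
}
```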

SEE ALSO

OpenInteract2::FullTextIterator

The 'full_text' package shipped with OI2.

COPYRIGHT

Copyright (c) 2004 Chris Winters. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHORS

Chris Winters <chris@cwinters.com>