
NAME

OpenInteract2::FullTextRules - Rules for automatically indexing SPOPS objects

SYNOPSIS

 # In the object's spops.ini file, tell OI2 that you want the objects
 # indexed; with this, every 'save()' call on the object triggers
 # indexing of the object's 'description' and 'title' fields.
 
 [myobj]
 is_searchable = yes
 fulltext_field = description
 fulltext_field = title

METHODS

SPOPS Ruleset

ruleset_add( $class, \%ruleset_table )

Adds the necessary rules to $class, the class that has this class in its @ISA. Currently, these rules consist of:

  • post_save_action: reindex this object -- first obliterate all references in the index, then build the references anew (called on both INSERTs and UPDATEs)

  • post_remove_action: remove all references to this object from the index
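The registration described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual source: the package name, the two handler names, and the @LOG array are hypothetical, and the only things taken from the text are the ruleset_add( $class, \%ruleset_table ) signature and the two action names.

```perl
package My::FullTextRulesSketch;    # hypothetical name, not the real module

use strict;
use warnings;

# Hypothetical log so the sketch can show what each handler would do.
our @LOG;

# Hypothetical handler: wipe the old index entries, then build new ones.
sub _reindex_object {
    my ( $obj ) = @_;
    push @LOG, "remove old entries for $obj->{id}";
    push @LOG, "index fresh text for $obj->{id}";
    return 1;    # true signals success
}

# Hypothetical handler: drop every index entry for the object.
sub _remove_object_from_index {
    my ( $obj ) = @_;
    push @LOG, "remove all entries for $obj->{id}";
    return 1;
}

# Register one handler per action in the table handed to us.
sub ruleset_add {
    my ( $class, $rs_table ) = @_;
    push @{ $rs_table->{post_save_action} },   \&_reindex_object;
    push @{ $rs_table->{post_remove_action} }, \&_remove_object_from_index;
    return __PACKAGE__;
}

package main;

my %rules;
My::FullTextRulesSketch::ruleset_add( 'My::Object', \%rules );

# Simulate what happens after a save: run every registered handler.
$_->( { id => 42 } ) for @{ $rules{post_save_action} };
print "$_\n" for @My::FullTextRulesSketch::LOG;
```

Because the same handler runs after both INSERTs and UPDATEs, the sketch always removes before it indexes, so a freshly inserted object simply has nothing to remove.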

Internal

_indexable_object_text()

Gets the text out of the object to index. Currently, we treat all text from the object as one big field.

Note that if you have defined 'fulltext_pre_index_method' as a configuration item in your class, that method is called before indexing. This is useful if you have a method to fetch external data into your object.
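As a rough illustration of the behaviour described above, the sketch below calls the pre-index method and then joins the configured fields into one string. Only the two configuration keys come from the text; the My::Doc class, the %CONFIG hash, the load_body method, and the indexable_text name are assumptions made for the example and do not reflect how SPOPS actually stores configuration.

```perl
use strict;
use warnings;

package My::Doc;    # hypothetical stand-in for an SPOPS object class

# Assumed layout: the key names come from the documentation, the hash does not.
my %CONFIG = (
    fulltext_field            => [ 'description', 'title' ],
    fulltext_pre_index_method => 'load_body',
);

sub new { my ( $class, %fields ) = @_; return bless {%fields}, $class }

# Hypothetical method that pulls external data into the object.
sub load_body { $_[0]{description} = 'Fetched from an external file' }

sub indexable_text {
    my ( $self ) = @_;

    # Give the object a chance to populate itself before we read it.
    if ( my $method = $CONFIG{fulltext_pre_index_method} ) {
        $self->$method();
    }

    # Treat all configured fields as one big piece of text.
    return join ' ',
        grep { defined($_) && length($_) }
        map  { $self->{$_} } @{ $CONFIG{fulltext_field} };
}

package main;

my $doc  = My::Doc->new( title => 'Release notes' );
my $text = $doc->indexable_text;
print "$text\n";    # Fetched from an external file Release notes
```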

_tokenize( $text )

Breaks text down into tokens. This process is very simple: first we break the text into words, then we lowercase each word, then we 'stem' each word. Here is a brief description of stemming:

 Truncation - Also referred to as "root/suffix management" or
 "Stemming" or "Word Stemming", truncation allows some search engines
 to recognize and shorten long words such as "plants" or "boating" to
 their root words (or word stems) "plant" and "boat." This makes
 searching for such words much easier because it is not necessary to
 consider every permutation of that word when trying to find it. In a
 search, the ability to enter the first part of a keyword, insert a
 symbol (usually *), and accept any variant spellings or word endings,
 from the occurrence of the symbol forward (e.g., femini* retrieves
 feminine, feminism, etc.). See also word variants, plurals
 and singulars.

(From: http://ollie.dcccd.edu/library/Module2/Books/concepts.htm)

We use the Lingua::Stem module for this, which implements the Porter algorithm for stemming, as do most implementations, apparently. (This is something that this class treats as a black box itself :)

Parameters:

  • text ($)

    Text to tokenize
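The three steps (split, lowercase, stem) can be sketched as below. To keep the example free of CPAN dependencies it uses a deliberately crude suffix stripper, naive_stem, which is a hypothetical stand-in and not the Porter algorithm; the real class delegates this step to Lingua::Stem. The tokenize name and the word-splitting regex are likewise assumptions.

```perl
use strict;
use warnings;

# Crude stand-in for a real stemmer: strips a few common suffixes from
# longer words. This is NOT the Porter algorithm.
sub naive_stem {
    my ( $word ) = @_;
    $word =~ s/(?:ing|ed|s)$// if length($word) > 4;
    return $word;
}

sub tokenize {
    my ( $text ) = @_;
    my @words = $text =~ /(\w+)/g;                 # 1. break into words
    return map { naive_stem( lc $_ ) } @words;     # 2. lowercase, 3. stem
}

my @tokens = tokenize( 'Plants and BOATING' );
print join( ',', @tokens ), "\n";    # plant,and,boat
```

Lowercasing before stemming means the stemmer only ever sees one spelling of each word, so 'Plants' and 'plants' end up as the same token.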

SEE ALSO

OpenInteract2::FullTextIndexer in the 'full_text' package

COPYRIGHT

Copyright (c) 2004-2005 Chris Winters. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHORS

Chris Winters <chris@cwinters.com>