NAME
Lucy::Analysis::Analyzer - Tokenize/modify/filter text.
SYNOPSIS
# Abstract base class.
DESCRIPTION
An Analyzer is a filter which processes text, transforming it from one form into another. For instance, an analyzer might break up a long text into smaller pieces (RegexTokenizer), or it might perform case folding to facilitate case-insensitive search (Normalizer).
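As a rough illustration of the two roles described above, the following sketch chains a RegexTokenizer and a Normalizer by hand, using only transform_text() and transform(). The pattern '\w+' and the sample text are arbitrary choices made for this example, and Normalizer is assumed to be constructed with its default settings (which include case folding).

```perl
use strict;
use warnings;
use Lucy::Analysis::RegexTokenizer;
use Lucy::Analysis::Normalizer;

# Assumption: '\w+' is just an illustrative pattern, not a recommended one.
my $tokenizer  = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+' );
my $normalizer = Lucy::Analysis::Normalizer->new;

# Step 1: break the text into smaller pieces.
my $inversion = $tokenizer->transform_text("Breaking Up LONG Text");

# Step 2: case-fold each piece so that searches can be case-insensitive.
$inversion = $normalizer->transform($inversion);

# Collect the text of each token in the resulting Inversion.
my @texts;
while ( my $token = $inversion->next ) {
    push @texts, $token->get_text;
}
print join( ' ', @texts ), "\n";
```

In practice a chain like this is usually wrapped in a single composite analyzer rather than driven manually; the manual version is shown only to make the data flow visible.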
CONSTRUCTORS
new
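package MyAnalyzer;
use base qw( Lucy::Analysis::Analyzer );
# Inside-out storage for subclass data, keyed by the object's address.
our %foo;
sub new {
    # The superclass constructor takes no arguments.
    my $self = shift->SUPER::new;
    my %args = @_;
    $foo{$$self} = $args{foo};
    return $self;
}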
Abstract constructor. Takes no arguments.
ABSTRACT METHODS
transform
my $inversion = $analyzer->transform($inversion);
Takes a single Inversion as input and returns an Inversion: either the same one (presumably transformed in some way) or a new one.
inversion - An inversion.
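A minimal sketch of a subclass that implements transform() might look like the following. UpcaseAnalyzer is a hypothetical class invented for this example; it modifies each token in place and returns the same Inversion, which is one of the two options described above. The call to reset() rewinds the Inversion so that whatever consumes it next starts from the first token.

```perl
use strict;
use warnings;

package UpcaseAnalyzer;    # hypothetical example class
use base qw( Lucy::Analysis::Analyzer );

sub transform {
    my ( $self, $inversion ) = @_;

    # Rewrite the text of every token in place.
    while ( my $token = $inversion->next ) {
        $token->set_text( uc $token->get_text );
    }

    # Rewind so the caller can iterate from the first token.
    $inversion->reset;
    return $inversion;
}

package main;

my $analyzer = UpcaseAnalyzer->new;

# No tokenizer runs first here, so the whole input arrives as one token.
my $tokens = $analyzer->split("mixed Case text");
print "$_\n" for @$tokens;
```

Because this analyzer does not split its input, it would normally sit after a tokenizer in an analysis chain.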
METHODS
transform_text
my $inversion = $analyzer->transform_text($text);
Kick off an analysis chain, creating an Inversion from string input. The default implementation simply creates an initial Inversion containing a single Token, then calls transform(). Subclasses occasionally override it with an optimized implementation that minimizes string copies.
text - A string.
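The sketch below shows one way to inspect the Inversion that transform_text() returns. It assumes a RegexTokenizer with the illustrative pattern '\w+' and reads each token's text and offsets through the Token accessors.

```perl
use strict;
use warnings;
use Lucy::Analysis::RegexTokenizer;

# Assumption: '\w+' is an arbitrary pattern chosen for the example.
my $tokenizer = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+' );
my $inversion = $tokenizer->transform_text("one two three");

# Walk the tokens, recording each one's text and its offsets in the input.
my @spans;
while ( my $token = $inversion->next ) {
    push @spans,
        sprintf( "%s [%d, %d)",
            $token->get_text,
            $token->get_start_offset,
            $token->get_end_offset );
}
print "$_\n" for @spans;
```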
split
my $arrayref = $analyzer->split($text);
Analyze text and return an array of token texts.
text - A string.
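For quick experiments, split() is often the most convenient entry point because it hides the Inversion entirely. A small sketch, again assuming a RegexTokenizer with the illustrative pattern '\w+':

```perl
use strict;
use warnings;
use Lucy::Analysis::RegexTokenizer;

# Assumption: '\w+' is an arbitrary pattern chosen for the example.
my $tokenizer = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+' );

# split() returns an array reference of token texts.
my $tokens = $tokenizer->split("Hello, analyzer world");
print join( '|', @$tokens ), "\n";
```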
dump
my $obj = $analyzer->dump();
Dump the analyzer as a hash.
Subclasses should call dump() on the superclass. The returned object is a hash which should be populated with parameters of the analyzer.
Returns: A hash containing a description of the analyzer.
load
my $obj = $analyzer->load($dump);
Reconstruct an analyzer from a dump.
Subclasses should first call load() on the superclass. The returned object is an analyzer, which the subclass should finish reconstructing by setting the dumped parameters from the hash contained in dump.
Note that the invocant analyzer is unused.
dump - A hash.
Returns: An analyzer.
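A dump/load round trip can be sketched as follows, using RegexTokenizer as a concrete analyzer. The pattern and sample text are assumptions made for the example, and the exact keys in the dumped hash depend on the subclass, so the sketch only lists them rather than relying on any particular one.

```perl
use strict;
use warnings;
use Lucy::Analysis::RegexTokenizer;

# Assumption: '\w+' is an arbitrary pattern chosen for the example.
my $tokenizer = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+' );

# Serialize the analyzer's parameters into a plain hash.
my $dump = $tokenizer->dump;
print "dumped keys: ", join( ', ', sort keys %$dump ), "\n";

# Rebuild an analyzer from the hash. The invocant is only used for dispatch.
my $clone = $tokenizer->load($dump);

# The reconstructed analyzer should tokenize the same way as the original.
my $text     = "round trip check";
my $original = join( '|', @{ $tokenizer->split($text) } );
my $restored = join( '|', @{ $clone->split($text) } );
print $original eq $restored ? "match\n" : "mismatch\n";
```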
INHERITANCE
Lucy::Analysis::Analyzer isa Clownfish::Obj.