NAME

KinoSearch::Analysis::Tokenizer - Split a string into tokens.

DEPRECATED

The KinoSearch code base has been assimilated by the Apache Lucy project. The "KinoSearch" namespace has been deprecated, but development continues under our new name at our new home: http://lucy.apache.org/

SYNOPSIS

my $whitespace_tokenizer
    = KinoSearch::Analysis::Tokenizer->new( pattern => '\S+' );

# or...
my $word_char_tokenizer
    = KinoSearch::Analysis::Tokenizer->new( pattern => '\w+' );

# or...
my $apostrophising_tokenizer = KinoSearch::Analysis::Tokenizer->new;

# Then... once you have a tokenizer, put it into a PolyAnalyzer
# ($case_folder and $stemmer are assumed to have been created elsewhere):
my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
    analyzers => [ $case_folder, $word_char_tokenizer, $stemmer ], );

DESCRIPTION

Generically, "tokenizing" is a process of breaking up a string into an array of "tokens". For instance, the string "three blind mice" might be tokenized into "three", "blind", "mice".

KinoSearch::Analysis::Tokenizer decides where to break up the text using a regular expression compiled from a supplied pattern. The pattern should match exactly one token. If our source string is...

"Eats, Shoots and Leaves."

... then a "whitespace tokenizer" with a pattern of \S+ produces...

Eats, 
Shoots 
and 
Leaves.

... while a "word character tokenizer" with a pattern of \w+ produces...

Eats 
Shoots 
and 
Leaves

... the difference being that the word character tokenizer skips over punctuation as well as whitespace when determining token boundaries.
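The difference between the two patterns can be reproduced with plain Perl regular expressions, without loading KinoSearch at all. The sketch below is only an approximation of what the Tokenizer does: tokenize_with() is a hypothetical helper written for this illustration, not part of the KinoSearch API, and it returns bare strings rather than whatever token objects the real class produces.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): return every substring
# of $text that matches the token pattern, in order of appearance.
sub tokenize_with {
    my ( $pattern, $text ) = @_;
    my $regex  = qr/$pattern/;
    my @tokens = $text =~ /$regex/g;    # list-context global match
    return @tokens;
}

my $source = "Eats, Shoots and Leaves.";

my @whitespace_tokens = tokenize_with( '\S+', $source );
my @word_char_tokens  = tokenize_with( '\w+', $source );

print join( "|", @whitespace_tokens ), "\n";    # Eats,|Shoots|and|Leaves.
print join( "|", @word_char_tokens ),  "\n";    # Eats|Shoots|and|Leaves
```

Because neither pattern contains capturing groups, the global match in list context returns each whole match, which is why the punctuation stays attached under \S+ and is dropped under \w+.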

CONSTRUCTORS

new( [labeled params] )

my $word_char_tokenizer = KinoSearch::Analysis::Tokenizer->new(
    pattern => '\w+',    # optional; a default pattern is used if omitted
);

pattern - A string specifying a Perl-syntax regular expression which should match one token. The default value is \w+(?:[\x{2019}']\w+)*, which matches "it's" as well as "it" and "O'Henry's" as well as "Henry".
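To see how the default pattern treats apostrophes, it can be tried directly as a plain Perl regex. This is a minimal sketch, assuming the pattern quoted above is applied as an ordinary global match; default_tokens() is a hypothetical helper for this illustration and not a KinoSearch function.

```perl
use strict;
use warnings;

# Allow the right single quotation mark (U+2019) to be printed cleanly.
binmode( STDOUT, ':encoding(UTF-8)' );

# The default pattern as quoted in the documentation above.
my $default_pattern = qr/\w+(?:[\x{2019}']\w+)*/;

# Hypothetical helper (not part of KinoSearch): list every match of the
# default pattern in $text.
sub default_tokens {
    my ($text) = @_;
    my @tokens = $text =~ /$default_pattern/g;
    return @tokens;
}

# Mixes a straight apostrophe (') with a curly one (U+2019).
my @tokens = default_tokens("It's O'Henry's hat, isn\x{2019}t it?");
print "$_\n" for @tokens;
```

The group after the first \w+ is non-capturing, so each element of @tokens is a whole token. An apostrophe only stays inside a token when word characters follow it; a trailing apostrophe is left out.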

INHERITANCE

KinoSearch::Analysis::Tokenizer isa KinoSearch::Analysis::Analyzer isa KinoSearch::Object::Obj.

COPYRIGHT AND LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
