The norm_decoder caches the 256 possible byte => float pairs, obviating the need to call decode_norm over and over for a scoring implementation that knows how to use it.
KinoSearch::Search::Similarity - Calculate how closely two things match.
    # ./MySimilarity.pm
    package MySimilarity;
    # Inherit from Similarity so that new() is available.
    use base qw( KinoSearch::Search::Similarity );

    sub length_norm {
        my ( $self, $num_tokens ) = @_;
        return $num_tokens == 0 ? 1 : log($num_tokens) + 1;
    }

    # ./MySchema.pm
    package MySchema;
    use base qw( KinoSearch::Schema );
    use MySimilarity;

    sub similarity { MySimilarity->new }
KinoSearch uses a close approximation of boolean logic for determining which documents match a given query; then it uses a variant of the vector-space model for calculating scores. Much of the math used when calculating these scores is encapsulated within the Similarity class.
Similarity objects are used internally by KinoSearch's indexing and scoring classes. They are assigned using KinoSearch::Schema and KinoSearch::Schema::FieldSpec.
Only one method is publicly exposed at present.
To build your own Similarity implementation, provide a new implementation of length_norm() under a new class name. The constructor will inherit the class name properly.
Similarity is implemented as a C-struct object, so you can't add any member variables to it.
my $multiplier = $sim->length_norm($num_tokens);
After a field is broken up into terms at index-time, each term must be assigned a weight. One of the factors in calculating this weight is the number of tokens that the original field was broken into.
Typically, we assume that the more tokens in a field, the less important any one of them is -- so that, e.g. 5 mentions of "Kafka" in a short article are given more heft than 5 mentions of "Kafka" in an entire book. The default implementation of length_norm expresses this using an inverted square root.
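To make the inverted square root concrete, here is a minimal sketch in plain Perl. The function name inverse_sqrt_norm is hypothetical and is not part of KinoSearch, and the zero-token guard is an assumption borrowed from the synopsis rather than taken from the default implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an inverted-square-root length norm.
# Returning 1 for zero tokens is an assumption, mirroring the synopsis.
sub inverse_sqrt_norm {
    my ($num_tokens) = @_;
    return $num_tokens == 0 ? 1 : 1 / sqrt($num_tokens);
}

# The multiplier shrinks as the field grows longer.
for my $num_tokens ( 1, 100, 10_000 ) {
    printf "%6d tokens => multiplier %.4f\n",
        $num_tokens, inverse_sqrt_norm($num_tokens);
}
```

Under this sketch a 100-token field gets a multiplier of 0.1 and a 10,000-token field gets 0.01, so each token in the shorter field carries ten times the weight.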
However, the inverted square root has a tendency to reward very short fields highly, which isn't always appropriate for fields you expect to have a lot of tokens on average. See KinoSearch::Contrib::LongFieldSim for a discussion.
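The effect on very short fields can be seen by comparing the inverted square root with a variant that treats every field as having at least some minimum number of tokens. This is only an illustration: floored_norm and its floor of 100 are hypothetical, and this is not claimed to be how KinoSearch::Contrib::LongFieldSim works.

```perl
use strict;
use warnings;

# Hypothetical inverted-square-root norm (zero-token guard is an assumption).
sub inverse_sqrt_norm {
    my ($num_tokens) = @_;
    return $num_tokens == 0 ? 1 : 1 / sqrt($num_tokens);
}

# Hypothetical variant: fields shorter than $floor tokens are scored as if
# they contained $floor tokens, which caps the reward for very short fields.
sub floored_norm {
    my ( $num_tokens, $floor ) = @_;
    $num_tokens = $floor if $num_tokens < $floor;
    return 1 / sqrt($num_tokens);
}

for my $num_tokens ( 4, 400 ) {
    printf "%4d tokens: inverse sqrt %.2f, floored %.2f\n",
        $num_tokens,
        inverse_sqrt_norm($num_tokens),
        floored_norm( $num_tokens, 100 );
}
```

With the plain inverted square root, a 4-token field is weighted ten times more heavily than a 400-token field (0.5 versus 0.05). With the hypothetical floor of 100, the gap narrows to a factor of two (0.1 versus 0.05).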
Copyright 2005-2007 Marvin Humphrey
See KinoSearch version 0.20_01.