The norm_decoder caches the 256 possible byte => float pairs, obviating the need to call decode_norm over and over for a scoring implementation that knows how to use it.

NAME

KinoSearch::Search::Similarity - Calculate how closely two things match.

SYNOPSIS

    # ./MySimilarity.pm
    package MySimilarity;

    sub length_norm { 
        my ( $self, $num_tokens ) = @_;
        return $num_tokens == 0 ? 1 : log($num_tokens) + 1;
    }

    # ./MySchema.pm
    package MySchema;
    use base qw( KinoSearch::Schema );
    use MySimilarity;
    
    sub similarity { MySimilarity->new }

DESCRIPTION

KinoSearch uses a close approximation of boolean logic for determining which documents match a given query; then it uses a variant of the vector-space model for calculating scores. Much of the match used when calculating these scores is encapsulated within the Similarity class.

Similarity objects are are used internally by KinoSearch's indexing and scoring classes. They are assigned using KinoSearch::Schema and KinoSearch::Schema::FieldSpec.

Only one method is publicly exposed at present.

SUBCLASSING

To build your own Similarity implmentation, provide a new implementation of length_norm() under a new class name. The constructor will inherit the class name properly.

Similarity is implemented as a C-struct object, so you can't add any member variables to it.

METHODS

length_norm

    my $multiplier = $sim->length_norm($num_tokens);

After a field is broken up into terms at index-time, each term must be assigned a weight. One of the factors in calculating this weight is the number of tokens that the original field.

Typically, we assume that the more tokens in a field, the less important any one of them is -- so that, e.g. 5 mentions of "Kafka" in a short article are given more heft than 5 mentions of "Kafka" in an entire book. The default implementation of length_norm expresses this using an inverted square root.

However, the inverted square root has a tendency to reward very short fields highly, which isn't always appropriate for fields you expect to have a lot of tokens on average. See KinoSearch::Contrib::LongFieldSim for a discussion.

COPYRIGHT

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20_01.

To install KinoSearch, copy and paste the appropriate command in to your terminal.

cpanm

cpanm KinoSearch

CPAN shell

perl -MCPAN -e shell
install KinoSearch

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)