NAME

Redis::NaiveBayes - A generic Redis-backed NaiveBayes implementation

VERSION

version 0.0.4

SYNOPSIS

    use Redis::NaiveBayes;

    # The tokenizer maps an item to a HASHREF of token => occurrence count.
    my $tokenizer = sub {
        my $input = shift;

        my %occurs;
        $occurs{$_}++ for split(/\s+/, lc $input);

        return \%occurs;
    };

    my $bayes = Redis::NaiveBayes->new(
        namespace => 'playground:',
        tokenizer => $tokenizer,
    );

DESCRIPTION

This distribution provides a very simple NaiveBayes classifier backed by a Redis instance. It uses the evalsha functionality available since Redis 2.6.0 to try to speed things up while avoiding some obvious race conditions during the untrain() phase.

The goal of Redis::NaiveBayes is to keep dependencies at minimum while being as generic as possible to allow any sort of usage. By design, it doesn't provide any sort of tokenization nor filtering out of the box.

METHODS

new

    my $bayes = Redis::NaiveBayes->new(
        namespace  => 'playground:',
        tokenizer  => \&tokenizer,
        correction => 0.1,
        redis      => $redis_instance,
    );

Creates a Redis::NaiveBayes instance using the provided correction, namespace and tokenizer.

If the redis parameter is provided, that Redis instance is used instead of the module instantiating one by itself.

A tokenizer is any subroutine that takes the item passed to train() or classify() and returns a HASHREF of the occurrences found in that item, as in the SYNOPSIS, where each word maps to the number of times it occurs.
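Because the module ships no tokenization or filtering of its own, any filtering has to live in the tokenizer. The following is a minimal sketch of one that drops stopwords; the stopword list and the choice to split on non-word characters are assumptions made for illustration, not something the module prescribes.

```perl
use strict;
use warnings;

# Hypothetical stopword list, for illustration only.
my %stopwords = map { $_ => 1 } qw(a an the is of);

# Lowercases the input, splits on runs of non-word characters,
# skips empty strings and stopwords, and counts what remains.
my $tokenizer = sub {
    my $input = shift;

    my %occurs;
    for my $word (split /\W+/, lc $input) {
        next if $word eq '' or $stopwords{$word};
        $occurs{$word}++;
    }

    return \%occurs;
};

my $tokens = $tokenizer->("The price of the good message is good");
# $tokens is { price => 1, good => 2, message => 1 }
```

The resulting code reference can be passed as the tokenizer parameter of new().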

flush

    $bayes->flush;

Cleans up all the keys this classifier instance could have touched. If you want to clean everything under the provided namespace, call _mrproper() instead, but beware that it deletes every key matching namespace*.

train

    $bayes->train("ham", "this is a good message");
    $bayes->train("spam", "price from Nigeria needs your help");

Trains the classifier with the given item under the given label ("ham" in the first call above). The item can be any arbitrary structure as long as the provided tokenizer understands it.
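Since the item only has to make sense to the tokenizer, it does not need to be a string. As a sketch, the tokenizer below accepts a hash reference; the item shape (subject and body fields) and the field-prefixed token names are hypothetical choices for this example.

```perl
use strict;
use warnings;

# Hypothetical item shape: { subject => "...", body => "..." }.
my $message_tokenizer = sub {
    my $item = shift;

    my %occurs;
    for my $field (qw(subject body)) {
        my $text = $item->{$field};
        next unless defined $text;

        # Prefix each token with its field name so that the same word
        # in the subject and in the body is counted separately.
        $occurs{"$field:$_"}++ for grep { length } split /\s+/, lc $text;
    }

    return \%occurs;
};

my $tokens = $message_tokenizer->({
    subject => "Help needed",
    body    => "Please help help",
});
# $tokens is { "subject:help" => 1, "subject:needed" => 1,
#              "body:please" => 1, "body:help" => 2 }
```

An instance constructed with such a tokenizer would then be trained and queried with hash references of the same shape instead of plain strings.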

untrain

    $bayes->untrain("ham", "I don't think this message is good anymore");

The opposite of train().
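To make "the opposite" concrete, here is an in-memory sketch of the idea. It is not the module's code: the %counts hash merely stands in for whatever is stored in Redis, and the helper names and the rule of dropping counts that reach zero are assumptions made for illustration.

```perl
use strict;
use warnings;

# label => { token => count }; a stand-in for the data kept in Redis.
my %counts;

# Simple whitespace tokenizer returning token => occurrence count.
sub tokenize {
    my $text = shift;
    my %occurs;
    $occurs{$_}++ for grep { length } split /\s+/, lc $text;
    return \%occurs;
}

# Training adds the item's token counts to the label.
sub train_in_memory {
    my ($label, $item) = @_;
    my $tokens = tokenize($item);
    $counts{$label}{$_} += $tokens->{$_} for keys %$tokens;
}

# Untraining subtracts them again and drops anything that reaches zero.
sub untrain_in_memory {
    my ($label, $item) = @_;
    my $tokens = tokenize($item);
    for my $token (keys %$tokens) {
        next unless exists $counts{$label}{$token};
        $counts{$label}{$token} -= $tokens->{$token};
        delete $counts{$label}{$token} if $counts{$label}{$token} <= 0;
    }
    # Remove the label entirely once it has no tokens left.
    delete $counts{$label} unless %{ $counts{$label} || {} };
}

train_in_memory("ham", "good message");
train_in_memory("ham", "good day");
untrain_in_memory("ham", "good message");
# %counts is now ( ham => { good => 1, day => 1 } )
```

In this sketch the decrement and the cleanup are separate steps, which is exactly the kind of sequence that can race when several clients share one store; the DESCRIPTION notes that the module relies on evalsha to avoid such races during untrain().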

classify

    my $label = $bayes->classify("Nigeria needs help");
    # "spam"

Returns the most probable label for the provided item.

scores

    my $scores = $bayes->scores("any sort of message");

Returns a HASHREF with the scores for each of the labels known to the model.
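The returned HASHREF can be inspected directly, for instance to rank labels. The sketch below assumes that a higher score means a more probable label, which the text above does not state explicitly, and the score values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical scores, shaped like the HASHREF described above.
my $scores = { ham => -12.7, spam => -4.2 };

# Order the labels from highest to lowest score.
my @ranked = sort { $scores->{$b} <=> $scores->{$a} } keys %$scores;

# The first label is the best candidate under the assumption above.
my $best = $ranked[0];
# $best is "spam"
```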

NOTES

This module is heavily inspired by the Python implementation available at https://github.com/jart/redisbayes. The main difference, besides the obvious language choice, is that Redis::NaiveBayes focuses on being generic and on minimizing the number of roundtrips to Redis.

TODO

Add support for additive smoothing

AUTHORS

Caio Romão <cpan@caioromao.com>
Stanislaw Pusep <stas@sysd.org>

COPYRIGHT AND LICENSE

This is free software, licensed under:

  The MIT (X11) License
