NAME

Algorithm::LossyCount - Memory-efficient approximate frequency count.

version 0.03

SYNOPSIS

``````  use strict;
use warnings;
use Algorithm::LossyCount;

my @samples = qw/a b a c d f a a d b b c a a .../;

my \$counter = Algorithm::LossyCount->new(max_error_ratio => 0.005);

my \$frequencies = \$counter->frequencies;
say \$frequencies->{a};  # Approximate freq. of 'a'.
say \$frequencies->{b};  # Approximate freq. of 'b'.
...``````

DESCRIPTION

Lossy-Counting is a approximate frequency counting algorithm proposed by Manku and Motwani in 2002 (refer "SEE ALSO" section below.)

The main advantage of the algorithm is memory efficiency. You can get approximate count of appearance of items with very low memory footprint, compared with total inspection. Furthermore, Lossy-Counting is an online algorithm. It is applicable to data set such that the size is unknown, and you can take intermediate result anytime.

METHODS

new(max_error_ratio => \$num)

Construcotr. `max_error_ratio` is the only mandatory parameter, that specifies acceptable error ratio. It is an error that give zero or a negative number as the value.

Add given `\$sample` to count.

frequencies([support => \$num])

Returns current result as HashRef. Its keys and values are samples and corresponding counts respectively.

If optional named parameter `support` is specified, returned HashRef will contain only samples having frequency greater than `(\$support - \$max_error_ratio) * \$num_samples`.

max_error_ratio

Returns `max_error_ratio` you've given to the constructor.

num_samples

Returns the total number of samples you've added.

Manku, Gurmeet Singh, and Rajeev Motwani. "Approximate frequency counts over data streams." Proceedings of the 28th international conference on Very Large Data Bases. VLDB Endowment, 2002.

AUTHOR

Koichi SATOH <sekia@cpan.org>

``  The MIT (X11) License``