Statistics::TopK - Implementation of the top-k streaming algorithm
    use Statistics::TopK;

    my $counter = Statistics::TopK->new(10);
    while (my $val = <STDIN>) {
        chomp $val;
        $counter->add($val);
    }
    my @top    = $counter->top;
    my %counts = $counter->counts;
The Statistics::TopK module implements the top-k streaming algorithm, also known as the "heavy hitters" algorithm. It is designed to process data streams and probabilistically calculate the k most frequent items while using limited memory.
A typical example is determining the top 10 IP addresses listed in an access log. A simple solution is to keep a hash that maps each IP address to a counter and then sort the hash by count. But for IPv4 addresses alone, that hash could theoretically require over 4 billion keys (2^32 possible addresses).
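For comparison, the simple exact approach described above can be sketched in a few lines. This is only an illustration: the function name exact_top_k and the sample addresses are hypothetical and are not part of Statistics::TopK.

```perl
use strict;
use warnings;

# Exact counting: one hash entry per distinct item, so memory grows
# with the number of distinct values seen in the stream.
sub exact_top_k {
    my ($k, @items) = @_;
    my %count;
    $count{$_}++ for @items;

    # Sort by descending count, breaking ties by name for stable output.
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    $k = @sorted if $k > @sorted;
    return @sorted[0 .. $k - 1];
}

# Hypothetical sample data standing in for an access log.
my @ips = qw(10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2);
my @top = exact_top_k(2, @ips);
print "@top\n";
```

The result is exact, but every distinct address keeps its own hash entry for the whole run. That is the memory cost the top-k algorithm avoids.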
The top-k algorithm only requires storage space proportional to the number of items of interest. It accomplishes this by sacrificing precision, as it is only a probabilistic counter.
$counter = Statistics::TopK->new($k)
Creates a new Statistics::TopK object which is prepared to count the top $k elements.
$count = $counter->add($element)
Counts the given $element and returns its approximate count (if any) in the Statistics::TopK object.
Note that adding an element does not guarantee it will be counted yet, as the algorithm is probabilistic, and the occurrence of the current element might only be used to decrease the count of one of the current top elements.
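To illustrate how an added element can go uncounted, here is a minimal sketch of a Misra-Gries-style counter, one well-known member of the heavy-hitters family. It is an assumption made for illustration and is not taken from the Statistics::TopK source: the module's internal bookkeeping may differ (this variant decrements every tracked counter rather than just one), and the name mg_add is hypothetical.

```perl
use strict;
use warnings;

# Misra-Gries-style counter: keeps at most $k candidate counters.
# Illustrative only; not the Statistics::TopK implementation.
sub mg_add {
    my ($counts, $k, $item) = @_;

    if (exists $counts->{$item}) {
        # Already tracked: just increment.
        return ++$counts->{$item};
    }
    if (keys(%$counts) < $k) {
        # Free slot: start tracking the new item.
        return $counts->{$item} = 1;
    }
    # No free slot: the new item is not stored. Its occurrence instead
    # decrements every tracked counter; counters reaching zero are dropped.
    for my $key (keys %$counts) {
        delete $counts->{$key} if --$counts->{$key} == 0;
    }
    return;    # nothing was counted for $item this time
}

my %counts;
my @stream = qw(a a b c a);
for my $item (@stream) {
    my $count = mg_add(\%counts, 2, $item);
    printf "%s => %s\n", $item, defined $count ? $count : 'not counted';
}
```

With room for two counters, the arrival of c is not recorded. It lowers the count of a and evicts b, so only a remains tracked at the end of this stream.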
@top = $counter->top()
Returns a list of the top-k counted elements.
%counts = $counter->counts()
Returns a hash of the top-k counted elements and their counts.
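The descriptions above do not say whether the elements come back in any particular order, so a report may want to sort by count explicitly. A minimal sketch, using a hypothetical hash shaped like the one described for counts():

```perl
use strict;
use warnings;

# Hypothetical data, shaped like the hash described for counts().
my %counts = ('10.0.0.1' => 42, '10.0.0.7' => 17, '10.0.0.3' => 29);

# Order the elements by descending approximate count.
my @ranked = sort { $counts{$b} <=> $counts{$a} } keys %counts;

printf "%-10s %d\n", $_, $counts{$_} for @ranked;
```

In real use the hash would be filled from $counter->counts, and the values should be read as approximate counts.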
http://en.wikipedia.org/wiki/Streaming_algorithm#Heavy_hitters
Please report any bugs or feature requests to http://rt.cpan.org/Public/Bug/Report.html?Queue=Statistics-TopK. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Statistics::TopK
You can also look for information at:
GitHub Source Repository
http://github.com/gray/statistics-topk
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Statistics-TopK
CPAN Ratings
http://cpanratings.perl.org/d/Statistics-TopK
RT: CPAN's request tracker
http://rt.cpan.org/Public/Dist/Display.html?Name=Statistics-TopK
Search CPAN
http://search.cpan.org/dist/Statistics-TopK/
Copyright (C) 2009-2015 gray <gray at cpan.org>, all rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
gray, <gray at cpan.org>
To install Statistics::TopK, copy and paste the appropriate command into your terminal.
cpanm
    cpanm Statistics::TopK
CPAN shell
    perl -MCPAN -e shell
    install Statistics::TopK