NAME

Text::SpeedyFx - tokenize/hash large amounts of strings efficiently

VERSION

version 0.004

SYNOPSIS

    use Data::Dumper;
    use Text::SpeedyFx;

    my $sfx = Text::SpeedyFx->new;

    my $words_bag = $sfx->hash('To be or not to be?');
    print Dumper $words_bag;
    #$VAR1 = {
    #          '1422534433' => '1',
    #          '4120516737' => '2',
    #          '1439817409' => '2',
    #          '3087870273' => '1'
    #        };

    my $feature_vector = $sfx->hash_fv("thats the question", 5);
    print Dumper $feature_vector;
    #$VAR1 = [
    #          '0',
    #          '1',
    #          '0',
    #          '1',
    #          '0'
    #        ];

DESCRIPTION

An XS implementation of a very fast combined parser/hasher that works well on a variety of bag-of-words problems.

The original implementation is in Java; it was adapted here for better Unicode compliance.

METHODS

new([$seed])

Initializes the parser/hasher, optionally with the specified $seed (default: 1).

hash($string)

Parses $string and returns a hash reference whose keys are the hashed tokens and whose values are their respective counts.
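As a rough illustration of working with such a bag, the sketch below compares two hash references of that shape by cosine similarity. It uses only core Perl; the helper name bag_cosine and the second bag are made up for the example, while the first bag is the one printed in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper: cosine similarity between two bags shaped like
# the return value of hash(), i.e. { hashed_token => count, ... }.
sub bag_cosine {
    my ($x, $y) = @_;
    my ($dot, $nx, $ny) = (0, 0, 0);
    $nx += $_ ** 2 for values %$x;
    $ny += $_ ** 2 for values %$y;
    for my $k (keys %$x) {
        $dot += $x->{$k} * $y->{$k} if exists $y->{$k};
    }
    return 0 unless $nx && $ny;    # an empty bag matches nothing
    return $dot / sqrt($nx * $ny);
}

# The bag shown in the SYNOPSIS for 'To be or not to be?'.
my $bag = {
    1422534433 => 1,
    4120516737 => 2,
    1439817409 => 2,
    3087870273 => 1,
};

# Made-up second bag that shares two keys, for illustration only.
my $other = { 4120516737 => 1, 1439817409 => 1 };

printf "%.3f\n", bag_cosine($bag, $other);
```

In real use both bags would come from $sfx->hash(...) calls on the same Text::SpeedyFx object, so that equal tokens map to equal keys.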

hash_fv($string, $n)

Parses $string and returns a feature vector with $n elements.
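One possible way to compare two such vectors is to count the positions where they differ. The sketch below assumes the array-of-0/1 shape printed in the SYNOPSIS; the helper name fv_hamming and the second vector are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number of positions at which two equal-length
# 0/1 vectors differ (Hamming distance).
sub fv_hamming {
    my ($u, $v) = @_;
    die "vectors differ in length\n" unless @$u == @$v;
    return scalar grep { $u->[$_] != $v->[$_] } 0 .. $#$u;
}

# The vector shown in the SYNOPSIS for hash_fv("thats the question", 5).
my $fv = [0, 1, 0, 1, 0];

# Made-up second vector, for illustration only.
my $other = [0, 1, 1, 0, 0];

print fv_hamming($fv, $other), "\n";    # the vectors differ at indices 2 and 3
```

Both vectors must be built with the same $n, otherwise their positions are not comparable.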

hash_min($string)

Parses $string and returns the lowest hash value among its tokens.
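A minimum hash of this kind is the building block of MinHash-style similarity estimates. The sketch below is only an illustration of that idea: it assumes that objects created with different $seed values behave like independent hash functions, which this documentation does not state, and it uses made-up numbers in place of real hash_min results. The helper name minhash_similarity is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of positions at which two equal-length
# lists of minimum hashes agree (a MinHash-style similarity estimate).
sub minhash_similarity {
    my ($s, $t) = @_;
    die "signatures differ in length\n" unless @$s == @$t;
    return 0 unless @$s;
    my $same = grep { $s->[$_] == $t->[$_] } 0 .. $#$s;
    return $same / @$s;
}

# Made-up signatures. In practice each entry might come from
# Text::SpeedyFx->new($seed)->hash_min($string), one $seed per position.
my $sig_a = [1422534433, 77, 901, 15];
my $sig_b = [1422534433, 77, 333, 15];

print minhash_similarity($sig_a, $sig_b), "\n";    # 3 of 4 positions agree
```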

REFERENCES

Extremely Fast Text Feature Extraction for Classification and Indexing by George Forman and Evan Kirshenbaum

AUTHOR

Stanislaw Pusep <stas@sysd.org>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
