Text::TFIDF::Ngram - Compute the TF-IDF measure for ngram phrases
version 0.0207
  use Text::TFIDF::Ngram;
  use Data::Dumper;

  my $obj = Text::TFIDF::Ngram->new(
    files => [qw( foo.txt bar.txt )],
    size  => 3,
  );

  my $w = $obj->tf( 'foo.txt', 'foo bar baz' );
  my $x = $obj->idf('foo bar baz');
  my $y = $obj->tfidf( 'foo.txt', 'foo bar baz' );
  printf "TF: %.3f, IDF: %.3f, TFIDF: %.3f\n", $w, $x, $y;

  my $z = $obj->tfidf_by_file;
  print Dumper $z;
This module computes the TF-IDF ("term frequency-inverse document frequency") measure for a corpus of text documents.
For a working example program, please see the eg/analyze file in the distribution.
ArrayRef of filenames.
Integer ngram phrase size. Default is 1.
Boolean indicating that phrases with stopwords will be ignored. Default is 1.
HashRef of the ngram counts of each processed file. This is a computed attribute; a value passed to the constructor is ignored.
HashRef of the TF-IDF values in each processed file. This is a computed attribute; a value passed to the constructor is ignored.
  $obj = Text::TFIDF::Ngram->new(
    files     => \@files,
    size      => $size,
    stopwords => $stopwords,
  );
Create a new Text::TFIDF::Ngram object. If the files argument is passed in, the ngrams of each file are stored.
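To illustrate what the size and stopwords settings mean for the stored counts, here is a minimal sketch of word ngram counting. The helper count_ngrams is hypothetical and is not part of this module; the tokenization (lowercased \w+ words) and the stopword list are assumptions made only for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::TFIDF::Ngram: count word ngrams of
# a given size, skipping any ngram that contains a stopword.
sub count_ngrams {
    my ( $text, $size, $stopwords ) = @_;
    my %stop  = map { ( lc($_) => 1 ) } @{ $stopwords || [] };
    my @words = map { lc $_ } ( $text =~ /(\w+)/g );
    my %counts;
    # Slide a window of $size words across the text
    for my $i ( 0 .. @words - $size ) {
        my @gram = @words[ $i .. $i + $size - 1 ];
        next if grep { $stop{$_} } @gram;    # ignore phrases with stopwords
        $counts{"@gram"}++;
    }
    return \%counts;
}

my $counts = count_ngrams( 'The quick fox saw the quick fox', 2, ['the'] );
print "$_: $counts->{$_}\n" for sort keys %$counts;
# prints:
# fox saw: 1
# quick fox: 2
```

With a size of 2 and "the" as the only stopword, every bigram containing "the" is dropped and the remaining phrases are counted.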
Text::TFIDF::Ngram
Loads the phrase counts of the given files.
$tf = $obj->tf( $file, $phrase );
Returns the relative frequency of the given phrase in the document file. This is not the "raw count" of the phrase, but rather the proportion of times it is seen among the phrases of the file.
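The difference between a raw count and a relative frequency can be sketched as follows. The %counts data and the relative_tf helper are hypothetical, and the sketch assumes the frequency is the phrase count divided by the total of all ngram counts in the file; it is not taken from the module source.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical ngram counts for a single document
my %counts = (
    'foo bar baz' => 2,
    'bar baz qux' => 1,
    'baz qux foo' => 1,
);

# Relative frequency: occurrences of the phrase over all ngram occurrences
sub relative_tf {
    my ( $counts, $phrase ) = @_;
    my $total = sum0( values %$counts );
    return 0 unless $total && exists $counts->{$phrase};
    return $counts->{$phrase} / $total;
}

printf "%.3f\n", relative_tf( \%counts, 'foo bar baz' );    # 2 / 4, prints 0.500
printf "%.3f\n", relative_tf( \%counts, 'no such phrase' ); # prints 0.000
```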
$idf = $obj->idf($phrase);
Returns the inverse document frequency of a phrase.
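As a rough illustration, the textbook form of this measure is the logarithm of the number of documents divided by the number of documents that contain the phrase. The sketch below assumes a base 10 logarithm and no smoothing; the logarithm base used by this module is not stated here, and the %corpus data and the inverse_document_frequency helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical per-file ngram counts: file name => { phrase => count }
my %corpus = (
    'foo.txt' => { 'foo bar baz' => 2, 'bar baz qux' => 1 },
    'bar.txt' => { 'bar baz qux' => 3 },
    'baz.txt' => { 'qux quux corge' => 1 },
);

# idf = log10( number of documents / number of documents with the phrase )
sub inverse_document_frequency {
    my ( $corpus, $phrase ) = @_;
    my $total       = keys %$corpus;
    my $with_phrase = grep { exists $corpus->{$_}{$phrase} } keys %$corpus;
    return undef unless $with_phrase;    # phrase is not in the corpus
    return log( $total / $with_phrase ) / log(10);
}

printf "%.3f\n", inverse_document_frequency( \%corpus, 'foo bar baz' );    # prints 0.477
printf "%.3f\n", inverse_document_frequency( \%corpus, 'bar baz qux' );    # prints 0.176
```

A phrase found in only one of three files scores higher than one found in two of them, which is the intended effect: rarer phrases are more distinctive.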
$tfidf = $obj->tfidf( $file, $phrase );
Computes the TF-IDF weight for the given file and phrase. If the phrase is not in the corpus, a warning is issued and undef is returned.
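The weight is conventionally the product of the term frequency and the inverse document frequency. The following sketch shows that product together with the warn-and-return-undef behaviour described above. The tfidf_weight helper, the %corpus data, the warning text and the base 10 logarithm are all assumptions made for illustration, not the module's own code.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical per-file ngram counts: file name => { phrase => count }
my %corpus = (
    'foo.txt' => { 'foo bar baz' => 2, 'bar baz qux' => 2 },
    'bar.txt' => { 'bar baz qux' => 3 },
);

sub tfidf_weight {
    my ( $corpus, $file, $phrase ) = @_;
    my $docs      = keys %$corpus;
    my $docs_with = grep { exists $corpus->{$_}{$phrase} } keys %$corpus;
    unless ($docs_with) {
        warn "'$phrase' is not present in any document\n";
        return undef;
    }
    my $counts = $corpus->{$file} || {};
    my $total  = sum0( values %$counts );
    my $tf     = $total ? ( $counts->{$phrase} || 0 ) / $total : 0;
    my $idf    = log( $docs / $docs_with ) / log(10);
    return $tf * $idf;
}

# Callers should check for undef before formatting the result
for my $phrase ( 'foo bar baz', 'bar baz qux', 'not in corpus' ) {
    my $weight = tfidf_weight( \%corpus, 'foo.txt', $phrase );
    printf "%-14s %s\n", $phrase, defined $weight ? sprintf( '%.3f', $weight ) : 'n/a';
}
```

Note that a phrase present in every file gets a weight of 0 under this formula, while a phrase absent from the corpus yields undef rather than 0.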
$tfidf = $obj->tfidf_by_file;
Constructs a HashRef of all files, each with all of its phrases and their TF-IDF values.
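A result like this is typically used to rank the phrases of each file. The sketch below assumes the HashRef is shaped as { file => { phrase => weight } }; that shape, the sample values and the ranked_phrases helper are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical result, assumed to be shaped as { file => { phrase => weight } }
my $by_file = {
    'foo.txt' => { 'foo bar baz' => 0.151, 'bar baz qux'    => 0 },
    'bar.txt' => { 'bar baz qux' => 0,     'qux quux corge' => 0.301 },
};

# Phrases of one file by descending weight, ties broken alphabetically
sub ranked_phrases {
    my ( $by_file, $file ) = @_;
    my $weights = $by_file->{$file} || {};
    my @ranked  = sort { $weights->{$b} <=> $weights->{$a} || $a cmp $b }
        keys %$weights;
    return @ranked;
}

for my $file ( sort keys %$by_file ) {
    print "$file\n";
    printf "  %-16s %.3f\n", $_, $by_file->{$file}{$_}
        for ranked_phrases( $by_file, $file );
}
```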
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
Lingua::EN::Ngram
Lingua::StopWords
List::Util
Moo
Gene Boggs <gene@cpan.org>
This software is copyright (c) 2018 by Gene Boggs.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.