NAME

Text::TFIDF - Perl extension for computing the TF-IDF measure

SYNOPSIS

use Text::TFIDF;
my $Obj = Text::TFIDF->new(file=>[file1,file2...]);
print $Obj->TFIDF($file,$word);

DESCRIPTION

The TF-IDF (Term Frequency-Inverse Document Frequency) weight is used in information retrieval and text mining. It is a statistical measure of how important a word is to a document within a collection of documents. At this time, the module works only on text documents.

Currently, the module reads everything into memory. This should be altered in the future.

EXPORT

None by default.

new(file=>\@files)

Creates a new Text::TFIDF object. If the file argument is passed in, populates the object using those files.

TFIDF(file,word)

Computes the TF-IDF weight for the given document and word. If the file is not in the corpus used to populate the object, returns undef.

TF(file,word)

Returns the frequency of the given word in the document.
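As a rough illustration of term frequency, the sketch below counts how often a word occurs in a string. The helper name term_frequency, the case folding, and the \w+ tokenization are assumptions made for this example; the module's own tokenization, and any normalization it applies, may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::TFIDF): raw count of $word in $text.
# Assumes case-insensitive matching and tokens made of word characters.
sub term_frequency {
    my ($text, $word) = @_;
    my $count = 0;
    for my $token (lc($text) =~ /(\w+)/g) {
        $count++ if $token eq lc $word;
    }
    return $count;
}

print term_frequency("The cat saw the other cat", "cat"), "\n";
```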

IDF(word)

Returns the inverse document frequency of a word: the logarithm of the number of documents in the corpus divided by the number of documents containing the term. Since the number of documents containing the term can be zero, one is added so that the result remains well defined.
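The IDF calculation described above might look like the following sketch. This is one plausible reading, not the module's confirmed implementation: it assumes the natural logarithm and that the added one goes on the count of documents containing the term, which is a common smoothing choice. The function name and the corpus layout (an array of word-count hashes) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: $docs is an array ref of hash refs mapping word => count.
sub inverse_document_frequency {
    my ($docs, $word) = @_;
    my $total      = scalar @$docs;
    my $containing = grep { exists $_->{$word} } @$docs;
    # Assumption: add one to the denominator so it is never zero.
    return log($total / ($containing + 1));
}

my @docs = (
    { cat => 2, dog => 1 },
    { dog => 3 },
    { fish => 1 },
    { bird => 4 },
);
printf "%.4f\n", inverse_document_frequency(\@docs, 'cat');   # log(4/2)
printf "%.4f\n", inverse_document_frequency(\@docs, 'zebra'); # log(4/1)
```

Under this reading, a word found in every document gets a slightly negative value, and an empty corpus is not handled because the logarithm of zero is undefined.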

process_files(@files)

Populates the object with the given list of files. This does not replace data already loaded; the new files are added to the existing corpus.
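The additive behaviour can be pictured with a small stand-in. The sketch below is not the module's code: the %corpus hash, the add_documents helper, and the use of in-memory strings instead of files are all assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the internal store: document name => { word => count }.
my %corpus;

# Adds documents to %corpus without discarding what is already there.
# Takes name => text pairs and returns the number of documents now stored.
sub add_documents {
    my (%texts) = @_;
    for my $name (keys %texts) {
        $corpus{$name}{$_}++ for lc($texts{$name}) =~ /(\w+)/g;
    }
    return scalar keys %corpus;
}

add_documents('a.txt' => 'the cat sat');
add_documents('b.txt' => 'the dog ran');
print join(',', sort keys %corpus), "\n";
```

After the second call both documents are present, mirroring how a later process_files call extends the corpus rather than resetting it.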

AUTHOR

Leigh Metcalf, <leigh@fprime.net>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.3 or, at your option, any later version of Perl 5 you may have available.
