NAME

Lingua::JA::TFIDF - TF/IDF calculator based on MeCab.

SYNOPSIS

  use Lingua::JA::TFIDF;
  use Data::Dumper;

  my $calc = Lingua::JA::TFIDF->new(%config);

  # calculate TF/IDF and return a result object.
  my $result = $calc->tfidf($text);
  print Dumper $result->list;

  # dump the result object.
  print Dumper $result->dump;

  # or calculate only TF
  print Dumper $calc->tf($text)->list;

DESCRIPTION

* This software is still in alpha release *

Lingua::JA::TFIDF is a TF/IDF calculator based on MeCab. It ships with a DF (Document Frequency) data set that was fetched beforehand from the Yahoo Search API.

METHODS

new(%config)

Instantiates a new Lingua::JA::TFIDF object. All of the following parameters are optional.

  my $calc = Lingua::JA::TFIDF->new(
    df_file         => 'my_df_file',           # default is undef
    ng_word         => \@original_ngword,      # default is undef
    fetch_df        => 1,                      # default is undef
    fetch_df_save   => 'my_df_file',           # default is undef
    LWP_UserAgent   => \%lwp_useragent_config, # default is undef
    XML_TreePP      => \%xml_treepp_config,    # default is undef
    yahoo_api_appid => $myid,                  # default is undef
  );

tfidf($text);

Calculates the TF/IDF score. If the text includes unknown words, their Document Frequency scores are replaced by the average score of the known words. If you set the fetch_df parameter to a true value in the constructor, the calculator fetches the Document Frequency of unknown words from the Yahoo Search API.
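
The average-score fallback for unknown words can be sketched in plain Perl as follows. This is a minimal illustration, not the module's implementation: the term counts, the DF table, the corpus size ($total_docs), and the tf * log(N / df) weighting are all assumptions made for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical inputs: term counts for one text and a small DF table.
# Neither the numbers nor the IDF formula come from Lingua::JA::TFIDF.
my $total_docs = 1_000_000;                        # assumed corpus size
my %tf = ( perl => 3, mecab => 2, foobarbaz => 1 );
my %df = ( perl => 50_000, mecab => 1_000 );       # "foobarbaz" has no DF entry

# Average DF over the words that do have a DF entry.
my @known  = grep { exists $df{$_} } keys %tf;
my $avg_df = @known ? sum( map { $df{$_} } @known ) / @known : 1;

my %score;
for my $word ( keys %tf ) {
    # Unknown words fall back to the average DF of the known words.
    my $df = exists $df{$word} ? $df{$word} : $avg_df;
    $score{$word} = $tf{$word} * log( $total_docs / $df );
}

# Print the words ordered by descending score.
printf "%-10s %.3f\n", $_, $score{$_}
    for sort { $score{$b} <=> $score{$a} } keys %score;
```

With fetch_df enabled, the fallback branch would presumably be replaced by a lookup against the search API; that step is not shown here.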

tf($text);

Calculates the TF (Term Frequency) score only, without the Document Frequency weighting.
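
As a rough sketch of what a TF pass does, the snippet below counts terms in a pre-tokenized list. The token list is hypothetical (in the module, MeCab does the tokenizing), the use of raw counts rather than normalized frequencies is an assumption, and so is treating NG words as words to skip.

```perl
use strict;
use warnings;

# Hypothetical pre-tokenized input; the module itself tokenizes with MeCab.
my @tokens = qw(perl mecab perl tfidf perl mecab);
my %ng     = map { $_ => 1 } qw(tfidf);    # hypothetical NG (excluded) words

# Count each token that is not on the NG list.
my %tf;
for my $token (@tokens) {
    next if $ng{$token};
    $tf{$token}++;
}

# Print terms by descending count, ties broken alphabetically.
print "$_\t$tf{$_}\n"
    for sort { $tf{$b} <=> $tf{$a} || $a cmp $b } keys %tf;
```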

ng_word

Accessor method for the NG word list. Use it to replace the list.

mecab

Internal accessor method.

df_data

Internal accessor method.

fetcher

Internal accessor method.

AUTHOR

Takeshi Miki <miki@cpan.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO