NAME

Algorithm::BayesianSets - Perl implementation of Bayesian Sets

SYNOPSIS

  use Algorithm::BayesianSets;
  
  my $bs = Algorithm::BayesianSets->new;

  # add documents
  my %documents = (
      apple  => {
          fruit => 1,
          red   => 1,
      },
      banana => {
          fruit  => 1,
          yellow => 1,
      },
      cherry => {
          fruit => 1,
          pink  => 1,
      },
  );
  foreach my $id (keys %documents) {
      $bs->add_document($id, $documents{$id});
  }
  
  # calc alpha/beta parameters
  $bs->calc_parameters();
  
  # get similar documents
  my @queries = qw(apple);
  my $scores = $bs->calc_similarities(\@queries);
  
  # show output, highest score first
  foreach my $id (sort { $scores->{$b} <=> $scores->{$a} } keys %{ $scores }) {
      printf "%s\t%.4f\n", $id, $scores->{$id};
  }

DESCRIPTION

Algorithm::BayesianSets is a Perl implementation of the Bayesian Sets algorithm, which scores each input document by its similarity to a set of query documents.

METHODS

new($threshold)

Create a new instance.

The $threshold parameter is the minimum degree a document feature must have. When add_document is called, any feature whose degree is less than the threshold is ignored.
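The filtering rule described above can be sketched in plain Perl. This is an illustration only, not the module's source: the helper name filter_features and the sample degrees are hypothetical, and the sketch assumes that a feature whose degree equals the threshold is kept.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::BayesianSets): return a copy
# of the feature vector without the features whose degree is below the threshold.
sub filter_features {
    my ($vector, $threshold) = @_;
    my %kept;
    for my $feature (keys %{$vector}) {
        next if $vector->{$feature} < $threshold;   # too weak, drop it
        $kept{$feature} = $vector->{$feature};
    }
    return \%kept;
}

# Sample degrees are made up for illustration.
my $filtered = filter_features({ fruit => 1, red => 0.2 }, 0.5);
print join(', ', sort keys %{$filtered}), "\n";   # prints "fruit"
```

With the module itself, the equivalent would presumably be to pass the threshold to the constructor, for example Algorithm::BayesianSets->new(0.5), before calling add_document.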

add_document($id, $vector)

Add an input document to the Algorithm::BayesianSets instance. The $id parameter is the identifier of the document, and the $vector parameter is its feature vector. $vector must be a hash reference in which each key is the identifier of a feature and each value is the degree of that feature.

calc_parameters($c)

Calculate the alpha and beta parameters used by the Bayesian Sets algorithm. The $c parameter must be a real number (default: 2.0).
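As a rough illustration of what this step computes, the sketch below uses the parameter choice from the original Bayesian Sets formulation: for each feature, alpha = c * mean and beta = c * (1 - mean), where mean is the average degree of that feature over all documents. Whether the module computes exactly this is an assumption; the helper name sketch_parameters and the sample documents are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): per-feature parameters
# alpha = c * mean and beta = c * (1 - mean).
sub sketch_parameters {
    my ($documents, $c) = @_;
    $c = 2.0 unless defined $c;
    my $n = scalar keys %{$documents};

    # Sum each feature's degree over all documents.
    my %sum;
    for my $vector (values %{$documents}) {
        $sum{$_} += $vector->{$_} for keys %{$vector};
    }

    my (%alpha, %beta);
    for my $feature (keys %sum) {
        my $mean = $sum{$feature} / $n;
        $alpha{$feature} = $c * $mean;
        $beta{$feature}  = $c * (1 - $mean);
    }
    return (\%alpha, \%beta);
}

my %docs = (
    apple  => { fruit => 1, red => 1 },
    banana => { fruit => 1, yellow => 1 },
    carrot => { vegetable => 1, orange => 1 },
    cherry => { fruit => 1, red => 1 },
);
my ($alpha, $beta) = sketch_parameters(\%docs);
printf "alpha(fruit)=%.2f beta(fruit)=%.2f\n", $alpha->{fruit}, $beta->{fruit};
# prints "alpha(fruit)=1.50 beta(fruit)=0.50"
```

Note that a feature present in every document has mean 1 and therefore beta = 0 under this choice, which a real implementation has to handle before taking logarithms.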

calc_similarities($queries)

Calculate the similarities between the queries and the input documents using the Bayesian Sets algorithm. The $queries parameter must be an array reference, and each query in $queries must be the identifier of an input document.

The method returns a hash reference in which each key is the identifier of an input document and each value is the similarity between the queries and that document.
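To make the scoring step concrete, here is a minimal self-contained sketch of the Bayesian Sets log score for binary features, following the original formulation rather than the module's source. Everything in it is an assumption for illustration: the helper name sketch_scores, the sample documents, the choice to drop the constant term shared by all documents (it does not affect ranking), and the choice to skip features with beta = 0.

```perl
use strict;
use warnings;

# Hypothetical helper: score every document against the query set.
# Assumes binary (0/1) feature degrees.
sub sketch_scores {
    my ($documents, $queries, $c) = @_;
    $c = 2.0 unless defined $c;
    my $n_docs    = scalar keys %{$documents};
    my $n_queries = scalar @{$queries};

    # Feature totals over all documents and over the query documents.
    my (%total, %in_query);
    for my $vector (values %{$documents}) {
        $total{$_} += $vector->{$_} for keys %{$vector};
    }
    for my $id (@{$queries}) {
        my $vector = $documents->{$id};
        $in_query{$_} += $vector->{$_} for keys %{$vector};
    }

    # Per-feature weight: log(alpha'/alpha) - log(beta'/beta).
    my %weight;
    for my $feature (keys %total) {
        my $mean  = $total{$feature} / $n_docs;
        my $alpha = $c * $mean;
        my $beta  = $c * (1 - $mean);
        next if $beta <= 0;   # sketch-only choice: skip features found in every document
        my $s          = $in_query{$feature} || 0;
        my $alpha_post = $alpha + $s;
        my $beta_post  = $beta + $n_queries - $s;
        $weight{$feature} = log($alpha_post) - log($alpha)
                          - log($beta_post)  + log($beta);
    }

    # Score = inner product of the document vector and the weights.
    my %score;
    for my $id (keys %{$documents}) {
        my $vector = $documents->{$id};
        my $sum = 0;
        for my $feature (keys %{$vector}) {
            $sum += $vector->{$feature} * ($weight{$feature} || 0);
        }
        $score{$id} = $sum;
    }
    return \%score;
}

my %docs = (
    apple  => { fruit => 1, red => 1 },
    banana => { fruit => 1, yellow => 1 },
    carrot => { vegetable => 1, orange => 1 },
    cherry => { fruit => 1, red => 1 },
);
my $scores = sketch_scores(\%docs, ['apple']);
for my $id (sort { $scores->{$b} <=> $scores->{$a} } keys %{$scores}) {
    printf "%s\t%.4f\n", $id, $scores->{$id};
}
```

With the query apple, cherry (which shares both features) scores above banana (which shares only fruit), and carrot (which shares nothing) scores lowest.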

_average_vector($vectors)

Get the average vector of input vectors.

_inner_product($vector1, $vector2)

Calculate the inner product of the two input vectors.
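For sparse vectors stored as hash references, these two helpers can be sketched as follows. This is not the module's private code: the names average_vector and inner_product are hypothetical stand-ins, and the sketch assumes that a missing key means a degree of 0.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _average_vector: element-wise mean of sparse vectors.
sub average_vector {
    my ($vectors) = @_;
    my $n = scalar @{$vectors};
    return {} if $n == 0;
    my %avg;
    for my $vector (@{$vectors}) {
        $avg{$_} += $vector->{$_} for keys %{$vector};
    }
    $_ /= $n for values %avg;   # values() aliases the hash entries
    return \%avg;
}

# Hypothetical stand-in for _inner_product: sum over the shared keys.
sub inner_product {
    my ($v1, $v2) = @_;
    # Walk the smaller hash and look keys up in the larger one.
    ($v1, $v2) = ($v2, $v1) if keys %{$v1} > keys %{$v2};
    my $sum = 0;
    for my $key (keys %{$v1}) {
        $sum += $v1->{$key} * $v2->{$key} if exists $v2->{$key};
    }
    return $sum;
}

my $avg = average_vector([ { a => 1, b => 1 }, { a => 1 } ]);
printf "a=%.1f b=%.1f\n", $avg->{a}, $avg->{b};                      # prints "a=1.0 b=0.5"
print inner_product({ a => 1, b => 2 }, { b => 3, c => 4 }), "\n";   # prints "6"
```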

AUTHOR

Mizuki Fujisawa <fujisawa@bayon.cc>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
