NAME

Algorithm::BayesianSets - Perl implementation of Bayesian Sets

SYNOPSIS

  use Algorithm::BayesianSets;
  
  my $bs = Algorithm::BayesianSets->new;

  # add documents
  my %documents = (
      apple  => {
          fruit => 1,
          red   => 1,
      },
      banana => {
          fruit  => 1,
          yellow => 1,
      },
      cherry => {
          fruit => 1,
          pink  => 1,
      },
  );
  foreach my $id (keys %documents) {
      $bs->add_document($id, $documents{$id});
  }
  
  # calc alpha/beta parameters
  $bs->calc_parameters();
  
  # get similar documents
  my @queries = qw(apple);
  my $scores = $bs->calc_similarities(\@queries);
  
  # show output, highest score first
  foreach my $id (sort { $scores->{$b} <=> $scores->{$a} } keys %{ $scores }) {
      printf "%s\t%.4f\n", $id, $scores->{$id};
  }

DESCRIPTION

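Algorithm::BayesianSets is a Perl implementation of the Bayesian Sets algorithm, which scores every input document by how well it fits the set defined by a few query documents.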

METHODS

new($threshold)

Create a new instance.

The $threshold parameter is the minimum degree of a document feature. In the add_document method, a feature whose degree is less than the threshold is not used.
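
The thresholding rule described above can be sketched in plain Perl. This is only an illustration, not the module's code: filter_features is a hypothetical helper name, and whether the module keeps the original degree or stores a binary value for a surviving feature is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::BayesianSets):
# keep only the features whose degree is at least $threshold.
sub filter_features {
    my ($vector, $threshold) = @_;
    my %kept;
    for my $feature (keys %{$vector}) {
        # A degree below the threshold means the feature is not used.
        next if $vector->{$feature} < $threshold;
        $kept{$feature} = $vector->{$feature};
    }
    return \%kept;
}

my $kept = filter_features({ fruit => 1, red => 0.2 }, 0.5);
print join(',', sort keys %{$kept}), "\n";
```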

add_document($id, $vector)

Add an input document to the instance of Algorithm::BayesianSets. The $id parameter is the identifier of the document, and the $vector parameter is its feature vector. $vector must be a hash reference: each key is the identifier of a feature, and each value is the degree of that feature.

calc_parameters($c)

Calculate the alpha and beta parameters used by the Bayesian Sets algorithm. The $c parameter must be a real number (default: 2.0).
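
As a rough sketch of what this step computes, the Bayesian Sets paper sets alpha = c * m and beta = c * (1 - m) for each feature, where m is the mean of that feature over all documents. The code below assumes that rule and binary feature values; calc_alpha_beta is a hypothetical helper, and the module's internals may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code), assuming the paper's rule:
#   alpha_j = c * m_j,  beta_j = c * (1 - m_j)
# where m_j is the mean of feature j over all documents.
sub calc_alpha_beta {
    my ($documents, $c) = @_;
    $c = 2.0 unless defined $c;
    my $n = scalar keys %{$documents};

    # Sum each feature over all documents.
    my %sum;
    for my $vector (values %{$documents}) {
        $sum{$_} += $vector->{$_} for keys %{$vector};
    }

    my (%alpha, %beta);
    for my $feature (keys %sum) {
        my $mean = $sum{$feature} / $n;
        $alpha{$feature} = $c * $mean;
        $beta{$feature}  = $c * (1 - $mean);
    }
    return (\%alpha, \%beta);
}

my %documents = (
    apple  => { fruit => 1, red    => 1 },
    banana => { fruit => 1, yellow => 1 },
    carrot => { vegetable => 1, orange => 1 },
    cherry => { fruit => 1, red    => 1 },
);
my ($alpha, $beta) = calc_alpha_beta(\%documents);
printf "%s\talpha=%.2f\tbeta=%.2f\n", $_, $alpha->{$_}, $beta->{$_}
    for sort keys %{$alpha};
```

Note that a feature present in every document has a mean of 1 and therefore a beta of 0, which a scoring step based on logarithms would need to guard against.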

calc_similarities($queries)

Calculate the similarities between the queries and the input documents using the Bayesian Sets algorithm. The $queries parameter must be an array reference, and each query in $queries must be the identifier of a document that was added with add_document.

The output of this method is a hash reference. Each key is the identifier of an input document, and each value is the similarity between the queries and that document.
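
For orientation, the log-score from the Bayesian Sets paper for binary features can be sketched as follows. This is a minimal sketch under stated assumptions, not the module's implementation: bayesian_sets_scores is a hypothetical helper, the alpha and beta values are hard-coded as c = 2 times the feature means of the four sample documents, and every feature mean is assumed to lie strictly between 0 and 1 so that all logarithms are defined. It uses the // operator, so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): log-score of each document
# against the query set, using the binary-feature rule from the paper.
sub bayesian_sets_scores {
    my ($documents, $queries, $alpha, $beta) = @_;
    my $n = scalar @{$queries};

    # Per-feature sums over the query documents.
    my %count;
    for my $id (@{$queries}) {
        $count{$_} += $documents->{$id}{$_} for keys %{ $documents->{$id} };
    }

    # Constant term and per-feature weights.
    my $const = 0;
    my %weight;
    for my $f (keys %{$alpha}) {
        my $al  = $alpha->{$f};
        my $be  = $beta->{$f};
        my $s   = $count{$f} // 0;
        my $al2 = $al + $s;         # alpha updated with the query counts
        my $be2 = $be + $n - $s;    # beta updated with the query counts
        $const += log($al + $be) - log($al + $be + $n) + log($be2) - log($be);
        $weight{$f} = log($al2) - log($al) - log($be2) + log($be);
    }

    # Score = constant + weighted sum of the document's features.
    my %score;
    for my $id (keys %{$documents}) {
        my $total = $const;
        $total += ($weight{$_} // 0) * $documents->{$id}{$_}
            for keys %{ $documents->{$id} };
        $score{$id} = $total;
    }
    return \%score;
}

my %documents = (
    apple  => { fruit => 1, red    => 1 },
    banana => { fruit => 1, yellow => 1 },
    carrot => { vegetable => 1, orange => 1 },
    cherry => { fruit => 1, red    => 1 },
);
# Assumed parameters: c = 2 times each feature's mean over the documents above.
my %alpha = (fruit => 1.5, red => 1.0, yellow => 0.5, vegetable => 0.5, orange => 0.5);
my %beta  = (fruit => 0.5, red => 1.0, yellow => 1.5, vegetable => 1.5, orange => 1.5);

my $scores = bayesian_sets_scores(\%documents, ['apple'], \%alpha, \%beta);
for my $id (sort { $scores->{$b} <=> $scores->{$a} or $a cmp $b } keys %{$scores}) {
    printf "%s\t%.4f\n", $id, $scores->{$id};
}
```

With apple as the only query, documents that share its features (fruit, red) score higher than documents that share only one or none of them.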

_average_vector($vectors)

Get the average vector of the input vectors.
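
One way to average sparse vectors stored as hashes is sketched below. It assumes $vectors is an array reference of hash references, which this document does not state; average_vector is a hypothetical stand-in, not the module's private method.

```perl
use strict;
use warnings;

# Hypothetical stand-in (assumes an array reference of hash references).
# A feature missing from a vector counts as 0.
sub average_vector {
    my ($vectors) = @_;
    my $n = scalar @{$vectors};
    return {} unless $n;

    my %avg;
    for my $vector (@{$vectors}) {
        $avg{$_} += $vector->{$_} for keys %{$vector};
    }
    # values() yields aliases, so this divides the stored sums in place.
    $_ /= $n for values %avg;
    return \%avg;
}

my $avg = average_vector([
    { fruit => 1, red    => 1 },
    { fruit => 1, yellow => 1 },
]);
printf "%s\t%.2f\n", $_, $avg->{$_} for sort keys %{$avg};
```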

_inner_product($vector1, $vector2)

Calculate the inner product of the two input vectors.
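
For sparse vectors stored as hash references, an inner product only needs the keys the two vectors share. The sketch below illustrates that idea; inner_product is a hypothetical stand-in, not the module's private method.

```perl
use strict;
use warnings;

# Hypothetical stand-in: inner product of two sparse vectors (hash references).
sub inner_product {
    my ($v1, $v2) = @_;
    # Iterate over the smaller hash to reduce the number of lookups.
    ($v1, $v2) = ($v2, $v1) if keys %{$v1} > keys %{$v2};

    my $product = 0;
    for my $key (keys %{$v1}) {
        # Only keys present in both vectors contribute.
        $product += $v1->{$key} * $v2->{$key} if exists $v2->{$key};
    }
    return $product;
}

print inner_product({ fruit => 1, red => 2 }, { red => 3, yellow => 4 }), "\n";
```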

AUTHOR

Mizuki Fujisawa <fujisawa@bayon.cc>

SEE ALSO

Bayesian Sets (Paper)

http://www.gatsby.ucl.ac.uk/~heller/bsets.pdf

bsets, The Bayesian Sets Algorithm (Matlab code)

http://chasen.org/~daiti-m/dist/bsets/

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.