Mizuki Fujisawa


Algorithm::BayesianSets - Perl implementation of Bayesian Sets


  use Algorithm::BayesianSets;
  my $bs = Algorithm::BayesianSets->new;

  # add documents
  my %documents = (
      apple  => {
          fruit => 1,
          red   => 1,
      },
      banana => {
          fruit  => 1,
          yellow => 1,
      },
      cherry => {
          fruit => 1,
          pink  => 1,
      },
  );
  foreach my $id (keys %documents) {
      $bs->add_document($id, $documents{$id});
  }

  # calc alpha/beta parameters
  $bs->calc_parameters();

  # get similar documents
  my @queries = qw(apple);
  my $scores = $bs->calc_similarities(\@queries);

  # show output
  foreach my $id (keys %{ $scores }) {
      printf "%s\t%.4f\n", $id, $scores->{$id};
  }


Algorithm::BayesianSets is a Perl implementation of the Bayesian Sets algorithm.



new($threshold)

Create a new instance.

The $threshold parameter is the minimum degree a document feature must have. In the add_document method, a feature whose degree is less than the threshold is not used.
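The thresholding described above can be pictured with a small stand-alone sketch. It does not use the module: filter_features is a hypothetical helper, and it assumes only what the text states, namely that a feature whose degree is below the threshold is dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::BayesianSets): keep only the
# features whose degree is at least $threshold.
sub filter_features {
    my ($vector, $threshold) = @_;
    my %kept;
    foreach my $feature (keys %{ $vector }) {
        next if $vector->{$feature} < $threshold;
        $kept{$feature} = $vector->{$feature};
    }
    return \%kept;
}

my $filtered = filter_features({ fruit => 1, red => 0.2, round => 0.6 }, 0.5);
print join(', ', sort keys %{ $filtered }), "\n";   # prints "fruit, round"
```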

add_document($id, $vector)

Add an input document to the Algorithm::BayesianSets instance. The $id parameter is the identifier of the document, and the $vector parameter is its feature vector. $vector must be a hash reference: each key is the identifier of a feature, and each value is the degree of that feature.


calc_parameters($c)

Calculate the alpha and beta parameters used by the Bayesian Sets algorithm. The $c parameter must be a real number (default: 2.0).
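As a rough illustration of what these parameters are, the Bayesian Sets paper sets alpha = c * m and beta = c * (1 - m), where m is the mean of each feature over all documents. The sketch below follows that rule. compute_parameters is a hypothetical name, and whether the module computes the values in exactly this way is an assumption.

```perl
use strict;
use warnings;

# Sketch only: derive per-feature alpha/beta from the feature means, as in
# the Bayesian Sets paper. compute_parameters is a hypothetical name and
# not the module's API.
sub compute_parameters {
    my ($documents, $c) = @_;
    $c = 2.0 unless defined $c;
    my $n = scalar keys %{ $documents };

    # Sum each feature over all documents; a missing feature counts as 0.
    my %sum;
    foreach my $vector (values %{ $documents }) {
        $sum{$_} += $vector->{$_} for keys %{ $vector };
    }

    my (%alpha, %beta);
    foreach my $feature (keys %sum) {
        my $mean = $sum{$feature} / $n;
        $alpha{$feature} = $c * $mean;
        $beta{$feature}  = $c * (1 - $mean);
    }
    return (\%alpha, \%beta);
}

my %documents = (
    apple  => { fruit => 1, red    => 1 },
    banana => { fruit => 1, yellow => 1 },
);
my ($alpha, $beta) = compute_parameters(\%documents);
printf "%s\talpha=%.2f\tbeta=%.2f\n", $_, $alpha->{$_}, $beta->{$_}
    for sort keys %{ $alpha };
```

Note that a feature present in every document has mean 1 and therefore beta 0, which a scoring step based on logarithms would need to guard against.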


calc_similarities($queries)

Calculate the similarities between the queries and the input documents using the Bayesian Sets algorithm. The $queries parameter must be an array reference, and each query in $queries must be the identifier of an input document.

This method returns a hash reference: each key is the identifier of an input document, and each value is the similarity between the queries and that document.
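To make the scoring step concrete, here is a stand-alone sketch of the log-score from the Bayesian Sets paper for binary features. It is not the module's code: log_scores is a hypothetical name, the alpha and beta values are supplied by hand (c = 2 times the feature mean and its complement), and every feature is assumed to have alpha and beta greater than zero.

```perl
use strict;
use warnings;

# Sketch of the Bayesian Sets log-score for binary features.
# $alpha and $beta are hash references keyed by feature; every feature that
# occurs in a document must have an entry, and all values must be > 0.
sub log_scores {
    my ($documents, $queries, $alpha, $beta) = @_;
    my $n = scalar @{ $queries };

    # Count how often each feature occurs in the query documents.
    my %count;
    foreach my $id (@{ $queries }) {
        $count{$_} += $documents->{$id}{$_} for keys %{ $documents->{$id} };
    }

    # Per-feature weight q and the constant term shared by all documents.
    my %q;
    my $const = 0;
    foreach my $f (keys %{ $alpha }) {
        my $alpha_new = $alpha->{$f} + ($count{$f} || 0);
        my $beta_new  = $beta->{$f} + $n - ($count{$f} || 0);
        $const += log($alpha->{$f} + $beta->{$f})
                - log($alpha->{$f} + $beta->{$f} + $n)
                + log($beta_new) - log($beta->{$f});
        $q{$f} = log($alpha_new) - log($alpha->{$f})
               - log($beta_new)  + log($beta->{$f});
    }

    # Score of a document: constant plus the weights of its features.
    my %scores;
    foreach my $id (keys %{ $documents }) {
        my $score = $const;
        $score += $q{$_} * $documents->{$id}{$_} for keys %{ $documents->{$id} };
        $scores{$id} = $score;
    }
    return \%scores;
}

my %documents = (
    apple  => { fruit => 1, red       => 1 },
    banana => { fruit => 1, yellow    => 1 },
    cherry => { fruit => 1, pink      => 1 },
    tomato => { red   => 1, vegetable => 1 },
);
# Hand-computed from the feature means with c = 2.
my %alpha = (fruit => 1.5, red => 1.0, yellow => 0.5, pink => 0.5, vegetable => 0.5);
my %beta  = (fruit => 0.5, red => 1.0, yellow => 1.5, pink => 1.5, vegetable => 1.5);

my $scores = log_scores(\%documents, ['apple'], \%alpha, \%beta);
foreach my $id (sort { $scores->{$b} <=> $scores->{$a} } keys %{ $scores }) {
    printf "%s\t%.4f\n", $id, $scores->{$id};
}
```

With apple as the only query, apple itself scores highest, and tomato (which shares the red feature) ranks above banana and cherry.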


Get the average vector of input vectors.

_inner_product($vector1, $vector2)

Calculate the inner product of the two input vectors.
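For feature vectors stored as hash references, an inner product can be sketched as follows. inner_product here is a hypothetical stand-alone helper and not the module's private method; it assumes that a feature missing from a vector has degree 0.

```perl
use strict;
use warnings;

# Hypothetical helper: inner product of two sparse vectors stored as hash
# references. Only features present in both vectors contribute.
sub inner_product {
    my ($v1, $v2) = @_;

    # Iterate over the smaller hash to do fewer lookups.
    if (scalar(keys %{ $v1 }) > scalar(keys %{ $v2 })) {
        ($v1, $v2) = ($v2, $v1);
    }

    my $product = 0;
    foreach my $feature (keys %{ $v1 }) {
        next unless exists $v2->{$feature};
        $product += $v1->{$feature} * $v2->{$feature};
    }
    return $product;
}

my $product = inner_product({ fruit => 1, red => 2 }, { red => 3, sweet => 1 });
print "$product\n";   # prints 6
```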


Mizuki Fujisawa <fujisawa@bayon.cc>


Bayesian Sets (Paper)


bsets, The Bayesian Sets Algorithm (Matlab code)



This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.