Ken Williams
and 2 contributors


AI::Categorizer::Hypothesis - Embodies a set of category assignments


 use AI::Categorizer::Hypothesis;
 # Hypotheses are usually created by the Learner's categorize() method.
 # (assume here that $learner and $document have been created elsewhere)
 my $h = $learner->categorize($document);
 print "Assigned categories: ", join(', ', $h->categories), "\n";
 print "Best category: ", $h->best_category, "\n";
 print "Assigned scores: ", join(', ', $h->scores( $h->categories )), "\n";
 print "Chosen from: ", join(', ', $h->all_categories), "\n";
 print +($h->in_category('geometry') ? '' : 'not '), "assigned to geometry\n";


A Hypothesis embodies the set of category assignments that a categorizer makes about a single document. Because one may want to know different things about the assignments (for instance, which categories were assigned, which category had the highest score, or whether a particular category was assigned), this class provides a simple interface for answering such questions.



new()

Returns a new Hypothesis object. Users of AI::Categorizer generally do not create Hypothesis objects directly; they are returned by the Learner's categorize() method. However, if you wish to create a Hypothesis directly (perhaps passing it some fake data for testing purposes), you may do so using the new() method.

The following parameters are accepted when creating a new Hypothesis:


all_categories

A required parameter giving the set of all categories that could possibly be assigned. The categories should be specified as a reference to an array of category names (as strings).


scores

A hash reference giving the assignment score for each category. Any category whose score is greater than or equal to the threshold is considered assigned.


threshold

A number controlling which categories are assigned: any category whose score is greater than or equal to threshold is assigned, and any category whose score is lower than threshold is not.


document_name

An optional string parameter indicating the name of the document about which this hypothesis was made.
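As a minimal sketch of creating a Hypothesis by hand, the following passes made-up test data to new() using the constructor parameters described above. The category names, scores, threshold, and document name are all illustrative assumptions, and the scores are deliberately chosen so that none sits exactly on the threshold.

```perl
use strict;
use warnings;
use AI::Categorizer::Hypothesis;

# Fake data for illustration only; none of these values come from a real Learner.
my $h = AI::Categorizer::Hypothesis->new(
    all_categories => [ 'geometry', 'algebra', 'calculus' ],
    scores         => { geometry => 0.9, algebra => 0.4, calculus => 0.1 },
    threshold      => 0.3,
    document_name  => 'sample-document',
);

# Categories scoring above the threshold, best match first.
my @assigned = $h->categories;

print "Document: ", $h->document_name, "\n";
print "Assigned: ", join(', ', @assigned), "\n";
```

This kind of hand-built object can be useful in tests of code that consumes hypotheses, since no trained Learner is needed.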


categories()

Returns an ordered list of the categories the document was placed in, with best matches first. Categories are returned by their string names.


best_category()

Returns the name of the category with the highest score in this hypothesis. Bear in mind that this category may not actually be assigned if no category's score reaches the threshold.
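The caveat about the best category not necessarily being assigned can be illustrated with a small sketch. It assumes a hand-built Hypothesis whose made-up scores all fall below the threshold, and checks the best category with in_category() before treating it as an assignment.

```perl
use strict;
use warnings;
use AI::Categorizer::Hypothesis;

# Fake data for illustration: every score is below the threshold.
my $h = AI::Categorizer::Hypothesis->new(
    all_categories => [ 'geometry', 'algebra' ],
    scores         => { geometry => 0.2, algebra => 0.1 },
    threshold      => 0.5,
);

my $best     = $h->best_category;   # highest-scoring category, assigned or not
my @assigned = $h->categories;      # only the categories that were assigned

if ( $h->in_category($best) ) {
    print "Assigned to $best\n";
}
else {
    print "Best guess is $best, but it was not assigned\n";
}
```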


in_category($name)

Returns true or false depending on whether the document was placed in the given category.


scores(@names)

Returns a list of result scores for the given categories. Since the interface is still changing, and since different Learners implement scoring in different ways, not much can officially be said about the scores except that a good score is higher than a bad score. Each Learner has its own procedure for determining scores, so you cannot compare one Learner's scores with another Learner's. For instance, one Learner might always give scores between 0 and 1, while another might always return scores less than 0. Often you cannot compare scores from a single Learner on two different categorization tasks either.
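Because only the relative order of scores within one hypothesis is meaningful, one reasonable way to use them is to rank categories rather than interpret absolute values. The sketch below assumes a hand-built Hypothesis with made-up negative scores (as a Learner returning scores below 0 might produce) and ranks every candidate category by score.

```perl
use strict;
use warnings;
use AI::Categorizer::Hypothesis;

# Fake data for illustration: all scores are negative, which is still valid.
my $h = AI::Categorizer::Hypothesis->new(
    all_categories => [ 'geometry', 'algebra', 'calculus' ],
    scores         => { geometry => -1.5, algebra => -4.2, calculus => -9.0 },
    threshold      => -5,
);

# Look up the score of every candidate category with a hash slice.
my @names = $h->all_categories;
my %score_for;
@score_for{@names} = $h->scores(@names);

# Rank by score, highest first; only the ordering is meaningful here.
my @ranked = sort { $score_for{$b} <=> $score_for{$a} } @names;

print "$_: $score_for{$_}\n" for @ranked;
```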


all_categories()

Returns the list of category names specified with the all_categories constructor parameter.


document_name()

Returns the value of the document_name constructor parameter, or undef if none was specified.


Ken Williams <ken@mathforum.org>


This distribution is free software; you can redistribute it and/or modify it under the same terms as Perl itself. These terms apply to every file in the distribution - if you have questions, please contact the author.