Data::Classifier - A tool for classifying data with regular expressions


    use strict;
    use warnings;
    use Data::Classifier;
    my $yaml = <<EOY;
    name: Root
    children:
        - name: BMW
          children:
              - name: Diesel
                match:
                    model: "d\$"
              - name: Sports
                match:
                    model: "i\$"
                    seats: 2
              - name: Really Expensive
                match:
                    model: "^M"
    EOY
    my $classifier = Data::Classifier->new(yaml => $yaml);
    my $attributes1 = { model => '325i', seats => 4 };
    my $class1 = $classifier->process($attributes1);
    my $attributes2 = { model => '535d', seats => 4 };
    my $class2 = $classifier->process($attributes2);
    my $attributes3 = { model => 'M3', seats => 2 };
    my $class3 = $classifier->process($attributes3);
    print "$attributes2->{model}: ", $class2->fqn, "\n";
    print "$attributes3->{model}: ", $class3->fqn, "\n";
    #no real sports car has 4 seats
    print "$attributes1->{model}: ", $class1->fqn, "\n";


This module provides tools to classify sets of data contained in hashes against a predefined class hierarchy. Testing against a class is performed using regular expressions stored in the class hierarchy. It is also possible to modify the behavior of the system by subclassing and overloading a few methods.

Note that this module may not be particularly useful on its own. It is designed to be used as a base class for implementing other systems, such as Config::BuildHelper.


To use this module, create an instance of the classifier object, passing in the class hierarchy as a YAML file, a YAML string, or a prebuilt data structure, along with any optional arguments:

    $classifier = Data::Classifier->new(file => 'classes.yaml', debug => 1);
    $classifier = Data::Classifier->new(yaml => $yaml_string);
    $classifier = Data::Classifier->new(tree => $hashref);
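As a rough illustration of the tree form, the hash reference below mirrors the YAML node layout (name, children, match). It assumes the prebuilt structure is a one-to-one translation of the YAML; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumed shape: the same keys as the YAML class definition,
# expressed as nested Perl hashes and arrays.
my $tree = {
    name     => 'Root',
    children => [
        {
            name     => 'BMW',
            children => [
                { name => 'Diesel', match => { model => 'd$' } },
                { name => 'Sports', match => { model => 'i$', seats => '2' } },
            ],
        },
    ],
};

# With the module installed, the tree would be handed to the constructor:
#   my $classifier = Data::Classifier->new(tree => $tree);
print scalar @{ $tree->{children}[0]{children} }, " leaf classes defined\n";
```

Building the tree in Perl rather than YAML can be convenient when the hierarchy is generated by another program.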

Class Definition File

The class definition file is a very specific tree format, normally stored in a YAML file. Each node of the tree is a map with the same set of keys, some of which are optional:


name

The textual name of the node being defined.

data (optional)

Extra data to be returned with classification results.

children (optional)

A sequence of nodes that exists under this node.

match (optional)

A map of keys to test against incoming data and regular expressions to apply to that data. For a match to be true, all items in the map must match the data.
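The "all items must match" rule for a match map can be sketched in plain Perl as follows. The helper all_match is hypothetical and not part of Data::Classifier, and treating a key that is absent from the data as a failed match is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Classifier: returns true only
# if every key in the match map has a value in the data that satisfies
# the corresponding regular expression.
sub all_match {
    my ($match, $data) = @_;
    for my $key (keys %$match) {
        my $value = $data->{$key};
        return 0 unless defined $value;               # assumption: a missing key fails
        return 0 unless $value =~ /$match->{$key}/;   # any failing key fails the node
    }
    return 1;
}

my $match = { model => 'i$', seats => '2' };
print all_match($match, { model => '325i', seats => 2 }) ? "match\n" : "no match\n";
print all_match($match, { model => '325i', seats => 4 }) ? "match\n" : "no match\n";
```

The first call prints "match" and the second prints "no match", because the seats value 4 does not satisfy the regular expression 2.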

Matching Semantics

By default, this class has very specific matching semantics. For a dataset to match a node, everything listed under the match definition must match the specified data. Additionally, a node which contains no match definition will have all of its children searched but can never be a match itself.
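The rules above can be sketched as a small depth-first search. This is not the module's code: find_class is a hypothetical function, and the text does not say what happens to the children of a node whose own match succeeds or fails, so returning the first matching node immediately and still descending below a failed match are both assumptions.

```perl
use strict;
use warnings;

# Hypothetical depth-first search, not the module's implementation.
# Returns the node names from the root to the first match, or an empty list.
sub find_class {
    my ($node, $data, @path) = @_;
    push @path, $node->{name};

    # Only nodes that carry a match definition can be a match themselves.
    if (my $match = $node->{match}) {
        my $ok = 1;
        for my $key (keys %$match) {
            my $value = $data->{$key};
            $ok = 0 unless defined $value && $value =~ /$match->{$key}/;
        }
        return @path if $ok;    # assumption: first matching node wins
    }

    # Children are searched in order; a node without a match definition
    # contributes only its children.
    for my $child (@{ $node->{children} || [] }) {
        my @found = find_class($child, $data, @path);
        return @found if @found;
    }
    return;
}

my $tree = {
    name     => 'Root',
    children => [
        { name => 'Diesel', match => { model => 'd$' } },
        { name => 'Sports', match => { model => 'i$', seats => '2' } },
    ],
};

# prints "Root > Diesel"
print join(' > ', find_class($tree, { model => '535d', seats => 4 })), "\n";
```

The " > " separator is arbitrary and is not meant to reflect how the module formats class names.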


$result = $classifier->process($attr)

Classify the data contained in the hash reference stored in $attr and return an instance of Data::Classifier::Result. See the documentation for that class for more information.


Return a textual representation of the class hierarchy stored in RAM.

More Information

The rest of this module is documented in Data::Classifier::Result, which you use to access the results of classification.


This class can be subclassed to change its behavior. The following methods are available for overloading:


$classifier->return_result($result)

This method is invoked by $classifier->process() when it needs to return a new instance of a result class. Simply return an instance of your class here, such as:

    sub return_result {
        my ($self, $result) = @_;
        return Data::Classifier::Result->new($result);
    }

$classifier->check_match($matchlist, $attributes)

This method is invoked by $classifier->recursive_search() at each node of the tree that contains a match attribute. The entire contents of the match attribute will be passed in as $matchlist and the hashref given to $classifier->process() will be passed in via $attributes. Return true to indicate a match and false to indicate no match.
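A minimal sketch of such an override, assuming $matchlist arrives as a hash reference of key and regular expression pairs (as the match map in the class definition suggests). The package name is hypothetical, and the variation shown is simply case-insensitive matching.

```perl
package My::CaseInsensitiveClassifier;   # hypothetical subclass name
use strict;
use warnings;

# -norequire only keeps this sketch loadable without the module installed;
# real code would say: use parent 'Data::Classifier';
use parent -norequire, 'Data::Classifier';

# Every key in the match map must still match, but the regular
# expressions are applied case-insensitively.
sub check_match {
    my ($self, $matchlist, $attributes) = @_;
    for my $key (keys %$matchlist) {
        my $value = $attributes->{$key};
        # Assumption: an attribute missing from the data counts as no match.
        return 0 unless defined $value && $value =~ /$matchlist->{$key}/i;
    }
    return 1;
}

1;
```

With this override in place, a match entry such as model: "^m" would accept the model 'M3'.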

$classifier->recursive_search($attributes, $node)

This method is invoked by $classifier->process() to recursively search the entire tree. If you need to change the semantics of how the classifier treats matches against nodes without a match attribute, you would do that here.


Here are a few ideas for improvements to this class:


A class that stores its tree in a SQL database, reconstructs it at startup, and passes it in using the tree argument to new.


This module was created and documented by Tyler Riddle.


There are no known bugs at this time.

Please report any bugs or feature requests. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.