The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME TREE::FP - Perl implementation of the FP-Tree

SYNOPSIS use Tree:FP;

        $fptree = Tree::FP->new('a','b','c');
        
        $insert_successful = $fptree->insert_tree('c','a','b');
        
        $decimal_support = $fptree->support;
        $fptree->set_support(0.3);
        
        $decimal_confidence = $fptree->confidence;
        $fptree->set_confidence(0.25);
        
        @rules = $fptree->association_rules;
        
        $fptree->reset_tree;
        
        $error_string = $fptree->err;
        

DESCRIPTION

Tree:FP is a Perl implmentation of the FP-Tree based association rule mining algorithm (association rules == market basket analysis). For a detailed explanation, see "Mining Frequent Patterns without Candidate Generation" by Jiawei Han, Jian Pei, and Yiwen Yin, 2000. Contrarywise, most books on data mining will have information on this algorithm.

The short version is this: instead of generating a huge number of candidate sets for the apriori algorithm and requiring multiple database scans, compress information into a new data structure, a Frequent Pattern (or FP) tree, then mine the tree.

VERSION 0.02

METHODS

new( LIST ) The new method is called with a list of frequent items from the transactional database listed in descending support order. Given the following DB Item | Count ------------ itm1 | 2 itm2 | 4 itm3 | 3 itm4 | 5 itm5 | 3 The code for creating a new FP-Tree would be: $fptree = Tree::FP->new('itm4','itm2','itm3','itm5','itm1');

NOTE: The list can also be of integers, which is the more likely scenario for most TDBs.

insert_tree( LIST ) This is the method used to populate the FP-Tree. The list consists of all items for one transaction. The items need NOT be in any order. The method returns 0 (false) is an error occurred, and 1 (true) otherwise. Example: $fptree->insert_tree('itm1','itm2','itm3');

support() Returns the current minimum percentage support for the FP-tree, expressed as a decimal. 10% = 0.1

set_support( FLOAT ) Sets the current minimum percentage support for the FP-tree.

confidence() Returns the current minimum percentage confidence for the FP-tree, expressed as a decimal. 10% = 0.1 NOTE: Currently this method has no effect on performance of the FP-Tree. Future versions may allow result filtering based on confidence.

set_confidence( FLOAT ) Sets the current minimum percentage confidence for the FP-tree.

association_rules() Returns list of assocation rules for the FP-Tree meeting minimum support, listed in descending order of confidence. Each element of the list is actually an FP_Tree_association_rule object with the following four methods: left - returns left side of association rule (a "pattern") as a list, each element of which is an item right - returns right side of association rule (a "pattern") as a list, each element of which is an item support - support for the rule confidence - confidence for the rule

Example: @rules = $fptree->association_rules; @left = $rules[0]->left; @right = $rules[0]->right; $support = $rules[0]->support; $confidence = $rules[0]->confidence;

reset_tree() When resetting the support level of the FP-Tree, it is necessary to call the reset_tree method to clear the previously mined patterns. This is not called from the set_support method as it is entirely possible that with a user interface, the support may be changed several times before the tree is mined. While the reset_tree method is not computationally intensive, it is not free either.

err() Returns the last error that occurred in the FP-Tree. In general, if any of the methods returns false (0 or undef), this method will provide details as to what went wrong.

NOTES This package includes three other packages, namely FP_Tree_node, FP_Tree_header_node, and FP_Tree_association_rule. Outside of the context of an FP-Tree, it is not likely that they have much utility, however feel free to use these. Also, there is a combinations function in this package that can be used for finding all combinations of elements of an array (but this must be explicitly exported).

AUTHOR Martin Paczynski, nitram@cpan.org.

COPYRIGHT

Copyright 2003, Martin Paczynski, nitram@cpan.org, all rights reserved.

This package is free software and is provided "as is" without express or implied warranty. It may be used, redistributed and/or modified under the same terms as Perl itself.