14 Sep 2003 18:40:27 UTC
- Distribution: Tree-FP
- Module version: 0.04
- Source (raw)
- Browse (raw)
- How to Contribute
- Issues (1)
- Testers (474 / 0 / 0)
- KwaliteeBus factor: 0
- 91.96% Coverage
- License: unknown
- Activity24 month
- Download (13.33KB)
- MetaCPAN Explorer
- Subscribe to distribution
- This version
- Latest versionNITRAM Martin Paczynski
TREE::FP - Perl implementation of the FP-Tree
use Tree::FP; $fptree = Tree::FP->new('a','b','c'); $insert_successful = $fptree->insert_tree('c','a','b'); $decimal_support = $fptree->support; $fptree->set_support(0.3); $decimal_confidence = $fptree->confidence; $fptree->set_confidence(0.25); @rules = $fptree->association_rules; $fptree->reset_tree; $error_string = $fptree->err;
Tree:FP is a Perl implmentation of the FP-Tree based association rule mining algorithm (association rules == market basket analysis). For a detailed explanation, see "Mining Frequent Patterns without Candidate Generation" by Jiawei Han, Jian Pei, and Yiwen Yin, 2000. Contrarywise, most books on data mining will have information on this algorithm.
The short version is this: instead of generating a huge number of candidate sets for the apriori algorithm and requiring multiple database scans, compress information into a new data structure, a Frequent Pattern (or FP) tree, then mine the tree.
The new method is called with a list of frequent items from the transactional database listed in descending support order. Given the following DB =head1
Item | Count ------------ itm1 | 2 itm2 | 4 itm3 | 3 itm4 | 5 itm5 | 3
The code for creating a new FP-Tree would be: $fptree = Tree::FP->new('itm4','itm2','itm3','itm5','itm1');
NOTE: The list can also be of integers, which is the more likely scenario for most TDBs.
This is the method used to populate the FP-Tree. The list consists of all items for one transaction. The items need NOT be in any order. The method returns 0 (false) is an error occurred, and 1 (true) otherwise.
Returns the current minimum percentage support for the FP-tree, expressed as a decimal. 10% = 0.1
Sets the current minimum percentage support for the FP-tree.
Returns the current minimum percentage confidence for the FP-tree, expressed as a decimal. 10% = 0.1 NOTE: Currently this method has no effect on performance of the FP-Tree. Future versions may allow result filtering based on confidence.
Sets the current minimum percentage confidence for the FP-tree.
Returns list of assocation rules for the FP-Tree meeting minimum support, listed in descending order of confidence. Each element of the list is actually an FP_Tree_association_rule object with the following four methods:
left - returns left side of association rule (a "pattern") as a list, each element of which is an item right - returns right side of association rule (a "pattern") as a list, each element of which is an item support - support for the rule confidence - confidence for the rule
@rules = $fptree->association_rules; @left = $rules->left; @right = $rules->right; $support = $rules->support; $confidence = $rules->confidence;
Returns the last error that occurred in the FP-Tree. In general, if any of the methods returns false (0 or undef), this method will provide details as to what went wrong.
This package includes three other packages, namely FP_Tree_node, FP_Tree_header_node, and FP_Tree_association_rule. Outside of the context of an FP-Tree, it is not likely that they have much utility, however feel free to use these. Also, there is a combinations function in this package that can be used for finding all combinations of elements of an array (but this must be explicitly exported).
Martin Paczynski, firstname.lastname@example.org.
Copyright 2003, Martin Paczynski, email@example.com, all rights reserved.
This package is free software and is provided "as is" without express or implied warranty. It may be used, redistributed and/or modified under the same terms as Perl itself.