Author image Dan Collis Puro

NAME

DBIx::Tree::NestedSet

SYNOPSIS

Implements a "Nested Set" parent/child tree.

DESCRIPTION

This module implements a "Nested Set" parent/child tree, and is focused (at least in my mind) towards offering methods that make developing web applications easier. It should be generally useful, though.

See the "SEE ALSO" section for resources that explain the advantages and features of a nested set tree. This module gives you arbitrary levels of categorization, the ability to put in metadata associated with a category via simple method arguments and storage via DBI. It's been tested on MySQL but I've taken pains to avoid using MySQL specific SQL statements.

The basic thing is that a nested set tree is "expensive" on updates because you have to edit quite a bit of the tree on inserts, deletes, or the movement of nodes. Conversely, it is "cheaper" on just queries of the tree because nearly every action (getting children, getting parents, getting siblings, etc) can be done *with one SQL query*. So if you're developing apps that require many reads and few updates to a tree (like pretty much every web app I've ever built) a nested set should offer significant performance advantages over the recursive queries required by an adjacency list model.

Whew. Say that fast three times.

You'll need to create a table in your database and then pass options to new(). See the "Table Definition" section for an example "create table" statement.

METHODS

new

new() accepts a number of parameters. You MUST pass new() a valid DBI handle.

dbh

The DBI handle returned by DBI::connect().

id_name

The name of the unique ID associated with this category. Defaults to "id".

left_column_name

The name of the column that describes the left hand side of a node. Defaults to "lft".

right_column_name

The name of the column that describes the right hand side of a node. Defaults to "rght".

table_name

The name of the table that describes the nested set. Defaults to "nested_set".

No_RaiseError

By default this module will turn on the "RaiseError" attribute in $dbh. Setting the "No_RaiseError" value to true (because you do not want RaiseError enabled or because it is turned it on elsewhere) will disable this behavior.

no_locking

Setting this option to a true value will disable file locking for methods that alter the tree stored via DBI. Currently, we lock the entire table, as most "editing" methods have the potential to edit every value on even minor changes.

no_alter_table

Don't do the automagical table altering stuff used to create columns on-the-fly. See "add_child_to_right" for a description of how this module stores meta-data. Turning off the automagical table altering will probably increase performance, but you won't be able to add in meta-data whenever you want on adding or updating nodes.

Turning off automagical table altering will cause the module to error out if you try and add in new meta-data that doesn't have a column defined for it in the DBI table. You are warned.

It probably makes sense to turn off automagical table altering after you've put the application into production and you're done development, but that depends on how you build your app.

trace

Will turn on DBI::trace() at the level you specify here and output some additional debugging info to STDERR.

db_type

The type of RDBMS you're using, currently drivers are only implemented for MySQL and SQLite. Defaults to MySQL if not defined. Drivers abstract non-portable (or non-implemented) SQL. See DBIx::Tree::NestedSet::MySQL and DBIx::Tree::NestedSet::SQLite for examples.

Examples:

 #Create a nested set tree, including SQL
 my $tree=DBIx::Tree::NestedSet->new(dbh=>$dbh);
 $tree->create_default_table();

 #Create a nested set tree using SQLite and a few tweaked defaults
 my $tree=DBIx::Tree::NestedSet->new(dbh=>$dbh,db_type=>'SQLite',id_name=>'pageID');
 $tree->create_default_table();

create_default_table

Create a Nested Set table in the data source defined in $dbh that will work for the db_type you specify in new(). Any options (id_name, left_column_name, etc.) you pass to new() will be respected as well.

get_default_create_table_statement

Return the SQL used to create the table above as a scalar.

get_root

Gets the id of the "root" node of the tree.

add_child_to_right

This will add a child to the "right" of all its siblings.

Takes the following parameters as a hash:

id

The ID of the parent node we want to add the child to. If you don't give an ID or the id isn't valid, it will add the child under the root node.

Any other parameter passed in as a hash will cause the module to alter the table to add a column to hold it, and then store that data for you. Example:

Say you have a table that looks like:

 +----------+--------------+------+-----+---------+----------------+
 | Field    | Type         | Null | Key | Default | Extra          |
 +----------+--------------+------+-----+---------+----------------+
 | id       | mediumint(9) |      | PRI | NULL    | auto_increment |
 | lft      | mediumint(9) |      | MUL | 0       |                |
 | rght     | mediumint(9) |      | MUL | 0       |                |
 | name     | varchar(255) |      | MUL |         |                |
 +----------+--------------+------+-----+---------+----------------+

and you execute:

 $tree->add_node_to_right(id=>$tree->get_root(),name=>'Foo Name',template=>'Bar');

Then the module will create a node named "Foo Name" under the root as the "rightmost" child. The "template" column will be created and "Bar" will be put in this nodes "template" column. The table would then look like:

 +----------+--------------+------+-----+---------+----------------+
 | Field    | Type         | Null | Key | Default | Extra          |
 +----------+--------------+------+-----+---------+----------------+
 | id       | mediumint(9) |      | PRI | NULL    | auto_increment |
 | lft      | mediumint(9) |      | MUL | 0       |                |
 | rght     | mediumint(9) |      | MUL | 0       |                |
 | name     | varchar(255) |      | MUL |         |                |
 | template | varchar(255) |      | MUL |         |                |
 +----------+--------------+------+-----+---------+----------------+

Feel free to tweak the columns after the module creates them (or create them in advance, it doesn't really matter). You may want to add indeces if you're going to be doing other selects on the nested_set table.

This table altering behavior allows you to store metadata about a node simply, with a tradeoff that your metadata could be "flat" and potentially poorly normalized.

Returns the id of the newly added child.

add_child_to_left

Same as add_child_to_right, except this puts the child to the left of its siblings.

edit_node

Edits a node and will exhibit the same "table altering" behavior of add_child_to_right. Pass in parameters as a hash, and "id" controls which node you're editing.

Example:

 #All other values are retained, we're just changing the name of the node
 #with the id in "$edit_id"
 $tree->edit_node(id=>$edit_id,name=>'New Name');

get_id_by_key

Looks up a node(s) by a key name and key value. Takes two parameters:

key_name

The name of the column in the database you're doing a lookup on.

key_value

The value you want to look up.

If there is more than one node found, we return an array reference. Otherwise we return a scalar. If nothing is found, you'll get a non-true value.

Example:

 my $node=$tree->get_id_by_key(key_name=>'name',key_value=>'Foo Name');
 if(ref $node){
     #We have more than one id returned.
 } else {
     #We have a single id/node.
 }

get_self_and_parents_flat

This will get a node and it's parents down to the root node. Takes the id of the starting node as a hash.

Returns an arrayref of hashrefs (AoH). The hashrefs will have as keys the column names of the table, including those automatically added by the add_*() and edit_node() methods.

This method does NOT return a "nested hash" or "nested array" of nodes, hence the "flat" in the method name.

Additionally there will be a "level" hashkey that's the level of the node, with level 1 being the root.

Example:

 my $self_and_parents=$tree->get_self_and_parents(id=>$starting_id);
 foreach(@$self_and_parents){
     print 'ID: '.$_->{id}.' is at level '.$_->{level}."\n";
 }

Besides arrays of hashrefs being easy to use, this object is PERFECT for passing to HTML::Template::param(). Returns non-true in the event a node doesn't have parents.

get_parents_flat

Same as get_self_and_parents_flat but excludes the starting node.

delete_self_and_children

Similar to get_self_and_children, but deletes nodes from the starting id inclusively. Returns an arrayref of the IDs that were deleted or a non-true value if none.

Example: my $ids=$tree->delete_self_and_children(id=>$delete_from);

Will delete from the ID in $delete_from and $ids will contain an arrayref of the deleted IDs.

delete_children

Similar to delete_self_and_children, but leaves the starting id untouched. This method just deletes the children (recursively) of the starting node.

get_self_and_children_flat

Nearly identical to get_self_and_parents flat, except it retrieves the children of the starting node (and the starting node itself) recursively.

Takes a depth parameter additionally, which will specify how far down in the tree from the starting node to go.

Example:

 my $self_and_children=$tree->get_self_and_children_flat(id=>$start_id,depth=>2);

Will retrieve an AoH starting from $start_id going down a maximum of 2 levels.

get_children_flat

Same as get_self_and_children_flat but excludes the starting node.

swap_nodes

Takes two parameters: first_id and second_id. It will "swap" the nodes represented by these ids, essentially replacing one node with the other. Children will tag along and order will be preserved. swap_nodes() can be used to reorder nodes in a tree OR swap nodes to different levels within a tree.

Example:

 $tree->swap_nodes(first_id=>$first_id,second_id=>$second_id);

$first_id and $second_id will be "swapped" in the tree.

get_hashref_of_info_by_id

Will return a hashref of the information associated with a node specified by the "id" parameter. Umm. . . Except "level".

This is probably dumb, but in this case you don't need to pass in the ID as a hash, because this method only every takes one argument. Returns "undef" if a node without that ID isn't found.

Example:

 my $node_info=$tree->get_hashref_of_info_by_id($node_id);
 print $node_info->{id};

get_hashref_of_info_by_id_with_level

Just like get_hashref_of_info_by_id, except returns the "level" of the node within the tree as well, where the "root" node is level 1. Computing the level is quite a bit more expensive, so you should use get_hashref_of_info_by_id normally.

create_report

Returns a very simple report (in a scalar) of the tree. Takes a few parameters:

id

The id to start the report from. If none is given, it'll start from the root node.

indent_level

The number of spaces to indent each level with. Defaults to 2 spaces per level.

Example:

 my $report=$tree->create_report(indent_level=>4);
 print $report;

Will create a report starting from the "root" with 4 spaces of indentation per level.

TABLE DEFINITION

The base "nested_set" table definition for MySQL is below. Please see each driver class (DBIx::Tree::NestedSet::MySQL or DBIx::Tree::NestedSet::SQLite currently) for create statements specific to your RDBMS. Columns will be added when you pass extra parameters to methods noted above, unless "no_alter_table" is set to true in the constructor.

You can add columns you're going to use proactively, and/or "tweak" the columns after you've let this module create them. Just make sure that you use valid SQL column names for the attributes you pass to the edit_node() and add_*() methods.

 ########################################
 #MySQL specific.
 CREATE TABLE nested_set (
   id mediumint(9) NOT NULL primary key,
   lft mediumint(9) NOT NULL,
   rght mediumint(9) NOT NULL
 );
 CREATE INDEX lft nested_set(lft);
 CREATE INDEX rght nested_set(rght);
 ########################################

This module has been tested on MySQL 3.x and 4.x and SQLite 2.x.

WHY?

I've implemented a couple different nested tree models in the past, from a flat "one column per level" monstrosity to a typical "adjacency list" parent/child model.

The "one column per level" model was a BEAR to work with, especially when it came to adding more levels, editing/deleting children and creating parent lists.

An "adjacency list" is the typical "id/parent_id" model, as illustrated below:

           food                food_id   parent_id
           ==================  =======   =========
           Food                001       NULL
           Beans and Nuts      002       001
           Beans               003       002
           Nuts                004       002
           Black Beans         005       003
           Pecans              006       004

(That table was ripped off directly from DBIx::Tree)

The recursive queries involved with "adjacency list" models always bugged me and I couldn't get acceptable performance metrics without caching bits of the tree.

The "nested set" model appears, theoretically, to be perfect for most of the web applications I develop: it's very fast to create lists of children and parents, at the cost of much more complicated and processor-intense updating.

I've also taken pains to create methods that are useful for web application development but not specific to it.

If you have an application that sees many reads of a nested tree but not as many writes or updates, the "nested set" model this module implements should offer significant performance benefits over an adjacency list.

SEE ALSO

DBIx::Tree, which implements an "adjacency list" model of nested trees.

DBIx::NestedSet::Manage which is included with this distribution and implements a CGI::Application and HTML::Based system for managing trees via DBIx::NestedSet and implements most DBIx::NestedSet methods.

  http://www.intelligententerprise.com/001020/celko.jhtml
  http://www.dbmsmag.com/9603d06.html
  http://www.dbmsmag.com/9604d06.html
  http://www.dbmsmag.com/9605d06.html
  http://www.dbmsmag.com/9606d06.html

For those last three links, the "Nested Set" discussion starts about halfway through the articles.

BUGS

Yes. I'm sure there are some. Please contact me if you find any.

Things to avoid:

  • Keep the names of columns, the table, and any automagically added meta-data keys to fit m/^[_A-Za-z\d]+$/, which is A-Z, a-z, digits, and the underscore. And don't use SQL reserved words.

TODO

I may implement some or all of these. PATCHES ARE WELCOME!

  • Methods to translate an adjacency list into a nested set tree.

  • The ability to associate other user-defined SQL statements with methods. "Pre-" and "post-" triggered SQL.

  • Create methods to get children that DO implement "nested array" trees.

  • Do benchmarking to see how a nested set model performs under various scenarios.

  • Maybe create a "traversal" system other than the very simple:

     my $nodes=$tree->get_self_and_children(id=$tree->get_root);
     foreach my $node(@$nodes){
         #do something with the hashref that represents this node.
     }

THANKS

The following folks have provided patches, bug alerts, ideas, guidance and suggestions related directly to this module. THANKS! Sorry if I left anyone out.

Giuseppe Maxia

gmax on www.perlmonks.org. He pushed me to make it more RDBMS-independent and offered other suggestions to improve the module and documentation.

Martin Kamerbeek

www.procolix.com, a core WebGUI developer. One of the original guineau pigs. Bug fixes and feature enhancements.

Hansen

On www.perlmonks.org, algorithm improvement for node dropping.

Tilly

On www.perlmonks.org for the original idea.

AUTHOR

Dan Collis Puro, Geekuprising.com. Email: dan at geekuprising dot com.

This model was inspired by the perlmonks.org thread below:

http://www.perlmonks.org/index.pl?node_id=354049

See "Tilly's" response in particular. I'm "Hero Zzyzzx".

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.