
NAME

Tree::DAG_Node - (super)class for representing nodes in a tree

SYNOPSIS

Using as a base class:

  package Game::Tree::Node; # or whatever you're doing
  use Tree::DAG_Node;
  @ISA = qw(Tree::DAG_Node);
  ...your own methods overriding/extending the methods in Tree::DAG_Node...

Using as a class of its own:

  use Tree::DAG_Node;
  my $root = Tree::DAG_Node->new();
  $root->name("I'm the tops");
  $new_daughter = Tree::DAG_Node->new();
  $new_daughter->name("More");
  $root->add_daughter($new_daughter);
  ...

DESCRIPTION

This class encapsulates/makes/manipulates objects that represent nodes in a tree structure. The tree structure is not an object itself, but is emergent from the linkages you create between nodes. This class provides the methods for making linkages that can be used to build up a tree, while preventing you from ever making any kinds of linkages which are not allowed in a tree (such as having a node be its own mother or ancestor, or having a node have two mothers).

This is what I mean by a "tree structure", a bit redundantly stated:

* A tree is a special case of an acyclic directed graph.

* A tree is a network of nodes where there's exactly one root node (i.e., 'the top'), and the only primary relationship between nodes is the mother-daughter relationship.

* No node can be its own mother, or its mother's mother, etc.

* Each node in the tree has exactly one "parent" (node in the "up" direction) -- except the root, which is parentless.

* Each node can have any number (0 to any finite number) of daughter nodes. A given node's daughter nodes constitute an ordered list. (However, you are free to consider this ordering irrelevant. Some applications do need daughters to be ordered, so I chose to consider this the general case.)

* A node can appear in only one tree, and only once in that tree. Notably (notable because it doesn't follow from the two above points), a node cannot appear twice in its mother's daughter list.

* In other words, there's an idea of up (toward the root) versus down (away from the root), and left (i.e., toward the start (index 0) of a given node's daughter list) versus right (toward the end of a given node's daughter list).

Trees as described above have various applications, among them: representing syntactic constituency, in formal linguistics; representing contingencies in a game tree; representing abstract syntax in the parsing of any computer language -- whether in expression trees for programming languages, or constituency in the parse of a markup language document. (Some of these might not use the fact that daughters are ordered.)

(Note: B-Trees are a very special case of the above kinds of trees, and are best treated with their own class. Check CPAN for modules encapsulating B-Trees; or if you actually want a database, and for some reason ended up looking here, go look at AnyDBM_File.)

Many base classes are not usable except as such -- but Tree::DAG_Node can be used as a normal class. You can go ahead and say:

  use Tree::DAG_Node;
  my $root = Tree::DAG_Node->new();
  $root->name("I'm the tops");
  $new_daughter = Tree::DAG_Node->new();
  $new_daughter->name("More");
  $root->add_daughter($new_daughter);

and so on, constructing and linking objects from Tree::DAG_Node and making useful tree structures out of them.

Note: A passing acquaintance with the source code for this class is assumed for anyone using this class as a base class -- especially if you're overriding existing methods, and definitely if you're overriding linkage methods.

OBJECT CONTENTS

Implementationally, each node in a tree is an object, in the sense of being an arbitrarily complex data structure that belongs to a class (presumably Tree::DAG_Node, or ones derived from it) that provides methods.

The attributes of a node-object are:

mother -- this node's mother. undef if this is a root.
daughters -- the (possibly empty) list of daughters of this node.
name -- the name for this node.

Need not be unique, or even printable. This is printed in some of the various dumper methods, but it's up to you if you don't put anything meaningful or printable here.

attributes -- whatever the user wants to use it for.

Presumably a hashref to whatever other attributes the user wants to store without risk of colliding with the object's real attributes. (Example usage: attributes to an SGML tag -- you wouldn't want the existence of a "mother=foo" pair in such a tag to collide with a node object's 'mother' attribute.)

Aside from (by default) initializing it to {}, and having the access method called "attributes" (described a ways below), I don't do anything with the "attributes" in this module. I basically intended this so that users who don't want/need to bother deriving a class from Tree::DAG_Node, could still attach whatever data they wanted in a node.

"mother" and "daughters" are attributes that relate to linkage -- they are never written to directly, but are changed as appropriate by the "linkage methods", discussed below.

The other two (and whatever others you may add in derived classes) are simply accessed thru the same-named methods, discussed further below.

MAIN CONSTRUCTOR, AND INITIALIZER

the constructor CLASS->new() or CLASS->new({...options...})

This creates a new node object, calls $object->_init({...options...}) to provide it sane defaults (like: undef name, undef mother, no daughters, 'attributes' setting of a new empty hashref), and returns the object created. (If you just said "CLASS->new()" or "CLASS->new", then it pretends you called "CLASS->new({})".)

Read on if you plan on using Tree::DAG_Node as a base class.

There are, in my mind, two ways to do object construction:

Way 1: create an object, knowing that it'll have certain uninteresting sane default values, and then call methods to change those values to what you want. Example:

    $node = Tree::DAG_Node->new;
    $node->name('Supahnode!');
    $root->add_daughter($node);
    $node->add_daughters(@some_others)

Way 2: be able to specify some/most/all the object's attributes in the call to the constructor. Something like:

    $node = Tree::DAG_Node->new({
      name => 'Supahnode!',
      mother => $root,
      daughters => \@some_others
    });

After some deliberation, I've decided that the second way is a Bad Thing. First off, it is not markedly more concise than the first way. Second off, it often requires subtly different syntax (e.g., \@some_others vs @some_others). Third off, it means that in a derived class where you add new attributes, you have to totally rewrite an _init from scratch, supporting all the switches from the old class. This turns what would have been the simple setting of default attribute values into a major task. In other words, supporting attribute-setting options for a constructor pointlessly complicates the inheritance.

So, the _init you (can) provide in your derived class, in my opinion, should merely initialize (with sane defaults) the attributes of the new object (at least the ones where undef is not a sane value); having options for default values is not worth the bother for either the class programmer or the class user.

(This is not to say that options in general for a constructor are bad -- random_network, discussed far below, necessarily takes options. But note that these are not options for the default values of attributes.)

the method $node->_init({...options...})

Initialize the object's attribute values. See the discussion above. Presumably this should be called only by the guts of the new constructor -- never by the end user.

If, in a derived class, you add attributes beyond what are in the base class, you should probably provide an _init_[methodname] to initialize them, and have your _init call that method -- at least if undef won't do as an initial value for them.

Please see the source for more information.
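As a rough illustration of the advice above, here is a minimal sketch of a derived class that adds one attribute. The class name My::WeightedNode, the "weight" attribute, and the _init_weight method are all made up for this example, and the sketch assumes that node objects are blessed hashrefs (check the source before relying on that):

```perl
use strict;
use warnings;

package My::WeightedNode;          # hypothetical derived class
use parent 'Tree::DAG_Node';

sub _init {
    my ($self, $options) = @_;
    $self->SUPER::_init($options); # base-class defaults first
    $self->_init_weight($options); # then the new attribute
    return;
}

sub _init_weight {
    my ($self, $options) = @_;
    $self->{weight} = 1;           # assumes the object is a hashref
    return;
}

# Accessor in the same get/set style as name() and attributes()
sub weight {
    my $self = shift;
    $self->{weight} = shift if @_;
    return $self->{weight};
}

package main;

my $node = My::WeightedNode->new;
$node->name('heavy');
$node->weight(10);
print $node->name, ' weighs ', $node->weight, "\n";
```

Note that the constructor is not overridden at all; only the initializer is extended, which is the point of keeping attribute defaults out of the constructor's options.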

LINKAGE-RELATED METHODS

Note that with all methods in this document, unless the documentation for a particular method says "this method returns thus-and-such a value", then you should not rely on it returning anything meaningful.

$node->daughters

This returns the (possibly empty) list of daughters for $node.

$node->mother

This returns what node is $node's mother. This is undef if $node has no mother -- i.e., if it is a root.

$mother->add_daughters( LIST )

This method adds the node objects in LIST to the (right) end of $mother's daughter list. Making a node N1 the daughter of another node N2 also means that N1's mother attribute is "automatically" set to N2; it also means that N1 stops being anything else's daughter as it becomes N2's daughter.

If you try to make a node its own mother, a fatal error results. If you try to take one of a node N1's ancestors and make it also a daughter of N1, a fatal error results. A fatal error results if anything in LIST isn't a node object.

If you try to make N1 a daughter of N2, but it's already a daughter of N2, then this is a no-operation -- it won't move such nodes to the end of the list or anything; it just skips doing anything with them.
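The rules above can be seen in a short sketch. The node names are arbitrary, and the exact wording of the fatal error is not shown because it may differ between versions:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my ($n1, $n2, $n3) = map { Tree::DAG_Node->new } 1 .. 3;
$n1->name('N1');
$n2->name('N2');
$n3->name('N3');

$n2->add_daughter($n1);
$n3->add_daughter($n1);            # re-parents N1: it stops being N2's daughter

my @left_behind = $n2->daughters;  # now empty
print 'N2 has ', scalar(@left_behind), " daughters\n";
print "N1's mother is ", $n1->mother->name, "\n";

$n3->add_daughter($n1);            # already a daughter of N3: a no-operation
my @current = $n3->daughters;
print 'N3 has ', scalar(@current), " daughter(s)\n";

# N3 is N1's ancestor, so this dies; eval traps the fatal error.
my $ok = eval { $n1->add_daughter($n3); 1 };
print "refused to create a cycle\n" unless $ok;
```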

$node->add_daughter( LIST )

An exact synonym for $node->add_daughters(LIST)

$mother->add_daughters_left( LIST )

This method is just like add_daughters, except that it adds the node objects in LIST to the (left) beginning of $mother's daughter list, instead of the (right) end of it.

$node->add_daughter_left( LIST )

An exact synonym for $node->add_daughters_left( LIST )

Note:

The above link-making methods perform basically an unshift or push on the mother node's daughter list. To get the full range of list-handling functionality, copy the daughter list, and change it, and then call set_daughters on the result:

          @them = $mother->daughters;
          @removed = splice(@them, 0,2, @new_nodes);
          $mother->set_daughters(@them);

Or consider a structure like:

          $mother->set_daughters(
                                 grep($_->name =~ /NP/ ,
                                      $mother->daughters
                                     )
                                );

$mother->remove_daughters( LIST )

This removes the nodes listed in LIST from $mother's daughter list. This is a no-operation if LIST is empty. If there are things in LIST that aren't a current daughter of $mother, they are ignored.

Not to be confused with $mother->clear_daughters.

$node->remove_daughter( LIST )

An exact synonym for $node->remove_daughters( LIST )

$node->unlink_from_mother

This removes $node from the daughter list of its mother. If it has no mother, this is a no-operation.

$mother->clear_daughters

This unlinks all $mother's daughters.

Not to be confused with $mother->remove_daughters( LIST ).

$mother->set_daughters( LIST )

This unlinks all $mother's daughters, and replaces them with the daughters in LIST.

Currently implemented as just $mother->clear_daughters followed by $mother->add_daughters( LIST ).

OTHER ATTRIBUTE METHODS

$node->name or $node->name(SCALAR)

In the first form, returns the value of the node object's "name" attribute. In the second form, sets it to the value of SCALAR.

$node->attributes or $node->attributes(SCALAR)

In the first form, returns the value of the node object's "attributes" attribute. In the second form, sets it to the value of SCALAR. I intend this to be used to store a reference to a (presumably anonymous) hash the user can use to store whatever attributes he doesn't want to have to store as object attributes. In this case, you needn't ever set the value of this. (_init has already initialized it to {}.) Instead you can just do...

  $node->attributes->{'foo'} = 'bar';

...to write foo => bar.

$node->attribute or $node->attribute(SCALAR)

An exact synonym for $node->attributes or $node->attributes(SCALAR)

OTHER METHODS TO DO WITH RELATIONSHIPS

$node->is_node

This always returns true. More pertinently, $object->can('is_node') is true (regardless of what is_node would do if called) for objects belonging to this class or for any class derived from it.

$node->ancestors

Returns the list of this node's ancestors, starting with its mother, then grandmother, and ending at the root. It does this by simply following the 'mother' attributes up as far as it can. So if $node is the root, this returns an empty list.

$node->root

Returns the root of whatever tree $node is a member of. If $node is the root, then the result is $node itself.

$node->is_daughter_of($node2)

Returns true iff $node is a daughter of $node2. Currently implemented as just a test of ($node->mother eq $node2).

$node->self_and_descendants

Returns a list consisting of itself (as element 0) and all the descendants of $node. Returns just itself if $node is a terminal_node.

(Note that it's spelled "descendants", not "descendents".)

$node->descendants

Returns a list consisting of all the descendants of $node. Returns empty-list if $node is a terminal_node.

(Note that it's spelled "descendants", not "descendents".)

$node->leaves_under

Returns a list (going left-to-right) of all the leaf nodes under $node. ("Leaf nodes" are also called "terminal nodes" -- i.e., nodes that have no daughters.) Returns $node in the degenerate case of $node being a leaf itself.
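A small sketch of the traversal methods described so far, on a five-node tree. The helper subs node() and names() are hypothetical conveniences for this example, not part of the class:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical helper: make a named node and attach any daughters given.
sub node {
    my ($name, @daughters) = @_;
    my $n = Tree::DAG_Node->new;
    $n->name($name);
    $n->add_daughters(@daughters);
    return $n;
}

# Hypothetical helper: join the names of a list of nodes.
sub names { return join ' ', map { $_->name } @_ }

# root -> (a -> (a1, a2), b)
my $a1     = node('a1');
my $a2     = node('a2');
my $branch = node('a', $a1, $a2);
my $leaf_b = node('b');
my $root   = node('root', $branch, $leaf_b);

print names($a1->ancestors), "\n";              # a root
print $a2->root->name, "\n";                    # root
print names($root->self_and_descendants), "\n"; # root a a1 a2 b
print names($root->descendants), "\n";          # a a1 a2 b
print names($root->leaves_under), "\n";         # a1 a2 b
```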

$node->left_sister

Returns the node that's the immediate left sister of $node. If $node is the leftmost (or only) daughter of its mother (or has no mother), then this returns undef.

$node->left_sisters

Returns a list of nodes that're sisters to the left of $node. If $node is the leftmost (or only) daughter of its mother (or has no mother), then this returns an empty list.

$node->right_sister

Returns the node that's the immediate right sister of $node. If $node is the rightmost (or only) daughter of its mother (or has no mother), then this returns undef.

$node->right_sisters

Returns a list of nodes that're sisters to the right of $node. If $node is the rightmost (or only) daughter of its mother (or has no mother), then this returns an empty list.

$node->my_daughter_index

Returns what index this daughter is, in its mother's daughter list. In other words, if $node is ($node->mother->daughters)[3], then $node->my_daughter_index returns 3.

As a special case, returns 0 if $node has no mother.
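The sister methods and my_daughter_index can be tried on a mother with four daughters. This is only a sketch; the names k0 to k3 are arbitrary:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $mother = Tree::DAG_Node->new;
my @kids = map {
    my $kid = Tree::DAG_Node->new;
    $kid->name($_);
    $kid;
} qw(k0 k1 k2 k3);
$mother->add_daughters(@kids);

my $k2 = $kids[2];
print $k2->my_daughter_index, "\n";                            # 2
print $k2->left_sister->name, "\n";                            # k1
print $k2->right_sister->name, "\n";                           # k3
print join(' ', map { $_->name } $k2->left_sisters), "\n";     # k0 k1
print join(' ', map { $_->name } $k2->right_sisters), "\n";    # k3

# The leftmost daughter has no left sister.
my $none = $kids[0]->left_sister;
print defined $none ? "has a left sister\n" : "no left sister\n";
```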

$node->address or $anynode->address(ADDRESS)

With the first syntax, returns the address of $node within its tree, based on its position within the tree. An address is formed by noting the path between the root and $node, and concatenating the daughter-indices of the nodes this passes thru (starting with 0 for the root, and ending with $node).

For example, if to get from node ROOT to node $node, you pass thru ROOT, A, B, and $node, then the address is determined as:

* ROOT's my_daughter_index is 0.

* A's my_daughter_index is, suppose, 2. (A is index 2 in ROOT's daughter list.)

* B's my_daughter_index is, suppose, 0. (B is index 0 in A's daughter list.)

* $node's my_daughter_index is, suppose, 4. ($node is index 4 in B's daughter list.)

The address of the above-described $node is, therefore, "0:2:0:4".

(As a somewhat special case, the address of the root is always "0"; and since addresses start from the root, all addresses start with a "0".)

The second syntax, where you provide an address, starts from the root of the tree $anynode belongs to, and returns the node corresponding to that address. Returns undef if no node corresponds to that address. Note that this routine may be somewhat liberal in its interpretation of what can constitute an address; i.e., it accepts "0.2.0.4", besides "0:2:0:4".

Also note that the address of a node in a tree is meaningful only in that tree as currently structured.

(Consider how ($address1 cmp $address2) may be magically meaningful to you, if you want to figure out what nodes are to the right of what other nodes.)
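Both forms of the address method can be exercised on a tiny tree. The sketch below uses arbitrary node names and only the behavior described above:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Build: root -> (x, y), and y -> (z)
my ($root, $x, $y, $z) = map {
    my $n = Tree::DAG_Node->new;
    $n->name($_);
    $n;
} qw(root x y z);
$root->add_daughters($x, $y);
$y->add_daughter($z);

# Root contributes 0, y is index 1 under root, z is index 0 under y.
my $address = $z->address;
print "$address\n";                     # 0:1:0

# Any node in the tree can be asked to resolve an address.
my $found = $x->address($address);
print $found->name, "\n";               # z

# An address that matches no node gives undef.
my $missing = $root->address('0:5');
print defined $missing ? "found\n" : "no such node\n";
```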

$node->common(LIST)

Returns the lowest node in the tree that is ancestor-or-self to the nodes $node and LIST.

If the nodes are far enough apart in the tree, the answer is just the root.

If the nodes aren't all in the same tree, the answer is undef.

As a degenerate case, if LIST is empty, returns $node.

$node->common_ancestor(LIST)

Returns the lowest node that is ancestor to all the nodes given (in nodes $node and LIST). In other words, it answers the question: "What node in the tree, as low as possible, is ancestor to the nodes given ($node and LIST)?"

If the nodes are far enough apart, the answer is just the root -- except if any of the nodes are the root itself, in which case the answer is undef (since the root has no ancestor).

If the nodes aren't all in the same tree, the answer is undef.

As a degenerate case, if LIST is empty, returns $node's mother; that'll be undef if $node is root.
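To make the difference between common and common_ancestor concrete, here is a sketch on the tree root -> (a -> (a1, a2), b). It builds the tree with lol_to_tree (described further below) purely for brevity:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $tree = Tree::DAG_Node->lol_to_tree([ [ 'a1', 'a2', 'a' ], 'b', 'root' ]);
my ($branch, $leaf_b) = $tree->daughters;
my ($a1, $a2)         = $branch->daughters;

print $a1->common($a2)->name, "\n";        # a    (lowest node above both)
print $a1->common($leaf_b)->name, "\n";    # root (far enough apart)
print $a1->common->name, "\n";             # a1   (LIST empty: $node itself)
print $a1->common_ancestor->name, "\n";    # a    (LIST empty: the mother)

# The root has no ancestor, so this is undef.
my $above_root = $tree->common_ancestor;
print defined $above_root ? "defined\n" : "undef\n";

# Nodes in different trees have nothing in common.
my $stranger = Tree::DAG_Node->new;
my $shared   = $a1->common($stranger);
print defined $shared ? "defined\n" : "undef\n";
```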

YET MORE METHODS

$node->walk_down({ callback => \&foo, callbackback => \&foo, ... })

Performs a depth-first recursion of the structure at and under $node. What it does at each node depends on the value of the options hashref, which you must provide. There are three options, "callback" and "callbackback" (at least one of which must be defined, as a sub reference), and "_depth". This is what walk_down does, in pseudocode form:

* Start at the $node given.

* If there's a callback, call it with $node as the first argument, and the options hashref as the second argument (which contains the potentially useful _depth, remember). This function must return true or false -- if false, it will block the next step:

* If $node has any daughter nodes, increment _depth, and call $daughter->walk_down(options_hashref) for each daughter (in order, of course), where options_hashref is the same hashref it was called with. When this returns, decrements _depth.

* If there's a callbackback, call it just as with callback (but tossing out the return value). Note that callback returning false blocks recursion below $node, but doesn't block calling callbackback for $node. (Incidentally, in the unlikely case that $node has stopped being a node object, callbackback won't get called.)

* Return.

$node->walk_down is the way to recursively do things to a tree (if you start at the root) or part of a tree. It's even the basis for plenty of the methods in this class. See the source code for examples both simple and horrific.

Note that if you don't specify _depth, it effectively defaults to 0. You should set it to scalar($node->ancestors) if you want _depth to reflect the true depth-in-the-tree for the nodes called, instead of just the depth below $node. (If $node is the root, there's no difference, of course.)
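The steps above can be sketched as follows. The callback records each node on the way down (indented by _depth) and prunes everything under a node named "skip"; the callbackback records nodes on the way back up. The tree shape and names are made up, and `|| 0` is used because _depth may be unset when the first callback runs:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# root -> (a -> (a1), skip -> (hidden), b)
my $root = Tree::DAG_Node->lol_to_tree(
    [ [ 'a1', 'a' ], [ 'hidden', 'skip' ], 'b', 'root' ]
);

my (@going_down, @coming_up);
$root->walk_down({
    callback => sub {
        my ($node, $options) = @_;
        my $depth = $options->{_depth} || 0;
        push @going_down, ('  ' x $depth) . $node->name;
        return $node->name ne 'skip';   # false blocks recursion below
    },
    callbackback => sub {
        my ($node, $options) = @_;
        push @coming_up, $node->name;   # still called for 'skip'
        return 1;
    },
});

print "$_\n" for @going_down;
print join(' ', @coming_up), "\n";      # a1 a skip b root
```

The "hidden" node never appears in either list, since the callback returned false at its mother.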

@lines = $node->dump_names({ ...options... });

Dumps, as an indented list, the names of the nodes starting at $node, and continuing under it. Options are:

* _depth -- A nonnegative number. Indicating the depth to consider $node as being at (and so the generation under that is that plus one, etc.). Defaults to 0. You may choose to set _depth => scalar($node->ancestors).

* tick -- a string to preface each entry with, between the indenting-spacing and the node's name. Defaults to empty-string. You may prefer "*" or "-> " or something.

* indent -- the string used to indent with. Defaults to "  " (two spaces). Another sane value might be ". " (period, space). Setting it to empty-string suppresses indenting.

The dump is not printed, but is returned as a list, where each item is a line, with a "\n" at the end.
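A minimal usage sketch, with arbitrary option values. The exact rendering of each name (for instance, whether it is quoted) is left to the module, so no expected output is shown:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $tree = Tree::DAG_Node->lol_to_tree([ [ 'a1', 'a' ], 'b', 'root' ]);

# One line per node; each line already ends in "\n".
my @lines = $tree->dump_names({ tick => '* ', indent => '. ' });
print @lines;
```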

the constructor CLASS->random_network({...options...})
the method $node->random_network({...options...})

In the first case, constructs a randomly arranged network under a new node, and returns the root node of that tree. In the latter case, constructs the network under $node.

Currently, this is implemented a bit half-heartedly, and half-wittedly. I basically needed to make up random-looking networks to stress-test the various tree-dumper methods, and so wrote this. If you actually want to rely on this for any application more serious than that, I suggest examining the source code and seeing if this does really what you need (say, in reliability of randomness); and feel totally free to suggest changes to me (especially in the form of "I rewrote random_network, here's the code...")

It takes four options:

* max_node_count -- maximum number of nodes this tree will be allowed to have (counting the root). Defaults to 25.

* min_depth -- minimum depth for the tree. Defaults to 2. Leaves can be generated only after this depth is reached, so the tree will be at least this deep -- unless max_node_count is hit first.

* max_depth -- maximum depth for the tree. Defaults to 3 plus min_depth. The tree will not be deeper than this.

* max_children -- maximum number of children any mother in the tree can have. Defaults to 4.

the constructor CLASS->lol_to_tree($lol);

Converts something like bracket-notation for "Chomsky trees" (or rather, the closest you can come with Perl list-of-lists(-of-lists(-of-lists))) into a tree structure. Returns the root of the tree converted.

The conversion rules are that: 1) if the last (possibly the only) item in a given list is a scalar, then that is used as the "name" attribute for the node based on this list. 2) All other items in the list represent daughter nodes of the current node -- recursively so, if they are list references; otherwise, (non-terminal) scalars are considered to denote nodes with that name. So ['Foo', 'Bar', 'N'] is an alternate way to represent [['Foo'], ['Bar'], 'N'].

An example will illustrate:

  use Tree::DAG_Node;
  $lol =
    [
      [
        [ [ 'Det:The' ],
          [ [ 'dog' ], 'N'], 'NP'],
        [ '/with rabies\\', 'PP'],
        'NP'
      ],
      [ 'died', 'VP'],
      'S'
    ];
  $tree = Tree::DAG_Node->lol_to_tree($lol);
  $diagram = $tree->draw_ascii_tree;
  print map "$_\n", @$diagram;

...prints this tree:

                   |                   
                  <S>                  
                   |                   
                /------------------\   
                |                  |   
              <NP>                <VP> 
                |                  |   
        /---------------\        <died>
        |               |              
      <NP>            <PP>             
        |               |              
     /-------\   </with rabies\>       
     |       |                         
 <Det:The>  <N>                        
             |                         
           <dog>

$node->tree_to_lol_notation({...options...})

Dumps a tree (starting at $node) as the sort of LoL-like bracket notation you see in the above example code. Returns just one big block of text. The only option is "multiline" -- if true, it dumps the text as the sort of indented structure as seen above; if false (and it defaults to false), dumps it all on one line (with no indenting, of course).

For example, starting with the tree from the above example, this:

  print $tree->tree_to_lol_notation, "\n";

prints the following (which I've broken over two lines for sake of printability of documentation):

  [[[['Det:The'], [['dog'], 'N'], 'NP'], [["/with rabies\x5c"],
  'PP'], 'NP'], [['died'], 'VP'], 'S'], 

Doing this:

  print $tree->tree_to_lol_notation({ multiline => 1 });

prints the same content, just spread over many lines, and prettily indented.

$node->tree_to_lol

Returns that tree (starting at $node) represented as a LoL, like what $lol, above, holds. (This is as opposed to tree_to_lol_notation, which returns the viewable code like what gets evaluated and stored in $lol, above.)

Lord only knows what you use this for -- maybe for feeding to Data::Dumper, in case tree_to_lol_notation doesn't do just what you want?
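One possible use is a round trip: build a tree from a LoL, then get a LoL back. In this sketch (with a made-up sentence), note that the result uses the fully bracketed form, so a terminal written as 'the' comes back as ['the']:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $lol  = [ [ 'the', 'dog', 'NP' ], [ 'barked', 'VP' ], 'S' ];
my $tree = Tree::DAG_Node->lol_to_tree($lol);
my $back = $tree->tree_to_lol;

print $back->[-1], "\n";        # S   (name of the root comes last)
print $back->[0][-1], "\n";     # NP  (name of the first daughter)
print $back->[0][0][0], "\n";   # the (a terminal, as a one-item list)

# The same structure as viewable text:
print $tree->tree_to_lol_notation, "\n";
```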

$list_r = $node->draw_ascii_tree({ ... options ... })

Draws a nice ASCII-art representation of the tree structure at-and-under $node, with $node at the top. Returns a reference to the list of lines (with no "\n"s or anything at the end of them) that make up the picture.

This takes parameters you set in the options hashref:

* "no_name" -- if true, draw_ascii_tree doesn't print the name of the node; simply prints a "*". Defaults to 0 (i.e., print the node name.)

* "h_spacing" -- number 0 or greater. Sets the number of spaces inserted horizontally between nodes (and groups of nodes) in a tree. Defaults to 1.

* "h_compact" -- number 0 or 1. Sets the extent to which draw_ascii_tree tries to save horizontal space. Defaults to 1. If I think of a better scrunching algorithm, there'll be a "2" setting for this.

* "v_compact" -- number 0, 1, or 2. Sets the degree to which draw_ascii_tree tries to save vertical space. Defaults to 1.

This occasionally returns trees that are a bit cock-eyed in parts; if anyone can suggest a better drawing algorithm, I'd be appreciative.
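To compare the effect of the options, something like the following sketch can be run. The option values are arbitrary, and the drawings themselves are not reproduced here since they depend on the drawing algorithm:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $tree = Tree::DAG_Node->lol_to_tree([ [ 'a1', 'a2', 'a' ], 'b', 'root' ]);

# Draw the same tree three times with different option sets.
my @option_sets = (
    {},                                  # all defaults
    { no_name => 1 },                    # "*" instead of node names
    { h_spacing => 3, v_compact => 2 },  # wider, vertically tighter
);

my @drawings;
for my $options (@option_sets) {
    my $lines = $tree->draw_ascii_tree($options);
    push @drawings, $lines;
    print "$_\n" for @$lines;            # lines carry no trailing "\n"
    print "\n";
}
```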

$node->delete_tree

Destroys the entire tree that $node is a member of (starting at the root), by nulling out each node-object's attributes (including, most importantly, its linkage attributes -- hopefully this is more than sufficient to eliminate all circularity in the data structure), and then moving it into the class DEADNODE.

Use this when you're finished with the tree in question, and want to free up its memory. (If you don't do this, it'll get freed up anyway when your program ends.)

If you try calling any methods on any of the node objects in the tree you've destroyed, you'll get an error like:

  Can't locate object method "leaves_under"
    via package "DEADNODE".

So if you see that, that's what you've done wrong.
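A short sketch of what that looks like in practice. Calling delete_tree on any node destroys the whole tree, and eval is used here only to trap the resulting error for demonstration:

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $tree   = Tree::DAG_Node->lol_to_tree([ 'leaf', 'root' ]);
my ($leaf) = $tree->daughters;

$leaf->delete_tree;                 # starts from the root, not from $leaf

print ref($tree), "\n";             # DEADNODE
print ref($leaf), "\n";             # DEADNODE

# Any method call on a dead node is now a fatal error.
my $ok = eval { $leaf->name; 1 };
print "error: $@" unless $ok;
```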

The delete_tree method is needed because Perl's garbage collector would never (as currently implemented) see that it was time to de-allocate the memory the tree uses -- until either you call $node->delete_tree, or until the program stops (at "global destruction" time, when everything is unallocated).

Incidentally, there are better ways to do garbage-collecting on a tree, ways which don't require the user to explicitly call a method like delete_tree -- they involve dummy classes, as explained at http://mox.perl.com/misc/circle-destroy.pod

However, introducing a dummy class concept into Tree::DAG_Node would be rather a distraction. If you want to do this with your derived classes, via a DESTROY in a dummy class (or in a tree-metainformation class, maybe), then feel free to.

The only case where I can imagine delete_tree failing to totally void the tree, is if you use the hashref in the "attributes" attribute to store (presumably among other things) references to other nodes' "attributes" hashrefs -- which 1) is maybe a bit odd, and 2) is your problem, because it's your hash structure that's circular, not the tree's. Anyway, consider:

      # null out all my "attributes" hashes
      $anywhere->root->walk_down({
        'callback' => sub {
          $hr = $_[0]->attributes; %$hr = (); return 1;
        }
      });
      # And then:
      $anywhere->delete_tree;

(I suppose delete_tree is a "destructor", or as close as you can meaningfully come for a circularity-rich data structure in Perl.)

RAMBLINGS

Currently I don't assume anything about the class membership of nodes being manipulated, other than by testing whether each one provides a method is_node, a la:

  die "Not a node!!!" unless &UNIVERSAL::can($node, "is_node");

So, as far as I'm concerned, a given tree's nodes are free to belong to different classes, just so long as they provide is_node, the few methods that this class relies on to navigate the tree, and have the same internal object structure -- at least as far as the mother and daughters attributes. Presumably this would be the case for any object belonging to a class derived from Tree::DAG_Node, or belonging to Tree::DAG_Node itself.
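If you want the same kind of test in your own code, one way to write it is sketched below. The helper name looks_like_node is hypothetical, and it uses blessed() from Scalar::Util so that plain strings and unblessed references are rejected before can() is called:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Tree::DAG_Node;

# Hypothetical helper: true only for objects that provide is_node.
sub looks_like_node {
    my ($candidate) = @_;
    return blessed($candidate) && $candidate->can('is_node') ? 1 : 0;
}

print looks_like_node(Tree::DAG_Node->new), "\n";   # 1
print looks_like_node('Tree::DAG_Node'), "\n";      # 0 (a class name, not an object)
print looks_like_node({}), "\n";                    # 0 (an unblessed hashref)
```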

AUTHOR

Sean M. Burke sburke@netadventure.net