David F. Houghton


TPath::Forester - a generator of TPath expressions for a particular class of nodes


version 1.007


  # we apply the TPath::Forester role to a class

    package MyForester;

    use Moose;                                      # for simplicity we omit removing Moose droppings, etc.
    use MooseX::MethodAttributes;                   # needed if you're going to add some attributes

    with 'TPath::Forester';                         # compose in the TPath::Forester methods and attributes

    # define abstract methods
    sub children    { $_[1]->children }             # our nodes know their children
    sub parent      { $_[1]->parent }               # our nodes know their parent
    sub has_tag     {                               # our nodes have a tag attribute which is
       my ($self, $node, $tag) = @_;                #   their only tag
       $node->tag eq $tag;
    }
    sub matches_tag {
       my ($self, $node, $re) = @_;
       $node->tag =~ $re;
    }

    # define an attribute
    sub baz :Attr   {
      # the canonical order of arguments, none of which we need
      # my ($self, $node, $index, $collection, @args) = @_;
      'baz';
    }

  # now select some nodes from a tree

  my $f     = MyForester->new;                      # make a forester
  my $path  = $f->path('//foo/>bar[@depth = 4]');   # compile a path
  my $root  = fetch_tree();                         # get a tree of interest
  my @nodes = $path->select($root);                 # find the nodes of interest

  # say our nodes have a text method that returns a string

  $f->add_test( sub { $_[1]->text =~ /^\s+$/ } );   # ignore whitespace nodes
  $f->add_test( sub { $_[1]->text =~ /^-?\d+$/ } ); # ignore integers
  $f->add_test( sub { ! length $_[1]->text } );     # ignore empty nodes

  # reset to ignoring nothing

  $f->clear_tests;


A TPath::Forester understands your trees and hence can translate TPath expressions into objects that will select the appropriate nodes from your trees. It can also generate an index appropriate to your trees if you're doing multiple selects on a particular tree.

TPath::Forester is a role. It provides most, but not all, methods and attributes required to construct TPath::Expression objects. You must specify how to find a node's children and its parent (you may have to rely on a TPath::Index for this), and you must define how a tag string or regex may match a node, if at all.

Why "Forester"

Foresters are people who can tell you about trees. A class with the role TPath::Forester can also tell you about trees. I now think "arborist" sounds better, but I don't feel like refactoring everything to use a new name.



log_stream

A TPath::LogStream used by the @log attribute from TPath::Attributes::Standard. By default it is a TPath::StderrLog.


one_based

Whether to use xpath-style index predicates, with [1] being the index of the first element, or zero-based indices, with [0] being the first index. This only affects non-negative indices. This attribute is false by default.


case_insensitive

Whether selectors are case-insensitive in their matching of tags. This attribute is false by default.


add_test, has_tests, clear_tests

Add a code ref that will be used to test whether a node is ignorable. The return value of this code will be treated as a boolean value. If it is true, the node, and all its children, will be passed over as possible items to return from a select.

Example test:

  $f->add_test(sub {
      my ($forester, $node, $index) = @_;
      return $forester->has_tag($node, 'foo');
  });
Every test will receive the forester itself, the node, and the index as arguments. This example test will cause the forester $f to ignore foo nodes.

This method has the companion methods has_tests and clear_tests. The former says whether the list is empty and the latter clears it.
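
To make the effect of ignore tests concrete, here is a minimal stand-alone sketch in plain Perl. It does not use TPath and is not how the forester is implemented. The hash-based nodes and the helpers is_ignored and visible_nodes are hypothetical names invented for illustration. It only demonstrates the rule described above: a node for which any test returns true is passed over together with everything beneath it.

```perl
use strict;
use warnings;

# Hypothetical node shape: { tag => ..., children => [ ... ] }.
my $tree = {
    tag      => 'root',
    children => [
        { tag => 'foo', children => [ { tag => 'bar', children => [] } ] },
        { tag => 'baz', children => [ { tag => 'bar', children => [] } ] },
    ],
};

# Tests are called as documented: forester, node, index.
# The forester and index are stand-ins (undef) in this sketch.
my @tests = (
    sub { my ( $forester, $node, $index ) = @_; $node->{tag} eq 'foo' },
);

# A node is ignorable if any test returns a true value.
sub is_ignored {
    my ($node) = @_;
    for my $test (@tests) {
        return 1 if $test->( undef, $node, undef );
    }
    return 0;
}

# Collect the tags of every surviving node, skipping ignored subtrees whole.
sub visible_nodes {
    my ($node) = @_;
    return () if is_ignored($node);
    return ( $node->{tag}, map { visible_nodes($_) } @{ $node->{children} } );
}

my @visible = visible_nodes($tree);
print "@visible\n";    # prints "root baz bar"
```

The bar node under foo disappears even though no test names it, because its ancestor was ignored.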


add_attribute

Expects a name, a code reference, and possibly options. Adds the attribute to the forester.

If the attribute name is already in use, the method will croak unless you specify that this attribute should override the already named attribute. E.g.,

  $f->add_attribute( 'foo', sub { ... }, -override => 1 );

If you specify the attribute as overriding and the name is *not* already in use, the method will carp. You can use the -force option to skip all this checking and just add the attribute.

Note that the code reference will receive the forester, a node, an index, a collection of nodes, and optionally any additional arguments. If you want the attribute to evaluate as undefined for a particular node, it must return undef for this node.
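
The interaction of the -override and -force options can be sketched in plain Perl as follows. This is an illustration of the rules as stated above, not the module's own code: the %registry hash, the register_attribute function, and the message texts are hypothetical.

```perl
use strict;
use warnings;
use Carp qw(carp croak);

my %registry;    # hypothetical stand-in for the forester's attribute table

sub register_attribute {
    my ( $name, $code, %opts ) = @_;
    unless ( $opts{-force} ) {
        if ( exists $registry{$name} ) {
            # name in use: fatal unless the caller asked to override
            croak "attribute $name already defined" unless $opts{-override};
        }
        elsif ( $opts{-override} ) {
            # override requested but nothing to override: warn only
            carp "attribute $name marked as overriding but not yet defined";
        }
    }
    $registry{$name} = $code;
    return;
}

# Capture warnings so the carp case can be observed.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

register_attribute( 'foo', sub { 1 } );                         # new name
my $ok = eval { register_attribute( 'foo', sub { 2 } ); 1 };    # croaks
register_attribute( 'foo', sub { 3 }, -override => 1 );         # replaces foo
register_attribute( 'bar', sub { 4 }, -override => 1 );         # carps, then adds
register_attribute( 'bar', sub { 5 }, -force => 1 );            # no checks

print $ok ? "second add succeeded\n" : "second add croaked\n";
print scalar(@warnings), " warning(s)\n";
```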


attribute

Expects a TPath::Context, an attribute name, and an optional parameter list. Returns the value of the attribute in that context.


path

Takes a TPath expression as a string and returns the compiled TPath::Expression.


index

Takes a tree node and returns a TPath::Index object that TPath::Expression objects can use to cache information about the tree rooted at the given node.


parent

Expects a TPath::Context and returns the parent of the context node according to the index. If your nodes know their own parents, you probably want to override this method. See also TPath::Index.
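
When nodes do not hold a reference to their parent, the parent has to be looked up in a structure built by walking the tree once. The following is a minimal sketch of that idea in plain Perl. It is not TPath::Index; the hash-based nodes and the functions build_parent_index and parent_of are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical nodes: hashes that know their children but not their parent.
my $leaf = { tag => 'leaf', children => [] };
my $mid  = { tag => 'mid',  children => [$leaf] };
my $root = { tag => 'root', children => [$mid] };

# Walk the tree once, recording each child's parent keyed by reference address.
sub build_parent_index {
    my ($top) = @_;
    my %parent_of;
    my @queue = ($top);
    while ( my $node = shift @queue ) {
        for my $child ( @{ $node->{children} } ) {
            $parent_of{ refaddr($child) } = $node;
            push @queue, $child;
        }
    }
    return \%parent_of;
}

# Parent lookup is a hash access; the root has no entry, so it yields undef.
sub parent_of {
    my ( $index, $node ) = @_;
    return $index->{ refaddr($node) };
}

my $index = build_parent_index($root);
print parent_of( $index, $leaf )->{tag}, "\n";    # prints "mid"
```

If the nodes already provide a parent method, as in the synopsis, a one-line override that calls it avoids the lookup entirely.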


id

Expects a node. Returns the node's id, if any. By default this method always returns undef. Override it if your nodes have some defined notion of id.


autoload_attribute

Expects an attribute name and optionally a list of arguments. Returns a code reference instantiating the attribute. This method is required for attributes such as




Note the unescaped colon preceding the attribute name.

Autoloading is useful for trees such as HTML or XML trees, where nodes may have ad hoc attributes.

This method must be defined by each forester requiring attribute auto-loading. The default method will always return undef, and if one attempts to use it to autoload an attribute an error will be thrown during expression compilation.


is_leaf

Expects a node and an index.

Returns whether the context node is a leaf. Override this with something more efficient where available. E.g., where the node provides an is_leaf method,

  sub is_leaf { $_[1]->is_leaf }


is_root

Expects a node and an index.

Returns whether the context node is the root. Delegates to index.

Override this with something more efficient where available. E.g., where the node provides an is_root method,

  sub is_root { $_[1]->is_root }


has_tag

Expects a node and a string. Returns whether the node, in whatever sense is appropriate to this sort of node, "has" the string as a tag. See the required tag method.


matches_tag

Expects a node and a compiled regex. Returns whether the node, in whatever sense is appropriate to this sort of node, has a tag that matches the regex. See the required tag method.


wrap

Expects a node and possibly an options hash. Returns a node of the type understood by the forester.

If your forester must coerce things into a tree of the right type, override this method, which otherwise just passes through its second argument.

Note, if you do need to override the default wrap, you'll have to jump through a few Moose hoops. The basic pattern is

  use Moose;
  with 'TPath::Forester' => { -excludes => 'wrap' };

  {
      no warnings 'redefine';
      sub wrap {
          my ($self, $node, %opts) = @_;
          return $node if blessed $node and $node->isa('MyNode');
          # coerce $node into a MyNode here and return the result
      }
  }

See TPath::Forester::Ref for an example.
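
As a rough illustration of what the coercion step in wrap might look like, the sketch below turns nested array references of the form [ tag, child, child, ... ] into node objects. It is plain Perl and runs without TPath or Moose. The MyNode and MyWrapper classes and the array-reference input format are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical node class understood by the forester.
package MyNode {
    sub new {
        my ( $class, %args ) = @_;
        return bless { tag => $args{tag}, children => $args{children} || [] },
            $class;
    }
    sub tag      { $_[0]{tag} }
    sub children { @{ $_[0]{children} } }
}

# Hypothetical stand-in for a forester class; only wrap is shown.
package MyWrapper {
    use Scalar::Util qw(blessed);

    sub new { bless {}, shift }

    sub wrap {
        my ( $self, $node, %opts ) = @_;

        # already the right type: pass it through untouched
        return $node if blessed $node and $node->isa('MyNode');

        # coerce: the first element is the tag, the rest are child structures
        my ( $tag, @kids ) = @$node;
        return MyNode->new(
            tag      => $tag,
            children => [ map { $self->wrap( $_, %opts ) } @kids ],
        );
    }
}

my $wrapper = MyWrapper->new;
my $tree    = $wrapper->wrap( [ root => [ a => ['b'] ], ['c'] ] );
print join( ' ', map { $_->tag } $tree->children ), "\n";    # prints "a c"
```

Wrapping an already wrapped tree returns the same object, so callers need not track whether coercion has happened.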


TPath::Attributes::Standard, TPath::TypeCheck



children

Expects a node and an index. Returns the children of the node as a list.


tag

Expects a node and returns the value selectors are matched against, or undef if the node has no tag.

If your node type cannot be so easily mapped to a particular tag, you may want to override the has_tag and matches_tag methods and supply a no-op method for tag.
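
For instance, nodes that carry several tags at once have no single value to return from tag. A minimal sketch of the overrides for that case follows. It uses a plain package rather than a class composing TPath::Forester so that it runs without TPath installed. The MultiTagForester name and the node layout, a hash with a tags array, are hypothetical. The argument order follows the has_tag and matches_tag definitions in the synopsis.

```perl
use strict;
use warnings;

package MultiTagForester {
    sub new { bless {}, shift }

    # No single canonical tag, so tag is a no-op.
    sub tag { return }

    # True if any of the node's tags equals the string.
    sub has_tag {
        my ( $self, $node, $tag ) = @_;
        return scalar grep { $_ eq $tag } @{ $node->{tags} };
    }

    # True if any of the node's tags matches the compiled regex.
    sub matches_tag {
        my ( $self, $node, $re ) = @_;
        return scalar grep { $_ =~ $re } @{ $node->{tags} };
    }
}

my $f    = MultiTagForester->new;
my $node = { tags => [ 'note', 'warning' ] };

my $has_warning  = $f->has_tag( $node, 'warning' );
my $matches_warn = $f->matches_tag( $node, qr/^warn/ );
my $has_error    = $f->has_tag( $node, 'error' );

print $has_warning  ? "yes\n" : "no\n";    # prints "yes"
print $matches_warn ? "yes\n" : "no\n";    # prints "yes"
print $has_error    ? "yes\n" : "no\n";    # prints "no"
```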


David F. Houghton <dfhoughton@gmail.com>


This software is copyright (c) 2013 by David F. Houghton.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.