NAME

Tree::Parser - Module to parse formatted files into tree structures

SYNOPSIS

  use Tree::Parser;
  
  # create a new parser object with some input
  my $tp = Tree::Parser->new($input);
  
  # set a parse filter
  $tp->setParseFilter(sub {
      my ($line_iterator) = @_;
      my $line = $line_iterator->next();
      my ($tabs, $node) = $line =~ /(\t*)(.*)/;
      my $depth = length $tabs;
      return ($depth, $node);
  });
  
  # parse our input and get back a tree
  my $tree = $tp->parse();
  
  # set our deparse filter
  $tp->setDeparseFilter(sub { 
      my ($tree) = @_;
      return ("\t" x $tree->getDepth()) . $tree->getNodeValue();
  });
  
  # deparse our tree and get back a string
  my $tree_string = $tp->deparse();

DESCRIPTION

This module can parse various types of formatted input containing hierarchical information into a tree structure. It can also deparse the same tree structure back into a string. It accepts several types of input: strings, filenames, and array references. The tree structure is a hierarchy of Tree::Simple objects.

The parsing is controlled through a parse filter, which is used to process each "line" in the input (see setParseFilter below for more information about parse filters).

The deparsing is likewise controlled by a deparse filter, which is used to convert each tree node into a string representation.

This module can be viewed (somewhat simplistically) as a serialization tool for Tree::Simple objects. Properly written parse and deparse filters can be used to do "round-trip" tree handling.

METHODS

Constructor

new ($tree | $input)

The constructor is used primarily for creating an object instance. Initializing the object is done by the _init method (see below).

Input Processing

setInput ($input)

This method takes various types of input and pre-processes them through the prepareInput method below.

prepareInput ($input)

The prepareInput method is used to pre-process certain types of $input. It accepts any of the following types of arguments:

an Array::Iterator object

This just gets passed on through.

an array reference containing the lines to be parsed

This type of argument is used to construct an Array::Iterator instance.

a filename which ends in .tree

The file is opened, its contents slurped into an array, which is then used to construct an Array::Iterator instance.

a string

The string is expected to contain embedded newlines; in fact it must contain more than one, since a single-node tree does not make much sense.

It then returns an Array::Iterator object ready for the parser.
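
The dispatch described above can be sketched in plain Perl. This is an illustration only, not the module's source: normalize_input is a hypothetical helper, it returns a plain array reference of lines rather than an Array::Iterator object, and both the chomping of file lines and the exact newline threshold for strings are assumptions based on the descriptions above.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: reduce the supported input types to an array
# reference of lines. The real method returns an Array::Iterator object.
sub normalize_input {
    my ($input) = @_;

    # an object (such as an iterator) is passed through untouched
    return $input if blessed $input;

    # an array reference already holds the lines to be parsed
    return [ @$input ] if ref $input eq 'ARRAY';

    # a filename ending in .tree: slurp the file into an array
    if ($input =~ /\.tree$/) {
        open my $fh, '<', $input or die "Cannot open $input: $!\n";
        chomp(my @lines = <$fh>);
        close $fh;
        return \@lines;
    }

    # anything else is treated as a string with embedded newlines
    # (assumption: "more than one" means at least two newlines)
    die "Input string must contain more than one newline\n"
        unless ($input =~ tr/\n//) > 1;
    return [ split /\n/, $input ];
}

my $lines = normalize_input("root\n\tchild one\n\tchild two");
print scalar(@$lines), " lines\n";
```

Checking for an object first and a plain string last mirrors the order in which the argument types are listed above.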

Filter Methods

setParseFilter ($filter)

A parse filter is a subroutine reference which is used to process each element in the input. As the main parse loop runs, it calls this filter routine and passes it the Array::Iterator instance which represents the input. To get the next element/line/token from the iterator, the filter must call next; the filter should then process that element. A filter may advance the iterator further by calling next more than once if necessary; there are no restrictions on what it can do. However, the filter must return these two values in order for the tree to be constructed correctly:

the depth of the node
the value of the node (which can be anything: string, array ref, object instance, you name it)

The following is an example of a very basic filter which simply counts the number of tab characters to determine the node depth and then captures any remaining character on the line.

  $tree_parser->setParseFilter(sub {
      my ($iterator) = @_;
      my $line = $iterator->next();
      # match the tabs and everything that follows them
      my ($tabs, $node) = ($line =~ /(\t*)(.*)/);
      # calculate the depth by seeing how long
      # the tab string is.
      my $depth = length $tabs;
      # return the depth and the node value
      return ($depth, $node);
  }); 
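
As a further sketch, a filter may call next more than once and may return something other than a string as the node value. The example below skips comment lines and returns a hash reference. My::LineIterator is a hypothetical stand-in so the code runs without the module installed; it assumes only the next and hasNext calls used here. The comment syntax and the name=value line format are also invented for illustration, and the while loop at the end merely plays the role of the main parse loop.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the iterator passed to the filter.
{
    package My::LineIterator;
    sub new     { my ($class, @lines) = @_; return bless { lines => [@lines], pos => 0 }, $class }
    sub hasNext { my ($self) = @_; return $self->{pos} < @{ $self->{lines} } }
    sub next    { my ($self) = @_; return $self->{lines}[ $self->{pos}++ ] }
}

my $filter = sub {
    my ($iterator) = @_;
    my $line = $iterator->next();
    # advance past comment lines by calling next() again
    # (assumes the input does not end with a comment line)
    $line = $iterator->next()
        while $line =~ /^\s*#/ && $iterator->hasNext();
    # leading tabs give the depth; the rest is split into name and value
    my ($tabs, $name, $value) = $line =~ /^(\t*)([^=]*)=?(.*)$/;
    # the node value is a hash reference rather than a plain string
    return (length $tabs, { name => $name, value => $value });
};

# stand-in for the main parse loop
my $it = My::LineIterator->new("# config", "settings", "\t# inner", "\tcolor=red");
while ($it->hasNext()) {
    my ($depth, $node) = $filter->($it);
    print "$depth: $node->{name} [$node->{value}]\n";
}
```

Because the filter consumes the comment lines itself, they never produce nodes in the tree.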

setDeparseFilter ($filter)

The deparse filter is the opposite of the parse filter: it takes each element of the tree and returns a string representation of it. The filter routine is passed a Tree::Simple instance and is expected to return a single string. This is not enforced, however; everything the filter returns is collected. Keep in mind that each element returned is treated as a single line in the output, so multiple elements become multiple lines.

Here is an example of a deparse filter. This can be viewed as the inverse of the parse filter example above.

  $tp->setDeparseFilter(sub { 
      my ($tree) = @_;
      return ("\t" x $tree->getDepth()) . $tree->getNodeValue();
  });
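
To illustrate the point about multiple return values, here is a sketch of a deparse filter that returns two elements for some nodes. My::Node is a hypothetical stand-in for a tree node that supports only getDepth and getNodeValue, so the code runs without Tree::Simple installed; the "# section" marker is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree node.
{
    package My::Node;
    sub new          { my ($class, %args) = @_; return bless {%args}, $class }
    sub getDepth     { $_[0]{depth} }
    sub getNodeValue { $_[0]{value} }
}

# Deparse filter that emits a marker line before each top-level node.
my $deparse_filter = sub {
    my ($tree) = @_;
    my $line = ("\t" x $tree->getDepth()) . $tree->getNodeValue();
    return $tree->getDepth() == 0
        ? ("# section", $line)    # two elements, so two output lines
        : ($line);                # one element, one output line
};

my @nodes = (
    My::Node->new(depth => 0, value => 'fruit'),
    My::Node->new(depth => 1, value => 'apple'),
);
print join("\n", map { $deparse_filter->($_) } @nodes), "\n";
```

A parse filter that reads this output back would need to skip the marker lines for the round trip to reproduce the original tree.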

Accessors

getTree

This method returns the tree held by the parser or set through the constructor.

Parse/Deparse

parse

Parsing is pretty automatic once everything is set up. This routine checks that you have all you need to proceed, and throws an exception if not. Once parsing is complete, the tree is stored internally as well as returned from this method.

deparse

This method is also pretty automatic: it verifies that it has everything it needs, throwing an exception if it does not. In list context it returns an array of lines; in scalar context it joins the lines into a single string separated by newlines.
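
The context-dependent return value follows a common Perl idiom, sketched here with wantarray. The helper lines_or_string is hypothetical and only mirrors the documented behaviour; it is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines as a list in list context,
# or joined with newlines in scalar context.
sub lines_or_string {
    my @lines = @_;
    return wantarray ? @lines : join("\n", @lines);
}

my @lines  = lines_or_string("root", "\tchild");    # list context
my $string = lines_or_string("root", "\tchild");    # scalar context

print scalar(@lines), " lines\n";
print "$string\n";
```

In practice this means that assigning the result of deparse to an array yields the individual lines, while assigning it to a scalar yields one string.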

Private Methods

_init ($tree | $input)

This will initialize the slots of the object. If given a $tree object, it will store it. If given some other kind of input, it will process this through the prepareInput method.

_parse

This is where all the parsing work is done. If you are truly interested in the inner workings of this method, I suggest you refer to the source. It is a very simple algorithm and should be easy to understand.
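
For readers who want a feel for how (depth, value) pairs can become a tree, here is a sketch of one common stack-based approach. It is not taken from the module's source: build_tree is a hypothetical function, and it uses plain hashes in place of Tree::Simple objects.

```perl
use strict;
use warnings;

# Build a nested structure from [depth, value] pairs, such as those
# a parse filter returns.
sub build_tree {
    my (@pairs) = @_;
    my $root  = { value => 'ROOT', children => [] };
    my @stack = ($root);    # $stack[$d] is the parent for a node at depth $d

    for my $pair (@pairs) {
        my ($depth, $value) = @$pair;
        die "Depth $depth skips a level\n" if $depth > $#stack;

        my $node = { value => $value, children => [] };
        # discard stack entries deeper than this node's parent
        splice @stack, $depth + 1;
        push @{ $stack[$depth]{children} }, $node;
        # this node becomes the parent for depth $depth + 1
        push @stack, $node;
    }
    return $root;
}

my $tree = build_tree([0, 'fruit'], [1, 'apple'], [1, 'pear'], [0, 'veg']);
print scalar(@{ $tree->{children} }), " top-level nodes\n";
```

The stack holds the current chain of ancestors, so a node at a given depth is always attached to the most recent node one level above it.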

_deparse

This is where all the deparsing work is done. As with the _parse method, if you are interested in the inner workings, I suggest you refer to the source.

TO DO

Make some default filters and turn them into constants which can be used.

Make a way to define the "indent" instead of defining a filter.

BUGS

None that I am aware of. Of course, if you find a bug, let me know, and I will be sure to fix it. This module, in an earlier/simpler form, has been and is being used in production for approx. 1 year now without incident. This version has been improved and the test suite added.

CODE COVERAGE

I use Devel::Cover to test the code coverage of my tests. Below is the Devel::Cover report on this module's test suite.

 ------------------------------ ------ ------ ------ ------ ------ ------ ------
 File                             stmt branch   cond    sub    pod   time  total
 ------------------------------ ------ ------ ------ ------ ------ ------ ------
 /Tree/Parser.pm                 100.0   82.6   73.3  100.0  100.0   25.6   93.2
 t/10_Tree_Parser_test.t         100.0    n/a    n/a  100.0    n/a   19.5  100.0
 t/20_Tree_Parser_inputs_test.t   98.9   50.0    n/a  100.0    n/a   35.8   98.1
 t/30_Tree_Parser_errors_test.t   95.5    n/a    n/a   90.0    n/a   19.1   93.8
 ------------------------------ ------ ------ ------ ------ ------ ------ ------
 Total                            98.9   81.2   73.3   96.4  100.0  100.0   95.4
 ------------------------------ ------ ------ ------ ------ ------ ------ ------

SEE ALSO

DEPENDENCIES

This module uses two other modules which I have written; you will need to install both.

Tree::Simple
Array::Iterator

AUTHOR

stevan little, <stevan@iinteractive.com>

COPYRIGHT AND LICENSE

Copyright 2004 by Infinity Interactive, Inc.

http://www.iinteractive.com

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.