NAME

Tree::Walker - Iterate along hierarchical structures

VERSION

Version 0.01

SYNOPSIS

Tree::Walker provides an iterator framework for hierarchical things, starting with but not limited to the filesystem. It returns its results in the form of a Data::Table::Lazy, so there are plenty of handy tools available. It can be subclassed for things other than the filesystem, or you can tell it to use another class - either way.

UNIVERSAL METHODS

These methods constitute the API for Tree::Walker and are written in a universal fashion.

new

The new method sets up a walk. [possibly a walk method just to set one up and run it?]

The components of a walk are:

* The starting point (for the filesystem, a string representing the directory to start walking in)
* Restrictions on the walk (for the filesystem, extensions to be looked for or a pattern to match)
* A general set of handlers to be taken if some specific item is matched
* What information to be returned for each node (for the filesystem, the name, type, full path, timestamp, and size of each file/directory)

The walker is designed to be subclassed for walking different hierarchical structures; see Tree::Walker::Subclass for information about how that works.

walk, walk_all, walk_all_simple

walk returns an iterator that yields one item from the walk each time it is called. Each item is an arrayref of the fields specified in the walker query.

walk_all runs that iterator until it's done, returning the list of results.

walk_all_simple is a walk_all that returns only the first element of each result (probably the tag, which is good for quick filtering).
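
The relationship between these three calls can be pictured with a plain closure-based iterator. This is only a sketch of the pattern, not the module's own code: make_iterator, drain, and drain_first are hypothetical names, and the rows are made-up data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for walk: returns a closure that yields one
# arrayref per call and undef once the rows are exhausted.
sub make_iterator {
    my @rows = @_;
    return sub { return shift @rows };
}

# Hypothetical stand-in for walk_all: call the iterator until it is done.
sub drain {
    my ($iter) = @_;
    my @results;
    while (defined(my $row = $iter->())) {
        push @results, $row;
    }
    return @results;
}

# Hypothetical stand-in for walk_all_simple: keep only the first field.
sub drain_first {
    my ($iter) = @_;
    return map { $_->[0] } drain($iter);
}

my @rows  = (['lib', 'd'], ['Walker.pm', '-'], ['README', '-']);
my @all   = drain(make_iterator(@rows));
my @names = drain_first(make_iterator(@rows));
print scalar(@all), " rows; names: @names\n";
```

The iterator form is useful when a walk is large and you want to stop early; the two draining forms trade that control for brevity.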

walk_table

Returns a Data::Table::Lazy table encapsulating a walk iterator. Only works if that module is installed; otherwise croaks.

walkdir (start, parameters, action)

Called with a string, an arrayref, and a subroutine, this function will build and call a walker, then run the iteration by repeated calls to the subroutine, like this:

    use Tree::Walker;

    my @file_list;
    # The subroutine is called once per result; collect one field from each.
    walkdir '.', [suppress_nodes => 1], sub {
        push @file_list, $_[2];
    };

mapdir

Another little quickie, this one allows even briefer syntax if your subroutine is small.

    use Tree::Walker;
    
    my @pm_list = mapdir { $_[2] } '.', '.pm';

interpret_parameters

The interpret_parameters method sets up the parameters for the walk. Most of the work is done by interpret_parameters_class, which can be overridden, but the basic behavior is provided by the base class.

OVERRIDABLE OR PARTLY OVERRIDABLE METHODS

These methods work with the filesystem in the unadorned Tree::Walker but are overridden in subclasses (for example see Net::FTP::Walker).

interpret_parameters

The interpret_parameters method interprets the parameters passed to ->new and sets up the walk environment.

The base class provides three different modes:

* Directory walking is the core functionality; you provide a start directory as the first parameter.
* Explicit file check; the first parameter is a string that points to a file, not a directory. This filespec can be a full relative path; it doesn't just have to be a name.
* List walk; the first parameter is an arrayref of either strings or arrayrefs. If the latter, then the first member of each child arrayref is the type tag for the rest, and the rest are interpreted recursively as subwalks.

The base class provides list (composite) walking,

The rest of the parameters mostly just apply to directory walks, which can be restricted in a number of different ways. There are four types of parameters: walk parameters, filter parameters, additional fields, and field selection. Field selection obviously applies to all types of walk, not just directory walks, as it determines what fields are actually returned by the call. Let's look at the four types separately.

There is actually only one walk parameter, postfix. If this is false, then it is a prefixed walk, and each node will appear in the results list before its children. If it's true, then nodes follow their children (this is necessary if you want a total-size number for each directory).
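
The difference between the two orders, and why a per-directory total needs the postfix order, can be seen on a small in-memory tree. This is a sketch under stated assumptions: the tree, the visit helper, and the [name, size] row shape are all made up for illustration and are not part of Tree::Walker.

```perl
use strict;
use warnings;

# A made-up tree: directories have children, files have a size.
my $tree = {
    name     => 'root',
    children => [
        { name => 'a.txt', size => 10 },
        { name => 'sub', children => [ { name => 'b.txt', size => 5 } ] },
    ],
};

# Hypothetical helper: appends [name, size] rows to $rows and returns the
# total size of the subtree rooted at $node.
sub visit {
    my ($node, $postfix, $rows) = @_;

    if (!$node->{children}) {               # leaf: emit and report its size
        push @$rows, [ $node->{name}, $node->{size} ];
        return $node->{size};
    }

    # Prefix order: the directory row goes out before its total is known.
    push @$rows, [ $node->{name}, undef ] unless $postfix;

    my $total = 0;
    $total += visit($_, $postfix, $rows) for @{ $node->{children} };

    # Postfix order: the children are done, so the total can be reported.
    push @$rows, [ $node->{name}, $total ] if $postfix;
    return $total;
}

my (@prefix, @postfix);
visit($tree, 0, \@prefix);
visit($tree, 1, \@postfix);

print join(' ', map { $_->[0] } @prefix),  "\n";
print join(' ', map { $_->[0] } @postfix), "\n";
```

In the prefix run the directory rows carry no size, because their children have not been visited yet when the row is emitted.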

The following parameters filter the results of a filesystem walk. They apply to filenames, not directory names:

* ext - an extension that files must match to be returned
* ext_list - a list (arrayref) of extensions, one of which must be matched by a file to be returned
* pattern - a regexp that filenames must match for the file to be returned
* exists - return only existing files or non-existing files, for any files that have been specified explicitly
* filter - if all else fails, you can write your own filters here

The filter parameter takes one of three forms: a coderef that will be passed the entire list of headers below and returns a boolean (false = don't return this row, true = return this row); an arrayref [coderef, field, field, ...] that specifies which fields the coderef wants to see; or an arrayref of such arrayrefs, e.g. [[coderef, field, ...], [coderef, field, ...], ...].
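
One way to picture how the three shapes of filter could be reduced to a single structure and applied to a row is sketched below. The helper names normalize_filter and row_passes are hypothetical, the sample rows are invented, and the choice to require every filter to pass (a logical AND) is an assumption, since the text does not say how multiple filters combine.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce any of the three accepted shapes to an
# arrayref of [coderef, field, ...] specs.
sub normalize_filter {
    my ($filter) = @_;
    return [ [$filter] ] if ref($filter) eq 'CODE';         # bare coderef
    return [ $filter ]   if ref($filter->[0]) eq 'CODE';    # single spec
    return $filter;                                         # list of specs
}

# Hypothetical helper: a row passes only if every coderef returns true
# (assumed AND semantics). A spec without field names receives all
# values in header order.
sub row_passes {
    my ($specs, $headers, $row) = @_;
    my %value;
    @value{@$headers} = @$row;
    for my $spec (@$specs) {
        my ($code, @fields) = @$spec;
        @fields = @$headers unless @fields;
        return 0 unless $code->(@value{@fields});
    }
    return 1;
}

my @headers = qw(name type size);
my $specs   = normalize_filter([
    [ sub { $_[0] > 100 },      'size' ],
    [ sub { $_[0] =~ /\.pm$/ }, 'name' ],
]);

my @rows = (['Walker.pm', '-', 2048], ['README', '-', 4096], ['Tiny.pm', '-', 12]);
my @kept = grep { row_passes($specs, \@headers, $_) } @rows;
print "$_->[0]\n" for @kept;
```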

In the end, all the other filters go into the same filter structure anyway, so this part is very easy to subclass.

To select whether directories or files are returned, use:

* suppress_leaves - (at the abstract level) if set, non-expandable nodes will not be returned
* suppress_nodes - (at the abstract level) if set, expandable nodes will not be returned - doesn't affect the walk
* prune - a name or list of names that, if encountered, will not be walked at all

There's a shortcut for filesystem queries (or rather, a set of shortcuts). If the second parameter is not a hashref but rather a string, then:

* If it starts with a period but doesn't have a vertical bar | it will be understood as ext.
* If it starts with a period but does have at least one vertical bar | it will be ext_list.
* Otherwise, it will be taken as a pattern, which is a crippled regexp but quick and easy.

If one of these options is taken, suppress_nodes is also set because the idea is fast, easy ways to get data, and you probably just want file information. And of course you're locked into the defaults for everything else.
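
The shortcut rules can be sketched as a small classifier. The name expand_shortcut is hypothetical, and splitting the ext_list string on the vertical bar is an assumption about how the list is derived; only the three-way decision and the implied suppress_nodes come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: map a shortcut string to the parameters it stands for.
sub expand_shortcut {
    my ($spec) = @_;
    my @params;
    if ($spec =~ /^\./ && $spec !~ /\|/) {
        @params = (ext => $spec);                        # ".pm"
    }
    elsif ($spec =~ /^\./) {
        @params = (ext_list => [ split /\|/, $spec ]);   # ".pm|.pl" (assumed split)
    }
    else {
        @params = (pattern => $spec);                    # anything else
    }
    return (@params, suppress_nodes => 1);   # shortcuts imply files only
}

for my $spec ('.pm', '.pm|.pl', 'Makefile*') {
    my %p = expand_shortcut($spec);
    my ($kind) = grep { $_ ne 'suppress_nodes' } keys %p;
    print "$spec is treated as $kind\n";
}
```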

To add fields to the list of result fields, you can pass in a fields parameter that consists of an arrayref: [[coderef, field, field, ...], ...]. After the normal fields are generated, each of these field generators is called in sequence, and each returns a list of values to be named according to the list following the coderef.

Finally, to restrict the list of fields actually returned on each call to the generator, simply pass in a list of names under select => ['name1', 'name2', ...].
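
The interplay of field generators and field selection might look like the following sketch. The helpers add_fields and select_fields are hypothetical, and passing the row built so far to each generator is an assumption; the text does not say what arguments a generator receives.

```perl
use strict;
use warnings;

# Hypothetical helper: run each [coderef, name, ...] generator against the
# row built so far and store its return values under the listed names.
sub add_fields {
    my ($row, $generators) = @_;
    for my $gen (@$generators) {
        my ($code, @names) = @$gen;
        my @values = $code->($row);     # assumed calling convention
        @{$row}{@names} = @values;
    }
    return $row;
}

# Hypothetical helper: keep only the selected fields, in the given order.
sub select_fields {
    my ($row, $select) = @_;
    return [ @{$row}{@$select} ];
}

my $row = { name => 'Walker.pm', size => 2048 };
add_fields($row, [
    # One generator producing two values, named 'kb' and 'ext'.
    [ sub { my $r = shift; (int($r->{size} / 1024), $r->{name} =~ /\.(\w+)$/) },
      'kb', 'ext' ],
]);

my $out = select_fields($row, [ 'name', 'kb', 'ext' ]);
print "@$out\n";
```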

walk_init ()

Initializes a walk. Doesn't do anything in the filesystem.

qualify (tag, stack)

Given the tag for a node and the stack above it, fully qualify the tag as a locator.

type (tag, stack)

Given the tag for a node, visits it (does initial retrieval) and tells us its type.

data_available

Returns a list of the fields the walker can return (i.e. the fields the driver knows about) and the default order in which they'll be returned.

For the filesystem, these are:

* name - the name of the file or directory
* role - the role of the node (specified at the outset)
* indent - the indentation level
* path - the path of the file or directory, built for the host OS using File::Spec
* dev - device number of the filesystem (this and the next 12 are the standard perl 'stat' fields)
* ino - inode number
* mode - file mode as integer
* nlink - number of (hard) links to the file
* uid - numeric user ID of owner
* gid - numeric group ID of owner
* rdev - device identifier for special files
* size - total size of file in bytes
* atime - last access time
* mtime - last modify time
* ctime - inode change time (these three all in seconds since 00:00 January 1, 1970 GMT)
* blksize - block size of filesystem
* blocks - actual number of blocks allocated to the file
* modestr - file mode as interpreted Unix-style mode string
* type - the first character of the modestr (for convenience)
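
For orientation, the stat-derived fields can be reproduced with Perl's built-in stat and the core Fcntl module. This is a sketch, not the module's implementation: stat_fields and mode_string are hypothetical helpers, and the mode string here is simplified (it distinguishes only directories from everything else and ignores setuid, setgid, and sticky bits).

```perl
use strict;
use warnings;
use Fcntl ':mode';    # core module; provides S_ISDIR

# The 13 values returned by Perl's built-in stat, in order.
my @stat_names = qw(dev ino mode nlink uid gid rdev size
                    atime mtime ctime blksize blocks);

# Hypothetical helper: stat one path and return a hashref keyed by the
# names above, or nothing if the path cannot be stat-ed.
sub stat_fields {
    my ($path) = @_;
    my @st = stat $path or return;
    my %f;
    @f{@stat_names} = @st;
    return \%f;
}

# Hypothetical helper: simplified Unix-style mode string, e.g. drwxr-xr-x.
sub mode_string {
    my ($mode) = @_;
    my $str = S_ISDIR($mode) ? 'd' : '-';
    for my $shift (6, 3, 0) {           # owner, group, other
        my $bits = ($mode >> $shift) & 7;
        $str .= join '', ($bits & 4 ? 'r' : '-'),
                         ($bits & 2 ? 'w' : '-'),
                         ($bits & 1 ? 'x' : '-');
    }
    return $str;
}

my $f = stat_fields('.') or die "cannot stat current directory";
$f->{modestr} = mode_string($f->{mode});
$f->{type}    = substr $f->{modestr}, 0, 1;
print "$f->{type} $f->{modestr} $f->{size}\n";
```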

get_data, get_data_class

Given the context and a node, gets the configured data for that node. Again, class-specific fields are handled in the get_data_class function.

headers

Returns names for the fields in each returned line.

get_children, get_left, get_right

The get_children method, called on a node, returns a list of its children (to be interpreted in turn by qualify and type). The get_left and get_right functions take that list and divide it according to the walk type.

AUTHOR

Michael Roberts, <michael at vivtek.com>

BUGS

Please report any bugs or feature requests to bug-tree-walker at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Tree-Walker. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Tree::Walker

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2012 Michael Roberts.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.