NAME

Text::BIP (Blosxom Infrastructure Package) -- an object-oriented module for facilitating event-based file system indexing.

SYNOPSIS

 #!/usr/bin/perl -w
 
 use strict;
 use Text::BIP;

 # create object and initialize
 my $bip = Text::BIP->new;
 $bip->depth(1); # do not index subdirectories. The default, 0, recurses through all.
 $bip->base('/some/path/name');
 
 # ... or initialize in the constructor.
 $bip = Text::BIP->new( { depth=>1, base=>'/some/path/name' } );

 # set a file handler for .txt files and a handler for directories.
 $bip->file_handler(\&hdlr_file,'txt');
 $bip->index_handler(\&hdlr_index);
 
 # index the base directory, descending only as far as depth allows
 $bip->index();

 # index again but using an alternate directory
 $bip->index('/some/other/path/name');

 # simple file handler that dumps the values for each file found to the screen 
 sub hdlr_file {
  print "Dir: ".$_[0]->dir."\n";
  print "Relative Dir: ".$_[0]->relative_dir."\n";
  print "Relative Dir (base override): ".$_[0]->relative_dir('/some/other/path/name')."\n";
  print "File: ".$_[0]->file."\n";
  print "Extension: ".$_[0]->ext."\n";
  print "Name: ".$_[0]->name."\n";
  print "Relative Name: ".$_[0]->relative_name."\n";
  print "Relative Name (base override): ".$_[0]->relative_name('/some/other/path/name')."\n";
  print "\n";
 }
 
 # simple index handler that prints the name of a subdirectory.
 sub hdlr_index { print "FOLDER\nName: ".$_[0]->name."\n\n"; }

DESCRIPTION

The purpose of this module is to provide a lightweight mechanism for facilitating event-based file system indexing. In many ways it's File::Find with a slightly more specific and object-oriented interface.

When Rael Dornfest released blosxom, his lightweight yet feature-packed weblog application, I was intrigued by how much could be done with so little. The feature that made the biggest impression on me was how blosxom used the file system as a simple hierarchical document database. I began applying this technique in a number of my scripts whose scope fell outside the traditional weblog uses blosxom was designed to handle. To better organize and reuse my code, I created a module implementing an extensible framework that I could drop into my scripts. The result became BIP.

BIP (Blosxom Infrastructure Package) is an object-oriented module that delivers an event-based (callback) framework for indexing a file system much as blosxom does. Despite the similarities, BIP implements extensibility differently because its goals are different. It places extensibility above all else and, to a certain extent, turns blosxom's plugin architecture inside out: BIP plugs into your code rather than you plugging code into it, as with blosxom.

METHODS

BIP->new( [ { depth=>integer, base=>'/path/name' } ] )

The constructor method. Can optionally set depth and base values through a hash reference. Automatically calls the init method.

$bip->init( [ {depth=>integer, base=>'/path/name' } ] )

Clears the stash and other internal variables, including the base and depth. Can optionally set depth and base values through a hash reference while initializing. Called by new.

$bip->depth( [ $int ] )

Returns the maximum directory depth setting. The default is 0, no limit. A depth of 1 means do not index any subdirectories found. If an optional integer parameter is passed, it sets the traversal depth.
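To make the depth semantics concrete, here is a minimal stand-alone sketch of a depth-limited walk built on DirHandle and File::Spec, the two modules BIP lists as dependencies. It is not BIP's implementation: the `walk` subroutine is hypothetical, and the assumption that a depth of 2 descends exactly one level of subdirectories is an extrapolation from the documented meaning of 0 and 1.

```perl
use strict;
use warnings;
use DirHandle;
use File::Spec;

# Hypothetical helper (not part of BIP): collect the files under $dir,
# honouring a BIP-style depth. A $max_depth of 0 means "no limit";
# 1 means "the starting directory only".
sub walk {
    my ($dir, $max_depth, $level) = @_;
    $level = 1 unless defined $level;    # the starting directory is level 1
    my $dh = DirHandle->new($dir) or die "Cannot open $dir: $!";
    my @entries = $dh->read;
    $dh->close;

    my @found;
    foreach my $entry (sort @entries) {
        next if $entry eq '.' || $entry eq '..';
        my $path = File::Spec->catfile($dir, $entry);
        if (-d $path) {
            # Descend only when there is no limit or the limit is not yet reached.
            push @found, walk($path, $max_depth, $level + 1)
                if $max_depth == 0 || $level < $max_depth;
        }
        elsif (-f $path) {
            push @found, $path;
        }
    }
    return @found;
}

# Usage: list the files in the current directory only (depth 1).
print "$_\n" for walk('.', 1);
```

Note that `-d` follows symbolic links, so with no depth limit a symlink loop would recurse indefinitely; BIP's own handling of symlinks is listed under TO DO.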

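$bip->base( ['/path/name'] )

Returns the path at which BIP will begin indexing unless overridden by the path passed to index. If an optional path parameter is passed, it sets the base. This value is also used by the relative_dir and relative_name indexing methods unless they are passed an alternate path.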

$bip->stash( $key, [$value] )

A simple mechanism for setting and getting information. If the optional $value parameter is passed, it sets the value. This method is useful for handlers to manipulate BIP's state and to persist results after indexing.
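A stash of this kind is commonly just a hash kept inside the object. The following is a minimal sketch under that assumption; the package name `My::Stash` is hypothetical and this is not BIP's source. It shows the get/set behaviour described above and how a handler could accumulate a result that is still available after indexing.

```perl
use strict;
use warnings;

package My::Stash;    # hypothetical stand-in, not BIP's source

sub new { my $class = shift; return bless { stash => {} }, $class }

# Returns the value stored under $key; if a value is also passed, stores it first.
sub stash {
    my ($self, $key, @value) = @_;
    $self->{stash}{$key} = $value[0] if @value;
    return $self->{stash}{$key};
}

package main;

# A file handler might keep a running count that survives after indexing.
my $obj = My::Stash->new;
$obj->stash(count => ($obj->stash('count') || 0) + 1) for 1 .. 3;
print $obj->stash('count'), "\n";    # prints 3
```

Checking the number of arguments, rather than whether $value is defined, lets a caller store an explicit undef.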

$bip->index( ['/some/path/name'] )

Launches the traversal of a directory structure and calls handlers during operation. Providing an optional path parameter overrides any value that was set in base.

HANDLER METHODS

This group of methods is used to register callback handlers that BIP will call while indexing.

None of these handlers is strictly required, but BIP is rather worthless unless at least one handler, more specifically a file or index handler, has been set.

$bip->prerun( \&coderef )

Sets a reference to a routine that will be called when index is called, but before traversal.

$bip->postrun( \&coderef )

Sets a reference to a routine that will be called right before index returns control to its caller.

$bip->index_handler( \&coderef )

Sets a reference to a routine that will be called when a directory is encountered, but before traversing it.

$bip->file_handler( \&coderef, ext[, ext1, ext2... extn] )

Sets a reference to a routine that will be called when a file of a certain extension is encountered. You can register a handler for multiple extensions with one call or set each extension individually.

 $bip->file_handler(\&foo,'htm','html','php');
 
 # OR
 
 $bip->file_handler(\&foo, 'htm');
 $bip->file_handler(\&foo, 'html');
 $bip->file_handler(\&foo, 'php');

Giving a handler an extension of * (asterisk) will cause the handler to be run on any file encountered that does not have a handler explicitly registered for its extension.
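One way to picture the extension lookup, including the * fallback, is a hash mapping extensions to code references. This sketch illustrates that general technique under stated assumptions and is not BIP's code: `register` and `dispatch` are hypothetical names, and extensions are matched case-sensitively here, which BIP may or may not do.

```perl
use strict;
use warnings;

my %handlers;    # extension => code reference

# Register one handler for one or more extensions, in the style of file_handler.
sub register {
    my ($code, @exts) = @_;
    $handlers{$_} = $code for @exts;
    return;
}

# Choose a handler for a file name: its exact extension first, then '*'.
sub dispatch {
    my ($file) = @_;
    my ($ext) = $file =~ /\.([^.\/]+)\z/;
    my $code = (defined $ext && $handlers{$ext}) || $handlers{'*'};
    return $code ? $code->($file) : undef;    # undef when no handler applies
}

register(sub { "markup: $_[0]" }, 'htm', 'html');
register(sub { "other: $_[0]" }, '*');

print dispatch('index.html'), "\n";    # prints "markup: index.html"
print dispatch('notes.txt'), "\n";     # prints "other: notes.txt"
```

Registering the same code reference under several keys is what makes the one-call and one-call-per-extension forms equivalent.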

$bip->clear_handlers

Unregisters all handlers for all extensions including read_handlers.

EXPERIMENTAL METHODS

I've been trying out two experimental methods; I'm not yet sure whether they are valuable or implemented as they should be. They facilitate what I think is a more elegant way of reading in or parsing files after traversal.

If you were to create something just like blosxom (why you'd do that when you already have blosxom is another issue), then without these methods you might use BIP to collect the files in a path and output their contents with something like the code below, where the read and parse routines are placeholders:

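 foreach my $file (@files) {
     my ($ext) = $file =~ /\.([^.\/]+)$/;
     next unless defined $ext;
     if ($ext eq 'ext1') {
         print read_file_ext1($file)."\n";
     }
     elsif ($ext eq 'ext2') {
         print read_file_ext2($file)."\n";
     }
     elsif ($ext eq 'ext3' or $ext eq 'ext4') {
         my %data = parse_file_ext3_or_ext4($file);
         foreach my $key (keys %data) {
             print "$key: $data{$key}\n";
         }
     }
 }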

With them, you could do something like this instead (again, the read routines are placeholders):

 $bip->read_handler(\&read_file_ext1,'ext1');
 $bip->read_handler(\&read_file_ext2,'ext2');
 $bip->read_handler(\&parse_file_ext3_or_ext4,'ext3','ext4');
 
 # Then later you would just have to do
 foreach (@files) {
        print $bip->read_file($_);
 }
 

The details of these methods are as follows.

$bip->read_handler( \&coderef, ext[, ext1, ext2... extn])

Sets a handler for reading a file based on the file's extension. Like file_handler, you can register a handler routine for multiple extensions or set each individually. Passing an extension of '*' (asterisk) causes the handler to be run on any file that does not have a read_handler explicitly defined for its extension. The return type of a handler is at the discretion of the handler routine's author. It is recommended that you do not return a value of undef unless an error has occurred.

$bip->read_file( '/full/path/to/file' )

Calls the read_handler associated with the file's extension and passes the file path through to it. The return type is at the discretion of the handler routine's author. It is recommended that you do not return a value of undef unless an error has occurred.

INDEXING METHODS

The following methods are for handler functions to get the current state of BIP while processing an index call. They are only relevant during traversal.

$bip->index_depth

The current depth (levels of subdirectories) from the starting point of the index.

$bip->dir

The current directory.

$bip->relative_dir( [ '/some/other/path' ] )

The current directory relative to base or the optional parameter passed in.

$bip->file

The current file name.

$bip->ext

The current file name's extension.

$bip->name

The fully qualified path and file name of the current file.

$bip->relative_name( [ '/some/other/path' ] )

The current file's path and file name relative to base or the optional parameter passed in.

DEPENDENCIES

BIP makes use of the DirHandle and File::Spec modules, which are part of the standard Perl distribution.

SEE ALSO

http://www.blosxom.com/ - Rael Dornfest's blosxom web site, File::Find

TO DO

These are some enhancements I thought of adding. Feedback on their implementation (or on dropping them) is appreciated.

  • More explicit options for hidden (.*) files and symlinks? How?

  • Default to wildcard (*) if no extension has been specified while registering file and index handlers?

  • Deletion of a specific handler.

  • Ability to cancel the traversal of a directory from a handler.

  • A method for mapping a file system path to an HTTP document root to generate a URL.

  • More usage examples and optional utility classes.

LICENSE

The software is released under the Artistic License. The terms of the Artistic License are described at http://www.perl.com/language/misc/Artistic.html.

AUTHOR & COPYRIGHT

Except where otherwise noted, Text::BIP is Copyright 2003-4, Timothy Appnel, cpan@timaoutloud.org. All rights reserved.
