The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

File::Collector - Base class for custom File::Collector classes for classifying files and calling File::Collector::Processor methods for processing files

VERSION

version 0.037

OVERVIEW

File::Collector and its companion module File::Collector::Processor are base classes designed to make it easier to create custom modules for classifying and processing a collection of files as well as generating and processing data related to files in the collection.

For example, let's say you need to import raw files from one directory into some kind of repository. Let's say that files in the directory need to be filtered and the content of the files needs to be parsed, validated, rendered and/or changed before getting imported. Complicating things further, let's say that the name and location of the file in the target repository is dependent upon the content of the files in some way and that you also have to check to make sure the file hasn't already been imported into the repository.

This kind of task can be acomplished with a series of one-off scripts that process and import your files in stages. Each script produces output suitable for the next one. But running separate scripts for each processing stage can be slow, tedious, error-prone and a headache to maintain and organize.

The File::Collector and File::Collector::Processor base modules make it trivial to chain file processing modules into one logical package to make complicated file processing more robust, testable, and simpler to code.

SYNOPSIS

There are three steps to using File::Collector. First, create at least one Collector class for classifying and filtering files. Next, create a Processor class your Collector class will use to process the classified files. Finally, write a script to construct a new File::Collector object to collect and process your files.

Step 1: Create the Collector classes

  package File::Collector::YourCollector;
  use strict; use warnings;

  # You Collector must use Role::Tiny or you will get an error
  use Role::Tiny

  # Here we add in the package containing the processing methods associated
  # with the Collector (see below).
  use File::Collector::YourCollector::Processor;

  # Objects can store information about the files in the collection which
  # can be accessed by other Collector and Processor classes.
  use SomeObject;

  # Add categories for file collections with the _init_processors method. These
  # categories are used as labels for Processor objects which contain
  # information about the files and can run methods on them. In the example
  # below, we add two file collection categories, "good" and "bad."
  sub _init_processors {
    return qw ( good bad );
  }

  # Next we add a _classify_file method that is called once for each file added
  # to our Collector object. The primary job of this method is to add files and
  # any associated objects to a Processor for further processing.
  sub _classify_file {
    my $s = shift;

    # First, we create an object and associate it with our file using the
    # _add_obj method. There is no requirement that you create objects but they
    # will make data about your file easily available to other classes.
    # Offloading as much logic as possible to objects will keep classes simple.

    # Note how we pass the name of the current file being processed to the
    # object by using the "selected" method which intelligently generates the
    # full path to the file currently being processed by _classify_file. Also
    # note that we don't have to bother passing the name of the file to _add_obj
    # method since this method can figure out which file is being processed by
    # calling the "selected" method as well.
    my $data = SomeObject->new( $s->selected );
    $s->_add_obj('data', $data);

    # Now that we know something about our file, we can classify the files
    # according to any criteria of our choosing.
    # to a processor category
    if ( $data->{has_good_property} ) {
      $s->_classify('good');
    } else {
      $s->_classify('bad');
    }
  }

  # Finally, the _run_processes method contains method calls to your
  # Processor methods.
  sub _run_processes {
    my $s = shift;

    # Below are the methods we can run on the files in our collection. The
    # "good_files" method returns the collection of files classified as "good"
    # and the "do" method is a method that automatically iterates over the
    # files. The "modify" method is one of the methods in our Processor class
    # (see below).
    $s->good_files->do->modify;

    # Run methods on files classified as "bad"
    $s->bad_files->do->fix;
    $s->bad_files->do->modify;

    # You can call methods found in any of the earlier Processor classes in your
    # file processing chain.
    $s->good_files->do->import;
    $s->bad_files->do->import;
  }

Step 2: Create your Processor classes.

  # The Processor class must have the same package name as the Collector class
  # but with "::Processor" tacked on to the end.
  package File::Collector::YourCollector::Processor;

  # This line is required to get access to the methods from the base class.
  use parent 'File::Collector::Processor';

  # This custom method is run once for each file in a collection when we use
  # the "do" method.
  sub modify {
    my $s = shift;

    # Skip the file if it has already been processed.
    next if ($s->attr_defined ( 'data', 'processed' ));

    # Properties of objects added by Collector classes can be easily accessed.
    my @values = $s->get_obj_prop ( 'data', 'header_values' );

    # You can call methods found inside objects, too. Here we run the
    # add_header() method on the data object and pass on values to it.
    $s->obj_meth ( 'data', 'add_header', \@values );
  }

  # We can add as many additional custom methods as we need.
  sub fix {
    ...
  }

Step 3: Construct the Collector

Once your classes have been created, you can run all of your collectors and processors simply by constructing a Collector object.

The constructor takes three types of arguments: a list of the files and/or directories you want to collect; an array of the names of the Collector classes you wish to use in the order you wish to employ them; and finally, an option hash, which is optional.

   my $collector = File::Collector::YourClassifier->new(
     # The first arguments are a list of resources to be added
     'my/dir', 'a_file.txt',

     # The next argument is an array of Collector class names listed in the same
     # order you want them to run
     [ 'File::Collector::First', 'File::Collector::YourCollector'],

     # Finally, an optional hash argument for options can be supplied
     { recurse => 0 });

   # The C<$collector> object has some useful methods:
   $collector->get_count;  # returns the total number of files in the collection

   # Convenience methods with a little under-the-hood magic make it painless to
   # iterate over files and run methods on them.
   while ($collector->next_good_file) {
     $collector->print_short_name;
   }

   # Iterators can be easily created from C<Processor> objects:
   my $iterator = $s->get_good_files;
   while ( $iterator->next ) {
     # run C<Processor> methods and do other stuff to "good" files
     $iterator->modify_file;
   }

DESCRIPTION

Collector Methods

The methods can be run on Collector objects after they've been constructed.

new( $dir, $file, ..., [ @custom_collector_classes ], \%opts )

new( $dir, $file, ..., [ @custom_collector_classes ] )

new( $dir, $file, ..., )

  my $collector = File::Collector->new( 'my/directory',
                                        [ 'Custom::Classifier' ]
                                        { recurse => 0 } );

Creates a Collector object that collects files from the directories and files in the argument list. Once collected, the files are processed by the @custom_collector_classes in the order supplied by the array argument. Each of your Collector classes must use Role::Tiny or you will receive an error.

An option hash can be supplied to turn directory recursion off by setting recurse to false.

new returns an object which contains all the files, their processing classes, and any data you have associated with the files. This object has serveral methods that can be used to inspect the object.

add_resources( $dir, $file, ... )

  $collector->add_resources( 'myfile1.txt', '/my/home/dir/files/', ... );

Adds additional file resources to an existing collection and processes them. This method accepts no option hash and the same one supplied to the new constructor is used.

get_count()

  $collector->get_count;

Returns the total number of files in the collection.

get_files()

  my @all_files = $collector->get_files;

Returns a list of the full path of each file in the collection.

get_file( $file_path )

  my $file = $collector->get_file( '/full/path/to/file.txt' );

Returns a reference of the data and objects associated with a file.

list_files_long()

Prints the full path names of each file in the collection, sorted alphabetically, to STDOUT.

list_files()

Same as list_files_long but prints the files' paths relative to the top level directory shared by all the files in the collections.

next_FILE_CATEGORY_file()

  while ($collector->next_good_file) {
    my $file = $collector->selected;
    ...
  }

Retrieves the first file from the the collection of files indicated by FILE_CATEGORY. Each subsequent next call iterates over the list of files. Returns a boolean false when the file is exhausted. Provides an easy way to iterate over files and perform operations on them.

FILE_CATEGORY must be a valid processor name as supplied by one of the _init_processors method.

FILE_CATEGORY_files()

  my $processor = $collector->good_files;

Returns the File::Processor object for the category indicated by FILE_CATEGROY.

get_FILE_CATEGORY_files()

Similar to FILE_CATEGORY_files() except a shallow clone of the File::Processor object is returned. Useful if you require separate iterators for files in the same category.

FILE_CATEGORY must be a valid processor name as supplied by one of the _init_processors method.

isa_FILE_CATEGORY_file()

Returns a boolean value reflecting whether the file being iterated over belongs to a category.

Private Methods

The following private methods are used in a child classes of File::Collector you provide. See the SYNOPSIS for examples of these methods in use.

_init_processors()

  sub _init_processors {
    return qw ( 'category_1', 'category_2' );
  }

Creates new file categories. Internally, this method adds a new Processor object to the Collector for each category added so that Processor methods from custom Processor classes can be run on individual categories of files.

_classify_file()

  sub _classify_file {
    my $s = shift;

    # File classifying and analysis logic goes here
  }

Use this method to classify files and to associate objects with your files using the methods provided by the Collector class. This method is run once for each file in the collection.

_run_processes()

  sub _run_processes {
    my $s = shift;

    # Processor method calls go here
  }

In this method, you should place various calls to Processor methods.

_classify( $category_name )

This method is typically called from within the _classify_file method. It adds the file currently getting pocessed to a collection of $category_name files contained within a Processor object which, in turn, belongs to the Collector object. The $category_name must match one of the processor names provided by the _init_processor methods.

_add_obj( $object_name, $object )

Like the _classify method, this method is typically called from within the _classify_file method. It associates the object specified by $object to an arbitrary name, specified by $object_name, with the file currently getting processed.

Iteration Methods

These methods can be used while iterating over a collection of files. See the SYNOPSIS for some examples of these methods in use.

get_obj_prop( $obj_name, $property_name )

Returns the contents of an object's property.

get_obj( $obj_name )

Returns an object associated with a file.

set_obj_prop( $obj_name, $property_name, $value )

Sets an object's property.

obj_meth( $obj_name, $method_name, $method_args );

Runs the $method_name method on the object specified in $obj_name. Arguments are passed via $method_args.

get_filename()

Retrieves the name of file being processed without the path.

selected()

Returns the full path and file name of the file being processed.

has_obj( $obj_name );

Returns a boolean value reflecting the existence of the obj in $obj_name.

attr_defined( $obj_name, $attr_name )

Returns a boolean value reflecting if the atrribute specified by $attr_name is defined in the $obj_name object.

Returns a shortened path, relative to all the files in the entire collection, and the file name of the file being processed or in an iterator.

REQUIRES

SUPPORT

Perldoc

You can find documentation for this module with the perldoc command.

  perldoc File::Collector

Websites

The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.

Source Code

The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)

https://github.com/sdondley/File-Collector

  git clone git://github.com/sdondley/File-Collector.git

BUGS AND LIMITATIONS

You can make new bug reports, and view existing ones, through the web interface at https://github.com/sdondley/File-Collector/issues.

INSTALLATION

See perlmodinstall for information and options on installing Perl modules.

SEE ALSO

File::Collector::Processor

AUTHOR

Steve Dondley <s@dondley.com>

Special Thanks

Thanks to all the generous monks at the PerlMonks community for patiently answering my (sometimes asinine) questions. A very special mention goes to jcb whose advice was invaluable to improving the quality of this module. Another shout out to Hippo for the suggesting Role::Tiny to help make the module more flexible.

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by Steve Dondley.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.