NAME

File::Repl - Perl module that provides file replication utilities

SYNOPSIS

  use File::Repl;

  %con = (
    dira     => 'C:/perl',
    dirb     => 'M:/perl',
    verbose  => '1',
    age      => '10',
  );

  $ref=File::Repl->New(\%con);
  $r1 = $ref->Update('\.p(l|m)','a<>b',1);
  $r2 = $ref->Update('\.t.*','\.tmp$','a<>b',1);

DESCRIPTION

The File::Repl module provides simple file replication and management utilities. Its main functions are:

File Replication

Allowing two directory structures to be maintained, ensuring files that meet selection logic criteria are mirrored and otherwise synchronized.

Bulk Renaming

Allowing files in a directory structure to be renamed according to the selection logic.

Compressing

Allowing files in a directory structure to be compressed according to the selection logic.

Process

Run a common perl process against files in a directory structure according to selection logic.

Deletion

Allowing files in a directory structure to be deleted according to the selection logic.

METHODS

New(%con)

The New method constructs a new File::Repl object. Options are passed as a hash reference, \%con, which defines the directories to be operated on and other parameters. The directories are scanned and each file is stat'ed. The hash keys have the following definitions:

dira

This identifies the first directory to be scanned (required).

dirb

This identifies the second directory to be scanned (required). If only methods that operate on a single directory are to be called on the object, dirb can be set to the same value as dira. This minimizes the directory structure to be searched.

verbose

The verbose flag has several valid values:

verbose = 0

No verbosity (default mode).

verbose = 1

All file copies and deletes are printed.

verbose = 2

Tombstone file truncations and any timestamp changes made are printed. Any file copies or deletes that would have been made but failed the age limit criteria are also printed.

verbose = 3

Configuration settings (from %con) and files meeting the match criteria are printed.

verbose = 4

Files identified in each directory that match the regex requirements (from the Update method) are printed.

age

This specifies the maximum age of a file in days. Files older than this will be ignored by Update, Rename, Compress and Delete methods.

If the age is specified as a negative number files newer than this age will be ignored.

A default value of zero causes no age limit to be tested - all files are accepted on age limits.

recurse

When set to FALSE only files at the top level of dira and dirb are scanned. Default value is TRUE.

ttl

This is the time to live (ttl) for any tombstoned file, in days. Default value is 31.

nocase

Switches for case sensitivity - default is TRUE (case insensitive).

mkdirs

If set TRUE, an attempt is made to create dira or dirb when either directory does not exist. Default value is FALSE.
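The age rules described above can be illustrated with a small stand-alone sketch. The helper age_ok is hypothetical and not part of File::Repl; it assumes that age is measured from the file's mtime and that a file exactly at the limit is accepted, neither of which is stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Repl): returns true when a file
# with modification time $mtime passes the documented age rule.
sub age_ok {
    my ($age, $mtime, $now) = @_;
    return 1 if $age == 0;                      # zero: no age limit is tested
    my $days_old = ($now - $mtime) / 86_400;    # file age in days
    if ($age > 0) {
        return $days_old <= $age ? 1 : 0;       # positive: older files are ignored
    }
    return $days_old >= -$age ? 1 : 0;          # negative: newer files are ignored
}

my $now           = time;
my $five_days_ago = $now - 5 * 86_400;

# A five day old file passes age => 10 but is ignored with age => -10.
printf "age=10:  %s\n", age_ok( 10, $five_days_ago, $now) ? 'accepted' : 'ignored';
printf "age=-10: %s\n", age_ok(-10, $five_days_ago, $now) ? 'accepted' : 'ignored';
```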

Update(regex, [noregex,] action, commit)

The Update method makes the file updates requested - determined by the %con hash (from the New method) and four associated arguments.

This method also allows files to be tombstoned (ie removed from the replicated file sets). A file is tombstoned by appending .remove to the file name. The first Update will cause the file to be set to zero size, and any replica files to be renamed (so that the original file does not return). The next update after the ttl has expired will cause deletion of all file replicas.

If a directory is tombstoned (by adding .remove to its name) the directory and contents are removed and a file with the directory name and the .remove suffix replaces it. The file is removed as a normally tombstoned file. Note that tombstoning ignores the Update action qualifiers.

The Update method returns a reference to data structures evaluated during the method call. This is based on the method arguments, and allows arrays and hashes of the file structure meeting the selection criteria to be returned. See "EXAMPLES". Note that the aonly, bonly, amatch and bmatch array references, and the common hash reference, all refer to the file structure state BEFORE the Update method makes any changes.

regex

A regular expression, used to match all file names that are to be maintained.

noregex

An optional regular expression used to match all files not to be maintained (ie excluded from the operation).

action

Defines the action to be performed. Note that tombstoning activities ignore the action and assume the A<>B directive for those files and directories being tombstoned.

a>b

Files in the 'a' directory are to be replicated to the 'b' directory if a replica exists in 'b' directory and the timestamp is older than that of the file in the 'a' directory.

a<b

Files in the 'b' directory are to be replicated to the 'a' directory if a replica exists in 'a' directory and the timestamp is older than that of the file in the 'b' directory.

a<>b

Files in the 'a' directory are to be replicated to the 'b' directory if a replica exists in 'b' directory and the timestamp is older than that of the file in the 'a' directory. Files in the 'b' directory are to be replicated to the 'a' directory if a replica exists in 'a' directory and the timestamp is older than that of the file in the 'b' directory.

A>B

Files in the 'a' directory are to be replicated to the 'b' directory - even if no replica exists in 'b' directory. If a replica already exists in the 'b' directory with a timestamp that is newer than that of the file in the 'a' directory it is not modified.

A>B!

Files in the 'a' directory are to be replicated to the 'b' directory - even if no replica exists in 'b' directory. If a replica already exists in the 'b' directory with a timestamp that is newer than that of the file in the 'a' directory it is not modified. Orphan files in the 'b' directory are deleted.

A<B

Files in the 'b' directory are to be replicated to the 'a' directory - even if no replica exists in 'a' directory. If a replica already exists in the 'a' directory with a timestamp that is newer than that of the file in the 'b' directory it is not modified.

A<B!

Files in the 'b' directory are to be replicated to the 'a' directory - even if no replica exists in 'a' directory. If a replica already exists in the 'a' directory with a timestamp that is newer than that of the file in the 'b' directory it is not modified. Orphan files in the 'a' directory are deleted.

A<>B

Files in the 'a' directory are to be replicated to the 'b' directory - even if no replica exists in 'b' directory. If a replica already exists in the 'b' directory with a timestamp that is newer than that of the file in the 'a' directory it is not modified. Files in the 'b' directory are to be replicated to the 'a' directory - even if no replica exists in 'a' directory. If a replica already exists in the 'a' directory with a timestamp that is newer than that of the file in the 'b' directory it is not modified.

commit

When set TRUE the required changes are made. Set FALSE to show potential changes (which are printed to STDOUT).
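The action codes described above can be summarised as a decision function. This is only a sketch of the documented rules, not the module's code: plan_action and its return strings are hypothetical, an undef mtime stands for a file missing on that side, and tombstoning, the age limit and the FAT timestamp fudge are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of File::Repl. $a_mtime and $b_mtime are the
# modification times of the same file in dira and dirb, or undef when the
# file does not exist on that side.
sub plan_action {
    my ($action, $a_mtime, $b_mtime) = @_;
    my $to_b   = $action =~ />/  ? 1 : 0;   # replication a -> b requested
    my $to_a   = $action =~ /</  ? 1 : 0;   # replication b -> a requested
    my $create = $action =~ /^A/ ? 1 : 0;   # upper case: copy even with no replica
    my $purge  = $action =~ /!$/ ? 1 : 0;   # trailing "!": delete orphans

    if (defined $a_mtime && defined $b_mtime) {
        return 'copy a to b' if $to_b && $a_mtime > $b_mtime;
        return 'copy b to a' if $to_a && $b_mtime > $a_mtime;
        return 'none';
    }
    if (defined $a_mtime) {                 # file exists only in dira
        return 'copy a to b' if $to_b && $create;
        return 'delete a'    if $to_a && $purge;
        return 'none';
    }
    if (defined $b_mtime) {                 # file exists only in dirb
        return 'copy b to a' if $to_a && $create;
        return 'delete b'    if $to_b && $purge;
    }
    return 'none';
}

for my $case ( [ 'a>b', 100, undef ], [ 'A>B', 100, undef ],
               [ 'A>B!', undef, 100 ], [ 'a<>b', 100, 200 ] ) {
    my ($action, $am, $bm) = @$case;
    printf "%-5s a=%-4s b=%-4s => %s\n",
        $action, $am // 'none', $bm // 'none', plan_action($action, $am, $bm);
}
```

Calling Update with commit set FALSE remains the reliable way to see what the module itself would do for a given action.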

Rename(regex, [noregex], namesub, commit)

The Rename method renames files in the dira directory structure of the object created by the New method.

regex

A regular expression, used to match all file names that are to be renamed.

noregex

An optional regular expression used to match all files not to be renamed (ie excluded from the operation).

namesub

The argument of a Perl substitution command. It is applied to the file name to create the file's new name.

e.g. /\.pl$/\.perl/

This example will rename all files (that meet the regex and noregex criteria) from .pl to .perl.

commit

When set TRUE the required renames are made. Set FALSE to show potential changes (which are printed to STDOUT).
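To check what a namesub value would do before passing it to Rename, the substitution can be tried on a plain string. The sketch below assumes namesub is the text that follows the s of a Perl substitution, as in the example above; preview_rename is a hypothetical helper, and the way File::Repl applies namesub internally is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Repl): apply a namesub string such
# as '/\.pl$/\.perl/' to a file name and return the resulting name.
# A string eval is used, so only pass namesub values that you trust.
sub preview_rename {
    my ($name, $namesub) = @_;
    my $new = $name;
    eval "\$new =~ s$namesub; 1"
        or die "Invalid namesub '$namesub': $@";
    return $new;
}

for my $file (qw(script.pl notes.txt)) {
    printf "%s -> %s\n", $file, preview_rename($file, '/\.pl$/\.perl/');
}
```

A name that the pattern does not match is returned unchanged.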

Process

Not yet implemented

Compress

Not yet implemented

Delete(regex, [noregex], commit)

The Delete method removes files from the dira directory structure of the object created by the New method.

regex

A regular expression, used to match all file names that are to be deleted.

noregex

An optional regular expression used to match all files not to be deleted (ie excluded from the operation).

commit

When set TRUE the required deletions are made. Set FALSE to show potential changes (which are printed to STDOUT).
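Update, Rename and Delete all select files with the same regex and noregex pair. The selection can be pictured with the sketch below, which uses the patterns from the SYNOPSIS. The helper select_files is hypothetical; whether the module matches against the full path or only the file name, and whether nocase affects these patterns, are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Repl): keep the names that match
# $regex and do not match the optional $noregex.
sub select_files {
    my ($files, $regex, $noregex, $nocase) = @_;
    my $match   = $nocase ? qr/$regex/i : qr/$regex/;
    my $exclude = !defined $noregex ? undef
                : $nocase           ? qr/$noregex/i
                :                     qr/$noregex/;
    return grep {
        $_ =~ $match && !( defined $exclude && $_ =~ $exclude )
    } @$files;
}

my @names  = qw(foo.txt bar.tmp baz.pl NOTES.TXT);
my @picked = select_files(\@names, '\.t.*', '\.tmp$', 1);
print "@picked\n";    # prints: foo.txt NOTES.TXT
```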

Version

The Version method returns the File::Repl module version. No calling argument is necessary.

REQUIRED MODULES

  File::Find;
  File::Copy;
  File::Basename;

  Win32::API       (Win32 platforms only)

TIMEZONE AND FILESYSTEMS

On FAT filesystems, mtime resolution is 1/30th of a second. A fudge of 2 seconds is used for synching FAT with other filesystems. Note that FAT filesystems save the local time in UTC (GMT).

On FAT filesystems, "stat" adds TZ_BIAS to the actual file times (atime, ctime and mtime) and conversely "utime" subtracts TZ_BIAS from the supplied parameters before setting file times. To maintain FAT at UTC time, we need to do the opposite.

If we don't maintain FAT filesystems at UTC time and the repl is between FAT and NON-FAT systems, then all files will get replicated whenever the TZ or Daylight Savings Time changes.
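The 2 second fudge mentioned above amounts to treating two timestamps as equal unless they differ by more than the tolerance. A minimal sketch follows; is_newer and FAT_FUDGE are hypothetical names, and the exact comparison File::Repl performs may differ.

```perl
use strict;
use warnings;

use constant FAT_FUDGE => 2;    # tolerance in seconds, as described above

# Hypothetical helper (not part of File::Repl): true only when $mtime1 is
# later than $mtime2 by more than the fudge.
sub is_newer {
    my ($mtime1, $mtime2, $fudge) = @_;
    $fudge //= FAT_FUDGE;
    return ($mtime1 - $mtime2) > $fudge ? 1 : 0;
}

my $ntfs_mtime = 1_000_000_001;    # odd second kept on a non-FAT filesystem
my $fat_mtime  = 1_000_000_000;    # same file after a copy to FAT

print is_newer($ntfs_mtime, $fat_mtime)      ? "copy\n" : "in sync\n";
print is_newer($ntfs_mtime + 60, $fat_mtime) ? "copy\n" : "in sync\n";
```

Without the tolerance, the first comparison would report a change after every copy to a FAT volume.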

EXAMPLES

A simple example that retrieves and prints the working variables from the Update method

  $ref=File::Repl->New(\%hash);
  $my=$ref->Update('.*','A>B',1);

  $sub = sub {  # simple sub that determines the reference type and prints the associated values
    my ($ref) =$_[0];
    if ( ref($ref) eq "SCALAR" ) {
      print "  SCALAR $$ref\n";
    }elsif( ref($ref) eq "ARRAY" ) {
      print "  ARRAY";
      foreach (@$ref) {
        print "\t$_\n";
      }
    }elsif( ref($ref) eq "HASH" ) {
      print "  HASH ";
      foreach (keys %$ref) {
        print "\t$_ => $$ref{$_}\n";
      }
    }elsif( ref($ref) eq "REF" ) {
      &$sub($$ref);
    }else{
      print "  VALUE\t$ref\n";
    }
    print "\n";
  };
  foreach my $key (sort keys %$my) {
    print "$key:\n";
    &$sub($$my{$key});
  }

and a sample output

  References and values of $my
  amatch:
    ARRAY /a/b/c/d/e/dummy.c
          /a/b
          /a/b/c/d/e/bar.pl
          /a/b/c/d/e/ABCDE.XYZ
          /a
          /a/b/c/d/e/foo.tst
          /a/b/c/d
          /a/b/c/d/e
          /a/b/c

  aonly:
    ARRAY /a/b/c/d/e/foo.tst
          /a/b/c/d/e/dummy.c
          /a/b/c/d/e/ABCDE.XYZ

  bmatch:
    ARRAY /a/b
          /a/b/c/d/e/bar.pl
          /a
          /a/b/c/d
          /a/b/c/d/e
          /a/b/c

  bonly:
    ARRAY
  common:
    HASH  /a/b => /a/b
          /a/b/c/d/e/bar.pl => /a/b/c/d/e/bar.pl
          /a => /a
          /a/b/c/d => /a/b/c/d
          /a/b/c/d/e => /a/b/c/d/e
          /a/b/c => /a/b/c

The amatch and bmatch array references are those files and directories in the dira and dirb structures that met the regex and noregex regular expression criteria. The aonly and bonly array references give those files and directories that exist only in that directory structure.

The common hash reference identifies those files and directories that exist in both dira and dirb directory structures. The key is for the dira, and value for dirb. Note that, depending on the nocase value the key and value may show differences in case on FAT and NTFS file systems.
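The relationship between the returned structures can be reproduced from two plain lists. The sketch below is an illustration only: compare_lists is a hypothetical helper, the file names are made up, and it assumes that nocase pairs names by folding them to lower case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Repl): derive aonly, bonly and
# common from the lists of names matched in dira and dirb.
sub compare_lists {
    my ($amatch, $bmatch, $nocase) = @_;
    my $key = $nocase ? sub { lc $_[0] } : sub { $_[0] };

    my %b_by_key;
    $b_by_key{ $key->($_) } = $_ for @$bmatch;

    my (%common, @aonly);
    for my $file (@$amatch) {
        my $k = $key->($file);
        if (exists $b_by_key{$k}) {
            $common{$file} = delete $b_by_key{$k};   # dira name => dirb name
        }
        else {
            push @aonly, $file;
        }
    }
    my @bonly = sort values %b_by_key;               # names never paired with dira
    return { aonly => \@aonly, bonly => \@bonly, common => \%common };
}

my $result = compare_lists(
    [qw(/a/b/bar.pl /a/b/foo.tst)],
    [qw(/a/b/Bar.pl /a/b/extra.c)],
    1,
);
print "aonly:  @{ $result->{aonly} }\n";
print "bonly:  @{ $result->{bonly} }\n";
print "common: $_ => $result->{common}{$_}\n" for sort keys %{ $result->{common} };
```

With nocase set, /a/b/bar.pl and /a/b/Bar.pl are paired in common even though the key and value differ in case.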

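A similar approach could be used to determine the referenced data from $ref. This would give access to the alist (blist) and atype (btype) hashes and the scalar settings described at the end of this section.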

The following script can be called from a Windows Explorer prompt as an alternative to the Windows delete function (the Windows delete action might potentially be reversed by a replication). Obviously this will only function for files and directories that are regularly synchronised using this module.

     use strict;
     use warnings;
     my(@files) = @ARGV;
     END {print "\nDONE -- PRESS ENTER\n";<STDIN>};
     print << "End_Of_Header";
     ================================================================================
     
     Executing $0\n
     
     This will mark files and/or directories for removal by the File::Repl file 
     synchronisation utility.
     
     Files will be set to zero size when first processed by the File::Repl module, 
     and finally removed after the tombstone period is expired.
     
     To reverse this process simply remove the added .remove file extension immediately
     
     ================================================================================
     
     
     End_Of_Header
     
     foreach my $file (@files){
       print "$file\n";
       if ($file =~ m/\.remove$/){
         print "\t-is already marked for removal\n";
       }elsif (-f $file){
         unless (rename "$file","$file.remove"){
           print "Unable to rename $file to $file.remove\n";
         }else{
           print "\t-marked for removal\n";
         }
       }elsif (-d $file){
         unless (rename "$file","$file.remove"){
           print "Unable to rename $file to $file.remove\n";
         }else{
           print "\t-marked for removal\n";
         }
       }else{
         print "File $file not found !!\n";
       }
     }

alist (blist)

A hash of file names (the key) and values (mtime) of all files in the dira (or dirb) structure.

atype (btype)

A hash of file names (the key) and values (file mode - from a stat operation) of all files in the dira (or dirb) structure.

In addition the scalar values of various settings determined when the New method is called can be determined.

AUTHOR

Dave Roberts <droberts@cpan.org>

ACKNOWLEDGMENTS

Thanks to Nigel Hodgson for his many contributions in developing this utility and help in understanding file system specifics.

SUPPORT

You can send bug reports and suggestions for improvements on this module to me at droberts@cpan.org. However, I can't promise to offer any other support for this package.

COPYRIGHT

This module is Copyright © 2000 to 2010 Dave Roberts. All rights reserved.

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself. This script is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. The copyright holder of this script can not be held liable for any general, special, incidental or consequential damages arising out of the use of the script.

CHANGE HISTORY

$Log: Repl.pm $ Revision 2.3 2015/11/03 18:07:45 Dave.Roberts interim release - corrections to help manage situations when a file and directory are compared

Revision 2.2 2015/11/01 22:36:38 Dave.Roberts removed Win32::Admin requirement from documentation

Revision 2.1 2015/07/15 20:51:29 Dave.Roberts added timestamp info for reading directories

Revision 2.0 2015/07/15 20:00:30 dave New major version, now with Win32::AdminMisc dependency removed as this module becomes more difficult to acquire and build for recent Perl releases

Revision 1.1 2015/07/15 19:58:06 Dave.Roberts Initial revision

Revision 1.31 2014/01/25 21:27:59 Dave.Roberts as advised from CPAN testing modified to include =encoding utf8 and escape the < and > characters in the pod with E<lt> and E<gt> respectively.

Revision 1.29 2010/05/04 15:02:05 Dave.Roberts corrected documentation - layout near Update method was incorrect

Revision 1.28 2010/04/27 14:55:00 Dave.Roberts minor code improvements in output messages for the Delete method

Revision 1.27 2010/04/13 08:36:52 Dave.Roberts added functionality for testing negative ages. This allows files older than the age specified to be selected (excluding all files younger)

Revision 1.26 2010/04/12 16:29:57 Dave.Roberts added Version method to return the File::Repl version corrected silly mistake in documentation - in definition of %con hash

Revision 1.25 2010/04/12 16:04:54 Dave.Roberts added example script for tombstoning removed windows linefeed characters from file

Revision 1.24 2010/04/07 02:00:11 Dave.Roberts modified code to remove the use of a hash as a reference - this was generating warnings as this use of a hash has been deprecated.

Revision 1.21 2002/02/07 10:37:39 Dave.Roberts corrected mode identified for Update method (the check used previously was invalid), and also synopsis for use of Update method (args incorrectly ordered)

Revision 1.20 2002/01/09 12:51:17 Dave.Roberts corrected errors in tombstoning of directories - subs $del and $deltree in particular

Revision 1.19 2001/11/21 21:28:19 Dave.Roberts resolved error in determining file age, especially when the 'a' file is missing evaluated the current time at start (set $runtime), and then removed many "time" calls

Revision 1.18 2001/08/22 07:10:41 Dave.Roberts logic change so that we don't use the Win32::API on win9x machines

Revision 1.17 2001/08/03 09:38:29 Dave.Roberts corrected code error (lines 572/3) where $$ was incorrectly used in truncation code

Revision 1.16 2001/08/02 22:09:02 Dave.Roberts corrected code for the Rename routine

Revision 1.15 2001/07/17 21:05:43 Dave.Roberts small changes to _arraysort - simplifying code

Revision 1.14 2001/07/12 21:51:50 jj768 additional documentation - and minor code changes

Revision 1.13 2001/07/12 15:18:43 Dave.Roberts code tidy up and reorganisation fixed logic errors (A>B! mode in Update method was not copying new files from A to B), also for A<B! removed several local variables and used referred object directly

Revision 1.12 2001/07/11 10:30:16 Dave.Roberts resolved various errors introduced in 1.11 - mainly associated with reference errors; rehacked fc subroutine - to give more logical messages; still in need of more documentation - esp of object reference returned and associated variables

Revision 1.11 2001/07/06 14:52:53 jj768 double referencing of blessed object removed (from New method) and subsequent methods updated. Requires Testing. Update and other methods now return reference to data arrays and hashs evaluated during method call

Revision 1.10 2001/07/06 08:23:48 Dave.Roberts code changes to allow the volume info to be detected correctly using Win32::AdminMisc when a drive letter is specified (was only working with UNC names)

Revision 1.9 2001/06/27 13:35:53 Dave.Roberts minor presentation changes

Revision 1.8 2001/06/27 12:59:22 jj768 logic to prevent "Use of uninitialized value in pattern match (m//)" errors on use of $vol{FileSystemName}

Revision 1.6 2001/06/21 12:32:15 jj768

*** empty log message ***

Revision 1.5 2001/06/20 20:39:21 Dave.Roberts minor header changes

Revision 1.4 2001/06/20 19:55:21 jj768 re-built module source files as per perlmodnew manpage

Revision 1.1 2001/06/20 19:53:03 Dave.Roberts Initial revision

Revision 1.3.5.0 2001/06/19 10:34:11 jj768 Revised calling of the New method to use a hash reference, rather than a hash directly

Revision 1.3.4.0 2001/06/19 09:48:38 jj768 intermediate development revision. Introduced Delete method and the _generic subroutine (used for all methods except New) this is preparatory to the hash being passed as a reference

Revision 1.3.3.0 2001/06/14 15:42:48 jj768 minor code changes in constructing hash and improvement in documentation -still need more docs on Timezones.