NAME

Dir::Split - Split files of a directory to subdirectories

SYNOPSIS

 use Dir::Split qw(split_dir);

 $return = split_dir(
     mode    =>    'num',

     source  =>    '/source',
     target  =>    '/target',

     options => {  verbose      =>        1,
                   override     =>        0,
     },
     sub_dir => {  identifier   =>    'sub',
                   file_limit   =>        2,
                   file_sort    =>      '+',
     },
     suffix  => {  separator    =>      '-',
                   continue     =>        1,
                   length       =>        5,
     },
 ); 

DESCRIPTION

Dir::Split moves files to either numbered or characteristic subdirectories.

numeric splitting

Numeric splitting gathers files from a source directory and splits them into numbered subdirectories within a target directory. Its purpose is to automate the archiving of a large number of files that are likely to be indexed by numbers.

characteristic splitting

Characteristic splitting indexes files by the leading characters of their filenames. While numeric splitting divides files by count, characteristic splitting tries to keep the data recognisable by its filenames.

FUNCTIONS

split_dir

Splits files to subdirectories.

 $return = split_dir(
     mode    =>    'num',

     source  =>    '/source',
     target  =>    '/target',

     options => {  verbose      =>         1,
                   override     =>         0,
     },
     sub_dir => {  identifier   =>     'sub',
                   file_limit   =>         2,
                   file_sort    =>       '+',
     },
     suffix  => {  separator    =>       '-',
                   continue     =>         1,
                   length       =>         5,
     },
 ); 

Always check the return value. Ignoring it means missing both harmless debug data (such as files that already exist in the target) and system operations that failed. split_dir() itself only prints the verbose output of mkpath to STDOUT. See OPTIONS / debug for how to detect existing files and failed system operations (copy & unlink).

RETURN VALUES

(1)

Files moved successfully.

(0)

No action.

(-1)

EXISTS.

(see OPTIONS / debug)

(-2)

FAILURE.

(see OPTIONS / debug)

OPTIONS

numeric

Split files to subdirectories with a numeric suffix.

 %options = (  
     mode    =>    'num',

     source  =>    '/source',
     target  =>    '/target',

     options => {  verbose     =>         1,
                   override    =>         0,
     },
     sub_dir => {  identifier  =>     'sub',
                   file_limit  =>         2,
                   file_sort   =>       '+',
     },
     suffix  => {  separator   =>       '-',
                   continue    =>         1,
                   length      =>         5,
     },
 );

options (mandatory)

  • mode

    num for numeric.

  • source

    source directory.

  • target

    target directory.

  • options / verbose

    If enabled, mkpath will print the paths of the subdirectories it creates.

     MODES
       1  enabled
       0  disabled
  • options / override

    whether existing files are overwritten.

     MODES
       1  enabled
       0  disabled
  • sub_dir / identifier

    prefix of each subdirectory created.

  • sub_dir / file_limit

    limit of files per subdirectory.

  • sub_dir / file_sort

    sort order of files.

     MODES
       +  ascending
       -  descending
  • suffix / separator

    suffix separator.

  • suffix / continue

    numbering continuation.

     MODES
       1  enabled
       0  disabled    (will start at 1)

    If numbering continuation is enabled and numeric subdirectories matching the given identifier and separator are found within the target directory, the suffix numbering will be continued. Disabling numbering continuation may cause conflicts with existing files.

  • suffix / length

    character length of the suffix.

    This option has no effect if it is smaller than the current length of the highest suffix number.
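How identifier, separator, file_limit, file_sort and length interact can be sketched in plain Perl. The helper plan_numeric_split and its start argument are hypothetical and not part of Dir::Split; the sketch only plans subdirectory names in memory and assumes zero-padded suffixes, as shown in the EXAMPLES section.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Dir::Split: plans numeric subdirectories
# for a list of filenames without touching the filesystem.
sub plan_numeric_split {
    my (%opt) = @_;

    # file_sort: '+' ascending, '-' descending
    my @files = sort @{ $opt{files} };
    @files = reverse @files if $opt{file_sort} eq '-';

    my %plan;
    # Assumption: with 'continue' enabled, numbering would start after
    # the highest suffix already present in the target.
    my $number = $opt{start} // 1;

    # Take at most file_limit files per subdirectory.
    while (my @chunk = splice @files, 0, $opt{file_limit}) {
        # Zero-pad to at least 'length' characters; longer numbers are kept whole.
        my $suffix = sprintf '%0*d', $opt{length}, $number++;
        $plan{ $opt{identifier} . $opt{separator} . $suffix } = [@chunk];
    }
    return \%plan;
}

my $plan = plan_numeric_split(
    files      => [qw(mnop _123 ijkl abcd efgh)],
    identifier => 'system',
    separator  => '-',
    file_limit => 2,
    file_sort  => '+',
    length     => 5,
);

print "$_: @{ $plan->{$_} }\n" for sort keys %$plan;
```

With five files and a file_limit of 2, the plan contains three subdirectories, the last holding a single file.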

characteristic

Split files to subdirectories with a characteristic suffix. Files are assigned to subdirectories whose suffixes equal the leading character(s) of their filenames.

 %options = (  
     mode    =>    'char',

     source  =>    '/source',
     target  =>    '/target',

     options => {  verbose     =>         1,
                   override    =>         0,
     },
     sub_dir => {  identifier  =>     'sub',
     },
     suffix  => {  separator   =>       '-',
                   case        =>   'upper',
                   length      =>         1,
     },
 );

options (mandatory)

  • mode

    char for characteristic.

  • source

    source directory.

  • target

    target directory.

  • options / verbose

    If enabled, mkpath will print the paths of the subdirectories it creates.

     MODES
       1  enabled
       0  disabled
  • options / override

    whether existing files are overwritten.

     MODES
       1  enabled
       0  disabled
  • sub_dir / identifier

    prefix of each subdirectory created.

  • suffix / separator

    suffix separator.

  • suffix / case

    lower/upper case of the suffix.

     MODES
       lower
       upper
  • suffix / length

    character length of the suffix.

    A length < 4 is highly recommended (26 (alphabet) ^ 3 == 17'576 suffix possibilities). Dir::Split will not prevent suffix lengths greater than 3. Imagine splitting 1'000 files with a character length > 20: the ratio of files to subdirectories will almost certainly approach 1/1, which results in 1'000 subdirectories.

    Whitespace in suffixes will be removed.
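The derivation of a characteristic suffix, and the growth in possible suffixes with length, can be illustrated as follows. The helper char_suffix is hypothetical and not part of Dir::Split; the order of operations (take the leading characters, then strip whitespace, then apply the case) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Dir::Split.
sub char_suffix {
    my ($filename, $length, $case) = @_;

    my $suffix = substr $filename, 0, $length;   # leading character(s)
    $suffix =~ s/\s+//g;                         # whitespace is removed
    return $case eq 'upper' ? uc($suffix) : lc($suffix);
}

# Subdirectory name = identifier . separator . suffix
printf "%-6s -> sub-%s\n", $_, char_suffix($_, 1, 'upper') for qw(_123 abcd efgh);

# Possible purely alphabetic suffixes per length: 26 ** length
printf "length %d: %d possibilities\n", $_, 26 ** $_ for 1 .. 3;
```

Already at length 3 there are more possible suffixes than most directories have files, which is why short lengths are preferable.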

tracking

%Dir::Split::track keeps count of how many files the source contains and how many directories / files the target contains. It is useful when the number of files that could not be transferred due to existing ones has to be counted. The track is reset each time a new split is attempted.

 %Dir::Split::track = (  
     source  =>    {  files  =>    512  
     },
     target  =>    {  dirs   =>    128,
                      files  =>    512,
     },
 );

Above example: a directory consisting of 512 files was successfully split into 128 directories.
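Counting files that were not transferred might look like the sketch below. It uses sample data in the documented shape instead of the real %Dir::Split::track, and it assumes that the target file count reflects only files transferred in the current run; the helper not_transferred is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Dir::Split.
# Assumption: target files counts the files transferred in this run.
sub not_transferred {
    my ($track) = @_;
    return $track->{source}{files} - $track->{target}{files};
}

# Sample data; in real use, pass \%Dir::Split::track after split_dir() returns.
my %track = (
    source => { files => 512 },
    target => { dirs  => 128, files => 500 },
);

printf "%d of %d files were not transferred\n",
    not_transferred(\%track), $track{source}{files};
```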

debug

existing

If split_dir() returns EXISTS, this implies that the override option is disabled and files weren't moved due to existing files within the target subdirectories; their paths will appear in @Dir::Split::exists.

 file    @Dir::Split::exists    # existing files, not attempted to
                                # be overwritten.

failures

If split_dir() returns FAILURE, this most often implies that the override option is enabled and existing files could not be overwritten. Files that could not be copied / unlinked will have their paths listed under the corresponding keys in %Dir::Split::failure.

 file    @{$Dir::Split::failure{copy}}      # files that couldn't be copied,
                                            # most often on overriding failures.

         @{$Dir::Split::failure{unlink}}    # files that could be copied but not unlinked,
                                            # rather seldom.

It is recommended to evaluate those arrays on FAILURE.

A @Dir::Split::exists array may coexist.
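One way to evaluate the return value together with the debug data is sketched below. The helper report_split is hypothetical and not part of Dir::Split; it takes the return value and references to the debug structures, so it can be demonstrated here with sample data.

```perl
use strict;
use warnings;

# Return values as documented under RETURN VALUES.
my %status = (
     1 => 'files moved successfully',
     0 => 'no action',
    -1 => 'EXISTS',
    -2 => 'FAILURE',
);

# Hypothetical helper, not part of Dir::Split.
sub report_split {
    my ($return, $exists, $failure) = @_;

    my @lines = ($status{$return} // "unknown return value $return");

    # Files skipped because they already exist in the target.
    push @lines, map { "exists: $_" } @$exists;

    # Files that could not be copied or unlinked.
    for my $op (qw(copy unlink)) {
        push @lines, map { "$op failed: $_" } @{ $failure->{$op} || [] };
    }
    return @lines;
}

# Real use (requires Dir::Split):
#   my $return = split_dir(%options);
#   print "$_\n" for report_split($return, \@Dir::Split::exists, \%Dir::Split::failure);

# Demonstration with sample data:
print "$_\n" for report_split(
    -2,
    ['/target/sub-00001/abcd'],
    { copy => ['/source/efgh'] },
);
```

Because existing files may be recorded alongside failures, the sketch reports both lists regardless of the return value.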

traversing

Traversal processing of files within the source directory cannot be activated by passing an argument; it requires the following variable to be set to true:

 # traversal mode
 $Dir::Split::Traverse = 1;

There is no depth limit, i.e. all underlying directories / files will be evaluated.

options

 # unlink files in source
 $Dir::Split::Traverse_unlink = 1;

Unlinks files after they have been moved to their new locations.

 # remove directories in source
 $Dir::Split::Traverse_rmdir = 1;

Removes the directories after the files have been moved. To take effect, this option requires $Dir::Split::Traverse_unlink to be set.

It is not recommended to turn on the latter options $Dir::Split::Traverse_unlink and $Dir::Split::Traverse_rmdir, unless you're aware of the consequences they imply.

EXAMPLES

Assuming /source contains 5 files:

 +- _123
 +- abcd
 +- efgh
 +- ijkl
 +- mnop

After splitting, the directory tree in /target will look as follows (here with the identifier system):

numeric splitting

 +- system-00001
 +-- _123
 +-- abcd
 +- system-00002
 +-- efgh
 +-- ijkl
 +- system-00003
 +-- mnop

characteristic splitting

 +- system-_
 +-- _123
 +- system-a
 +-- abcd
 +- system-e
 +-- efgh
 +- system-i
 +-- ijkl
 +- system-m
 +-- mnop

EXPORT

split_dir() is exportable.

SEE ALSO

File::Basename, File::Copy, File::Find, File::Path, File::Spec
