
NAME

File::Split

SYNOPSIS

 Splits files.

 my $fs = File::Split->new({keepSource=>'1'});

 my $files_out = $fs->split_file({'parts' => 10},'filepath');

 Creates ten files named 'filepath.1','filepath.2',...,'filepath.10'.



DESCRIPTION

By default, File::Split removes the source file once it has been split. Pass keepSource to the constructor to keep the original:

 my $fs = File::Split->new({keepSource=>'1'});

Split the file into ten equal-sized parts named filepath.1, filepath.2, ..., filepath.10:

 my $files_out = $fs->split_file({'parts' => 10},'filepath');
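
To make the 'parts' behaviour concrete, here is a minimal sketch of a fixed-count split written in core Perl. It is not File::Split's implementation: the helper name split_into_parts is hypothetical, and it assumes that "equal-sized" means an equal number of lines per part (the module may divide by bytes instead). Like the module, it always produces the requested number of files and returns undef for a missing source file; unlike the module's default, it leaves the source file in place.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of File::Split: split $path into exactly
# $parts files named "$path.1" .. "$path.$parts", dividing by line count.
sub split_into_parts {
    my ($path, $parts) = @_;
    return unless -e $path;    # undef in scalar context for a missing file

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @lines = <$in>;
    close $in;

    # Ceiling division, so the lines are spread over at most $parts files.
    my $per_part = int((@lines + $parts - 1) / $parts);

    my @out;
    for my $n (1 .. $parts) {
        my $name = "$path.$n";
        open my $fh, '>', $name or die "Cannot write $name: $!";
        # splice removes and returns the next chunk; it returns nothing
        # once @lines is exhausted, which leaves the remaining parts empty.
        print {$fh} splice(@lines, 0, $per_part);
        close $fh or die "Cannot close $name: $!";
        push @out, $name;
    }
    return \@out;
}
```

This sketch reads the whole file into memory, which keeps the arithmetic simple but is only reasonable for files that fit comfortably in RAM.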

Split the file into parts of at most 1000 lines each:

 my $files_out = $fs->split_file({'lines' => 1000},'filepath');
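
The line-count mode can be pictured as a streaming loop that starts a new output file every N lines. The following is a rough sketch in core Perl, not the module's own code; the helper name split_by_lines and the numeric ".1", ".2", ... suffixes are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of File::Split: write consecutive chunks of
# at most $max_lines lines to "$path.1", "$path.2", ... and return the names.
sub split_by_lines {
    my ($path, $max_lines) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";

    my (@out, $fh);
    my $count = 0;
    while (my $line = <$in>) {
        # Open the next part before writing line 1, N+1, 2N+1, ...
        if ($count++ % $max_lines == 0) {
            close $fh if $fh;
            my $name = $path . '.' . (@out + 1);
            open $fh, '>', $name or die "Cannot write $name: $!";
            push @out, $name;
        }
        print {$fh} $line;
    }
    close $fh if $fh;
    close $in;
    return \@out;    # an empty source yields an empty list in this sketch
}
```

Because it holds only one line in memory at a time, this approach works for files of any size.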

Split the file into sections based on a substring value. This produces filepath.MB and filepath.SK:

 my $files_out = $fs->split_file({'substr'=>{pos=>'10000',val=>['MB','SK']}},'filepath');

Split the file using a hash that maps each output suffix to an array of regular expressions. This produces the files filepath.BC, filepath.AB, ...

 my $files_out = $fs->split_file({'grep'=>{
                                    'BC'=>['\t(V\d\C\d\C\d)\t'],
                                    'AB'=>['\t(T\d\C\d\C\d)\t'],
                                    'SK'=>['\t(S\d\C\d\C\d)\t'],
                                    'MB'=>['\t(R\d\C\d\C\d)\t'],
                                    'ON'=>['\t(P\d\C\d\C\d)\t','\t(N\d\C\d\C\d)\t','\t(M\d\C\d\C\d)\t','\t(L\d\C\d\C\d)\t','\t(K\d\C\d\C\d)\t'],
                                    'QC'=>['\t(G\d\C\d\C\d)\t','\t(H\d\C\d\C\d)\t','\t(J\d\C\d\C\d)\t','\t(K\d\C\d\C\d)\t','\t(S\d\C\d\C\d)\t'],
                                    'NS'=>['\t(B\d\C\d\C\d)\t'],
                                    'NB'=>['\t(E\d\C\d\C\d)\t'],
                                    'PE'=>['\t(C\d\C\d\C\d)\t'],
                                    'NL'=>['\t(A\d\C\d\C\d)\t'],
                                    'NT'=>['\t(X\d\C\d\C\d)\t'],
                                    'NU'=>[],
                                    'YT'=>['\t(Y\d\C\d\C\d)\t'],
                                        }
                                },'dat/zip411Bus040710.TXT');

Split the file on an array of regular expressions. Filename extensions are taken from the matched value:

 $files_out = $fs->split_file({'grep'=>['\t(MB)\t','\t(SK)\t','\t(NB)\t','\t(NL)\t','\t(NT)\t','\t(NS)\t','\t(YT)\t','\t(PE)\t','\t(NU)\t','\t(BC)\t','\t(ON)\t','\t(AB)\t','\t(QC)\t']},'dat/zip411Bus041013.TXT');
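
The idea of naming each output file after the matched value can be sketched as follows. This is an illustration in core Perl rather than File::Split's actual code: the helper name split_by_capture is hypothetical, each pattern is assumed to contain one capture group, and the rule that the first matching pattern wins (with unmatched lines dropped) is an assumption as well.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of File::Split: send each line of $path to
# "$path.<capture>", where <capture> is the text matched by the first
# capture group of the first pattern that matches the line.
sub split_by_capture {
    my ($path, @patterns) = @_;
    my @regexes = map { qr/$_/ } @patterns;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my %fh_for;    # suffix => output filehandle, opened on first use
    while (my $line = <$in>) {
        for my $re (@regexes) {
            next unless $line =~ $re && defined $1;
            my $suffix = $1;
            unless ($fh_for{$suffix}) {
                open my $out, '>', "$path.$suffix"
                    or die "Cannot write $path.$suffix: $!";
                $fh_for{$suffix} = $out;
            }
            print { $fh_for{$suffix} } $line;
            last;    # assumption: first matching pattern wins
        }
    }
    close $in;
    close $_ for values %fh_for;
    return [ map { "$path.$_" } sort keys %fh_for ];
}
```

Called as split_by_capture('filepath', '\t(MB)\t', '\t(SK)\t'), the sketch would write lines containing a tab-delimited MB field to filepath.MB and those containing SK to filepath.SK.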

Merge all files that match 'filepath_for_reconstructed_file*':

 my $out_name = $fs->merge_file('filepath_for_reconstructed_file');
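
When numbered parts are merged, the order in which they are concatenated matters: a plain string sort places filepath.10 before filepath.2. The sketch below shows one way to reassemble numbered parts in numeric order using core Perl. It is not the module's merge_file: the helper name merge_parts is hypothetical, it only considers files with a purely numeric suffix (a narrower match than 'filepath*'), and writing the result to the prefix path itself is an assumption.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper, not part of File::Split: concatenate "$prefix.1",
# "$prefix.2", ... into $prefix and return the output name, or undef if
# no numbered parts exist.
sub merge_parts {
    my ($prefix) = @_;
    my ($base, $dir) = fileparse($prefix);    # $dir ends with a slash

    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    # Keep only "<base>.<digits>" entries and order them by the number,
    # so that part 10 follows part 9 rather than part 1.
    my @parts =
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { /\A\Q$base\E\.(\d+)\z/ ? [ $1, "$dir$_" ] : () }
        readdir $dh;
    closedir $dh;
    return unless @parts;

    open my $out, '>', $prefix or die "Cannot write $prefix: $!";
    for my $part (@parts) {
        open my $in, '<', $part or die "Cannot read $part: $!";
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close $in;
    }
    close $out or die "Cannot close $prefix: $!";
    return $prefix;
}
```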

 

CAVEATS

This module isn't fully mature, and its interfaces may change.

File::Split will create empty files if you split an empty file. If you request five parts, you will receive five parts.

File::Split will return undef if you try to split a nonexistent file.
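
Because a missing source file yields undef rather than an exception, callers may want to check the return value before using it. One possible guard is sketched below; the wrapper name split_or_die is hypothetical and not part of File::Split, and it relies only on the split_file call and the undef behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not part of File::Split: turn the documented undef
# return into an exception so that a failed split cannot go unnoticed.
# $fs is any object with a split_file($options, $path) method.
sub split_or_die {
    my ($fs, $opts, $path) = @_;
    die "No such file: $path\n" unless -e $path;

    my $files_out = $fs->split_file($opts, $path);
    die "split_file returned undef for $path\n" unless defined $files_out;
    return $files_out;
}
```

With a File::Split object in $fs, a call such as split_or_die($fs, {'parts' => 5}, 'filepath') would either return what split_file returned or die with a message naming the file.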

AUTHOR

Phil Middleton