Text::Parts - split a text file into several parts (each from the start of one line to the end of another or the same line)
If you want to split a text file into a given number of parts:
use Text::Parts;

my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(num => 4);

foreach my $part (@parts) {
    while (my $l = $part->getline) { # or <$part>
        # ...
    }
}
If you want to split a text file into parts of roughly a specified size:
use Text::Parts;

my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(size => 10); # size of part will be more than 10.
# same as the previous example
If you want to split a CSV file:
use Text::Parts;
use Text::CSV_XS; # doesn't work with Text::CSV_PP if you want to use the {binary => 1} option

my $csv = Text::CSV_XS->new();
my $splitter = Text::Parts->new(file => $file, parser => $csv);
my (@parts) = $splitter->split(num => 4);

foreach my $part (@parts) {
    while (my $col = $part->getline_parser) { # getline_parser returns parsed result
        print join "\t", @$col;
        # ...
    }
}
With Parallel::ForkManager:
use Text::Parts;
use Parallel::ForkManager;

my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(num => 4);
my $pm = Parallel::ForkManager->new(4);

foreach my $part (@parts) {
    $pm->start and next; # do the fork

    while (my $l = $part->getline) {
        # ...
    }

    $pm->finish; # end the child process so it does not continue the loop
}

$pm->wait_all_children;
This module splits a file into the specified number of parts. Each part ranges from the start of one line to the end of another (or the same) line. For example, suppose the file content is the following:
1111
22222222222222222222
3333
4444
Calling $splitter->split(num => 3) splits it as follows:
1st part:
1111
22222222222222222222

2nd part:
3333

3rd part:
4444
First, the split method tries to cut at (file size / 3) bytes. It then tries to cut the rest at (remaining file size / number of remaining parts) bytes. Each cut is extended to the end of the line if needed, so:
1st part : 36 bytes / 3 = 12 bytes + bytes to line end (if needed)
2nd part : (36 - 26 bytes) / 2 = 5 bytes + bytes to line end (if needed)
last part: rest of the file
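The calculation above can be sketched in plain Perl. This is only an illustration of the idea, not the module's actual code: part_ranges is a hypothetical helper, it assumes "\n" line endings, and it works on an in-memory copy of the example file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Parts): returns [start, end) byte
# offsets for up to $num parts, each ending at a line end.
sub part_ranges {
    my ($fh, $file_size, $num) = @_;
    my @ranges;
    my $start = 0;
    while ($num > 0 && $start < $file_size) {
        my $end;
        if ($num == 1) {
            # the last part takes whatever is left
            $end = $file_size;
        } else {
            # rest of the file divided by the number of remaining parts
            my $chunk = int(($file_size - $start) / $num) || 1;
            # step back one byte and read to the next line end
            seek $fh, $start + $chunk - 1, 0 or die "seek failed: $!";
            my $discard = <$fh>;
            $end = tell $fh;
        }
        push @ranges, [$start, $end];
        $start = $end;
        $num--;
    }
    return @ranges;
}

my $content = "1111\n22222222222222222222\n3333\n4444\n";
open my $fh, '<', \$content or die "cannot open in-memory file: $!";
my @ranges = part_ranges($fh, length $content, 3);
printf "part %d: start %d, end %d\n", $_ + 1, @{ $ranges[$_] } for 0 .. $#ranges;
```

For the 36-byte example file this yields the ranges 0 to 26, 26 to 31 and 31 to 36, which correspond to the three parts listed above.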
$s = Text::Parts->new(file => $filename);
$s = Text::Parts->new(file => $filename, parser => Text::CSV_XS->new({binary => 1}));
Constructor. It can take the following options:
num: the number of parts you want to split the file into.
size: the file size you want each part to have. This value is used for calculating num. If the file size is 100 and this value is 25, num is 4.
file: the target file which you want to split.
parser: a parser object (like Text::CSV_XS->new()). The object must have a method that takes a filehandle; by default this method is expected to be named getline. If the object's method has a different name, pass that name with the parser_method option.
parser_method: the name of the parser's method. The default is getline.
If this option is true, the line start is checked and the position is moved there before <$fh> or the parser's getline/parser_method is called. This may be useful when the parser's getline/parser_method does not work correctly on wrongly formatted input.
default value is 0.
my $file = $s->file;
$s->file($filename);
get/set target file.
my $parser_object = $s->parser;
$s->parser($parser_object);
get/set parser object.
my $method = $s->parser_method;
$s->parser_method($method);
get/set parser method.
my @parts = $s->split(num => $num);
my @parts = $s->split(size => $size);
Tries to split the target file into $num parts. If you pass size => bytes, $num is calculated from file size / $size.
This returns an array of Text::Parts::Part objects. See "Text::Parts::Part METHODS".
This method doesn't actually split the file; it only calculates the start and end positions of the parts.
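Since only positions are calculated, a part can be read later by seeking to its start and reading lines until its end position is reached. The following is a rough sketch of that idea under stated assumptions, not the module's implementation: lines_in_range is a hypothetical helper, and the offsets 26 and 31 are those of the second part of the num => 3 example earlier in this document.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): read the lines lying between
# byte offsets $start (inclusive) and $end (exclusive).
sub lines_in_range {
    my ($fh, $start, $end) = @_;
    seek $fh, $start, 0 or die "seek failed: $!";
    my @lines;
    while (tell($fh) < $end) {
        my $line = <$fh>;
        last unless defined $line;   # stop at end of file
        push @lines, $line;
    }
    return @lines;
}

my $content = "1111\n22222222222222222222\n3333\n4444\n";
open my $fh, '<', \$content or die "cannot open in-memory file: $!";
my @second = lines_in_range($fh, 26, 31);
print @second;
```

Because every range ends at a line end, several readers can each open the same file and process a different range without overlapping.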
my $eol = $s->eol;
$s->eol($eol);
get/set end of line string. default value is $/.
@filenames = $part->write_files(name_format => 'path/to/name%d.txt', num => 4);
name_format is the format of the file names. %d is replaced by the part number. For example:
path/to/name1.txt
path/to/name2.txt
path/to/name3.txt
path/to/name4.txt
The remaining arguments are the same as for split.
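The naming scheme can be reproduced with Perl's sprintf. This small sketch assumes that the format is expanded sprintf-style and that numbering starts at 1, as the listing above suggests; it does not call the module and writes no files.

```perl
use strict;
use warnings;

# Assumption: %d is filled in with the part number, counting from 1.
my $name_format = 'path/to/name%d.txt';
my @filenames   = map { sprintf $name_format, $_ } 1 .. 4;

print "$_\n" for @filenames;
```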
Text::Parts::Part objects are returned by the split method.
my $line = $part->getline;
Returns one line. You can also use <$part>:
my $line = <$part>
my $parsed = $part->getline_parser;
Returns the parsed result.
my $all = $part->all;
$part->all(\$all);
Returns the whole part. It simply reads from the start position to the end position.
If a scalar reference is passed as an argument, the content of the part is stored in the referenced scalar.
$part->eof;
Returns true if the current position is the end of the part.
$part->write_file($filename);
Write the contents of the part to $filename.
Ktat, <ktat at cpan.org>
Please report any bugs or feature requests to bug-text-parts at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Parts. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Text::Parts
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Text-Parts
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Text-Parts
CPAN Ratings
http://cpanratings.perl.org/d/Text-Parts
Search CPAN
http://search.cpan.org/dist/Text-Parts/
Copyright 2011 Ktat.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.