NAME
Iterator::File -- A file iterator, optionally stateful and verbose.
SYNOPSIS
    use Iterator::File;

    ## Simplest form...
    $i = iterator_file( 'mydata.txt' );
    while ( $i++ ) {
        &something_interesting( $i );
    }

    ## Disable auto-chomp, emit status, and allow us to resume if ^C...
    $i = iterator_file( 'mydata.txt',
                        'chomp'  => 0,
                        'status' => 1,
                        'resume' => 1,
    );
    while ( $i++ ) {
        &something_interesting( $i );
    }

    ## OO style...
    $i = iterator_file( 'mydata.txt' );
    while ( $i->next() ) {
        &something_interesting( $i->value() );
    }
DESCRIPTION
Iterator::File is an attempt to take some of the repetition and tedium out of processing a flat file. Whenever I did so, I found myself adapting prior scripts so that processes could be resumed, emit status, etc. Hence an itch (and this module) was born.
FUNCTIONS
- iterator_file($file, %config)
  Returns an Iterator::File object. See the "%config options" section below for additional information on options.
METHODS
- new(%config)
  The constructor returns a new Iterator::File object, handling argument defaults and validation, and automatically invoking initialize().
- initialize()
  Executes all startup work required before iteration, e.g., opening resources, detecting whether a prior process terminated early and resuming, etc.
- next(), '++'
  Increment the iterator and return the new value.
- value(), string context
  Return the current value, without advancing.
- advance_to( $location )
  Advance the iterator to $location. If $location is behind the current location, behavior is undefined. (I.e., don't do that.)
- finish()
  Automatically invoked when the complete list has been processed. If the process dies before the last item of the list, this method is intentionally not invoked.
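The contract of next(), value(), and advance_to() can be illustrated with a small stand-in class. This is only a sketch: Sketch::LineIter is a hypothetical name, it reads from an in-memory filehandle, and it is not how Iterator::File itself is implemented. It also leaves out operator overloading, resume, and status.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the Iterator::File implementation.
package Sketch::LineIter;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, value => undef, line => 0 }, $class;
}

# Advance one line and return the new value (undef at end of input).
sub next {
    my ($self) = @_;
    my $line = readline( $self->{fh} );
    if ( defined $line ) {
        chomp $line;
        $self->{line}++;
    }
    return $self->{value} = $line;
}

# Return the current value without advancing.
sub value {
    my ($self) = @_;
    return $self->{value};
}

# Skip forward until the iterator sits on line $location (1-based).
# Moving backwards is not supported, matching the caveat above.
sub advance_to {
    my ($self, $location) = @_;
    while ( $self->{line} < $location ) {
        last unless defined $self->next();
    }
    return $self->{value};
}

package main;

my $data = "alpha\nbeta\ngamma\ndelta\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $it = Sketch::LineIter->new($fh);
$it->next();                     # now on "alpha"
$it->advance_to(3);              # now on "gamma"

my $current = $it->value();      # does not advance
print "$current\n";              # prints "gamma"

my $following = $it->next();     # advances to line 4
print "$following\n";            # prints "delta"
```

Note that value() can be called any number of times without consuming input, while each next() call moves the position forward by one line.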
%config options
General
- chomp
  Automatically chomp each line. Default: enabled.
- verbose
  Enable verbose messaging for things such as temporary files. Default: disabled.
  Note: for status messages, see Status below.
- debug
  Enable debugging messages. It can also be enabled by setting the environment variable ITERATOR_FILE_DEBUG to something true (to avoid modifying code to enable it). Default: disabled.
Resume
- resume
  If enabled, Iterator::File will keep track of which lines you've seen, even between invocations. That way, if your program unexpectedly dies (e.g., via a bug or ^C), you can pick up where you left off just by running your program again. Default: disabled.
- repeat_on_resume
  If enabled, Iterator::File will err on the side of giving you the same line twice between invocations. E.g., if your program were to be restarted after dying on the 100th line, repeat_on_resume would give you the 100th line on the 2nd invocation (versus the 101st). Default: disabled.
- update_frequency
  How often to update state. For very large data sets with light individual processing requirements, it may be worth setting to something other than 1. Default: 1.
- state_class
  Options: Iterator::File::State::TempFile and Iterator::File::State::IPCShareable. TempFile is the default and in a lot of cases should be good enough. If you have philosophical objections to a frequently changing value living on disk (or a really, really slow disk), you can use shared memory via IPC::Shareable.
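The general idea behind resume and update_frequency can be sketched with a plain state file that records the last completed line number. Everything below is an assumption made for illustration: load_state, save_state, process_file, and the $stop_after "simulated crash" parameter are hypothetical names, and the real state classes may store and update their state differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helpers; Iterator::File's real state classes may differ.

# Read the last recorded line number from the state file (0 if none).
sub load_state {
    my ($state_file) = @_;
    return 0 unless -e $state_file;
    open my $in, '<', $state_file or die "Cannot read $state_file: $!";
    my $n = <$in>;
    close $in;
    if ( defined $n && $n =~ /^(\d+)/ ) {
        return $1;
    }
    return 0;
}

# Overwrite the state file with the given line number.
sub save_state {
    my ($state_file, $line_no) = @_;
    open my $out, '>', $state_file or die "Cannot write $state_file: $!";
    print {$out} "$line_no\n";
    close $out or die "Cannot close $state_file: $!";
    return;
}

# Process $data_file, skipping lines recorded as done by a prior run.
# $stop_after simulates a crash after that many lines (undef = run to end).
sub process_file {
    my ($data_file, $state_file, $update_frequency, $stop_after) = @_;
    my $done    = load_state($state_file);
    my $line_no = 0;
    my @seen;

    open my $in, '<', $data_file or die "Cannot read $data_file: $!";
    while ( my $line = <$in> ) {
        $line_no++;
        next if $line_no <= $done;          # handled in a prior run
        chomp $line;
        push @seen, $line;                  # stand-in for real work
        save_state($state_file, $line_no) if $line_no % $update_frequency == 0;
        if ( defined $stop_after && @seen >= $stop_after ) {
            close $in;
            return @seen;                   # "crash": state file is kept
        }
    }
    close $in;
    unlink $state_file;                     # clean finish, nothing to resume
    return @seen;
}

my $dir        = tempdir( CLEANUP => 1 );
my $data_file  = "$dir/mydata.txt";
my $state_file = "$dir/mydata.state";

open my $out, '>', $data_file or die "Cannot write $data_file: $!";
print {$out} "row$_\n" for 1 .. 5;
close $out or die "Cannot close $data_file: $!";

my @first  = process_file($data_file, $state_file, 1, 2);   # stops after row2
my @second = process_file($data_file, $state_file, 1);      # resumes at row3

print "first run:  @first\n";     # first run:  row1 row2
print "second run: @second\n";    # second run: row3 row4 row5
```

In this sketch, an update frequency greater than 1 means fewer writes but also that up to that many lines may be processed again after an interrupted run, which is the trade-off the update_frequency option describes.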
Status
- status_method
  What algorithm to use to display status. Options are emit_status_logarithmic, emit_status_fixed_line_interval, and emit_status_fixed_time_interval.
  emit_status_logarithmic will display status logarithmically, i.e., at rows 1, 2, 3 ... 9, 10, 20, 30 ... 90, 100, 200, 300 ... 900, 1000, 2000, etc.
  emit_status_fixed_line_interval will display status every X lines, where X is defined by status_line_interval.
  emit_status_fixed_time_interval will display status every X seconds, where X is defined by status_time_interval.
  Default: emit_status_logarithmic.
- status_line_interval
  If status_method is emit_status_fixed_line_interval, controls how frequently to display status. Default: 10 (lines).
- status_time_interval
  If status_method is emit_status_fixed_time_interval, controls how frequently to display status. Default: 2 (seconds).
- status_filehandle
  Filehandle to use for printing status. Default: STDERR.
- status_line
  Format of status line. Default: "Processing row '%d'...\n".
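The logarithmic schedule described under status_method (1, 2, 3 ... 9, 10, 20 ... 90, 100, 200, ...) amounts to reporting only on row numbers made of one non-zero digit followed by zeros. A minimal sketch of that rule follows; is_logarithmic_step is a hypothetical helper written for this example, not a function exported by Iterator::File, and the module may compute the schedule differently.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $n is a single non-zero digit followed
# only by zeros (1..9, 10, 20..90, 100, 200, ...).
sub is_logarithmic_step {
    my ($n) = @_;
    return $n =~ /\A[1-9]0*\z/ ? 1 : 0;
}

# Default status_line format and filehandle, as documented above.
my $format = "Processing row '%d'...\n";

for my $row ( 1 .. 250 ) {
    # Status goes to STDERR so it does not mix with regular output.
    printf {*STDERR} $format, $row if is_logarithmic_step($row);
}
```

The appeal of this schedule is that the number of status lines grows with the number of digits in the row count rather than with the row count itself, so very large files stay readable on the terminal.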
BUGS & CAVEATS
Do not call chop or chomp on the iterator!! Unfortunately, doing so destroys your object and leaves you with a plain ol' string. :(
SEE ALSO
Iterator::File
AUTHOR
William Reardon, <wdr1@pobox.com>
COPYRIGHT AND LICENSE
Copyright (C) 2008 by William Reardon
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.