NAME
Parallel::Batch - Run a large number of similar processes using bounded parallelism
SYNOPSIS
use Parallel::Batch;
my $batch = Parallel::Batch->new({code => \&frobnicate,
jobs => [ ... ],
maxprocs => 8});
$batch->run();
DESCRIPTION
Parallel::Batch solves a common problem in making efficient use of modern multi-CPU computers: you have a large number of independent pieces of data that all need to be processed in the same way, and several of them can be processed at the same time.
There are a few trivial ways to execute a large number of jobs. You could run the entire set serially, but this will not use all the available processing power. You could also create n processes at once to run all n jobs simultaneously, but this tends to quickly exhaust other resources like memory and I/O bandwidth, making the entire run slower. Or you could divide the set into m equally-sized groups and have each of m processors run its group serially, but this will usually waste time at the end if some jobs take longer than others to finish.
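To make the trade-off concrete, the following sketch compares the total running time of three strategies on a made-up workload: running serially, splitting the jobs into equal groups up front, and handing each job to whichever worker becomes idle first. Nothing is forked here; it is only arithmetic. The job durations and the helper functions static_makespan and dynamic_makespan are hypothetical illustrations and are not part of Parallel::Batch.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Hypothetical job durations in seconds: two slow jobs followed by six fast ones.
my @durations = (8, 6, 1, 1, 1, 1, 1, 1);
my $workers   = 2;

# Static split: cut the list into equally sized consecutive groups, one per
# worker. The batch is finished when the slowest group is finished.
sub static_makespan {
    my ($m, @secs) = @_;
    my $per_group = int((@secs + $m - 1) / $m);    # group size, rounded up
    my @totals;
    while (my @group = splice(@secs, 0, $per_group)) {
        push @totals, sum(@group);
    }
    return max(@totals);
}

# Dynamic scheduling: each job goes to the worker that becomes idle first.
sub dynamic_makespan {
    my ($m, @secs) = @_;
    my @free_at = (0) x $m;    # time at which each worker next becomes idle
    for my $d (@secs) {
        my ($idx) = sort { $free_at[$a] <=> $free_at[$b] } 0 .. $#free_at;
        $free_at[$idx] += $d;
    }
    return max(@free_at);
}

printf "serial %ds, static split %ds, dynamic %ds\n",
    sum(@durations),
    static_makespan($workers, @durations),
    dynamic_makespan($workers, @durations);
# prints "serial 20s, static split 16s, dynamic 10s"
```

With this workload the static split leaves one worker idle for 12 of the 16 seconds, because both slow jobs landed in the same group.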
This module works by calling fork() to create a new process for each piece of data, invoking a user-specified function on that piece of data within the new process, and returning once all the data has been processed and all child processes have exited. It also keeps track of the number of jobs in progress, and keeps that number under a set limit by delaying new forks until existing processes terminate.
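The general technique can be sketched in plain Perl with fork() and wait(). This is a minimal illustration of bounded forking, not the module's actual implementation; the function name run_bounded and its return value are assumptions made for this example.

```perl
use strict;
use warnings;

$| = 1;    # unbuffer STDOUT so children do not inherit pending output

# Hypothetical helper, not part of Parallel::Batch: run $code on each job in
# its own child process, keeping at most $maxprocs children alive at once.
sub run_bounded {
    my ($code, $jobs, $maxprocs) = @_;
    my ($running, $peak, $ok) = (0, 0, 0);

    # Block until one child exits, then update the bookkeeping.
    my $reap_one = sub {
        my $pid = wait();
        if ($pid == -1) {    # no children left to wait for
            $running = 0;
            return;
        }
        $running--;
        $ok++ if $? == 0;    # child exited with status 0
    };

    for my $job (@$jobs) {
        # At the limit: delay the next fork until a child terminates.
        $reap_one->() while $running >= $maxprocs;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {     # child: process one job, then exit
            $code->($job);
            exit 0;
        }
        $running++;          # parent: one more job in progress
        $peak = $running if $running > $peak;
    }

    # Return only after every remaining child has exited.
    $reap_one->() while $running > 0;
    return { ok => $ok, peak => $peak };
}

# Ten short jobs, never more than three at a time.
my $result = run_bounded(sub { select(undef, undef, undef, 0.05) }, [1 .. 10], 3);
printf "%d jobs exited cleanly, at most %d ran at once\n",
    $result->{ok}, $result->{peak};
```

Because a new child is forked as soon as any earlier one exits, a slow job delays only the slot it occupies rather than a whole pre-assigned group.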
CONSTRUCTOR
new
Options:
The following options can be passed to the constructor in a hashref, or retrieved or changed later using their own accessor methods:
- code
Coderef to be run on each piece of data. It will be passed a single argument, which is an element of the jobs array.
- jobs
Array of data objects to be processed.
- maxprocs
Maximum number of child processes that should be running at any time.
- progress_cb
Hashref of progress callbacks; see PROGRESS NOTIFICATION.
METHODS
run
Start running the jobs, and return once all are completed.
PROGRESS NOTIFICATION
Parallel::Batch can report its progress through application-defined callbacks as it runs. If the progress_cb argument is a hashref containing any of the following keys, the corresponding coderefs will be called at the points described:
- start
Will be called just before any processes are spawned.
- new
Will be called after each new process has been created.
- finish
Will be called when a child process exits.
- done
Will be called after all jobs are completed and all child processes have terminated.
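As an illustration, a progress_cb hashref that keeps a running count might look like the sketch below. It assumes that the callbacks are invoked in the parent process, so that the counters are shared between them, and it ignores whatever arguments the callbacks receive, since none are documented here. The variable names and the job total are hypothetical.

```perl
use strict;
use warnings;

my $total = 20;                               # hypothetical number of jobs
my %count = (started => 0, finished => 0);    # updated by the callbacks

my $progress_cb = {
    start  => sub { print STDERR "Starting $total jobs\n" },
    new    => sub { $count{started}++ },
    finish => sub {
        $count{finished}++;
        printf STDERR "\r%d of %d jobs finished", $count{finished}, $total;
    },
    done   => sub { print STDERR "\nAll jobs completed\n" },
};

# The hashref is then passed alongside the other constructor options:
#   my $batch = Parallel::Batch->new({code => \&frobnicate,
#                                     jobs => [ ... ],
#                                     maxprocs => 8,
#                                     progress_cb => $progress_cb});
#   $batch->run();
```

Any of the four keys can be omitted; only the callbacks that are present in the hashref are called.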
AUTHOR
Stephen Cavilia, <sac@atomicradi.us>
COPYRIGHT AND LICENSE
Copyright (C) 2011 by Stephen Cavilia
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.2 or, at your option, any later version of Perl 5 you may have available.