NAME

Thread::Pool - group of threads for performing similar jobs

SYNOPSIS

 use Thread::Pool;
 $pool = Thread::Pool->new(
  {
   autoshutdown => 1, # default: 1 = yes
   workers => 5,      # default: 1
   pre => sub {shift; print "starting worker with @_\n"},
   do => sub {shift; print "doing job for @_\n"; reverse @_},
   post => sub {shift; print "stopping worker with @_\n"},
   stream => sub {shift; print "streamline with @_\n"},
   monitor => sub { print "monitor with @_\n"},
  },
  qw(a b c)           # parameters to "pre" subroutine
 );

 $pool->job( qw(d e f) );              # not interested in result

 $jobid = $pool->job( qw(g h i) );
 @result = $pool->result( $jobid );    # wait for result to be ready
 print "Result is @result\n";

 $jobid = $pool->job( qw(j k l) );
 @result = $pool->result_dontwait( $jobid ); # do _not_ wait for result
 print "Result is @result\n";          # may be empty when not ready yet

 @result = $pool->waitfor( qw(m n o) ); # submit and wait for result
 print "Result is @result\n";

 $pool->add;           # add worker(s)
 $pool->remove;        # remove worker(s)
 $pool->workers( 10 ); # set number of workers
 $pool->join;          # wait for all removed worker threads to finish

 $workers = $pool->workers; 
 $todo    = $pool->todo;
 $removed = $pool->removed;
 print "$workers workers, $todo jobs todo, $removed workers removed\n";

 $pool->autoshutdown( 1 ); # shutdown when object is destroyed
 $pool->shutdown;          # wait until all jobs done
 $pool->abort;             # finish current job and remove all workers

 $done    = $pool->done;   # simple thread-use statistics
 $notused = $pool->notused;

 Thread::Pool->remove_me;  # inside "do" only

DESCRIPTION

                    *** A note of CAUTION ***

 This module only functions on Perl versions 5.8.0-RC3 and later.
 And then only when threads are enabled with -Dusethreads.  It is
 of no use with any version of Perl before 5.8.0-RC3 or without
 threads enabled.

                    *************************

Thread::Pool allows you to set up a group of (worker) threads to execute a (large) number of similar jobs that need to be executed asynchronously. The routine that actually performs the job (the "do" routine) must be specified as a name or a reference to an (anonymous) subroutine.

Once a pool is created, jobs can be executed at will and will be assigned to the next available worker. If the result of the job is important, a job ID is issued. The job ID can then later be used to obtain the result.

Initialization parameters can be passed during the creation of the Thread::Pool object. The initialization ("pre") routine can be specified as a name or as a reference to an (anonymous) subroutine. The "pre" routine can, for example, be used to create a connection to an external source using a non-threadsafe library.

When a worker is told to finish, the "post" routine is executed if available.

Results of jobs must be obtained separately, unless a "stream" or a "monitor" routine is specified. In that case the result of each job will be passed to the "stream" or "monitor" routine in the order in which the jobs were submitted.

Unless told otherwise, all jobs that are assigned will be executed before the pool is allowed to be destroyed. If a "stream" or "monitor" routine is specified, then all results will be handled by that routine before the pool is allowed to be destroyed.

CLASS METHODS

The following class methods are available.

new

 $pool = Thread::Pool->new(
  {
   do => sub { print "doing with @_\n" },        # must have
   pre => sub { print "starting with @_\n" },    # default: none
   post => sub { print "stopping with @_\n" },   # default: none

   workers => 5,      # default: 1
   autoshutdown => 1, # default: 1 = yes

   stream => sub { print "streamline with @_\n" }, # default: none
   monitor => sub { print "monitor with @_\n" },   # default: none
  },

  qw(a b c)           # parameters to "pre" routine

 );

The "new" method returns the Thread::Pool object.

The first input parameter is a reference to a hash that should at least contain the "do" key with a subroutine reference.

The other input parameters are optional. If specified, they are passed to the "pre" subroutine whenever a new worker is added.

Each time a worker thread is added, the "pre" subroutine (if available) will be called inside the thread. Each time a worker thread is removed, the "post" routine is called. Its return value(s) are saved only if a job ID was requested when removing the thread. Then the result method can be called to obtain the results of the "post" subroutine.

The following field must be specified in the hash reference:

do
 do => 'do_the_job',            # assume caller's namespace

or:

 do => 'Package::do_the_job',

or:

 do => \&SomeOther::do_the_job,

or:

 do => sub {print "anonymous sub doing the job\n"},

The "do" field specifies the subroutine to be executed for each job. It must be specified as either the name of a subroutine or as a reference to an (anonymous) subroutine.

The specified subroutine should expect the following parameters to be passed:

 1..N  any parameters that were passed with the call to job.

Any values that are returned by this subroutine after finishing each job are accessible with the result method, provided a job ID was requested when assigning the job.

The following fields are optional in the hash reference:

pre
 pre => 'prepare_jobs',         # assume caller's namespace

or:

 pre => 'Package::prepare_jobs',

or:

 pre => \&SomeOther::prepare_jobs,

or:

 pre => sub {print "anonymous sub preparing the jobs\n"},

The "pre" field specifies the subroutine to be executed each time a new worker thread is started (either when starting the pool, or when new worker threads are added with a call to either add or workers) and once when a "monitor" routine is specified. It must be specified as either the name of a subroutine or as a reference to an (anonymous) subroutine.

The specified subroutine should expect the following parameters to be passed:

 1..N  any additional parameters that were passed with the call to new.

You can determine whether the "pre" routine is called for a new worker thread or for a monitoring thread by checking caller() inside the "pre" routine.
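The typical use of "pre" described above (setting up a per-worker resource) can be sketched as follows. This is only an illustration under assumptions: the log directory, the file naming and the task names are made up, and it relies on the "pre", "do" and "post" routines receiving their parameters as listed in this section. Because $log is not shared, each worker thread works on its own copy of it.

```perl
use strict;
use warnings;
use threads;
use File::Temp qw(tempdir);
use Thread::Pool;

my $dir = tempdir( CLEANUP => 1 );    # hypothetical log directory
my $log;                              # not shared: one copy per thread

my $pool = Thread::Pool->new(
    {
        workers => 2,

        # runs once inside every new worker thread
        pre => sub {
            my ($logdir) = @_;        # extra parameter given to new()
            my $file = "$logdir/worker-" . threads->tid . ".log";
            open( $log, '>>', $file ) or die "Cannot open $file: $!";
        },

        # runs for every job, using the handle opened by "pre"
        do => sub { print {$log} "handled @_\n"; return },

        # runs when the worker is removed
        post => sub { close($log) },
    },
    $dir
);

$pool->job($_) for qw(alpha beta gamma);
$pool->shutdown;

my @files = glob("$dir/worker-*.log");
print scalar(@files), " log file(s) written\n";
```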

post
 post => 'cleanup_after_worker',        # assume caller's namespace

or:

 post => 'Package::cleanup_after_worker',

or:

 post => \&SomeOther::cleanup_after_worker,

or:

 post => sub {print "anonymous sub cleaning up after the worker removed\n"},

The "post" field specifies the subroutine to be executed each time a worker thread is removed (either when being specifically removed, or when the pool is shut down explicitly or implicitly when the Thread::Pool object is destroyed). It must be specified as either the name of a subroutine or as a reference to an (anonymous) subroutine.

The specified subroutine should expect the following parameters to be passed:

 1..N  any additional parameters that were passed with the call to new.

Any values that are returned by this subroutine after closing down the thread are accessible with the result method, but only if the thread was removed and a job ID was requested.

workers
 workers => 5, # default: 1

The "workers" field specifies the number of worker threads that should be created when the pool is created. If no "workers" field is specified, then only one worker thread will be created. The workers method can be used to change the number of workers later.

autoshutdown
 autoshutdown => 0, # default: 1

The "autoshutdown" field specifies whether the shutdown method should be called when the object is destroyed. By default, this flag is set to 1, indicating that the shutdown method should be called when the object is being destroyed. Setting the flag to a false value will cause the shutdown method not to be called, which may cause loss of data and error messages if threads are not finished when the program exits.

The setting of the flag can be later changed by calling the autoshutdown method.

stream
 stream => 'in_order_of_submit',        # assume caller's namespace

or:

 stream => 'Package::in_order_of_submit',

or:

 stream => \&SomeOther::in_order_of_submit,

or:

 stream => sub {print "anonymous sub called in order of submit\n"},

The "stream" field specifies the subroutine to be executed for streaming the results of the "do" routine. If specified, the "stream" routine is called once for the result of each "do" subroutine, but in the order in which the jobs were submitted rather than in the order in which the results were obtained (which is, by the very nature of threads, indeterminate).

The specified subroutine should expect the following parameters to be passed:

 1     the Thread::Pool object to which the worker thread belongs.
 2..N  the values that were returned by the "do" subroutine

The "stream" routine is executed in any of the threads that are created for the Thread::Pool object. The system attempts to call the "stream" routine in the same thread from which the values are obtained, but when things get out of sync, other threads may stream the result of a job. If you want only one thread to stream all results, use the "monitor" routine.
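The ordering guarantee described above can be tried out with a small sketch such as the one below. It is an illustration only: the random delay, the multiplication and the shared array @seen are assumptions made for this example. The array is shared because the "stream" routine may run in any of the pool's threads.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Pool;

my @seen : shared;    # filled by "stream", read by the main thread

my $pool = Thread::Pool->new(
    {
        workers => 4,

        # jobs finish in an unpredictable order because of the delay
        do => sub {
            my ($n) = @_;
            select( undef, undef, undef, rand(0.1) );
            return $n * 10;
        },

        # first parameter is the pool object, the rest is the "do" result
        stream => sub {
            my ( $self, @result ) = @_;
            lock(@seen);
            push @seen, @result;
        },
    }
);

$pool->job($_) for 1 .. 8;    # results are handled by "stream"
$pool->shutdown;              # returns after all results were streamed

print "streamed: @seen\n";
```

If the values in @seen appear in the order in which the jobs were submitted, the "stream" routine behaved as documented, even though the jobs themselves completed in a different order.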

monitor
 monitor => 'in_order_of_submit',       # assume caller's namespace

or:

 monitor => 'Package::in_order_of_submit',

or:

 monitor => \&SomeOther::in_order_of_submit,

or:

 monitor => sub {print "anonymous sub called in order of submit\n"},

The "monitor" field specifies the subroutine to be executed for monitoring the results of the "do" routine. If specified, the "monitor" routine is called once for the result of each "do" subroutine, but in the order in which the jobs were submitted rather than in the order in which the results were obtained (which is, by the very nature of threads, indeterminate).

The specified subroutine should expect the following parameters to be passed:

 1..N  the values that were returned by the "do" subroutine

To be able to use this function, the Thread::Queue::Any::Monitored module must also be available. It will be loaded automatically if it has not been used yet.

The "monitor" routine is executed in its own thread. This means that all results have to be passed between threads, and therefore be frozen and thawed with Storable. If you can handle the streaming from different threads, it is probably wiser to use the "stream" routine feature.

POOL METHODS

The following methods can be executed on the Thread::Pool object.

job

 $jobid = $pool->job( @parameter );     # saves result
 $pool->job( @parameter );              # does not save result

The "job" method specifies a job to be executed by any of the available workers. Which worker will execute the job is indeterminate. When the job will be executed depends on the number of jobs that were still waiting to be done when this job was submitted.

The input parameters are passed to the "do" subroutine as is.

If a return value is requested, then the return value(s) of the "do" subroutine will be saved. The returned value is a job ID that should be used as the input parameter to result or result_dontwait.
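A minimal sketch of submitting several jobs and collecting their results by job ID is given below. The square() routine and the numbers used are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Pool;

# "do" routine: receives the parameters that were passed to job()
sub square { my ($n) = @_; return $n * $n }

my $pool = Thread::Pool->new( { workers => 3, do => \&square } );

# non-void context: a job ID is returned and the result is saved
my %jobid;
$jobid{$_} = $pool->job($_) for 1 .. 5;

# results can be fetched later, in whatever order is convenient
my @squares;
for my $n ( sort { $a <=> $b } keys %jobid ) {
    my ($result) = $pool->result( $jobid{$n} );    # waits until ready
    push @squares, $result;
    print "$n squared is $result\n";
}

$pool->job(6);      # void context: executed, but the result is not saved
$pool->shutdown;    # wait until all jobs are done
```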

waitfor

 @result = $pool->waitfor( @parameter ); # submit job and wait for result

The "waitfor" method submits a job to be executed, waits for the result to become ready and returns the result. It is in fact a shortcut for using job and result.

The input parameters are passed to the "do" subroutine as is.

The return value(s) are what was returned by the "do" routine. The meaning of the return value(s) is entirely up to you as the developer.

result

 @result = $pool->result( $jobid );

The "result" method waits for the specified job to be finished and returns the result of that job.

The input parameter is the job id as returned from the job assignment.

The return value(s) are what was returned by the "do" routine. The meaning of the return value(s) is entirely up to you as the developer.

If you don't want to wait for the job to be finished, but just want to see if there is a result already, use the result_dontwait method.

result_dontwait

 @result = $pool->result_dontwait( $jobid );

The "result_dontwait" method returns the result of the job if it is available. If the job is not finished yet, it will return undef in scalar context or the empty list in list context.

The input parameter is the job id as returned from the job assignment.

If the result of the job is available, then the return value(s) are what was returned by the "do" routine. The meaning of the return value(s) is entirely up to you as the developer.

If you want to wait for the job to be finished, use the result method.
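Polling with "result_dontwait" might look like the sketch below. The slow job and the polling interval are assumptions for illustration. Note that, going by the description above, a job that legitimately returns an empty list cannot be told apart from an unfinished job in list context, so this pattern only suits jobs that always return at least one value.

```perl
use strict;
use warnings;
use threads;
use Thread::Pool;

my $pool = Thread::Pool->new(
    { do => sub { sleep 1; return "finished @_" } }    # deliberately slow
);

my $jobid = $pool->job('report');

my $polls = 0;
my @result;
until ( @result = $pool->result_dontwait($jobid) ) {
    $polls++;                              # other work could be done here
    select( undef, undef, undef, 0.1 );    # wait 100 ms before polling again
}

print "got '@result' after $polls poll(s)\n";
$pool->shutdown;
```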

todo

 $todo = $pool->todo;

The "todo" method returns the number of jobs that are still left to be done.

add

 $tid = $pool->add;             # add 1 worker thread
 @tid = $pool->add( 5 );

The "add" method adds the specified number of worker threads to the pool and returns the thread IDs (tid) of the threads that were created.

The input parameter specifies the number of workers to be added. If no number of workers is specified, then 1 worker thread will be added.

In scalar context, returns the thread ID (tid) of the first worker thread that was added. This usually only makes sense if you're adding only one worker thread.

In list context, returns the thread IDs (tid) of the worker threads that were created.

Each time a worker thread is added, the "pre" routine (if available) will be called inside the thread.

remove

 $pool->remove;                 # remove 1 worker thread
 $pool->remove( 5 );            # remove 5 worker threads

 $jobid = $pool->remove;        # remove 1 worker thread, save result
 @jobid = $pool->remove( 5 );   # remove 5 worker threads, save results

The "remove" method adds the specified number of special "remove" jobs to the list of jobs to be done. It will return the job IDs if called in a non-void context.

The input parameter specifies the number of workers to be removed. If no number of workers is specified, then 1 worker thread will be removed.

In void context, the results of the execution of the "post" subroutine(s) are discarded.

In scalar context, returns the job ID of the result of the first worker thread that was removed. This usually only makes sense if you're removing only one worker thread.

In list context, returns the job IDs of the results of all the worker threads that were removed.

Each time a worker thread is removed, the "post" routine is called. Its return value(s) are saved only if a job ID was requested when removing the thread. Then the result method can be called to obtain the results of the "post" subroutine.
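Obtaining the return value of the "post" routine could be sketched as follows. The 'resource-A' parameter and the messages are made up for this example, and the sketch assumes that "post" receives the extra parameters given to new, as documented for the "post" field.

```perl
use strict;
use warnings;
use threads;
use Thread::Pool;

my $pool = Thread::Pool->new(
    {
        workers => 2,
        do      => sub { return scalar reverse "@_" },

        # the return value is only kept if remove() was asked for a job ID
        post => sub { return "released @_" },
    },
    'resource-A'    # hypothetical extra parameter, passed on to "post"
);

my ($reversed) = $pool->waitfor('abc');

my $jobid = $pool->remove;                  # scalar context: job ID
my ($report) = $pool->result($jobid);       # result of "post"

print "job result: $reversed\n";
print "post result: $report\n";

$pool->shutdown;    # removes the remaining worker
```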

workers

 $workers = $pool->workers;     # find out number of worker threads
 $pool->workers( 10 );          # set number of worker threads

The "workers" method can be used to find out how many worker threads there are currently available, or it can be used to set the number of worker threads.

The input value, if specified, specifies the number of worker threads that should be available. If there are more worker threads available than the number specified, then superfluous worker threads will be removed. If there are not enough worker threads available, new worker threads will be added.

The return value is the current number of worker threads.

join

 $pool->join;

The "join" method waits until all of the worker threads that have been removed have finished their jobs. It basically cleans up the threads that are not needed anymore.

The "shutdown" method calls the "join" method after removing all the active worker threads. You therefore seldom need to call the "join" method separately.

removed

 $removed = $pool->removed;

The "removed" method returns the number of worker threads that were removed over the lifetime of the object.

autoshutdown

 $pool->autoshutdown( 1 );
 $autoshutdown = $pool->autoshutdown;

The "autoshutdown" method sets and/or returns the flag indicating whether an automatic shutdown should be performed when the object is destroyed.

shutdown

 $pool->shutdown;

The "shutdown" method waits for all jobs to be executed, removes all worker threads and handles any results that still need to be streamed before it returns. Call the abort method if you do not want to wait until all jobs have been executed.

It is called automatically when the object is destroyed, unless specifically disabled by providing a false value with the "autoshutdown" field when creating the pool with new, or by calling the autoshutdown method.

Please note that the "shutdown" method does not disable anything. It just shuts all of the worker threads down. After a shutdown it is possible to add jobs, but they won't get done until workers are added.

abort

 $pool->abort;

The "abort" method waits for all worker threads to finish their current job and then removes all worker threads before it returns. Call the shutdown method if you want to wait until all jobs have been done.

Please note that the "abort" method does not disable anything. It just shuts all of the worker threads down. After an abort it is possible to add jobs, but they won't get done until workers are added.

Also note that any streamed results are not handled. If you want to handle any streamed results, you can call the shutdown method after calling the "abort" method.

done

 $done = $pool->done;

The "done" method returns the number of jobs that have been performed by the removed worker threads of the pool.

The "done" method is typically called after the shutdown method has been called.

notused

 $notused = $pool->notused;

The "notused" method returns the number of removed threads that have not performed any jobs. It provides a heuristic to determine how many workers you actually need for a specific application: a value > 0 indicates that you have specified too many worker threads for this application.

The "notused" method is typically called after the shutdown method has been called.
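The sizing heuristic can be illustrated with a deliberately oversized pool. This is a sketch only: the worker count, the job count and the short delay are assumptions. With 8 workers and only 4 jobs, at most 4 workers can have done any work, so "notused" would be expected to report at least 4.

```perl
use strict;
use warnings;
use threads;
use Thread::Pool;

my $pool = Thread::Pool->new(
    {
        workers => 8,    # intentionally more workers than jobs
        do      => sub { select( undef, undef, undef, 0.05 ); return },
    }
);

$pool->job($_) for 1 .. 4;
$pool->shutdown;         # statistics cover removed workers only

my $done    = $pool->done;
my $removed = $pool->removed;
my $notused = $pool->notused;

print "$done job(s) done, $removed worker(s) removed, $notused never used\n";
print "consider using fewer workers\n" if $notused > 0;
```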

INSIDE JOB METHODS

The following methods only make sense inside the "pre", "do", "post", "stream" and "monitor" routines.

self

 $pool = Thread::Pool->self;

The class method "self" returns the object to which this thread belongs. It is available within the "pre", "do", "post", "stream" and "monitor" subroutines only.

remove_me

 Thread::Pool->remove_me;

The "remove_me" class method only makes sense within the "do" subroutine. It indicates to the job dispatcher that this worker thread should be removed from the pool. After the "do" subroutine returns, the worker thread will be removed.
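A worker that retires itself could be sketched like this. The 'retire' task name is a hypothetical convention invented for the example, and the sketch assumes that the result of the job that called "remove_me" is still saved in the usual way.

```perl
use strict;
use warnings;
use threads;
use Thread::Pool;

my $pool = Thread::Pool->new(
    {
        workers => 3,
        do      => sub {
            my ($task) = @_;

            # hypothetical convention: this task asks the worker to step down
            Thread::Pool->remove_me if $task eq 'retire';

            return uc $task;    # the job itself is still completed
        },
    }
);

my @answer = $pool->waitfor('retire');    # handled by a worker that then leaves
my @other  = $pool->waitfor('carry on');  # handled by a remaining worker

print "answers: @answer, @other\n";
$pool->shutdown;
```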

jobid

 Thread::Pool->jobid;

The "jobid" class method only makes sense within the "do" subroutine in streaming mode. It returns the job ID value of the current job. This can be used in connection with the dont_set_result and set_result methods to have another thread set the result of the current job.

dont_set_result

 Thread::Pool->dont_set_result;

The "dont_set_result" class method only makes sense within the "do" subroutine. It indicates to the job dispatcher that the result of this job should not be saved. This is for cases where the result of this job will be placed in the result hash at some time in the future by another thread using the set_result method.

set_result

 Thread::Pool->self->set_result( $jobid,@param );

The "set_result" object method only makes sense within the "do" subroutine. It allows you to set the result of other jobs than the one currently being performed.

This method is only needed in very special situations. Normally, just returning values from the "do" subroutine is enough to have the result saved. This method is exposed to the outside world for those cases where a specific thread becomes responsible for setting the result of other threads (which used the dont_set_result method to defer saving their result).

The first input parameter specifies the job ID of the job for which to set the result. The rest of the input parameters are considered to be the result to be saved. Whatever is specified in the rest of the input parameters will be returned by the result or result_dontwait methods.

CAVEATS

Passing unshared values between threads is accomplished by serializing the specified values using Storable. This allows for great flexibility at the expense of more CPU usage. It also limits what can be passed: code references, for example, cannot be serialized and therefore cannot be passed.
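The limitation can be seen with Storable alone, without any threads involved. The sketch below assumes default Storable settings (in particular that $Storable::Deparse is not set); the data structures are made up for illustration.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# plain data survives a freeze/thaw round trip
my $data = { name => 'job', args => [ 1, 2, 3 ] };
my $copy = thaw( freeze($data) );
print "round trip ok: @{ $copy->{args} }\n";

# a structure containing a code reference is rejected by freeze()
my $ok = eval { freeze( { callback => sub { 42 } } ); 1 };
print "code reference rejected: $@" unless $ok;
```

Job parameters and results that need to carry behaviour are therefore better expressed as data, for instance the name of a subroutine, that the receiving thread can interpret.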

EXAMPLES

For now, the only examples available are those found in the "t" directory.

AUTHOR

Elizabeth Mattijsen, <liz@dijkmat.nl>.

Please report bugs to <perlbugs@dijkmat.nl>.

COPYRIGHT

Copyright (c) 2002 Elizabeth Mattijsen <liz@dijkmat.nl>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

threads, Thread::Queue::Any, Storable.