NAME

IPC::Exe - Execute processes or Perl subroutines & string them via IPC. Think shell pipes.

SYNOPSIS

  use IPC::Exe qw(exe bg);

  my @pids = &{
         exe qw( ls  /tmp  a.txt ), \"2>#",
      bg exe qw( sort -r ),
         exe sub { print "[", shift, "] 2nd cmd: @_\n"; print "three> $_" while <STDIN> },
      bg exe 'sort',
         exe "cat", "-n",
         exe sub { print "six> $_" while <STDIN>; print "[", shift, "] 5th cmd: @_\n" },
  };

is like doing the following in a modern Unix shell:

  ls /tmp a.txt 2> /dev/null | { sort -r | [perlsub] | { sort | cat -n | [perlsub] } & } &

except that [perlsub] is really a perl child process with access to main program variables in scope.

DESCRIPTION

This module was written to provide a secure and highly flexible way to execute external programs with an intuitive syntax. In addition, more information is returned with each string of executions, such as the list of PIDs and $? of the last external pipe process (see "RETURN VALUES"). Commands are run with exec( ), and the shell is never invoked.

The two exported subroutines perform all the heavy lifting of forking and executing processes. In particular, exe( ) implements the KID_TO_READ version of

  http://perldoc.perl.org/perlipc.html#Safe-Pipe-Opens

while bg( ) implements the double-fork technique illustrated at

  http://perldoc.perl.org/perlfaq8.html#How-do-I-start-a-process-in-the-background?
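As a rough illustration of the first technique, here is a minimal sketch of a forking pipe open in core Perl, without IPC::Exe. It is not the module's actual implementation; the child command and the variable names are chosen for the example only, and a platform that supports forking open (Unix) is assumed.

```perl
use strict;
use warnings;

# Fork a child whose STDOUT is connected to $kid in the parent.
my $pid = open(my $kid, '-|');
defined $pid or die "cannot fork: $!";

if ($pid == 0) {
    # child: exec the program directly, no shell is involved
    # ($^X is the path of the running perl)
    exec($^X, '-e', 'print "hello from child\n"')
        or die "cannot exec: $!";
}

# parent: read everything the child wrote to its STDOUT
my @lines = <$kid>;

# closing the pipe waits for the child and sets $?
close($kid);
my $exitval = $? >> 8;

print "read_in> $_" for @lines;
```

Because exec( ) receives a list, the arguments reach the program unchanged and are never interpreted by a shell.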

EXAMPLES

Let's dive right away into some examples. To begin:

  my $exit = system( "myprog $arg1 $arg2" );

can be replaced with

  my $exit = &{ exe 'myprog', $arg1, $arg2 };

exe( ) returns a LIST of PIDs, the last item of which is $? (of default &READER). To get the actual exit value, shift $? right by eight bits: $? >> 8.
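The layout of $? is the same whether it comes from system( ), waitpid( ) or a pipe close. The following sketch uses plain system( ) in list form (an assumption made to keep the example free of non-core modules) to show how the status word breaks down:

```perl
use strict;
use warnings;

# Run a child that exits with value 3; $^X is the running perl.
system($^X, '-e', 'exit 3');

my $exitval = $? >> 8;    # high byte: exit value of the child
my $signal  = $? & 127;   # low 7 bits: signal that killed it, if any
my $core    = $? & 128;   # set if the child dumped core

print "exit value: $exitval, signal: $signal\n";
```

A nonzero $signal means the process was killed rather than exiting on its own, in which case $exitval alone is not meaningful.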

Extending the previous example,

  my $exit = system( "myprog $arg1 $arg2 > out.txt" );

can be replaced with

  my $exit = &{ exe 'myprog', $arg1, $arg2, [ '>', 'out.txt' ] };

The previous two examples will wait for 'myprog' to finish executing before continuing the main program.

Extending the previous example again,

  # cannot obtain $exit of 'myprog' because it is in background
  system( "myprog $arg1 $arg2 > out.txt &" );

can be replaced with

  # just add 'bg' before 'exe' in previous example
  my $bg_pid = &{ bg exe 'myprog', $arg1, $arg2, [ '>', 'out.txt' ] };

Now, 'myprog' will be put in background and the main program will continue without waiting.

To monitor the exit value of a background process:

  my $bg_pid = &{
      bg sub {
             # same as 2nd previous example
             my ($pid) = &{
                 exe 'myprog', $arg1, $arg2, [ '>', 'out.txt' ]
             };

             # check if exe() was successful
             defined($pid) or die("Failed to run process in background");

             # handle exit value here
             print STDERR "background exit value: " . ($? >> 8) . "\n";
         }
  } or die("Failed to send process to background");

Instead of using backquotes or qx( ),

  # slurps entire STDOUT into memory
  my @stdout = `$program @ARGV`;

  # handle STDOUT here
  for my $line (@stdout)
  {
      print "read_in> $line";
  }

we can read the STDOUT of one process with:

  my ($pid) = &{
      # execute $program with arguments
      exe $program, @ARGV,

      # handle STDOUT here
      sub {
          while (my $line = <STDIN>)
          {
              print "read_in> $line";
          }

          # wait for $program, which sets $? to its exit status
          waitpid($_[0], 0);
      },
  };

  # check if exe() was successful
  defined($pid) or die("Failed to run process");

  # exit value of $program
  my $exitval = $? >> 8;

Perform tar copy of an entire directory:

  use Cwd qw(chdir);

  my @pids = &{
      exe sub { chdir $source_dir or die $! }, qw(/bin/tar  cf - .),
      exe sub { chdir $target_dir or die $! }, qw(/bin/tar xBf -),
  };

  # check if exe()'s were successful
  defined($pids[0]) && defined($pids[1])
    or die("Failed to run processes");

  # was un-tar successful?
  my $error = pop(@pids);

Here is an elaborate example to pipe STDOUT of one process to the STDIN of another, consecutively:

  my @pids = &{
      # redirect STDERR to STDOUT
      exe $program, @ARGV, \"2>&1",

      # 'perl' receives STDOUT of $program via STDIN
      exe sub {
              my ($pid) = &{
                  exe qw(perl -e), 'print "read_in> $_" while <STDIN>; exit 123',
              };

              # check if exe() was successful
              defined($pid) or die("Failed to run process");

              # handle exit value here
              print STDERR "in-between exit value: " . ($? >> 8) . "\n";

              # this is executed in child process
              # no need to return
          },

      # 'sort' receives STDOUT of 'perl'
      exe qw(sort -n),

      # [perlsub] receives STDOUT of 'sort'
      exe sub {
              # find out command of previous pipe process
              # if @_[1..$#_] is an empty list, previous process was a [perlsub]
              my ($child_pid, $prog, @args) = @_;

              # output: "last_pipe[12345]> sort -n"
              print STDERR "last_pipe[$child_pid]> $prog @args\n";

              # print sorted, 'perl' filtered, output of $program
              print while <STDIN>;

              # find out exit value of previous 'sort' pipe process
              waitpid($_[0], 0);
              warn("Bad exit for: @_\n") if $?;

              return $?;
          },
  };

  # check if exe()'s were successful
  defined($pids[0]) && defined($pids[1]) && defined($pids[2])
    or die("Failed to run processes");

  # obtain exit value of last process on pipeline
  my $exitval = pop(@pids) >> 8;

Shown below is an example of how to capture STDERR and STDOUT after sending some input to STDIN of the child process:

  # reap child processes 'xargs' when done
  local $SIG{CHLD} = 'IGNORE';

  # like IPC::Open3; filehandles are returned on-the-fly
  my ($pid, $TO_STDIN, $FROM_STDOUT, $FROM_STDERR) = &{
      exe +{ stdin => 1, stdout => 1, stderr => 1 }, qw(xargs ls -ld),
  };

  # check if exe() was successful
  defined($pid) or die("Failed to run process");

  # ask 'xargs' to 'ls -ld' three files
  print $TO_STDIN "/bin\n";
  print $TO_STDIN "does_not_exist\n";
  print $TO_STDIN "/etc\n";

  # cause 'xargs' to flush its stdout
  close($TO_STDIN);

  # print captured outputs
  print "stderr> $_" while <$FROM_STDERR>;
  print "stdout> $_" while <$FROM_STDOUT>;

  # close filehandles
  close($FROM_STDOUT);
  close($FROM_STDERR);

Of course, more exe( ) calls may be chained together as needed:

  # reap child processes 'xargs' when done
  local $SIG{CHLD} = 'IGNORE';

  # like IPC::Open2; filehandles are returned on-the-fly
  my ($pid1, $TO_STDIN, $pid2, $FROM_STDOUT) = &{
      exe +{ stdin  => 1 }, sub { "2>&1" }, qw(perl -ne), 'print STDERR "360.0 / $_"',
      exe +{ stdout => 1 }, qw(bc -l),
  };

  # check if exe()'s were successful
  defined($pid1) && defined($pid2)
    or die("Failed to run processes");

  # ask 'bc -l' results of "360 divided by given inputs"
  print $TO_STDIN "$_\n" for 2 .. 8;

  # we redirect stderr of 'perl' to stdout
  #   which, in turn, is fed into stdin of 'bc'

  # print captured outputs
  print "360 / $_ = " . <$FROM_STDOUT> for 2 .. 8;

  # close filehandles
  close($TO_STDIN);
  close($FROM_STDOUT);

Important: Some non-Unix platforms, such as Win32, require interactive processes (shown above) to know when to quit, and can rely on neither close($TO_STDIN) nor kill(TERM => $pid).

SUBROUTINES

Both exe( ) and bg( ) are optionally exported. They each return CODE references that need to be called.

exe( )

  exe \%EXE_OPTIONS, &PREEXEC, LIST, @REDIRECTS, &READER

\%EXE_OPTIONS is an optional hash reference to instruct exe( ) to return STDIN / STDERR / STDOUT filehandle(s) of the executed child process. See "SETTING OPTIONS".

LIST is passed to exec( ) in the child process after the parent forks, where the child's stdout is redirected to &READER's stdin. It is optional if &PREEXEC is provided.

&PREEXEC is called right before exec( ) in the child process, so we may reopen filehandles or do some child-only operations beforehand. It is optional if LIST is provided.

&PREEXEC could return a LIST of @REDIRECTS to perform common filehandle redirections and/or modify binmode settings. The @REDIRECTS may be optionally specified (as references) after LIST. Returning these strings (or references to them) will do the following preset actions:

  "2>#"  or "2>null"   silence  stderr
   ">#"  or "1>null"   silence  stdout
  "2>&1"               redirect stderr to  stdout
  "1>&2" or ">&2"      redirect stdout to  stderr
  "2>&-"               close    stderr
  "1><2" or "2><1"     swap     stdout and stderr
                       (+) shell-way works too:
                           \"3>&1", \"1>&2", \"2>&3", \"3>&-"

  "0:crlf"             does binmode(STDIN,  ":crlf")
  "1:raw" or "1:"      does binmode(STDOUT, ":raw")
  "2:utf8"             does binmode(STDERR, ":utf8")

&PREEXEC could also return array references in the mix to perform open operations. If open fails, IPC::Exe will die. Minimal validation is done for the array items, so be careful. Examples:

  [ ">",  "/path/file" ]   does open(STDOUT, ">",  "/path/file")
  [ ">>", "/path/file" ]   does open(STDOUT, ">>", "/path/file")
  [ "2>", "/path/file" ]   does open(STDERR, ">",  "/path/file")
  [ *FH, "+>>", $file ]    does open(FH, "+>>", $file)

If references to array refs are returned by &PREEXEC, then sysopen will be used instead:

  \[ *FH, $file, O_RDWR ]           does sysopen(FH, $file, O_RDWR)
  \[ *FH, $file, O_WRONLY, 0644 ]   does sysopen(FH, $file, O_WRONLY, 0644)
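The O_* flags used in the sysopen forms are constants from the core Fcntl module and have to be imported before use. The sketch below shows a plain sysopen call with such flags, outside of IPC::Exe; the scratch file from File::Temp is an assumption made only to keep the example self-contained.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC);
use File::Temp qw(tempfile);

# scratch file for the example, removed at program exit
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close($tmp_fh);

# the permission argument (0644) only matters together with O_CREAT,
# when the file has to be created
sysopen(my $fh, $file, O_WRONLY | O_CREAT | O_TRUNC, 0644)
    or die "sysopen failed: $!";
print {$fh} "logged\n";
close($fh) or die "close failed: $!";

my $size = -s $file;
print "wrote $size bytes to $file\n";
```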

It is important to note that the actions and return values of &PREEXEC matter, as they may be used to redirect filehandles before &PREEXEC becomes the exec process. If @REDIRECTS are provided along with &PREEXEC, the filehandle operations returned by &PREEXEC are done first, in return-order, before @REDIRECTS.

&PREEXEC is called with arguments passed to the CODE reference returned by exe( ).

&READER is called with ($child_pid, LIST) as its arguments. LIST corresponds to the positional arguments passed in-between &PREEXEC and @REDIRECTS.

If exe( )'s are chained, &READER calls itself as the next exe( ) in line, which in turn, calls the next &PREEXEC, LIST, etc.

&READER is always called in the parent process.

&PREEXEC is always called in the child process.

Call waitpid( $_[0], 0 ) in &READER to set the exit status $? of the previous process executing on the pipe. close( $IPC::Exe::PIPE ) can also be used to close the input filehandle and set $? at the same time (for Unix platforms only).

If LIST is not provided, &PREEXEC will still be called.

If &PREEXEC is not provided, LIST will still exec.

If &READER is not provided, it defaults to something like

  sub { print while <STDIN>; waitpid($_[0], 0); return $? } # $_[0] is the $child_pid

exe( &READER ) simply returns &READER.

exe( ) with no arguments returns an empty list.

bg( )

  bg \%BG_OPTIONS, &BACKGROUND

\%BG_OPTIONS is an optional hash reference to instruct bg( ) to wait a certain amount of time for &PREEXEC to complete (for non-Unix platforms only). See "SETTING OPTIONS".

&BACKGROUND is called after it is sent to the init process.

If &BACKGROUND is not a CODE reference, an empty list is returned upon execution.

bg( ) with no arguments returns an empty list.
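Sending a process to init is what the double-fork technique achieves. The following is a minimal sketch of that technique in core Perl, not the module's actual code; the one-second sleep stands in for a real background job, and a Unix fork( ) is assumed.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

# First fork: the intermediate child exists only to fork again.
my $pid = fork();
defined $pid or die "cannot fork: $!";

if ($pid == 0) {
    my $grandchild = fork();
    defined $grandchild or _exit(1);

    if ($grandchild == 0) {
        # grandchild: once its parent exits it is adopted by init,
        # so the main program never has to reap it
        exec($^X, '-e', 'sleep 1') or _exit(127);
    }

    # intermediate child exits at once, without running END blocks
    _exit(0);
}

# main program: reaping the intermediate child returns almost immediately
waitpid($pid, 0);
my $status = $? >> 8;

print "background job started; intermediate child exited with $status\n";
```

The main program only learns whether the second fork succeeded. The exit status of the background job itself is not available to it, which is why monitoring has to happen inside the background subroutine.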

This experimental feature is not enabled by default:

  • If sending the process to init fails, bg( ) can fall back to calling &BACKGROUND in the parent or child process if $IPC::Exe::bg_fallback is true. To enable the fallback feature, set

      $IPC::Exe::bg_fallback = 1;

SETTING OPTIONS

exe( )

\%EXE_OPTIONS is a hash reference that can be provided as the first argument to exe( ) to control returned values. It may be used to return or assign STDIN / STDERR / STDOUT filehandle(s) of the child process to emulate IPC::Open2 and IPC::Open3 behavior.

The default values are:

  %EXE_OPTIONS = (
      pid         => undef,
      stdin       => 0,
      stdout      => 0,
      stderr      => 0,
      autoflush   => 1,
      binmode_io  => undef,
  );

These are the effects of setting the following options:

pid => \$pid

Set $pid to the child process PID, given a SCALAR reference. The PID will not be returned as part of the return values of exe( ).

stdin => 1 or stdin => \$TO_STDIN

Return a WRITEHANDLE to STDIN of the child process. The filehandle will be set to autoflush on write if $EXE_OPTIONS{autoflush} is true.

If given a SCALAR reference, set $TO_STDIN to the WRITEHANDLE described above. The WRITEHANDLE then will not be returned as part of the return values of exe( ).

stdout => 1 or stdout => \$FROM_STDOUT

Return a READHANDLE from STDOUT of the child process, so output to stdout may be captured. When this option is set and &READER is not provided, the default &READER subroutine will NOT be called.

If given a SCALAR reference, set $FROM_STDOUT to the READHANDLE described above. The READHANDLE then will not be returned as part of the return values of exe( ).

stderr => 1 or stderr => \$FROM_STDERR

Return a READHANDLE from STDERR of the child process, so output to stderr may be captured.

If given a SCALAR reference, set $FROM_STDERR to the READHANDLE described above. The READHANDLE then will not be returned as part of the return values of exe( ).

autoflush => 0

Disable autoflush on the WRITEHANDLE to STDIN of the child process. This option only has effect when $EXE_OPTIONS{stdin} is true.

binmode_io => ":raw", ":crlf", ":bytes", ":encoding(utf8)", etc.

Set binmode of STDIN and STDOUT of the child process for layer $EXE_OPTIONS{binmode_io}. This is automatically done for subsequently chained exe( )cutions. To stop this, set to an empty string "" or another layer to bring a different mode into effect.

bg( )

NOTE: This only applies to non-Unix platforms.

\%BG_OPTIONS is a hash reference that can be provided as the first argument to bg( ) to set wait time (in seconds) before relinquishing control back to the parent thread. See "CAVEAT" for reasons why this is necessary.

The default value is:

  %BG_OPTIONS = (
      wait => 2,  # Win32 option
  );

RETURN VALUES

By chaining exe( ) and bg( ) statements, calling the single returned CODE reference sets off the chain of executions. This returns a LIST in which each element corresponds to each exe( ) or bg( ) call.

exe( )

  • When exe( ) executes an external process, the PID for that process is returned, or an EMPTY LIST if exe( ) failed in any operation prior to forking. If an EMPTY LIST is returned, the chain of execution stops there and the next &READER is not called, guaranteeing the final return LIST to be truncated at that point. Failure after forking causes die( ) to be called.

  • When exe( ) executes a &READER subroutine, the subroutine's return value is returned. If there is no explicit &READER, the implicit default &READER subroutine is called instead:

      sub { print while <STDIN>; waitpid($_[0], 0); return $? } # $_[0] is the $child_pid

    It returns $?, which is the status of the last pipe process close. This allows code to be written like:

      my $exit = &{ exe 'myprog', $myarg }; # $exit receives $? of 'myprog', as with system( )

  • When non-default \%EXE_OPTIONS are specified, each exe( ) returns additional filehandles in the following LIST:

      (
          $PID,                # undef if exec failed
          $STDIN_WRITEHANDLE,  # only if $EXE_OPTIONS{stdin}  is true
          $STDOUT_READHANDLE,  # only if $EXE_OPTIONS{stdout} is true
          $STDERR_READHANDLE,  # only if $EXE_OPTIONS{stderr} is true
      )

    The positional LIST form return allows code to be written like:

      my ($pid, $TO_STDIN, $FROM_STDOUT) = &{
          exe +{ stdin => 1, stdout => 1 }, '/usr/bin/bc'
      };

    SCALAR references may be passed in \%EXE_OPTIONS for their scalars to be assigned in-place, instead of returning them in the positional LIST:

      my ($pid, $FROM_STDOUT);
      my ($TO_STDIN) = &{
          exe +{ pid => \$pid, stdin => 1, stdout => \$FROM_STDOUT },
            '/usr/bin/bc'
      };

    Note: It is necessary to disambiguate \%EXE_OPTIONS (also \%BG_OPTIONS) as a hash reference by including a unary + before the opening curly bracket:

      +{ stdin => 1, autoflush => 0 }
      +{ wait => 2.5 }

bg( )

Calling the CODE reference returned by bg( ) returns the PID of the background process, or an EMPTY LIST if bg( ) failed in any operation prior to forking. Failure after forking causes die( ) to be called.

ERROR CHECKING

To determine if either exe( ) or bg( ) was successful until the point of forking, check whether the returned $PID is defined.

See "EXAMPLES" for examples on error checking.

WARNING: This may get slightly complicated for chained exe( )'s when non-default \%EXE_OPTIONS cause the positions of $PID in the overall returned LIST to be non-uniform (caveat emptor). Remember, the chain of executions is doing a lot for just a single CODE call, so due diligence is required for error checking.

A minimum count of items (PIDs and/or filehandles) can be expected in the returned LIST to determine whether forks were initiated for the entire exe( ) / bg( ) chain.

Failures after forking are reported with die( ). To handle these errors, use eval.

TAINT CHECKING

In taint mode, exe( ) will die if it is called with tainted arguments or environment variables. By default, the following environment variables are checked:

  PATH  PATHEXT  IFS  CDPATH  ENV  BASH_ENV  PERL5SHELL

We may add to this list with:

  BEGIN { push @IPC::Exe::TAINT_ENV, qw(PATH_LOCALE TERMINFO TERMPATH) }
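A common way to satisfy such checks, following the advice in perlsec, is to give PATH a fixed value and remove the other variables before running anything. The sketch below does this in core Perl; the PATH value is an assumption and should be adjusted to the directories actually needed.

```perl
use strict;
use warnings;

# Replace PATH with a known, fixed value (assumed here; adjust as needed).
$ENV{PATH} = '/bin:/usr/bin';

# Remove the remaining variables that are checked by default.
my @risky = qw(PATHEXT IFS CDPATH ENV BASH_ENV PERL5SHELL);
delete @ENV{@risky};

my @still_set = grep { exists $ENV{$_} } @risky;
print "PATH=$ENV{PATH}, risky variables left: ", scalar(@still_set), "\n";
```

Any names pushed onto @IPC::Exe::TAINT_ENV would need the same treatment.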

SYNTAX

It is highly recommended to avoid unnecessary parentheses ( )'s when using exe( ) and bg( ).

IPC::Exe relies on Perl's LIST parsing magic in order to provide the clean intuitive syntax.

As a guide, the following syntax should be used:

  my @pids = &{                                          # call CODE reference
      [ bg ] exe [ sub { ... }, ] $prog1, $arg1, @ARGV,  # end line with comma
             exe [ sub { ... }, ] $prog2, $arg2, $arg3,  # end line with comma
      [ bg ] exe sub { ... },                            # this bg() acts on last exe() only
             sub { ... },
  };

where brackets [ ]'s denote optional syntax.

Note that Perl sees

  my @pids = &{
      bg exe $prog1, $arg1, @ARGV,
      bg exe sub { "2>#" }, $prog2, $arg2, $arg3,
         exe sub { 123 },
         sub { 456 },
  };

as

  my @pids = &{
      bg( exe( $prog1, $arg1, @ARGV,
              bg( exe( sub { "2>#" }, $prog2, $arg2, $arg3,
                      exe( sub { 123 },
                           sub { 456 }
                      )
                  )
              )
          )
      );
  };
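This nesting follows from how Perl parses named list operators: a predeclared subroutine called without parentheses takes everything to its right as its argument list. A small sketch with two stand-in subroutines (outer and inner are hypothetical names, not part of IPC::Exe) shows the same grouping:

```perl
use strict;
use warnings;

# Both subs describe the arguments they received.
sub inner { return "inner(" . join(",", @_) . ")" }
sub outer { return "outer(" . join(",", @_) . ")" }

# Without parentheses, inner swallows "2, 3" and its result
# becomes the second argument of outer.
my $seen = outer 1, inner 2, 3;

print "$seen\n";   # prints: outer(1,inner(2,3))
```

This is why each exe( ) in a chain receives the next exe( ) as its last argument, and why stray parentheses can cut the chain short.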

CAVEAT

END { } blocks

Code declared in END blocks will be called upon exit, whether it be after &PREEXEC sub without a LIST command, from a die failure, or even a failed exec call.

The user should make provisions to handle this situation. This is desirable when END blocks must only be called in the main process (or thread).

$IPC::Exe::is_forked is set to true after the code forks in &PREEXEC and &BACKGROUND. It can be used to tell the main process/thread apart from child processes/threads:

  END {
      # only run in main process/thread
      return if $IPC::Exe::is_forked;

      ### REST OF THE CODE GOES HERE ###
      ...
  }

PLATFORMS

This module is targeted for Unix environments, using techniques described in perlipc and perlfaq8. Development is done on FreeBSD, Linux, and Win32 platforms. It may not work well on other non-Unix systems, let alone Win32.

MSWin32

Some care was taken to rely on Perl's Win32 threaded implementation of fork( ). To get things to work almost like Unix, redirections of filehandles have to be performed in a certain order. More specifically: let's say STDOUT of a child process (read: thread) needs to be redirected elsewhere (anywhere, it doesn't matter). It is important that the parent process (read: thread) does not use STDOUT until after the child is exec'ed. At the point after exec, the parent must restore STDOUT to a previously dup'ed original and may then proceed along as usual. If this order is violated, deadlocks may occur, often manifesting as an apparent stall in execution when the parent tries to use STDOUT.

exe( )

Since fork( ) is emulated with threads, &PREEXEC and &READER really do begin their lives in the same process, but in separate threads. This imposes limitations on how they can be used. One limitation is that, as separate threads, neither one may block, or else the other thread will not be able to continue.

Writing to, or reading from a pipe will block when the pipe buffer is full or empty, respectively.

Putting the facts together, it means that a pipe writer and reader should not function (as separate threads or otherwise) in the same process for fear that one may block and not let the other continue (a deadlock).

For example, this code below will block:

  &{
      exe sub { print "a" x 9000, "\n" for 1 .. 3 }, # sub is &PREEXEC
          sub { @result = <STDIN> }                  # sub is &READER
  };

The execution stalls, and the program just hangs there. &PREEXEC is writing out more data than the pipe buffer can fit. Once the buffer is full, print will block to wait for the buffer to be emptied. However, &READER is not able to continue and read off some data from the pipe buffer because it is in the same blocked process. If it were in a separate process (as in a real fork), then a blocking &PREEXEC could not affect the &READER.

The way to ensure exe( ) works smoothly on Win32 is to exec processes on the pipeline chain. This code will work instead:

  &{
      exe qw(perl -e), 'print "a" x 9000, "\n" for 1 .. 3', # &PREEXEC exec'ed perl
          sub { @result = <STDIN> }                         # sub is &READER
  };

Now, &PREEXEC is no longer running in the same process, and cannot affect &READER. If the new perl process blocks, &READER in the original process can still continue to read the pipe.

Writing and reading small amounts of data (to not cause blocking) between &PREEXEC and &READER is possible, but not recommended.

bg( )

On Win32, bg( ) unfortunately has to substantially rely on timer code to wait for &PREEXEC to complete in order to work properly with exe( ). The example shown below illustrates that bg( ) has to wait at least until $program is exec'ed. Hence, $wait_time > $work_time must hold true and this requires a priori knowledge of how long &PREEXEC will take.

  &{
      bg +{ wait => $wait_time }, exe sub { sleep($work_time) }, $program
  };

This essentially renders bg &BACKGROUND useless if &BACKGROUND does not exec any programs (Win32).

In summary: (on Win32)

  • Only use bg( ) to exec programs into the background.

  • Keep &PREEXEC as short-running as possible. Or make sure $BG_OPTIONS{wait} time is longer.

  • No &PREEXEC (or code running in parallel thread) == no problems.

Some useful information:

  http://perldoc.perl.org/perlfork.html#CAVEATS-AND-LIMITATIONS
  http://www.nntp.perl.org/group/perl.perl5.porters/2003/11/msg85488.html
  http://www.nntp.perl.org/group/perl.perl5.porters/2003/08/msg80311.html
  http://www.perlmonks.org/?node_id=684859
  http://www.perlmonks.org/?node_id=225577
  http://www.perlmonks.org/?node_id=742363

DEPENDENCIES

Perl v5.8.8+ is required.

No non-core modules are required.

AUTHOR

Gerald Lai <glai at cpan dot org>