spipe - simple pipeline running interface
version 0.9.1
spipe [--version] [-?|-h|--help] [-g|--debug] [--graphviz] [-c|--config file] [-d|--directory value] [-i|--input string] [-it|--itype string] [--start value] [--stop value]
spipe -config t/data/string_manipulation.yml -d /tmp/test
Spipe is a control script for running simple pipelines read from configuration files written in YAML.
For internal details of the pipeline, check the documentation for the Perl module App::Pipeline::Simple.
Print out a line with the program name and version number.
Show this help.
Print out the UNIX command line equivalent of the pipeline and exit.
Reports parsing and logical errors.
Print out a graphviz dot file.
Example one liner to display a graph of the pipeline:
spipe -config t/data/string_manipulation.yml -graph > /tmp/p.dot; dot -Tpng /tmp/p.dot | display
Path to the config file. Required unless there is a file called config.yml in the current directory.
Directory to keep all files.
If the directory does not exist, it will be created and the config file will be copied into it under the name config.yml.
For subsequent runs of that pipeline, adjust the parameters in the configuration file and rerun spipe without the -config and -directory options.
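The directory behaviour described above can be sketched as follows. This is only an illustration in Python, not the Perl code in App::Pipeline::Simple; the function name prepare_project_dir is hypothetical, and leaving an existing directory untouched is an assumption drawn from the text.

```python
import shutil
from pathlib import Path


def prepare_project_dir(config_file, directory):
    """Create the project directory if needed and seed it with config.yml.

    Returns the path of the config file inside the project directory.
    """
    project = Path(directory)
    target = project / "config.yml"
    if not project.exists():
        # First run: create the directory and store a copy of the config.
        project.mkdir(parents=True)
        shutil.copyfile(config_file, target)
    # Later runs (assumption): the stored config.yml is left as it is.
    return target
```

Calling prepare_project_dir a second time with the same directory returns the stored config.yml without overwriting it, which mirrors the advice to edit that file and rerun.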
Optional input to pipeline.
Type of the optional input. Values?
ID of the step to start or restart the pipeline.
Fails if the prerequisites of the step are not met, i.e. the input file(s) do not exist.
ID of the step to stop the pipeline. Defaults to the last step.
Verbosity level. Defaults to zero. This will get translated to Log::Log4perl levels:
verbose   =    -1     0     1      2
log level = DEBUG  INFO  WARN  ERROR
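The translation above amounts to a small lookup table. A minimal sketch in Python, for illustration only: the names VERBOSITY_TO_LEVEL and log_level are hypothetical, and rejecting values outside the listed range is an assumption, since the text does not say what spipe does with them.

```python
# Mapping copied from the documentation: verbosity value -> Log::Log4perl level name.
VERBOSITY_TO_LEVEL = {-1: "DEBUG", 0: "INFO", 1: "WARN", 2: "ERROR"}


def log_level(verbose=0):
    """Return the log level name for a verbosity value (default 0)."""
    try:
        return VERBOSITY_TO_LEVEL[verbose]
    except KeyError:
        # Assumption: values that are not listed are treated as errors.
        raise ValueError(f"unsupported verbosity: {verbose!r}") from None
```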
Example run:
spipe -config t/data/string_manipulation.yml -dir /tmp/test
reads instructions from the config file and writes all information to the project directory.
The debug option will parse the config file, print out the command line equivalents of all commands and print out warnings about problems encountered in the file.
Another tool integrated in the system is visualization of the execution graph. It relies on the GraphViz Perl interface module, which needs to be installed from CPAN.
The following command line creates a Graphviz dot file, converts it into an image file and opens it with the ImageMagick display program:
spipe -config t/data/string_manipulation.yml -graph > /tmp/p.dot; dot -Tpng /tmp/p.dot | display
The default configuration is written in YAML, a simple and human readable language that can be parsed in many languages cleanly into data structures.
The YAML file contains four top level keys for the hash that the file will be read into: 1) name to give the pipeline a short name, 2) version to indicate the version number, 3) description to give a more verbose explanation of what the pipeline does, and 4) steps listing the pipeline steps.
---
description: "Example of a pipeline"
name: String Manipulation
version: '0.4'
steps:
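Once the YAML file has been read into a hash, the four top level keys can be checked with a few lines of code. The sketch below uses a Python dictionary standing in for the parsed file; the helper missing_top_level_keys is hypothetical, and whether spipe itself insists on all four keys is not stated here.

```python
# The four top level keys named in the documentation.
TOP_LEVEL_KEYS = ("name", "version", "description", "steps")


def missing_top_level_keys(config):
    """Return the documented top level keys that are absent from config."""
    return [key for key in TOP_LEVEL_KEYS if key not in config]


# Dictionary equivalent of the YAML header shown above (steps left empty).
config = {
    "description": "Example of a pipeline",
    "name": "String Manipulation",
    "version": "0.4",
    "steps": {},
}
```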
Each step is identified by a unique short ID and has a name that identifies an executable somewhere in the system path. Alternatively, you can give the full path leading to the executable file with the key path. The name will be added to the path and padded with a suitable separator character when needed.
Arguments to the executable are given individually as key/value pairs within the args tag. A single hyphen is added in front of each argument key when the step is executed. If two hyphens are needed, just add one to the key. Arguments can exist without values, too.
  s3:
    name: cat
    args:
      in:
        type: redir
        value: s1.txt
      n:
      out:
        type: redir
        value: s3_mod.txt
    next:
      - s4
There are two special keys in and out that need to have a key type defined. The IO type can get several kinds of values:
unnamed
that indicates that the argument is an unnamed argument to the executable
redir
will be interpreted as UNIX redirection character '<' or '>' depending on the context
file
means that IO happens from/to a file and is rendered as named argument
dir
is rendered like file, but is a mnemonic that all files under this directory name are processed
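The rules for turning a step into a command line (path padding, the hyphen prefix, arguments without values, and the four IO types) can be sketched in Python. This is an illustration under stated assumptions, not the rendering code of App::Pipeline::Simple: the function name render_step is hypothetical, the ordering of named arguments before unnamed ones and redirections is a choice made here, and file and dir are assumed to render as "-key value".

```python
def render_step(step):
    """Render one step dictionary as a UNIX command line string."""
    program = step["name"]
    path = step.get("path")
    if path:
        # Pad the path with a separator only when it lacks one.
        program = (path if path.endswith("/") else path + "/") + step["name"]

    named, unnamed, redirections = [], [], []
    for key, arg in (step.get("args") or {}).items():
        if isinstance(arg, dict) and "type" in arg:
            io_type, value = arg["type"], arg.get("value")
            if io_type == "unnamed":
                if value is not None:  # an empty placeholder renders nothing
                    unnamed.append(str(value))
            elif io_type == "redir":
                # 'in' reads from the file, any other key writes to it.
                redirections.append(("<" if key == "in" else ">") + f" {value}")
            elif io_type in ("file", "dir"):
                named.append(f"-{key} {value}")  # assumed rendering
            else:
                raise ValueError(f"unknown IO type: {io_type!r}")
        elif arg is None:
            named.append(f"-{key}")  # argument without a value
        else:
            named.append(f"-{key} {arg}")
    return " ".join([program, *named, *unnamed, *redirections])


# Dictionary equivalent of step s3 from the example above.
s3 = {
    "name": "cat",
    "args": {
        "in": {"type": "redir", "value": "s1.txt"},
        "n": None,
        "out": {"type": "redir", "value": "s3_mod.txt"},
    },
    "next": ["s4"],
}
```

With these assumptions render_step(s3) gives "cat -n < s1.txt > s3_mod.txt". A key written as "-help" in the configuration would come out as "--help", matching the two-hyphen rule.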
Finally, the step tag can contain the next key that gives an array of IDs for the next steps in the execution. Typically, these steps depend on the previous step for input.
Practices that are completely bonkers, like spaces in file names, are not supported.
Finally, it is worth noting that YAML may need escaping and quoting to get special characters inside strings. Double quotes around a string work well most of the time. A single quote inside a single quoted string needs to be doubled.
The following example of a Perl one-liner (thanks to Nic Walker for alerting me) could equally well be written using double quotes like this: "'print $F[1]'"
  s6:
    name: perl
    args:
      lane: '''print $F[1]'''
      in:
        type: redir
        value: myfile
      out:
        type: redir
        value: sec_column
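The doubling rule for single quotes is mechanical enough to show in code. The two Python helpers below are hypothetical names introduced for illustration; they cover only the single-quoted style and do not use a YAML library.

```python
def yaml_single_quote(text):
    """Wrap text in single quotes, doubling any single quote inside it."""
    return "'" + text.replace("'", "''") + "'"


def yaml_single_unquote(scalar):
    """Reverse yaml_single_quote for a well-formed single-quoted scalar."""
    if len(scalar) < 2 or scalar[0] != "'" or scalar[-1] != "'":
        raise ValueError("not a single-quoted scalar")
    return scalar[1:-1].replace("''", "'")
```

Quoting the shell argument 'print $F[1]' (including its own quotes) produces the triple-quoted form used for the lane argument in step s6.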
The pipeline does not have to be linear; it can contain branches. For example, the pipeline can have several start points with different kinds of input: file and string.
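A branching pipeline can be pictured as a graph in which each step points to the IDs under its next key. The Python sketch below finds the start points and lists the steps reachable from them. It is an illustration only: the step names are made up, the functions start_points and walk are hypothetical, and the breadth-first order is an assumption rather than the order spipe uses. It also does not check that a step's inputs are ready.

```python
from collections import deque


def start_points(steps):
    """Return IDs of steps that no other step lists under 'next'."""
    targets = {nxt for step in steps.values() for nxt in step.get("next") or []}
    return [step_id for step_id in steps if step_id not in targets]


def walk(steps, start=None):
    """List step IDs reachable from the start points, breadth first, once each."""
    queue = deque([start] if start is not None else start_points(steps))
    seen, order = set(), []
    while queue:
        step_id = queue.popleft()
        if step_id in seen:
            continue  # reached through another branch already
        seen.add(step_id)
        order.append(step_id)
        queue.extend(steps[step_id].get("next") or [])
    return order


# Hypothetical pipeline: two start points that merge in s3, which then branches.
steps = {
    "s1": {"name": "echo", "next": ["s3"]},
    "s2": {"name": "cat", "next": ["s3"]},
    "s3": {"name": "wc", "next": ["s4", "s5"]},
    "s4": {"name": "sort"},
    "s5": {"name": "uniq"},
}
```

Passing a step ID as start, as in walk(steps, "s3"), restricts the listing to that step and everything downstream of it, which is roughly what restarting a pipeline from a given step needs.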
Sometimes it is useful to run the same pipeline with different parameters. The starting point of execution can take a value from the command line. Leave the value for the given argument blank in the configuration file and give it on the command line. Matching of values is done by matching the type string.
spipe -conf input_demo.yml --input=ABC --itype=str

---
description: "Demonstrate input from command line"
name: input.yml
version: '0.1'
steps:
  s1:
    name: echo
    args:
      in:
        type: unnamed
        value:
      out:
        type: redir
        value: s1_string.txt
The empty value will be filled in from the command line into the config.yml stored in the project directory. Also, the config file looks slightly different since the steps are written out as App::Pipeline::Simple objects. Functionally there is no difference.
This pipeline engine has been tested using mostly linear pipelines. Extensive branching and complex dependencies might not work as expected.
There are no explicit tests for the existence of step input files. Scripts are expected to run these tests themselves and die gracefully when appropriate.
There has been no attempt to execute steps in parallel fashion.
If all this is included, this pipeline engine might not be "simple" any more.
App::Pipeline::Simple
Heikki Lehvaslaiho, KAUST (King Abdullah University of Science and Technology).
This software is copyright (c) 2012 by Heikki Lehvaslaiho.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.