Bio::Grid::Run::SGE uses various configuration settings to run a job. All configuration is stored in YAML format. The configuration can be stored in two places: a global config file and a job-specific config file.
The global config file can hold, for example, settings for job notifications and paths to executables.
Bio::Grid::Run::SGE can notify you by email or Jabber message when a job finishes. You can also use a custom script via the script option. An example configuration would be:
    ---
    notify:
      mail:
        dest: person.in.charge@example.com
        smtp_server: smtp.example.com
      jabber:
        jid: grid-report@jabber.example.com/grid_report
        password: ...
        dest: person-in-charge@jabber.example.com
      script: /path/to/log/script.pl
Custom scripts receive a JSON-encoded structure on stdin. The structure has the form:
    {
      "subject": "the subject",
      "message": "the main log message",
      "from": "user@the.cluster.org"
    }
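A custom notification script could consume this structure roughly as follows. This is a minimal sketch that uses only the core JSON::PP module; the helper name format_notification and the one-line output format are illustrative assumptions, not part of Bio::Grid::Run::SGE.

```perl
#!/usr/bin/env perl
# Hypothetical notification script: prints one log line per notification.
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Turn the JSON text received on stdin into a single log line.
# format_notification() is an illustrative helper, not a library function.
sub format_notification {
    my ($json_text) = @_;
    my $n = decode_json($json_text);
    return sprintf "[%s] %s: %s",
        $n->{from}    // 'unknown',
        $n->{subject} // '',
        $n->{message} // '';
}

# Read stdin only when run as a script, and skip empty input.
unless (caller) {
    my $json_text = do { local $/; <STDIN> };
    print format_notification($json_text), "\n"
        if defined $json_text && $json_text =~ /\S/;
}
```

Such a script would then be referenced by its path in the script entry of the notify section.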
You can add other configuration settings. If you run many R scripts, you might want to add the path to the Rscript binary to the global configuration:
    ---
    notify:
      ....
    r_script_bin: /usr/bin/Rscript
This configuration setting is accessible in the cluster script via the configuration supplied to the task function, as $c->{extra}{r_script_bin}.
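As a sketch of how a task might use this setting, the snippet below builds an Rscript command line from a configuration hash. The hash $c here is a hand-made stand-in for the one passed to the task function, and rscript_cmd, analysis.R and input.tsv are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Build the command list for one R script call.
# rscript_cmd() is a hypothetical helper, not part of Bio::Grid::Run::SGE.
sub rscript_cmd {
    my ( $c, $r_script, @script_args ) = @_;

    # Assumption: fall back to "Rscript" on the PATH if the setting is missing.
    my $bin = $c->{extra}{r_script_bin} // 'Rscript';
    return ( $bin, $r_script, @script_args );
}

# Stand-in for the configuration hash supplied to the task function.
my $c   = { extra => { r_script_bin => '/usr/bin/Rscript' } };
my @cmd = rscript_cmd( $c, 'analysis.R', 'input.tsv' );

# The list form can be passed directly to system(@cmd).
print "@cmd\n";
```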
    job_name: NAME
    mode: Consecutive/AvsB/AllvsAll/AllvsAllNoRep
    args: [ '-a', 10, '-b','no' ]
    test: 2
    no_prompt: 1
    num_parts: 3000 # or combinations_per_task 300
    result_dir: result_gff
    working_dir:
    stderr_dir:
    stdout_dir:
    log_dir: dir
    tmp_dir: dir
    idx_dir: dir
    prefix_output_dirs:
If the config file contains relative paths, the following policy is used:
working_dir: The working directory needs to exist.
The input section specifies the type of input data and how each index should be created.
The basic layout is:
    ---
    input:
      - ... # details of index 1
      - ... # details of index 2
Each index element shows up as an argument in the task function:

    run_job(
      ...
      task => sub {
        my ( $c, $result_prefix, $element_index_1, $element_index_2, ... ) = @_;
      }
      ...
    )
The number of indices you can use is determined by the mode. The most basic mode, Consecutive, takes one index and iterates through every element.
Bio::Grid::Run::SGE can run in different iteration modes: Consecutive, AvsB, AllvsAll and AllvsAllNoRep.
    ---
    input:
      - format: General
        #files, list and elements are synonyms
        files:
          - ../03_clean_evidence/result/merged.fa.clean
        chunk_size: 30
        sep: ^>
        sep_remove: 1
        sep_pos: '^'/'$'
        ignore_first_sep: 1
      - format: List
        list: [ 'a', 'b', 'c' ]
      - format: FileList
        files: [ 'filea', 'fileb', 'filec' ]
      - format: Range
        list: [ 'from', 'to' ]
Example configuration:
    'stdout_dir' => '/WORKING_DIR/xml_munge1.tmp/out',
    'test' => '1',
    'no_prompt' => undef,
    'input' => [
      {
        'elements' => [
          '../../2013-10-13_string_b2g_blast/cafa_b2g_blastSTRING_9606_protein.sequences.result/cafa_b2g_blastSTRING_*_protein.sequences.*.blast.gz'
        ],
        'format' => 'FileList',
        'idx_file' => '/WORKING_DIR/idx/xml_munge1.0.idx'
      }
    ],
    'mode' => 'Consecutive',
    'range' => [ '1', '1' ],
    'submit_bin' => 'qsub',
    'submit_params' => [],
    'args' => [],
    'working_dir' => '/WORKING_DIR/test',
    'num_comb' => 564,
    'log_dir' => '/WORKING_DIR/xml_munge1.tmp/log',
    'stderr_dir' => '/WORKING_DIR/xml_munge1.tmp/err',
    'tmp_dir' => '/WORKING_DIR/xml_munge1.tmp',
    'smtp_server' => 'net.wur.nl',
    'job_name' => 'xml_munge1',
    'extra' => { 'map' => '../split_test.map.json.gz' },
    'mail' => 'joachim.bargsten@wur.nl',
    'script_dir' => '/WORKING_DIR/bin',
    'idx_dir' => '/WORKING_DIR/idx',
    'job_cmd' => 'qsub -t 1-1 -S perl -N xml_munge1 -e /WORKING_DIR/xml_munge1.tmp/err -o /WORKING_DIR/xml_munge1.tmp/out /WORKING_DIR/xml_munge1.tmp/env.xml_munge1.pl WORKING_DIR/bin/cl_xml_munge.pl --worker /WORKING_DIR/xml_munge1.tmp/xml_munge1.config.dat',
    'job_id' => '325541.1',
    'cmd' => [ '/WORKING_DIR/bin/cl_xml_munge.pl' ],
    'worker_config_file' => '/WORKING_DIR/xml_munge1.tmp/xml_munge1.config.dat',
    'prefix_output_dirs' => '1',
    'perl_bin' => '/home/cafa/perl5/perlbrew/perls/perl-5.16.3/bin/perl',
    'result_dir' => '/WORKING_DIR/xml_munge1.result',
    'part_size' => 1,
    'num_parts' => 564
Here is a list of reserved configuration options:
    $c = {
      cmd => ...,
      script_dir => ...,
      no_post_task => ...,
      tmp_dir => ...,
      stderr_dir => ...,
      stdout_dir => ...,
      result_dir => ...,
      log_dir => ...,
      idx_dir => ...,
      test => ...,
      mail => ...,
      smtp_server => ...,
      no_prompt => ...,
      lib => ...,
      input => ...,
      extra => ...,
      num_parts => ...,
      combinations_per_task => ...,
      job_name => ...,
      job_id => ...,
      mode => ...,
      worker_config_file => ...,
      worker_env_script => ...,
      submit_bin => ...,
      submit_params => ...,
      perl_bin => ...,
      working_dir => ...,
      iterator => ...,
      args => ...,
    };
    ---
    input:
      - format: General
        #files, list and elements are synonyms
        files:
          - ../03_clean_evidence/result/merged.fa.clean
        chunk_size: 30
        sep: ^>
        sep_remove: 1
        sep_pos: '^'/'$'
        ignore_first_sep: 1
      - format: List
        list: [ 'a', 'b', 'c' ]
      - format: FileList
        files: [ 'filea', 'fileb', 'filec' ]
      - format: Range
        list: [ 'from', 'to' ]
    job_name: NAME
    mode: Consecutive/AvsB/AllvsAll/AllvsAllNoRep
    args: [ '-a', 10, '-b','no' ]
    test: 2
    no_prompt: 1
    num_parts: 3000 # or combinations_per_task 300
    result_dir: result_gff
    working_dir:
    stderr_dir:
    stdout_dir:
    log_dir: dir
    tmp_dir: dir
    idx_dir: dir
    prefix_output_dirs:
The attribute args is special. Normally the main executable is hard-coded in the cl_* script, while the arguments change per configuration. Bio::Grid::Run::SGE::Master therefore provides the convenience attribute $c->{args}.
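The split between a hard-coded executable and per-configuration arguments can be sketched as below. The hash $c is a hand-made stand-in for the configuration passed to the task function; build_cmd, the executable name some_tool, and the -i and -o flags are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Combine a hard-coded executable with the arguments from the configuration.
# build_cmd(), 'some_tool', '-i' and '-o' are illustrative assumptions.
sub build_cmd {
    my ( $c, $input_file, $result_prefix ) = @_;
    return (
        'some_tool',                 # fixed in the cl_* script
        @{ $c->{args} // [] },       # varies per job configuration
        '-i', $input_file,
        '-o', "$result_prefix.out",
    );
}

# Stand-in configuration, using the args value from the job config example.
my $c   = { args => [ '-a', 10, '-b', 'no' ] };
my @cmd = build_cmd( $c, 'chunk.fa', 'result/part1' );

print "@cmd\n";
```

Keeping the arguments in the configuration means the same cl_* script can be reused with different parameter sets without editing its code.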