=head1 Name
HPC::Runner::Usage
=head1 HPC-Runner-Scheduler
=head2 Overview
The HPC::Runner modules are wrappers around GNU Parallel,
Parallel::ForkManager, and MCE/MCE::Queue, and around job submission to a
Slurm or PBS queue.
=head2 Submit your commands
##Submit a job to slurm to spread across nodes
slurmrunner.pl --infile /path/to/fileofcommands --outdir slurmoutput --jobname slurmjob
##Run in parallel using threads on a single node
parallelrunner.pl --infile /path/to/fileofcommands --outdir threadsoutput --procs 4
##Run in parallel using MCE on a single node
mcerunner.pl --infile /path/to/fileofcommands --outdir threadsoutput --procs 4
=head3 Run Your Command
The idea behind the HPC::Runner modules is to run arbitrary bash commands
with proper logging, capturing STDOUT/STDERR and exit status, and, where
possible, to run jobs in parallel with some simple job flow.
The modules are written with Moose, and can be overridden and
extended.
Logging is done with Log::Log4perl.
HPC::Runner is a base class that holds the attributes common to
HPC::Runner::Threads, HPC::Runner::MCE, and HPC::Runner::Slurm. All
three modules share the same philosophy but use different technologies
to implement it. For me this was a workaround so I didn't have to
learn to write MPI scripts, or have every tool rewritten into some
sort of workflow manager.
The different runners each come with an executable script that should
be installed in your path: mcerunner.pl, parallelrunner.pl, and
slurmrunner.pl.
=head1 An In-Depth Look
=head2 Single Node Execution
If you only have a single node to execute on, but still have many
threads/processes available, you can use HPC::Runner::MCE or
HPC::Runner::Threads. Both have job logging and workflow control.
=head2 Example for Runner::Threads and Runner::MCE
An example infile would contain any command that can be executed from
the command line. All the modules have a basic level of workflow
management, meaning you can use the command 'wait' to wait for all
other threads/processes to finish.
In the example directory there is a script called testioselect.pl. It is
taken directly from a PerlMonks thread discussing the proper use of
IPC::Open3, found at http://www.perlmonks.org/?node_id=151886. I based all
of the code for running bash commands on the post by the user abstract,
adding only the logging.
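The snippet below is a minimal sketch in the spirit of that post, not the
actual code of testioselect.pl or the modules: it runs a single command,
captures STDOUT and STDERR separately, and records the exit code, which is
roughly what happens for each logged command. The command string is only a
stand-in.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use IPC::Open3;
    use IO::Select;
    use Symbol 'gensym';

    my $cmd = 'echo hello; echo oops >&2';    # stand-in for a line of your infile
    my $err = gensym;                         # open3 needs a pre-made handle for STDERR
    my $pid = open3( my $in, my $out, $err, $cmd );
    close $in;                                # we send no input

    my $sel = IO::Select->new( $out, $err );
    while ( my @ready = $sel->can_read ) {
        for my $fh (@ready) {
            my $n = sysread( $fh, my $buf, 4096 );
            if ( !$n ) { $sel->remove($fh); next; }    # EOF on this handle
            print( ( $fh == $err ? 'STDERR: ' : 'STDOUT: ' ), $buf );
        }
    }
    waitpid( $pid, 0 );
    print 'exit code: ', $? >> 8, "\n";       # this is what ends up in the logs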
You could create a test_threads/mce.in containing something like the following.
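The exact contents are up to you; the commands below are only illustrative.
Any commands that run from your shell will do, and a lone 'wait' line marks a
synchronization point.

    echo "running job 1" && sleep 2
    echo "running job 2" && sleep 2
    echo "running job 3" && sleep 2
    echo "running job 4" && sleep 2
    wait
    echo "job 5 starts only after jobs 1-4 have finished"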
It is ALWAYS a good idea to use full paths when running arbitrary
scripts. Jobs will always be run from your current working directory, but it's still a good idea!
Then submit it to Runner::MCE or Runner::Threads with one of the following.
#Using MCE
mcerunner.pl --infile test_mce.in --outdir `pwd`/test --procs 4
# OR
# Using Parallel::ForkManager
parallelrunner.pl --infile test_mce.in --outdir `pwd`/test --procs 4
This will create the test directory, write logs for each command detailing
STDOUT/STDERR and the time and date, and run the commands
4 threads/processes at a time.
Each command gets its own log file, as well as a MAIN log file to
detail how the job is running overall.
=head3 Troubleshooting mcerunner and parallelrunner
First of all, make sure your jobs run without the wrapper script.
Runner::Threads/MCE only makes sure your threads/processes start. It
does not make sure your jobs exit successfully, but the exit code will
be in your log.
View your logs in
outdir/prunner_runner_datetime_randomstr/CMD#_datetime, where datetime and
randomstr are filled in per run. These files give you the STDOUT/STDERR.
=head3 Full Path Names
Please give the full path names for all your commands, infiles, and
output directories. If you are executing an arbitrary script, either give
the full path name or make sure the script is in your $ENV{PATH}.
HPC::Runner::Init will do some guessing on the infile and
outdir parameters using File::Spec, but this is no guarantee!
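For the curious, the guessing is along these lines (a hedged sketch, not the
module's actual code): relative paths are resolved against the current
working directory with File::Spec.

    use strict;
    use warnings;
    use File::Spec;

    # Resolve a relative infile/outdir against the current working directory.
    my $infile = File::Spec->rel2abs('test_mce.in');   # e.g. /home/you/test_mce.in
    my $outdir = File::Spec->rel2abs('test');
    print "$infile\n$outdir\n";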
If you are using Runner::Threads, your perl must be compiled with thread
support.
=head2 Multi-Node Job Submission
This documentation was initially written for Slurm, but for the most part the
concepts and requirements are the same across schedulers (Slurm, PBS, SGE, etc).
=head2 HPC::Runner::Slurm Example
HPC::Runner::Slurm adds another layer to HPC::Runner::MCE or
HPC::Runner::Threads by submitting jobs to the queuing system Slurm
(https://computing.llnl.gov/linux/slurm/). Slurm distributes jobs to
different machines, or nodes, across a cluster, where it is common for many
users to share the same space.
When I was first using Slurm I wanted something that would
automatically distribute my jobs across the cluster in a way that would
get them done reasonably quickly. Most of the jobs being submitted were
'embarrassingly parallel' and did not require much of the fine tuning
Slurm is capable of. For most jobs what we wanted was to
take a list of jobs, chop it into pieces, send each piece
to a node, and on that node run those jobs in parallel.
=head3 alljobs.in
job1
job2
job3
job4
#Let's tell mcerunner/parallelrunner/slurmrunner.pl to execute jobs 5-8 AFTER jobs 1-4 have completed
wait
job5
job6
job7
job8
What I want is to take 4 jobs at a time and submit each batch to a node
through Slurm, and I don't want to do all of this manually, for obvious reasons.
=head3 Slurm Template
#!/bin/bash
#SBATCH --share
#SBATCH --get-user-env
#SBATCH --job-name=alljobs_batch1
#SBATCH --output=batch1.out
#SBATCH --partition=bigpartition
#SBATCH --nodelist=node1onbigpartition
#Here are the jobs!
job1
job2
job3
job4
Ok, I don't really want that. I want all the logging, and since those
jobs don't depend on one another I want to run them all in parallel,
because that is what HPC is all about. ;) So instead I run this command,
which uses the script that comes with Runner::Slurm.

slurmrunner.pl --infile `pwd`/alljobs.in --jobname alljobs --outdir `pwd`/alljobs
And have the following template files created and submitted to the
queue.
Although it is not required to supply a jobname or an outdir, it is
strongly recommended, especially if you are submitting multiple jobs.
=head3 Slurm Template with Batched Job
#!/bin/bash
##alljobs_batch1.sh
#SBATCH --share
#SBATCH --get-user-env
#SBATCH --job-name=alljobs_batch1
#SBATCH --output=batch1.out
#SBATCH --partition=bigpartition
#SBATCH --nodelist=node1onbigpartition
#SBATCH --cpus-per-task=4
#Take our jobs, batch them out to a node, and run them in parallel
mcerunner.pl --infile batch1.in --procs 4 --outdir /outdir/we/set/in/slurmrunner.pl
Where batch1.in contains our jobs 1-4. The number given to
--cpus-per-task should be greater than or equal to the maximum number
of threads/processes that are run in parallel (procs). The default
values in HPC::Runner::Slurm are fine, but if you change them make sure
you stick to that rule.
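For example, to run 8 commands at a time on each node you would keep the two
numbers in step. The --procs and --cpus_per_task spellings below are
assumptions based on the modules' attribute names; check the script's help
output for the exact option names in your version.

    slurmrunner.pl --infile `pwd`/alljobs.in --jobname alljobs \
        --outdir `pwd`/alljobs --procs 8 --cpus_per_task 8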
This template and batch1.in are generated by the slurmrunner.pl command,
and the batch is submitted with Slurm job ID 123.
The next job batch is then generated as alljobs_batch2.sh, and we tell
Slurm to start it only after jobs 1-4 exit successfully.
=head3 Slurm Template with Dependency
#!/bin/bash
##alljobs_batch2.sh
#SBATCH --share
#SBATCH --get-user-env
#SBATCH --job-name=alljobs_batch2
#SBATCH --output=batch2.out
#SBATCH --partition=bigpartition
#SBATCH --nodelist=node2onbigpartition
#SBATCH --cpus-per-task=4
#Don't start this job until job 123 exits successfully
#SBATCH --dependency=afterok:123
mcerunner.pl --infile batch2.in --procs 4 --outdir /outdir/we/set/in/slurmrunner.pl
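You can watch the dependency chain resolve with plain squeue; while job 123
is running, the second batch typically sits pending with a Dependency reason
in the NODELIST(REASON) column:

    squeue -u $USER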
=head2 Customizing HPC::Runner::Slurm Input
Since the HPC::Runner modules are written in Moose, they can be
overridden and extended in the usual fashion. Logging is done with
Log::Log4perl, so any of its appenders can be used. The default is to log to
files, but what if you want to log to rsyslog or a database?
=head3 Extend slurmrunner.pl to add your own custom logging
#!/usr/bin/env perl
#slurmrunner_rsyslog.pl
package Main;
use Moose;
extends 'HPC::Runner::Slurm';
use Log::Log4perl;
#Let's override init_log with our own subroutine...
sub init_log {
    my $self = shift;
    my $filename = $self->logdir . "/" . $self->logfile;
    # Send everything to syslog (local1) as well as the usual log file
    my $log_conf = <<EOF;
############################################################
#              Log::Log4perl conf - Syslog                 #
############################################################
log4perl.rootLogger = DEBUG, SYSLOG, FILE
log4perl.appender.SYSLOG = Log::Dispatch::Syslog
log4perl.appender.SYSLOG.min_level = debug
log4perl.appender.SYSLOG.ident = slurmrunner
log4perl.appender.SYSLOG.facility = local1
log4perl.appender.SYSLOG.layout = Log::Log4perl::Layout::SimpleLayout
log4perl.appender.FILE = Log::Log4perl::Appender::File
log4perl.appender.FILE.filename = $filename
log4perl.appender.FILE.mode = append
log4perl.appender.FILE.layout = Log::Log4perl::Layout::PatternLayout
log4perl.appender.FILE.layout.ConversionPattern = %d %p %m %n
EOF
    Log::Log4perl::init( \$log_conf );
    my $log = Log::Log4perl->get_logger();
    return $log;
}
Main->new_with_options->run;
1;
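Because Main simply extends HPC::Runner::Slurm, the wrapper is invoked the
same way as the stock slurmrunner.pl, for example:

    slurmrunner_rsyslog.pl --infile `pwd`/alljobs.in --jobname alljobs --outdir `pwd`/alljobs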
=head2 Troubleshooting Runner::Slurm
Make sure your paths are sourced correctly for Slurm. The easiest way
to do this is to add all your paths to your ~/.bashrc, source it, and add
the line
#SBATCH --get-user-env
to your submit script. By default this is already placed in the
template, but if you decide to supply your own template you may want to
add it.
If you are submitting a script that is not in your path, you probably
want to give the full pathname for it, especially if supplying the
outdir option. In general I think it's always best to give the full
pathname.
If you are in the directory already and submitting from bash, just use
backticks around pwd, as in the example below.
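For example, from the directory that holds the infile (myjobs.in here is only
a placeholder name):

    slurmrunner.pl --infile `pwd`/myjobs.in --outdir `pwd`/myjobs --jobname myjobs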
Another common error is 'This node configuration is not available'.
This could mean several things.
1. The node is down at the time of job submission
2. You are asking for more resources on a node than it has. If you ask for --cpus-per-task=32 and the node only has 16 CPUs, you will get this error.
3. You misspelled the partition or nodename.
Point 2 will be improved upon in the next release, which will query Slurm
for the number of CPUs available on a node at the time of submission.
For now it must be set manually with --cpus-per-task.
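Until then, you can check what a node actually reports before asking for it;
one way is sinfo's node-oriented long listing, which includes a CPUS column:

    sinfo --Node --long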
=head2 Authors and Contributors
Jillian Rowe in collaboration with the ITS Advanced Computing Team at
Weill Cornell Medical College in Qatar.
=head2 Acknowledgements
This module was originally developed at and for Weill Cornell Medical
College in Qatar. With approval from WCMC-Q, this information was
generalized and put on GitHub, for which the authors would like to
express their gratitude. Thanks also go to all the HPC users at WCMC-Q,
who gave their input.
The continued development of the HPC::Runner modules is supported by NYUAD, at
the Center for Genomics and Systems Biology.
=cut