NAME
Net::Hadoop::Oozie
VERSION
version 0.101
SYNOPSIS
use Net::Hadoop::Oozie;
my $oozie = Net::Hadoop::Oozie->new( %options );
DESCRIPTION
This module is a Perl interface to the Oozie REST service endpoints. It also includes utility methods for bulk requests and some admin functionality.
ACCESSORS
action
api_version
doas
filter
The submission format is filter_key1=filter_value1;filter_key2=...; but the filters are defined as a hash.

    filter => {
        status => ...,
    }
The valid filters are listed below.
- name
  The application name from the workflow/coordinator/bundle definition.
- user
  The user that submitted the job.
- group
  The group for the job.
- status
  The status of the job.
You need to consider a certain behavior when using filters:
The query will do an AND among all the filter names.
The query will do an OR among all the filter values for the same name.
Multiple values must be specified as different name value pairs.
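As an illustration of the rules above, here is a minimal sketch of how a filter hash could be turned into the submission format. The serialize_filter function is hypothetical and not part of this module, and accepting an array reference for several values of the same name is an assumption made for the example. A real request would also need the resulting string to be URI-escaped.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Net::Hadoop::Oozie: turn a filter hash
# into the "name=value;name=value" string described above.
sub serialize_filter {
    my ($filter) = @_;
    my @pairs;
    for my $name (sort keys %{$filter}) {
        my $value = $filter->{$name};
        # Assumption: an array reference holds several values for one name.
        my @values = ref($value) eq 'ARRAY' ? @{$value} : ($value);
        # Each value becomes its own name=value pair.
        push @pairs, map { "$name=$_" } @values;
    }
    return join ';', @pairs;
}

my $filter_string = serialize_filter({
    user   => 'alice',
    status => [ 'RUNNING', 'SUSPENDED' ],
});
print "$filter_string\n";   # status=RUNNING;status=SUSPENDED;user=alice
```

With this string the service would match jobs submitted by alice AND whose status is RUNNING OR SUSPENDED.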
jobtype
The Oozie documentation lists workflow, coordinator and bundle, but in CDH 4.4 the valid values are '', 'coordinators' and 'bundles'. The workflows and coordinators methods are helper functions that set these values behind the scenes.
len
Defaults to 50.
offset
Defaults to 1.
order
Can be asc or desc; the default is asc. For instance, when used on a coordinator in a job call, desc puts the len most recent actions in the actions key, most recent first; the offset is then applied from the end of the list.
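The interaction of len, offset and order can be pictured with a small local simulation. This is only a sketch of the behaviour described above, not Oozie's implementation: page_actions is a hypothetical function, and treating offset as 1-based is an assumption drawn from its default of 1.

```perl
use strict;
use warnings;

# Hypothetical simulation of len/offset/order applied to a list of actions.
sub page_actions {
    my ($actions, %opt) = @_;
    my $len    = $opt{len}    // 50;      # same defaults as the accessors
    my $offset = $opt{offset} // 1;
    my $order  = $opt{order}  // 'asc';

    # With desc, the list is reversed first, so the offset counts from the end.
    my @list = $order eq 'desc' ? reverse(@{$actions}) : @{$actions};

    my $first = $offset - 1;              # assumption: offset is 1-based
    $first = 0 if $first < 0;
    return [] if $first > $#list;
    my $last = $first + $len - 1;
    $last = $#list if $last > $#list;
    return [ @list[ $first .. $last ] ];
}

my @actions = map { "action\@$_" } 1 .. 10;
my $recent  = page_actions(\@actions, len => 3, order => 'desc');
print join(', ', @{$recent}), "\n";   # action@10, action@9, action@8
```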
show
METHODS
END POINTS
admin
build_version
coord_rerun
coordinators
job
jobs
kill
submit_job
For details about job submission through REST, see https://oozie.apache.org/docs/4.0.0/WebServicesAPI.html#Job_Submission.
Required parameters are listed below.
oozie.wf.application.path
The workflow path, for example /oozie_workflows/myworkflow. The workflow must already be deployed there.
appName
The name given to this specific instance. It can be anything you want.
Optional parameters are listed below.
- Auto variables
  If you want a variable interpolated in your script (a date, an integer, or anything else), pass it in the options you call the method with. If you pass foo => 'bar', you can use it inside the workflow as ${foo}.
- Configuration properties
  Parameters that are useful to Oozie itself (such as the queue name) need, as far as I can tell, an extra level of handling. They can be set dynamically, but they require a tweak in the workflow definition itself, in the top config section. For instance, to specify mapreduce.job.queuename and assign the tasks to a specific fair scheduler queue, declare it in the global configuration section, like this:

      <property>
          <name>mapreduce.job.queuename</name>
          <value>${queueName}</value>
      </property>

  Then call submit_job() with this added to the options hash:

      queueName => "root.<queue name>"
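The Oozie REST API accepts job submissions as an XML configuration document made of property name/value pairs, which is how options such as foo or queueName become available to the workflow. The sketch below shows one way an options hash could be rendered into that form. build_configuration and xml_escape are hypothetical helpers, not this module's code, and the queue name root.myqueue is a made-up value.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: render an options hash as an Oozie-style
# <configuration> document, one <property> per option.
sub build_configuration {
    my (%options) = @_;
    my $xml = "<configuration>\n";
    for my $name (sort keys %options) {
        $xml .= sprintf "  <property><name>%s</name><value>%s</value></property>\n",
            xml_escape($name), xml_escape($options{$name});
    }
    return $xml . "</configuration>\n";
}

print build_configuration(
    'oozie.wf.application.path' => '/oozie_workflows/myworkflow',
    appName   => 'job1',
    foo       => 'bar',            # usable as ${foo} in the workflow
    queueName => 'root.myqueue',   # made-up queue name
);
```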
This method returns a job ID that you can pass directly to the job(<jobId>) method above to query the job status. You can therefore launch a job from a script and have a loop query the job status at regular intervals (be nice, please) to check when it is done. The following code is untested:
    use strict;
    use warnings;
    use Net::Hadoop::Oozie;

    my $oozie = Net::Hadoop::Oozie->new;
    my $job_params = [
        { appName => 'job1', myParam => 'foo' },
        { appName => 'job2', myParam => 'bar' },
        # ...
    ];

    # submit every job and remember its ID
    my @ids;
    for my $job (@$job_params) {
        my $jobid = $oozie->submit_job({
            myParam => $job->{myParam},
            debug   => 0, # set to 1 to print the job config and response
            appName => $job->{appName},
            'oozie.wf.application.path' => "/wf_base_path/<workflow name>/",
        });
        push @ids, $jobid;
    }

    # poll until every job has left the in-progress states
    while (defined(my $jobid = shift @ids)) {
        my $status = $oozie->job($jobid)->{status};
        if ($status =~ /(?:WAITING|READY|SUBMITTED|RUNNING)/) {
            push @ids, $jobid; # put back in the queue
            sleep 10;          # or more, how about 60?
            next;              # do not treat a running job as failed
        }
        # what do you want to do if not succeeded?
        if ($status !~ /SUCCEEDED/) {
            die "job $jobid died";
        }
    }
workflows
UTILITY METHODS
active_coordinators
active_job_paths
failed_workflows_last_n_hours
failed_workflows_last_n_hours_pretty
job_exists
This is a sugar interface on top of the job method. Normally the REST interface just dies with an HTTP 400 message on missing jobs. This method won't die and will return the data set if there is a proper response from the service. It will return false otherwise.
    if ( my $job = $oozie->job_exists( $id ) ) {
        # do something
    }
    else {
        warn "No such job: $id";
    }
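The general pattern behind such a non-dying wrapper can be sketched with eval. The code below does not show this module's implementation: My::FakeOozie is a stand-in client invented for the example so that it runs without an Oozie server, and job_or_false is a hypothetical name.

```perl
use strict;
use warnings;

# Stand-in client, for illustration only: job() dies on unknown IDs,
# mimicking the HTTP 400 behaviour described above.
package My::FakeOozie;

sub new {
    my ($class, %jobs) = @_;
    return bless { jobs => {%jobs} }, $class;
}

sub job {
    my ($self, $id) = @_;
    my $job = $self->{jobs}{$id} or die "HTTP 400: no such job $id\n";
    return $job;
}

package main;

# Hypothetical wrapper: trap the exception and return false instead.
sub job_or_false {
    my ($client, $id) = @_;
    my $job = eval { $client->job($id) };
    return $job ? $job : 0;
}

my $client = My::FakeOozie->new( 'wf-1' => { status => 'RUNNING' } );
for my $id ('wf-1', 'wf-2') {
    if ( my $job = job_or_false($client, $id) ) {
        print "$id is $job->{status}\n";
    }
    else {
        print "No such job: $id\n";
    }
}
```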
standalone_active_workflows
Returns an arrayref of standalone workflows (as in jobs not attached to a coordinator):
    my $wfs_without_a_coordinator = $oozie->standalone_active_workflows;
    foreach my $wf ( @{ $wfs_without_a_coordinator } ) {
        # do something
    }
suspended_workflows
Returns an arrayref of suspended workflows:
    my $suspended = $oozie->suspended_workflows;
    foreach my $wf ( @{ $suspended } ) {
        # do something
    }
coordinators_with_the_same_appname_on_the_same_path
Returns a hash of the application names that are shared by multiple coordinators on the same path. Having coordinators like this is usually a user error when submitting jobs.

    my %offenders = $oozie->coordinators_with_the_same_appname_on_the_same_path;
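The idea of detecting such duplicates can be sketched as a simple grouping step. This is an illustration under assumptions, not the module's code: find_duplicate_coordinators is hypothetical, the id, name and path fields are invented record keys, and the shape of the hash returned by the real method is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: group coordinator records by application name and
# path, then keep only the groups that contain more than one coordinator.
sub find_duplicate_coordinators {
    my (@coordinators) = @_;
    my %ids_by_key;
    for my $coord (@coordinators) {
        my $key = join '@', $coord->{name}, $coord->{path};
        push @{ $ids_by_key{$key} }, $coord->{id};
    }
    return map  { ( $_ => $ids_by_key{$_} ) }
           grep { @{ $ids_by_key{$_} } > 1 }
           keys %ids_by_key;
}

my %offenders = find_duplicate_coordinators(
    { id => 'C-1', name => 'daily_report', path => '/apps/daily'  },
    { id => 'C-2', name => 'daily_report', path => '/apps/daily'  },
    { id => 'C-3', name => 'hourly_sync',  path => '/apps/hourly' },
);
for my $key (sort keys %offenders) {
    # prints: daily_report@/apps/daily: C-1 C-2
    printf "%s: %s\n", $key, join(' ', @{ $offenders{$key} });
}
```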
AUTHORS
Burak Gursoy
burak@cpan.org
David Morel
david.morel@amakuru.net
Eric Herman
eric@freesa.org
Rafael Garcia-Suarez
rgarciasuarez@gmail.com
COPYRIGHT AND LICENSE
This software is copyright (c) 2016 by David Morel & Booking.com.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.