NAME

        HPCI::Group

SYNOPSIS

Role for building a cluster-specific driver for a group of stages. This should only be used internally to the HPCI module - code that uses this driver will not load this module (or the driver module) explicitly.

It describes the user interface for a generic group, hiding (as much as possible) the specifics of the actual cluster that is being used. The driver module that consumes this role will arrange to translate the generic interface into the particular interface conventions of the specific cluster that it accesses.

An (internally defined) cluster-specific group object is defined with:

    package HPCD::$cluster::Group;
    use Moose;

    ### required method definitions

    with 'HPCI::Group' => { StageClass => 'HPCD::$cluster::Stage' },
        # any other roles required ...
        ;

    ### cluster-specific method definitions, if any ...

DESCRIPTION

This role provides the generic interface for a group object which can configure and run a collection of stages (jobs) on machines in a cluster. It is written to be independent of the specifics of any particular cluster interface. The cluster-specific module that consumes this role is not accessed directly by the user program. Instead, the program calls the "class method" HPCI->group (with an appropriate cluster argument), which builds and returns a group driver object of the matching cluster-specific type.

ATTRIBUTES

cluster

The type of cluster that will be used to execute the group of stages. This value is passed on by the HPCI->group method when it creates a new group. Since it also uses that value to select the type of group object that is created, it is somewhat redundant.

name (optional)

The name of this group of stages. Defaults to 'default_group_name'.

_unique_name (internal)

The name of this group of stages. Used as the default value for the group_dir attribute.

base_dir (optional)

The directory that will contain all generated output (unless that output is specifically directed to some other location). The default is the current directory.

group_dir (optional)

The directory which will contain all output pertaining to the entire group. By default, this is a new directory under base_dir which is given a name combining the name of the group and the timestamp when the group was created (e.g. EXAMPLEGROUP-YYMMDD-HHMMSS).
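
As a rough illustration of the default naming pattern described above, the following sketch builds a directory name of the form NAME-YYMMDD-HHMMSS. The helper default_group_dir is hypothetical and is not part of HPCI; the role's real implementation may differ (for example in which timezone it uses).

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Spec;

# Hypothetical helper (not part of HPCI): combine a base directory, a group
# name and a creation time into a path such as base/EXAMPLEGROUP-240131-235959.
sub default_group_dir {
    my ( $base_dir, $name, $epoch ) = @_;
    my $stamp = strftime( '%y%m%d-%H%M%S', localtime($epoch) );
    return File::Spec->catdir( $base_dir, "$name-$stamp" );
}

print default_group_dir( '.', 'EXAMPLEGROUP', time ), "\n";
```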

max_concurrent (optional)

The maximum number of stages to be running concurrently. If 0 (which is the default), then there is no limit applied directly by HPCI (although the underlying cluster-specific driver might apply limits of its own).

stage_defaults

This attribute can be given a hash reference containing values that will be passed to every stage created.

status (provided internally)

After the execute method has been called, this attribute contains the return result from the execution. This is a hash (indexed by stage name). The value for each stage is an array of return statuses. (Usually, this array has only one element, but there will be more if the stage was retried. The final element of the array is almost always the one that you wish to look at.) Each return status is a hash; it will always contain the key 'exit_status', giving the exit status of the stage. Additional entries will be found in the hash for cluster-specific return results. Thus, to check the exit status of a particular stage you would code either:

    $result = $group->execute;
    if ($result->{SOMESTAGENAME}[-1]{exit_status}) {
        die "SOMESTAGENAME failed!";
    }

or:

    $group->execute;
    # ...
    if ($group->status->{SOMESTAGENAME}[-1]{exit_status}) {
        die "SOMESTAGENAME failed!";
    }
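
To check every stage rather than one named stage, the same structure can be walked in a loop. The sketch below runs against a hand-built hash in the documented shape, so that it works without a cluster; the helper failed_stages and the stage names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the stages whose final attempt
# reported a non-zero exit_status.
sub failed_stages {
    my ($status) = @_;
    my @failed = grep { $status->{$_}[-1]{exit_status} } sort keys %$status;
    return @failed;
}

# Hand-built data in the documented shape; the stage names are made up.
my $status = {
    align  => [ { exit_status => 0 } ],
    call   => [ { exit_status => 1 }, { exit_status => 0 } ],  # retried, then succeeded
    report => [ { exit_status => 2 } ],
};

my @failed = failed_stages($status);
print "failed: @failed\n";    # prints "failed: report"
```

With a real group, the hash returned by $group->execute (or $group->status) would be passed in place of the hand-built one.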

METHODS

$group->stage( name=>'stagename', ... )

Creates a stage and adds it to the group. See HPCI::Stage for the generic parameters you may provide for a stage; and see HPCD::$cluster::Stage for the cluster-specific parameters for the actual type of cluster you are using.

Note: this is the only way to add a stage object to the group. In particular, you cannot create a stage object separately and add it to the group. This is done to ensure that the created stage object is consistent with the actual group object, and that you don't have to change code in multiple places if you switch to using a different cluster type for the group. (If you want to mix stages for multiple cluster types within your program, you should either create two groups that execute independently, or else create a stage that itself creates a group and manages the stages for the second type of cluster.)

The name parameter is required and must be unique - two stages within the same group may not have the same name.

The method returns the stage object that was created, although most code will not need it directly. (Whenever you need to refer to a stage to add dependencies, you can use its name instead of a reference to the object.)

$group->add_deps

    $group->add_deps(
        dep      => 'a_dep',                      ## one of these two
        deps     => ['dep1', 'dep2', ...],
        pre_req  => 'a_pre_req',                  ## and one of these two
        pre_reqs => ['pre_req1', 'pre_req2', ...],
    );

The add_deps method marks the pre_req (or all of the pre_reqs) as being prerequisites of the dep (or all of the deps). When the group is executed, stages may be run in parallel, but a dependent stage will not be permitted to start executing until all of its prerequisite stages have completed successfully.

It is permitted to list the same dependency multiple times. This can be convenient in that you do not need to be careful about providing non-overlapping groups when you specify sets of prerequisites.

So, you could write:

    $group->add_deps( pre_req=>'stage1', deps=>[qw(stage2 stage3)] );
    $group->add_deps( pre_reqs=>[qw(stage1 stage2)], dep=>'stage3' );

instead of:

    $group->add_deps( pre_req=>'stage1', deps=>[qw(stage2 stage3)] );
    $group->add_deps( pre_req=>'stage2', dep=>'stage3' );

or:

    $group->add_deps( pre_req=>'stage1', dep=>'stage2' );
    $group->add_deps( pre_req=>'stage2', dep=>'stage3' );

All three forms will provide the same ordering. The last is clearer for this simple sequence, but when there are many stages it may be easier to specify collections of dependencies at once.
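
The equivalence of the three forms can be checked with a small model. The sketch below does not use HPCI: add_deps_model and implied_order are hypothetical helpers that record each (pre_req, dep) pair in a hash, so repeated pairs collapse, and then list every ordering constraint that follows from those pairs, including indirect ones.

```perl
use strict;
use warnings;

# Hypothetical model of the add_deps bookkeeping: every pre_req/pre_reqs entry
# becomes a prerequisite of every dep/deps entry. Repeated pairs are harmless.
sub add_deps_model {
    my ( $graph, %args ) = @_;
    my @pre_reqs = grep { defined } $args{pre_req}, @{ $args{pre_reqs} || [] };
    my @deps     = grep { defined } $args{dep},     @{ $args{deps}     || [] };
    for my $pre (@pre_reqs) {
        $graph->{$pre}{$_} = 1 for @deps;
    }
    return $graph;
}

# Every ordering constraint implied by the graph, direct or indirect,
# as a sorted string of "before>after" items.
sub implied_order {
    my ($graph) = @_;
    my %seen;
    for my $start ( keys %$graph ) {
        my @todo = keys %{ $graph->{$start} };
        while ( defined( my $node = shift @todo ) ) {
            next if $seen{"$start>$node"}++;
            push @todo, keys %{ $graph->{$node} || {} };
        }
    }
    return join ' ', sort keys %seen;
}

# The three forms from the text, each as a list of add_deps argument lists.
my @forms = (
    [ [ pre_req  => 'stage1', deps => [qw(stage2 stage3)] ],
      [ pre_reqs => [qw(stage1 stage2)], dep => 'stage3' ] ],
    [ [ pre_req  => 'stage1', deps => [qw(stage2 stage3)] ],
      [ pre_req  => 'stage2', dep => 'stage3' ] ],
    [ [ pre_req  => 'stage1', dep => 'stage2' ],
      [ pre_req  => 'stage2', dep => 'stage3' ] ],
);

for my $form (@forms) {
    my $graph = {};
    add_deps_model( $graph, @$_ ) for @$form;
    print implied_order($graph), "\n";
}
```

Each of the three lines printed is "stage1>stage2 stage1>stage3 stage2>stage3": the third form never names the stage1 to stage3 pair directly, but it follows from the other two.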

However, you must be careful to avoid dependency loops. A loop is a chain of dependent stages that includes the same stage more than once (stage1 -> stage2 -> stage1). Since a dependency indicates that the prerequisite stage must finish executing before the dependent stage can start executing, this loop would mean that stage1 cannot start until stage2 has completed, but also that stage2 cannot start until stage1 has completed. So neither one can ever start, and neither will ever complete.

Such a loop will eventually be detected, when the group has reached a point where there are no stages running, and no stages can be started - but there could have been a lot of time wasted executing stages that were not part of the loop before this is noticed and the run aborted.
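
Because a loop is only noticed at run time, it can be worth checking a dependency list before calling execute. The sketch below is one possible pre-flight check and is not part of HPCI: the hypothetical helper stuck_stages takes [pre_req, dep] pairs, repeatedly releases stages that have no outstanding prerequisites, and returns whatever can never be released.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not part of HPCI): given [pre_req, dep]
# pairs, return the stages that can never start because they sit on a
# dependency loop or downstream of one.
sub stuck_stages {
    my (@edges) = @_;
    my ( %waiting_on, %dependents );
    for my $edge (@edges) {
        my ( $pre, $dep ) = @$edge;
        $waiting_on{$pre} //= 0;
        next if $dependents{$pre}{$dep}++;    # ignore repeated pairs
        $waiting_on{$dep}++;
    }

    # Release every stage whose prerequisites have all been released.
    my @ready = grep { $waiting_on{$_} == 0 } keys %waiting_on;
    while ( defined( my $stage = shift @ready ) ) {
        delete $waiting_on{$stage};
        for my $dep ( keys %{ $dependents{$stage} || {} } ) {
            push @ready, $dep if --$waiting_on{$dep} == 0;
        }
    }
    return sort keys %waiting_on;             # whatever is left is stuck
}

my @stuck = stuck_stages(
    [ 'stage1', 'stage2' ],
    [ 'stage2', 'stage1' ],    # closes the loop
    [ 'stage2', 'stage3' ],    # downstream of the loop
    [ 'prep',   'stage4' ],    # unrelated, can run normally
);
print "cannot start: @stuck\n";    # prints "cannot start: stage1 stage2 stage3"
```

An empty result means the pairs contain no loop. The stage names here are made up, and the pairs would have to be collected by the calling program alongside its add_deps calls.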

Each stage argument passed can be either a reference to the stage object or the name of the stage.

$group->execute

Execute the stages in the group. Does not return until every stage has completed or has been skipped because some other stage failed, or until the run has been aborted.

AUTHOR

Christopher Lalansingh - Boutros Lab

John Macdonald - Boutros Lab

ACKNOWLEDGEMENTS

Paul Boutros, PhD, PI - Boutros Lab

The Ontario Institute for Cancer Research