NAME

sbalance - Print Slurm allocation limits, usage, and balance

SYNOPSIS

sbalance [--account ACCOUNT] [--user USER] [--all [--nosuppress0]] [--machine [--noheaders]] [--help] [--man] [--version] [--verbose]

OPTIONS

--account ACCOUNT : display information for this allocation account. May be repeated.
--user USER : display information for the allocation accounts this user has access to. May be repeated.
--all : Display per user information for all users of the allocation account.
--nosuppress0 : When displaying user information for all users of the allocation account, include users with 0 usage.
--machine : Produce machine parsable output
--noheaders : When producing machine parsable output, omit headers.
--help : Will print basic usage instructions
--man : Will print full man page
--verbose : Print more information. In particular, print sshare commands being executed. Can abbreviate as -v.
--version : Will print version information. Can abbreviate as -V.
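
As an illustration of how these options combine, here is a minimal Python sketch that assembles an sbalance command line. The helper name build_sbalance_argv and its parameters are hypothetical; only the flags themselves come from the list above. It assumes --nosuppress0 is only meaningful together with --all, and --noheaders only together with --machine, as the SYNOPSIS suggests.

```python
def build_sbalance_argv(accounts=(), users=(), all_users=False,
                        include_zero=False, machine=False, headers=True):
    """Build an argument list for sbalance (hypothetical helper).

    Only flags documented in OPTIONS are used.
    """
    argv = ["sbalance"]
    # --account and --user may each be repeated.
    for account in accounts:
        argv += ["--account", account]
    for user in users:
        argv += ["--user", user]
    if all_users:
        argv.append("--all")
        # --nosuppress0 only applies when listing all users.
        if include_zero:
            argv.append("--nosuppress0")
    if machine:
        argv.append("--machine")
        # --noheaders only applies to machine-parsable output.
        if not headers:
            argv.append("--noheaders")
    return argv


# "physics" is a made-up account name used only for illustration.
example = build_sbalance_argv(accounts=["physics"], all_users=True,
                              machine=True, headers=False)
```

The resulting list could be handed to subprocess.run on a host where sbalance is installed.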

DESCRIPTION

Displays Slurm usage and allocation account balance for specified allocation accounts. ACCOUNT may be repeated. If no ACCOUNTs are given, defaults to all allocation accounts the specified USER (or user running the script if no ACCOUNT or USER specified) has access to.

If one or more USERs are given, also prints how much each specified user has used. (If neither a user nor an account is specified, the information for the user running the script is displayed.) The --all flag prints usage for all users with associations for the allocation account. If --nosuppress0 is given, users are printed even if they have 0 usage.

Normally, information is printed in a user-friendly fashion, in units of kSU (1000 CPU-hours). If --machine is given, lines are printed in a machine-parsable format, with units as per sshare output (limit and unused are in CPU-minutes, used is in CPU-seconds). Header lines are included, prefixed by a #, unless --noheaders is given.
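
The relationship between the two sets of units can be sketched in Python as follows. This is only an illustration of the arithmetic, not the script's actual code; the function name summarize and the sample numbers are made up.

```python
# 1 kSU = 1000 CPU-hours, so:
CPU_MINUTES_PER_KSU = 1000 * 60      # 60,000 CPU-minutes
CPU_SECONDS_PER_KSU = 1000 * 3600    # 3,600,000 CPU-seconds


def summarize(limit_cpu_min, used_cpu_sec):
    """Convert sshare-style units to kSU (hypothetical helper).

    Returns (limit, unused, used, percent_used), the first three in kSU.
    percent_used is None when the limit is zero.
    """
    limit_ksu = limit_cpu_min / CPU_MINUTES_PER_KSU
    used_ksu = used_cpu_sec / CPU_SECONDS_PER_KSU
    unused_ksu = limit_ksu - used_ksu
    percent = 100.0 * used_ksu / limit_ksu if limit_ksu else None
    return limit_ksu, unused_ksu, used_ksu, percent


# Made-up figures: a 6,000,000 CPU-minute limit with 90,000,000 CPU-seconds used.
limit, unused, used, pct = summarize(6_000_000, 90_000_000)
```

With these sample figures the limit is 100 kSU, of which 25 kSU (25%) is used and 75 kSU remains.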

Explanation of output (normal display mode)

This section describes in detail the output in the normal mode of display (i.e. no --machine flag). For each allocation account displayed, the following is shown:

Account:

Followed by the name of the allocation account, with the cluster in parentheses (or DEFAULT if the default cluster for the sshare command is used).

Limit:

This is the GrpCPUMins (or the cpu member of GrpTRESMins) for the main association for this allocation account. This is displayed in units of kSU (1 kSU = 1000 SU = 1000 CPU-core-hours).

Unused:

This is the difference between the Limit and Used values, in kSU.

Used:

This is the number of SUs consumed by all jobs charged against the allocation account since the usage was last reset (or from the creation of the allocation). This is in kSU. In parentheses is displayed the same value, as a percentage of Limit above (assuming Limit is nonzero).

This will be followed by zero or more lines showing usage for individual users against this allocation account. These are only shown if a specific list of users is requested, or the --all flag is given, or neither users nor accounts were specified (which is treated as if --user was given for the user running the script). Even when one of these conditions is met, only users with non-zero usage are displayed unless the --nosuppress0 flag is given.

For each user to be displayed, an indented line of the form

        User USERNAME used X kSU ( Y % of total usage)

is displayed.

Here, USERNAME is the username of the user, and X is the number of kSU consumed by jobs run by that user and charged against this allocation account. Y is the same amount expressed as a percentage of the total usage of the allocation.
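
The per-user lines described above can be sketched in Python as follows. This is a rough illustration under assumptions: the helper name user_lines, the one-decimal number formatting, the alphabetical ordering, and the usernames are all made up, and the real script may format and order its output differently.

```python
def user_lines(usage_by_user_ksu, total_used_ksu, suppress_zero=True):
    """Format per-user usage lines (hypothetical helper).

    usage_by_user_ksu maps username -> kSU used by that user.
    suppress_zero=False mimics the effect of --nosuppress0.
    """
    lines = []
    for user, used in sorted(usage_by_user_ksu.items()):
        if suppress_zero and used == 0:
            continue  # skip users with no usage by default
        # Percentage is relative to total usage, not to the limit.
        pct = 100.0 * used / total_used_ksu if total_used_ksu else 0.0
        lines.append(
            f"        User {user} used {used:.1f} kSU ( {pct:.1f} % of total usage)"
        )
    return lines


# Made-up usernames and figures for illustration only.
usage = {"alice": 20.0, "bob": 5.0, "carol": 0.0}
default_lines = user_lines(usage, total_used_ksu=25.0)
all_lines = user_lines(usage, total_used_ksu=25.0, suppress_zero=False)
```

In this sample, carol appears only in all_lines, since her usage is zero.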

Explanation of output (machine-parsable display mode)

This section describes in detail the output in the machine-parsable display mode (i.e. with the --machine flag). In this mode, one line (preceded by a header line unless --noheaders is given) is printed for each allocation account with the general usage information for the allocation account, possibly followed by another header line (unless --noheaders is given) and one line for each user for which usage information is displayed. The header line before the user usage information is only printed if there is at least one user usage line printed.

All header lines start with the # character, and are the only lines starting with that, so they can be deleted easily (as well as omitted altogether by giving the --noheaders flag).

The field separators for all lines are the colon (:) character.

The main allocation account usage lines have the following fields, in order:

cluster: name of the cluster, or DEFAULT if using the sshare default cluster.
account: the name of the allocation account
limit: the GrpCPUMins/GrpTRESMins cpu member limit, in CPU-minutes.
used: the RawUsage for the association, in CPU-seconds.
unused: the difference between limit and used, in CPU-minutes.

The user usage lines have the following fields, in order:

cluster: name of the cluster, as above.
account: the name of the allocation account, as above.
username: the username of the user.
used: the RawUsage for this user/allocation account, in CPU-seconds.
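
A consumer of this format might look like the following Python sketch. It is not part of sbalance. It assumes that account lines and user lines can be told apart by their field count (five versus four), which follows from the field lists above but is not stated explicitly, and that numeric fields parse as numbers. The function name and the sample text, including the wording of its header lines, are made up for illustration.

```python
def parse_sbalance_machine(text):
    """Parse --machine output into (accounts, users) lists of dicts.

    Hypothetical helper: distinguishes line types by field count.
    """
    accounts, users = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header lines are the only ones starting with '#'
        fields = line.split(":")
        if len(fields) == 5:
            cluster, account, limit, used, unused = fields
            accounts.append({
                "cluster": cluster,
                "account": account,
                "limit_cpu_min": float(limit),
                "used_cpu_sec": float(used),
                "unused_cpu_min": float(unused),
            })
        elif len(fields) == 4:
            cluster, account, username, used = fields
            users.append({
                "cluster": cluster,
                "account": account,
                "username": username,
                "used_cpu_sec": float(used),
            })
        else:
            raise ValueError(f"unexpected field count in line: {line!r}")
    return accounts, users


# Made-up sample; real header text and values will differ.
SAMPLE = """\
#cluster:account:limit:used:unused
DEFAULT:physics:6000000:90000000:4500000
#cluster:account:username:used
DEFAULT:physics:alice:72000000
DEFAULT:physics:bob:18000000
"""

accounts, users = parse_sbalance_machine(SAMPLE)
```

Because header lines are skipped, the same parser works whether or not --noheaders was given.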

KNOWN ISSUES

This currently only supports a fairly simple (but also believed to be fairly common) case where allocation account limits are set in the GrpCPUMins (or cpu member of GrpTRESMins) in the main association for the allocation account, and there are no limits imposed on individual user associations.

For TRES-enabled Slurm versions, we currently only display balances related to the cpu resource, and that assumes that TRESBillingWeights is set to "cpu=1.0" (the default). These limitations are at least in part due to the sshare command not being able to display the usage_tres_raw information, so we report against the RawUsage output of sshare, which returns the "billable" (i.e. processed by TRESBillingWeights) version of the usage.

It is planned to add another field, Available, giving the difference between Unused and the estimated CPU-minutes (or other TRES resources) needed to complete all currently running jobs for that allocation account. When AccountingStorageEnforce is set to safe, this is the actual criterion used to determine whether there are sufficient "funds" in the allocation account to allow a job to start, and is probably what the user really wants to see.
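
The arithmetic behind the planned Available field might look like the following Python sketch. Since the feature is not implemented, everything here is an assumption: the function name, the representation of running jobs as (CPU count, remaining minutes) pairs, and the estimate of each job's remaining cost as CPUs multiplied by remaining minutes.

```python
def available_cpu_minutes(unused_cpu_min, running_jobs):
    """Estimate Available = Unused - CPU-minutes still needed by running jobs.

    Hypothetical helper. running_jobs is an iterable of
    (cpus, remaining_minutes) pairs; the remaining cost of each job is
    assumed to be cpus * remaining_minutes.
    """
    reserved = sum(cpus * remaining_min for cpus, remaining_min in running_jobs)
    return unused_cpu_min - reserved


# Made-up example: two running jobs against 4,500,000 unused CPU-minutes.
jobs = [(16, 120), (4, 600)]   # 16 CPUs for 2 more hours, 4 CPUs for 10 more hours
available = available_cpu_minutes(4_500_000, jobs)
```

Here the running jobs would still need 16*120 + 4*600 = 4320 CPU-minutes, leaving 4,495,680 CPU-minutes available.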

AUTHOR

Tom Payerle, <payerle@umd.edu>

Copyright (c) University of Maryland, 2014-2015. All rights reserved.