Submitting jobs¶
Preface¶
This section covers submitting jobs to the UTHPC Center's Rocket cluster using the Simple Linux Utility for Resource Management (Slurm). The following subsections cover submitting jobs, selecting the correct resources and more.
A computer cluster is a set of two or more computers (nodes) that run in parallel to solve computational tasks. A computer cluster acts as a single system: it offers its resources as one cohesive unit rather than addressing each node's resources separately, at least in most cases. A cluster's resources include CPU cores, GPUs, RAM and available disk space. A cluster usually requires a resource manager and job scheduler for optimal resource usage and distribution. UTHPC uses the Simple Linux Utility for Resource Management (Slurm) to govern resource allocation and job scheduling.
To harness the available computational power, the user submits a job to the cluster. Slurm then checks whether the required resources are currently available, so that the job can start running immediately, or whether the job should go into a queue (chosen by the user based on their needs) for later execution when the resources become available. Slurm also allows the user to monitor the status of their active jobs. Additionally, users can look up statistics about their finished jobs, such as how many resources a job used.
The next section covers submitting jobs in a how-to fashion, supported by simple example scripts that UTHPC encourages you to copy to your working directory on the Rocket cluster to get you started.
Submitting a simple job¶
Copy the following Python script into your home directory with the name print_time.py:
import time
print ("Time before : %s" % time.ctime())
time.sleep(10)
print ("Time after : %s" % time.ctime())
Next, create the following job script with the name simple.job:
#!/bin/bash
#SBATCH --job-name="Simple python example with SLURM"
#SBATCH --time=00:01:00
#SBATCH --mem-per-cpu=128MB
module load python
srun python ./print_time.py
simple.job (variant for ETAIS users, with an allocation)
#!/bin/bash
#SBATCH --job-name="Simple python example with SLURM"
#SBATCH --allocation="ealloc_905b0_something"
#SBATCH --time=00:01:00
#SBATCH --mem-per-cpu=128MB
module load python
srun python ./print_time.py
Note
ETAIS users can only submit jobs when they use the proper allocation. The allocation specifies which ETAIS organization and project is billed for the job.
You can find which allocation name to use at minu.etais.ee, by going to the appropriate organization's resource, where it shows how to submit a job.
Submit the job to Slurm¶
Template command:
$ sbatch <job_script_name>
$ sbatch simple.job
After a successful sbatch command, Slurm returns a one-line output:
Submitted batch job <job_ID>
Submitted batch job 19110128
Job script contents¶
As you can see, the job script (usually referred to as an sbatch script) consists of two main sections:
- Job parameters, where the lines start with #SBATCH
- The actual commands, where the lines are parsed like a simple bash script.
All the parameter lines must start with #SBATCH and the job flags go after it. Some noteworthy examples are:
- #SBATCH --time - specifies how long the job will run
- #SBATCH --cpus-per-task - how many CPUs per task to use
- #SBATCH --mem - how much memory to allocate to the whole job
- #SBATCH --partition - what partition to use
There are many more parameters to tune and different ways to divide resources. All of the available parameters can be found on the SLURM sbatch documentation page.
Everything that goes under the #SBATCH lines is executed as a bash script inside the job allocation. Usually this part also gets divided into two parts:
- Environment preparation. First you should set up your working directories, environment variables and so on. What you should definitely do here is load all the required modules; the guide for that can be found here.
- Actual job commands. Here you can insert your commands line by line to be executed. This part is entirely up to the user to write, as the commands vary a lot between use cases. A minimal sketch of this structure is shown after this list.
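To tie the parameters and the two parts together, here is a minimal sketch of a complete job script. The partition, resource amounts, module and script name are placeholders based on the examples above; adjust them to your own workflow.
#!/bin/bash
# Part 1: job parameters
#SBATCH --job-name="structure example"
#SBATCH --partition=main
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=256MB
# Part 2a: environment preparation - load modules, set environment variables
module load python
export MY_EXAMPLE_VARIABLE=some_value   # placeholder variable, only for illustration
# Part 2b: actual job commands, executed line by line
srun python ./print_time.py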
Interactive and ad-hoc jobs¶
It is possible to run jobs straight from the command line without putting everything in a job script. This can be useful to test your commands, install software or even open interactive shells inside your job. This can be accomplished by using a command called srun. The guide also has a chapter about appending new srun commands to already running jobs.
srun [options] <job commands>
is the default syntax. The srun command also picks up your current environment: the directory you are in, the loaded modules and so on.
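For example, the following are minimal sketches of an ad-hoc job and an interactive shell; the resource amounts are placeholders and the python module is assumed to be loaded beforehand, as in the example script above.
$ srun --time=00:01:00 --mem=128MB python ./print_time.py
$ srun --partition=main --time=01:00:00 --cpus-per-task=1 --mem=2G --pty bash
The first line runs a single command as a job; the second opens an interactive bash shell inside the job allocation (--pty attaches your terminal to it), which you can leave with exit.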
Check job status¶
Use the squeue command with the -j flag to see a job's status. The <job_ID> is a numerical value:
$ squeue -j <job_ID>
$ squeue -j 19110621
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
19110621 main Simple p andersm9 R 0:09 1 stage4
Note
If there is no output from the squeue command, it means the submitted job has already finished and its output is in the user's home directory.
After the job has finished, the job output appears in the user's home directory in the form of a file. The output file is distinguishable based on its name: it has the prefix slurm- followed by the <job_ID> and the suffix .out, unless specified otherwise in the job parameters.
Template output name:
slurm-<job_ID>.out
slurm-19110621.out
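If you prefer a different name, a minimal sketch using Slurm's filename patterns is shown below; %x expands to the job name and %j to the job ID, and the job name myjob is just an illustration.
#SBATCH --job-name=myjob
#SBATCH --output=%x-%j.out
With these parameters the output of job 19110621 would land in myjob-19110621.out instead.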
Requesting a partition¶
The default partition is 'main'. If you don't explicitly request a partition, your job runs in the default partition. To request a different partition, you must use the Slurm option --partition in your job script:
#SBATCH --partition=testing
To choose an appropriate partition that fits your needs, see the information about Cluster Partitions.
Parallel jobs¶
A parallel job can either run on multiple CPU cores on a single compute node, or on multiple CPU cores distributed over multiple compute nodes. With Slurm you can request tasks and CPUs per task (meaning CPU cores for a task). Different tasks of a job allocation may run on different compute nodes; however, all threads belonging to a certain process are always executed on the same node.
Shared Memory Jobs (SMP) are parallel jobs that run on a single compute node. You can request a single task and a certain number of CPU cores for that task:
#SBATCH --cpus-per-task=2
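For example, here is a minimal sketch of a shared-memory job that passes the allocated core count to an OpenMP-style program; the binary name ./my_threaded_program and the resource amounts are placeholders.
#!/bin/bash
#SBATCH --job-name="SMP example"
#SBATCH --cpus-per-task=2
#SBATCH --mem=2G
#SBATCH --time=00:10:00
# Use as many threads as CPU cores were allocated to the task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program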
MPI jobs are parallel jobs that may run across multiple compute nodes. You may request a certain number of tasks and a certain number of nodes:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
Warning
The requested node, task and CPU resources must match. For instance, if you request one node (--nodes=1) and more tasks (--ntasks-per-node) than there are CPU cores available on a single node in this particular partition, you get an error message: sbatch: error: Batch job submission failed: Requested node configuration is not available.
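As an illustration, assume a hypothetical partition whose nodes have 48 CPU cores each; the first request below would be rejected with the error above, while the second one fits.
# Rejected on the hypothetical 48-core nodes: 64 tasks cannot fit on one node
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
# Accepted: 48 tasks per node on two nodes, 96 tasks in total
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48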
Open MPI¶
Slurm is built with MPI support, which means that you don't need to specify the number of processes and the execution hosts using the -np and -hostfile options. Slurm automatically provides this information to mpirun based on the allocated tasks. The srun command works as a wrapper for mpirun, so you can just use it instead in your Slurm scripts. Using srun this way has Slurm communicate all of its parameters to the Open MPI program accurately:
#!/bin/bash
#SBATCH --mail-user=test.user@ut.ee
(...)
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
module load openmpi/4.1.0
srun <options> <binary>
Performance/Runtime optimization¶
When you have considered your job's needs, selected the amount of resources and the job has started running, you can check how many resources it is actually using. We have a site called elk; the guide for it can be found at the elk documentation page.
The goal is to get the resource usage to 90-100% of the amount requested. This means that your job is using the allocated resources optimally, runs at the best speed and, best of all, nothing goes to waste.
When you see your jobs using fewer resources than requested, it is best to cancel the job and restart it with a smaller request. This lowers your bill and frees up resources for other users.
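In addition to elk, you can get a quick overview from the command line with Slurm's accounting tools, assuming job accounting is enabled on the cluster; the field selection below is just one reasonable example.
$ sacct -j <job_ID> --format=JobID,Elapsed,AllocCPUS,TotalCPU,MaxRSS,ReqMem,State
Comparing TotalCPU against Elapsed multiplied by AllocCPUS, and MaxRSS against ReqMem, gives a rough picture of how much of the requested CPU time and memory the job actually used.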