Interactive jobs

Interactive jobs allow users to interact with applications in real time within the cluster. With these jobs, users request resources in the cluster and then use them to run commands or scripts directly on the command line, instead of running a fixed set of commands from an sbatch file.

This gives you a regular shell session inside a job, which is mainly useful for testing or for running quick, small computations. The benefit of interactive jobs is that you can run commands live, without having to write every command down in a job file.

The downside is that if something happens to either your internet connection or the head node, the job dies as well. In addition, there is no automatic job retry, because the job doesn't know which commands to run again.

Requesting resources

You can request an interactive session with either salloc or srun. With srun, use the pseudo-terminal option --pty and specify the shell to start. Regular Slurm resource options apply as well. Keep in mind that if the requested partition is busy and has no free resources for the job, you may have to wait for your interactive session to start.

srun --partition=testing --time=60 --cpus-per-task=4 --pty /bin/bash

Output when using srun:

[user@login1 ~]$ srun --pty /bin/bash
srun: job 16841897 queued and waiting for resources
srun: job 16841897 has been allocated resources

[user@stage53 ~]$
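
The text above mentions salloc but only shows srun, so here is a minimal sketch of the salloc route. It assumes the common setup where salloc first grants the allocation and srun then opens the shell inside it; some sites configure salloc to drop you straight onto a compute node, so check your cluster's behaviour. The helper name salloc_session is hypothetical and only prints the commands for review instead of running them. The partition name and resource values are reused from the srun example.

```shell
# Hypothetical dry-run helper: prints the two commands for an salloc-based
# interactive session. It does not contact Slurm.
# Arguments: partition, time limit in minutes, CPUs per task.
salloc_session() {
    partition="$1"
    minutes="$2"
    cpus="$3"
    # Step 1: request the allocation (blocks until resources are granted).
    echo "salloc --partition=$partition --time=$minutes --cpus-per-task=$cpus"
    # Step 2: inside the allocation, open a shell with a pseudo-terminal.
    echo "srun --pty /bin/bash"
}

# Same resources as the srun example above.
salloc_session testing 60 4
```

To start a real session, type the two printed commands on the login node one after the other. Exiting the srun shell and then the salloc shell releases the allocation.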

Open an interactive shell inside a preexisting job

You can use srun to run a command inside a preexisting job by passing the --jobid flag. The commands you start this way use the same resources that the job has requested.

srun --jobid JOBID --pty bash

For example, suppose you have a running job that uses one GPU and you would like to see the nvidia-smi output in real time. The job ID in this example is 123456.

[user@login1 ~]$ srun --jobid 123456 --pty bash
[user@falcon1 ~]$ nvidia-smi 
Mon Jun  5 10:45:15 2023       
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-PCIE...  Off  | 00000000:09:00.0 Off |                  Off |
| N/A   36C    P0    35W / 250W |      0MiB / 32510MiB |      2%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |
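
If you only need a single command's output, you do not have to open a shell first: srun can run the command directly inside the job. The sketch below uses a hypothetical helper, attach_cmd, that only prints the resulting command line so you can check it before running it. The job ID 123456 is the example value used above.

```shell
# Hypothetical dry-run helper: prints an srun command that runs a single
# command inside an existing job. It does not contact Slurm.
# Arguments: job ID, then the command and its arguments.
attach_cmd() {
    jobid="$1"
    shift
    echo "srun --jobid $jobid $*"
}

# Print the command for a one-off nvidia-smi call inside job 123456.
attach_cmd 123456 nvidia-smi
```

Depending on the Slurm version and site configuration, such a step may wait until resources inside the job are free. srun provides an --overlap option for sharing resources with a step that is already running, but whether you need it is an assumption to verify against your cluster's documentation.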

Last update: 2023-08-14
Created: 2022-04-28