Cluster partitions
Cluster partitions are logical separations of the cluster's resources that allow for different use cases. This is needed so that different types of jobs do not interfere with each other. Partitions act like separate queues within the cluster: one for regular jobs, one for long jobs, one for GPU jobs, and so on. Partitions can also have different limits and properties.
If no partition is specified in your job, it will run in the default partition, main.
Rocket cluster
If you want to use a specific partition, you need to specify the desired partition and a time limit (see the example batch script after the table).
| Partition | DefaultTime | MaxTimeLimit | Nodes | CPU cores[^1] | Comments |
|---|---|---|---|---|---|
| testing | 10 minutes | 2 hours | stage[1-8] | 160 | Only for short testing jobs. |
| main | 10 minutes | 8 days | ares[1-20], artemis[1-20] | 5120 | Main job queue. |
| long | 8 days | 30 days | sfr[9-12], bfr[3-4] | 240 | Long-running jobs. |
| AMD | 10 minutes | 8 days | ares[1-20], artemis[1-20] | 5120 | High core density partition. |
| Intel | 10 minutes | 8 days | sfr[1-12], bfr[1-4] | 640 | Partition with Intel architecture nodes. |
| GPU | 10 minutes | 8 days | falcon[1-6], pegasus[1,2] | 384 | Utilising GPUs. |
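
For example, a minimal batch script that requests the long partition with a three-day time limit could look like the sketch below; the job name, output file and program are placeholders, not cluster defaults:

#!/bin/bash
#SBATCH --job-name=example_job        # placeholder job name
#SBATCH --partition=long              # partition from the table above
#SBATCH --time=3-00:00:00             # 3 days; must not exceed the partition's MaxTimeLimit
#SBATCH --output=example_job.%j.out   # %j expands to the job ID

srun ./my_program                     # placeholder workload

Submit the script with sbatch example_job.sh; squeue -u <username> will then show the job queued in the long partition.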
You can see the partitions on the head node by issuing the command:
scontrol show partition
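
The same command also accepts a partition name if you only want to inspect one of them, for example the default partition:

scontrol show partition main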
Important
By default, a job allocation uses 1 node with 1 CPU and 2 GB of memory. A user can have a maximum of 1000 CPU cores in use at a time.
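
If your job needs more than these defaults, request the resources explicitly in the job script. The directives below sketch a request for 8 CPU cores and 16 GB of memory on a single node (the values are chosen only for illustration); adjust them to your workload:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G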
Note
You can see the individual limits with the command:
sacctmgr show association where user=<username>
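
The association listing can be wide; if you only care about a few columns, sacctmgr can limit the output with a format option (the fields shown are standard Slurm association fields, picked here only as an example):

sacctmgr show association where user=<username> format=Account,Partition,MaxJobs,MaxTRES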
[^1]: The "CPU cores" column shows the maximum number of cores available to all jobs using that particular partition.