# GPUs
The UTHPC Kubernetes cluster has several NVIDIA P100 GPUs available for workloads. These P100s are time-sliced so that multiple workloads can share a single GPU.
The recommendation is not to "sit" on a GPU persistently, but to use a queue-and-worker system that spins workloads up only when needed (see the Job sketch after the example below). This helps control costs, as billing happens per whole GPU.
Cluster users can request GPUs for their workloads like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cuda-app
  template:
    metadata:
      labels:
        app: cuda-app
    spec:
      containers:
        - name: cuda-container
          image: "k8s.gcr.io/cuda-vector-add:v0.1"
          resources:
            limits:
              nvidia.com/gpu: 1 # (1)
      tolerations: # (2)
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```
1. This is the important part: it requests one GPU for the pod and makes sure the workload is scheduled on a node with an existing, free GPU.
2. This is also important: GPU nodes carry this taint to keep non-GPU workloads off of them, so GPU workloads need to tolerate it.
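
For the queue-and-worker recommendation above, a run-to-completion Job is often a better fit than a long-running Deployment, since the GPU is freed as soon as the work finishes. The following is a minimal sketch, not an official UTHPC template: the Job name, image, and TTL value are placeholders, while the GPU request and toleration follow the same conventions as the example above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-batch-example # placeholder name
spec:
  backoffLimit: 2 # retry a failed pod at most twice
  ttlSecondsAfterFinished: 600 # clean up the finished Job after 10 minutes
  template:
    spec:
      restartPolicy: Never # run to completion instead of restarting in place
      containers:
        - name: cuda-container
          image: "k8s.gcr.io/cuda-vector-add:v0.1" # placeholder workload image
          resources:
            limits:
              nvidia.com/gpu: 1 # the GPU is released when the Job finishes
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```

Pairing such Jobs with a queue, where a worker submits one Job per task, keeps GPUs allocated only while there is actual work to do.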