# GPUs
The UTHPC Kubernetes cluster has several NVIDIA P100 GPUs available for workloads. These P100s are time-sliced so that multiple workloads can share a single GPU.
The recommendation is not to "sit" on a GPU persistently, but to use a queue-and-worker system that spins workloads up only when needed (see the Job sketch after the example below). This helps control costs, as billing happens per whole GPU.
Cluster users can request GPUs for their workloads like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cuda-app
  template:
    metadata:
      labels:
        app: cuda-app
    spec:
      containers:
        - name: cuda-container
          image: "k8s.gcr.io/cuda-vector-add:v0.1"
          resources:
            limits:
              nvidia.com/gpu: 1 # (1)
      tolerations: # (2)
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```
1. This is the important part: it requests one GPU for the pod and makes sure the workload is scheduled on a node with an existing, free GPU.
2. This is also important: GPU nodes carry this taint to keep non-GPU workloads off of them, so GPU workloads need to tolerate it.
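
For the queue-and-worker recommendation above, a run-to-completion Job is often a better fit than a long-running Deployment, since the GPU is freed as soon as the work finishes. The following is a minimal sketch, not an official UTHPC template: the Job name, image, and TTL value are placeholders, while the GPU request and toleration follow the same conventions as the example above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-batch-example # placeholder name
spec:
  backoffLimit: 2 # retry a failed pod at most twice
  ttlSecondsAfterFinished: 600 # clean up the finished Job after 10 minutes
  template:
    spec:
      restartPolicy: Never # run to completion instead of restarting in place
      containers:
        - name: cuda-container
          image: "k8s.gcr.io/cuda-vector-add:v0.1" # placeholder workload image
          resources:
            limits:
              nvidia.com/gpu: 1 # the GPU is released when the Job finishes
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```

Pairing such Jobs with a queue, where a worker submits one Job per task, keeps GPUs allocated only while there is actual work to do.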