Databases¶

There are three ways to using databases in UTHPC managed Kubernetes - a managed MySQL database, and two operator-based PostgreSQL database systems.

Database engine	Name	Deployment location	Resource profile	Billing	Notes
MySQL	MariaDB	Separate infrastructure	Shared resources	Separate, storage based	Cannot change versions.
PostgreSQL	CloudnativePG	Kubernetes	Dedicated resources	Kubernetes resource billing	Can easily change versions.
PostgreSQL	Zalando Operator	Kubernetes	Dedicated resources	Kubernetes resource billing	Can easily change versions.

Danger

You can always deploy your own databases as Deployment or Statefulset, but keep in mind, even Google raises some considerations about this.

MariaDB¶

CloudnativePG operator¶

CloudnativePG is the simpler option out of the two operators, as the initial setup is quicker. It supports managing PostgreSQL based database clusters.

Advantages	Disadvantages
Simpler deployment manifest.	Does not support automatic recovery from the WAL archive.
Easily supports WAL archiving and backups to S3.	Backups need to be scheduled beforehand.
`Deployment` based, so one `Pod` failing to start does not block other ones.	Not as configurable as some other options.
Has a management plugin , used together with `kubectl`.

Zalando operator¶

Zalando Operator is a bit more complicated, but allows for slightly more powerful configuration.

Advantages	Disadvantages
Supports automatic recovery from WAL archive, even when cluster is fully deleted.	Complicated deployment manifest.
Easily supports WAL archiving and backups to S3.	`Statefulset` based, so a failing "first" `Pod` causes other ones to not be scheduled.
Extremely configurable.	Sometimes confusingly configurable.

There's two example manifests provided by the developers of the Zalando Operator - a minimal manifest , and a complete cluster manifest .

Minimal cluster manifestManifest with WAL archiving

Minimal cluster manifest is available here: a minimal manifest

This manifest is not error tolerant in the sense, that if the database is deleted, there's no recovery from backups or WAL archival.

It is possible to configure the cluster to automatically replicate all operations on the database to an object storage. This is called WAL archiving, and allows for point-in-time restoration of backups. This archiving is almost real time, so even a total loss of a PostgreSQL cluster is mostly recoverable from without many lost transactions.

WAL archiving also speeds up the rebuild of replica instances, as they no longer have to pull the whole state from the master instance.

apiVersion: v1
kind: Secret
metadata:
  name: postgres-s3-backup
  namespace: <namespace>
data:
  s3_access_key_id: <access_key_id>
  s3_secret_access_key: <access_secret_key>
  s3_bucket: <bucket_name>
  s3_endpoint: <endpoint_url>
---
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: <name>
  namespace: <namespace>
spec:
  databases:
    <database>: <user>
    foo_database: foo_user # (1)!
  numberOfInstances: 2 # (2)!
  postgresql:
    version: "16"
  env: # (6)!
  - name: AWS_ENDPOINT
    valueFrom:
      secretKeyRef:
        name: postgres-s3-backup
        key: s3_endpoint
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: postgres-s3-backup
        key: s3_access_key_id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: postgres-s3-backup
        key: s3_secret_access_key
  - name: WAL_S3_BUCKET
    valueFrom:
      secretKeyRef:
        name: postgres-s3-backup
        key: s3_bucket
  - name: USE_WALG_RESTORE
    value: "true"
  - name: BACKUP_SCHEDULE
    value: 00 01 * * *
  - name: CLONE_USE_WALG_RESTORE
    value: "true"
  resources:
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "500m"
      memory: "516Mi"
  teamId: <team> # (3)!
  users:
    <user>: # (4)!
      - <permissions>
    foo_user:
      - superuser
      - createdb
  volume:
    size: "5Gi" # (5)!

Define a database and user connection, in the format of <database>:<user>.
Number of replica instances. At least 2 is required to be node-failure tolerant.
This is the Zalando operator specific teams API, which is, for us, turned off. Just specify a descriptive name for your team here.
Specify the cluster-wide permissions for your user here. Which database a user belongs to was declared in point 1.
By default, the database is created on the Longhorn 3-times replicated storage system, so the recommendation is to be conservative with storage size. PostgreSQL tends to work much better on smaller database sizes.
This is the configuration for enabling S3 backups and WAL archival. The credentials are taken from the secret. If you do not have an S3 endpoint to use, you can request one from UTHPC.