https://github.com/thadeu/voodu-postgres

Last synced: 28 days ago
JSON representation
Host: GitHub
URL: https://github.com/thadeu/voodu-postgres
Owner: thadeu
Created: 2026-04-23T14:35:40.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-05-04T18:40:08.000Z (about 1 month ago)
Last Synced: 2026-05-04T20:38:41.342Z (about 1 month ago)
Language: Go
Size: 15 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # voodu-postgres

Voodu plugin that expands a `postgres { … }` HCL block into a production-ready postgres cluster: 1 primary + N streaming-replication standbys, auto-generated passwords, dual `DATABASE_URL` injection (writer + reader endpoints, RDS-style).

Bare block produces a single hardened primary; `replicas = 3` flips it into a cluster with `pg_basebackup`-cloned standbys and `primary_conninfo`-driven WAL streaming.

## Table of contents

- [Quick start](#quick-start)

- [Configuration](#configuration)

  - [HCL contract](#hcl-contract)

  - [Plugin defaults](#plugin-defaults)

  - [Customising postgresql.conf via `pg_config`](#customising-postgresqlconf-via-pg_config)

  - [Custom image (extensions baked)](#custom-image-extensions-baked)

  - [Statefulset passthrough](#statefulset-passthrough)

- [High availability — streaming replication](#high-availability--streaming-replication)

  - [Cluster shape](#cluster-shape)

  - [Cold-start sequence](#cold-start-sequence)

  - [Write vs read endpoints](#write-vs-read-endpoints)

  - [Promote a standby](#promote-a-standby)

  - [Rejoining the old primary](#rejoining-the-old-primary)

- [Backup automation](#backup-automation)

  - [`vd pg:backups` (Heroku-style)](#vd-pgbackups-heroku-style-recommended)

  - [Retention (`--keep` / `--max-age`)](#retention---keep----max-age)

  - [`vd postgres:backup` / `vd postgres:restore` (legacy)](#vd-postgresbackup--vd-postgresrestore-legacy-pg_basebackup)

- [Connecting via `psql`](#connecting-via-psql)

- [Real-world examples](#real-world-examples)

  - [Single primary + Rails app](#single-primary--rails-app)

  - [3-replica cluster + Rails MultiDB](#3-replica-cluster--rails-multidb)

  - [Custom image with pgvector](#custom-image-with-pgvector)

  - [External access for DBeaver / TablePlus](#external-access-for-dbeaver--tableplus)

- [Plugin reference](#plugin-reference)

  - [Commands](#commands)

  - [Asset files emitted](#asset-files-emitted)

  - [Statefulset env vars](#statefulset-env-vars)

  - [Bucket keys (config)](#bucket-keys-config)

  - [Repo layout](#repo-layout)

  - [Development](#development)

- [Install & upgrade](#install--upgrade)

- [Storage](#storage)

- [License](#license)

---

## Quick start

```hcl

# voodu.hcl

postgres "clowk-lp" "db" {

  image    = "postgres:16"

  replicas = 3                  # 1 primary + 2 standbys

}

```

Apply:

```bash

vd apply -f voodu.hcl

```

Plugin emits:

- **`asset "clowk-lp" "db"`** — 4 files: bash entrypoint wrapper, postgresql.conf overrides, streaming-replication conf, replication-user init script

- **`statefulset "clowk-lp" "db"`** — 3 pods (`db-0..db-2`), 1 volume claim (`data`, per-pod), 4 asset bind-mounts, host bind-mount `/opt/voodu/backups/clowk-lp/db` → `/backups`, env vars wiring everything

- **2 config_set actions** (first apply only) — persists auto-gen `POSTGRES_PASSWORD` + `POSTGRES_REPLICATION_PASSWORD`

Wire your app:

```bash

vd postgres:link clowk-lp/db clowk-lp/web --reads

# → DATABASE_URL    on web (primary)

# → DATABASE_READ_URL on web (read pool spanning standbys)

```

Inspect:

```bash

vd postgres:info clowk-lp/db

```

---

## Configuration

### HCL contract

Every field is optional. Defaults shown.

```hcl

postgres "scope" "name" {

  image    = "postgres:latest"            # operator typically pins major (postgres:16)

  replicas = 1                            # 1 = single primary; >=2 = cluster

  database = "postgres"                   # initdb arg

  user     = "postgres"                   # superuser

  port     = 5432                         # listen port

  password         = ""                   # default empty → auto-gen 32-byte hex

                                          # operator-set value lives in HCL plaintext

  replication_user = "replicator"         # separate role, REPLICATION attribute only

  initdb_locale   = "C.UTF-8"             # initdb --locale

  initdb_encoding = "UTF8"                # initdb --encoding

  # postgresql.conf overrides (key/value, anti-injection guarded)

  pg_config = {

    max_connections = 200

    shared_buffers  = "256MB"

    log_connections = true

  }

  # Extensions: parsed + validated, but NOT auto-installed (M-P4

  # ships `vd postgres:exec` for explicit install).

  extensions = ["pgvector", "pg_stat_statements"]

  # Statefulset passthrough — anything the statefulset accepts

  # flows through unchanged. See "Statefulset passthrough" below.

}

```

Print the complete spec at any time:

```bash

voodu-postgres defaults

```

### Plugin defaults

Bare block — `postgres "data" "db" {}` — emits a statefulset with:

- `image = "postgres:latest"` (override to pin major in prod)

- `replicas = 1` (single primary)

- `command = ["bash", "/usr/local/bin/voodu-postgres-entrypoint"]` (role-aware wrapper)

- `ports = ["5432"]` (loopback by default — `vd postgres:expose` to publish)

- `volume_claims`: `data` at `/var/lib/postgresql/data` (per-pod, statefulset-style)

- `volumes`: host bind-mount `/opt/voodu/backups//` → `/backups` (where `vd pg:backups:capture` writes)

- `health_check = "pg_isready -U postgres -d postgres -p 5432"`

- env: `POSTGRES_USER` / `POSTGRES_DB` / `POSTGRES_PASSWORD` / `POSTGRES_INITDB_ARGS` / `PGDATA` / `PG_PORT` / `PG_NAME` / `PG_SCOPE_SUFFIX` / `PG_PRIMARY_ORDINAL` / `PG_REPLICATION_USER` / `PG_REPLICATION_PASSWORD`

- `probes`: kubelet-style liveness + readiness wired up by default — see next section

### Default probes (since 0.13.0)

Bare `postgres "data" "db" {}` blocks ship with these probes pre-configured so the operator doesn't boot postgres blind. The values are tuned for cold-boot tolerance + serving-state precision; the canonical postgres `pg_isready` signal drives readiness so caddy / consumer apps automatically bypass standbys mid-`pg_basebackup` or primaries mid-WAL-replay.

| probe | selector | timing |

|---|---|---|

| liveness | `tcp_socket { port =  }` | `initial_delay = "20s"`, `period = "10s"`, `failure_threshold = 3` |

| readiness | `exec { command = ["pg_isready", "-U", "", "-d", "", "-p", ""] }` | `period = "5s"`, `failure_threshold = 1`, `success_threshold = 2` |

**Why these defaults:**

- **Liveness via TCP**: process-alive signal at the cheapest possible cost. `initial_delay = 20s` tolerates the 5-10s postgres takes to bind on cold start + slower disks; 3 consecutive 10s failures (~30s window) before docker restarts the container survives one-off GC pauses without crash-looping.

- **Readiness via `pg_isready`**: the strict "actually serving" signal. Returns exit 0 only when postgres accepts connections AND is in a serving state — NOT during startup recovery, NOT during `pg_basebackup` on standbys, NOT while WAL replay catches up. During those windows the upstream is bypassed by anything respecting the probe (caddy active health check, Rails connection pool retry, custom service discovery).

- **User + Database + Port from spec**: the readiness command is rendered with the operator-supplied values verbatim, so a `postgres "data" "db" { user = "appuser" database = "appdata" port = 5433 }` block produces a probe matching that authentication shape without manual configuration.

**Override:**

Declaring **any** `probes { ... }` block in the HCL replaces the default entirely (no sub-block merging — operator-declared partial probes do NOT inherit the plugin's defaults for the slots they didn't declare). If you want to override one probe and keep the other, redeclare both:

```hcl

postgres "data" "db" {

  probes {

    # Custom liveness (e.g. swap to http_get if you front postgres

    # with pgbouncer + an HTTP health endpoint)

    liveness {

      http_get {

        path = "/healthz"

        port = 8080

      }

      period = "15s"

    }

    # Redeclare the plugin default explicitly to keep it

    readiness {

      exec { command = ["pg_isready", "-U", "appuser", "-d", "appdata"] }

      period            = "5s"

      failure_threshold = 1

      success_threshold = 2

    }

  }

}

```

**Disabling probes entirely:** declare `probes {}` with all three sub-blocks (`liveness`, `readiness`, `startup`) omitted — voodu accepts an empty `probes` block and runs nothing.

**Upgrade note (from < 0.13.0):** the first `vd apply` after upgrading the plugin re-renders the statefulset spec with probes attached, which flips the spec hash and triggers one top-down rolling restart per ordinal. Per-pod data survives via the existing per-ordinal volume claims; the restart is cosmetic.

### Customising postgresql.conf via `pg_config`

The `pg_config` map renders into a `voodu-99-overrides.conf` file that postgres loads via `include_dir`. "Last assignment wins" — operator overrides trump plugin defaults from `voodu-50-streaming.conf`.

```hcl

postgres "clowk-lp" "db" {

  pg_config = {

    # tuning

    max_connections      = 200

    shared_buffers       = "1GB"

    work_mem             = "16MB"

    effective_cache_size = "3GB"

    random_page_cost     = 1.1     # SSD storage

    # logging

    log_connections    = true

    log_disconnections = true

    log_min_messages   = "warning"

    log_statement      = "ddl"

  }

}

```

Type rules:

| HCL type | Renders as |

|---|---|

| `int` / `int64` / whole `float` | unquoted integer |

| fractional `float` | unquoted decimal |

| `bool` | `on` / `off` (postgres convention) |

| `string` | `'...'` (single-quoted, embedded `'` escaped as `''`) |

Keys validated as `^[a-z][a-z0-9_]*$` — anti-injection guard so an operator-controlled key can't smuggle additional directives.

### Custom image (extensions baked)

For extensions not in the official `postgres:` image (pgvector, postgis), point `image` at a custom build:

```hcl

postgres "clowk-lp" "db" {

  image = "ghcr.io/clowk/pg16-pgvector:1.2.0"

}

```

Two ways to produce that image:

**(A) Pre-built and pushed to a registry** — your CI runs `docker build` + `docker push`. HCL just references the tag.

**(B) Inline build via voodu** — the controller's statefulset kind accepts `dockerfile` / `workdir` / `path` / `lang { }`. voodu apply streams your source over SSH, runs `docker build`, tags `clowk-lp-db:latest`, deploys:

```hcl

postgres "clowk-lp" "db" {

  workdir    = "infra/postgres"

  dockerfile = "Dockerfile.pg"

  replicas   = 3

  lang {

    name = "generic"

    build_args = {

      PG_MAJOR = "16"

    }

  }

}

```

`infra/postgres/Dockerfile.pg`:

```dockerfile

FROM postgres:16

RUN apt-get update && apt-get install -y postgresql-16-pgvector \

    && rm -rf /var/lib/apt/lists/*

```

The same image must work as primary AND standby (standbys clone primary via `pg_basebackup`, no separate image).

Extensions: enable inside the database via your app's migration system (Rails `enable_extension :pgvector`, Django RunSQL, etc.) OR with a one-off `vd postgres:psql  -c "CREATE EXTENSION IF NOT EXISTS pgvector"`.

### Statefulset passthrough

Anything the statefulset kind accepts flows through unchanged. Common cases:

```hcl

postgres "clowk-lp" "db" {

  image = "postgres:16"

  # extra env vars (operator-supplied, merged with plugin's)

  env = {

    TZ        = "America/Sao_Paulo"

    PGAPPNAME = "clowk-lp"

  }

  # config from shared buckets (e.g. shared AWS creds)

  env_from = ["aws/cli", "monitoring/secrets"]

  # custom health check (overrides plugin's pg_isready default)

  health_check = "pg_isready -U appuser -d appdata -p 5432"

  # kernel-level CPU/memory caps via cgroups

  resources {

    limits {

      cpu    = "2"      # 2 cpus (or "500m" for 0.5 cpu, k8s-style millicores)

      memory = "4Gi"    # 4 GiB (binary; "4G" for decimal SI; "1024" plain bytes)

    }

  }

  # build-mode (instead of image = "...") — Dockerfile + workdir + lang { }

  # see "Custom image (extensions baked)" below

}

```

Plugin-owned fields (`database`, `user`, `password`, `port`, `initdb_locale`, `initdb_encoding`, `pg_config`, `extensions`, `replication_user`) are stripped from the merged spec before emitting — they don't leak to the statefulset wire shape.

#### Resource limits

`resources { limits { ... } }` translates to `docker run --cpus= --memory=` on every replica pod. Two layers of constraint apply to postgres:

1. **Container-level (this block)** — kernel cap via cgroups. OOM-kills the postgres process if it exceeds memory limit. Sane budget = host RAM × 0.7 ÷ replicas, leaving headroom for other workloads.

2. **App-level (`pg_config`)** — postgres-internal allocations. Should be SMALLER than the container cap so postgres self-limits before the kernel kills it.

Recommended pairing for a `memory = "4Gi"` container limit:

```hcl

resources {

  limits {

    cpu    = "2"

    memory = "4Gi"

  }

}

pg_config = {

  shared_buffers       = "1GB"     # ~25% of container limit

  effective_cache_size = "3GB"     # ~75% of container limit

  work_mem             = "16MB"    # per-query, watch for high concurrency

  max_connections      = 200

}

```

Value formats accepted (k8s parity):

| Type | Form | Examples |

|---|---|---|

| CPU | decimal | `"2"`, `"1.5"`, `"0.25"` |

| CPU | millicores | `"500m"` (= 0.5), `"100m"` (= 0.1) |

| Memory | binary (1024-based, preferred) | `"4Gi"`, `"512Mi"`, `"256Ki"` |

| Memory | decimal SI (1000-based) | `"4G"`, `"500M"`, `"1500K"` |

| Memory | plain bytes | `"4294967296"` |

Omit `resources { }` entirely or leave individual fields empty for "no limit" — docker daemon defaults apply (effectively unlimited until host RAM is exhausted).

---

## High availability — streaming replication

### Cluster shape

```hcl

postgres "clowk-lp" "db" {

  image    = "postgres:16"

  replicas = 3

}

```

Produces 3 pods with stable identity:

| Pod | Role | DNS | Per-pod data volume |

|---|---|---|---|

| `clowk-lp-db.0` | primary | `db-0.clowk-lp.voodu` | `voodu-clowk-lp-db-data-0` |

| `clowk-lp-db.1` | standby | `db-1.clowk-lp.voodu` | `voodu-clowk-lp-db-data-1` |

| `clowk-lp-db.2` | standby | `db-2.clowk-lp.voodu` | `voodu-clowk-lp-db-data-2` |

Plus the round-robin shared alias `db.clowk-lp.voodu` resolving to all 3 pods (use sparingly — primary writes need pod-0 specifically).

Streaming config baked into `voodu-50-streaming.conf`:

```ini

hot_standby      = on

max_wal_senders  = 10

primary_conninfo = 'host=db-0.clowk-lp.voodu port=5432 user=replicator password= application_name=voodu-postgres-standby'

wal_keep_size    = '1GB'

```

No replication slots — `wal_keep_size = 1GB` is more predictable. Standbys that fall behind beyond 1GB get re-cloned via `pg_basebackup` on next pod restart (wrapper detects empty PGDATA and re-runs basebackup).

### Cold-start sequence

```

t=0  pod-0 (primary) boots

     → official docker-entrypoint.sh detects empty PGDATA → initdb

     → /docker-entrypoint-initdb.d/00_create_replication.sh runs:

        - CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD '...'

        - append `host replication replicator all scram-sha-256` to pg_hba.conf

     → postgres ready

t=2s pod-1, pod-2 (standbys) boot in parallel

     → wrapper detects ORDINAL != PRIMARY_ORDINAL

     → loops on pg_isready waiting for db-0

     → pg_basebackup -h db-0.clowk-lp.voodu -X stream → clones data dir

     → touch standby.signal (postgres comes up in recovery mode)

     → docker-entrypoint.sh sees PG_VERSION exists → skips initdb

     → postgres starts, reads primary_conninfo from voodu-50-streaming.conf

     → connects to primary, starts streaming WAL

```

Standby retry: if primary isn't ready (race during first cluster apply), the wrapper retries `pg_isready` up to 60 times × 5s sleep (~5min budget) before giving up. In practice the primary's initdb finishes well within that window.

### Write vs read endpoints

`vd postgres:link   --reads` emits two URLs on the consumer:

| Var | Targets | Use case |

|---|---|---|

| `DATABASE_URL` | `db-0.scope.voodu` (primary, ordinal 0) | All writes |

| `DATABASE_READ_URL` | `db-1.scope.voodu,db-2.scope.voodu,...` (libpq multi-host) | Read queries |

`DATABASE_READ_URL` uses postgres's native multi-host syntax with `target_session_attrs=any` — libpq tries hosts left-to-right; clients can shuffle the list for round-robin distribution.

Without `--reads`, only `DATABASE_URL` is emitted (single primary endpoint). Apps without read-replica logic just use that.

### Promote a standby

`vd postgres:promote` is a 2-step manual failover. Plugin owns ALL the postgres-side SQL — operator never types it:

```bash

# Step 1: PROMOTE — plugin runs lag check + pg_promote() + flips

#         PG_PRIMARY_ORDINAL + refreshes DATABASE_URL on every

#         linked consumer + rolling restart.

vd postgres:promote clowk-lp/db --replica 1

# Step 2: REJOIN — recover the OLD primary as a standby of the new one.

#         Required because the rolling restart hits pod-0 with

#         PG_PRIMARY_ORDINAL=1 but PGDATA still in primary state — the

#         wrapper's split-brain guard catches it and refuses to start.

vd postgres:rejoin clowk-lp/db --replica 0

```

What `promote` does internally:

1. **Lag check** — queries `pg_stat_replication` on the current primary. Refuses if any standby is behind (`max_lag_bytes > 0`) unless `--force` is passed.

2. **pg_promote** — runs `SELECT pg_promote(true, 60)` inside the target container. Postgres exits recovery mode and becomes the new primary.

3. **Wait** — polls `pg_is_in_recovery()` until it returns `false` (max 30s).

4. **Bucket flip** — sets `PG_PRIMARY_ORDINAL=N` so the next expand re-renders `streaming.conf` with the new primary FQDN.

5. **Refresh consumers** — every linked consumer's `DATABASE_URL[/_READ_URL]` is rewritten to point at the new primary; rolling restart picks up the change.

Flags:

- `--force` — promote despite replication lag (operator accepts data loss). Required when the current primary is unreachable (lag check itself would error).

- `--no-restart` — flip the bucket + URLs but skip the rolling restart on the postgres pods. Useful for staged promotions (verify, then restart on demand).

Legacy alias: `vd postgres:failover` still works (same handler), kept for backward-compat with scripts using the old name.

### Rejoining the old primary

`vd postgres:rejoin` runs `pg_rewind` against the current primary inside a one-shot container that shares the target pod's data volume (`docker run --volumes-from`). Steps:

1. `docker stop ` — pg_rewind requires the target offline

2. `docker run --rm --volumes-from  postgres: pg_rewind --target-pgdata=… --source-server="host= user=replicator …"`

3. `touch standby.signal` in the data dir

4. `docker start ` — pod boots as standby, picks up streaming from the new primary

**When `pg_rewind` fails**: divergence too large (WAL recycled past the divergence point) yields `could not find previous WAL record at…`. Fallback:

```bash

vd delete clowk-lp/db --replica 0 --prune    # wipes the data volume

vd apply -f voodu.hcl                         # wrapper bootstraps fresh standby via pg_basebackup

```

The wrapper script's split-brain guard prevents accidental writes during this window — if a pod is configured as standby (`ORDINAL != PG_PRIMARY_ORDINAL`) but PGDATA was last used as primary (no `standby.signal`), it exits 1 with a clear message pointing at `vd postgres:rejoin`.

---

## Backup automation

### `vd pg:backups` (Heroku-style, recommended)

`pg_dump` snapshots managed by the plugin. The host directory `/opt/voodu/backups///` is bind-mounted at `/backups` inside every pod; backup files live there as `bNNN-.dump` (custom format, restorable via `pg_restore`).

#### Pre-flight (once per host)

```bash

sudo mkdir -p /opt/voodu/backups/clowk-lp/db

# chown is handled by the entrypoint wrapper at boot — no manual chown needed

```

#### Capture a snapshot

`pg:backups:capture` is **detached by default** — it spawns a sibling Docker container that runs `pg_dump` and returns the new backup ID immediately. The dump survives SSH drops, terminal closure, and Ctrl-C of the `vd` command.

```bash

# Default: detached, returns ID immediately

vd pg:backups:capture clowk-lp/db

# postgres clowk-lp/db: backup b008 capturing in background (container clowk-lp-db-backup-b008, source: ordinal 0). track via: vd pg:backups clowk-lp/db  |  vd pg:backups:logs clowk-lp/db b008 --follow

# Foreground with rolling progress

vd pg:backups:capture clowk-lp/db --follow

# From a standby (offloads work) + follow

vd pg:backups:capture clowk-lp/db --from-replica 1 --follow

```

Output with `--follow`:

```

capture: pg_dump → /backups/b008-20260505T143000Z.dump (db=postgres, user=postgres; raw size 487 MB, estimated dump ~243 MB, container clowk-lp-db-backup-b008)

  [3s] 12.0 MB written (~5% of estimate)

  [10s] 68.0 MB written (~28% of estimate)

  [25s] 156.0 MB written (~64% of estimate)

  [45s] 218.0 MB written (~90% of estimate)

done: 251.3 MB in 52s

postgres clowk-lp/db: backup b008 captured (b008-20260505T143000Z.dump, 251.3 MB, source: ordinal 0, elapsed 52s)

```

The `% of estimate` is a **hint, not a real percent**. pg_dump doesn't expose a true progress signal — the plugin queries `pg_database_size()` upfront and assumes ~50% compression (varies wildly with data shape). When the estimate is clearly wrong (>200%), the suffix is dropped silently and you just see bytes-written + elapsed. The trustworthy numbers are ` MB written` and `[s]`.

#### Container shape (capture internals)

Each capture spawns a one-shot container named `--backup-bNNN` that:

- Reuses the source pod's image (so `pg_dump` is available)

- Shares the source pod's network namespace via `--network container:` (postgres reachable at `127.0.0.1`)

- Bind-mounts the same `/opt/voodu/backups//` host path at `/backups`

- Runs `pg_dump -F c -Z 6 -f /backups/`; on non-zero exit, removes the partial file

Auto-prune: each `:capture` removes any **exited** sibling backup containers from prior runs (free up the bNNN slot for re-use after `:delete`). Running ones are never touched — concurrent captures with different IDs work fine.

> **Roadmap:** Once the controller exposes a runtime-dispatch action (`apply_manifest`), capture will migrate internally to emit a voodu `job` manifest instead of running `docker run` directly. UX stays identical; plumbing gets cleaner. See [the controller proposal](#) for status.

#### List (with status)

```bash

vd pg:backups clowk-lp/db

# === Backups for clowk-lp/db

#

#   ID     Status     Created at            Size

#   b007   complete   2026-05-04T02:00:00Z  245.3 MB

#   b008   running    2026-05-05T14:30:00Z  87.0 MB

# JSON for jq pipelines

vd pg:backups clowk-lp/db -o json | jq '.backups[] | {id, status, size_bytes}'

```

The list merges two data sources:

- **Files** in `/opt/voodu/backups///` (read via host bind-mount — works without a running pod) → `complete` captures.

- **Backup containers** matched via `docker ps -a --filter name=--backup-` → in-progress (`running`) and recently-failed (`failed`).

Status reflects which side a given `bNNN` appears on:

| Status | Container | File | Meaning |

|---|---|---|---|

| `running` | running | partial / absent | capture in flight |

| `complete` | exited 0 (or pruned) | present | done, restorable |

| `failed` | exited != 0 | absent (wrapper removed) | needs investigation via `pg:backups:logs` |

#### Stream live logs

```bash

# One-shot read of accumulated container output

vd pg:backups:logs clowk-lp/db b008

# Tail in real-time (blocks until the container exits)

vd pg:backups:logs clowk-lp/db b008 --follow

```

Wraps `docker logs [-f] --backup-bNNN`. Works for `running`, `complete`, and `failed` captures — until the next `:capture` auto-prunes exited siblings.

#### Restore from a backup

```bash

# Local backup ID

vd pg:backups:restore clowk-lp/db b007 --yes

# http(s) URL — e.g. an S3 presigned URL pointing at a pg_dump file

vd pg:backups:restore clowk-lp/db https://my-bucket.s3.amazonaws.com/db.dump?... --yes

```

For URL inputs, the plugin downloads the file to a host temp dir, `docker cp`s it into `/tmp/.dump` inside the primary, runs `pg_restore`, then cleans up both sides.

Runs `pg_restore --clean --if-exists --no-owner --no-privileges` against the primary's database. **Destructive of database content** (drops + recreates every object the dump touches), but **non-destructive of the cluster** — pods stay up, standbys replicate the restored state via streaming WAL.

Compared to the legacy `vd postgres:restore`:

| | `pg:backups:restore` (Heroku-style) | `postgres:restore` (legacy) |

|---|---|---|

| Source | `pg_dump` custom format (in `/backups/`) | `pg_basebackup` tar (operator-supplied path) |

| Destructive of | database content (objects in the dump) | entire cluster (PGDATA on every pod) |

| Cluster downtime | none — primary stays up | full — every pod stopped, wiped, restarted |

| Cross-version | yes (pg_dump portable) | no (pg_basebackup is exact bit copy) |

| Cross-host | yes (pg_dump portable) | no (same major version + SSL state) |

Pick `pg:backups:restore` for routine "rollback to last night" scenarios; `postgres:restore` for "salvage from a tar I have on a laptop after total host loss."

#### Download a backup file

Ship a backup off-host (laptop, S3 via host CLI, another voodu host):

```bash

vd pg:backups:download clowk-lp/db b007

# → ./b007-20260507T020000Z.dump in CWD

vd pg:backups:download clowk-lp/db b007 --to /srv/archive/db.dump

```

Bytes are copied verbatim via `docker cp`; the resulting file is a portable `pg_dump` custom format ready for `pg_restore` anywhere.

#### Delete a backup

```bash

vd pg:backups:delete clowk-lp/db b007 --yes

```

Removes `/backups/` inside the pod (host bind-mount → file disappears from disk). Sequence IDs **don't** renumber — deleting `b005` leaves a gap, next capture is `b008` (max+1 of remaining). Operator's responsibility to ensure no off-host sync (S3, rsync) is mid-flight against the file.

#### Schedule (paste-and-apply HCL helper)

`pg:backups:schedule` prints the cronjob block ready to paste into `voodu.hcl`:

```bash

vd pg:backups:schedule clowk-lp/db --at "0 3 * * *" --from-replica 1

```

```hcl

# Add this to your voodu.hcl, then run `vd apply`:

cronjob "clowk-lp" "db-backup" {

  schedule = "0 3 * * *"

  image    = "ghcr.io/clowk/voodu-cli:latest"

  command = ["bash", "-c", "vd pg:backups:capture clowk-lp/db --from-replica 1"]

}

# To remove the schedule later: delete the block + run `vd apply --prune`.

```

`pg:backups:schedule` itself emits no manifests and writes nothing — it's purely a templating helper. Voodu cronjobs are declarative (HCL is the source of truth); managed-schedule state plugin-side would diverge from HCL on the next `vd apply --prune`.

Combine with off-host sync (rclone, aws s3 sync, rsync) in a sibling cronjob mounting the same host path read-only.

#### Cancel an in-progress capture

```bash

# Stop everything in flight for clowk-lp/db

vd pg:backups:cancel clowk-lp/db

# postgres clowk-lp/db: stopped 1 capture

# Stop a specific capture

vd pg:backups:cancel clowk-lp/db b008

```

Runs `docker stop` on the matching `--backup-bNNN` containers. The container's `pg_dump` receives SIGTERM (10s grace), then SIGKILL. Postgres detects the dropped connection and rolls back — non-destructive of the cluster.

The container's wrapper script removes the partial dump file on non-zero exit, so a cancelled capture leaves no garbage in `/backups/`.

#### Retention (`--keep` / `--max-age`)

Backups grow unboundedly without intervention — every `:capture` writes a new `.dump` and leaves the exited job container behind. On a tight host (e.g. dev/test on a single 80GB HD) you want both:

- a **count cap** so the most recent N captures stay around regardless of cadence, and

- an **age cap** so really old snapshots get reaped even if you've barely been capturing.

Two CLI flags cover both axes; the plugin auto-prunes after each capture and on demand via `:prune`. Both flags accept the same shapes everywhere they appear.

**Flags:**

| Flag | Meaning | Examples |

|---|---|---|

| `--keep ` | Keep at most N most-recent backups | `--keep 30` |

| `--max-age ` | Drop anything older than D | `--max-age 7d`, `--max-age 2w`, `--max-age 168h` |

| `--retention ` | Alias for `--max-age` | same as above |

A backup is deleted if it fails **either** check (rank > N **or** age > D). Both flags are optional — pass none = no auto-prune.

**Persisted defaults via the config bucket:**

The plugin seeds `BACKUP_KEEP=30` on first apply. Subsequent `:capture` invocations pick this up automatically — no flags needed:

```bash

# First apply seeds BACKUP_KEEP=30 in the bucket

vd apply

# This single command captures + auto-prunes to 30 newest

vd pg:backups:capture clowk-lp/db

# postgres clowk-lp/db: backup b031 captured ...

# retention (keep 30): pruned 1 backup(s): b001

```

Override the persisted policy any time:

```bash

# Lower cap for a specific cluster

vd config clowk-lp/db set BACKUP_KEEP=14

# Add age cap on top

vd config clowk-lp/db set BACKUP_MAX_AGE=14d

# Disable auto-prune entirely (capture stops touching old files)

vd config clowk-lp/db set BACKUP_KEEP=0

vd config clowk-lp/db set BACKUP_MAX_AGE=

```

Precedence per axis:

1. **CLI flag** (`--keep`, `--max-age`) — wins for that axis only

2. **App-level bucket** (`vd config / set ...`) — per-resource

3. **Scope-level bucket** (`vd config  set ...`) — shared default for every resource in the scope

4. **Built-in default** — empty (no auto-prune); plugin seeds `BACKUP_KEEP=30` only at first apply, app-level

Per-axis means `--keep 5` for a one-off doesn't blow away the bucket's `BACKUP_MAX_AGE=14d` — only the count axis gets overridden for that invocation.

**Shared defaults via scope-level config:**

The controller's `ResolveConfig` merges **scope-level + app-level** (app wins on conflict) and the plugin reads through this — no `env_from` plumbing needed. Set the policy once for an entire scope:

```bash

# All postgres resources in scope `clowk-lp` inherit these

vd config clowk-lp set BACKUP_KEEP=14

vd config clowk-lp set BACKUP_MAX_AGE=7d

# Override one specific cluster

vd config clowk-lp/db set BACKUP_KEEP=30

```

> **Why not `env_from`:** the plugin reads policy via `GET /config` (controller's bucket store) BEFORE the capture container is spawned. `env_from` is a runtime container-env injection mechanism — different code path. Use scope-level config for shared defaults.

**On-demand prune (`:prune`):**

```bash

# Preview without touching disk — uses bucket defaults if no flags

vd pg:backups:prune clowk-lp/db --dry-run

# Tighter one-off

vd pg:backups:prune clowk-lp/db --keep 5 --yes

# Both axes, explicit

vd pg:backups:prune clowk-lp/db --keep 14 --max-age 7d --yes

```

`:prune` reads the bucket the same way as `:capture`. With `BACKUP_KEEP=30` already set, `vd pg:backups:prune clowk-lp/db --yes` is enough — no flags needed.

`--yes` is required for destructive runs; `--dry-run` skips the confirmation gate AND the disk operations, useful for cron previewing.

**What gets pruned:**

For each backup that fails the policy:

1. The `.dump` file in `/opt/voodu/backups///` is removed.

2. The matching exited backup container (from `docker ps -a`) is `docker rm`'d so `vd get pd` stays clean.

Running containers are always skipped — never reap an in-flight capture.

**Schedule with retention inlined:**

`:schedule` accepts the same flags and inlines them into the cronjob's command, so the cron-driven captures self-trim:

```bash

vd pg:backups:schedule clowk-lp/db --at "0 3 * * *" --keep 30 --max-age 14d

```

```hcl

cronjob "clowk-lp" "db-backup" {

  schedule = "0 3 * * *"

  image    = "ghcr.io/clowk/voodu-cli:latest"

  command = ["bash", "-c", "vd pg:backups:capture clowk-lp/db --keep 30 --max-age 14d"]

}

```

You can also rely on the bucket-persisted defaults and skip the flags here — the cronjob will pick up `BACKUP_KEEP` / `BACKUP_MAX_AGE` from the config bucket the same way an interactive `:capture` does.

#### Off-site durability

```hcl

cronjob "clowk-lp" "db-backups-s3-sync" {

  schedule = "*/30 * * * *"

  image    = "amazon/aws-cli:latest"

  command  = ["s3", "sync", "/backups/", "s3://my-bucket/postgres/clowk-lp-db/"]

  volumes  = ["/opt/voodu/backups/clowk-lp/db:/backups:ro"]

  env_from = ["aws/cli"]

}

```

Backup files are immutable once written, so `s3 sync` never re-uploads.

### `vd postgres:backup` / `vd postgres:restore` (legacy, pg_basebackup)

The original commands run `pg_basebackup` (physical snapshot) instead of `pg_dump` (logical dump). They still work and are the fastest path for full-cluster restores, but require operator-supplied paths and don't integrate with the `/backups` directory.

```bash

# pg_basebackup tar to operator-chosen path

vd postgres:backup clowk-lp/db --destination /srv/backups/db-20260504.tar

# Restore (DESTRUCTIVE — wipes every pod's PGDATA)

vd postgres:restore clowk-lp/db --from /srv/backups/db.tar --yes

```

Output is a tar with `base.tar` + `pg_wal.tar` (`-X stream` includes WAL needed to make the backup self-consistent). Point-in-snapshot only — no PITR.

> **`pg:backups:restore` / `:download` / `:delete` / `:schedule` will replace this surface in the next iteration.**

---

## Real-world examples

### Single primary + Rails app

```hcl

# voodu.hcl

postgres "clowk-lp" "db" {

  image    = "postgres:16"

  database = "appdata"

  user     = "appuser"

}

deployment "clowk-lp" "web" {

  image = "ghcr.io/clowk/web:latest"

  env = {

    RAILS_ENV = "production"

  }

  ports = ["3000"]

}

```

```bash

vd apply -f voodu.hcl

vd postgres:link clowk-lp/db clowk-lp/web

# Rails web pod now has DATABASE_URL set:

# postgres://appuser:@db-0.clowk-lp.voodu:5432/appdata

```

### 3-replica cluster + Rails MultiDB

```hcl

postgres "clowk-lp" "db" {

  image    = "postgres:16"

  database = "appdata"

  user     = "appuser"

  replicas = 3                        # 1 primary + 2 standbys

}

deployment "clowk-lp" "web" {

  image = "ghcr.io/clowk/web:latest"

  ports = ["3000"]

}

```

```bash

vd apply -f voodu.hcl

vd postgres:link clowk-lp/db clowk-lp/web --reads

# web now has both:

# DATABASE_URL      = postgres://...@db-0.clowk-lp.voodu:5432/appdata

# DATABASE_READ_URL = postgres://...@db-1.clowk-lp.voodu:5432,db-2.clowk-lp.voodu:5432/appdata?target_session_attrs=any

```

`config/database.yml`:

```yaml

production:

  primary:

    url: <%= ENV["DATABASE_URL"] %>

  primary_replica:

    url: <%= ENV["DATABASE_READ_URL"] %>

    replica: true

```

`app/models/application_record.rb`:

```ruby

class ApplicationRecord < ActiveRecord::Base

  primary_abstract_class

  connects_to database: { writing: :primary, reading: :primary_replica }

end

```

### Custom image with pgvector

```hcl

postgres "clowk-lp" "db" {

  workdir    = "infra/postgres"

  dockerfile = "Dockerfile.pg"

  replicas   = 3

  lang {

    name = "generic"

  }

}

```

`infra/postgres/Dockerfile.pg`:

```dockerfile

FROM postgres:16

RUN apt-get update \

 && apt-get install -y postgresql-16-pgvector \

 && rm -rf /var/lib/apt/lists/*

```

```bash

vd apply -f voodu.hcl

# voodu streams the build context, runs `docker build`, tags

# clowk-lp-db:latest, deploys to the 3 pods.

# Enable in your app's migrations:

# Rails:  enable_extension :pgvector

# Django: from pgvector.django import VectorField

# OR inline:

vd postgres:psql clowk-lp/db -c "CREATE EXTENSION IF NOT EXISTS pgvector"

```

### External access for DBeaver / TablePlus

```bash

vd postgres:expose clowk-lp/db

# postgres clowk-lp/db now exposed on 0.0.0.0: — pod restart triggered.

# Reminder: rotate password (vd postgres:new-password clowk-lp/db) if it

# ever leaked anywhere untrusted.

# Get the password

PW=$(vd config clowk-lp/db get POSTGRES_PASSWORD -o json | jq -r .POSTGRES_PASSWORD)

# Connect from your laptop

psql "postgres://postgres:$PW@my-vm-ip:5432/postgres"

# Done? un-expose

vd postgres:unexpose clowk-lp/db

```

⚠ Verify your firewall (ufw, security group, iptables) before exposing. The plugin flips the bind from `127.0.0.1` to `0.0.0.0`; the host firewall still controls who can reach the port from outside.

---

## Plugin reference

### Commands

| Command | Purpose |

|---|---|

| `vd postgres:link   [--reads]` | Wire `DATABASE_URL` (and optionally `DATABASE_READ_URL`) into a consumer |

| `vd postgres:unlink  ` | Remove the link |

| `vd postgres:new-password  [--no-restart]` | Rotate superuser password + auto-refresh every linked consumer |

| `vd postgres:info  [-o json]` | Cluster topology snapshot (text or JSON) |

| `vd postgres:expose ` | Publish on 0.0.0.0:`` (Internet-facing) |

| `vd postgres:unexpose ` | Return to 127.0.0.1:`` (loopback only) |

| `vd postgres:promote  --replica  [--force] [--no-restart]` | Promote a standby to primary (plugin runs `pg_promote()` internally; refuses on lag without `--force`) |

| `vd postgres:rejoin  --replica ` | Re-attach a divergent pod as standby via `pg_rewind` |

| `vd postgres:psql  [--replica N] [-c ""]` | Drop into psql against the cluster (no password needed) |

| `vd pg:backups  [-o json]` | List backups + status (running/complete/failed) |

| `vd pg:backups:capture  [--from-replica N] [--follow] [--keep N] [--max-age D]` | Spawn pg_dump in a sibling container (detached default); auto-prunes after success when retention flags or `BACKUP_KEEP`/`BACKUP_MAX_AGE` set |

| `vd pg:backups:logs   [--follow]` | docker logs on a backup container |

| `vd pg:backups:restore   --yes` | `pg_restore` from local id or http(s) URL (DESTRUCTIVE of db content) |

| `vd pg:backups:download   [--to ]` | Copy a backup file from the pod to the host |

| `vd pg:backups:delete   --yes` | Remove a backup file from `/backups/` |

| `vd pg:backups:prune  [--keep N] [--max-age D] [--dry-run] [--yes]` | Apply retention on demand (file + container cleanup) |

| `vd pg:backups:schedule  [--at ] [--from-replica N] [--keep N] [--max-age D]` | Print cronjob HCL template for paste-and-apply (retention inlined into the captured command) |

| `vd pg:backups:cancel  []` | docker stop running capture(s) |

| `vd postgres:backup  --destination  [--from-replica N]` | (legacy) `pg_basebackup` snapshot to a tar file |

| `vd postgres:restore  --from  --yes` | (legacy) Restore PGDATA from a tar (DESTRUCTIVE of cluster) |

| `vd postgres:help` | Plugin overview |

Pass `--help` to any subcommand for full usage.

### Asset files emitted

The plugin emits one asset per postgres resource with 4 files:

| Key | Mount path | Loaded by |

|---|---|---|

| `entrypoint` | `/usr/local/bin/voodu-postgres-entrypoint` | `command = ["bash", ...]` |

| `pg_overrides_conf` | `/etc/postgresql/voodu-99-overrides.conf` | postgres `include_dir` (operator pg_config) |

| `streaming_conf` | `/etc/postgresql/voodu-50-streaming.conf` | postgres `include_dir` (primary_conninfo, hot_standby) |

| `init_replication_sh` | `/docker-entrypoint-initdb.d/00_create_replication.sh` | docker-entrypoint.sh on first boot of primary only |

The `voodu-NN-` prefixes order the include_dir scan: plugin defaults (50) load before operator overrides (99). "Last config wins" → operator pg_config trumps plugin defaults cleanly.

### Statefulset env vars

| Var | Source | Used by |

|---|---|---|

| `POSTGRES_USER` / `POSTGRES_DB` / `POSTGRES_PASSWORD` | plugin (HCL + auto-gen) | docker-entrypoint.sh (initdb) |

| `POSTGRES_INITDB_ARGS` | plugin (`--locale=... --encoding=...`) | docker-entrypoint.sh |

| `PGDATA` | plugin (`/var/lib/postgresql/data/pgdata`) | postgres |

| `PG_PORT` | plugin (HCL `port`) | wrapper script (`-p`) |

| `PG_NAME` / `PG_SCOPE_SUFFIX` | plugin (resource ref) | wrapper script (FQDN composition) |

| `PG_PRIMARY_ORDINAL` | plugin (default 0; M-P5 will flip via failover) | wrapper script (role detection) |

| `PG_REPLICATION_USER` / `PG_REPLICATION_PASSWORD` | plugin (HCL + auto-gen) | wrapper script (`pg_basebackup`), init script (`CREATE USER`) |

| `VOODU_REPLICA_ORDINAL` | controller (per-pod) | wrapper script (role detection) |

### Bucket keys (config)

| Key | Owner | Purpose |

|---|---|---|

| `POSTGRES_PASSWORD` | plugin auto-gen | Superuser password — read by `link`, rotated by `new-password` |

| `POSTGRES_REPLICATION_PASSWORD` | plugin auto-gen | Replication user password — read by streaming.conf renderer + `rejoin` |

| `POSTGRES_LINKED_CONSUMERS` | `vd postgres:link/unlink` | Comma-separated `/` refs for `new-password`/`failover` fan-out |

| `PG_EXPOSE_PUBLIC` | `vd postgres:expose/unexpose` | `"true"` flips ports to 0.0.0.0; absent = loopback |

| `PG_PRIMARY_ORDINAL` | `vd postgres:failover` | Current primary ordinal (default `0`); flipped by failover, read by wrapper script + streaming.conf renderer |

| `BACKUP_KEEP` | seeded by plugin (default `30`); operator-tunable | Cap on backup count for auto-prune. `0` / empty disables count-based pruning. Read by `:capture` and `:prune`. |

| `BACKUP_MAX_AGE` | operator | Cap on backup age for auto-prune. Accepts `7d`, `2w`, `168h`. Empty disables age-based pruning. Read by `:capture` and `:prune`. |

Operator can read all of these via `vd config `. Setting them manually before first apply pre-seeds (useful for dev environments wanting deterministic passwords).

**Scope-level inheritance:** any of these keys can be set at the scope level (`vd config  set KEY=VAL` — no `/`) and the plugin will pick them up. App-level (`vd config / set ...`) wins on conflict. Useful for shared defaults like `BACKUP_KEEP`/`BACKUP_MAX_AGE` across every postgres in a scope without touching each resource bucket.

**`env_from` does NOT apply to plugin reads.** `env_from` injects vars into the container's runtime environment; plugins read through the controller's `GET /config` (etcd store). For shared defaults, use scope-level config above.

### Repo layout

```

cmd/voodu-postgres/

  main.go               # entrypoint (subcommand switch)

  postgres.go           # postgresSpec parser/validator + plugin-owned strip

  entrypoint.go         # bash wrapper renderer (role-aware)

  pgconfig.go           # postgresql.conf overrides renderer

  password.go           # superuser password lifecycle

  replication.go        # replication password lifecycle + streaming.conf + init script

  controller.go         # invocationContext + controllerClient

  link.go               # cmdLink + cmdUnlink + URL builder + linked consumers

  new_password.go       # cmdNewPassword + auto-refresh consumers

  info.go               # cmdInfo (text + JSON snapshot)

  expose.go             # cmdExpose + cmdUnexpose

  help.go               # plugin overview

  *_test.go             # unit + integration tests

bin/                    # wrappers + the binary

plugin.yml              # plugin metadata (declares commands)

Makefile                # build / test / install-local

install                 # post-install hook

uninstall               # pre-uninstall hook

```

### Development

```bash

make build               # produces bin/voodu-postgres

make test                # go test ./...

make lint                # go vet ./...

make cross               # cross-compile linux/amd64 + linux/arm64

# Smoke test expand without dispatch

echo '{"kind":"postgres","scope":"x","name":"y","spec":{"replicas":3}}' \

  | bin/voodu-postgres expand | jq

# Smoke test help texts

bin/voodu-postgres link --help

bin/voodu-postgres expose --help

# Install into a local plugins root for E2E

make install-local PLUGINS_ROOT=/opt/voodu/plugins

```

---

## Install & upgrade

```bash

vd plugins:install thadeu/voodu-postgres

vd plugins:install thadeu/voodu-postgres --version 0.3.0

```

The plugin ships pre-built linux/amd64 + linux/arm64 binaries via GitHub Releases. The install hook downloads the right one for the host arch and drops it into `$VOODU_PLUGIN_DIR/postgres/bin/`.

## Storage

The plugin uses two storage shapes:

### Per-pod data (statefulset semantics)

Each pod gets its own `data` volume — postgres' authoritative data directory. Pod restarts re-attach to the same volume; scale-down preserves it; `vd delete --prune` is the only thing that destroys it.

| Claim | Volume name (ordinal 0) | Mount path | Survives |

|---|---|---|---|

| `data` | `voodu---data-0` | `/var/lib/postgresql/data` | pod restart, scale-down |

Standbys (ordinal 1+) get their own per-pod `data` volume (`voodu---data-1`, `-2`, …). Each pod is independent — losing one pod's data volume forces a re-bootstrap from the primary via `pg_basebackup`, but the others stay healthy.

### Backups directory (shared host bind-mount)

Every pod mounts `/opt/voodu/backups///` from the host at `/backups` inside the container (rw on every pod). `vd pg:backups:capture` writes `pg_dump` files here; cronjobs sync the dir to off-host storage.

| Source | Container path | Mount mode | Default |

|---|---|---|---|

| host bind-mount | `/backups` | `:rw` (all pods) | `/opt/voodu/backups//` |

**Pre-flight (operator, once per host)**:

```bash

sudo mkdir -p /opt/voodu/backups/clowk-lp/db

```

Ownership (`postgres:postgres`, uid 999) is auto-fixed by the entrypoint wrapper at boot — no manual `chown` needed.

## License

MIT
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/thadeu/voodu-postgres

Awesome Lists containing this project

README