# Keycloak + Traefik + Let's Encrypt: Docker Compose

[![Deployment Verification](https://github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose/actions/workflows/deployment-verification.yml/badge.svg?branch=main)](https://github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose/actions/workflows/deployment-verification.yml)
[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose/badge)](https://scorecard.dev/viewer/?uri=github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Contents

- [Why this stack?](#why-this-stack)
- [Prerequisites](#prerequisites)
- [Getting started](#getting-started)
- [Features](#features)
- [Typical use cases](#typical-use-cases)
- [Supply chain trust](#supply-chain-trust)
- [Production checklist](#production-checklist)
- [Backups](#backups)
- [Restoring a database backup](#restoring-a-database-backup)
- [Testing](#testing)
- [Security Notes](#security-notes)
- [Container hardening](#container-hardening)
- [Standardization reference](#standardization-reference)
- [About the maintainer](#about-the-maintainer)

This repository deploys **Keycloak** behind **Traefik** with automatic **Let's Encrypt TLS**, backed by **PostgreSQL**, with a scheduled **backup container** and a companion **restore script**. One `docker compose up` away from a production-shaped identity-and-access-management service at `https://your-domain`.

📙 Full narrative installation guide on the blog: [heyvaldemar.com/install-keycloak-using-docker-compose/](https://www.heyvaldemar.com/install-keycloak-using-docker-compose/).

## Why this stack?

| Need | This stack | Manual install | Keycloak Helm (K8s) | Other compose examples |
|------|-----------|----------------|---------------------|------------------------|
| Ready to deploy in <10 min | ✅ | ❌ hours of setup | ✅ if K8s is already running | Often |
| TLS via Let's Encrypt, auto-renewed | ✅ Traefik ACME built-in | Manual certbot | Via cert-manager | Varies |
| Runs on Docker Compose (no Kubernetes required) | ✅ | N/A | ❌ K8s required | ✅ |
| PostgreSQL bundled with healthcheck + start-order dependency | ✅ | Separate install | ✅ | Varies |
| Scheduled DB backups + pruning | ✅ | Manual cron | External (Velero etc.) | Rare |
| One-command restore script | ✅ | Manual `pg_restore` | Manual | Rare |
| Upstream images pinned by `sha256` digest | ✅ | N/A | Depends on chart | Rare |
| Dependabot-tracked weekly updates | ✅ | N/A | Depends | Rare |
| CI-verified deployment + backup/restore on every push | ✅ | N/A | Varies | Rare |
| Credentials via env (never committed) | ✅ | N/A | K8s Secrets | Often committed plaintext |

Four moving parts (Traefik + Keycloak + Postgres + backups). No hidden complexity, no Kubernetes prerequisites, no manual certificate management.

## Prerequisites

Before you start, you need:

- **A Linux server** with a public IP. Tested on Ubuntu 22.04 LTS+ and Debian 12+; should work on any distro that runs current Docker. Local Mac/Windows works for dev; production is Linux.
- **Docker Engine 24+ and Docker Compose 2.20+.** Quick check: `docker version` and `docker compose version`. If you don't have Docker yet: [official install guide](https://docs.docker.com/engine/install/).
- **A domain you control,** with two `A` records (or `CNAME`s) pointing at your server's public IP: one for Keycloak (e.g. `keycloak.example.com`), one for the Traefik dashboard (e.g. `traefik.keycloak.example.com`). DNS must propagate before deploy or Let's Encrypt's TLS-ALPN challenge will fail.
- **Ports 80 and 443 open** on the server's firewall and not bound by another service. Quick check: `sudo ss -ltn '( sport = :80 or sport = :443 )'` should be empty.
- **~2 GB free RAM and 1 free CPU** for the running stack. ~500 MB of disk for the images plus whatever your backup retention requires.

That's it. No Kubernetes, no separate database server, no certbot setup.
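
The version minimums above can also be checked mechanically. This is a minimal sketch, assuming GNU `sort -V` is available; `version_ge` and `check_docker_versions` are illustrative names, not part of this repository:

```bash
# Hypothetical helper (not part of this repo): succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort from GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Docker Engine and Compose versions with the minimums above.
check_docker_versions() {
  local engine compose
  engine="$(docker version --format '{{.Server.Version}}')" || return 1
  compose="$(docker compose version --short)" || return 1
  version_ge "$engine" "24.0.0" \
    || { echo "Docker Engine $engine is older than 24" >&2; return 1; }
  version_ge "${compose#v}" "2.20.0" \
    || { echo "Docker Compose $compose is older than 2.20" >&2; return 1; }
  echo "Docker Engine $engine and Compose $compose meet the minimums"
}
```

Run `check_docker_versions` on the target server before deploying; it only reads version information and changes nothing.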

## Getting started

```bash
# 1. Clone
git clone https://github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose
cd keycloak-traefik-letsencrypt-docker-compose

# 2. Create the two Docker networks the stack expects
docker network create traefik-network
docker network create keycloak-network

# 3. Copy the environment template and fill in required values
cp .env.example .env
$EDITOR .env
# ^ Required: KEYCLOAK_DB_PASSWORD, KEYCLOAK_ADMIN_PASSWORD,
# TRAEFIK_BASIC_AUTH, TRAEFIK_ACME_EMAIL, TRAEFIK_HOSTNAME,
# KEYCLOAK_HOSTNAME. See .env.example for generation commands.

# 4. Deploy
docker compose -f keycloak-traefik-letsencrypt-docker-compose.yml -p keycloak up -d
```

Within a minute or two, both `https://${KEYCLOAK_HOSTNAME}` (Keycloak UI) and `https://${TRAEFIK_HOSTNAME}` (Traefik dashboard, basic-auth protected) are live with fresh Let's Encrypt certificates.

### What success looks like

```bash
# All four services should be running:
docker compose -f keycloak-traefik-letsencrypt-docker-compose.yml -p keycloak ps
# Expected: postgres, keycloak, traefik all show "(healthy)"; backups shows "Up"

# Traefik should have issued a Let's Encrypt cert within ~30s of first start:
docker compose -p keycloak logs traefik | grep -i "adding certificate"
# Expected: "Adding certificate for domain(s) keycloak.example.com"

# Keycloak responds on its public URL:
curl -fsS -o /dev/null -w "%{http_code}\n" "https://${KEYCLOAK_HOSTNAME}/health/ready"
# Expected: 200

# First backup lands in the volume after KEYCLOAK_BACKUP_INIT_SLEEP (default 30m):
docker compose -p keycloak logs backups | tail -3
# Expected: a "Backup OK: ... (NN bytes)" line
```
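
Because the stack needs a minute or two to come up, a small polling helper can stand in for re-running the checks above by hand. This is a sketch; `wait_for` is a hypothetical name, not something the repository ships:

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: poll the readiness endpoint every 5 s for up to 2 minutes.
# wait_for 24 5 curl -fsS -o /dev/null "https://${KEYCLOAK_HOSTNAME}/health/ready"
```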

### Common first-deploy issues

- **Cert issuance fails / no `adding certificate` log line.** Either DNS hasn't propagated to your server's IP yet, or the server isn't reachable from the public internet on ports 80 and 443 (the TLS-ALPN challenge is validated over port 443). Confirm with `dig +short ${KEYCLOAK_HOSTNAME}` (should match server IP) and `curl -I http://${KEYCLOAK_HOSTNAME}` from outside the server (should not time out).
- **`docker compose up` fails with `set in .env`.** A required variable is empty in `.env`. The error message lists the variable. Most likely candidates: `KEYCLOAK_DB_PASSWORD`, `KEYCLOAK_ADMIN_PASSWORD`, `TRAEFIK_BASIC_AUTH`. Generate per the comments in `.env.example`.
- **`network keycloak-network not found`.** Step 2 (the `docker network create ...` commands) was skipped. Run them and retry.
- **Keycloak container restarts in a loop with `Build failed due to errors`.** This is rare and means an env var combo isn't supported by the upstream image's auto-build. Check `docker compose -p keycloak logs keycloak` for the specific error and reach out via [Issues](https://github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose/issues).

### Apply `.env` or compose-file changes

```bash
docker compose -f keycloak-traefik-letsencrypt-docker-compose.yml -p keycloak up -d --force-recreate
```

## Features

- **Keycloak** latest stable (26.2.5) with PostgreSQL 16 backing store.
- **Traefik v3** reverse proxy with automatic HTTP→HTTPS redirect at entry-point level and Let's Encrypt TLS-ALPN challenge for cert issuance.
- **Basic-auth protected Traefik dashboard** on a separate hostname.
- **Prometheus metrics** exposed by Traefik (`--metrics.prometheus`); wire your own scraper.
- **Healthchecks** on every service (Postgres `pg_isready`, Keycloak `/health/ready`, Traefik `/ping`) with service-dependency ordering (`depends_on: condition: service_healthy`).
- **Scheduled PostgreSQL backups** with configurable interval, retention, and destination path.
- **Automated restore script** (`keycloak-restore-database.sh`) with interactive backup selection.
- **Traefik exposed-by-default disabled**: only services with `traefik.enable=true` labels are routed.
- **Credentials required at deploy time**: compose fails fast if `.env` is incomplete, preventing accidental boots with empty or default credentials.

### Typical use cases

- **Self-hosted SSO for homelabs**: wire up Nextcloud, Grafana, Portainer, GitLab (or anything OIDC-capable) behind Keycloak federation.
- **Small-team identity provider**: consultancies, startups, internal tools that outgrew shared passwords.
- **Developer sandbox**: spin up a realistic Keycloak for integration testing without provisioning a managed IdP.
- **Step toward production Kubernetes**: run the Docker Compose stack first, validate the shape, then migrate to a Helm chart once the config is known-good.

## Supply chain trust

This repository is a **deployment template**, not a custom Docker image. It orchestrates three upstream images:

- [`traefik`](https://hub.docker.com/_/traefik): reverse proxy, Docker Hub official image
- [`quay.io/keycloak/keycloak`](https://quay.io/repository/keycloak/keycloak): Keycloak upstream
- [`postgres`](https://hub.docker.com/_/postgres): PostgreSQL, Docker Hub official image

All three are pinned to `tag@sha256:` in `.env.example`. Compose pulls by digest, not by tag. Two users deploying this repo on different days get byte-identical image manifests regardless of upstream repushes.
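
To confirm nothing in an env file has slipped back to a bare tag, the pin format can be checked with a regular expression. This is a sketch under one assumption: that every image variable name ends in `_IMAGE_TAG`, as `KEYCLOAK_IMAGE_TAG` does. The other variable names are not shown in this README, so adjust the pattern to match your `.env.example`. `unpinned_images` is an illustrative name:

```bash
# Hypothetical helper: print every *_IMAGE_TAG line in an env file whose value
# does NOT end in @sha256:<64 hex chars>. Exit status 0 means "unpinned lines
# were found"; exit status 1 means everything matched the pin format.
unpinned_images() {
  grep -E '^[A-Z0-9_]+_IMAGE_TAG=' "$1" | grep -Ev '@sha256:[0-9a-f]{64}"?$'
}

# Example:
# if unpinned_images .env; then echo "some images are not digest-pinned" >&2; fi
```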

Dependabot's `docker` ecosystem watches each digest and opens a weekly PR when any of them changes. CI's **Deployment Verification** workflow runs on every push, every pull request, and every Monday at 06:00 UTC. It stands up the full compose stack with ephemeral credentials, validates HTTPS routing, smoke-tests the Traefik dashboard, and tears the stack down. Drift in upstream images surfaces within a week instead of on the next user deploy.

GitHub Actions are also pinned by commit SHA with `# vX.Y.Z` version comments. Dependabot's `github-actions` ecosystem keeps those fresh.

See [`SECURITY.md`](SECURITY.md) for the disclosure policy.

## Production checklist

Before exposing this to real users, check every box:

- [ ] **Rotate the bootstrap admin.** `KEYCLOAK_ADMIN_USERNAME`/`PASSWORD` create a single admin on first start. After login, create your real admin users (preferably via Keycloak Federation or a second-factor-protected account), then disable or delete the bootstrap admin from the Keycloak UI.
- [ ] **Strong secrets everywhere.** `KEYCLOAK_DB_PASSWORD` and `KEYCLOAK_ADMIN_PASSWORD` must be at least 24 random characters. Generate with `openssl rand -base64 24 | tr -d '/+=' | head -c 32`. Traefik dashboard BCrypt hash must be regenerated per deployment.
- [ ] **Host-mount the backups volume.** By default the `backups` service writes to a named docker volume. For disaster recovery, bind-mount it to a host path that's included in your off-host backup solution: `- /srv/keycloak-postgres/backups:/srv/keycloak-postgres/backups`.
- [ ] **Verify Let's Encrypt cert issuance.** Watch Traefik logs during first start: `docker compose -p keycloak logs traefik -f`. A successful TLS-ALPN challenge logs `Adding certificate for domain(s) ${KEYCLOAK_HOSTNAME}` within ~30 seconds.
- [ ] **Lock down the Traefik dashboard.** The dashboard is basic-auth protected by default, but basic auth is basic. Consider restricting the dashboard's router to specific source IPs via Traefik's `IPAllowList` middleware, or skip exposing it publicly and rely on `docker compose logs`.
- [ ] **Plan your upgrade path.** Keycloak does not guarantee DB-schema compatibility across major versions. Before bumping `KEYCLOAK_IMAGE_TAG` from 26.x to 27.x (when released), read Keycloak's migration guide and test the bump on a staging database restored from a recent backup.
- [ ] **Know the restore procedure.** Run `./keycloak-restore-database.sh` against a test environment before you need it in production. Document the `BACKUP_PATH` and restore steps alongside your other DR runbooks.
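
The secret-generation pipeline from the checklist can be wrapped so the 24-character minimum is enforced rather than assumed. This is a sketch; `gen_secret` and `random_base64` are illustrative names, and the `/dev/urandom` fallback is an addition for hosts without `openssl`, not something this repository prescribes:

```bash
# Emit 24 random bytes as base64 (32 characters).
random_base64() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 24
  else
    # Assumption: fall back to the kernel CSPRNG plus coreutils `base64`.
    head -c 24 /dev/urandom | base64
  fi
}

# Stripping '/', '+' and '=' can shorten the string, so retry until at least
# 24 characters remain (the checklist's minimum). Output is at most 32 characters.
gen_secret() {
  local secret=""
  while [ "${#secret}" -lt 24 ]; do
    secret="$(random_base64 | tr -d '/+=\n' | head -c 32)"
  done
  printf '%s\n' "$secret"
}

# Example: gen_secret   # paste the result into KEYCLOAK_DB_PASSWORD in .env
```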

## Backups

The `backups` container runs on the same network as Postgres and performs a dump → prune → sleep loop:

1. **Dump**: `pg_dump` of the Keycloak database piped through `gzip`, timestamp-named. `set -o pipefail` catches `pg_dump` failures even though `gzip` exits 0. Failed dumps are renamed with a `.failed` suffix for diagnosis; the loop continues to the next cycle.
2. **Prune**: deletes files matching `${KEYCLOAK_POSTGRES_BACKUP_NAME}-*.gz` older than `KEYCLOAK_POSTGRES_BACKUP_PRUNE_DAYS` days. Set `PRUNE_DAYS=0` to disable pruning entirely.
3. **Sleep**: waits `KEYCLOAK_BACKUP_INTERVAL` before the next dump.

All four knobs (`KEYCLOAK_BACKUP_INIT_SLEEP`, `KEYCLOAK_BACKUP_INTERVAL`, `KEYCLOAK_POSTGRES_BACKUP_PRUNE_DAYS`, `KEYCLOAK_POSTGRES_BACKUPS_PATH`) are configured via `.env`. See `.env.example` for defaults (30-minute warm-up, 24-hour interval, 7-day retention).
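
The dump and prune steps described above can be sketched as two small functions. This is an illustration of the technique, not the container's actual script: the function names, the simplified log format, and the use of `find -mtime` for pruning are all assumptions.

```bash
# backup_once <dir> <name-prefix> <dump-command...>
backup_once() {
  local dir="$1" prefix="$2" file
  shift 2
  file="$dir/$prefix-$(date +%Y-%m-%d_%H-%M-%S).gz"
  # pipefail makes the pipeline fail when the dump command fails, even though
  # gzip itself exits 0. The subshell keeps the option from leaking out.
  if (set -o pipefail; "$@" | gzip > "$file"); then
    echo "Backup OK: $file ($(wc -c < "$file") bytes)"
  else
    mv "$file" "$file.failed"
    echo "Backup FAILED: $file.failed" >&2
    return 1
  fi
}

# prune_old <dir> <name-prefix> <days>; 0 disables pruning.
# `-mtime +N` matches files whose age in whole days is greater than N.
prune_old() {
  local dir="$1" prefix="$2" days="$3"
  [ "$days" -gt 0 ] || return 0
  find "$dir" -type f -name "$prefix-*.gz" -mtime "+$days" -delete
}

# Loop shape, with the pg_dump connection arguments left as placeholders:
# while true; do
#   backup_once "$dir" "$name" pg_dump -h <host> -U <user> -d <db> || true
#   prune_old "$dir" "$name" "$KEYCLOAK_POSTGRES_BACKUP_PRUNE_DAYS"
#   sleep "$KEYCLOAK_BACKUP_INTERVAL"
# done
```

Without `pipefail`, the `if` would see only `gzip`'s exit status and report a truncated or empty dump as a success.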

**Verify backups are running:**

```bash
docker compose -p keycloak logs backups | tail -20
```

Expected output (a timestamped start line and result line for each backup cycle):

```
[2026-04-23T03:00:01+00:00] Starting backup to /srv/keycloak-postgres/backups/keycloak-postgres-backup-2026-04-23_03-00-01.gz
[2026-04-23T03:00:03+00:00] Backup OK: /srv/keycloak-postgres/backups/keycloak-postgres-backup-2026-04-23_03-00-01.gz (47382 bytes)
```

A `Backup FAILED` line (with the partial file renamed to `.failed`) is your signal that something is broken. Typical causes: the postgres container is unhealthy, the backup volume filled up, or the DB credentials were rotated without updating the backups container environment.
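
For unattended monitoring, the log check can be reduced to an exit code. This is a sketch that relies only on the `Backup OK` and `Backup FAILED` message text quoted above; `last_backup_ok` is a hypothetical name:

```bash
# Hypothetical monitor: read backup log lines on stdin and succeed only if the
# most recent result line says "Backup OK".
last_backup_ok() {
  local last
  last="$(grep -E 'Backup (OK|FAILED)' | tail -n 1)"
  case "$last" in
    *"Backup OK"*) return 0 ;;
    *) echo "last backup result: ${last:-none found}" >&2; return 1 ;;
  esac
}

# Example (cron or a monitoring agent):
# docker compose -p keycloak logs backups | last_backup_ok || echo "check backups" >&2
```

A log with no result line at all is treated as a failure, which also covers the case where the container never reached its first cycle.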

**Off-host replication.** By default backups live in the `keycloak-database-backups` Docker volume, so if the host dies, backups die with it. For disaster recovery, bind-mount the backup path to a host directory that your off-host backup solution (restic, rclone, Borg, S3 sync, etc.) already covers:

```yaml
# docker-compose.override.yml
services:
  backups:
    volumes:
      - /srv/keycloak-postgres/backups:/srv/keycloak-postgres/backups
```

## Restoring a database backup

`keycloak-restore-database.sh` handles the restore flow end-to-end with safety guards at every step where data loss is possible:

1. **Sources `.env`**: DB name/user/backups path are read from your live configuration (not hardcoded), so the script still works after you customise the defaults.
2. **Lists available backups** from the backups volume.
3. **Prompts for selection**: you copy-paste the filename. The script rejects typos / path-traversal by validating the selection against the listed filenames.
4. **Integrity-checks** the selected archive via `gunzip -t`. A corrupt archive is caught here, before anything is touched.
5. **Requires `DESTROY` confirmation**: typing anything else (including empty) aborts without changes.
6. **Creates a pre-restore snapshot** of the CURRENT database state at `/tmp/pre-restore-.gz` inside the backups container. This is your rollback if the restore produces a broken DB.
7. **Stops Keycloak**, drops + recreates the database, pipes the selected backup into `psql`.
8. **Starts Keycloak**, waits up to 2 minutes for the healthcheck to report `healthy`, then runs a sanity query confirming the `public` schema has tables.

If step 8 fails (Keycloak unhealthy, or the restored DB has 0 public-schema tables), the script exits non-zero and prints the exact command sequence to recover from the pre-restore snapshot.
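
The selection and confirmation guards (steps 3 and 5) follow a pattern worth reusing in other destructive scripts. The sketch below illustrates that pattern only; it is not the restore script's actual code, and both function names are hypothetical:

```bash
# Step 3 pattern: accept a name only if it exactly matches a listed backup.
# is_listed_backup <selection> <backups-dir>
is_listed_backup() {
  local selection="$1" dir="$2"
  [ -n "$selection" ] || return 1
  # -x: whole-line match, -F: literal string. A partial name or "../x" fails
  # because it never equals a complete directory entry.
  ls -1 "$dir" | grep -qxF -- "$selection"
}

# Step 5 pattern: only the literal, upper-case word DESTROY passes.
confirmed_destroy() {
  [ "$1" = "DESTROY" ]
}

# Example:
# read -r -p "Backup file: " choice
# is_listed_backup "$choice" "$backups_dir" || { echo "unknown backup" >&2; exit 1; }
```

Validating against the listing, rather than sanitising the input, means nothing the user types is ever used as a path unless it already exists in the backups directory.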

Make the script executable, then run from the repository root (where `.env` lives):

```bash
chmod +x keycloak-restore-database.sh
./keycloak-restore-database.sh
```

The script uses the `PGPASSWORD` inherited from the backups container, so no credentials need to be passed on the command line.

**RTO / RPO expectations** for the default configuration:

| Metric | Default value | How to tighten |
|---|---|---|
| **RPO** (max data loss) | 24 hours (one `KEYCLOAK_BACKUP_INTERVAL`) | Reduce `KEYCLOAK_BACKUP_INTERVAL` (e.g. `1h`) |
| **RTO** (typical restore time) | 1-3 minutes on a small DB; scales with DB size | Keep Keycloak state lean (realms + clients only, ship audit logs elsewhere) |
| **Backup retention** | 7 days (one `PRUNE_DAYS`) | Increase `KEYCLOAK_POSTGRES_BACKUP_PRUNE_DAYS` |
| **Pre-restore snapshot** | Automatic before every restore, kept at `/tmp/pre-restore-*.gz` inside the backups container | N/A |
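
Tightening RPO multiplies the number of files kept on disk, so it helps to estimate the cost before changing the interval. This is rough arithmetic only: it assumes every backup is about the same size and ignores rounding in the prune step. The function names are illustrative:

```bash
# retained_backups <retention-days> <interval-hours>
retained_backups() {
  echo $(( $1 * 24 / $2 ))
}

# backup_disk_bytes <retention-days> <interval-hours> <bytes-per-backup>
backup_disk_bytes() {
  echo $(( $(retained_backups "$1" "$2") * $3 ))
}

# Defaults (7 days, 24 h):  retained_backups 7 24  -> 7 files
# Hourly backups (7 days):  retained_backups 7 1   -> 168 files
```

With the 47382-byte backup from the sample log, hourly backups kept for 7 days come to roughly 8 MB, which is negligible. The same calculation on a multi-gigabyte database is a different story.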

## Testing

The [Deployment Verification](https://github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose/actions/workflows/deployment-verification.yml?query=branch%3Amain) workflow runs end-to-end backup + restore tests on every push, every pull request, and every Monday at 06:00 UTC. The `backup-restore-e2e` job boots the full compose stack with ephemeral credentials and short backup intervals (`INIT_SLEEP=10s`, `INTERVAL=30s`, `PRUNE_DAYS=7`) and exercises seven scenarios:

1. **`.env` required**: `docker compose config` fails cleanly without `.env`, guarding the `${VAR:?...}` compose syntax.
2. **Backup created**: a `.gz` appears in the backups volume with size > 0.
3. **Backup integrity**: `gunzip -t` on the backup exits zero.
4. **Backup contents valid**: decompressed SQL contains `PostgreSQL database dump` header and `CREATE TABLE`/`CREATE SCHEMA`.
5. **Backup failure detected**: stopping postgres forces a failed cycle; a `*.failed` file and `Backup FAILED` log line are produced.
6. **Restore roundtrip**: inserting a marker row, restoring an earlier backup, and asserting the marker is gone proves the backup is genuinely restorable (not a no-op).
7. **Prune removes old**: a fake file with 14-day-old mtime is deleted on the next prune cycle; recent backups are preserved.
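
Scenarios 3 and 4 are easy to reproduce by hand against any backup file. This is a sketch of the two checks, not the test suite's own code; `backup_looks_valid` is a hypothetical name:

```bash
# Integrity check first, then two content checks on the decompressed SQL.
backup_looks_valid() {
  local file="$1"
  gunzip -t "$file" 2>/dev/null \
    || { echo "corrupt archive: $file" >&2; return 1; }
  gunzip -c "$file" | grep -q 'PostgreSQL database dump' \
    || { echo "missing pg_dump header: $file" >&2; return 1; }
  gunzip -c "$file" | grep -Eq 'CREATE (TABLE|SCHEMA)' \
    || { echo "no CREATE TABLE/SCHEMA statements: $file" >&2; return 1; }
}

# Example: backup_looks_valid /srv/keycloak-postgres/backups/<backup-file>.gz
```

The content checks matter because a failed dump piped through `gzip` still produces a well-formed archive, so `gunzip -t` alone would pass it.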

Run the same tests locally:

```bash
# Bring the stack up first, with short backup intervals in .env (see tests/README.md)
docker compose -f keycloak-traefik-letsencrypt-docker-compose.yml -p keycloak up -d
./tests/e2e-backup-restore.sh
```

A green [`backup-restore-e2e`](https://github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose/actions/workflows/deployment-verification.yml?query=branch%3Amain) run is the authoritative proof that the backup + restore flow works end-to-end on every push. If you deploy this template and hit an unexpected issue, compare the green CI run's logs to your own. Most "doesn't work" cases trace to DNS propagation, firewall rules, hostname mismatches, or a customised `.env` that silently breaks a variable the tests cover.

## Security Notes

- Credentials are read from `.env` at deploy time. `.env` is gitignored. The compose file uses `${VAR:?...}` syntax so `docker compose up` fails immediately with a helpful error if any required variable is missing.
- **Pre-rotation advisory.** Commits before [PR #12](https://github.com/heyvaldemar/keycloak-traefik-letsencrypt-docker-compose/pull/12) (merged 2026-04-23) committed real credential values. Those values remain in git history but are no longer referenced by any live file. Anyone who deployed with the pre-rotation configuration should rotate their live credentials and regenerate the Traefik dashboard BCrypt hash.
- Traefik dashboard is behind basic auth. Consider adding IP allow-listing for additional isolation.
- Upstream image digests are pinned; Dependabot auto-opens weekly PRs when digests change.
- CI runs on every push and every Monday to catch upstream drift.
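
The `${VAR:?...}` form behaves the same way in the shell as it does in Compose interpolation: it fails when the variable is unset or empty. That makes it easy to try out, or to reuse in a pre-deploy script. This is a sketch; `check_required_env` and the error wording are illustrative, and the three variables are the ones this README names as most often missing:

```bash
# Fail fast if any required variable is unset or empty.
check_required_env() {
  # Subshell: a failed ${VAR:?} expansion aborts a non-interactive shell, so
  # contain it and turn it into a non-zero return status.
  (
    : "${KEYCLOAK_DB_PASSWORD:?must be set in .env}"
    : "${KEYCLOAK_ADMIN_PASSWORD:?must be set in .env}"
    : "${TRAEFIK_BASIC_AUTH:?must be set in .env}"
  )
}

# Example: load .env into the current shell, then check.
# set -a; . ./.env; set +a
# check_required_env && echo "required variables present"
```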

See [`SECURITY.md`](SECURITY.md) for the vulnerability disclosure process.

## Container hardening

This stack ships with production-grade container hardening (per [`self-host-repo-hardening-runbook` → Phase 7](https://github.com/heyvaldemar/self-host-repo-hardening-runbook/blob/main/RUNBOOK.md#phase-7--container-security-context--resource-limits)) applied to every service:

- **`security_opt: no-new-privileges:true`**: prevents privilege escalation via setuid binaries even if a process inside escapes its initial capability set.
- **`cap_drop: [ALL]`**: drops every Linux capability. Each service adds back only what it genuinely needs:
  - `postgres`: `CHOWN`, `DAC_READ_SEARCH`, `FOWNER`, `SETGID`, `SETUID`, needed by `docker-entrypoint.sh` to chown the data dir on first boot and by `gosu` to drop to the postgres user.
  - `traefik`: `NET_BIND_SERVICE`, needed to bind to ports 80/443.
  - `keycloak`, `backups`: nothing.
- **`read_only: true`** with explicit `tmpfs` mounts on postgres, traefik, and backups. Keycloak is intentionally NOT read-only; see the comment block on the keycloak service in the compose file for the rationale (upstream image auto-runs `kc.sh build` at start, which writes to `/opt/keycloak/lib/quarkus/`).
- **Resource limits** (`deploy.resources.limits.memory` + `cpus`) on every service. Prevents one runaway service from starving the host.
- **Non-root users** declared explicitly: `keycloak` runs as `1000:0`, `traefik` as `0:0` (its image has no non-root user; root + `cap_drop: ALL` + `cap_add: NET_BIND_SERVICE` is the minimum-privilege configuration). `postgres` and `backups` leave `user:` unset because the official postgres image's `docker-entrypoint.sh` handles user-switching internally (drops to uid 999 via `gosu` after the initial chown).

Tested end-to-end as part of the v1.0.0 release. The `backup-restore-e2e` CI job validates that backup write + dropdb + createdb + restore work under the hardened context on every push.
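
To see what Docker actually applied to a running container, the relevant `HostConfig` fields can be read back with `docker inspect`. This is a sketch: the function names are hypothetical, and the project name `keycloak` is assumed to match the `-p keycloak` used elsewhere in this README:

```bash
# Print the hardening-related settings Docker recorded for one service.
show_hardening() {
  local service="$1" id
  id="$(docker compose -p keycloak ps -q "$service")" || return 1
  docker inspect --format \
    'readonly={{.HostConfig.ReadonlyRootfs}} cap_drop={{json .HostConfig.CapDrop}} security_opt={{json .HostConfig.SecurityOpt}}' \
    "$id"
}

# Succeeds when a line printed by show_hardening has all capabilities dropped
# and no-new-privileges set. Read-only status is not checked here because
# Keycloak is intentionally writable.
is_hardened() {
  case "$1" in
    *'cap_drop=["ALL"]'*'no-new-privileges:true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# for s in traefik keycloak postgres backups; do
#   is_hardened "$(show_hardening "$s")" || echo "$s is not hardened" >&2
# done
```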

## Standardization reference

The patterns in this stack are codified as a reusable standard at [`heyvaldemar/self-host-repo-hardening-runbook` → `RUNBOOK.md`](https://github.com/heyvaldemar/self-host-repo-hardening-runbook/blob/main/RUNBOOK.md). If you maintain a similar Compose-stack repo and want to follow the same standard, the runbook covers eight phases:

- **Phase 0**: Security audit + `.env` hygiene
- **Phase 1**: Community files (LICENSE, SECURITY.md, CHANGELOG.md, Dependabot)
- **Phase 2**: CI workflow hardening (commit-SHA pins, per-job permissions, weekly cron)
- **Phase 3**: Upstream image digest pinning
- **Phase 4**: README rewrite (evaluator-first structure)
- **Phase 5**: OpenSSF Scorecard
- **Phase 6**: CI lint + upstream Trivy scan
- **Phase 7**: Container security context + resource limits (this README's "Container hardening" section)

Plus an optional Architecture Decision Records (ADR) phase for flagship repos. Nine common pitfalls grounded in production iteration history. Helper scripts for bulk operations (digest resolution, action-tag dereferencing, phase-1/5 application).

---

## About the maintainer

**Maintained by [Vladimir Mikhalev](https://github.com/heyvaldemar)**, Docker Captain · IBM Champion · AWS Community Builder

[YouTube](https://www.youtube.com/channel/UCf85kQ0u1sYTTTyKVpxrlyQ?sub_confirmation=1) · [Blog](https://heyvaldemar.com) · [LinkedIn](https://www.linkedin.com/in/heyvaldemar/)