{"id":51312418,"url":"https://github.com/pgedge/coldfront","last_synced_at":"2026-07-01T05:02:21.475Z","repository":{"id":365662254,"uuid":"1239849452","full_name":"pgEdge/coldfront","owner":"pgEdge","description":"BETA PRE-RELEASE - NOT FOR PRODUCTION: Transparent data tiering and partition lifecycle management for PostgreSQL. A single table spans recent rows in PostgreSQL partitions and older rows in Apache Iceberg on S3-compatible, Azure, or GCS storage; the cold tier is readable and writable through the same SQL, no app changes.","archived":false,"fork":false,"pushed_at":"2026-06-29T23:37:36.000Z","size":666,"stargazers_count":32,"open_issues_count":4,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-30T00:06:10.649Z","etag":null,"topics":["duckdb","iceberg","lakekeeper","parquet","postgres","postgresql","s3"],"latest_commit_sha":null,"homepage":"https://docs.pgedge.com/coldfront","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pgEdge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":".github/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-15T14:01:52.000Z","updated_at":"2026-06-29T23:37:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/pgEdge/coldfront","commit_stats":null,"previous_names":["pgedge/coldfront"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/pgEdge/coldfront","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pgEdge%2Fcoldfront","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pgEdge%2Fcoldfront/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pgEdge%2Fcoldfront/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pgEdge%2Fcoldfront/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pgEdge","download_url":"https://codeload.github.com/pgEdge/coldfront/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pgEdge%2Fcoldfront/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34993438,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-01T02:00:05.325Z","response_time":130,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["duckdb","iceberg","lakekeeper","parquet","postgres","postgresql","s3"],"created_at":"2026-07-01T05:02:20.642Z","updated_at":"2026-07-01T05:02:21.470Z","avatar_url":"https://github.com/pgEdge.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pgEdge ColdFront\n\n\u003e [!WARNING]\n\u003e ColdFront is pre-release beta software under active development. Do\n\u003e not use it in production. Interfaces, on-disk formats, and behaviour\n\u003e may change without notice, and data loss is possible.\n\n[![CI](https://github.com/pgEdge/ColdFront/actions/workflows/ci.yml/badge.svg)](https://github.com/pgEdge/ColdFront/actions/workflows/ci.yml)\n\nColdFront keeps tables in PostgreSQL and cold data in Apache Iceberg\n(Parquet on S3-compatible, Azure, or GCS storage), and the cold tier is\nboth readable and writable through the same SQL with no application\nchanges. The application queries every table as an ordinary PostgreSQL\nrelation, and both operating modes present the same standard SQL\nsurface.\n\nColdFront provides two operating modes:\n\n- Tiered mode keeps recent data in native PostgreSQL partitions and\n  archives older data to Iceberg on a watermark; the application reads a\n  single unified view, and the archiver moves rows from hot to cold on a\n  schedule.\n- Decoupled mode stores the table entirely in Iceberg from the first\n  row; PostgreSQL holds a thin wrapper view and a registry row, and the\n  coldfront extension handles every data-modifying statement on that\n  view.\n\nBoth modes coexist within one database, and you choose the mode per\ntable at creation time. The SQL surface is identical for both modes:\nstandard SELECT, INSERT, UPDATE, and DELETE against the relation.\n\nDecoupled mode scales out horizontally across many PostgreSQL nodes that\nshare one Lakekeeper catalog and one object store. The bakery protocol\nin the coldfront extension serializes Iceberg commits on the PostgreSQL\nside using Spock-replicated Snowflake tickets, so concurrent writers\nnever collide at the catalog. The protocol implements Lamport mutual\nexclusion with the Ricart-Agrawala deferred-reply optimization, and the\n[formal model](docs/formal/README.md) verifies its safety with TLA+.\n\n## How It Works\n\nColdFront runs inside PostgreSQL and rewrites each statement to the\ncorrect tier, so the application sees one relation:\n\n```text\nApplication\n  |\n  |-- SELECT * FROM events            reads hot + cold transparently\n  |-- INSERT INTO events ...          hot via PG, cold via raw_query\n  |-- UPDATE events SET ... WHERE ... rewritten to the right tier\n  |-- DELETE FROM events WHERE ...    rewritten to the right tier\n         |\n  PostgreSQL 16/17/18 + pg_duckdb + coldfront\n    _events (partitioned hot data, native PG)\n    events  VIEW (hot + cold, replaces the original table)\n    coldfront extension (rewrites DML to the right tier)\n    pg_duckdb (Iceberg reads + writes via DuckDB, in-process)\n         |\n  Lakekeeper (Iceberg REST catalog)\n         |\n  S3-compatible object store (Parquet data + Iceberg metadata)\n         |\n  Archiver (Go binary, cron) moves expired PG partitions to Iceberg\n```\n\n## Installation\n\nColdFront is open source under the PostgreSQL License and runs on stock\nPostgreSQL 16, 17, and 18. The full build workflow lives in\n**[INSTALL.md](docs/installation.md)**: build the DuckDB 1.5.x base and the\ncoldfront layer with one `docker build`, or install bare-metal. Then\ncontinue with the Quickstart below.\n\n**Setting up on cloud S3?** Once the image is built, the\n**[S3 setup guide](docs/object_store.md)** takes you from an empty bucket to\na working cold tier end-to-end.\n\n## Quickstart\n\nBuild the image (see the [Installation](docs/installation.md) guide) and\nbring up the stack:\n\n```bash\ndocker compose up -d --build\n```\n\nBootstrap Lakekeeper and create a warehouse (see the one-time setup in\nthe [Using ColdFront](docs/usage.md) guide), then create a table in psql:\n\n```sql\nCREATE EXTENSION IF NOT EXISTS pg_duckdb;\nCREATE EXTENSION IF NOT EXISTS coldfront;\nSELECT coldfront.set_storage_secret('admin', 'adminsecret', 'seaweedfs:8333');\n\n-- Decoupled (iceberg-only) table, stored entirely in Iceberg on S3:\nSELECT coldfront.create_iceberg_table('public', 'events',\n  '[{\"name\":\"id\",\"type\":\"bigint\"},{\"name\":\"ts\",\"type\":\"timestamptz\"},{\"name\":\"note\",\"type\":\"text\"}]'::jsonb);\nINSERT INTO events VALUES (1, now(), 'hello');\nSELECT count(*) FROM events;\n```\n\n## Documentation\n\nThe following table lists the ColdFront guides and what each one covers:\n\n| Doc | Contents |\n|---|---|\n| **[USAGE.md](docs/usage.md)** | Day-to-day use - both modes plus the standalone partition manager, one-time setup, reading/writing, supported types, the partition CLI, storage backends, distributed (mesh) setup, tuning |\n| **[INSTALL.md](docs/installation.md)** | Build from source (Docker or bare-metal); Testing \u0026 CI |\n| **[S3_HOWTO.md](docs/object_store.md)** | Get ColdFront running on cloud S3 (virtual-hosted), end-to-end |\n| **[COMPACTOR.md](docs/compaction.md)** | Cold-tier table maintenance - compaction, snapshot expiry, orphan-file removal |\n| **[ARCHITECTURE.md](docs/architecture.md)** | Shared architecture and core mechanics |\n| **[ARCHITECTURE_TIERED.md](docs/architecture_tiered.md)** | Tiered (hot PG + cold Iceberg) deep dive |\n| **[ARCHITECTURE_DECOUPLED.md](docs/architecture_decoupled.md)** | Decoupled (iceberg-only) deep dive |\n\n## Least-privilege application roles\n\nApplication roles need no superuser and no server-file access, yet they\nread and write the cold tier through the same transparent view.\nOnboarding an application role is a single call:\n\n```sql\nSELECT coldfront.grant_app_access('alice');\n```\n\ngrant_app_access grants only the minimum the cold path needs, all\nderived from the registry rather than hardcoded: membership in\nduckdb.postgres_role, schema USAGE, SELECT on the registry, DML on every\nregistered view, and EXECUTE on the runtime cold-path functions. The\ncall is idempotent and is not executable by PUBLIC, so an application\nrole can never self-grant. The role is never granted\npg_read_server_files or pg_write_server_files, so it has no host-file\naccess. CREATE ROLE and GRANT both replicate over Spock, so you onboard\na role once on any node and it propagates across the mesh.\n\n**How a non-superuser reaches Iceberg.** pg_duckdb force-disables DuckDB's\n`LocalFileSystem` for non-superusers, which would block the side-loaded\niceberg/postgres DuckDB extensions from loading on `ATTACH`. ColdFront's\nattach helpers `coldfront.ensure_attached()` / `ensure_pg_attached()` are\ntherefore `SECURITY DEFINER` (with a pinned `search_path`): the extension\nload + `ATTACH` run elevated once per session, and because the DuckDB\ninstance is per-backend the attach persists, so every subsequent read\n(`iceberg_scan`) and write (`_exec_iceberg_with_claim`) runs as the **app\nrole** over S3/httpfs - no `LocalFileSystem`, no elevation.\n\n**Hardening.** Because the attach helpers run elevated, the\ndeployment-config GUCs they consume - `coldfront.warehouse`,\n`coldfront.lakekeeper_endpoint`, `coldfront.local_pg_dsn` - are registered\n`PGC_SUSET` (superuser-set-only), so a non-superuser cannot redirect the\nelevated `ATTACH` at an attacker endpoint; `local_pg_dsn` is additionally\n`GUC_SUPERUSER_ONLY` (it may carry credentials). Operators set these in\n`postgresql.conf` as before.\n\n**Turnkey.** The image defaults `duckdb.postgres_role = coldfront_duckdb`\nand creates that NOLOGIN role at init, so the non-superuser path works out\nof the box - `grant_app_access` is the only step. Set\n`COLDFRONT_DUCKDB_ROLE=''` to keep pg_duckdb's stock superuser-only\nbehaviour. Superusers are unaffected either way.\n\n**Spock mesh.** `CREATE ROLE` and `GRANT` both replicate via Spock DDL, so\ncreate the role + run `grant_app_access` **once on any one node** - the role\nand every grant propagate to the whole mesh. Don't repeat them per-node (a\nrepeated `CREATE ROLE` is a harmless local \"already exists\" error, just\nunnecessary). Mesh cold *writes* route through the Ricart-Agrawala bakery,\nwhose coordination functions (`_claim_iceberg_lock` /\n`_release_iceberg_lock`) are `SECURITY DEFINER` so a non-superuser drives the\ncross-node serialization (reading `pg_stat_replication` liveness + dblinking\nthe claim) with the right privilege - verified **protocol-neutral** against\nthe TLA+ model (`docs/formal/`). Least privilege therefore holds for writes\nin a mesh too, not just single-node.\n\nThe whole boundary is asserted end-to-end by the journey's\n`story_app_privilege` (non-superuser tiered read+write; in a mesh,\ncross-node read + a SECURITY DEFINER-bakery cold write from a peer), by\n`ci/ops.sh` check 3 (the role cannot redirect the endpoint, cannot\nself-grant, and an un-onboarded role is cleanly denied), and at the catalog\nlevel by the `privilege_model` pg_regress test.\n\n## Caveats\n\nIceberg on Azure ADLS Gen2 requires Blob soft-delete, container\nsoft-delete, and change feed (blob events) to be OFF on the storage\naccount. Lakekeeper warehouse creation otherwise fails with HTTP 409\n(\"This endpoint does not support BlobStorageEvents or SoftDelete\").\nDisable those features on the storage account before using it as a cold\ntier.\n\n## Project Structure\n\nThe repository is laid out as follows:\n\n```text\npgedge-coldfront/\n├── cmd/\n│   ├── archiver/               ← tiering daemon: moves expired PG partitions → Iceberg (pure Go, pgx)\n│   ├── partitioner/            ← standalone partition-manager CLI (time/id modes, 2-level)\n│   └── compactor/              ← cold-tier maintenance: compaction, snapshot expiry, orphan removal (iceberg-go)\n├── internal/\n│   ├── config/                 ← YAML config loading + validation\n│   ├── partcfg/                ← in-DB, Spock-replicated per-table lifecycle config\n│   ├── partition/              ← partition create/find/detach/drop (time + id modes)\n│   ├── sqlutil/                ← shared SQL helpers\n│   ├── view/                   ← unified view + trigger generation\n│   └── watermark/              ← archive_watermark table CRUD\n├── extension/coldfront/        ← PGXS C extension (DML hooks, bakery, registry, SQL)\n├── ci/\n│   ├── journey.sh              ← THE canonical user journey (the E2E spec)\n│   ├── matrix.sh               ← drives PG×topology×mode×target cells (--quick / --full)\n│   ├── ops.sh                  ← operational checks (privilege model, Lakekeeper-down, S3-down)\n│   ├── probe-standby.sh        ← risk gate: iceberg_scan on a read-only hot standby\n│   ├── lib.sh                  ← shared step/assert/psql helpers\n│   ├── topo/                   ← vanilla.sh (1 node) · mesh.sh (3-node Spock)\n│   └── runbooks/               ← failover-patroni.md (failover delegated to Patroni)\n├── docker/\n│   ├── Dockerfile.duckdb15-base ← DuckDB 1.5.x base (pg_duckdb 1.5.4 + patched iceberg)\n│   ├── Dockerfile.duckdb15      ← thin coldfront app layer (ARG PG_MAJOR=16|17|18)\n│   ├── iceberg-*.patch          ← duckdb-iceberg patches (bakery commit-refresh + strict-reader interop)\n│   ├── entrypoint.sh\n│   └── seaweedfs-s3.json        ← SeaweedFS S3 auth config (example)\n├── docs/                       ← MkDocs site (user docs; mkdocs.yml at repo root)\n│   ├── index.md · installation.md · object_store.md · usage.md · compaction.md\n│   ├── architecture.md · architecture_tiered.md · architecture_decoupled.md\n│   └── formal/                 ← TLA+ model of the bakery protocol (Bakery_v2.tla)\n├── docker-compose.yml          ← END-USER single-node stack (ports published)\n├── docker-compose.matrix.yml   ← CI only: single-node vanilla matrix\n├── docker-compose.mesh.yml     ← CI only: 3-node Spock mesh\n├── run-ci-local.sh             ← pre-commit gate (ci/matrix.sh --quick)\n├── config.example.yaml · Makefile · mkdocs.yml\n├── DUCKDB_1.5_PATCHED.md       ← the patched DuckDB 1.5 base: what's patched + how it's built\n└── DUCKDB_1.5_UNPATCHED.md     ← building/running the base unpatched, and the consequences\n```\n\n## Dependencies\n\nThe following table lists the services and components ColdFront runs\nagainst:\n\n| Component | Version | Purpose |\n|-----------|---------|---------|\n| PostgreSQL | 16, 17, or 18 | Database with native partitioning (stock upstream; no fork) |\n| pg_duckdb | 1.5.4 (PR #1025) | Iceberg reads + writes via DuckDB in-process |\n| duckdb-iceberg | `v1.5-variegata` @ `0fad545a`, patched | Iceberg catalog/IO for DuckDB; carries ColdFront's four patches (see [DUCKDB_1.5_PATCHED.md](DUCKDB_1.5_PATCHED.md)) |\n| Lakekeeper | latest | Iceberg REST catalog (Rust binary) |\n| S3-compatible store | any | SeaweedFS, MinIO, GCS, Azure Blob, etc. |\n\nBuilding from source needs the Go toolchain (the version is pinned in\n[go.mod](go.mod)). The Go module dependencies are the source of truth in\n[go.mod](go.mod) / [go.sum](go.sum); the archiver and partitioner build\nas static, CGO-free binaries on `pgx/v5`, and the compactor is a separate\nmodule ([cmd/compactor/go.mod](cmd/compactor/go.mod)) built on\n`apache/iceberg-go`.\n\n## Versioning\n\nColdFront carries two independent version numbers, each following its own\nconvention:\n\n- Release tags use three-part\n  [Semantic Versioning](https://semver.org) (`vMAJOR.MINOR.PATCH`, for\n  example `v1.0.0`); Git tags, GitHub releases, container image tags, and\n  the changelog all use this form. Three parts are required because\n  ColdFront is a Go module, and the toolchain recognises only full `vX.Y.Z`\n  tags as releases. The patch field keeps a bugfix-only release (`v1.0.1`)\n  distinct from a feature release (`v1.1.0`), which matters for a\n  data-writing extension where \"same behaviour, one safety fix\" is worth\n  stating plainly.\n- The PostgreSQL extension uses the conventional two-part version in its\n  control file (`default_version = '1.0'`) and upgrade-script filenames\n  (`coldfront--1.0--1.1.sql`), as is standard for PostgreSQL extensions.\n\nThe two map cleanly: extension `1.0` ships inside release `v1.0.0`, and a\npatch release may carry the same extension version or bump it with an\nupgrade script when the SQL changes.\n\n## Author\n\nCreated by Jimmy Angelakos.\n\n## License\n\nPostgreSQL License. See [LICENSE.md](LICENSE.md). Redistributed third-party\ncomponents and their notices: [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpgedge%2Fcoldfront","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpgedge%2Fcoldfront","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpgedge%2Fcoldfront/lists"}