{"id":41523118,"url":"https://github.com/samwillis/pot-query-postgres-poc","last_synced_at":"2026-01-23T21:12:55.396Z","repository":{"id":329187405,"uuid":"1118482456","full_name":"samwillis/pot-query-postgres-poc","owner":"samwillis","description":null,"archived":false,"fork":false,"pushed_at":"2025-12-17T21:41:12.000Z","size":64,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-21T09:07:45.102Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samwillis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-17T20:29:02.000Z","updated_at":"2025-12-17T21:34:19.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/samwillis/pot-query-postgres-poc","commit_stats":null,"previous_names":["samwillis/pot-query-postgres-poc"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/samwillis/pot-query-postgres-poc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samwillis%2Fpot-query-postgres-poc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samwillis%2Fpot-query-postgres-poc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samwillis%2Fpot-query-postgres-poc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/samwillis%2Fpot-query-postgres-poc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samwillis","download_url":"https://codeload.github.com/samwillis/pot-query-postgres-poc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samwillis%2Fpot-query-postgres-poc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28700519,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T17:25:48.045Z","status":"ssl_error","status_checked_at":"2026-01-23T17:25:47.153Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-23T21:12:54.543Z","updated_at":"2026-01-23T21:12:55.382Z","avatar_url":"https://github.com/samwillis.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Electric Snapshot POC\n\nA proof-of-concept demonstrating \"point-in-time\" reads in PostgreSQL using synthetic MVCC snapshots derived from logical replication (WAL tailing). This enables querying the database as-of a specific commit, even after subsequent commits have modified the data.\n\n## What This POC Proves\n\n### Core Hypothesis Validated ✅\n\n**We can execute read-only queries against historical database states by:**\n1. Observing transaction commits via logical replication (WAL tailing)\n2. 
Computing MVCC snapshot identifiers at each commit point\n3. Using those snapshots to query PostgreSQL and retrieve the exact data that existed at that point in time\n\n### Evidence: Physical Tuple Examination\n\nThe `vacuum-proof.spec.ts` test provides **definitive proof** by examining PostgreSQL's physical heap storage:\n\n```\nHeap page contains 11 tuple versions\nOld (dead) tuple versions in heap: 10\n```\n\nUsing the `pageinspect` extension, we can see the actual heap page contents:\n\n```\n lp | t_ctid | t_xmin | t_xmax | data (decoded)\n----+--------+--------+--------+------------------\n  1 | (0,2)  |    852 |    853 | \"initial data\"\n  2 | (0,3)  |    853 |    854 | \"data version 1\"\n  3 | (0,4)  |    854 |    855 | \"data version 2\"\n  ...\n 11 | (0,11) |    862 |      0 | \"data version 10\"\n```\n\nThis proves:\n- **11 physical tuple versions** exist in the heap for the same logical row\n- **10 are \"dead\" versions** (superseded by updates, with `t_xmax \u003e 0`)\n- Each version contains the **actual historical data**\n- Our queries read these **real old tuples**, not just transaction ID tricks\n\n### Key Test: \"As-Of Hides Later Commit\"\n\n```typescript\n// 1. Tx1: Set allowed=false, capture snapshot1\nawait client.query(`UPDATE acl SET allowed = false WHERE ...`);\n// snapshot1 = \"762:762:\"\n\n// 2. Tx2: Set allowed=true  \nawait client.query(`UPDATE acl SET allowed = true WHERE ...`);\n\n// 3. Current query returns TRUE (the new value)\nconst current = await client.query(`SELECT allowed FROM acl`);\nexpect(current.rows[0].allowed).toBe(true);\n\n// 4. As-of query with snapshot1 returns FALSE (the old value!)\nconst asOf = await client.query(`\n  SELECT electric_exec_as_of($1::pg_snapshot, 'SELECT allowed FROM acl', '[]')\n`, [snapshot1]);\nexpect(asOf.rows[0].electric_exec_as_of[0].allowed).toBe(false);\n```\n\n## How It Works\n\n### 1. 
Logical Replication Stream (WAL Tailing)\n\nWe use `pg-logical-replication` to consume the PostgreSQL logical replication stream:\n\n```typescript\nlet currentXid: bigint | null = null; // xid announced by the most recent 'begin'\n\nservice.on('data', (lsn, message) =\u003e {\n  if (message.tag === 'begin') {\n    // Transaction started - track the xid\n    currentXid = BigInt(message.xid);\n    inFlightXids.add(currentXid);\n  } else if (message.tag === 'commit' \u0026\u0026 currentXid !== null) {\n    // Transaction committed - compute snapshot, reusing the xid from 'begin'\n    // because pgoutput commit messages do not carry one\n    const snapshot = computeSnapshotAfterCommit(currentXid, inFlightXids);\n    inFlightXids.delete(currentXid); // no longer in flight\n    currentXid = null;\n    // This snapshot represents \"just after this commit\"\n  }\n});\n```\n\n### 2. Snapshot Computation\n\nWhen a transaction commits, we compute a snapshot string representing the database state immediately after:\n\n```typescript\nfunction computeSnapshotAfterCommit(committedXid: bigint, inFlightXids: Set\u003cbigint\u003e): string {\n  const inFlightAfter = new Set(inFlightXids);\n  inFlightAfter.delete(committedXid);\n\n  // Math.max/min throw on bigint and the default sort() compares as strings,\n  // so order the remaining xids with an explicit comparator\n  const remaining = [...inFlightAfter].sort((a, b) =\u003e (a \u003c b ? -1 : a \u003e b ? 1 : 0));\n\n  // xmax = one past the highest xid we've seen\n  const xmax = remaining.reduce((m, x) =\u003e (x \u003e m ? x : m), committedXid) + 1n;\n\n  // xmin = lowest still-in-flight xid, or xmax if none\n  const xmin = remaining.length \u003e 0 ? remaining[0] : xmax;\n\n  // xip = in-flight transactions in range [xmin, xmax), ascending\n  const xip = remaining.filter(x =\u003e x \u003e= xmin \u0026\u0026 x \u003c xmax);\n\n  return `${xmin}:${xmax}:${xip.join(',')}`;\n}\n```\n\n**Snapshot format: `xmin:xmax:xip1,xip2,...`**\n- `xmin`: Oldest transaction still in-progress (or xmax if none)\n- `xmax`: First transaction ID not yet assigned\n- `xip`: List of in-progress transaction IDs, in ascending order\n\n### 3. PostgreSQL C Extension\n\nThe `electric_exec_as_of` function executes a query under a supplied snapshot:\n\n```sql\nSELECT electric_exec_as_of(\n  '762:763:'::pg_snapshot,           -- The historical snapshot\n  'SELECT * FROM users WHERE id=$1', -- Query to execute\n  '[\"user123\"]'::jsonb               -- Parameters\n);\n-- Returns: [{\"id\": \"user123\", \"name\": \"Alice\", ...}]\n```\n\n**Implementation:**\n\n1. 
**Parse the snapshot string** into xmin, xmax, and xip array\n2. **Create a custom SnapshotData** structure based on the current transaction's snapshot\n3. **Override MVCC fields** (xmin, xmax, xcnt, xip) with our historical values\n4. **Push the custom snapshot** using `PushActiveSnapshot()`\n5. **Execute the query via SPI** with the snapshot active\n6. **Return results as JSON**\n\n```c\n// Create custom snapshot from parsed values\nsnap = (Snapshot) MemoryContextAllocZero(TopTransactionContext, size);\nmemcpy(snap, base, sizeof(SnapshotData));\n\nsnap-\u003exmin = xmin;  // Our historical xmin\nsnap-\u003exmax = xmax;  // Our historical xmax\nsnap-\u003exip = xip;    // Our in-flight xid list\nsnap-\u003excnt = xcnt;\n\n// Execute with this snapshot\nPushActiveSnapshot(snap);\nret = SPI_execute(wrapped_sql, true, 0);\nPopActiveSnapshot();\n```\n\n### 4. MVCC Visibility Rules\n\nPostgreSQL's MVCC determines tuple visibility using:\n\n- **xmin**: Transaction that created this tuple version\n- **xmax**: Transaction that deleted/updated this tuple (0 if still current)\n\nA tuple is visible to snapshot S if:\n- `tuple.xmin \u003c S.xmax` AND `tuple.xmin` is committed AND `tuple.xmin` not in `S.xip`\n- AND (`tuple.xmax == 0` OR `tuple.xmax \u003e= S.xmax` OR `tuple.xmax` in `S.xip` OR `tuple.xmax` aborted)\n\nBy providing a historical snapshot, we see **exactly the tuples that were visible at that point in time**.\n\n## Use Case: ElectricSQL Sync/Auth Validation\n\nThis POC demonstrates the foundation for ElectricSQL-style sync validation:\n\n1. **Client syncs data** from a specific point in time\n2. **Server observes commits** via logical replication\n3. **Auth rules need validation** against the data state the client saw\n4. **Use `electric_exec_as_of`** to query with the client's snapshot\n5. **Validate permissions** against historical data, not current data\n\nExample: A user had read access to document X at sync time T1. By T2, their access was revoked. 
When validating operations from the T1 sync, we query with snapshot T1 and correctly see that they DID have access.\n\n## Data Retention: The VACUUM Problem\n\n### Why Old Tuples Might Disappear\n\nPostgreSQL's VACUUM removes \"dead\" tuple versions that are no longer visible to any active snapshot. Without protection, our historical queries could no longer find the tuple versions they need.\n\n### Solution: Replication Slot Retention\n\nReplication slots can hold back cleanup. A slot's `restart_lsn` determines how much WAL is retained, while its `xmin` and `catalog_xmin` determine which dead tuples VACUUM must keep:\n\n```sql\n-- Create a slot\nSELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');\n\n-- Check the slot's WAL position and vacuum horizons\nSELECT slot_name, restart_lsn, xmin, catalog_xmin FROM pg_replication_slots;\n```\n\n**By not acknowledging WAL consumption**, the slot's `restart_lsn` and `catalog_xmin` stay old. Note that a logical slot's `catalog_xmin` protects only system catalog tuples, so whether this alone retains the user-table tuples needed for historical snapshots should be verified before relying on it; this POC relies on disabled autovacuum instead.\n\n### Current POC Approach\n\nThe tests disable autovacuum:\n```sql\nALTER TABLE acl SET (autovacuum_enabled = false);\n```\n\nIn production, you would:\n1. Keep replication slots for each \"active\" historical point\n2. Acknowledge WAL only up to the oldest needed snapshot\n3. 
Clean up slots when snapshots are no longer needed\n\n## Project Structure\n\n```\n/electric-snapshot-poc\n├── ext/electric_poc/           # PostgreSQL C extension\n│   ├── Makefile                # PGXS build file\n│   ├── electric_poc.control    # Extension metadata\n│   ├── electric_poc--0.0.1.sql # SQL function definition\n│   └── electric_poc.c          # C implementation (~300 lines)\n├── docker/\n│   └── Dockerfile              # Postgres 16 + extension image\n├── test/\n│   ├── package.json            # Node.js dependencies\n│   ├── vitest.config.ts        # Test configuration\n│   └── tests/\n│       ├── asof.spec.ts        # Main integration tests (10 tests)\n│       ├── vacuum-proof.spec.ts # Heap examination tests (2 tests)\n│       └── helpers/\n│           ├── postgres.ts     # Database connection helpers\n│           └── replication.ts  # WAL tailing + snapshot computation\n├── .gitignore\n└── README.md\n```\n\n## Running the Tests\n\n### Prerequisites\n\n- PostgreSQL 16 with development headers\n- Node.js 18+\n- Build tools (gcc, make)\n\n### Setup\n\n```bash\n# Install Postgres (Ubuntu/Debian)\nsudo apt-get install postgresql-16 postgresql-server-dev-16 build-essential\n\n# Build and install the extension\ncd ext/electric_poc\nmake \u0026\u0026 sudo make install\n\n# Configure Postgres for logical replication\n# Add to postgresql.conf:\n#   wal_level = logical\n#   max_replication_slots = 10\n#   max_wal_senders = 10\n\n# Restart Postgres\nsudo systemctl restart postgresql\n\n# Create test database\nsudo -u postgres createdb testdb\n\n# Install test dependencies and run\ncd test\nnpm install\nnpm test\n```\n\n### Expected Output\n\n```\n ✓ tests/asof.spec.ts (10 tests) 4492ms\n   ✓ Test 1 - Smoke test function (3 tests)\n   ✓ Test 2 - As-of hides later commit (1 test)\n   ✓ Test 3 - Guardrails (5 tests)\n   ✓ Test 4 - Stress test (1 test)\n\n ✓ tests/vacuum-proof.spec.ts (2 tests) 2613ms\n   ✓ should read old tuple versions after multiple updates\n 
  ✓ should verify multiple tuple versions exist in heap\n\n Test Files  2 passed (2)\n      Tests  12 passed (12)\n```\n\n## API Reference\n\n### `electric_exec_as_of(snapshot, sql, args)`\n\nExecute a read-only query under a historical MVCC snapshot.\n\n```sql\nSELECT electric_exec_as_of(\n  snapshot pg_snapshot,  -- Historical snapshot (e.g., '750:751:')\n  sql text,              -- SELECT query with $1, $2 placeholders\n  args jsonb             -- Parameter array: '[\"value1\", \"value2\"]'\n) RETURNS jsonb;         -- Array of result rows as JSON objects\n```\n\n**Example:**\n\n```sql\nSELECT electric_exec_as_of(\n  '750:751:'::pg_snapshot,\n  'SELECT * FROM users WHERE id = $1 AND active = $2',\n  '[\"user123\", \"true\"]'::jsonb\n);\n-- Returns: [{\"id\": \"user123\", \"name\": \"Alice\", \"active\": true}]\n```\n\n**Errors:**\n- Non-SELECT queries are rejected\n- Malformed snapshot strings cause errors\n- Only text parameters are supported (bound as `TEXTOID`)\n\n### `SET LOCAL electric.snapshot = '\u003cpg_snapshot text\u003e'` (transaction-scoped mode)\n\nInstall a **synthetic MVCC snapshot for the rest of the current transaction**, so you can run **normal SQL** (no wrapper) under that point-in-time view.\n\n**Usage:**\n\n```sql\nBEGIN ISOLATION LEVEL REPEATABLE READ;\nSET LOCAL electric.snapshot = 'xmin:xmax:xip1,xip2,...'; -- xip list may be empty\nSELECT ...;  -- runs under the synthetic snapshot\nCOMMIT;\n```\n\n**Important guardrails (enforced by the extension):**\n\n- **Must be inside an explicit transaction block** (`BEGIN ...`, not autocommit).\n- **Isolation level must be** `REPEATABLE READ` **or** `SERIALIZABLE` (i.e. uses a transaction snapshot).\n- **Must be set before the first query** in the transaction (before PostgreSQL fixes the transaction snapshot).\n- **Not supported in subtransactions** (e.g. 
after `SAVEPOINT`) in this POC.\n- Set `''` (empty string) to clear: `SET LOCAL electric.snapshot = ''`.\n\n**Notes:**\n\n- Snapshot format is `pg_snapshot`-style text: `xmin:xmax:xip_list`. Subxids are not tracked in this POC.\n- This implementation touches PostgreSQL snapshot internals and is **version-sensitive**; it’s intended for a POC.\n\n## Limitations\n\nThis is a proof-of-concept with known limitations:\n\n| Limitation | Description | Production Consideration |\n|------------|-------------|-------------------------|\n| No subtransaction support | `subxip` is always empty | May miss some edge cases |\n| Approximate snapshots | Based on BEGIN/COMMIT timing | Generally safe, may be slightly conservative |\n| Text-only parameters | All args bound as TEXT | Add type inference for production |\n| No VACUUM handling | Relies on disabled autovacuum | Use replication slot retention |\n| Single-threaded WAL tailing | One stream per test | Scale with multiple consumers |\n\n## Key Insights\n\n1. **PostgreSQL MVCC stores multiple tuple versions** - This is the foundation that makes point-in-time queries possible.\n\n2. **Snapshots are just metadata** - A snapshot is just xmin/xmax/xip values that determine visibility rules.\n\n3. **We can construct synthetic snapshots** - By observing transaction commits via WAL, we can build valid snapshots.\n\n4. **Custom snapshots work with SPI** - The `PushActiveSnapshot` API allows executing queries under any valid snapshot.\n\n5. 
**Data retention is separate from visibility** - Snapshots control what's *visible*, but we must also ensure old tuples *exist* (via slot retention or disabled vacuum).\n\n## References\n\n- [PostgreSQL MVCC Documentation](https://www.postgresql.org/docs/current/mvcc.html)\n- [pg_snapshot Type](https://www.postgresql.org/docs/current/datatype-pg-snapshot.html)\n- [Logical Replication Protocol](https://www.postgresql.org/docs/current/protocol-logical-replication.html)\n- [SPI (Server Programming Interface)](https://www.postgresql.org/docs/current/spi.html)\n- [pageinspect Extension](https://www.postgresql.org/docs/current/pageinspect.html)\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamwillis%2Fpot-query-postgres-poc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamwillis%2Fpot-query-postgres-poc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamwillis%2Fpot-query-postgres-poc/lists"}