# DeltaX (δx) - Fast time-series extension for PostgreSQL

DeltaX (δx) is a PostgreSQL extension offering compression and columnar storage for time-series
data. It can be used as a pure open-source (Apache 2.0) alternative to TimescaleDB, or
as a PostgreSQL-native alternative to dedicated analytics stores like ClickHouse when
you'd like your data to stay in Postgres.

δx stores the compressed columnar data in regular Postgres tables. It does _not_ use
its own storage format on disk. The advantage of this approach is that features like
physical/logical replication, crash recovery, backups, and pg_dump work just as they do for any other
Postgres table.

## Contents

- [Benchmarks](#benchmarks)
- [How it works](#how-it-works)
- [Features](#features)
- [Limitations](#limitations)
- [Installation and quick start](#installation-and-quick-start)
- [Correctness testing](#correctness-testing)
- [Reference](#reference)
- [How can I help](#how-can-i-help)
- [License](#license)

## Benchmarks

These results are as of May 19th, 2026.

### ClickBench

On the [ClickBench](https://benchmark.clickhouse.com/) benchmark, which runs 43 analytical
queries against a web analytics dataset of 100M rows × 105 columns, δx currently ranks below
specialized analytical stores like ClickHouse and DuckDB, but it ranks highest among
all the systems that store their data in PostgreSQL.

The following screenshot shows a selection of Postgres extensions/projects, plus ClickHouse for reference.
It displays the "combined" metric, which is a weighted average of hot-run times, cold-run time, load
time, and storage size.

<img src="images/clickbench-combined.png" width="800" alt="ClickBench combined: pg_deltax ranks in-between ClickHouse and TimescaleDB">

#### Compression / storage size

Looking at the **compression ratio / storage size**, δx achieves a compression ratio of about 7× on
this particular dataset. Compression ratios vary considerably with the characteristics of the data.

<img src="images/clickbench-storage-size.png" width="800" alt="ClickBench storage size: pg_deltax compression ratio is ~7x">

#### Load time

<img src="images/clickbench-load-time.png" width="800" alt="ClickBench load times result">

Note: δx loads the data faster than plain Postgres because it supports backfilling data directly
from Parquet files. On a more standard setup, where the data is first loaded into normal Postgres tables and
then compressed, the load time would be similar to the PostgreSQL result plus the compression time.

### JSONBench

[JSONBench](https://jsonbench.com/) is a benchmark similar to ClickBench, but it measures performance
on semi-structured data. The dataset contains Bluesky firehose data exported as NDJSON.

δx supports extracting particular fields from JSONB columns and compressing them with the same
columnar algorithms as the native columns. This enables the following results on JSONBench.

<img src="images/jsonbench-hot-run.png" width="800" alt="JSONBench hot run results">

## How it works

Let's start with an example time-series table partitioned by a timestamp column. The data itself can be metrics,
logs, events, or anything else that contains a timestamp. PostgreSQL has built-in partitioning, so it's very common
to split time-series data into fixed-interval partitions (e.g. daily, weekly, or monthly). In our example, let's
assume monthly.
The partitioned table might look something like this:

<img src="images/deltax-partitioned-table.png" width="800" alt="PostgreSQL partitioned table">

Under typical time-series workloads, only the last partition (the current month) receives writes; the older
ones typically only receive reads. Based on this observation, we can compress the older partitions so that they take
less space.

<img src="images/deltax-compressed-partitions.png" width="800" alt="Compressed partitions">

A naive way to do this is to compress all the data in a given partition with a single algorithm (say, LZ4). However,
compressing column by column has two important advantages:
- we can use type-specific compression algorithms, which can compress a lot more efficiently.
- since all the values of a given column are stored together, filtering by that column becomes very efficient.

<img src="images/deltax-columnar-compress.png" width="400" alt="Switching to columnar-oriented storage during compression">

In other words, during compression we also switch from row-oriented to column-oriented storage. This is
done on a per-segment basis: each partition is split into segments of roughly equal size (by default, 30K rows),
which are compressed one by one.
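
As a rough illustration of the row-to-column switch, the Rust sketch below splits a slice of rows into fixed-size segments and transposes each segment into one vector per column. The `Row` and `Segment` types, the three-column schema, and the function name `to_columnar_segments` are hypothetical and are not δx's internals; the sketch also uses plain fixed-size chunks (so only the last segment is shorter), whereas δx only promises segments of roughly equal size.

```rust
// Hypothetical three-column schema modeled on the quickstart's `metrics` table;
// `ts` is represented as an i64 microsecond count.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    ts: i64,
    device: String,
    value: f64,
}

/// One segment in column-oriented form: each column's values are contiguous.
#[derive(Debug, Default, PartialEq)]
struct Segment {
    ts: Vec<i64>,
    device: Vec<String>,
    value: Vec<f64>,
}

/// Split `rows` into consecutive segments of at most `segment_rows` rows and
/// transpose each one from row-oriented to column-oriented layout.
fn to_columnar_segments(rows: &[Row], segment_rows: usize) -> Vec<Segment> {
    assert!(segment_rows > 0, "segment size must be positive");
    rows.chunks(segment_rows)
        .map(|chunk| {
            let mut seg = Segment::default();
            for row in chunk {
                seg.ts.push(row.ts);
                seg.device.push(row.device.clone());
                seg.value.push(row.value);
            }
            seg
        })
        .collect()
}
```

With the default segment size this would be called as `to_columnar_segments(&rows, 30_000)`; each column vector can then be handed to a type-specific codec on its own.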

δx currently uses the following algorithms to compress columns of the given types:

- **Integers** (`int2`, `int4`, `int8`): tries three encodings and picks whichever produces the smallest blob per segment: Constant (a single repeated value), Frame-of-Reference + bit-packing (small range around a base), and Delta-Varint (variable-length encoded deltas between consecutive values).
- **Floats** (`float4`, `float8`): Gorilla XOR encoding (the scheme from Facebook's Gorilla paper), which exploits the fact that consecutive floats in time-series data tend to share most of their binary representation.
- **Timestamps and dates** (`timestamp`, `timestamptz`, `date`): Gorilla delta-of-delta encoding, which is very compact when timestamps are evenly or near-evenly spaced.
- **Booleans** (`bool`): bitmap encoding, 1 bit per value.
- **Text with low cardinality** (`text`, `varchar`, `bpchar`): dictionary encoding when the number of distinct values is below 50% of the row count and below 65,536, with the dictionary indices optionally further LZ4-compressed.
- **Text with high cardinality** (`text`, `varchar`, `bpchar`): block-LZ4 over the raw strings.
- **JSONB** (`jsonb`): the raw JSONB bytes go through the same pipeline as text (dictionary or block-LZ4). In addition, when compression is enabled, you can pass a `json_extract` spec to pull selected fields out of a JSONB column into synthetic columns of a chosen type (`text`, `bigint`, `timestamptz`, etc.) at compression time. These synthetic columns are then compressed with the matching type-specific codec above, just like native columns, and can be filtered, ordered, and aggregated on directly.
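
To make the "pick the smallest encoding" idea for integers concrete, here is a minimal sketch that sizes the three candidates for one segment and keeps the cheapest. The byte layouts are assumptions for illustration only (an 8-byte base plus a 1-byte bit width for Frame-of-Reference, ZigZag + LEB128 varints for Delta-Varint, a single 8-byte value for Constant); δx's real blob formats are not described here and may differ, and `IntCodec` / `choose_codec` are hypothetical names.

```rust
/// Candidate integer encodings (names follow the README; layouts are assumed).
#[derive(Debug, PartialEq)]
enum IntCodec {
    Constant,
    FrameOfReference { base: i64, bit_width: u32 },
    DeltaVarint,
}

/// ZigZag maps signed deltas to unsigned so small negative deltas stay small.
fn zigzag(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

/// Append `v` as an LEB128-style varint: 7 payload bits per byte.
fn write_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Delta-Varint: encode each value as the ZigZag varint of its delta to the previous one.
fn encode_delta_varint(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0i64;
    for &v in values {
        write_varint(zigzag(v.wrapping_sub(prev)), &mut out);
        prev = v;
    }
    out
}

/// Frame-of-Reference: store `min` once, then bit-pack `value - min` with just
/// enough bits for the range. Returns (base, bit width, total bytes).
fn for_size(values: &[i64]) -> (i64, u32, usize) {
    let min = *values.iter().min().unwrap();
    let max = *values.iter().max().unwrap();
    let range = (max as i128 - min as i128) as u128;
    let bit_width = 128 - range.leading_zeros();
    let packed_bytes = (values.len() * bit_width as usize + 7) / 8;
    (min, bit_width, 8 + 1 + packed_bytes)
}

/// Pick the codec with the smallest encoded size for one segment.
fn choose_codec(values: &[i64]) -> (IntCodec, usize) {
    assert!(!values.is_empty(), "a segment has at least one value");
    if values.iter().all(|&v| v == values[0]) {
        return (IntCodec::Constant, 8); // one stored 8-byte value
    }
    let (base, bit_width, for_bytes) = for_size(values);
    let dv_bytes = encode_delta_varint(values).len();
    if for_bytes <= dv_bytes {
        (IntCodec::FrameOfReference { base, bit_width }, for_bytes)
    } else {
        (IntCodec::DeltaVarint, dv_bytes)
    }
}
```

Under these assumptions, a steadily increasing counter favors Delta-Varint (almost every delta fits in one byte), while values that jump back and forth within a narrow range favor Frame-of-Reference.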

Across all of these, NULLs are extracted into a separate null bitmap before compression, so the codec only sees non-null values.

During compression, δx also collects metadata about the values in each segment:

- Time bounds and row count per segment.
- Per-column min, max, sum, non-null count, and non-zero count.
- Per-column distinct-value count.
- Bloom filters for numeric, date, and timestamp columns.
- Value-presence bitmaps for low-cardinality (≤32 distinct values per partition) text columns.
- Per-row text-length sidecars: an LZ4-compressed array of character counts for every text column.

This metadata can be used during planning and execution to speed up queries, either by skipping segments that can't contribute to the result, or by answering queries directly from the metadata without touching the compressed blobs at all.

The compressed data and the metadata are stored in companion tables for each partition, with a layout chosen to minimize I/O for the usual access patterns. The companion tables are normal Postgres tables, so they benefit from the Postgres infrastructure for replication and crash recovery. δx's planner and executor hooks use them transparently to speed up queries.

<img src="images/deltax-compressed-columnar-partitions.png" width="800" alt="DeltaX compressed columnar partitions">

An important design trade-off of δx is that compressed partitions become read-only. Writes to them are rejected, and the only way to update individual rows is to decompress and re-compress the whole partition.
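
The metadata-driven shortcuts described above can be sketched for a single range predicate `lo <= col <= hi`: segments whose min/max bounds miss the range are skipped, segments that lie entirely inside it contribute their stored non-null count and sum without decompression, and only the remaining segments need decoding. The `SegmentMeta` struct and the functions below are hypothetical; δx's planner also uses bloom filters, value-presence bitmaps, and time bounds, which this sketch leaves out.

```rust
/// Hypothetical per-segment metadata for one integer column
/// (a subset of the statistics listed above).
#[derive(Debug, Clone, Copy)]
struct SegmentMeta {
    min: i64,            // smallest non-null value
    max: i64,            // largest non-null value
    non_null_count: u64, // rows where the column is not NULL
    sum: i64,            // sum of the non-null values
}

/// How one segment relates to the predicate `lo <= col <= hi`.
#[derive(Debug, PartialEq)]
enum Decision {
    Skip,        // no row can match: the compressed blob is never read
    AllMatch,    // every non-null row matches: answer from metadata alone
    NeedsDecode, // partial overlap: decompress and filter row by row
}

fn classify(meta: &SegmentMeta, lo: i64, hi: i64) -> Decision {
    // NULLs never satisfy a comparison, so an all-NULL segment is skipped too.
    if meta.non_null_count == 0 || meta.max < lo || meta.min > hi {
        Decision::Skip
    } else if lo <= meta.min && meta.max <= hi {
        Decision::AllMatch
    } else {
        Decision::NeedsDecode
    }
}

/// COUNT(col) and SUM(col) for `lo <= col <= hi`, using only metadata where
/// possible. Returns the partial results plus the indices of the segments
/// that still have to be decompressed to finish the query.
fn count_and_sum(metas: &[SegmentMeta], lo: i64, hi: i64) -> (u64, i64, Vec<usize>) {
    let mut count: u64 = 0;
    let mut sum: i64 = 0;
    let mut to_decode = Vec::new();
    for (i, meta) in metas.iter().enumerate() {
        match classify(meta, lo, hi) {
            Decision::Skip => {}
            Decision::AllMatch => {
                count += meta.non_null_count;
                sum += meta.sum;
            }
            Decision::NeedsDecode => to_decode.push(i),
        }
    }
    (count, sum, to_decode)
}
```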

## Features

Current features include:

**Storage & compression**

- Auto-partitioning: turn any table with a timestamp column into a time-range partitioned table; out-of-range inserts land in a default partition.
- Per-column codecs: type-specific compression (Gorilla XOR for floats, Gorilla delta-of-delta for timestamps, Constant / FOR + bit-packing / Delta-Varint for integers, dictionary + LZ4 for text, bitmap for booleans), with the best codec picked per segment.
- Rich segment metadata: per-column min / max / sum / non-null / non-zero / distinct counts, bloom filters for numeric / date / timestamp columns, value-presence bitmaps for low-cardinality text, and per-row text-length sidecars.

**Query path**

- Transparent decompression: queries against compressed partitions work unchanged; the planner injects custom scan nodes that decompress on the fly.
- Segment pruning: skip whole segments using time bounds, segment-by equality, min/max, bloom filters, value-presence bitmaps, or dictionary entries, before reading the compressed blob.
- Vectorized batch filters: `=`, `<>`, `<`, `<=`, `>`, `>=`, `LIKE`, `IN` evaluated in tight Rust loops over decoded batches, bypassing PostgreSQL's per-row `ExecQual`.
- Aggregate pushdown: `COUNT(*)`, `MIN` / `MAX`, `SUM`, `AVG`, `COUNT(col)`, and `GROUP BY` answered either from segment metadata or by a vectorized aggregator inside the scan node.
- Top-N fast path: `ORDER BY ts LIMIT N` uses a two-pass scan that decodes only the sort column for most segments, then the remaining columns for the ~N winning rows.
- Parallel aggregation: parallel-aware `Partial → Gather → FinalAgg` for `SUM` / `AVG` / `COUNT` with numeric `WHERE` predicates.
- Shared-memory blob cache: a cross-backend, DSA-backed cache of detoasted compressed blobs, so hot-cache scans don't pay the TOAST cost.
- Text-length sidecar fast path: `length(col)` / `col = ''` / `col <> ''` queries read a few-KB sidecar instead of detoasting the multi-MB text blob.
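
A rough shape of a vectorized batch filter, assuming a decoded `i64` column plus a validity array: the operator is resolved once per batch, a tight loop produces a selection vector of matching row positions, and a second predicate only revisits the rows that survived the first. `Op`, `comparator`, `filter_batch`, and `refine` are illustrative names, not δx's API; `LIKE` and `IN` are omitted, and a production version might operate on bitmaps or SIMD-friendly loops instead.

```rust
/// Comparison operators for a batch filter (LIKE and IN are left out here).
#[derive(Debug, Clone, Copy)]
enum Op {
    Eq,
    Ne,
    Lt,
    Le,
    Gt,
    Ge,
}

/// Resolve the operator once per batch instead of once per row.
fn comparator(op: Op) -> fn(i64, i64) -> bool {
    match op {
        Op::Eq => |a: i64, b: i64| a == b,
        Op::Ne => |a: i64, b: i64| a != b,
        Op::Lt => |a: i64, b: i64| a < b,
        Op::Le => |a: i64, b: i64| a <= b,
        Op::Gt => |a: i64, b: i64| a > b,
        Op::Ge => |a: i64, b: i64| a >= b,
    }
}

/// Evaluate `column <op> constant` over a decoded batch and return a selection
/// vector of matching row positions. Rows marked invalid (NULL) never match.
fn filter_batch(values: &[i64], valid: &[bool], op: Op, constant: i64) -> Vec<u32> {
    assert_eq!(values.len(), valid.len());
    let cmp = comparator(op);
    let mut selected = Vec::with_capacity(values.len());
    for (i, (&v, &is_valid)) in values.iter().zip(valid.iter()).enumerate() {
        if is_valid && cmp(v, constant) {
            selected.push(i as u32);
        }
    }
    selected
}

/// Apply a further predicate (e.g. the second half of an AND) only to rows
/// that survived the previous filter.
fn refine(selected: &[u32], values: &[i64], valid: &[bool], op: Op, constant: i64) -> Vec<u32> {
    let cmp = comparator(op);
    selected
        .iter()
        .copied()
        .filter(|&i| valid[i as usize] && cmp(values[i as usize], constant))
        .collect()
}
```

For example, `WHERE v > 6 AND v < 20` would run `filter_batch` with `Op::Gt` and then `refine` with `Op::Lt` on the resulting selection vector.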

**JSON field extraction**

- Selective JSONB field extraction: pull selected JSON paths out of a JSONB column into synthetic typed columns at compression time and compress them with the matching native codec.
- Automatic query rewrite: queries written against the original JSONB column (`data->>'field'`-style chains) are transparently rewritten to read from the synthetic columns.

**Ingest & operations**

- Direct backfill: `COPY ... WITH (FORMAT deltax_compress)` writes straight to compressed companion tables from TSV / CSV / Parquet, bypassing the heap and its WAL / index / MVCC overhead.
- Background worker: drains the default partition into the proper time-range partitions, pre-creates future partitions, compresses partitions past `compress_after`, and drops partitions past `drop_after`.
- PostgreSQL 17 and 18 supported.
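
The background worker's compress/drop decisions can be modeled as a small per-partition function. This is a hypothetical sketch, not the worker's code: it assumes times in microseconds, measures a partition's age from the end of its time window (so a still-open window is never compressed), and treats `compress_after` / `drop_after` as plain durations; the real rules and configuration may differ.

```rust
/// What one maintenance pass might decide for a single partition.
#[derive(Debug, PartialEq)]
enum Action {
    Keep,
    Compress,
    Drop,
}

/// Hypothetical retention rule. All times are microseconds; `partition_end` is
/// the exclusive upper bound of the partition's time window.
fn plan_partition(
    partition_end: i64,
    now: i64,
    already_compressed: bool,
    compress_after: i64,
    drop_after: Option<i64>,
) -> Action {
    let age = now - partition_end;
    // The window is still open (or lies in the future): leave it writable.
    if age < 0 {
        return Action::Keep;
    }
    // Retention takes priority over compression: expired data is dropped.
    if let Some(limit) = drop_after {
        if age >= limit {
            return Action::Drop;
        }
    }
    if !already_compressed && age >= compress_after {
        Action::Compress
    } else {
        Action::Keep
    }
}
```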

## Limitations

- Compressed partitions are read-only. Writes are rejected; whole-partition operations (`DROP`, `TRUNCATE`) still work. If you need to update individual rows in an old partition, you must decompress, modify, and re-compress it.
- No continuous (auto-refreshed) materialized aggregates yet; they are on our roadmap.
- No offloading of old partitions to S3. Data tiering is on our roadmap.
- Postgres 17 and 18 only.

## Installation and quick start

### Installation from deb file

Download the `.deb` matching your PG major version and architecture from the [latest release](https://github.com/xataio/pg_deltax/releases/latest), then:

```sh
apt-get install -y ./pg-deltax-pg17_<version>_amd64.deb
```

δx registers a background worker from `_PG_init`, so it must be listed in `shared_preload_libraries`:

```sh
echo "shared_preload_libraries = 'pg_deltax'" >> "$PGDATA/postgresql.conf"
# restart PostgreSQL, then:
psql -c "CREATE EXTENSION pg_deltax;"
```

### Installation from source

Requires a Rust toolchain, the PostgreSQL server dev headers (`postgresql-server-dev-17` or `-18` on Debian / Ubuntu), and `cargo-pgrx` matching the `pgrx` version in `Cargo.toml`:

```sh
cargo install cargo-pgrx --version 0.17.0 --locked
cargo pgrx init --pg17=$(which pg_config)
```

Then build and install the extension into the PostgreSQL instance that `pg_config` points to:

```sh
cargo pgrx install --release --pg-config $(which pg_config) \
    --features pg17 --no-default-features
```

Replace `pg17` with `pg18` to target PostgreSQL 18. Then add `pg_deltax` to `shared_preload_libraries`, restart PostgreSQL, and run `CREATE EXTENSION pg_deltax;` as above.

### Quickstart

Either install pg_deltax as described above, or run the following in this repo:

```sh
make run     # starts Postgres in docker with the extension loaded
make psql    # connects to it via psql
```

pg_deltax installs all its functions and internal catalog tables into a dedicated `deltax` schema (so its `time_bucket`, `first`, `last`, etc. don't collide with TimescaleDB or pg_duckdb).
Call them schema-qualified as `deltax.<fn>(...)`, or run `SET search_path TO public, deltax;` once if you'd rather call them unqualified. Then run the following:

```sql
CREATE EXTENSION IF NOT EXISTS pg_deltax;
CREATE TABLE metrics (ts TIMESTAMPTZ NOT NULL, device TEXT, value FLOAT8);
SELECT deltax.deltax_create_table('metrics', 'ts', '1 day');

-- Insert ~100,000 rows spanning the last ~2.3 days across 5 devices. The past
-- (sealed) partitions need enough data for the compression savings to outweigh
-- the fixed per-partition companion-table overhead; at very small scales, that
-- overhead can actually make the database grow after compression.
INSERT INTO metrics (ts, device, value)
SELECT
    now() - (i * interval '2 seconds'),
    'sensor-' || (i % 5),
    20.0 + sin(i::float / 100) * 5
FROM generate_series(0, 100000) AS i;

-- Simulate a background-worker run, which would drain the default partition every 60s.
SELECT deltax.deltax_drain_default_partition('metrics');

-- Check size and info before compression.
SELECT pg_size_pretty(pg_database_size(current_database())) AS size;
SELECT * FROM deltax.deltax_partition_info('metrics');

-- Compression: compresses every partition whose window is fully in the past
-- (today's still-open partition is skipped). Normally the background worker
-- does this automatically.
SELECT deltax.deltax_enable_compression('metrics', order_by => ARRAY['device', 'ts']);
SELECT * FROM deltax.deltax_compress_all_partitions('metrics');
SELECT * FROM deltax.deltax_compression_stats('metrics');

-- Demo queries
SELECT deltax.time_bucket('1 day', ts) AS day, avg(value) FROM metrics GROUP BY 1 ORDER BY 1;
SELECT deltax.first(value, ts), deltax.last(value, ts) FROM metrics;

-- Size reporting after compression: the table's catalog-based size and the
-- whole database size, for comparison with the value above.
SELECT pg_size_pretty(deltax.deltax_table_size('metrics'));
SELECT pg_size_pretty(pg_database_size(current_database())) AS size;
```

## Correctness testing

The main correctness invariant in the test suite is: δx must always return the same results that plain Postgres returns for the uncompressed version of the table. Any difference is a bug. In some cases this condition is relaxed: for example, in a `LIMIT 10` query where the 10th row has ties, any of the tied rows is accepted. We have the following comparison policies:

- `ordered_exact`: rows and row order must match exactly.
- `unordered_exact`: the row multiset must match; order is ignored.
- `limit_ties`: relaxed policy for non-unique `ORDER BY ... LIMIT` cases; boundary rows can differ as long as they're tied with rows the other side returned.
- `float_tolerant`: ordered comparison with a small numeric tolerance.
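
The comparison policies above can be sketched as plain functions over result rows. This is an illustrative model, not the harness's code: rows are assumed to be vectors of `f64`, the function names simply mirror the policy names, `limit_ties` is left out, and the tolerance rule (relative to the larger magnitude, absolute near zero) is an assumption since the exact rule isn't specified. NaN handling is also simplified.

```rust
use std::cmp::Ordering;

/// Hypothetical result set: each row is a list of numeric column values.
type Rows = Vec<Vec<f64>>;

/// `ordered_exact`: same rows in the same order.
fn ordered_exact(a: &Rows, b: &Rows) -> bool {
    a == b
}

/// Sort rows lexicographically with a total order on f64 so that row
/// multisets can be compared independently of their original order.
fn sorted(rows: &Rows) -> Rows {
    let mut out = rows.clone();
    out.sort_by(|x, y| {
        for (a, b) in x.iter().zip(y) {
            let ord = a.total_cmp(b);
            if ord != Ordering::Equal {
                return ord;
            }
        }
        x.len().cmp(&y.len())
    });
    out
}

/// `unordered_exact`: same multiset of rows, order ignored.
fn unordered_exact(a: &Rows, b: &Rows) -> bool {
    sorted(a) == sorted(b)
}

/// `float_tolerant`: ordered comparison where paired values may differ by
/// `tol` times the larger magnitude (or by `tol` itself when both are <= 1.0).
fn float_tolerant(a: &Rows, b: &Rows, tol: f64) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(ra, rb)| {
            ra.len() == rb.len()
                && ra
                    .iter()
                    .zip(rb)
                    .all(|(x, y)| (x - y).abs() <= tol * x.abs().max(y.abs()).max(1.0))
        })
}
```

Summing floats in a different order (as a parallel or vectorized plan may do) can change the last bits of a result, for instance `0.1 + 0.2` versus `0.3`; `ordered_exact` rejects such a pair while `float_tolerant` accepts it.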

We have four layers of automated tests:

- Rust unit tests (`make test`).
- Integration tests (`make integration-test`): end-to-end tests against a running PostgreSQL with the extension loaded, run against both PG 17 and 18. They cover partitioning, compression / decompression round-trips, the background worker, parallel scans, Parquet loading, JSONB field extraction, the blob cache, value bitmaps, meta-only aggregation, and more.
- Plain-PG-vs-δx correctness harness (`make correctness`): the implementation of the invariant above. It loads identical logical data into a regular PostgreSQL table and a δx table, runs the same query against both, and compares the results. The suite covers aggregates, ordering, predicates, codec round-trips via direct backfill, planner-mode coverage, partition / segment edge cases, and joins with uncompressed tables.
- Benchmark correctness (e.g. `make -C clickbench verify`): the benchmark harnesses also act as cross-implementation parity checks, so a query that benchmarks fast but returns wrong results fails the run.

## Reference

- [Function reference](docs/FUNCTIONS.md): partitioning, retention, compression, analytics, and blob-cache observability functions.
- [Configuration reference](docs/CONFIGURATION.md): all `pg_deltax.*` GUCs.
- [Logical replication](docs/LOGICAL_REPLICATION.md): setting up native PostgreSQL logical replication with pg_deltax-managed tables.

## How can I help

At the moment, the best ways to contribute to this project are to:

- Spread the word: star the repo, post about it on social media, tell your friends.
- If you have a use case in your company where δx would be beneficial, please [get in touch](mailto:info@xata.io) and we'll evaluate whether δx is ready for it, or what it would take to make it ready.
- Ask your Postgres cloud provider to add support for δx. We explicitly encourage other Postgres cloud providers to adopt it.

See [CONTRIBUTING.md](CONTRIBUTING.md) for the developer guide. We recommend getting in touch before contributing new features.

## License

Licensed under the Apache License, Version 2.0.
See [LICENSE](LICENSE) for the full text.

<br>
<p align="right">Made with 💜 by <a href="https://xata.io">Xata 🦋</a></p>