{"id":49283611,"url":"https://github.com/impossibleforge/pfc-duckdb","last_synced_at":"2026-04-25T20:03:34.950Z","repository":{"id":348757132,"uuid":"1199757256","full_name":"ImpossibleForge/pfc-duckdb","owner":"ImpossibleForge","description":"DuckDB extension to read PFC-JSONL compressed log files with block-level timestamp filtering","archived":false,"fork":false,"pushed_at":"2026-04-21T15:33:14.000Z","size":364,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-21T17:36:06.431Z","etag":null,"topics":["analytics","compression","duckdb","duckdb-extension","jsonl","log-compression","sql","structured-logs"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ImpossibleForge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-02T17:16:28.000Z","updated_at":"2026-04-21T15:33:17.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ImpossibleForge/pfc-duckdb","commit_stats":null,"previous_names":["impossibleforge/pfc-duckdb"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ImpossibleForge/pfc-duckdb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-duckdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-duckdb/tags","releases_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-duckdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-duckdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ImpossibleForge","download_url":"https://codeload.github.com/ImpossibleForge/pfc-duckdb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-duckdb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32274987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"ssl_error","status_checked_at":"2026-04-25T18:29:32.149Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","compression","duckdb","duckdb-extension","jsonl","log-compression","sql","structured-logs"],"created_at":"2026-04-25T20:03:31.986Z","updated_at":"2026-04-25T20:03:34.928Z","avatar_url":"https://github.com/ImpossibleForge.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pfc — DuckDB Extension for PFC-JSONL\n\nYou have compressed log archives on disk. To query them you normally decompress everything first — even if you only need one hour out of thirty days.\n\nThis extension changes that. Query `.pfc` files directly from DuckDB SQL. 
A block index tells the extension exactly which chunks of the file to decompress — the rest stays compressed.\n\n\u003e **Requires:** The `pfc_jsonl` binary installed on your machine (Step 1 below). The extension calls it for decompression.\n\u003e\n\u003e **Platform:** Linux x86_64 and macOS Apple Silicon (ARM64). No native Windows binary — Windows users must use WSL2 or a Linux machine.\n\n```sql\nINSTALL pfc FROM community;\nLOAD pfc;\nLOAD json;\n\nSELECT\n    line-\u003e\u003e'$.level'   AS level,\n    line-\u003e\u003e'$.message' AS message\nFROM read_pfc_jsonl('/var/log/events.pfc')\nWHERE line-\u003e\u003e'$.level' = 'ERROR';\n```\n\n[![Awesome DuckDB](https://awesome.re/mentioned-badge.svg)](https://github.com/davidgasquez/awesome-duckdb)\n\n## What is PFC-JSONL?\n\n[PFC-JSONL](https://github.com/ImpossibleForge/pfc-jsonl) is a high-performance compressed log format built for structured (JSONL) data. It achieves **better compression than gzip and zstd** on real log data while supporting **random block access** — meaning you can decompress only the time range you need.\n\nKey properties:\n- Each file is split into independently compressible blocks\n- A `.pfc.bidx` binary index stores the byte offset and timestamp range of every block\n- The PFC binary can decompress any subset of blocks in a single call\n- **Free for personal and open-source use** — no account, no signup required\n\n## How It Works (Architecture)\n\n```\n┌──────────────────────────────────────────────────────────────┐\n│ DuckDB                                                       │\n│                                                              │\n│  SELECT * FROM read_pfc_jsonl('events.pfc', ts_from=...)     
│\n│           │                                                  │\n│  ┌────────▼──────────┐    reads     ┌─────────────────────┐  │\n│  │  pfc extension    │─────────────▶│  events.pfc.bidx    │  │\n│  │  (MIT, open src)  │  block index │  (block timestamps) │  │\n│  └────────┬──────────┘              └─────────────────────┘  │\n│           │ popen() / subprocess                              │\n└───────────┼──────────────────────────────────────────────────┘\n            │\n            ▼\n  ┌─────────────────────┐\n  │  pfc_jsonl binary   │  ← proprietary, closed source\n  │  (v3.4+, local)     │    contains BWT+rANS compression\n  └─────────────────────┘\n            │\n            ▼\n  decompressed JSON lines → back to DuckDB\n```\n\nThe extension is a **thin open-source wrapper** — it reads the `.bidx` index in C++ to select which blocks are needed, then calls the PFC binary once to decompress only those blocks. The compression algorithm stays closed.\n\n## Installation\n\n### Step 1 — Install the PFC binary (once per machine)\n\nThe extension calls the `pfc_jsonl` binary for decompression.\nDownload the latest release for your platform:\n\n**Linux x64:**\n```bash\ncurl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \\\n     -o /usr/local/bin/pfc_jsonl\nchmod +x /usr/local/bin/pfc_jsonl\npfc_jsonl --help      # verify install\n```\n\n**macOS (Apple Silicon M1/M2/M3/M4):**\n```bash\ncurl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \\\n     -o /usr/local/bin/pfc_jsonl\nchmod +x /usr/local/bin/pfc_jsonl\npfc_jsonl --help      # verify install\n```\n\n\u003e **macOS Intel (x64):** Binary coming soon.\n\n\u003e **Custom path:** Set `PFC_JSONL_BINARY=/path/to/pfc_jsonl` in your environment to override the default `/usr/local/bin/pfc_jsonl`.\n\n### Step 2 — Install the DuckDB extension\n\n```sql\nINSTALL pfc FROM community;\nLOAD pfc;\n```\n\n### Build from source 
(developers / early access)\n\n```bash\ngit clone --recurse-submodules https://github.com/ImpossibleForge/pfc-duckdb\ncd pfc-duckdb\nGEN=ninja make release\n# Extension at: build/release/extension/pfc/pfc.duckdb_extension\n```\n\n## Usage\n\n### Basic query\n\n```sql\nLOAD pfc;\n\nSELECT line FROM read_pfc_jsonl('/path/to/file.pfc');\n```\n\nEach row contains one raw JSON string in the `line` column.\nUse the DuckDB `json` extension to parse fields:\n\n```sql\nLOAD json;\n\nSELECT\n    line-\u003e\u003e'$.timestamp' AS ts,\n    line-\u003e\u003e'$.level'     AS level,\n    line-\u003e\u003e'$.message'   AS message,\n    line-\u003e\u003e'$.service'   AS service\nFROM read_pfc_jsonl('/path/to/file.pfc');\n```\n\n### Timestamp-based block filtering\n\nPFC files include a `.pfc.bidx` index with the timestamp range of each block.\nPass `ts_from` and/or `ts_to` (Unix seconds) to skip entire blocks before decompression:\n\n```sql\n-- Only decompress blocks that overlap the given time window\nSELECT line\nFROM read_pfc_jsonl(\n    '/path/to/file.pfc',\n    ts_from = 1767225600,   -- 2026-01-01 00:00:00 UTC\n    ts_to   = 1767311999    -- 2026-01-01 23:59:59 UTC\n);\n```\n\nConvert a timestamp string to Unix seconds with `epoch()`:\n\n```sql\nSELECT line\nFROM read_pfc_jsonl(\n    '/path/to/file.pfc',\n    ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),\n    ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')\n);\n```\n\n### Combining block filter and row filter\n\n`ts_from`/`ts_to` skip entire **blocks** (coarse, fast).\nAdd a `WHERE` clause for **row-level** precision:\n\n```sql\nLOAD json;\n\nSELECT line-\u003e\u003e'$.message' AS msg\nFROM read_pfc_jsonl(\n    '/var/log/api.pfc',\n    ts_from = epoch(TIMESTAMPTZ '2026-03-15 08:00:00+00'),\n    ts_to   = epoch(TIMESTAMPTZ '2026-03-15 10:00:00+00')\n)\nWHERE line-\u003e\u003e'$.level' = 'ERROR';\n```\n\n### Analytics examples\n\n```sql\nLOAD json;\n\n-- Error rate per hour\nSELECT\n    
strftime(to_timestamp((line-\u003e\u003e'$.ts')::BIGINT), '%Y-%m-%d %H:00') AS hour,\n    count(*) FILTER (WHERE line-\u003e\u003e'$.level' = 'ERROR') AS errors,\n    count(*)                                            AS total\nFROM read_pfc_jsonl('/var/log/api.pfc')\nGROUP BY hour ORDER BY hour;\n\n-- Top 10 slowest endpoints\nSELECT\n    line-\u003e\u003e'$.path'                              AS endpoint,\n    avg((line-\u003e\u003e'$.duration_ms')::DOUBLE)        AS avg_ms,\n    count(*)                                     AS requests\nFROM read_pfc_jsonl('/var/log/api.pfc')\nGROUP BY endpoint ORDER BY avg_ms DESC LIMIT 10;\n```\n\n## API Reference\n\n### `read_pfc_jsonl(path [, ts_from, ts_to])`\n\n| Parameter | Type    | Default | Description |\n|-----------|---------|---------|-------------|\n| `path`    | VARCHAR | —       | Path to the `.pfc` file. A `.pfc.bidx` index must exist at `path + \".bidx\"`. |\n| `ts_from` | BIGINT  | 0       | Lower bound for block selection (Unix seconds). `0` = no lower bound. |\n| `ts_to`   | BIGINT  | 0       | Upper bound for block selection (Unix seconds). `0` = no upper bound. |\n\n**Returns:** table with one column `line VARCHAR` — one row per decompressed JSON line.\n\n**Block filtering semantics:**\nA block is included if its timestamp range `[ts_start, ts_end]` overlaps `[ts_from, ts_to]`.\nBlocks with unknown timestamps are always included.\nIf both `ts_from` and `ts_to` are `0`, all blocks are read.\n\n## File Requirements\n\n| File | Required | Description |\n|------|----------|-------------|\n| `file.pfc` | yes | Compressed PFC-JSONL file |\n| `file.pfc.bidx` | yes | Binary block index (requires PFC-JSONL v3.4+) |\n\nGenerate both with the PFC binary:\n\n```bash\npfc_jsonl compress input.jsonl output.pfc\n# Produces: output.pfc  +  output.pfc.bidx\n```\n\n\u003e **Note:** The Docker image on Docker Hub (`impossibleforge/pfc-jsonl`) is a server-side compression tool. 
It is **not** required for using the DuckDB extension — you only need the standalone `pfc_jsonl` binary from GitHub Releases.\n\n## Performance\n\nBlock-level filtering can skip the majority of a file.\nExample: 30-day log file, 720 hourly blocks — a 1-hour query reads **1 block** instead of 720.\n\n| Query range | Blocks read | Speedup (720-block file) |\n|-------------|-------------|--------------------------|\n| 30 days     | 720/720     | 1×                       |\n| 1 day       | ~24/720     | ~30×                     |\n| 1 hour      | ~1/720      | ~720×                    |\n\n\n---\n\n## Disclaimer\n\nPFC-DuckDB is an independent open-source project and is not affiliated with, endorsed by, or associated with the DuckDB Foundation or DuckDB Labs.\n## License\n\nThe PFC-JSONL binary is **free for personal and open-source use** — no account, no signup, no phone-home.\n\nCommercial use requires a license. Contact: [info@impossibleforge.com](mailto:info@impossibleforge.com)\n\n## Troubleshooting\n\n**`Cannot open index file: /path/to/file.pfc.bidx`**\nThe `.pfc.bidx` index is missing. Compress with PFC-JSONL v3.4+:\n```bash\npfc_jsonl compress input.jsonl output.pfc\n```\n\n\n**`PFC binary not found at '/usr/local/bin/pfc_jsonl'`**\nBinary is missing or not executable. Re-run the curl install command, or set `PFC_JSONL_BINARY=/path/to/pfc_jsonl`.\n\n**`popen() failed — could not start PFC binary subprocess`**\nThe extension uses `popen()` to call the PFC binary. Windows is not supported — use WSL2 or a Linux machine.\n\n**`ts_from (...) must be \u003c= ts_to (...)`**\nYou passed an inverted time range. 
Swap the values so `ts_from` comes before `ts_to`.\n\n## Related Projects\n\n| Project | Description |\n|---------|-------------|\n| [pfc-jsonl](https://github.com/ImpossibleForge/pfc-jsonl) | The core binary — compress, decompress, query |\n| [pfc-fluentbit](https://github.com/ImpossibleForge/pfc-fluentbit) | Stream Fluent Bit logs directly to `.pfc` archives |\n| [pfc-migrate](https://github.com/ImpossibleForge/pfc-migrate) | Convert existing gzip/zstd/lz4 archives to PFC — local, S3, Azure, GCS |\n| [pfc-jsonl (PyPI)](https://pypi.org/project/pfc-jsonl/) | Python package — `pip install pfc-jsonl` |\n| [pfc-vector](https://github.com/ImpossibleForge/pfc-vector) | High-performance Rust ingest daemon for Vector.dev and Telegraf |\n| [pfc-otel-collector](https://github.com/ImpossibleForge/pfc-otel-collector) | OpenTelemetry OTLP/HTTP log exporter |\n| [pfc-kafka-consumer](https://github.com/ImpossibleForge/pfc-kafka-consumer) | Kafka / Redpanda consumer |\n| [pfc-telegraf](https://github.com/ImpossibleForge/pfc-telegraf) | Telegraf HTTP output plugin → PFC |\n| [pfc-grafana](https://github.com/ImpossibleForge/pfc-grafana) | Grafana data source plugin for PFC archives |\n\n---\n\n## License\n\nThe **pfc DuckDB extension** (this repository) is released under the **MIT License** — see [LICENSE](https://github.com/ImpossibleForge/pfc-duckdb/blob/main/LICENSE).\n\nThe **PFC-JSONL binary** (`pfc_jsonl`) is proprietary software — free for personal and open-source use. Commercial use requires a license: [info@impossibleforge.com](mailto:info@impossibleforge.com)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpossibleforge%2Fpfc-duckdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimpossibleforge%2Fpfc-duckdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpossibleforge%2Fpfc-duckdb/lists"}