{"id":49283681,"url":"https://github.com/impossibleforge/pfc-jsonl","last_synced_at":"2026-04-25T20:03:57.696Z","repository":{"id":348531049,"uuid":"1198473087","full_name":"ImpossibleForge/pfc-jsonl","owner":"ImpossibleForge","description":"High-ratio JSONL compressor with block-level random access. ~9% ratio on log data - 25% smaller than gzip, 37% smaller than zstd. Free for personal and open-source use.","archived":false,"fork":false,"pushed_at":"2026-04-21T15:32:41.000Z","size":89,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-21T17:36:07.567Z","etag":null,"topics":["cli","compression","data-compression","duckdb","fluent-bit","jsonl","log-compression","structured-logs"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ImpossibleForge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-01T13:07:21.000Z","updated_at":"2026-04-21T15:32:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ImpossibleForge/pfc-jsonl","commit_stats":null,"previous_names":["impossibleforge/pfc-jsonl"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/ImpossibleForge/pfc-jsonl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-jsonl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/ImpossibleForge%2Fpfc-jsonl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-jsonl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-jsonl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ImpossibleForge","download_url":"https://codeload.github.com/ImpossibleForge/pfc-jsonl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-jsonl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32274987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"ssl_error","status_checked_at":"2026-04-25T18:29:32.149Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","compression","data-compression","duckdb","fluent-bit","jsonl","log-compression","structured-logs"],"created_at":"2026-04-25T20:03:56.586Z","updated_at":"2026-04-25T20:03:57.689Z","avatar_url":"https://github.com/ImpossibleForge.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# PFC-JSONL — High-Ratio JSONL Compressor with Block-Level Random Access\n\nMost log archives are compressed. Most queries touch one hour. 
Most tools make you decompress everything anyway.\n\nPFC-JSONL stores a block index alongside every compressed file. Query a time window with DuckDB and only the relevant blocks are decompressed — the rest stays on disk, untouched.\n\n\u003e **~9% compression ratio** (25% smaller than gzip, 37% smaller than zstd on typical JSONL logs).\n\u003e\n\u003e **30×–700× faster** time-range queries vs. full-file decompression.\n\n[![License: Free for personal use](https://img.shields.io/badge/License-Free%20for%20personal%20use-blue.svg)](https://github.com/ImpossibleForge/pfc-jsonl/blob/main/LICENSE)\n![Version](https://img.shields.io/badge/Version-3.4.4-green.svg)\n[![DuckDB Extension](https://img.shields.io/badge/DuckDB-Extension-orange.svg)](https://github.com/ImpossibleForge/pfc-duckdb)\n[![Fluent Bit](https://img.shields.io/badge/Fluent%20Bit-Ready-green.svg)](https://github.com/ImpossibleForge/pfc-fluentbit)\n[![Vector](https://img.shields.io/badge/Vector-Sink-6B40BF.svg)](https://github.com/ImpossibleForge/pfc-vector)\n[![Telegraf](https://img.shields.io/badge/Telegraf-Plugin-00acac.svg)](https://github.com/ImpossibleForge/pfc-telegraf)\n[![OpenTelemetry](https://img.shields.io/badge/OpenTelemetry-Collector-425CC7.svg)](https://github.com/ImpossibleForge/pfc-otel-collector)\n[![PyPI](https://img.shields.io/badge/PyPI-pfc--jsonl-blue.svg)](https://pypi.org/project/pfc-jsonl/)\n[![Awesome DuckDB](https://awesome.re/mentioned-badge.svg)](https://github.com/davidgasquez/awesome-duckdb)\n[![Awesome Observability](https://awesome.re/mentioned-badge.svg)](https://github.com/adriannovegil/awesome-observability)\n\n---\n\n## Why PFC-JSONL?\n\n| Tool | Ratio on JSONL Logs | Block Access | Download for a 1 h query on a 10 TB archive |\n|------|---------------------|--------------|---------------------------------------------|\n| **PFC-JSONL** | **~9 %** | ✅ Block-level | **~26 MB** |\n| gzip | ~12 % | ❌ Full file | ~1.2 TB |\n| zstd | ~14.2 % | ❌ Full file | ~1.43 TB |\n\n**At its default settings, PFC-JSONL output is 25% smaller 
than gzip and 37% smaller than zstd (both at typical settings).**\nRatios measured with PFC-JSONL v3.4 on 200 MB of JSONL log data (8 services, mixed log levels, ~961K lines).\n\n---\n\n## Install\n\n### Linux x86_64 \u0026 macOS ARM64 — Direct Binary\n\n**Linux x86_64:**\n```bash\ncurl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \\\n     -o /usr/local/bin/pfc_jsonl \u0026\u0026 chmod +x /usr/local/bin/pfc_jsonl\n\npfc_jsonl --help\n```\n\n**macOS (Apple Silicon M1/M2/M3/M4):**\n```bash\ncurl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \\\n     -o /usr/local/bin/pfc_jsonl \u0026\u0026 chmod +x /usr/local/bin/pfc_jsonl\n\npfc_jsonl --help\n```\n\n\u003e **macOS Intel (x64):** Binary coming soon. Contact: info@impossibleforge.com\n\u003e\n\u003e **Windows:** No native binary. Use WSL2 or a Linux machine.\n\n---\n\n## DuckDB Extension\n\nQuery `.pfc` files directly from DuckDB SQL — no intermediate decompression step:\n\n```sql\nINSTALL pfc FROM community;\nLOAD pfc;\nLOAD json;\n\n-- Read lines without a time filter\nSELECT line-\u003e\u003e'$.level' AS level, line-\u003e\u003e'$.message' AS msg\nFROM read_pfc_jsonl('/path/to/events.pfc')\nLIMIT 10;\n\n-- Block-level timestamp filter: only decompress relevant blocks\nSELECT count(*)\nFROM read_pfc_jsonl(\n    '/path/to/events.pfc',\n    ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),\n    ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')\n);\n```\n\nThe DuckDB extension calls `pfc_jsonl` as a subprocess. Install the binary first (see above).\nSee [pfc-duckdb on GitHub](https://github.com/ImpossibleForge/pfc-duckdb) for manual install instructions.\n\n---\n\n## Ingest — Send Data to PFC\n\nPlug PFC-JSONL into your existing logging or metrics pipeline. 
All ingest tools buffer data locally, compress when the buffer is full, and optionally upload to S3.\n\n| Tool | Protocol / Format | Port | Data flow |\n|------|-------------------|------|-----------|\n| **[pfc-fluentbit](https://github.com/ImpossibleForge/pfc-fluentbit)** | Fluent Bit output plugin | — | Fluent Bit → `.pfc` |\n| **[pfc-vector](https://github.com/ImpossibleForge/pfc-vector)** | HTTP sink (JSON / NDJSON) | 8766 | Vector.dev → `.pfc` |\n| **[pfc-telegraf](https://github.com/ImpossibleForge/pfc-telegraf)** | HTTP (InfluxDB line protocol + JSON) | 8767 | Telegraf → `.pfc` |\n| **[pfc-otel-collector](https://github.com/ImpossibleForge/pfc-otel-collector)** | OTLP/HTTP (logs, traces, metrics) | 4318 | OpenTelemetry → `.pfc` |\n| **[pfc-kafka-consumer](https://github.com/ImpossibleForge/pfc-kafka-consumer)** | Kafka / Redpanda consumer | — | Kafka topic → `.pfc` |\n| **[pfc-gateway](https://github.com/ImpossibleForge/pfc-gateway)** ↕ | HTTP REST `POST /ingest` | 8765 | Any source → `.pfc` (+ query) |\n\n\u003e **pfc-gateway** is bidirectional — it accepts ingest via `POST /ingest` and serves queries via `POST /query`. 
No DuckDB required.\n\n---\n\n## Query — Read PFC Archives\n\n### DuckDB Extension\nThe fastest way to query `.pfc` archives locally — see the [DuckDB Extension](#duckdb-extension) section above.\n\n### pfc-gateway — HTTP REST API\n\nQuery `.pfc` archives over HTTP without DuckDB — works with any language, curl, Grafana, or Power BI:\n\n```bash\n# Start the gateway (points at your archive directory)\nPFC_ARCHIVE_DIR=/var/lib/pfc PFC_API_KEY=secret \\\n  python3 pfc_gateway.py --port 8765\n\n# Query a time range\ncurl -X POST http://localhost:8765/query \\\n  -H \"x-api-key: secret\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"file\": \"/var/lib/pfc/logs_20260101.pfc\",\n    \"from_ts\": \"2026-01-01T10:00:00Z\",\n    \"to_ts\":   \"2026-01-01T11:00:00Z\"\n  }'\n\n# Query multiple files at once\ncurl -X POST http://localhost:8765/query/batch \\\n  -H \"x-api-key: secret\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"files\": [\"/var/lib/pfc/logs_20260101.pfc\", \"/var/lib/pfc/logs_20260102.pfc\"]}'\n```\n\nAlso supports **Grafana SimpleJSON** — point the Grafana data source at `http://localhost:8765/grafana`.\nSee [pfc-gateway on GitHub](https://github.com/ImpossibleForge/pfc-gateway) for full documentation.\n\n---\n\n## Migrate Existing Archives\n\nAlready have logs stored as gzip, zstd, bzip2, or lz4 — on disk, on S3, on Azure, or on GCS?\n\n**[pfc-migrate](https://github.com/ImpossibleForge/pfc-migrate)** converts them in one command, directly in your storage (no egress charges):\n\n```bash\n# Quote the extras so shells such as zsh do not treat the brackets as a glob\npip install 'pfc-migrate[all]'\n\n# Local\npfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/ -v\n\n# S3\npfc-migrate s3 --bucket my-logs --prefix 2025/ --out-bucket my-logs-pfc --out-prefix pfc/\n\n# Azure Blob\npfc-migrate azure --container my-logs --prefix 2025/ --out-container my-logs-pfc --connection-string \"...\"\n\n# GCS\npfc-migrate gcs --bucket my-logs --prefix 2025/ --out-bucket my-logs-pfc\n```\n\n---\n\n## Python 
Package\n\nUse the [pfc Python package](https://github.com/ImpossibleForge/pfc-py) (PyPI: `pfc-jsonl`) to compress, decompress, and query `.pfc` files from Python:\n\n```bash\npip install pfc-jsonl\n```\n\n```python\nimport pfc\n\npfc.compress(\"logs/app.jsonl\", \"logs/app.pfc\")\npfc.query(\"logs/app.pfc\",\n          from_ts=\"2026-01-15T08:00:00\",\n          to_ts=\"2026-01-15T09:00:00\",\n          output_path=\"logs/morning.jsonl\")\n```\n\n---\n\n## Commands\n\n| Command | Description |\n|---------|-------------|\n| `pfc_jsonl compress \u003cinput\u003e \u003coutput\u003e` | Compress JSONL → `.pfc` + `.pfc.bidx` |\n| `pfc_jsonl decompress \u003cinput\u003e \u003coutput\u003e` | Full decompression |\n| `pfc_jsonl query \u003cinput\u003e --from X --to Y --out \u003coutput\u003e` | Decompress blocks matching time range |\n| `pfc_jsonl seek-block N \u003cinput\u003e [output]` | Extract single block by index |\n| `pfc_jsonl seek-blocks \u003cinput\u003e --blocks N [N...]` | Extract multiple blocks (DuckDB primitive) |\n| `pfc_jsonl info \u003cinput\u003e` | Show block table + timestamp ranges |\n\n---\n\n## Input Format\n\nOne JSON object per line with a timestamp field:\n\n```json\n{\"timestamp\": \"2025-01-15T06:32:11Z\", \"level\": \"ERROR\", \"service\": \"api\", \"msg\": \"timeout\"}\n{\"timestamp\": \"2025-01-15T06:32:12Z\", \"level\": \"INFO\",  \"service\": \"db\",  \"msg\": \"query_ok\"}\n```\n\nSupported timestamp fields: `timestamp`, `ts`, `time`, `@timestamp` (ISO 8601 or Unix epoch seconds).\n\n---\n\n## How It Works\n\nPFC divides JSONL logs into independent blocks (configurable, default 32 MiB).\nEach block is compressed with: **BWT → MTF → RLE → rANS O1**.\nBlock timestamp ranges are stored in `.pfc.bidx` (32 bytes/block, binary, C++-readable).\n\nTo query a time range, only the relevant blocks are decompressed — the rest is never read.\n\n---\n\n## Related Repos\n\n**Ingest**\n- [pfc-fluentbit](https://github.com/ImpossibleForge/pfc-fluentbit) — 
Fluent Bit output plugin → PFC\n- [pfc-vector](https://github.com/ImpossibleForge/pfc-vector) — Vector.dev HTTP sink → PFC (Rust)\n- [pfc-telegraf](https://github.com/ImpossibleForge/pfc-telegraf) — Telegraf HTTP output plugin → PFC\n- [pfc-otel-collector](https://github.com/ImpossibleForge/pfc-otel-collector) — OpenTelemetry OTLP/HTTP → PFC\n- [pfc-kafka-consumer](https://github.com/ImpossibleForge/pfc-kafka-consumer) — Kafka / Redpanda consumer → PFC\n\n**Query \u0026 Gateway**\n- [pfc-gateway](https://github.com/ImpossibleForge/pfc-gateway) — HTTP REST API: ingest + query, no DuckDB required\n- [pfc-duckdb](https://github.com/ImpossibleForge/pfc-duckdb) — DuckDB community extension for SQL queries on PFC files\n- [pfc-grafana](https://github.com/ImpossibleForge/pfc-grafana) — Grafana data source plugin for PFC archives\n\n**Archive \u0026 Migration**\n- [pfc-migrate](https://github.com/ImpossibleForge/pfc-migrate) — convert gzip/zstd/lz4/bz2 archives → PFC (local, S3, Azure, GCS)\n- [pfc-archiver-cratedb](https://github.com/ImpossibleForge/pfc-archiver-cratedb) — autonomous archive daemon for CrateDB\n- [pfc-archiver-questdb](https://github.com/ImpossibleForge/pfc-archiver-questdb) — autonomous archive daemon for QuestDB\n\n**SDK**\n- [pfc-py](https://github.com/ImpossibleForge/pfc-py) — Python client library (PyPI: `pfc-jsonl`)\n\n---\n\n## License\n\nPFC-JSONL is **free for personal and open-source use**.\n\nCommercial use (production pipelines, paid services, or business operations) requires a license.\nContact: **info@impossibleforge.com**\n\n---\n\n*Built by [ImpossibleForge](https://github.com/ImpossibleForge)*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpossibleforge%2Fpfc-jsonl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimpossibleforge%2Fpfc-jsonl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpossibleforge%2Fpfc-jsonl/lists"}