{"id":49283411,"url":"https://github.com/impossibleforge/pfc-kafka-consumer","last_synced_at":"2026-04-25T20:02:55.731Z","repository":{"id":353588589,"uuid":"1220064733","full_name":"ImpossibleForge/pfc-kafka-consumer","owner":"ImpossibleForge","description":"Kafka consumer that compresses log messages directly to PFC format","archived":false,"fork":false,"pushed_at":"2026-04-24T14:17:55.000Z","size":27,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-24T16:28:40.208Z","etag":null,"topics":["compression","confluent","kafka","log-management","logs","observability","pfc-jsonl","python","redpanda","s3"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ImpossibleForge.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-24T14:04:03.000Z","updated_at":"2026-04-24T14:17:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ImpossibleForge/pfc-kafka-consumer","commit_stats":null,"previous_names":["impossibleforge/pfc-kafka-consumer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ImpossibleForge/pfc-kafka-consumer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-kafka-consumer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2F
pfc-kafka-consumer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-kafka-consumer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-kafka-consumer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ImpossibleForge","download_url":"https://codeload.github.com/ImpossibleForge/pfc-kafka-consumer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImpossibleForge%2Fpfc-kafka-consumer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32274987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"ssl_error","status_checked_at":"2026-04-25T18:29:32.149Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","confluent","kafka","log-management","logs","observability","pfc-jsonl","python","redpanda","s3"],"created_at":"2026-04-25T20:02:54.974Z","updated_at":"2026-04-25T20:02:55.723Z","avatar_url":"https://github.com/ImpossibleForge.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pfc-kafka-consumer\n\n**Kafka consumer for PFC-JSONL log compression** — consume messages from Kafka topics and compress them directly to `.pfc` format.\n\nCommits Kafka offsets **only after successful 
PFC compression** — no data loss if the process crashes mid-flight.\n\n[![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)\n[![Part of PFC-JSONL Ecosystem](https://img.shields.io/badge/ecosystem-PFC--JSONL-brightgreen)](https://github.com/ImpossibleForge/pfc-jsonl)\n\n---\n\n## How it fits in your pipeline\n\n```\nKafka / Redpanda\n      │  topic: app-logs, access-logs, ...\n      ▼\npfc-kafka-consumer          ← this service\n      │  pfc_jsonl compress  (after each rotation)\n      │  commit offsets      (only on success)\n      ▼\nkafka_20260115_100000.pfc   →  local disk or S3\n      │\n      ▼\nQuery with DuckDB / pfc-gateway\n```\n\n---\n\n## Quickstart\n\n### 1. Install\n\n```bash\npip install confluent-kafka toml\n# Optional S3 upload:\npip install boto3\n```\n\n### 2. Download pfc_jsonl binary\n\n```bash\n# Linux x86_64\ncurl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x86_64 \\\n     -o /usr/local/bin/pfc_jsonl \u0026\u0026 chmod +x /usr/local/bin/pfc_jsonl\n\n# macOS ARM64\ncurl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \\\n     -o /usr/local/bin/pfc_jsonl \u0026\u0026 chmod +x /usr/local/bin/pfc_jsonl\n```\n\n### 3. Configure\n\n```bash\ncp config/config.toml ./config.toml\n# Edit brokers, topics, group_id\n```\n\n### 4. 
Start\n\n```bash\npython pfc_kafka_consumer.py --config config.toml\n# 2026-01-15T10:00:00 [pfc-kafka] INFO pfc-kafka-consumer v0.1.0 started\n# 2026-01-15T10:00:00 [pfc-kafka] INFO Topics: ['app-logs'] | Group: pfc-consumer\n```\n\n---\n\n## Configuration\n\n```toml\n[kafka]\nbrokers           = [\"localhost:9092\"]\ntopics            = [\"app-logs\", \"access-logs\"]\ngroup_id          = \"pfc-consumer\"\nauto_offset_reset = \"earliest\"   # or \"latest\"\npoll_timeout_sec  = 1.0\nbatch_size        = 500\n\n# Optional auth\nsecurity_protocol = \"PLAINTEXT\"  # PLAINTEXT | SSL | SASL_PLAINTEXT | SASL_SSL\nsasl_mechanism    = \"\"           # PLAIN | SCRAM-SHA-256 | SCRAM-SHA-512\nsasl_username     = \"\"\nsasl_password     = \"\"\nssl_ca_location   = \"\"\n\n[buffer]\nrotate_mb             = 64\nrotate_sec            = 3600\noutput_dir            = \"/tmp/pfc-kafka\"\nprefix                = \"kafka\"\ncommit_after_compress = true     # safe default — commit only after successful compress\n\n[pfc]\nbinary = \"/usr/local/bin/pfc_jsonl\"\n\n[s3]\nenabled = false\nbucket  = \"my-log-archive\"\nprefix  = \"kafka-logs/\"\nregion  = \"us-east-1\"\n```\n\n---\n\n## Output format\n\nEach Kafka message becomes one flat JSONL line. 
JSON messages are merged; plain strings are wrapped.\n\n**JSON message:**\n```json\n{\"timestamp\": \"2026-01-15T10:00:00.123Z\", \"level\": \"ERROR\", \"service\": \"payment\"}\n```\n→ becomes:\n```json\n{\n  \"timestamp\": \"2026-01-15T10:00:00.123Z\",\n  \"level\": \"ERROR\",\n  \"service\": \"payment\",\n  \"_topic\": \"app-logs\",\n  \"_partition\": 2,\n  \"_offset\": 84712,\n  \"_kafka_timestamp\": \"2026-01-15T10:00:00.123Z\"\n}\n```\n\n**Plain string message:**\n```\n2026-01-15T10:00:00 ERROR payment failed\n```\n→ becomes:\n```json\n{\n  \"message\": \"2026-01-15T10:00:00 ERROR payment failed\",\n  \"timestamp\": \"2026-01-15T10:00:00.123Z\",\n  \"_topic\": \"app-logs\",\n  \"_partition\": 0,\n  \"_offset\": 12345,\n  \"_kafka_timestamp\": \"2026-01-15T10:00:00.123Z\"\n}\n```\n\n---\n\n## Offset commit safety\n\n`commit_after_compress = true` (default):\n- Messages are **not** committed to Kafka until the PFC file is written successfully\n- If the process crashes before compression completes, messages are re-consumed on restart\n- No data loss — at-least-once delivery guarantee\n\n`commit_after_compress = false`:\n- Offsets committed immediately after polling\n- Higher throughput, but messages may be lost if compression fails\n\n---\n\n## Confluent Cloud / MSK / Redpanda Cloud\n\n```toml\n[kafka]\nbrokers           = [\"pkc-xxxx.us-east-1.aws.confluent.cloud:9092\"]\nsecurity_protocol = \"SASL_SSL\"\nsasl_mechanism    = \"PLAIN\"\nsasl_username     = \"YOUR_API_KEY\"\nsasl_password     = \"YOUR_API_SECRET\"\n```\n\n---\n\n## Querying compressed logs\n\n```sql\n-- DuckDB\nINSTALL pfc FROM community;\nLOAD pfc;\n\nSELECT level, service, count(*)\nFROM read_pfc_jsonl('kafka_20260115_100000.pfc',\n                    ts_from=1768471200::BIGINT,\n                    ts_to=1768471500::BIGINT)\nWHERE line LIKE '%ERROR%'\nGROUP BY level, service\nORDER BY 3 DESC;\n```\n\n---\n\n## Running tests\n\n```bash\npip install pytest confluent-kafka toml\npytest 
tests/test_kafka_consumer.py tests/test_resilience.py -v\n\n# Full E2E (requires Docker):\npython3 tests/e2e_integration_test.py\n```\n\n---\n\n## Part of the PFC-JSONL Ecosystem\n\n| Repo | What it does |\n|------|-------------|\n| [pfc-jsonl](https://github.com/ImpossibleForge/pfc-jsonl) | Core compressor (BWT + rANS) |\n| [pfc-duckdb](https://github.com/ImpossibleForge/pfc-duckdb) | DuckDB community extension |\n| [pfc-fluentbit](https://github.com/ImpossibleForge/pfc-fluentbit) | Native Fluent Bit output plugin |\n| [pfc-vector](https://github.com/ImpossibleForge/pfc-vector) | High-performance HTTP ingest daemon |\n| [pfc-otel-collector](https://github.com/ImpossibleForge/pfc-otel-collector) | OpenTelemetry OTLP/HTTP exporter |\n| [pfc-gateway](https://github.com/ImpossibleForge/pfc-gateway) | HTTP query gateway |\n| [pfc-migrate](https://github.com/ImpossibleForge/pfc-migrate) | Migrate from gzip/zstd/S3/Azure/GCS |\n| **pfc-kafka-consumer** | **Kafka / Redpanda consumer** |\n| [pfc-grafana](https://github.com/ImpossibleForge/pfc-grafana) | Grafana data source plugin for PFC archives |\n\n---\n\n## Disclaimer\n\nPFC-Kafka-Consumer is an independent open-source project and is not affiliated with, endorsed by, or associated with the Apache Software Foundation, Apache Kafka, or Confluent.\n\n## License\n\npfc-kafka-consumer (this repository) is released under the MIT License; see [LICENSE](LICENSE).\n\nThe PFC-JSONL binary (`pfc_jsonl`) is proprietary software that is free for personal and open-source use. Commercial use requires a license: [info@impossibleforge.com](mailto:info@impossibleforge.com)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpossibleforge%2Fpfc-kafka-consumer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimpossibleforge%2Fpfc-kafka-consumer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpossibleforge%2Fpfc-kafka-consumer/lists"}