{"id":13432792,"url":"https://github.com/influxdata/rskafka","last_synced_at":"2025-05-16T13:03:33.270Z","repository":{"id":38442853,"uuid":"444345682","full_name":"influxdata/rskafka","owner":"influxdata","description":"A minimal Rust client for Apache Kafka","archived":false,"fork":false,"pushed_at":"2025-03-26T16:33:38.000Z","size":877,"stargazers_count":310,"open_issues_count":15,"forks_count":40,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-05-11T10:50:07.594Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/influxdata.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-01-04T08:42:52.000Z","updated_at":"2025-05-07T15:15:38.000Z","dependencies_parsed_at":"2024-01-16T01:26:03.821Z","dependency_job_id":"a7b29a7f-9ed4-4c80-9811-105b24ddd440","html_url":"https://github.com/influxdata/rskafka","commit_stats":{"total_commits":378,"total_committers":12,"mean_commits":31.5,"dds":0.3306878306878307,"last_synced_commit":"114847f42517f7215392bc490b876760a16a0672"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/influxdata%2Frskafka","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/influxdata%2Frskafka/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/influxdata%2Frskafka/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/influxd
ata%2Frskafka/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/influxdata","download_url":"https://codeload.github.com/influxdata/rskafka/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254535826,"owners_count":22087398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T02:01:16.641Z","updated_at":"2025-05-16T13:03:33.213Z","avatar_url":"https://github.com/influxdata.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# RSKafka\n\n[![CircleCI](https://circleci.com/gh/influxdata/rskafka/tree/main.svg?style=shield\u0026circle-token=531ba1f38035a10da6dbf7cc71e6f55eff496c70)](https://circleci.com/gh/influxdata/rskafka/tree/main)\n[![Crates.io](https://img.shields.io/crates/v/rskafka)](https://crates.io/crates/rskafka)\n[![Documentation](https://img.shields.io/docsrs/rskafka)](https://docs.rs/crate/rskafka/latest)\n[![License](https://img.shields.io/crates/l/rskafka)](#license)\n\nThis crate aims to be a minimal Kafka implementation for simple workloads that wish to use Kafka as a distributed\nwrite-ahead log.\n\nIt is **not** a general-purpose Kafka implementation, instead it is heavily optimised for simplicity, both in terms of\nimplementation and its emergent operational characteristics. 
It was originally used by [InfluxDB 3.0] but no longer is.\n\nThis crate has:\n\n* No support for offset tracking, consumer groups, transactions, etc...\n* No built-in buffering, aggregation, linger timeouts, etc...\n* Independent write streams per partition\n\nIt will be a good fit for workloads that:\n\n* Perform offset tracking independently of Kafka\n* Read/Write reasonably sized payloads per-partition\n* Have a low number of high-throughput partitions [^1]\n\n\n## Usage\n\n```rust,no_run\n# async fn test() {\nuse rskafka::{\n    client::{\n        ClientBuilder,\n        partition::{Compression, UnknownTopicHandling},\n    },\n    record::Record,\n};\nuse chrono::{TimeZone, Utc};\nuse std::collections::BTreeMap;\n\n// setup client\nlet connection = \"localhost:9093\".to_owned();\nlet client = ClientBuilder::new(vec![connection]).build().await.unwrap();\n\n// create a topic\nlet topic = \"my_topic\";\nlet controller_client = client.controller_client().unwrap();\ncontroller_client.create_topic(\n    topic,\n    2,      // partitions\n    1,      // replication factor\n    5_000,  // timeout (ms)\n).await.unwrap();\n\n// get a partition-bound client\nlet partition_client = client\n    .partition_client(\n        topic.to_owned(),\n        0,  // partition\n        UnknownTopicHandling::Retry,\n     )\n     .await\n    .unwrap();\n\n// produce some data\nlet record = Record {\n    key: None,\n    value: Some(b\"hello kafka\".to_vec()),\n    headers: BTreeMap::from([\n        (\"foo\".to_owned(), b\"bar\".to_vec()),\n    ]),\n    timestamp: Utc.timestamp_millis(42),\n};\npartition_client.produce(vec![record], Compression::default()).await.unwrap();\n\n// consume data\nlet (records, high_watermark) = partition_client\n    .fetch_records(\n        0,  // offset\n        1..1_000_000,  // min..max bytes\n        1_000,  // max wait time\n    )\n   .await\n   .unwrap();\n# }\n```\n\nFor more advanced production and consumption, see [`crate::client::producer`] and 
[`crate::client::consumer`].\n\n\n## Features\n\n- **`compression-gzip` (default):** Support compression and decompression of messages using [gzip].\n- **`compression-lz4` (default):** Support compression and decompression of messages using [LZ4].\n- **`compression-snappy` (default):** Support compression and decompression of messages using [Snappy].\n- **`compression-zstd` (default):** Support compression and decompression of messages using [zstd].\n- **`full`:** Includes all stable features (`compression-gzip`, `compression-lz4`, `compression-snappy`,\n  `compression-zstd`, `transport-socks5`, `transport-tls`).\n- **`transport-socks5`:** Allows transport via SOCKS5 proxy.\n- **`transport-tls`:** Allows TLS transport via [rustls].\n- **`unstable-fuzzing`:** Exposes some internal data structures so that they can be used by our fuzzers. This is NOT a stable\n  feature / API!\n\n## Testing\n\n### Redpanda\n\nTo run integration tests against [Redpanda], run:\n\n```console\n$ docker-compose -f docker-compose-redpanda.yml up\n```\n\nin one session, and then run:\n\n```console\n$ TEST_INTEGRATION=1 TEST_BROKER_IMPL=redpanda KAFKA_CONNECT=0.0.0.0:9011 cargo test\n```\n\nin another session.\n\n### Apache Kafka\n\nTo run integration tests against [Apache Kafka], run:\n\n```console\n$ docker-compose -f docker-compose-kafka.yml up\n```\n\nin one session, and then run:\n\n```console\n$ TEST_INTEGRATION=1 TEST_BROKER_IMPL=kafka KAFKA_CONNECT=localhost:9011 KAFKA_SASL_CONNECT=localhost:9097 cargo test\n```\n\nin another session. Note that Apache Kafka supports a different set of features than Redpanda, so we pass other\nenvironment variables.\n\n### Using a SOCKS5 Proxy\n\nTo run the integration test via a SOCKS5 proxy, you need to set the environment variable `SOCKS_PROXY`. 
The following\ncommand requires a running proxy on the local machine.\n\n```console\n$ KAFKA_CONNECT=0.0.0.0:9011,kafka-1:9021,redpanda-1:9021 SOCKS_PROXY=localhost:1080 cargo test --features full\n```\n\nThe SOCKS5 proxy will automatically be started by the docker compose files. Note that `KAFKA_CONNECT` was extended by\naddresses that are reachable via the proxy.\n\n### Java Interopt\nTo test if RSKafka can produce/consume records to/from the official Java client, you need to have Java installed and the\n`TEST_JAVA_INTEROPT=1` environment variable set.\n\n### Fuzzing\nRSKafka offers fuzz targets for certain protocol parsing steps. To build them make sure you have [cargo-fuzz] installed.\nSelect one of the following fuzzers:\n\n- **`protocol_reader`:** Selects an API key and API version and then reads message frames and tries to decode the\n  response object. The message frames are read w/o the length marker for more efficient fuzzing.\n- **`record_batch_body_reader`:** Reads the inner part of a record batch (w/o the prefix that contains length and CRC)\n  and tries to decode it. 
In theory this is covered by `protocol_reader` as well but the length fields and CRC make it\n  hard for the fuzzer to traverse this data structure.\n\nThen run the fuzzer with:\n\n```console\n$ cargo +nightly fuzz run protocol_reader\n...\n```\n\nLet it run for as long as you wish or until it finds a crash:\n\n```text\n...\nFailing input:\n\n        fuzz/artifacts/protocol_reader/crash-369f9787d35767c47431161d455aa696a71c23e3\n\nOutput of `std::fmt::Debug`:\n\n        [0, 18, 0, 3, 0, 0, 0, 0, 71, 88, 0, 0, 0, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 0, 0, 0, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 0, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 18, 18, 0, 164, 0, 164, 164, 164, 30, 164, 164, 0, 0, 0, 0, 63]\n\nReproduce with:\n\n        cargo fuzz run protocol_reader fuzz/artifacts/protocol_reader/crash-369f9787d35767c47431161d455aa696a71c23e3\n\nMinimize test case with:\n\n        cargo fuzz tmin protocol_reader fuzz/artifacts/protocol_reader/crash-369f9787d35767c47431161d455aa696a71c23e3\n```\n\nSadly the backtraces that you might get are not really helpful and you need a debugger to detect the exact source\nlocations:\n\n```console\n$ rust-lldb ./target/x86_64-unknown-linux-gnu/release/protocol_reader fuzz/artifacts/protocol_reader/crash-7b824dad6e26002e5488e8cc84ce16728222dcf5\n...\n\n(lldb) r\n...\nProcess 177543 launched: '/home/mneumann/src/rskafka/target/x86_64-unknown-linux-gnu/release/protocol_reader' (x86_64)\nINFO: Running with entropic power schedule (0xFF, 100).\nINFO: Seed: 3549747846\n...\n==177543==ABORTING\n(lldb) AddressSanitizer report breakpoint hit. 
Use 'thread info -s' to get extended information about the report.\nProcess 177543 stopped\n...\n\n(lldb) bt\n* thread #1, name = 'protocol_reader', stop reason = AddressSanitizer detected: allocation-size-too-big\n  * frame #0: 0x0000555556c04f20 protocol_reader`::AsanDie() at asan_rtl.cpp:45:7\n    frame #1: 0x0000555556c1a33c protocol_reader`__sanitizer::Die() at sanitizer_termination.cpp:55:7\n    frame #2: 0x0000555556c01471 protocol_reader`::~ScopedInErrorReport() at asan_report.cpp:190:7\n    frame #3: 0x0000555556c021f4 protocol_reader`::ReportAllocationSizeTooBig() at asan_report.cpp:313:1\n...\n```\n\nThen create a unit test and fix the bug.\n\nFor out-of-memory errors [LLDB] does not stop automatically. You can however set a breakpoint before starting the\nexecution that hooks right into the place where it is about to exit:\n\n```console\n(lldb) b fuzzer::PrintStackTrace()\n```\n\n### Benchmarks\nInstall [cargo-criterion], make sure you have a Kafka cluster running, and then you can run all benchmarks with:\n\n```console\n$ TEST_INTEGRATION=1 TEST_BROKER_IMPL=kafka KAFKA_CONNECT=localhost:9011 cargo criterion --all-features\n```\n\nIf you find a benchmark that is too slow, you may want to profile it. 
Get [cargo-with], and [perf], then run (here\nfor the `parallel/rskafka` benchmark):\n\n```console\n$ TEST_INTEGRATION=1 TEST_BROKER_IMPL=kafka KAFKA_CONNECT=localhost:9011 cargo with 'perf record --call-graph dwarf -- {bin}' -- \\\n    bench --all-features --bench write_throughput -- \\\n    --bench --noplot parallel/rskafka\n```\n\nHave a look at the report:\n\n```console\n$ perf report\n```\n\n\n## License\n\nLicensed under either of these:\n\n * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or \u003chttps://www.apache.org/licenses/LICENSE-2.0\u003e)\n * MIT License ([LICENSE-MIT](LICENSE-MIT) or \u003chttps://opensource.org/licenses/MIT\u003e)\n\n### Contributing\n\nUnless you explicitly state otherwise, any contribution you intentionally submit for inclusion in the work, as defined\nin the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.\n\n\n[^1]: Kafka's design makes it hard for any client to support the converse, as ultimately each partition is an\nindependent write stream within the broker. However, this crate makes no attempt to mitigate per-partition overheads\ne.g. 
by batching writes to multiple partitions in a single ProduceRequest\n\n\n[Apache Kafka]: https://kafka.apache.org/\n[cargo-criterion]: https://github.com/bheisler/cargo-criterion\n[cargo-fuzz]: https://github.com/rust-fuzz/cargo-fuzz\n[cargo-with]: https://github.com/cbourjau/cargo-with\n[gzip]: https://en.wikipedia.org/wiki/Gzip\n[InfluxDB 3.0]: https://github.com/influxdata/influxdb\n[LLDB]: https://lldb.llvm.org/\n[LZ4]: https://lz4.github.io/lz4/\n[perf]: https://perf.wiki.kernel.org/index.php/Main_Page\n[Redpanda]: https://www.redpanda.com/\n[rustls]: https://github.com/rustls/rustls\n[Snappy]: https://github.com/google/snappy\n[zstd]: https://github.com/facebook/zstd\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfluxdata%2Frskafka","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfluxdata%2Frskafka","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfluxdata%2Frskafka/lists"}