{"id":20535999,"url":"https://github.com/seaql/sea-streamer","last_synced_at":"2025-05-14T21:06:10.783Z","repository":{"id":103510557,"uuid":"482881803","full_name":"SeaQL/sea-streamer","owner":"SeaQL","description":"🌊 Stream processing toolkit for Redis \u0026 Kafka","archived":false,"fork":false,"pushed_at":"2025-04-22T23:45:25.000Z","size":1402,"stargazers_count":315,"open_issues_count":6,"forks_count":11,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-13T21:00:34.687Z","etag":null,"topics":["hacktoberfest"],"latest_commit_sha":null,"homepage":"https://www.sea-ql.org/SeaStreamer/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SeaQL.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"SeaQL"}},"created_at":"2022-04-18T14:42:48.000Z","updated_at":"2025-05-13T13:05:38.000Z","dependencies_parsed_at":"2023-09-28T02:27:05.758Z","dependency_job_id":"f5dc50d3-e223-43ab-bca6-0e909ebcbc80","html_url":"https://github.com/SeaQL/sea-streamer","commit_stats":{"total_commits":326,"total_committers":5,"mean_commits":65.2,"dds":"0.036809815950920255","last_synced_commit":"137f5ffed30c73301f896fe0296b17d0fac6e12c"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeaQL%2Fsea-streamer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeaQL%2Fsea-streamer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/SeaQL%2Fsea-streamer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeaQL%2Fsea-streamer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SeaQL","download_url":"https://codeload.github.com/SeaQL/sea-streamer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254227611,"owners_count":22035669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hacktoberfest"],"created_at":"2024-11-16T00:35:01.064Z","updated_at":"2025-05-14T21:06:10.730Z","avatar_url":"https://github.com/SeaQL.png","language":"Rust","funding_links":["https://github.com/sponsors/SeaQL"],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n  \u003cimg src=\"https://raw.githubusercontent.com/SeaQL/sea-streamer/master/docs/SeaStreamer Banner.png\"/\u003e\n\n  \u003ch1\u003eSeaStreamer\u003c/h1\u003e\n\n  \u003cp\u003e\n    \u003cstrong\u003e🌊 A real-time stream processing toolkit for Rust\u003c/strong\u003e\n  \u003c/p\u003e\n\n  [![crate](https://img.shields.io/crates/v/sea-streamer.svg)](https://crates.io/crates/sea-streamer)\n  [![docs](https://docs.rs/sea-streamer/badge.svg)](https://docs.rs/sea-streamer)\n  [![build status](https://github.com/SeaQL/sea-streamer/actions/workflows/rust.yml/badge.svg)](https://github.com/SeaQL/sea-streamer/actions/workflows/rust.yml)\n\n\u003c/div\u003e\n\nSeaStreamer is a toolkit to help you build real-time stream processors in Rust.\n\n## Features\n\n1. 
Async\n\nSeaStreamer provides an async and non-blocking API with no locks on the hot path. Supporting both `tokio` and `async-std`,\nyou can build highly concurrent stream processors.\n\n2. Generic\n\nWe provide integration for Redis \u0026 Kafka behind a generic trait interface, so your program can be backend-agnostic.\n\n3. Testable\n\nSeaStreamer also provides a set of tools to work with streams via unix files / pipes, so it is testable without setting up a cluster,\nand extremely handy when working locally.\n\n4. Micro-service Oriented\n\nLet's build real-time (multi-threaded, no GC), self-contained (aka easy to deploy), low-resource-usage, long-running stream processors in Rust!\n\n## Quick Start\n\nAdd the following to your `Cargo.toml`\n\n```toml\nsea-streamer = { version = \"0\", features = [\"kafka\", \"redis\", \"stdio\", \"socket\", \"runtime-tokio\"] }\n```\n\nHere is a basic [stream consumer](https://github.com/SeaQL/sea-streamer/tree/main/examples/src/bin/consumer.rs):\n\n```rust\n#[tokio::main]\nasync fn main() -\u003e Result\u003c()\u003e {\n    env_logger::init();\n\n    let Args { stream } = Args::parse();\n\n    let streamer = SeaStreamer::connect(stream.streamer(), Default::default()).await?;\n\n    let mut options = SeaConsumerOptions::new(ConsumerMode::RealTime);\n    options.set_auto_stream_reset(SeaStreamReset::Earliest);\n\n    let consumer: SeaConsumer = streamer\n        .create_consumer(stream.stream_keys(), options)\n        .await?;\n\n    loop {\n        let mess: SeaMessage = consumer.next().await?;\n        println!(\"[{}] {}\", mess.timestamp(), mess.message().as_str()?);\n    }\n}\n```\n\nHere is a basic [stream producer](https://github.com/SeaQL/sea-streamer/tree/main/examples/src/bin/producer.rs):\n\n```rust\n#[tokio::main]\nasync fn main() -\u003e Result\u003c()\u003e {\n    env_logger::init();\n\n    let Args { stream } = Args::parse();\n\n    let streamer = SeaStreamer::connect(stream.streamer(), 
Default::default()).await?;\n\n    let producer: SeaProducer = streamer\n        .create_producer(stream.stream_key()?, Default::default())\n        .await?;\n\n    for tick in 0..100 {\n        let message = format!(r#\"\"tick {tick}\"\"#);\n        eprintln!(\"{message}\");\n        producer.send(message)?;\n        tokio::time::sleep(Duration::from_secs(1)).await;\n    }\n\n    producer.end().await?; // flush\n\n    Ok(())\n}\n```\n\nHere is a [basic stream processor](https://github.com/SeaQL/sea-streamer/tree/main/examples/src/bin/processor.rs).\nSee also other [advanced stream processors](https://github.com/SeaQL/sea-streamer/tree/main/examples/).\n\n```rust\n#[tokio::main]\nasync fn main() -\u003e Result\u003c()\u003e {\n    env_logger::init();\n\n    let Args { input, output } = Args::parse();\n\n    let streamer = SeaStreamer::connect(input.streamer(), Default::default()).await?;\n    let options = SeaConsumerOptions::new(ConsumerMode::RealTime);\n    let consumer: SeaConsumer = streamer\n        .create_consumer(input.stream_keys(), options)\n        .await?;\n\n    let streamer = SeaStreamer::connect(output.streamer(), Default::default()).await?;\n    let producer: SeaProducer = streamer\n        .create_producer(output.stream_key()?, Default::default())\n        .await?;\n\n    loop {\n        let message: SeaMessage = consumer.next().await?;\n        let message = process(message).await?;\n        eprintln!(\"{message}\");\n        producer.send(message)?; // send is non-blocking\n    }\n}\n```\n\nNow, let's put them into action.\n\nWith Redis / Kafka:\n\n```shell\nSTREAMER_URI=\"redis://localhost:6379\" # or\nSTREAMER_URI=\"kafka://localhost:9092\"\n\n# Produce some input\ncargo run --bin producer -- --stream $STREAMER_URI/hello1 \u0026\n# Start the processor, producing some output\ncargo run --bin processor -- --input $STREAMER_URI/hello1 --output $STREAMER_URI/hello2 \u0026\n# Replay the output\ncargo run --bin consumer -- --stream 
$STREAMER_URI/hello2\n# Remember to stop the processes\nkill %1 %2\n```\n\nWith File:\n\n```shell\n# Create the file\nfile=/tmp/sea-streamer-$(date +%s)\ntouch $file \u0026\u0026 echo \"File created at $file\"\n# Produce some input\ncargo run --bin producer -- --stream file://$file/hello \u0026\n# Replay the input\ncargo run --bin consumer -- --stream file://$file/hello\n# Start the processor, producing some output\ncargo run --bin processor -- --input file://$file/hello --output stdio:///hello\n```\n\nWith Stdio:\n\n```shell\n# Pipe the producer to the processor\ncargo run --bin producer -- --stream stdio:///hello1 | \\\ncargo run --bin processor -- --input stdio:///hello1 --output stdio:///hello2\n```\n\n## Production\n\nSeaStreamer File powers the event stream of [FireDBG](https://firedbg.sea-ql.org/).\nWe use SeaStreamer Redis heavily ourselves in production.\n\n## Architecture\n\n[`sea-streamer`](https://docs.rs/sea-streamer) is made up of a number of sub-crates:\n\n+ [`sea-streamer-types`](https://docs.rs/sea-streamer-types)\n+ [`sea-streamer-socket`](https://docs.rs/sea-streamer-socket)\n    + [`sea-streamer-kafka`](https://docs.rs/sea-streamer-kafka)\n    + [`sea-streamer-redis`](https://docs.rs/sea-streamer-redis)\n    + [`sea-streamer-stdio`](https://docs.rs/sea-streamer-stdio)\n    + [`sea-streamer-file`](https://docs.rs/sea-streamer-file)\n+ [`sea-streamer-runtime`](https://docs.rs/sea-streamer-runtime)\n\nAll crates share the same major version. 
So `0.1` of `sea-streamer` depends on `0.1` of `sea-streamer-socket`.\n\n### `sea-streamer-types`: Traits \u0026 Types\n\nThis crate defines all the traits and types for the SeaStreamer API, but does not provide any implementation.\n\n### `sea-streamer-socket`: Backend-agnostic Socket API\n\nAkin to how SeaORM allows you to build applications for different databases, SeaStreamer allows you to build\nstream processors for different streaming servers.\n\nWhile the `sea-streamer-types` crate provides a nice trait-based abstraction, this crate provides a concrete-type API,\nso that your program can stream from/to any SeaStreamer backend selected by the user *at runtime*.\n\nThis allows you to do neat things, like generating data locally and then streaming it to Redis / Kafka. Or the other\nway round: sinking data from a server to work on it locally. All _without recompiling_ the stream processor.\n\nIf you only ever work with one backend, feel free to depend on `sea-streamer-redis` / `sea-streamer-kafka` directly.\n\nA small number of CLI programs are provided for demonstration. Let's set them up first:\n\n```shell\n# The `clock` program generates messages in the form of `{ \"tick\": N }`\nalias clock='cargo run --package sea-streamer-stdio --features=executables --bin clock'\n# The `relay` program redirects messages from `input` to `output`\nalias relay='cargo run --package sea-streamer-socket --features=executables,backend-kafka,backend-redis --bin relay'\n```\n\nHere is how to stream from Stdio ➡️ Redis / Kafka. 
We generate messages using `clock` and then pipe them to `relay`,\nwhich then streams to Redis / Kafka:\n\n```shell\n# Stdio -\u003e Redis\nclock -- --stream clock --interval 1s | \\\nrelay -- --input stdio:///clock --output redis://localhost:6379/clock\n# Stdio -\u003e Kafka\nclock -- --stream clock --interval 1s | \\\nrelay -- --input stdio:///clock --output kafka://localhost:9092/clock\n```\n\nHere is how to stream between Redis ↔️ Kafka:\n\n```shell\n# Redis -\u003e Kafka\nrelay -- --input redis://localhost:6379/clock --output kafka://localhost:9092/clock\n# Kafka -\u003e Redis\nrelay -- --input kafka://localhost:9092/clock --output redis://localhost:6379/clock\n```\n\nHere is how to *replay* the stream from Kafka / Redis:\n\n```shell\nrelay -- --input redis://localhost:6379/clock --output stdio:///clock --offset start\nrelay -- --input kafka://localhost:9092/clock --output stdio:///clock --offset start\n```\n\n### `sea-streamer-kafka`: Kafka / Redpanda Backend\n\nThis is the Kafka / Redpanda backend implementation for SeaStreamer.\nThis crate provides a comprehensive type system that makes working with Kafka easier and safer.\n\nFirst of all, all APIs (many of which are sync) are properly wrapped as async. Methods are also marked `\u0026mut` to eliminate possible race conditions.\n\n`KafkaConsumerOptions` has typed parameters.\n\n`KafkaConsumer` allows you to `seek` to a point in time, `rewind` to a particular offset, and `commit` messages read.\n\n`KafkaProducer` allows you to `await` a send `Receipt` or discard it if you are uninterested. 
You can also flush the Producer.\n\n`KafkaStreamer` allows you to flush all producers on `disconnect`.\n\nSee [tests](https://github.com/SeaQL/sea-streamer/blob/main/sea-streamer-kafka/tests/consumer.rs) for an illustration of the stream semantics.\n\nThis crate depends on [`rdkafka`](https://docs.rs/rdkafka),\nwhich in turn depends on [librdkafka-sys](https://docs.rs/librdkafka-sys), which itself is a wrapper of\n[librdkafka](https://docs.confluent.io/platform/current/clients/librdkafka/html/index.html).\n\nConfiguration Reference: \u003chttps://kafka.apache.org/documentation/#configuration\u003e\n\n### `sea-streamer-redis`: Redis Backend\n\nThis is the Redis backend implementation for SeaStreamer.\nThis crate provides a high-level async API on top of Redis that makes working with Redis Streams fool-proof:\n\n+ Implements the familiar SeaStreamer abstract interface\n+ A comprehensive type system that guides/restricts you with the API\n+ High-level API, so you don't call `XADD`, `XREAD` or `XACK` anymore\n+ Mutex-free implementation: concurrency achieved by message passing\n+ Pipelined `XADD` and paged `XREAD`, with a throughput in the realm of 100k messages per second\n\nWhile we'd like to provide a Kafka-like client experience, there are some fundamental differences between Redis and Kafka:\n\n1. In Redis sequence numbers are not contiguous\n    1. In Kafka sequence numbers are contiguous\n2. In Redis messages are dispatched to consumers among group members in a first-ask-first-served manner, which leads to the next point\n    1. In Kafka consumer \u003c-\u003e shard is 1 to 1 in a consumer group\n3. In Redis `ACK` has to be done per message\n    1. 
In Kafka only 1 Ack (read-up-to) is needed for a series of reads\n\nWhat's already implemented:\n\n+ RealTime mode with AutoStreamReset\n+ Resumable mode with auto-ack and/or auto-commit\n+ LoadBalanced mode with failover behaviour\n+ Seek/rewind to a point in time\n+ Basic stream sharding: split a stream into multiple sub-streams\n\nIt's best to look through the [tests](https://github.com/SeaQL/sea-streamer/tree/main/sea-streamer-redis/tests)\nfor an illustration of the different streaming behaviour.\n\nHow does SeaStreamer offer better concurrency?\n\nConsider the following simple stream processor:\n\n```rust\nloop {\n    let input = XREAD.await;\n    let output = process(input).await;\n    XADD(output).await;\n}\n```\n\nWhen it's reading or writing, it's not processing. So it spends time idle and reads messages with a higher delay, which in turn limits the throughput.\nIn addition, the ideal batch size for reads may not be the ideal batch size for writes.\n\nWith SeaStreamer, the read and write loops are separated from your process loop, so they can all happen concurrently (and truly in parallel on a multi-threaded async runtime)!\n\n![SeaStreamer concurrency diagram](https://raw.githubusercontent.com/SeaQL/sea-streamer/main/sea-streamer-redis/docs/sea-streamer-concurrency.svg)\n\nIf you are reading from a consumer group, you also have to consider when to ACK and how many ACKs to batch in one request. 
SeaStreamer can commit in the background at a regular interval, or you can commit asynchronously without blocking your process loop.\n\nIn the future, we'd like to support Redis Cluster, because sharding without clustering is not very useful.\nRight now it's pretty much a work-in-progress.\nIt's quite a difficult task, because clients have to take responsibility when working with a cluster.\nIn Redis, shards and nodes form an M-N mapping: shards can be moved among nodes *at any time*.\nThis makes testing much more difficult.\nLet us know if you'd like to help!\n\nYou can quickly start a Redis instance via Docker:\n\n```sh\ndocker run -d --rm --name redis -p 6379:6379 redis\n```\n\nThere is also a [small utility](https://github.com/SeaQL/sea-streamer/tree/main/sea-streamer-redis/redis-streams-dump) to dump Redis Streams messages into a SeaStreamer file.\n\nThis crate is built on top of [`redis`](https://docs.rs/redis).\n\n### `sea-streamer-stdio`: Standard I/O Backend\n\nThis is the `stdio` backend implementation for SeaStreamer. It is designed to be connected together with Unix pipes,\nenabling great flexibility when developing stream processors or processing data locally.\n\nYou can connect processors together with pipes: `processor_a | processor_b`.\n\nYou can also connect them asynchronously:\n\n```shell\ntouch stream # set up an empty file\ntail -f stream | processor_b # program b can be spawned anytime\nprocessor_a \u003e\u003e stream # append to the file\n```\n\nYou can also use `cat` to replay a file, but it runs from start to end as fast as possible then stops,\nwhich may or may not be the desired behavior.\n\nYou can write any valid UTF-8 string to stdin and each line will be considered a message. 
In addition, you can write some message metadata in a simple format:\n\n```log\n[timestamp | stream_key | sequence | shard_id] payload\n```\n\nNote: the square brackets are literal `[` `]`.\n\nThe following are all valid:\n\n```log\na plain, raw message\n[2022-01-01T00:00:00] { \"payload\": \"anything\" }\n[2022-01-01T00:00:00.123 | my_topic] \"a string payload\"\n[2022-01-01T00:00:00 | my-topic-2 | 123] [\"array\", \"of\", \"values\"]\n[2022-01-01T00:00:00 | my-topic-2 | 123 | 4] { \"payload\": \"anything\" }\n[my_topic] a string payload\n[my_topic | 123] { \"payload\": \"anything\" }\n[my_topic | 123 | 4] { \"payload\": \"anything\" }\n```\n\nThe following are all invalid:\n\n```log\n[Jan 1, 2022] { \"payload\": \"anything\" }\n[2022-01-01T00:00:00] 12345\n```\n\nIf no stream key is given, it will be assigned the name `broadcast` and sent to all consumers.\n\nYou can create consumers that subscribe to only a subset of the topics.\n\nConsumers in the same `ConsumerGroup` will be load balanced (in a round-robin fashion), meaning you can spawn multiple async tasks to process messages in parallel.\n\n### `sea-streamer-file`: File Backend\n\nThis is very similar to `sea-streamer-stdio`, but the difference is SeaStreamerStdio works in real-time,\nwhile `sea-streamer-file` works in real-time and replay. That means SeaStreamerFile has the ability to\ntraverse a `.ss` (sea-stream) file and seek/rewind to a particular timestamp/offset.\n\nIn addition, Stdio can only work with UTF-8 text data, while File is able to work with binary data.\nIn Stdio, there is only one Streamer per process. In File, there can be multiple independent Streamers\nin the same process. After all, a Streamer is just a file.\n\nThe basic idea of SeaStreamerFile is like a `tail -f` with one message per line, with a custom message frame\ncarrying binary payloads. 
The interesting part is that SeaStreamer does not use delimiters to separate messages.\nThis removes the overhead of encoding/decoding message payloads. But it adds some complexity to the file format.\n\nThe SeaStreamerFile format is designed for efficient fast-forward and seeking. This is enabled by placing an array\nof Beacons at fixed intervals in the file. A Beacon contains a summary of the streams, so it acts like an in-place\nindex. It also allows readers to align with the message boundaries. To learn more about the file format, read\n[`src/format.rs`](https://github.com/SeaQL/sea-streamer/blob/main/sea-streamer-file/src/format.rs).\n\nOn top of that sit the high-level SeaStreamer multi-producer, multi-consumer stream semantics, resembling\nthe behaviour of other SeaStreamer backends. In particular, the load-balancing behaviour is the same as Stdio,\ni.e. round-robin.\n\n### Decoder\n\nWe provide a small utility to decode `.ss` files:\n\n```sh\ncargo install sea-streamer-file --features=executables --bin ss-decode\n# local version\nalias ss-decode='cargo run --package sea-streamer-file --features=executables --bin ss-decode'\nss-decode -- --file \u003cfile\u003e --format \u003cformat\u003e\n```\n\nPro tip: pipe it to `less` for pagination\n\n```sh\nss-decode --file mystream.ss | less\n```\n\nExample `log` format:\n\n```log\n # header\n[2023-06-05T13:55:53.001 | hello | 1 | 0] message-1\n # beacon\n```\n\nExample `ndjson` format:\n\n```json\n/* header */\n{\"header\":{\"stream_key\":\"hello\",\"shard_id\":0,\"sequence\":1,\"timestamp\":\"2023-06-05T13:55:53.001\"},\"payload\":\"message-1\"}\n/* beacon */\n```\n\nThere is also a TypeScript implementation under [`sea-streamer-file-reader`](https://github.com/SeaQL/sea-streamer/tree/main/sea-streamer-file/sea-streamer-file-reader).\n\n### TODO\n\n1. Resumable: currently unimplemented. A potential implementation might be to commit into a local SQLite database.\n2. 
Sharding: currently it only streams to Shard ZERO.\n3. Verify: a utility program to verify and repair SeaStreamer binary files.\n\n### `sea-streamer-runtime`: Async runtime abstraction\n\nThis crate provides a small set of functions aligning the type signatures between `async-std` and `tokio`,\nso that you can build applications generic to both runtimes.\n\n## License\n\nLicensed under either of\n\n-   Apache License, Version 2.0\n    ([LICENSE-APACHE](LICENSE-APACHE) or \u003chttp://www.apache.org/licenses/LICENSE-2.0\u003e)\n-   MIT license\n    ([LICENSE-MIT](LICENSE-MIT) or \u003chttp://opensource.org/licenses/MIT\u003e)\n\nat your option.\n\nUnless you explicitly state otherwise, any contribution intentionally submitted\nfor inclusion in the work by you, as defined in the Apache-2.0 license, shall be\ndual licensed as above, without any additional terms or conditions.\n\n## Sponsor\n\n[SeaQL.org](https://www.sea-ql.org/) is an independent open-source organization run by passionate developers. If you enjoy using our libraries, please star and share our repositories. If you feel generous, a small donation via [GitHub Sponsor](https://github.com/sponsors/SeaQL) will be greatly appreciated, and goes a long way towards sustaining the organization.\n\nWe invite you to participate, contribute and together help build Rust's future.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseaql%2Fsea-streamer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseaql%2Fsea-streamer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseaql%2Fsea-streamer/lists"}