https://github.com/seungyongshim/poc-kafka-exactly-once


Host: GitHub
URL: https://github.com/seungyongshim/poc-kafka-exactly-once
Owner: seungyongshim
License: gpl-3.0
Created: 2021-05-06T08:30:20.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2023-02-28T20:58:10.000Z (over 2 years ago)
Last Synced: 2025-02-03T08:38:36.262Z (8 months ago)
Language: C#
Size: 154 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

README

![CI](../../workflows/CI/badge.svg)

# Exactly Once Processing (WordCount)

This example demonstrates how to count the number of occurrences of each word in a given input set of lines, where the amount of data can potentially be large enough that the computation needs to be distributed across many machines. This is an example of a streaming [map-reduce](https://en.wikipedia.org/wiki/MapReduce) calculation, with **exactly once processing** semantics.

Refer to comments in the [code](Program.cs) for commentary on the implementation.

### Running the example:

For simplicity, the instructions below assume that you're using a single Kafka broker with the default configuration running on `localhost`.

To remove all topics used by this example application:

```
dotnet run localhost:9092 del
```

To write some lines of text into the topic `lines` (auto-created if it doesn't exist), rate limited to one per second for demonstration purposes:

```
dotnet run localhost:9092 gen
```

To run the "map" part of the calculation, which splits each input line into words and writes them to the partitioned Kafka topic `words`:

```
dotnet run localhost:9092 map map_client_id_1
```

You can run multiple instances of this stage, specifying a different client ID for each. If you reuse a client ID, the new process will "fence" the old one with the same ID.

This is an example of a stateless stream processor.
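
The per-line work of the map stage can be sketched without Kafka. The snippet below is only an illustration, not the code in `Program.cs`: `SplitIntoWords` is a hypothetical helper, and splitting on spaces and tabs is an assumption. Because each output depends only on the current input line, the stage needs no state between messages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not taken from Program.cs): split one line into words.
// Assumption: words are separated by spaces or tabs; empty entries are dropped.
static IEnumerable<string> SplitIntoWords(string line) =>
    line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

// In the real stage each word would be produced to the `words` topic;
// here the words are printed instead.
foreach (var word in SplitIntoWords("the quick  brown the"))
{
    Console.WriteLine(word);
}
```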

To run the "reduce" part of the calculation, which counts the number of occurrences of each word and writes updates to the compacted, partitioned Kafka topic `counts`:

```
dotnet run localhost:9092 reduce reduce_client_id_1
```

You can parallelize this stage as well by running multiple instances with different client IDs.

This is an example of a stateful stream processor, where the working state is materialized into a local FASTER store.
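
The counting step can be sketched in the same spirit. This is a simplified illustration rather than the repository's implementation: a plain `Dictionary` stands in for the FASTER store, and `CountWord` is a hypothetical helper. Each call returns the updated count, which is the kind of record the stage would write to `counts`.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the local state store (the example itself uses FASTER).
var counts = new Dictionary<string, int>();

// Hypothetical helper: increment the count for a word and return the update
// that would be written to the `counts` topic.
KeyValuePair<string, int> CountWord(string word)
{
    counts.TryGetValue(word, out var current); // current is 0 if the word is new
    counts[word] = current + 1;
    return new KeyValuePair<string, int>(word, current + 1);
}

foreach (var word in new[] { "the", "quick", "the" })
{
    var update = CountWord(word);
    Console.WriteLine($"{update.Key}: {update.Value}");
}
```

Unlike the map step, the result for a word depends on every earlier occurrence of it, which is why this stage has to keep state.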

You can dynamically start and stop the map and reduce processing instances at any time and watch as the workload rebalances. Both the map and reduce stages make use of [incremental rebalancing](https://www.confluent.io/blog/cooperative-rebalancing-in-kafka-streams-consumer-ksqldb/), which minimizes the impact of rebalances on processing.
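
With the Confluent.Kafka client, incremental (cooperative) rebalancing is selected through the consumer's partition assignment strategy. The fragment below is a configuration sketch only: it assumes the Confluent.Kafka NuGet package is referenced, the group ID is a hypothetical placeholder, and the settings actually used by this example are in `Program.cs`.

```csharp
using Confluent.Kafka;

var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    // Hypothetical group name, chosen for illustration.
    GroupId = "wordcount-example-group",
    // Cooperative-sticky assignment revokes only the partitions that must move,
    // rather than all partitions on every rebalance.
    PartitionAssignmentStrategy = PartitionAssignmentStrategy.CooperativeSticky,
};
```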
