https://github.com/aramperes/kafka-denormalization

Denormalizing Kafka topics using Kafka Streams

analytics denormalization kafka kafka-streams spring-kafka streaming

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/aramperes/kafka-denormalization
Owner: aramperes
License: mit
Created: 2022-08-21T23:36:20.000Z (almost 4 years ago)
Default Branch: master
Last Pushed: 2022-08-27T00:53:06.000Z (almost 4 years ago)
Last Synced: 2025-02-21T14:49:41.495Z (over 1 year ago)
Topics: analytics, denormalization, kafka, kafka-streams, spring-kafka, streaming
Language: Java
Homepage:
Size: 126 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# kafka-denormalization

This is a sample project that denormalizes two Kafka topics into one. In other words, it performs a many-to-one join between two topics based on a foreign key, and emits the joined data to a third topic.

The basic use case is when your data is already being produced on some topics and you need to combine it continuously as updates come in, including updates to existing records on either side of the join.
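To make the "updates from either side" behaviour concrete, here is a minimal, single-process sketch of many-to-one inner-join semantics. It is not this project's implementation: plain Java maps stand in for Kafka topics and state stores, deletes (tombstones) are ignored, and the names `ForeignKeyJoinSketch`, `onLeft`, `onRight`, and the `Comment`/`Story` records are hypothetical. The `index` map plays the role that an index topic/store appears to play in this kind of design: it records which left keys reference each foreign key, so an update on the right side can be fanned out to every matching left record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical in-memory model of a many-to-one inner join on a foreign key.
// Plain maps stand in for Kafka topics and state stores; deletes are not handled.
class ForeignKeyJoinSketch<LK, LV, RK, RV, J> {
    private final Function<LV, RK> foreignKey;         // extracts the right-side key from a left value
    private final BiFunction<LV, RV, J> joiner;        // builds the joined output record
    private final Map<LK, LV> left = new HashMap<>();  // latest value per left key
    private final Map<RK, RV> right = new HashMap<>(); // latest value per right key
    // Index: foreign key -> left keys that currently reference it
    private final Map<RK, Set<LK>> index = new HashMap<>();

    ForeignKeyJoinSketch(Function<LV, RK> foreignKey, BiFunction<LV, RV, J> joiner) {
        this.foreignKey = foreignKey;
        this.joiner = joiner;
    }

    // Upsert a left ("many") record; emits a joined record only if its right match exists.
    List<J> onLeft(LK key, LV value) {
        LV previous = left.put(key, value);
        if (previous != null) {
            // The foreign key may have changed, so drop the stale index entry first.
            Set<LK> stale = index.get(foreignKey.apply(previous));
            if (stale != null) {
                stale.remove(key);
            }
        }
        RK fk = foreignKey.apply(value);
        index.computeIfAbsent(fk, k -> new HashSet<>()).add(key);
        RV match = right.get(fk);
        return match == null ? List.of() : List.of(joiner.apply(value, match));
    }

    // Upsert a right ("one") record; re-emits a joined record for every left record referencing it.
    List<J> onRight(RK key, RV value) {
        right.put(key, value);
        List<J> out = new ArrayList<>();
        for (LK leftKey : index.getOrDefault(key, Set.of())) {
            out.add(joiner.apply(left.get(leftKey), value));
        }
        return out;
    }

    // Hypothetical, trimmed-down stand-ins for the comment and story messages.
    record Comment(long id, long story, String text) {}
    record Story(long id, String title) {}

    public static void main(String[] args) {
        var join = new ForeignKeyJoinSketch<Long, Comment, Long, Story, String>(
                Comment::story, (c, s) -> c.text() + " @ " + s.title());
        System.out.println(join.onLeft(1L, new Comment(1L, 100L, "first")));  // prints [] (story not seen yet)
        System.out.println(join.onRight(100L, new Story(100L, "Pixel 6a")));  // prints [first @ Pixel 6a]
        System.out.println(join.onLeft(2L, new Comment(2L, 100L, "second"))); // prints [second @ Pixel 6a]
        // Updating the story re-emits both comments joined with the new title
        System.out.println(join.onRight(100L, new Story(100L, "Pixel 6a (edited)")));
    }
}
```

Under inner-join semantics, a comment that arrives before its story produces nothing until the story shows up, and every later story update is re-joined with all comments that reference it. A real Kafka Streams implementation additionally has to deal with tombstones, repartitioning by foreign key, and persistent, fault-tolerant state.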

## Example

This repository contains an example using Hacker News comments and stories. The `services` directory contains two microservices: one polls for new stories and produces them on a topic, and the other does the same for comments.

`hn.comments` (left)

```json
{"by":"zinekeller","id":32546427,"parent":32546388,"text":"...","time":1661132891,"type":"comment","story":32545513}
```

`hn.stories` (right)

```json
{"by":"thesuperbigfrog","descendants":40,"id":32545513,"score":50,"time":1661124181,"title":"The Google Pixel 6a highlights everything wrong with the U.S. phone market","type":"story","url":"https://www.xda-developers.com/google-pixel-6a-us-market-editorial/"}
```

Our objective is to join these two topics into one. Each message will contain the comment object, along with the full (inflated) story object it references.

`hn.comments-with-story`

```json
{
    "comment": {"by":"zinekeller","id":32546427,"parent":32546388,"text":"...","time":1661132891,"type":"comment","story":32545513},
    "story": {"by":"thesuperbigfrog","descendants":40,"id":32545513,"score":50,"time":1661124181,"title":"The Google Pixel 6a highlights everything wrong with the U.S. phone market","type":"story","url":"https://www.xda-developers.com/google-pixel-6a-us-market-editorial/"}
}
```

Using the DSL I made for this project, it can be represented like this:

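```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.Stores;
import org.springframework.beans.factory.annotation.Autowired;

// StreamDenormalize, JoinKeySchemas, Comment, Story and JoinedCommentStoryEvent
// are classes defined in this project.

@Autowired
public void buildPipeline(StreamsBuilder builder) {
    // In-memory state store used for the foreign-key index
    var indexStore = Stores.inMemoryKeyValueStore("index");

    StreamDenormalize.builder()
        // Key schema for index entries (see JoinKeySchemas in this project)
        .keySchema(JoinKeySchemas.Blake2b(8, Serdes.String(), Serdes.String()))
        // Topic and state store used to maintain the index
        .indexTopic("hn.index")
            .indexStore(indexStore)
        // Left ("many") side: comments
        .leftTopic("hn.comments")
            .leftSerde(Comment.serde)
        // Right ("one") side: stories
        .rightTopic("hn.stories")
            .rightSerde(Story.serde)
        // Foreign key: the ID of the story each comment belongs to
        .joinOn(comment -> comment.story().toString())
            // Combine a comment with its story
            .joiner((comment, story) -> new JoinedCommentStoryEvent(comment, story))
            // Key each output record by the comment ID
            .keyMapper((k, joined) -> joined.comment().id().toString())
        .build()
        .innerJoin(builder)
            .to("hn.comments-with-story", Produced.with(Serdes.String(), JoinedCommentStoryEvent.serde));
}
```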
