https://github.com/beaglefoot/streaming-etl
Exploratory project using Kafka for building streaming ETL
- Host: GitHub
- URL: https://github.com/beaglefoot/streaming-etl
- Owner: Beaglefoot
- Created: 2022-04-13T12:17:40.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-05-05T20:22:53.000Z (almost 3 years ago)
- Last Synced: 2025-02-03T23:54:49.393Z (3 months ago)
- Topics: asyncio, data-engineering, debezium, docker, etl, kafka, kafka-connect, streaming
- Language: Python
- Size: 152 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Streaming ETL
This is an exploratory data engineering project that shows how Kafka can be used for ETL with a streaming approach.
Some things are intentionally simplified, and the language of choice is Python instead of Java.
## Architecture
*(Architecture diagram: data generator → OLTP DB → Debezium → Kafka → Transformer → JDBC Sink Connector → DWH)*
## The goal and how it works
The main idea is to take a typical OLTP database and stream changes to it via Change Data Capture (CDC) down the pipeline to a dimensional data warehouse.
In the absence of a real transactional application, a data generator is set up to produce the changes.
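
The repo's actual generator code isn't reproduced here, but a minimal sketch of such a generator might look like the following. The `orders` table, its columns, the connection string, and the use of `asyncpg` are all assumptions for illustration:

```python
import asyncio
import random

import asyncpg  # async Postgres driver; assumed here, not confirmed by the repo


async def generate_orders(dsn: str, interval: float = 1.0) -> None:
    """Insert a random row every `interval` seconds to feed the CDC pipeline."""
    conn = await asyncpg.connect(dsn)
    try:
        while True:
            await conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES ($1, $2)",
                random.randint(1, 100),
                round(random.uniform(5.0, 500.0), 2),
            )
            await asyncio.sleep(interval)
    finally:
        await conn.close()


if __name__ == "__main__":
    # Hypothetical DSN for the OLTP database running in Docker
    asyncio.run(generate_orders("postgresql://etl:etl@localhost:5432/oltp"))
```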
The changes are captured with Debezium, which streams the data to a Kafka broker. The transformer subscribes to new messages on the input topics, transforms the data, and writes it to output topics, which are connected to the DWH via the JDBC Sink Connector.
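
Both connectors are typically registered by POSTing JSON to the Kafka Connect REST API. Here is a hedged sketch of what that registration could look like; the hostnames, credentials, table and topic names are placeholders, and the repo's real configs may differ (the config keys shown are standard ones for the Debezium 1.x Postgres connector and the Confluent JDBC sink):

```python
import requests  # used here for illustration; the repo may register connectors differently

CONNECT_URL = "http://localhost:8083/connectors"  # default Kafka Connect REST port

# Debezium Postgres source: captures row changes and publishes them to Kafka topics.
debezium_source = {
    "name": "oltp-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "etl",
        "database.password": "etl",
        "database.dbname": "oltp",
        "database.server.name": "oltp",   # topics become e.g. oltp.public.orders
        "table.include.list": "public.orders",
    },
}

# Confluent JDBC sink: reads the transformer's output topics into the DWH.
jdbc_sink = {
    "name": "dwh-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://dwh:5432/dwh",
        "connection.user": "etl",
        "connection.password": "etl",
        "topics": "dwh.fact_orders",
        "insert.mode": "insert",
        "auto.create": "true",
    },
}

for connector in (debezium_source, jdbc_sink):
    requests.post(CONNECT_URL, json=connector).raise_for_status()
```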
Internally, Kafka stores messages as bytes, so payloads are encoded and decoded against a JSON Schema. Services learn the actual schemas from the Schema Registry. The transformer also has models pregenerated from the schemas, which facilitates development and helps with typing and static analysis.
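
As a rough illustration of the transformer loop, here is a sketch using `aiokafka` (an assumption suggested by the asyncio topic, not confirmed by the repo). The topic names, the `Order` stand-in for a pregenerated model, and the transformation itself are invented; Schema Registry-aware serialization is elided in favor of plain JSON:

```python
import asyncio
import json
from dataclasses import dataclass

from aiokafka import AIOKafkaConsumer, AIOKafkaProducer  # assumed client library


@dataclass
class Order:
    """Stand-in for a model pregenerated from a JSON Schema."""
    customer_id: int
    amount: float


async def run_transformer() -> None:
    consumer = AIOKafkaConsumer(
        "oltp.public.orders",                 # hypothetical Debezium input topic
        bootstrap_servers="localhost:9092",
        group_id="transformer",
    )
    producer = AIOKafkaProducer(bootstrap_servers="localhost:9092")
    await consumer.start()
    await producer.start()
    try:
        async for msg in consumer:
            # Debezium change events (with the JSON envelope enabled)
            # carry the new row state under payload.after
            event = json.loads(msg.value)
            order = Order(**event["payload"]["after"])
            fact = {
                "customer_id": order.customer_id,
                "amount_cents": int(round(order.amount * 100)),
            }
            await producer.send_and_wait("dwh.fact_orders", json.dumps(fact).encode())
    finally:
        await consumer.stop()
        await producer.stop()


if __name__ == "__main__":
    asyncio.run(run_transformer())
```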
More detailed descriptions of the individual services can be found in their respective directories.