https://github.com/foogaro/change-data-capture
CDC project based on Debezium, Kafka, MS SQL Server, Infinispan and Teiid, entirely based on containers.
- Host: GitHub
- URL: https://github.com/foogaro/change-data-capture
- Owner: foogaro
- License: gpl-3.0
- Created: 2019-03-19T14:45:05.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-11-16T09:31:23.000Z (over 2 years ago)
- Last Synced: 2023-03-11T16:08:05.407Z (over 2 years ago)
- Language: Java
- Size: 68.6 MB
- Stars: 23
- Watchers: 3
- Forks: 6
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Change Data Capture
In this post I'll show and describe how to achieve change data capture (aka CDC) using some of the most reliable open source tools.
Here they are:
* Docker
* MS SQL
* Debezium
* Kafka
* Teiid
* And of course some Java code.
## Docker
Building on Linux kernel capabilities, Docker ships an application as a Linux container image (in the Docker format) to achieve consistency and reliability. I would call it the real "write once, run everywhere... as long as it's Linux".
More info on [docker.io](https://docker.io)
## Microsoft SQL Server
A database platform... I know, I said open source software, but I'm running the official Docker image, so at least it's free!
They are getting there...
## Debezium
No better explanation than the one taken from its site:
> Debezium is an open source distributed platform for change data capture.
> Debezium records in a transaction log all row-level changes committed to each database table. Each application simply reads the transaction logs they're interested in, and they see all of the events in the same order in which they occurred.

More info on [debezium.io](https://debezium.io)
## Kafka
No better explanation than the one taken from its site:
> Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
## Teiid
No better explanation than the one taken from its site:
> Teiid offers a relational abstraction of all information sources that is highly performant and allows for integration with your existing relational tools. Teiid has an accompanying easy-to-use design tool that enables data architects to integrate disparate information in minutes.
## Java code
You can find everything in this repo:
- https://github.com/foogaro/change-data-capture/tree/master/infinispan-teiid
# Architecture

So basically, this is the flow:
* some external software adds new records into the database;
* the database stores the data and updates its transaction log;
* Debezium reads the transaction log and picks up the changes made to the database (inserts, updates and deletes);
* Debezium sends each change as a single event to a Kafka topic;
* The [InfinispanSinkConnector](https://github.com/infinispan/infinispan-kafka) receives those events from the topic, and it sends them to Infinispan;
* A cache listener on Infinispan processes the new key and converts the message into a POJO (with ProtoBuf annotations);
* Asynchronously, Teiid exposes the Infinispan caches to client applications over the JDBC, ODBC and OData4 protocols;
* Applications can query and aggregate the data they need by connecting directly to Teiid, and proudly show their dashboards.
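
To make the flow above a bit more concrete, here is a minimal sketch in plain Java of what "apply a change event to a cache" boils down to. It is not the code from this repo: `ChangeEvent`, `CacheSink` and `CdcFlowSketch` are hypothetical names, the `name` and `qty` columns are made up, and a `ConcurrentHashMap` stands in for the Infinispan cache. The only thing borrowed from Debezium is the meaning of the `op` codes in its event envelope (`c` for create, `u` for update, `d` for delete, `r` for a snapshot read).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Entry point: replays a few change events against the stand-in cache. */
final class CdcFlowSketch {
    public static void main(String[] args) {
        CacheSink sink = new CacheSink();
        sink.apply(new ChangeEvent("c", "1", row("widget", 5))); // insert
        sink.apply(new ChangeEvent("u", "1", row("widget", 7))); // update
        sink.apply(new ChangeEvent("c", "2", row("gadget", 3))); // insert
        sink.apply(new ChangeEvent("d", "2", null));             // delete
        System.out.println(sink.size() + " row(s), qty of key 1 = " + sink.get("1").get("qty"));
    }

    /** Builds a fake row image; the column names are invented for this example. */
    static Map<String, Object> row(String name, int qty) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", name);
        row.put("qty", qty);
        return row;
    }
}

/** Simplified stand-in for one change event (hypothetical class, not a Debezium type). */
final class ChangeEvent {
    final String op;                 // "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    final String key;                // primary key of the changed row
    final Map<String, Object> after; // row state after the change; null for deletes

    ChangeEvent(String op, String key, Map<String, Object> after) {
        this.op = op;
        this.key = key;
        this.after = after;
    }
}

/** In-memory map playing the role of the Infinispan cache in this sketch. */
final class CacheSink {
    private final Map<String, Map<String, Object>> cache = new ConcurrentHashMap<>();

    /** Applies one event: creates, updates and snapshot reads upsert the row, deletes remove it. */
    void apply(ChangeEvent event) {
        switch (event.op) {
            case "c":
            case "u":
            case "r":
                if (event.after == null) {
                    throw new IllegalArgumentException("Missing row image for op " + event.op);
                }
                cache.put(event.key, event.after);
                break;
            case "d":
                cache.remove(event.key);
                break;
            default:
                throw new IllegalArgumentException("Unknown op: " + event.op);
        }
    }

    Map<String, Object> get(String key) {
        return cache.get(key);
    }

    int size() {
        return cache.size();
    }
}
```

Because the events are applied in the order they were produced, the cache ends up mirroring the table: after the four events in `main`, only key `1` is left, with `qty` set to 7. In the real setup the sink connector and the cache listener do this job, and Teiid sits on top of the caches.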
Now, all this work is implemented for you as an MVPoC.
It would be really cool if you could contribute by fixing the following issues:
* Fix [issue#1](https://github.com/foogaro/change-data-capture/issues/1) for the Infinispan custom cache-store and get rid of the `@ClientListener` running as `while (true) {}` in a Java [class](https://github.com/foogaro/change-data-capture/blob/master/infinispan-teiid/infinispan-listener/src/test/java/com/foogaro/cdc/infinispan/InfinispanKafkaRunner.java);
* Fix [issue#2](https://github.com/foogaro/change-data-capture/issues/2) to run it all on OpenShift;
* Fix [issue#3](https://github.com/foogaro/change-data-capture/issues/3) to show cache metrics in Grafana.
# Running the whole thing
As easy as
```bash
./scripts/run-them-all
```
# Grafana dashboard
Here is the Grafana dashboard to monitor the Infinispan caches:

# Demo
You can find a _playback_ demo [here](https://github.com/foogaro/change-data-capture/tree/master/demo). The _playback_ demo can be downloaded as either a GIF or a WEBM file.
# Implementation Details
Now I'm really tired. If you need more details, drop me an email.
Ciao,
Luigi