Using Debezium to send Change Data Capture (CDC) events to HTTP endpoints
- Host: GitHub
- URL: https://github.com/gordonmurray/debezium_warpstream_http_sink
- Owner: gordonmurray
- License: mit
- Created: 2024-01-12T20:14:36.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-11-29T21:59:45.000Z (about 1 month ago)
- Last Synced: 2024-11-29T22:40:09.280Z (about 1 month ago)
- Topics: debezium, http, kafka, ngrok, warpstream
- Homepage:
- Size: 8.46 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Debezium, WarpStream and an HTTP sink
Using Debezium to send Change Data Capture (CDC) events to HTTP endpoints
I wanted to try out the HTTP sink connector to see how sending data out to external endpoints might work.
I created a small project that sources data from a relational database using Debezium to populate some Kafka topics, and then added a connector to send data from one of the topics to an HTTP endpoint.
It works well, and there were a couple of pleasant surprises, such as the connector automatically creating topics in the Kafka cluster to store a record of the errors and successes of the HTTP calls. That's great for debugging issues. You can control the topic names in the connector config too.
I created a docker compose file that creates a MariaDB database with a tiny products table and a Debezium container. A first connector is added to source the data from the database and populate Kafka topics. A second connector is then added to send events from one of the topics to an HTTP endpoint.
I used WarpStream as my Kafka cluster and ngrok as a temporary endpoint to send data to, so I could inspect the data as it was received.
The source connector, which reads data from a table in the database, is located at files/connector_mariadb.json and looks like the following:
```json5
{
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.history.kafka.bootstrap.servers": "api-xxxxxxxxxx.warpstream.com:9092",
    "database.history.kafka.topic": "history",
    // database connection details
    "database.hostname": "mariadb",
    "database.password": "rootpassword",
    "database.port": "3306",
    "database.server.id": "12",
    "database.server.name": "myconnector",
    "database.user": "root",
    "database.whitelist": "mydatabase",
    "schema.history.internal.kafka.bootstrap.servers": "api-xxxxxxxxxx.warpstream.com:9092",
    "schema.history.internal.kafka.topic": "schema-changes.mydatabase",
    "table.whitelist": "mydatabase.products",
    "tasks.max": "1",
    "topic.prefix": "testing",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
}
```
The fields to pay attention to in the JSON config are:
* `database.history.kafka.bootstrap.servers` & `schema.history.internal.kafka.bootstrap.servers` - your Kafka brokers; in my case I'm using WarpStream
* `database.*` fields - the connection details for the database
* `table.whitelist` - the table you want to pull data from, written in the format schema.tablename

Debezium will create a number of topics in the Kafka cluster; in this case it will create a topic called testing.mydatabase.products, based on the topic.prefix and table.whitelist contents.
![Warpstream UI showing topics]({{ site.url }}/images/warpstream_topics.png)
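To double-check that the source connector created those topics, you can point the Kafka CLI at the cluster. A minimal sketch, assuming the placeholder broker address is replaced with your own WarpStream bootstrap server:
```
# List topics on the cluster; testing.mydatabase.products should appear
./bin/kafka-topics.sh --bootstrap-server api-xxxxxxxxxx.warpstream.com:9092 --list
```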
Once the topic is in place with some records, I can then add the HTTP sink connector, located at files/connector_http.json. The connector config looks like the following:
```json5
{
    "connector.class": "io.confluent.connect.http.HttpSinkConnector",
    "tasks.max": "1",
    "topics": "testing.mydatabase.products",
    "http.api.url": "https://xxxxxxxx.ngrok-free.app",
    "request.method": "POST",
    "headers": "Content-Type:application/json",
    "batch.size": "3",
    "max.retries": "3",
    "retry.backoff.ms": "3000",
    "connection.timeout.ms": "2000",
    "request.timeout.ms": "5000",
    // add your warpstream endpoint here
    "confluent.topic.bootstrap.servers": "api-xxxxxxxxx.warpstream.com:9092",
    "reporter.bootstrap.servers": "api-xxxxxxxxx.warpstream.com:9092",
    "reporter.error.topic.name": "error-topic",
    "reporter.error.topic.replication.factor": "1",
    "reporter.result.topic.name": "success-topic",
    "reporter.result.topic.replication.factor": "1"
}
```
The fields to pay attention to in the JSON config are:
* `topics` - that's the topic in Kafka you want to respond to; in my case it's testing.mydatabase.products
* `http.api.url` - that's the URL to send the events to; in my case it's an ngrok endpoint for testing
* `confluent.topic.bootstrap.servers` & `reporter.bootstrap.servers` - your Kafka brokers; in my case I'm using WarpStream

For an API endpoint I used [ngrok](https://ngrok.com/). I started it locally using the following command and it gave me a URL to add to my HTTP connector:
```
ngrok http 80
```
Once the connectors are updated with any changes you want to make, you can start the containers.
Run `docker-compose up -d` and connect to the Debezium container:
```
docker exec -it debezium /bin/bash
```
Docker compose will upload the connectors into the Debezium container, and you can start the connectors using the following commands:
Create the source connector:
```
curl -X PUT http://localhost:8083/connectors/my_database/config -H "Content-Type: application/json" -d @connector_mariadb.json
```
Create the HTTP sink connector:
```
curl -X PUT http://localhost:8083/connectors/http/config -H "Content-Type: application/json" -d @connector_http.json
```
You should see at least 3 calls to the API endpoint, one for each of the 3 records that are in the database by default. Adding or updating records in the table will trigger more calls to the endpoint.
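To trigger another call, you can insert a row directly in the database. A sketch, assuming the MariaDB container is named mariadb, with the root credentials from the connector config above and column names matching the events shown below:
```
# Insert a new product row; Debezium picks it up and the sink POSTs it to ngrok
docker exec -it mariadb mysql -uroot -prootpassword mydatabase \
  -e "INSERT INTO products (name, price) VALUES ('Product D', 19.99);"
```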
The events look like the following in ngrok:
```
Struct{id=3,name=Product C,price=39.99}
```
It doesn't trigger an HTTP event for any schema changes, but any events afterwards will contain the data for any new columns that are added.
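For example, a new column could be added like this (a sketch, using the same assumed mariadb container and credentials as above):
```
# Add a column; this is recorded in the schema history, not sent to the HTTP endpoint
docker exec -it mariadb mysql -uroot -prootpassword mydatabase \
  -e "ALTER TABLE products ADD COLUMN notes VARCHAR(255);"
```
Inserts after that change then produce events that include the new column: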
```
Struct{id=5,name=product E,price=33.00,notes=great product}
```
Schema changes are recorded in another topic called schema-changes.mydatabase, so another HTTP sink could be used to monitor that topic if you wanted to watch for schema changes.
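As a sketch of what that could look like, a second sink could reuse the same config shape as connector_http.json, just pointed at the schema-changes topic. The connector name schema_http and the endpoint here are hypothetical:
```
curl -X PUT http://localhost:8083/connectors/schema_http/config -H "Content-Type: application/json" -d '{
  "connector.class": "io.confluent.connect.http.HttpSinkConnector",
  "tasks.max": "1",
  "topics": "schema-changes.mydatabase",
  "http.api.url": "https://xxxxxxxx.ngrok-free.app",
  "request.method": "POST",
  "headers": "Content-Type:application/json",
  "confluent.topic.bootstrap.servers": "api-xxxxxxxxx.warpstream.com:9092",
  "reporter.bootstrap.servers": "api-xxxxxxxxx.warpstream.com:9092"
}'
```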
### Useful commands
List Connectors:
```
curl localhost:8083/connectors
```
Check Connector status:
```
curl localhost:8083/connectors/my_database/status
```
Use the Kafka CLI to list topics in WarpStream:
```
./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
```
List the messages in a topic:
```
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testing.mydatabase.products --from-beginning
```
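The reporter topics from the HTTP sink connector can be inspected the same way, which is handy when calls to the endpoint fail (topic names as configured in connector_http.json):
```
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic error-topic --from-beginning
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic success-topic --from-beginning
```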