# Kafka-Postgres
Data stream processing with Apache Kafka connectors, sinks and PostgreSQL
# Use Case
An ecommerce store has transactional data and wishes to process the event data generated by its transactions.
# Tech used
- Apache Kafka for stream data handling
- Zookeeper for managing Kafka
- Kafka Connect for transporting data into and out of Kafka
- Postgres for working with Kafka connectors to handle data going into and out of Kafka
- KSQL server for real-time stream processing
- Kafka's Schema Registry for enforcing the Avro format (a sketch of how these services fit together follows this list)
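The compose file itself is not reproduced in this README, but for orientation, here is a hedged sketch of how these services might be wired together, assuming the Confluent Platform and Debezium images; the service names match the hostnames used in the commands below (`zookeeper:2181`, `kafka`, `localhost:8083`, `ksqldb-server:8088`, `postgres`), though the repo's actual file may differ:
```
# Hypothetical docker-compose.yml sketch -- image tags and env vars are assumptions.
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:5.5.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1   # single-broker setup
  schema-registry:
    image: confluentinc/cp-schema-registry:5.5.0
    depends_on: [kafka]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
  connect:
    image: debezium/connect:1.6     # Connect REST API on 8083
    depends_on: [kafka, postgres]
    ports: ["8083:8083"]
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
  ksqldb-server:
    image: confluentinc/ksqldb-server:0.15.0
    depends_on: [kafka]
    ports: ["8088:8088"]
    environment:
      KSQL_BOOTSTRAP_SERVERS: kafka:9092
      KSQL_LISTENERS: http://0.0.0.0:8088
      KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081
  ksqldb-cli:
    image: confluentinc/ksqldb-cli:0.15.0
    entrypoint: /bin/sh
    tty: true
  postgres:
    image: postgres:11.0
    command: ["postgres", "-c", "wal_level=logical"]  # Debezium needs logical decoding
    environment:
      POSTGRES_PASSWORD: postgres
```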
# Connectors
- `source.json` (source): defines the connection between the source database (Postgres) and Kafka as the destination
- `sink.json` (sink): defines the connection between Kafka as the source and a Postgres DB as the destination
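`source.json` itself is not reproduced in this README. Since the topic referenced later is `dbserver1.public.transactions` (Debezium's `<server>.<schema>.<table>` naming), it plausibly configures the Debezium Postgres connector; a hedged sketch, with a hypothetical connector name:
```
{
  "name": "transactions-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "ecommerce",
    "database.server.name": "dbserver1",
    "table.whitelist": "public.transactions"
  }
}
```
`database.server.name` is what would produce the `dbserver1` topic prefix; the connection values mirror the Postgres service used elsewhere in this README.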
# Running the project
Spin up the services with `docker-compose up`; use the `-d` flag to run in detached mode.
## Set up a Postgres DB and Load Data
In a separate terminal, connect to the Postgres DB with a throwaway psql client by running
```
docker run -it --rm --network=kafka_postgres_default \
-v $PWD:/home/data/ \
postgres:11.0 psql -h postgres -U postgres
```
In the psql interface, run the commands defined in the `tables.sql` file.
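`tables.sql` is not shown here; a minimal sketch of what it might contain, with the column set inferred from the KSQL stream definitions further down:
```
-- Hypothetical reconstruction of tables.sql (column names taken from the
-- KSQL stream definition below; the repo's actual file may differ).
CREATE TABLE transactions (
    id SERIAL PRIMARY KEY,
    account_action VARCHAR(50),
    user_id INTEGER,
    tx_id VARCHAR(100),
    amount DOUBLE PRECISION
);
```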
## Make source connection
Submit the `source.json` file to the Connect service via a curl command:
```
curl -X POST -H "Accept:application/json" -H "Content-Type: application/json" --data @source.json http://localhost:8083/connectors
```
Query the connectors to see if it worked: `curl -H "Accept:application/json" localhost:8083/connectors/`. If successful, the transactions table should appear as a topic.
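For a closer look, Kafka Connect's REST API also exposes a per-connector status endpoint; substitute the connector name declared in `source.json`:
```
curl -H "Accept:application/json" localhost:8083/connectors/<connector-name>/status
```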
To list topics, first access the Kafka container's shell by running `docker exec -it kafka /bin/bash`. In this case, "kafka" is the container name; you can also use the container ID.
Once in the container shell, run the command
`/usr/bin/kafka-topics --list --zookeeper zookeeper:2181` to view topics.
## Access Topics via KSQL
Since the KSQL CLI container is running courtesy of Docker Compose, run `docker exec -it <ksql-cli-container> /bin/ksql http://ksqldb-server:8088`, substituting the name of the KSQL CLI container from the compose file.
Once in the CLI, enter the command `SHOW TOPICS;`
In the KSQL interface, create a stream and a table mirroring the transactions table in the Postgres DB:
```
CREATE STREAM transaction_src (id INTEGER, account_action VARCHAR, user_id INTEGER, tx_id VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC='dbserver1.public.transactions', VALUE_FORMAT='AVRO');

CREATE STREAM transactions_rekey WITH (PARTITIONS=1) AS
  SELECT * FROM transaction_src PARTITION BY user_id;

SHOW STREAMS;

CREATE TABLE transactions (id INTEGER, account_action VARCHAR, user_id INTEGER, tx_id VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC='TRANSACTIONS_REKEY', VALUE_FORMAT='AVRO', KEY='user_id');

SHOW TABLES;
```
## Perform basic analysis with KSQL
Assume that transactions above 8,000,000 are considered suspicious. Since a plain filter over a stream yields another stream (a KSQL table would require an aggregation), capture the suspicious transactions with the statement below; ROWTIME is KSQL's built-in record-timestamp column:
```
CREATE STREAM TRANSACTIONS_SUSPECT
  WITH (KAFKA_TOPIC='TRANSACTIONS_SUSPECT', VALUE_FORMAT='DELIMITED') AS
  SELECT AMOUNT, ROWTIME, USER_ID, TX_ID, ACCOUNT_ACTION
  FROM TRANSACTIONS_REKEY
  WHERE AMOUNT > 8000000
  PARTITION BY TX_ID;
```
## Submit Sink Config to Connect Registry
Run the curl command
```
curl -X POST -H "Accept:application/json" -H "Content-Type: application/json" --data @sink.json http://localhost:8083/connectors
```
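Like the source config, `sink.json` is not reproduced here. A plausible sketch, assuming the Confluent JDBC sink connector and a hypothetical connector name; `topics` points at the stream created above, and `auto.create` lets the sink create the destination table:
```
{
  "name": "transactions-suspect-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://postgres:5432/ecommerce",
    "connection.user": "postgres",
    "connection.password": "postgres",
    "topics": "TRANSACTIONS_SUSPECT",
    "auto.create": "true"
  }
}
```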
## Inspect Data in Postgres
Access the running Postgres container defined in the docker-compose file by running `docker exec -it postgres psql -U postgres -W ecommerce`.
View the data brought in by Kafka Connect by running the SQL commands:
```
SELECT * FROM "TRANSACTIONS_SUSPECT";
SELECT COUNT("TX_ID") FROM "TRANSACTIONS_SUSPECT";
```
When you add new data to the source database (for example by copying in CSV data), it flows through the whole pipeline and is processed by the simple KSQL query; re-running the queries above will then show different results.
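For example, from the psql session started earlier (its container mounts the working directory at `/home/data/`), a hypothetical CSV file could be loaded with psql's `\copy` meta-command:
```
-- load a hypothetical CSV file into the source table; id is left to SERIAL
\copy transactions(account_action, user_id, tx_id, amount) FROM '/home/data/new_transactions.csv' WITH (FORMAT csv, HEADER)
```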