https://github.com/paahaad/arroyo-elt

This repository demonstrates how to build an ETL pipeline for any Solana program using Arroyo that sinks data to TimescaleDB and PostgreSQL via Debezium.

# ETL Pipeline for Pump.fun Account Data
This repo uses an Arroyo WebSocket source to stream data. It parses the incoming data using a user-defined function (UDF) that deserializes it with Borsh. The processed data is then sent to Kafka (in KRaft mode) in Debezium format.
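
The decode-then-wrap step can be sketched in Python using only the standard library. This is an illustration, not the repository's UDF. The account layout (an 8-byte discriminator, five `u64` fields, and one `bool`) and all field and function names are assumptions made for the example. The Borsh rules it relies on are little-endian fixed-width integers with no padding and a one-byte `bool`.

```python
import json
import struct

# ASSUMED layout (not taken from the repo): 8-byte discriminator,
# five u64 fields, one bool. "<" means little-endian with no padding,
# which matches how Borsh lays out fixed-width fields.
ACCOUNT_LAYOUT = struct.Struct("<8sQQQQQ?")

# Hypothetical field names, used only for illustration.
FIELD_NAMES = (
    "virtual_token_reserves",
    "virtual_sol_reserves",
    "real_token_reserves",
    "real_sol_reserves",
    "token_total_supply",
    "complete",
)


def decode_account(data: bytes) -> dict:
    """Decode the fixed-size prefix of a Borsh-encoded account."""
    _discriminator, *values = ACCOUNT_LAYOUT.unpack_from(data)
    return dict(zip(FIELD_NAMES, values))


def to_debezium(row: dict, op: str = "c") -> str:
    """Wrap a row in a Debezium-style envelope (before/after/op)."""
    return json.dumps({"before": None, "after": row, "op": op})
```

Here `op="c"` marks the record as a create event; an update would carry the previous row in `before` and use `op="u"`.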

## DEMO
[![Watch the video](https://img.youtube.com/vi/cUPeAbmXjqo/maxresdefault.jpg)](https://youtu.be/cUPeAbmXjqo)

Kafka Connect runs in a distributed setup, with connectors configured to sink data into both PostgreSQL and TimescaleDB.

Download the connector plugins:
- `curl -OL https://repo1.maven.org/maven2/io/debezium/debezium-connector-jdbc/3.2.0.Final/debezium-connector-jdbc-3.2.0.Final-plugin.tar.gz`
- `curl -OL https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.3.2.Final/debezium-connector-postgres-2.3.2.Final-plugin.tar.gz`
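
Once the plugins are installed, a sink connector is registered by sending a JSON body to the Kafka Connect REST endpoint `POST /connectors`. The sketch below builds such a body for the Debezium JDBC sink and posts it. The connector name, topic, JDBC URL, and credentials are placeholders, and a real setup will likely need more settings (for example primary key handling) than this minimal set. Because TimescaleDB is a PostgreSQL extension, the same `jdbc:postgresql://` URL form should work for both sinks.

```python
import json
import urllib.request

# Assumed to match the worker's REST host and port.
CONNECT_URL = "http://localhost:8083"


def jdbc_sink_payload(name, topic, jdbc_url, user, password):
    """Build the JSON body for POST /connectors (minimal, illustrative)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "connection.url": jdbc_url,
            "connection.username": user,
            "connection.password": password,
            "insert.mode": "insert",
        },
    }


def register(payload, base_url=CONNECT_URL):
    """POST the connector definition to a running Connect worker."""
    request = urllib.request.Request(
        f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `register(jdbc_sink_payload("pg-sink", "my-topic", "jdbc:postgresql://localhost:5432/mydb", "user", "secret"))` would create one sink; a second call with a different name and URL would cover TimescaleDB.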

# Kafka Sink Setup
- Kafka runs in KRaft mode.
- Kafka Connect is deployed in distributed mode with the following config file:
```
# Basic Connect configuration
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false

# Offset storage
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
offset.storage.partitions=25

# Config storage
config.storage.topic=connect-configs
config.storage.replication.factor=1

# Status storage
status.storage.topic=connect-status
status.storage.replication.factor=1
status.storage.partitions=5

# Plugin path
plugin.path=./config/plugins

# REST API
rest.host.name=localhost
rest.port=8083
```

Make sure your plugins are in the directory set by `plugin.path`.
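
One way to confirm that the worker picked up the plugins is to query the Kafka Connect REST endpoint `GET /connector-plugins` and compare the returned classes with the ones you expect. The sketch below assumes the worker listens on `localhost:8083`; the helper names are hypothetical.

```python
import json
import urllib.request

# Connector classes shipped in the two Debezium plugin archives.
REQUIRED_CLASSES = (
    "io.debezium.connector.jdbc.JdbcSinkConnector",
    "io.debezium.connector.postgresql.PostgresConnector",
)


def fetch_plugins(base_url="http://localhost:8083"):
    """Return the plugin list reported by a running Connect worker."""
    with urllib.request.urlopen(f"{base_url}/connector-plugins") as response:
        return json.load(response)


def missing_plugins(plugins, required=REQUIRED_CLASSES):
    """Return the required classes that the worker did not load, sorted."""
    loaded = {plugin["class"] for plugin in plugins}
    return sorted(set(required) - loaded)
```

Calling `missing_plugins(fetch_plugins())` against a running worker returns an empty list when both plugins were found; any class it returns points to an archive that is not unpacked under `plugin.path`.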