https://github.com/paahaad/arroyo-elt
This repository demonstrates how to build an ETL pipeline for any Solana program using Arroyo, sinking data to TimescaleDB and PostgreSQL via Debezium.
- Host: GitHub
- URL: https://github.com/paahaad/arroyo-elt
- Owner: paahaad
- Created: 2025-07-24T09:28:30.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-24T10:03:37.000Z (11 months ago)
- Last Synced: 2025-07-24T14:32:43.681Z (11 months ago)
- Language: Rust
- Homepage:
- Size: 4.88 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ETL Pipeline for Pump.fun Account Data
This repo uses an Arroyo WebSocket source to stream data. It parses the incoming data using a user-defined function (UDF) that deserializes it with Borsh. The processed data is then sent to Kafka (in KRaft mode) in Debezium format.
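The decode-and-wrap step described above can be sketched in plain Rust. This is a minimal illustration, not the repository's UDF: the `CurveAccount` struct, its field names, and the 8-byte discriminator prefix are assumptions made for the example, and the Borsh rules (little-endian integers, one-byte booleans) are implemented by hand rather than through the `borsh` crate so the snippet needs only the standard library.

```rust
// Hypothetical account layout: the field names and order are assumptions
// for illustration, not taken from the repository.
#[derive(Debug, PartialEq)]
struct CurveAccount {
    virtual_token_reserves: u64,
    virtual_sol_reserves: u64,
    complete: bool,
}

/// Reads one little-endian u64 at `*pos` and advances the cursor.
fn read_u64(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let end = pos.checked_add(8)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(buf.get(*pos..end)?);
    *pos = end;
    Some(u64::from_le_bytes(bytes))
}

/// Borsh encodes a bool as a single byte that must be 0 or 1.
fn read_bool(buf: &[u8], pos: &mut usize) -> Option<bool> {
    let byte = *buf.get(*pos)?;
    *pos += 1;
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

/// Decodes the raw account bytes, skipping an assumed 8-byte discriminator.
fn decode_account(data: &[u8]) -> Option<CurveAccount> {
    let mut pos = 8;
    Some(CurveAccount {
        virtual_token_reserves: read_u64(data, &mut pos)?,
        virtual_sol_reserves: read_u64(data, &mut pos)?,
        complete: read_bool(data, &mut pos)?,
    })
}

/// Wraps the decoded record in a Debezium-style insert envelope
/// (`before`, `after`, `op`).
fn to_debezium_json(acc: &CurveAccount) -> String {
    format!(
        "{{\"before\":null,\"after\":{{\"virtual_token_reserves\":{},\"virtual_sol_reserves\":{},\"complete\":{}}},\"op\":\"c\"}}",
        acc.virtual_token_reserves, acc.virtual_sol_reserves, acc.complete
    )
}

fn main() {
    // Build a sample payload: discriminator, two u64 fields, one bool.
    let mut data = vec![0u8; 8];
    data.extend_from_slice(&1_000u64.to_le_bytes());
    data.extend_from_slice(&30u64.to_le_bytes());
    data.push(1);

    match decode_account(&data) {
        Some(acc) => println!("{}", to_debezium_json(&acc)),
        None => eprintln!("payload too short or malformed"),
    }
}
```

Returning `Option` instead of panicking lets a streaming job drop or log a malformed message without stopping the pipeline.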

## DEMO
[Demo video](https://youtu.be/cUPeAbmXjqo)
Kafka Connect runs in a distributed setup, with connectors configured to sink data into both PostgreSQL and TimescaleDB.
Download links for the connector plugins:
- curl -OL https://repo1.maven.org/maven2/io/debezium/debezium-connector-jdbc/3.2.0.Final/debezium-connector-jdbc-3.2.0.Final-plugin.tar.gz
- curl -OL https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.3.2.Final/debezium-connector-postgres-2.3.2.Final-plugin.tar.gz
# Kafka Sink Setup
- Kafka runs in KRaft mode.
- Kafka Connect is deployed in distributed mode with the following config file:
```
# Basic Connect configuration
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# Offset storage
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
offset.storage.partitions=25
# Config storage
config.storage.topic=connect-configs
config.storage.replication.factor=1
# Status storage
status.storage.topic=connect-status
status.storage.replication.factor=1
status.storage.partitions=5
# Plugin path
plugin.path=./config/plugins
# REST API
rest.host.name=localhost
rest.port=8083
```
Make sure your plugins are in the directory that `plugin.path` points to.
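One way to check the plugin location before starting the worker is to read `plugin.path` from the properties file and test whether each entry exists. The sketch below is an illustration only: `parse_properties` and `plugin_dirs` are hypothetical helpers, the parser handles just simple `key=value` lines and `#` comments, and a relative entry such as `./config/plugins` is checked against the directory the program is run from, which is assumed to be the same directory the Connect worker is started in.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Parses simple `key=value` lines, skipping blank lines and `#` comments.
/// Lines without `=` are ignored.
fn parse_properties(text: &str) -> HashMap<String, String> {
    let mut props = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some(idx) = line.find('=') {
            let key = line[..idx].trim().to_string();
            let value = line[idx + 1..].trim().to_string();
            props.insert(key, value);
        }
    }
    props
}

/// Splits the comma-separated `plugin.path` value into individual entries.
fn plugin_dirs(props: &HashMap<String, String>) -> Vec<String> {
    props
        .get("plugin.path")
        .map(|value| {
            value
                .split(',')
                .map(|s| s.trim().to_string())
                .filter(|s| !s.is_empty())
                .collect()
        })
        .unwrap_or_default()
}

fn main() {
    // Inline sample; in practice the text would come from the worker's
    // properties file.
    let text = "# Plugin path\nplugin.path=./config/plugins\nrest.port=8083\n";
    let props = parse_properties(text);
    for dir in plugin_dirs(&props) {
        let status = if Path::new(&dir).is_dir() { "found" } else { "missing" };
        println!("{}: {}", dir, status);
    }
}
```

A `missing` result for a relative path often just means the check was run from a different working directory than the worker.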