https://github.com/gordonmurray/flink_paimon_duckdb_rill
A streaming analytics stack that captures MySQL changes via CDC, stores them in Apache Paimon format, and visualizes them with Rill dashboards
duckdb flink flink-cdc paimon rill-dashboard
- Host: GitHub
- URL: https://github.com/gordonmurray/flink_paimon_duckdb_rill
- Owner: gordonmurray
- License: mit
- Created: 2025-09-27T21:10:54.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-27T21:14:55.000Z (9 months ago)
- Last Synced: 2025-10-04T10:35:09.626Z (9 months ago)
- Topics: duckdb, flink, flink-cdc, paimon, rill-dashboard
- Language: Shell
- Homepage: https://gordonmurray.com/data/2025/09/27/when-your-real-time-dashboard-refuses-to-be-real-time.html
- Size: 13.7 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Real-Time Analytics Pipeline with Flink, Paimon, and Rill
A complete streaming analytics stack that captures MySQL changes via CDC, stores them in Apache Paimon format, and visualizes them with Rill dashboards.
## What You Get
- **Real-Time CDC**: Captures every MySQL change using Flink CDC
- **Lake Storage**: Stores data in Apache Paimon format on S3-compatible storage
- **Live Dashboard**: Rill analytics with automated catalog management
- **Automated Fixes**: Sidecar container handles DuckDB catalog prefix issues
- **One Command Start**: Everything runs with `docker compose up`
## Architecture
```
MySQL → Flink CDC → Apache Paimon → MinIO → Rill Dashboard
  ↑                                                ↓
Manual inserts                                 Analytics
```
**Components:**
- **MySQL/MariaDB**: Source database with sample product data
- **Apache Flink**: Real-time CDC processing engine
- **Apache Paimon**: Lake storage format optimized for streaming
- **MinIO**: S3-compatible object storage
- **Rill**: Modern analytics dashboard with DuckDB engine
- **Rill Patcher**: Automated sidecar handling catalog prefix issues
## Quick Start
### Prerequisites
- Docker and Docker Compose
- 8GB+ RAM recommended
- Ports 3000, 3306, 8081, 9000-9001 available
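A quick pre-flight check can confirm those ports are free before starting the stack. This is a sketch, not part of the repository: it assumes bash, whose `/dev/tcp/<host>/<port>` redirection is enabled in most Linux builds, and treats any port that accepts a TCP connection on 127.0.0.1 as taken. The names `STACK_PORTS`, `port_in_use`, and `check_stack_ports` are illustrative.

```bash
#!/usr/bin/env bash
# Pre-flight sketch (not from the repository): report which host ports the
# stack needs are already taken. Requires bash's /dev/tcp redirection.

# Rill, MySQL/MariaDB, Flink web UI, MinIO API, MinIO console.
STACK_PORTS="3000 3306 8081 9000 9001"

# Succeeds if something accepts a TCP connection on 127.0.0.1:<port>.
port_in_use() {
  # Opening fd 3 in a subshell attempts a connection; on loopback a closed
  # port is normally refused right away, and the error text is discarded.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints every busy port and returns how many were busy (0 means all free).
check_stack_ports() {
  local busy=0 port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "port $port is already in use"
      busy=$((busy + 1))
    fi
  done
  return "$busy"
}

# Usage (word-splitting of $STACK_PORTS is intentional):
#   check_stack_ports $STACK_PORTS && docker compose up -d
```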
### 1. Clone and Start
```bash
git clone https://github.com/gordonmurray/flink_paimon_duckdb_rill.git
cd flink_paimon_duckdb_rill
docker compose up -d
```
### 2. Initialize the CDC Pipeline
```bash
./setup_cdc.sh
```
### 3. Open the Dashboard
Navigate to: **http://localhost:3000**
The dashboard shows live data and refreshes automatically every 60 seconds.
## Test Real-Time Updates
Add new products to see live updates:
```bash
# Add some products
docker exec mariadb mysql -u root -prootpassword -e "
INSERT INTO mydatabase.products (name, price) VALUES
('New Product 1', 99.99),
('New Product 2', 199.99);"
# Check MySQL count
docker exec mariadb mysql -u root -prootpassword -e "SELECT COUNT(*) FROM mydatabase.products;"
# Wait up to 60 seconds for the dashboard to refresh
# You'll see the updated count automatically!
```
## How It Works
### CDC Pipeline
1. **MySQL Changes**: Any INSERT/UPDATE/DELETE in MySQL is captured
2. **Flink Processing**: Flink CDC reads the MySQL binlog in real-time
3. **Paimon Storage**: Changes are written to Paimon tables in MinIO
4. **Rill Dashboard**: Visualizes the data on a 60-second refresh cycle
### The Catalog Prefix Solution
DuckDB creates random catalog prefixes (e.g., `main8514e79c`) on startup. Our `rill-patcher` sidecar:
1. Waits for Rill to start
2. Discovers the current catalog alias via SQL
3. Patches the model file with the correct prefix
4. Refreshes data every 60 seconds
5. Re-patches if Rill restarts with a new prefix
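The patch step (3) amounts to a find-and-replace on the model's SQL. The sketch below is hypothetical and is not the repository's `rill-patcher.sh`. It assumes the model refers to the catalog by an alias shaped like `main8514e79c` (`main` plus eight hex digits), and `discover_alias` assumes that alias appears in DuckDB's `duckdb_databases()` listing when queried through the Rill endpoint used under Monitoring. The model path in the usage loop is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the rill-patcher steps; not the repository's script.
# ASSUMPTION: catalog aliases look like main8514e79c ("main" + 8 hex digits).
ALIAS_RE='main[0-9a-f]{8}'

# Step 2 (assumed approach): list DuckDB databases through Rill's query API
# and keep the first name matching the alias pattern.
discover_alias() {
  curl -s "http://localhost:3000/v1/instances/default/query" \
    -H "Content-Type: application/json" \
    -d '{"sql":"SELECT database_name FROM duckdb_databases()"}' |
    grep -Eo "$ALIAS_RE" | head -n 1
}

# Step 3: rewrite every alias in <file> to <new_alias>.
# Returns 0 if the file changed, 1 if it was already current, 2 on bad input.
patch_model_file() {
  local file="$1" alias="$2" tmp rc
  [ -f "$file" ] || return 2
  # Validate before splicing the alias into a sed expression.
  if ! printf '%s\n' "$alias" | grep -Eqx "$ALIAS_RE"; then
    echo "unexpected alias: $alias" >&2
    return 2
  fi
  # Already current if no alias in the file differs from the new one.
  if ! grep -Eo "$ALIAS_RE" "$file" | grep -qvx "$alias"; then
    return 1
  fi
  tmp="$(mktemp)" || return 2
  # Write back in place so the file keeps its permissions and inode.
  sed -E "s/$ALIAS_RE/$alias/g" "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Step 5 (placeholder model path): re-check periodically and patch only when
# the alias changes, e.g. after a Rill restart. Triggering the data refresh
# (step 4) is left out because it depends on Rill internals not shown here.
#   while sleep 60; do
#     a="$(discover_alias)" && [ -n "$a" ] && patch_model_file rill/models/products.sql "$a"
#   done
```

Returning distinct codes for "changed" and "already current" lets a caller trigger a refresh only when the model file actually changed.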
### Why Apache Paimon?
- Optimized for streaming updates with ACID guarantees
- Supports both batch and streaming workloads
- Compatible with multiple query engines
- Efficient storage with automatic compaction
## Monitoring
### Service Health Checks
```bash
# Check all containers
docker ps
# Monitor CDC job
curl -s http://localhost:8081/jobs | jq
# Test Rill Dashboard API
curl -s "http://localhost:3000/v1/instances/default/query" \
-H "Content-Type: application/json" \
-d '{"sql":"SELECT COUNT(*) FROM paimon_products"}'
# View Paimon files in MinIO
docker exec minio mc ls --recursive local/warehouse/
```
### Data Flow Verification
```bash
# MySQL data
docker exec mariadb mysql -u root -prootpassword -e "SELECT COUNT(*) FROM mydatabase.products;"
# MinIO storage
docker exec minio mc ls --recursive local/warehouse/cdc_db.db/products_sink/
# Rill dashboard count
curl -s "http://localhost:3000/v1/instances/default/query" \
-H "Content-Type: application/json" \
-d '{"sql":"SELECT COUNT(*) FROM paimon_products"}' | jq '.data[0]'
```
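The MySQL and Rill counts above can be compared in one step. This sketch reuses the README's own commands but assumes the shape of Rill's query response (rows under `data`, keyed by column name, for example `{"data":[{"n":42}]}`); the `n` alias is added here so the count is easy to pick out, and `extract_n` would need adjusting if the real response differs. All function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: compare the MySQL source count with the count Rill serves.
# ASSUMPTION: Rill's query endpoint returns rows under "data", keyed by column
# name, e.g. {"data":[{"n":42}]}. Adjust extract_n if the response differs.

# Reads a JSON body on stdin and prints the value of column "n" (42 or "42").
extract_n() {
  grep -Eo '"n"[[:space:]]*:[[:space:]]*"?[0-9]+' | head -n 1 | grep -Eo '[0-9]+$'
}

# Same query as above; -N (--skip-column-names) leaves only the number.
mysql_count() {
  docker exec mariadb mysql -u root -prootpassword -N \
    -e "SELECT COUNT(*) FROM mydatabase.products;"
}

# Same endpoint as above, with the count aliased to "n".
rill_count() {
  curl -s "http://localhost:3000/v1/instances/default/query" \
    -H "Content-Type: application/json" \
    -d '{"sql":"SELECT COUNT(*) AS n FROM paimon_products"}' | extract_n
}

# Prints both counts; succeeds only when they are non-empty and equal.
compare_counts() {
  local src dst
  src="$(mysql_count)"
  dst="$(rill_count)"
  echo "mysql=${src:-?} rill=${dst:-?}"
  [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Usage: compare_counts || echo "Rill has not caught up yet; retry after the next refresh"
```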
## Development
### Project Structure
```
├── docker-compose.yml        # Complete stack definition
├── conf/
│   └── flink-conf.yaml       # Flink configuration
├── rill/
│   ├── connectors/           # DuckDB S3 configuration
│   ├── models/               # SQL model definitions
│   ├── metrics/              # Metrics definitions
│   └── dashboards/           # Dashboard configs
├── rill-patcher.sh           # Automated catalog management
├── duckdb/
│   └── test_s3.py            # DuckDB query examples
├── sql/
│   ├── init.sql              # MySQL initial data
│   └── setup_paimon_cdc.sql  # CDC pipeline setup
└── setup_cdc.sh              # CDC initialization script
```
### Key Configuration Files
**Flink Config** (`conf/flink-conf.yaml`):
- Configures Flink job manager and task manager
- Sets checkpointing intervals
- Defines S3/MinIO credentials
**CDC Setup** (`sql/setup_paimon_cdc.sql`):
- Creates Paimon catalog
- Defines source MySQL table
- Creates sink Paimon table
- Starts CDC pipeline
## Troubleshooting
### Common Issues
**CDC Pipeline not starting**
```bash
# Check if the job started:
curl -s http://localhost:8081/jobs | jq
# If not, run setup again:
./setup_cdc.sh
```
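The check-then-rerun steps can be scripted. Flink's REST API lists job IDs and statuses at `GET /jobs`; this sketch greps that body for a `RUNNING` status (avoiding a hard `jq` dependency) and runs `./setup_cdc.sh` only when none is found. `FLINK_URL` and the function names are illustrative, and any RUNNING job satisfies the check, so it assumes the CDC pipeline is the only job on the cluster.

```bash
#!/usr/bin/env bash
# Sketch: re-run the CDC setup only when Flink reports no RUNNING job.
# Flink's GET /jobs answers with {"jobs":[{"id":"...","status":"..."}]}.
FLINK_URL="${FLINK_URL:-http://localhost:8081}"

# Reads a /jobs body on stdin; succeeds if any job's status is RUNNING.
# Plain grep avoids depending on jq and tolerates spaces around the colon.
has_running_job() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"RUNNING"'
}

ensure_cdc_job() {
  # -f makes curl fail on HTTP errors, so an unreachable or erroring
  # JobManager is treated the same as "no running job".
  if curl -sf "$FLINK_URL/jobs" | has_running_job; then
    echo "CDC job is RUNNING"
  else
    echo "No RUNNING job found; re-running ./setup_cdc.sh"
    ./setup_cdc.sh
  fi
}

# Usage: ensure_cdc_job
```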
**No data in MinIO**
```bash
# Check Flink job status
curl -s http://localhost:8081/jobs
# Restart CDC setup
./setup_cdc.sh
```
**Verify data flow**
```bash
# Check Flink job metrics (replace <job-id> with an ID listed by /jobs)
curl -s http://localhost:8081/jobs/<job-id>/metrics
# List Paimon files
docker exec minio mc ls local/warehouse/cdc_db.db/
```
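The metrics call needs a job ID, which `GET /jobs` provides. A minimal sketch, assuming the first job listed is the CDC job (an assumption that holds if `setup_cdc.sh` submits a single job). Called without parameters, Flink's `/jobs/<job-id>/metrics` lists the available metric names; a specific metric can be requested with the `get` query parameter. Function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: look up a job ID from GET /jobs, then query that job's metrics.
# ASSUMPTION: the first job listed is the CDC job (holds if only one job runs).
FLINK_URL="${FLINK_URL:-http://localhost:8081}"

# Reads a /jobs body on stdin and prints the first "id" value, if any.
first_job_id() {
  grep -Eo '"id"[[:space:]]*:[[:space:]]*"[0-9a-f]+"' | head -n 1 | cut -d'"' -f4
}

# Prints the job's metric listing, or one metric's value when a name is given
# (passed through Flink's ?get= query parameter).
job_metrics() {
  local id
  id="$(curl -sf "$FLINK_URL/jobs" | first_job_id)"
  if [ -z "$id" ]; then
    echo "no job found at $FLINK_URL/jobs" >&2
    return 1
  fi
  if [ -n "${1:-}" ]; then
    curl -sf "$FLINK_URL/jobs/$id/metrics?get=$1"
  else
    curl -sf "$FLINK_URL/jobs/$id/metrics"
  fi
}

# Usage: job_metrics            # list available metric names
#        job_metrics <name>     # fetch one metric by name
```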
### Clean Restart
```bash
# Complete reset
docker compose down -v
docker compose up -d
./setup_cdc.sh
# Wait 2-3 minutes for full initialization
```
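Instead of a fixed 2-3 minute wait, a small retry helper can poll until a service answers. The sketch below is not part of the repository; the usage line polls Flink's `GET /overview` endpoint on the port used throughout this README before running `setup_cdc.sh`, and the timeout and interval values are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: poll until a command succeeds instead of sleeping a fixed time.

# wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Returns 0 as soon as <command> exits 0, or 1 once the timeout has passed.
wait_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Usage after `docker compose up -d` (timeout and interval are arbitrary):
#   wait_until 180 5 curl -sf -o /dev/null http://localhost:8081/overview && ./setup_cdc.sh
```

A reachable Flink REST API only shows the JobManager is up; MySQL or MinIO may still be starting, so the same helper can be pointed at their checks as well.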
**Built with**: Apache Flink • Apache Paimon • Rill • DuckDB