https://github.com/lugolbis/data-immo
End-to-end ETL pipeline
data data-engineering dbt dremio duckdb etl-pipeline lakehouse rust
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/lugolbis/data-immo
- Owner: LugolBis
- Created: 2025-08-16T15:18:01.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-08-30T15:05:31.000Z (about 1 month ago)
- Last Synced: 2025-08-30T17:22:06.314Z (about 1 month ago)
- Topics: data, data-engineering, dbt, dremio, duckdb, etl-pipeline, lakehouse, rust
- Language: Rust
- Homepage:
- Size: 216 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# data-immo
**data-immo** is a high-performance ETL pipeline built in **Rust** to efficiently extract, transform, and load real estate transaction data from the **DVF+ API** into a **Lakehouse (Dremio)**.
The project is designed with a focus on **performance, reliability, and scalability**, leveraging modern data engineering tools and practices.
## Schema of the pipeline
```mermaid
flowchart LR
A[**API** DVF+] -->|Extraction| B[**Rust**]
B -->|Transformation| C[**Rust**]
C -->|Loading| D[(**DuckDB**)]
D -->|Saving cleaned data| E{**dbt**}
E -->|Validation & loading| F[(**Dremio**)]
subgraph Extraction [Extract]
A
end
subgraph TransformationRust [Transform]
B
C
end
subgraph LT [Load & Transform]
D
end
subgraph VD [Validate Data quality]
E
end
subgraph LakeHouse [LakeHouse]
F
end
subgraph Docker [Docker Container]
LakeHouse
end
style A fill:#000091,stroke:#ffffff,color:#ffffff,stroke-width:1px
style B fill:#955a34,stroke:#000000,color:#000000,stroke-width:1px
style C fill:#955a34,stroke:#000000,color:#000000,stroke-width:1px
style D fill:#fef242,stroke:#000000,color:#000000,stroke-width:1px
style E fill:#fc7053,stroke:#000000,color:#000000,stroke-width:1px
style F fill:#31d3db,stroke:#ffffff,color:#ffffff,stroke-width:1px
style Docker fill:#099cec
```

## Features
- **Data Extraction**
  - Fetches real estate transaction data from the **DVF+ API**.
  - Handles API rate limiting, retries, and efficient pagination using Rust's concurrency model.
- **Data Transformation**
  - Uses **DuckDB** to transform optimized **Parquet** data into structured, queryable formats.
  - Rust is used for additional transformations, data enrichment, and performance-critical operations (I/O, etc.).
- **Data Validation & Loading**
  - **dbt** is used to validate, test, and model the data.
  - The cleaned and validated data is loaded into **Dremio**, enabling a Lakehouse architecture.

## Tech Stack
- **Rust**: Core language for API calls, transformations, and performance optimization.
- **DuckDB**: In-process SQL engine for fast transformations of optimized Parquet datasets.
- **dbt**: Data modeling, testing, and validation layer.
- **Dremio**: Lakehouse platform for analytics and querying.

## Pipeline Overview
1. **Extract**: Retrieve raw transaction data from DVF+ API.
2. **Stage**: Store raw data as Parquet.
3. **Transform**: Apply transformations using DuckDB and Rust.
4. **Validate & Model**: Use dbt to ensure data quality and prepare final schemas.
5. **Load**: Push validated datasets into Dremio for downstream analytics.
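
The README describes the Extract step as handling retries and pagination but does not show how. The sketch below illustrates one common way to combine the two, using only the Rust standard library: a retry helper with exponential backoff wrapped around a cursor-based page loop. Everything here is an assumption made for illustration. `Page`, `retry_with_backoff`, `fetch_all`, and `demo` are hypothetical names that do not come from the repository, the page shape is not the real DVF+ response format, and the simulated source stands in for an HTTP client. It is also sequential, whereas the project mentions Rust's concurrency model.

```rust
use std::thread::sleep;
use std::time::Duration;

/// One page of results plus an optional cursor to the next page.
/// Hypothetical shape: the real DVF+ response format is not shown in the README.
#[derive(Debug, Clone, PartialEq)]
pub struct Page {
    pub records: Vec<String>,
    pub next: Option<u32>,
}

/// Calls `op` until it succeeds or `max_attempts` calls have failed,
/// doubling the delay after every failed attempt.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}

/// Starts at page 0 and follows `next` cursors until the source reports
/// no further page, retrying each page request independently.
pub fn fetch_all<E>(
    mut fetch_page: impl FnMut(u32) -> Result<Page, E>,
    max_attempts: u32,
    base_delay: Duration,
) -> Result<Vec<String>, E> {
    let mut records = Vec::new();
    let mut cursor = Some(0);
    while let Some(page_no) = cursor {
        let page = retry_with_backoff(max_attempts, base_delay, || fetch_page(page_no))?;
        records.extend(page.records);
        cursor = page.next;
    }
    Ok(records)
}

/// Simulated source with three pages (0, 1, 2) whose very first request fails.
pub fn demo() -> Result<Vec<String>, String> {
    let mut calls = 0;
    fetch_all(
        |page_no| {
            calls += 1;
            if calls == 1 {
                return Err("simulated timeout".to_string());
            }
            Ok(Page {
                records: vec![format!("record-{page_no}")],
                next: if page_no < 2 { Some(page_no + 1) } else { None },
            })
        },
        3,
        Duration::from_millis(1),
    )
}
```

Calling `demo()` yields the records of all three pages in order even though the first request fails. Keeping the retry policy separate from the page loop means either one can be replaced (for example by a rate-limit-aware delay) without touching the other.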