https://github.com/huy-dataguy/nyc-taxi-lakehouse
Real-time big data streaming pipeline that simulates NYC taxi trip analytics on a modern lakehouse architecture. It ingests high-volume Parquet data into Kafka, processes it with Spark Structured Streaming, stores it in Delta Lake on MinIO, and enables interactive analysis via Trino and Superset.
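Turning a static Parquet dataset into a "real-time" Kafka stream is commonly done by replaying records in event-time order. The sketch here is a hypothetical illustration, not the repository's code. It assumes trips arrive as Python dicts (for example, rows already read from Parquet) with a `tpep_pickup_datetime` column as in the public NYC TLC yellow-taxi files, and that `send` stands in for whatever Kafka producer call the project actually uses. `replay_schedule`, `replay`, and `speedup` are names introduced only for this example.

```python
# Hypothetical sketch: replay historical taxi trips as a paced event stream.
# Assumes each trip is a dict with a `tpep_pickup_datetime` datetime value;
# `send` stands in for a real Kafka producer call (not the project's code).
import json
import time
from datetime import datetime
from typing import Callable, Iterable, Iterator, Tuple


def replay_schedule(
    trips: Iterable[dict], speedup: float = 60.0
) -> Iterator[Tuple[float, bytes]]:
    """Yield (wait_seconds, payload) pairs in pickup-time order.

    wait_seconds is the event-time gap to the previous trip divided by
    `speedup`, so 60.0 replays one hour of trips per real minute.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    ordered = sorted(trips, key=lambda t: t["tpep_pickup_datetime"])
    previous = None
    for trip in ordered:
        pickup: datetime = trip["tpep_pickup_datetime"]
        gap = 0.0 if previous is None else (pickup - previous).total_seconds()
        previous = pickup
        # default=str serializes datetime fields via str(), e.g. "2024-01-01 08:00:00"
        payload = json.dumps(trip, default=str).encode("utf-8")
        yield gap / speedup, payload


def replay(
    trips: Iterable[dict],
    send: Callable[[bytes], None],
    speedup: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send each trip after waiting its scaled inter-arrival gap."""
    for wait, payload in replay_schedule(trips, speedup):
        if wait > 0:
            sleep(wait)
        send(payload)
```

With kafka-python, for instance, `send` could wrap `KafkaProducer.send` on a topic of your choosing (the source does not name the topic). Injecting `sleep` keeps the pacing logic testable without real delays, and a downstream Spark Structured Streaming job would see records arriving at roughly the original relative rate, compressed by `speedup`.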
- Host: GitHub
- URL: https://github.com/huy-dataguy/nyc-taxi-lakehouse
- Owner: huy-dataguy
- Created: 2025-05-02T10:09:27.000Z
- Default Branch: main
- Last Pushed: 2025-06-06T17:08:18.000Z
- Last Synced: 2025-06-18T09:48:46.594Z
- Topics: apache-airflow, apache-kafka, big-data, data-engineering, delta-lake, lakehouse, minio, real-time-analytics, trino
- Language: Python
- Homepage:
- Size: 60.2 MB
- Stars: 0
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md