https://github.com/huy-dataguy/nyc-taxi-lakehouse

A real-time big data streaming pipeline that simulates NYC taxi trip analytics on a modern Lakehouse architecture. It ingests high-volume Parquet data into Kafka, processes it with Spark Structured Streaming, stores it in Delta Lake on MinIO, and enables interactive analysis via Trino and Superset.

apache-airflow apache-kafka big-data data-engineering delta-lake lakehouse minio real-time-analytics trino

Last synced: 3 months ago
