https://github.com/iobruno/data-engineering-zoomcamp
Data Engineering examples for Airflow, Prefect, and Mage.ai; dbt for BigQuery, Redshift, ClickHouse, PostgreSQL; Spark/PySpark for Batch processing; and Kafka for Stream processing
- Host: GitHub
- URL: https://github.com/iobruno/data-engineering-zoomcamp
- Owner: iobruno
- License: cc-by-sa-4.0
- Created: 2023-01-19T16:22:49.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2025-02-06T00:55:47.000Z (2 months ago)
- Last Synced: 2025-02-06T01:29:18.391Z (2 months ago)
- Topics: airflow, airflow-dags, dbt-bigquery, dbt-clickhouse, dbt-postgres, dbt-redshift, kafka, ksqldb, mageai, prefect, pyspark, spark, typer-cli
- Language: Python
- Homepage: https://github.com/DataTalksClub/data-engineering-zoomcamp
- Size: 4.94 MB
- Stars: 56
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-clickhouse - iobruno/data-engineering-zoomcamp - The project provides a collection of resources and examples for Data Engineering, focusing on tools like Airflow, Prefect, and Kafka, along with various databases. (Integrations / Data Transfer and Synchronization)
README
# Data Engineering Zoomcamp
## Taking the course (2025 Cohort)
* **Start**: 13 January 2025
* **Registration link**: https://airtable.com/shr6oVXeQvSI5HuWD
* [Cohort folder](https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2025) with homework assignments and deadlines

## Syllabus
### [Module 1: Data ingestion](module1-data-ingestion/)
* [Python ingestion with polars and pandas](module1-data-ingestion/python-ingest/)
* Rust data ingestion
* [data load tool (dlt)](module1-data-ingestion/data-load-tool/)
* [IaC with Terraform (Google Cloud Platform)](infrastructure/terraform-gcp/)
* Homework

### [Module 2: Workflow orchestration](module2-workflow-orchestration/)
* [Workflow orchestration with Airflow](module2-workflow-orchestration/airflow/)
* [Workflow orchestration with Mage](module2-workflow-orchestration/mageai/)
* [Workflow orchestration with Prefect](module2-workflow-orchestration/prefect/)
* Homework

### [Module 3: Lakehouses & Data Warehouse](module3-lakehouse-data-warehouse/)
* [BigQuery Data Warehouse](module3-lakehouse-data-warehouse/bigquery/)
* [StarRocks Query Engine](module3-lakehouse-data-warehouse/starrocks/)
* Lakehouse with Delta Lake
* Homework

### [Module 4: Analytics engineering](module4-analytics-engineering/)
* [BigQuery and dbt](module4-analytics-engineering/bigquery/)
* [Redshift and dbt](module4-analytics-engineering/redshift/)
* Databricks and dbt
* [ClickHouse and dbt](module4-analytics-engineering/clickhouse/)
* [PostgreSQL and dbt](module4-analytics-engineering/postgres/)
* [DuckDB and dbt](module4-analytics-engineering/duckdb/)
* [Data visualization with Superset/Metabase](module4-analytics-engineering/visualization/)
* Homework

### [Module 5: Batch processing](module5-batch-processing/)
* [PySpark](module5-batch-processing/pyspark/)
* Spark + Kotlin API
* Spark (Scala)
* Homework

### [Module 6: Stream processing](module6-stream-processing/)
* [Stream processing with Kafka, ksqlDB and Kotlin](module6-stream-processing/kotlin/)
* [Kafka Streams with ksqlDB](module6-stream-processing/ksqldb/)
* [RisingWave: Streaming Database](module6-stream-processing/risingwave/)
* Homework

### Extras
* [Lakehouse with Delta Lake, Iceberg, and Hive](https://github.com/iobruno/lakehouse-labs/)
* Capstone Project