https://github.com/iobruno/data-engineering-zoomcamp

Data Engineering examples for Airflow, Prefect, and Mage.ai; dbt for BigQuery, Redshift, ClickHouse, PostgreSQL; Spark/PySpark for Batch processing; and Kafka for Stream processing

airflow airflow-dags dbt-bigquery dbt-clickhouse dbt-postgres dbt-redshift kafka ksqldb mageai prefect pyspark spark typer-cli

# Data Engineering Zoomcamp

## Taking the course (2025 Cohort)

* **Start**: 13 January 2025
* **Registration link**: https://airtable.com/shr6oVXeQvSI5HuWD
* [Cohort folder](https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2025) with homeworks and deadlines

## Syllabus

### [Module 1: Data ingestion](module1-data-ingestion/)
* [Python ingestion with polars and pandas](module1-data-ingestion/python-ingest/)
* Rust data ingestion
* [data load tool (dlt)](module1-data-ingestion/data-load-tool/)
* [IaC with Terraform (Google Cloud Platform)](infrastructure/terraform-gcp/)
* Homework

### [Module 2: Workflow orchestration](module2-workflow-orchestration/)
* [Workflow orchestration with Airflow](module2-workflow-orchestration/airflow/)
* [Workflow orchestration with Mage](module2-workflow-orchestration/mageai/)
* [Workflow orchestration with Prefect](module2-workflow-orchestration/prefect/)
* Homework

### [Module 3: Lakehouses & Data Warehouse](module3-lakehouse-data-warehouse/)
* [BigQuery Data Warehouse](module3-lakehouse-data-warehouse/bigquery/)
* [StarRocks Query Engine](module3-lakehouse-data-warehouse/starrocks/)
* Lakehouse with Delta Lake
* Homework

### [Module 4: Analytics engineering](module4-analytics-engineering/)
* [BigQuery and dbt](module4-analytics-engineering/bigquery/)
* [Redshift and dbt](module4-analytics-engineering/redshift/)
* Databricks and dbt
* [ClickHouse and dbt](module4-analytics-engineering/clickhouse/)
* [PostgreSQL and dbt](module4-analytics-engineering/postgres/)
* [DuckDB and dbt](module4-analytics-engineering/duckdb/)
* [Data visualization with Superset/Metabase](module4-analytics-engineering/visualization/)
* Homework

### [Module 5: Batch processing](module5-batch-processing/)
* [PySpark](module5-batch-processing/pyspark/)
* Spark + Kotlin API
* Spark (Scala)
* Homework

### [Module 6: Stream processing](module6-stream-processing/)
* [Stream processing with Kafka, ksqlDB and Kotlin](module6-stream-processing/kotlin/)
* [Kafka Streams with ksqlDB](module6-stream-processing/ksqldb/)
* [RisingWave: Streaming Database](module6-stream-processing/risingwave/)
* Homework

### Extras
* [Lakehouse with Delta, Iceberg, Hive](https://github.com/iobruno/lakehouse-labs/)
* Capstone Project