https://github.com/iobruno/data-engineering-zoomcamp

Data Engineering examples for Airflow, Prefect, and Mage.ai; dbt for BigQuery, Redshift, ClickHouse, PostgreSQL; Spark/PySpark for Batch processing; and Kafka for Stream processing

airflow airflow-dags dbt-bigquery dbt-clickhouse dbt-postgres dbt-redshift kafka ksqldb mageai prefect pyspark spark typer-cli

# Data Engineering Zoomcamp

## Taking the course

### 2024 Cohort

* **Start**: 15 January 2024 (Monday) at 17:00 CET
* **Registration link**: https://airtable.com/shr6oVXeQvSI5HuWD
* [Cohort folder](https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2024) with homework assignments and deadlines

### Self-paced mode

All course materials are freely available, so you can take the
course at your own pace:

* Follow the suggested syllabus (see below) week by week
* You don't need to fill in the registration form. Just start watching the videos and join Slack
* Check the [FAQ](https://docs.google.com/document/d/19bnYs80DwuUimHM65UV3sylsCn2j1vziPOwzBwQrebw/edit?usp=sharing) if you run into problems

## Syllabus

### [Module 1: Data ingestion & IaC](module1-data-ingestion/)
* [Python ingestion with polars and pandas](module1-data-ingestion/python-ingest/)
* Rust data ingestion
* [data load tool (dlt)](module1-data-ingestion/data-load-tool/)
* [Terraform for BigQuery and GCS](infrastructure/terraform-gcp/)
* Homework

### [Module 2: Workflow orchestration](module2-workflow-orchestration/)
* [Workflow orchestration with Airflow](module2-workflow-orchestration/airflow/)
* [Workflow orchestration with Mage](module2-workflow-orchestration/mageai/)
* [Workflow orchestration with Prefect](module2-workflow-orchestration/prefect/)
* Homework

### [Module 3: Data warehouse](module3-data-warehouse/)
* [BigQuery Data Warehouse](module3-data-warehouse/bigquery/)
* Lakehouse with Delta Lake/Iceberg
* Homework

### [Module 4: Analytics engineering](module4-analytics-engineering/)
* [BigQuery and dbt](module4-analytics-engineering/bigquery/)
* [Redshift and dbt](module4-analytics-engineering/redshift/)
* Databricks and dbt
* [ClickHouse and dbt](module4-analytics-engineering/clickhouse/)
* [PostgreSQL and dbt](module4-analytics-engineering/postgres/)
* [DuckDB and dbt](module4-analytics-engineering/duckdb/)
* [Data visualization with Superset/Metabase](module4-analytics-engineering/visualization/)
* Homework

### [Module 5: Batch processing](module5-batch-processing/)
* [Big Data ecosystem](module5-batch-processing/bigdata-ecosystem/)
* [PySpark](module5-batch-processing/pyspark/)
* Spark + Scala
* Spark + Kotlin (TBD)
* Homework

### [Module 6: Stream processing](module6-stream-processing/)
* [Stream processing with Kafka, ksqlDB and Kotlin](module6-stream-processing/kotlin/)
* [Kafka Streams with ksqlDB](module6-stream-processing/ksqldb/)
* [RisingWave: Streaming Database](module6-stream-processing/risingwave/)
* Homework