Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iobruno/data-engineering-zoomcamp
Data Engineering examples for Airflow, Prefect, and Mage.ai; dbt for BigQuery, Redshift, ClickHouse, PostgreSQL; Spark/PySpark for Batch processing; and Kafka for Stream processing
airflow airflow-dags dbt-bigquery dbt-clickhouse dbt-postgres dbt-redshift kafka ksqldb mageai prefect pyspark spark typer-cli
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/iobruno/data-engineering-zoomcamp
- Owner: iobruno
- License: cc-by-sa-4.0
- Created: 2023-01-19T16:22:49.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-11-30T21:58:41.000Z (about 2 months ago)
- Last Synced: 2024-12-09T03:51:14.295Z (about 1 month ago)
- Topics: airflow, airflow-dags, dbt-bigquery, dbt-clickhouse, dbt-postgres, dbt-redshift, kafka, ksqldb, mageai, prefect, pyspark, spark, typer-cli
- Language: Python
- Homepage: https://github.com/DataTalksClub/data-engineering-zoomcamp
- Size: 4.5 MB
- Stars: 52
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-clickhouse - iobruno/data-engineering-zoomcamp - The project provides a collection of resources and examples for Data Engineering, focusing on tools like Airflow, Prefect, and Kafka, along with various databases. (Integrations / Data Transfer and Synchronization)
README
# Data Engineering Zoomcamp
## Taking the course
### 2024 Cohort
* **Start**: 15 January 2024 (Monday) at 17:00 CET
* **Registration link**: https://airtable.com/shr6oVXeQvSI5HuWD
* [Cohort folder](https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2024) with homeworks and deadlines
### Self-paced mode
All the materials of the course are freely available, so that you can take the course at your own pace:
* Follow the suggested syllabus (see below) week by week
* You don't need to fill in the registration form. Just start watching the videos and join Slack
* Check [FAQ](https://docs.google.com/document/d/19bnYs80DwuUimHM65UV3sylsCn2j1vziPOwzBwQrebw/edit?usp=sharing) if you have problems
## Syllabus
### [Module 1: Data ingestion & IaC](module1-data-ingestion/)
* [Python ingestion with polars and pandas](module1-data-ingestion/python-ingest/)
* Rust data ingestion
* [data load tool (dlt)](module1-data-ingestion/data-load-tool/)
* [Terraform for BigQuery and GCS](infrastructure/terraform-gcp/)
* Homework
### [Module 2: Workflow orchestration](module2-workflow-orchestration/)
* [Workflow orchestration with Airflow](module2-workflow-orchestration/airflow/)
* [Workflow orchestration with Mage](module2-workflow-orchestration/mageai/)
* [Workflow orchestration with Prefect](module2-workflow-orchestration/prefect/)
* Homework
### [Module 3: Data Warehouse](module3-data-warehouse/)
* [BigQuery Data Warehouse](module3-data-warehouse/bigquery/)
* Lakehouse with Delta Lake/Iceberg
* Homework
### [Module 4: Analytics engineering](module4-analytics-engineering/)
* [BigQuery and dbt](module4-analytics-engineering/bigquery/)
* [Redshift and dbt](module4-analytics-engineering/redshift/)
* Databricks and dbt
* [ClickHouse and dbt](module4-analytics-engineering/clickhouse/)
* [PostgreSQL and dbt](module4-analytics-engineering/postgres/)
* [DuckDB and dbt](module4-analytics-engineering/duckdb/)
* [Data visualization with Superset/Metabase](module4-analytics-engineering/visualization/)
* Homework
### [Module 5: Batch processing](module5-batch-processing/)
* [Big Data ecosystem](module5-batch-processing/bigdata-ecosystem/)
* [PySpark](module5-batch-processing/pyspark/)
* Spark + Scala
* Spark + Kotlin (TBD)
* Homework
### [Module 6: Stream processing](module6-stream-processing/)
* [Stream processing with Kafka, ksqlDB and Kotlin](module6-stream-processing/kotlin/)
* [Kafka Streams with ksqlDB](module6-stream-processing/ksqldb/)
* [RisingWave: Streaming Database](module6-stream-processing/risingwave/)
* Homework