https://github.com/kirlewn/dataengineeringzoomcamp2025
This repository is my work as I follow along with the Data Engineering Zoomcamp found here: https://github.com/DataTalksClub/data-engineering-zoomcamp
- Host: GitHub
- URL: https://github.com/kirlewn/dataengineeringzoomcamp2025
- Owner: Kirlewn
- Created: 2025-01-08T11:59:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-04T16:12:59.000Z (over 1 year ago)
- Last Synced: 2025-02-04T17:22:47.282Z (over 1 year ago)
- Topics: data-engineering, docker, docker-compose, docker-container, etl-pipeline, postgresql, terraform
- Language: Jupyter Notebook
- Size: 23.3 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# DataEngineeringZoomCamp2025
This repository contains my work from following along with the Data Engineering Zoomcamp: https://github.com/DataTalksClub/data-engineering-zoomcamp
The goal of the course is to build a working data engineering pipeline.

# Modules
## Module 1: Containerization and Infrastructure as Code
- Introduction to GCP
- Docker and Docker Compose
- Running PostgreSQL with Docker
- Infrastructure setup with Terraform
## Module 2: Workflow Orchestration
- Data Lakes and Workflow Orchestration
- Workflow orchestration with Kestra
## Workshop 1: Data Ingestion
- API reading and pipeline scalability
- Data normalization and incremental loading
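As a rough illustration of the two ideas listed for this workshop, here is a minimal Python sketch of normalization (flattening nested records) and incremental loading (appending only records newer than a stored cursor). It is not code from this repository or from the course materials. The function names `flatten` and `incremental_load`, the `updated_at` cursor field, the `__` separator, and the in-memory `state` dict are all hypothetical choices made for the example.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = "__") -> dict[str, Any]:
    """Flatten nested dicts into one level, e.g. {"a": {"b": 1}} becomes {"a__b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys with the parent key.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def incremental_load(
    source: list[dict[str, Any]],
    destination: list[dict[str, Any]],
    state: dict[str, Any],
    cursor_field: str = "updated_at",  # hypothetical top-level cursor column
) -> int:
    """Append only records whose cursor value is newer than the last one seen."""
    last_seen = state.get("last_value")
    new_rows = [
        flatten(record)
        for record in source
        if last_seen is None or record[cursor_field] > last_seen
    ]
    destination.extend(new_rows)
    if new_rows:
        # Remember the high-water mark so the next run skips these records.
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return len(new_rows)
```

Calling `incremental_load` twice with the same `state` dict loads every record on the first run and only later-dated records on the second. A real pipeline would persist the cursor outside the process rather than in a dict.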
## Module 3: Data Warehousing
- Introduction to BigQuery
- Partitioning, clustering, and best practices
- Machine learning in BigQuery
## Module 4: Analytics Engineering
- dbt (data build tool) with PostgreSQL & BigQuery
- Testing, documentation, and deployment
- Data visualization with Metabase
## Module 5: Batch Processing
- Introduction to Apache Spark
- DataFrames and SQL
- Internals of GroupBy and Joins
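To give a feel for the GroupBy internals mentioned above, the following is a simplified pure-Python sketch of a two-stage, hash-partitioned aggregation: partial sums per input partition, a shuffle of those partial results by key hash, then a final merge. It does not use Spark and is not taken from this repository; the function name `group_by_sum`, the `num_reducers` parameter, and the sample keys are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_sum(partitions, num_reducers=2):
    """Sum values per key across partitions using a two-stage aggregation."""
    # Stage 1: aggregate locally inside each input partition, so that only
    # one partial sum per key per partition needs to be shuffled.
    partials = []
    for partition in partitions:
        local = defaultdict(float)
        for key, value in partition:
            local[key] += value
        partials.append(local)

    # Shuffle: route each partial sum to a reducer chosen by the key's hash,
    # so every occurrence of a key lands on the same reducer.
    reducers = [defaultdict(float) for _ in range(num_reducers)]
    for local in partials:
        for key, subtotal in local.items():
            reducers[hash(key) % num_reducers][key] += subtotal

    # Stage 2: each reducer now holds final totals for a disjoint set of keys.
    result = {}
    for reducer in reducers:
        result.update(reducer)
    return result
```

For example, `group_by_sum([[("green", 10.0), ("green", 2.5)], [("green", 4.0)]])` combines the per-partition subtotals 12.5 and 4.0 into a single total for `"green"`. The local pre-aggregation step is what keeps the amount of shuffled data small when keys repeat often.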
## Module 6: Streaming
- Introduction to Kafka
- Kafka Streams and KSQL
- Schema management with Avro