https://github.com/kirlewn/dataengineeringzoomcamp2025

This repository is my work as I follow along with the Data Engineering Zoomcamp found here: https://github.com/DataTalksClub/data-engineering-zoomcamp

data-engineering docker docker-compose docker-container etl-pipeline postgresql terraform

# DataEngineeringZoomCamp2025
This repository contains my work as I follow along with the Data Engineering Zoomcamp: https://github.com/DataTalksClub/data-engineering-zoomcamp

The goal of the course is to build a working data engineering pipeline.

![image](https://github.com/user-attachments/assets/ad9cfca8-3d4e-4d9b-ba8e-943c1f5157ee)

# Modules

## Module 1: Containerization and Infrastructure as Code
- Introduction to GCP
- Docker and Docker Compose
- Running PostgreSQL with Docker
- Infrastructure setup with Terraform

## Module 2: Workflow Orchestration
- Data Lakes and Workflow Orchestration
- Workflow orchestration with Kestra

## Workshop 1: Data Ingestion
- API reading and pipeline scalability
- Data normalization and incremental loading
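
As a rough illustration of incremental loading, here is a minimal sketch in plain Python. It is not taken from the course material: the `incremental_load` helper, the `updated_at` field, and the in-memory `target` list are all hypothetical stand-ins for a real source API and destination table. The idea is to track a high-water mark and load only rows newer than it.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_loaded_at):
    """Append only rows newer than the watermark and return the new watermark.

    Hypothetical helper: `source_rows` stands in for an API response and
    `target` for a destination table.
    """
    new_rows = [row for row in source_rows if row["updated_at"] > last_loaded_at]
    target.extend(new_rows)
    # If nothing new arrived, keep the previous watermark.
    return max((row["updated_at"] for row in new_rows), default=last_loaded_at)


source = [
    {"id": 1, "updated_at": datetime(2025, 1, 1)},
    {"id": 2, "updated_at": datetime(2025, 1, 2)},
    {"id": 3, "updated_at": datetime(2025, 1, 3)},
]
target = []

# First run: rows at or before the starting watermark are skipped.
watermark = incremental_load(source, target, datetime(2025, 1, 1))
# Second run with the same source: nothing is newer, so nothing is duplicated.
watermark = incremental_load(source, target, watermark)
```

Running the load twice against the same source leaves `target` unchanged after the first pass, which is the property that makes repeated pipeline runs safe.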

## Module 3: Data Warehousing
- Introduction to BigQuery
- Partitioning, clustering, and best practices
- Machine learning in BigQuery

## Module 4: Analytics Engineering
- dbt (data build tool) with PostgreSQL & BigQuery
- Testing, documentation, and deployment
- Data visualization with Metabase

## Module 5: Batch Processing
- Introduction to Apache Spark
- DataFrames and SQL
- Internals of GroupBy and Joins
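
To give a feel for what happens inside a distributed GroupBy, here is a simplified sketch in plain Python rather than Spark itself. It assumes the common two-stage approach: aggregate within each partition, shuffle the partial results by key, then merge them. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Stage 1: sum amounts per key inside a single partition."""
    sums = defaultdict(float)
    for key, amount in partition:
        sums[key] += amount
    return dict(sums)


def shuffle(partials, num_reducers):
    """Route each (key, subtotal) pair to a reducer chosen by the key's hash."""
    buckets = [[] for _ in range(num_reducers)]
    for partial in partials:
        for key, subtotal in partial.items():
            buckets[hash(key) % num_reducers].append((key, subtotal))
    return buckets


# Hypothetical input: two partitions of (zone, amount) records.
partitions = [
    [("zone_a", 10.0), ("zone_b", 5.0), ("zone_a", 2.5)],
    [("zone_b", 1.0), ("zone_c", 7.0)],
]

partials = [partial_aggregate(p) for p in partitions]
buckets = shuffle(partials, num_reducers=2)

# Stage 2: each reducer merges the subtotals it received.
result = {}
for bucket in buckets:
    result.update(partial_aggregate(bucket))
```

Because every subtotal for a given key lands in the same bucket, the final merge sees all of that key's data, and only the small per-partition subtotals cross the shuffle instead of the raw rows.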

## Module 6: Streaming
- Introduction to Kafka
- Kafka Streams and KSQL
- Schema management with Avro