https://github.com/agutiernc/data-eng-zoomcamp

Data Engineering Zoomcamp 2024
apache-kafka apache-spark ci-cd data-ingestion data-warehouse dbt dlt docker etl etl-pipeline google-cloud-platform jupyter-notebook mage-ai pandas pipelines postgresql pyarrow python sql terraform

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/agutiernc/data-eng-zoomcamp
Owner: agutiernc
Created: 2024-01-27T09:57:58.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-04-08T02:25:19.000Z (about 2 years ago)
Last Synced: 2026-01-03T13:32:40.807Z (6 months ago)
Topics: apache-kafka, apache-spark, ci-cd, data-ingestion, data-warehouse, dbt, dlt, docker, etl, etl-pipeline, google-cloud-platform, jupyter-notebook, mage-ai, pandas, pipelines, postgresql, pyarrow, python, sql, terraform
Language: Jupyter Notebook
Homepage:
Size: 1 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Data Engineering Zoomcamp 2024

This repository contains the projects, assignments, and code I worked on as part of the *Data Engineering Zoomcamp* offered by **[DataTalks.Club](https://github.com/DataTalksClub/data-engineering-zoomcamp)**. The Zoomcamp is an online program that teaches the core skills and knowledge needed to pursue a career in Data Engineering.

## Course Overview

The Data Engineering Zoomcamp covers a wide range of topics, including data modeling, data pipelines, batch and stream processing, data warehousing, and various data engineering tools and technologies. Throughout the course, I've gained hands-on experience with industry-standard tools such as Apache Spark, Apache Kafka, Docker, Mage, PostgreSQL, Redpanda, dbt Cloud, dlt, and more.

## Repository Structure

The repository is organized into several folders, each representing a module or topic covered in the zoomcamp:

- `module 1`: Containerization and Infrastructure as Code
- `module 2`: Workflow Orchestration with **[Mage](https://github.com/mage-ai/mage-ai)**
- `module 3`: Data Warehouse and Big Data
- `module 4`: Analytics Engineering with **[dbt](https://www.getdbt.com/)**
- `module 5`: Batch Processing with Apache Spark
- `module 6`: Streaming Data with Apache Spark, Apache Kafka, and **[Redpanda](https://redpanda.com)**
- `Workshop 1`: Data Ingestion with **[dlt](https://github.com/dlt-hub/dlt)**
- `Workshop 2`: Stream processing with **[RisingWave](https://github.com/risingwavelabs/risingwave)**

Each module folder contains the corresponding assignments, code examples, and documentation related to the respective topic.

## Learning Outcomes

Through this course, I have acquired a comprehensive understanding of Data Engineering principles and best practices. Some key areas of learning include:

- Data modeling techniques
- Building robust and scalable data pipelines
- Batch and stream processing with Apache Spark and Apache Kafka
- Data warehousing concepts and implementation with cloud-based solutions
- Containerization with Docker and workflow orchestration with Mage, using PostgreSQL as the database
- Proficiency in SQL, Python, and other Data Engineering-related tools

This repository serves as a showcase of my work and demonstrates my proficiency in various Data Engineering tools and technologies.

## Contact

If you have any questions or would like to discuss my work further, please feel free to reach out to me on [LinkedIn](https://www.linkedin.com/in/alfonso-gutierrez01).
