https://github.com/agutiernc/data-eng-zoomcamp

apache-kafka apache-spark ci-cd data-ingestion data-warehouse dbt dlt docker etl-pipeline google-cloud-platform jupyter-notebook mage-ai pandas pipelines postgresql pyarrow python sql terraform

# Data Engineering Zoomcamp 2024

This repository contains the projects, assignments, and code I worked on during the *Data Engineering Zoomcamp* offered by **[DataTalks.Club](https://github.com/DataTalksClub/data-engineering-zoomcamp)**. The Zoomcamp is an online program that teaches the core skills and knowledge needed for a career in data engineering.

## Course Overview

The Data Engineering Zoomcamp covers data modeling, data pipelines, batch and stream processing, data warehousing, and the tooling around them. Throughout the course, I gained hands-on experience with industry-standard tools such as Apache Spark, Apache Kafka, Docker, Mage, PostgreSQL, Redpanda, dbt Cloud, dlt, and more.

## Repository Structure

The repository is organized into folders, each corresponding to a module or workshop covered in the Zoomcamp:

- `module 1`: Containerization and Infrastructure as Code
- `module 2`: Workflow Orchestration with **[Mage](https://github.com/mage-ai/mage-ai)**
- `module 3`: Data Warehouse and Big Data
- `module 4`: Analytics Engineering with **[dbt](https://www.getdbt.com/)**
- `module 5`: Batch Processing with Apache Spark
- `module 6`: Streaming Data with Apache Spark, Apache Kafka, and **[Redpanda](https://redpanda.com)**
- `Workshop 1`: Data Ingestion with **[dlt](https://github.com/dlt-hub/dlt)**
- `Workshop 2`: Stream Processing with **[RisingWave](https://github.com/risingwavelabs/risingwave)**

Each module folder contains the corresponding assignments, code examples, and documentation related to the respective topic.

## Learning Outcomes

Through this course, I have acquired a comprehensive understanding of Data Engineering principles and best practices. Some key areas of learning include:

- Data modeling techniques
- Building robust and scalable data pipelines
- Batch and stream processing with Apache Spark and Apache Kafka
- Data warehousing concepts and implementation with cloud-based solutions
- Containerization with Docker, relational storage with PostgreSQL, and workflow orchestration with Mage
- Proficiency in SQL, Python, and related data engineering tools
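
To make the pipeline item above a little more concrete, here is a minimal sketch of a chunked extract-transform-load step. It is an illustration only, not code from this repository: it uses just the Python standard library (`csv`, `sqlite3`) instead of the course tools, and the `trips` table, its column names, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice

# Hypothetical sample data standing in for a real trip-record CSV file.
SAMPLE_CSV = """trip_id,passenger_count,fare_amount
1,1,12.50
2,0,7.00
3,2,30.25
4,3,
5,1,9.75
"""


def read_chunks(reader, size):
    """Yield lists of at most `size` rows so the whole file never sits in memory."""
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk


def transform(rows):
    """Drop rows with a missing fare or zero passengers, and cast column types."""
    cleaned = []
    for row in rows:
        if not row["fare_amount"] or int(row["passenger_count"]) == 0:
            continue
        cleaned.append(
            (int(row["trip_id"]), int(row["passenger_count"]), float(row["fare_amount"]))
        )
    return cleaned


def run_pipeline(source, conn, chunk_size=2):
    """Extract rows from `source`, clean them, and load them chunk by chunk."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips ("
        "trip_id INTEGER PRIMARY KEY, passenger_count INTEGER, fare_amount REAL)"
    )
    reader = csv.DictReader(source)
    loaded = 0
    for chunk in read_chunks(reader, chunk_size):
        rows = transform(chunk)
        with conn:  # each chunk is committed as one transaction
            # INSERT OR REPLACE keeps the load idempotent if a chunk is retried.
            conn.executemany("INSERT OR REPLACE INTO trips VALUES (?, ?, ?)", rows)
        loaded += len(rows)
    return loaded


conn = sqlite3.connect(":memory:")
loaded = run_pipeline(io.StringIO(SAMPLE_CSV), conn)
total_fare = conn.execute("SELECT SUM(fare_amount) FROM trips").fetchone()[0]
print(loaded, total_fare)
```

Loading in small chunks with one transaction per chunk keeps memory use bounded and lets a failed chunk be retried without duplicating rows, which is the same idea the larger tools in the course apply at scale.
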

This repository serves as a showcase of my work and demonstrates my proficiency in various Data Engineering tools and technologies.

## Contact

If you have any questions or would like to discuss my work further, please feel free to reach out to me on [LinkedIn](https://www.linkedin.com/in/alfonso-gutierrez01).