https://github.com/agutiernc/data-eng-zoomcamp
Data Engineering Zoomcamp 2024
Data Engineering Zoomcamp 2024
- Host: GitHub
- URL: https://github.com/agutiernc/data-eng-zoomcamp
- Owner: agutiernc
- Created: 2024-01-27T09:57:58.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-04-08T02:25:19.000Z (7 months ago)
- Last Synced: 2024-04-08T03:31:32.849Z (7 months ago)
- Topics: apache-kafka, apache-spark, ci-cd, data-ingestion, data-warehouse, dbt, dlt, docker, etl-pipeline, google-cloud-platform, jupyter-notebook, mage-ai, pandas, pipelines, postgresql, pyarrow, python, sql, terraform
- Language: Jupyter Notebook
- Homepage:
- Size: 1 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Engineering Zoomcamp 2024
This repository contains the projects, assignments, and code I've worked on as part of the *Data Engineering Zoomcamp* offered by **[DataTalks.Club](https://github.com/DataTalksClub/data-engineering-zoomcamp)**. The Zoomcamp is an online program that teaches the core skills and knowledge needed for a career in data engineering.
## Course Overview
The Data Engineering Zoomcamp covers data modeling, data pipelines, batch and stream processing, data warehousing, and a range of data engineering tools and technologies. Throughout the course, I've gained hands-on experience with industry-standard tools such as Apache Spark, Apache Kafka, Docker, Mage, PostgreSQL, Redpanda, dbt Cloud, dlt, and more.
## Repository Structure
The repository is organized into folders, each representing a module or workshop covered in the Zoomcamp:
- `module 1`: Containerization and Infrastructure as Code
- `module 2`: Workflow Orchestration with **[Mage](https://github.com/mage-ai/mage-ai)**
- `module 3`: Data Warehouse and Big Data
- `module 4`: Analytics Engineering with **[dbt](https://www.getdbt.com/)**
- `module 5`: Batch Processing with Apache Spark
- `module 6`: Streaming Data with Apache Spark, Apache Kafka, and [Redpanda](https://redpanda.com)
- `Workshop 1`: Data Ingestion with **[dlt](https://github.com/dlt-hub/dlt)**
- `Workshop 2`: Stream processing with **[RisingWave](https://github.com/risingwavelabs/risingwave)**

Each module folder contains the corresponding assignments, code examples, and documentation related to the respective topic.
## Learning Outcomes
Through this course, I have acquired a comprehensive understanding of Data Engineering principles and best practices. Some key areas of learning include:
- Data modeling techniques
- Building robust and scalable data pipelines
- Batch and stream processing with Apache Spark and Apache Kafka
- Data warehousing concepts and implementation with cloud-based solutions
- Containerization with Docker and workflow orchestration with Mage, backed by PostgreSQL
- Proficiency in SQL, Python, and other Data Engineering-related tools

This repository serves as a showcase of my work and demonstrates my proficiency in various Data Engineering tools and technologies.
## Contact
If you have any questions or would like to discuss my work further, please feel free to reach out to me on [LinkedIn](https://www.linkedin.com/in/alfonso-gutierrez01).