Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ndomah/data-engineering
Links to data engineering projects and learning materials.
- Host: GitHub
- URL: https://github.com/ndomah/data-engineering
- Owner: ndomah
- Created: 2025-02-14T20:17:47.000Z (about 21 hours ago)
- Default Branch: main
- Last Pushed: 2025-02-14T20:36:15.000Z (about 21 hours ago)
- Last Synced: 2025-02-14T21:28:13.651Z (about 20 hours ago)
- Topics: airflow, aws, azure, cassandra, data-engineering, databricks, elt, etl, kafka, pipelines, snowflake
- Homepage:
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Engineering
This README contains links to my data engineering portfolio projects and learning materials.

## Projects
[**AWS YouTube Data Analysis**](https://github.com/ndomah/AWS-YouTube-Data-Analysis)
- Tools Used: Python, SQL, AWS, Lambda, Athena, S3, IAM, Glue, QuickSight
- Analyzed YouTube trending video data using AWS services to build a scalable pipeline for data ingestion, ETL, and storage in a centralized data lake. Created QuickSight dashboards highlighting video views by country, category, and region. Workflow included ingestion, preprocessing, cataloging, and analysis.
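  The project's own code is not reproduced here, but a minimal sketch of the ad-hoc query step might look like the following, assuming AWS credentials are configured and using hypothetical Glue database, table, and bucket names (`de_youtube_analytics`, `cleaned_statistics`, `my-athena-results-bucket`):

  ```python
  import time

  import boto3  # AWS SDK for Python

  # Hypothetical names -- substitute your own Glue database and S3 results bucket.
  DATABASE = "de_youtube_analytics"
  OUTPUT_LOCATION = "s3://my-athena-results-bucket/queries/"

  athena = boto3.client("athena", region_name="us-east-1")


  def run_athena_query(sql: str) -> list:
      """Submit a query to Athena and poll until it reaches a terminal state."""
      execution = athena.start_query_execution(
          QueryString=sql,
          QueryExecutionContext={"Database": DATABASE},
          ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
      )
      query_id = execution["QueryExecutionId"]

      # Poll the execution status until Athena reports success, failure, or cancellation.
      while True:
          status = athena.get_query_execution(QueryExecutionId=query_id)
          state = status["QueryExecution"]["Status"]["State"]
          if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
              break
          time.sleep(2)

      if state != "SUCCEEDED":
          raise RuntimeError(f"Athena query ended in state {state}")
      return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


  if __name__ == "__main__":
      rows = run_athena_query(
          "SELECT region, category_id, SUM(views) AS total_views "
          "FROM cleaned_statistics GROUP BY region, category_id "
          "ORDER BY total_views DESC LIMIT 10"
      )
      for row in rows:
          print([col.get("VarCharValue") for col in row["Data"]])
  ```

  Because Athena queries the Glue-cataloged files directly in S3, no separate warehouse needs to be provisioned for this kind of analysis; QuickSight then connects to Athena as its data source.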
[**Real-Time Data Streaming of Random User Data**](https://github.com/ndomah/Realtime-Data-Streaming-of-Random-User-Data)
- Tools Used: Python, PostgreSQL, Docker, Airflow, Kafka, Spark, Cassandra, Zookeeper
- Built a robust, scalable, and fault-tolerant pipeline using a modern tech stack. The pipeline ingests, processes, and stores random user-generated data from an API.
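  As an illustration of the ingestion side, a minimal Kafka producer could look like the sketch below. The broker address, topic name, and field selection are assumptions, and the public randomuser.me API stands in for the data source:

  ```python
  import json
  import time

  import requests
  from kafka import KafkaProducer  # pip install kafka-python

  # Assumed local setup: a Kafka broker from a Docker stack and a hypothetical topic name.
  BROKER = "localhost:9092"
  TOPIC = "users_created"

  producer = KafkaProducer(
      bootstrap_servers=[BROKER],
      value_serializer=lambda payload: json.dumps(payload).encode("utf-8"),
  )


  def fetch_random_user() -> dict:
      """Pull one profile from the public randomuser.me API and keep a few fields."""
      raw = requests.get("https://randomuser.me/api/", timeout=10).json()["results"][0]
      return {
          "first_name": raw["name"]["first"],
          "last_name": raw["name"]["last"],
          "email": raw["email"],
          "country": raw["location"]["country"],
      }


  if __name__ == "__main__":
      # Publish one profile per second; a stream processor such as Spark
      # Structured Streaming would consume the topic downstream.
      while True:
          producer.send(TOPIC, fetch_random_user())
          producer.flush()
          time.sleep(1)
  ```

  In a stack like this one, Airflow typically schedules the producer, Spark consumes the Kafka topic, and Cassandra stores the processed records.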
[**Azure Medallion Architecture Pipeline**](https://github.com/ndomah/Azure-Medallion-Pipeline)
- Tools Used: Python, SQL, Azure, dbt, Databricks
- Implemented a complete data engineering pipeline using the Medallion Architecture (Bronze, Silver, and Gold layers) within Azure Databricks. It integrates several Azure services and dbt (Data Build Tool) to orchestrate data ingestion, transformation, and storage, ensuring a robust, scalable, and secure solution.
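  A Bronze-to-Silver step in a Medallion setup of this kind could be sketched in PySpark as follows; the ADLS paths, table, and column names are placeholders rather than the project's actual schema:

  ```python
  from pyspark.sql import SparkSession, functions as F

  # Illustrative paths -- the real ADLS containers come from the Azure setup.
  BRONZE_PATH = "abfss://bronze@mydatalake.dfs.core.windows.net/sales"
  SILVER_PATH = "abfss://silver@mydatalake.dfs.core.windows.net/sales"

  spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

  # Bronze layer: raw data landed as-is from the source system.
  bronze_df = spark.read.format("delta").load(BRONZE_PATH)

  # Silver layer: deduplicated, typed, and lightly cleaned records.
  silver_df = (
      bronze_df
      .dropDuplicates(["order_id"])
      .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
      .filter(F.col("amount") > 0)
  )

  # Overwrite the Silver layer; Gold aggregations are built on top of this
  # table, typically via dbt models.
  silver_df.write.format("delta").mode("overwrite").save(SILVER_PATH)
  ```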
[**ELT Pipeline**](https://github.com/ndomah/ELT-Pipeline)
- Tools Used: Python, SQL, Airflow, Snowflake, dbt
- Built a simple ELT pipeline using dbt (Data Build Tool) to transform data in Snowflake, with orchestration managed by Apache Airflow. This setup showcases a modern data engineering workflow, essential for handling large-scale data transformations efficiently.
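  A minimal Airflow DAG for this kind of dbt-on-Snowflake workflow might look as follows; the DAG id and the dbt project path are placeholders:

  ```python
  from datetime import datetime

  from airflow import DAG
  from airflow.operators.bash import BashOperator

  # Placeholder path -- point this at your dbt project and profiles directory.
  DBT_PROJECT_DIR = "/opt/airflow/dbt/snowflake_elt"

  with DAG(
      dag_id="elt_dbt_snowflake",
      start_date=datetime(2025, 1, 1),
      schedule_interval="@daily",
      catchup=False,
  ) as dag:
      # Build the dbt models that transform raw tables inside Snowflake.
      dbt_run = BashOperator(
          task_id="dbt_run",
          bash_command=f"dbt run --project-dir {DBT_PROJECT_DIR} --profiles-dir {DBT_PROJECT_DIR}",
      )

      # Run dbt tests after the models build successfully.
      dbt_test = BashOperator(
          task_id="dbt_test",
          bash_command=f"dbt test --project-dir {DBT_PROJECT_DIR} --profiles-dir {DBT_PROJECT_DIR}",
      )

      dbt_run >> dbt_test
  ```

  Keeping the transformation logic in dbt and using Airflow only for scheduling keeps the SQL models version-controlled and testable independently of the orchestrator.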
## Learning Materials

[**The Data Engineering Academy**](https://github.com/ndomah/The-Data-Engineering-Academy)

[**Data Engineering Zoomcamp**]()