{"id":30067571,"url":"https://github.com/djanmagno/udacity-data-engineer-nanodegree","last_synced_at":"2026-04-16T05:01:43.655Z","repository":{"id":308731927,"uuid":"425796962","full_name":"djanmagno/Udacity-Data-Engineer-Nanodegree","owner":"djanmagno","description":"Repository containing the notebooks used on classes and projects done from the Udacity Data Engineer Nanodegree. ","archived":false,"fork":false,"pushed_at":"2021-11-11T21:56:16.000Z","size":4457,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-07T15:50:46.524Z","etag":null,"topics":["airflow","apache-cassandra","data-engineering","data-model","data-warehouse","etl-pipeline","postgresql","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/djanmagno.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-11-08T10:41:52.000Z","updated_at":"2021-11-11T22:28:57.000Z","dependencies_parsed_at":"2025-08-07T16:01:40.834Z","dependency_job_id":null,"html_url":"https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree","commit_stats":null,"previous_names":["djanmagno/udacity-data-engineer-nanodegree"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/djanmagno/Udacity-Data-Engineer-Nanodegree","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djanmagno%2FUdacity-Data-Engineer-Nanodegree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/djanmagno%2FUdacity-Data-Engineer-Nanodegree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djanmagno%2FUdacity-Data-Engineer-Nanodegree/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djanmagno%2FUdacity-Data-Engineer-Nanodegree/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/djanmagno","download_url":"https://codeload.github.com/djanmagno/Udacity-Data-Engineer-Nanodegree/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djanmagno%2FUdacity-Data-Engineer-Nanodegree/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31872036,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"online","status_checked_at":"2026-04-16T02:00:06.042Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","apache-cassandra","data-engineering","data-model","data-warehouse","etl-pipeline","postgresql","python"],"created_at":"2025-08-08T09:01:53.057Z","updated_at":"2026-04-16T05:01:43.602Z","avatar_url":"https://github.com/djanmagno.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- Add banner here --\u003e\n![Banner](images/banner-Udacity-Data-Engineer.png)\n\n## Project Title\n\u003cbr /\u003e\n\n\u003cp align=\"center\"\u003e\n 
\u003c/a\u003e\n \u003ch1 align=\"center\"\u003eUdacity Data Engineering Nanodegree\u003c/h1\u003e\n \u003cp align=\"center\"\u003e\n  Udacity Nanodegree\n  \u003cbr /\u003e\n  \u003ca href=https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree\u003e\u003cstrong\u003eExplore the repository»\u003c/strong\u003e\u003c/a\u003e\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n \u003c/p\u003e\n\n\u003c/p\u003e\n\n\u003c!-- Add buttons here --\u003e\n[![Language](https://img.shields.io/badge/Python-3.9%2B-brightgreen?style=flat\u0026logo=Python)](https://www.python.org/downloads/release/python-365/) ![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/djanmagno/Udacity-Data-Engineer-Nanodegree?color=red\u0026include_prereleases)\n![GitHub last commit](https://img.shields.io/github/last-commit/djanmagno/Udacity-Data-Engineer-Nanodegree?color=yellow)\n![GitHub issues](https://img.shields.io/github/issues-raw/djanmagno/Udacity-Data-Engineer-Nanodegree?color=orange)\n![GitHub pull requests](https://img.shields.io/github/issues-pr/djanmagno/Udacity-Data-Engineer-Nanodegree?color=blueviolet)\n![GitHub](https://img.shields.io/github/license/djanmagno/Udacity-Data-Engineer-Nanodegree?color=yellowgreen)\n[![Linkedin](https://img.shields.io/badge/Linkedin-blue?style=flat\u0026logo=Linkedin)](https://www.linkedin.com/in/djanmagno)\n\n\u003e Postgres, Cassandra, AWS, RedShift, S3, EMR, Spark, Airflow, ETL, ELT, Data Modelling, Database Schema, Data Warehousing, Data Lakes, Data Engineering, Udacity\n\n## About The Nanodegree\n\nThe data engineering field is expected to continue growing rapidly over the next several years, and there’s huge demand for data engineers across industries. This Data Engineer Nanodegree program is comprised of content and curriculum to support six (6) projects. 
The program is estimated to take five (5) months to complete at 10 hours per week.\n\nEach project is reviewed by the Udacity reviewer network, which provides feedback. A student who does not pass a project is asked to resubmit it until it passes.\n\nThe objective is to learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets.\n\nAt the end of the program, the student will combine the newly acquired skills by completing a capstone project.\n\nEducational Objectives:\n* Create user-friendly relational and NoSQL data models\n* Create scalable and efficient data warehouses\n* Work efficiently with massive datasets\n* Build and interact with a cloud-based data lake\n* Automate and monitor data pipelines\n* Develop proficiency in Spark, Airflow, and AWS tools\n\n## Certificate\n\n\u003c!-- \u003cimg src=\"###\" /\u003e --\u003e\n\nTO BE ATTACHED!\n\n## **Program Details**\n\nDuring this program, the student will complete four courses and five projects. Throughout the projects, he will play the part of a data engineer at a music streaming company. He will work with the same type of data in each project, but with increasing data volume, velocity, and complexity. Below you can find a course-by-course breakdown.\n\nAssociated notebooks for this course can be found [here](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree/tree/master/Notebook-Exercises).\n\n#### **Course 1 – Data Modeling**\n\nIn this course, the student will learn to meet the diverse needs of data consumers, understand the differences between data models, and choose the appropriate data model for a given situation. He will also build fluency in PostgreSQL and Apache Cassandra.\n\n**Project 01 - Data Modeling with Postgres**\n\nIn this project, the student will model user activity data for a music streaming app called Sparkify. 
He will create a relational database and ETL pipeline designed to optimize queries on which songs users are listening to. In PostgreSQL, he will also define fact and dimension tables and insert data into them.\n\n* Link for Project 01 - [Link](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree/tree/master/Project-1-Data-Modeling-with-Postgres)\n\n**Project 02 - Data Modeling with Apache Cassandra**\n\nIn this project, the student will model user activity data for a music streaming app called Sparkify. He will create a database and ETL pipeline in Apache Cassandra, modeling the data so that he can run specific queries provided by the analytics team at Sparkify.\n\n* Link for Project 02 - [Link](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree/tree/master/Project-2-Data-Modeling-with-Apache-Cassandra)\n\n\u003c!-- #### **Course 2 – Cloud Data Warehouses**\n\nIn this course, we will learn to create cloud-based data warehouses. In the project, we will build an ELT pipeline that extracts data from Amazon S3, stages it in Amazon Redshift, and transforms it into a set of dimensional tables.\n\nAssociated notebooks for this course can be found [here](###). \n\nProject 3 can be found [here](###). \n\n#### **Course 3 – Data Lakes with Apache Spark**\n\nIn this course, we will learn more about the big data ecosystem, how to work with massive datasets with Apache Spark, and how to store big data in a data lake. In the project, we will build an ETL pipeline for a data lake using Apache Spark and S3.\n\nAssociated notebooks for this course can be found [here](###).\n\nProject 4 can be found [here](###). \n\n#### **Course 4 – Data Pipelines with Apache Airflow**\n\nIn this course, we will learn to schedule, automate, and monitor data pipelines using Apache Airflow. In the project, we will continue our work on the music streaming company’s data infrastructure by creating and automating a set of data pipelines. 
\n\nAssociated notebooks for this course can be found [here](###).\n\nProject 5 can be found [here](###). \n\n#### **Capstone Project**\n\nUndecided project.\n\nCapstone Project can be found [here](###). --\u003e\n\n \n\n\u003c!-- LICENSE --\u003e\n\n## License\n\n[(Back to top)](#table-of-contents)\n\nDistributed under the MIT License. See `LICENSE` for more information.\n\n[MIT License](https://opensource.org/licenses/MIT)\n\n\u003c!-- CONTACT --\u003e\n\n## Contact\n\nDjan Magno - djan.magno@gmail.com\n\nProject Link - [https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree)\n\n\n## Footer\n[(Back to top)](#table-of-contents)\n\n\u003c!-- Let's also add a footer because I love footers and also you **can** use this to convey important info.\nLet's make it an image because by now you have realised that multimedia in images == cool(*please notice the subtle programming joke). --\u003e\n\nLeave a star in GitHub, give a clap in Medium and share this guide if you found this helpful.\n\n\u003c!-- Add the footer here --\u003e\n\n![Footer](images/footer.png)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdjanmagno%2Fudacity-data-engineer-nanodegree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdjanmagno%2Fudacity-data-engineer-nanodegree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdjanmagno%2Fudacity-data-engineer-nanodegree/lists"}