
https://github.com/djanmagno/udacity-data-engineer-nanodegree

Repository containing the notebooks used in the classes and the projects completed for the Udacity Data Engineer Nanodegree.

airflow apache-cassandra data-engineering data-model data-warehouse etl-pipeline postgresql python





![Banner](images/banner-Udacity-Data-Engineer.png)

## Project Title



Udacity Data Engineering Nanodegree



Udacity Nanodegree






[![Language](https://img.shields.io/badge/Python-3.9%2B-brightgreen?style=flat&logo=Python)](https://www.python.org/downloads/release/python-365/) ![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/djanmagno/Udacity-Data-Engineer-Nanodegree?color=red&include_prereleases)
![GitHub last commit](https://img.shields.io/github/last-commit/djanmagno/Udacity-Data-Engineer-Nanodegree?color=yellow)
![GitHub issues](https://img.shields.io/github/issues-raw/djanmagno/Udacity-Data-Engineer-Nanodegree?color=orange)
![GitHub pull requests](https://img.shields.io/github/issues-pr/djanmagno/Udacity-Data-Engineer-Nanodegree?color=blueviolet)
![GitHub](https://img.shields.io/github/license/djanmagno/Udacity-Data-Engineer-Nanodegree?color=yellowgreen)
[![Linkedin](https://img.shields.io/badge/Linkedin-blue?style=flat&logo=Linkedin)](https://www.linkedin.com/in/djanmagno)

> Postgres, Cassandra, AWS, Redshift, S3, EMR, Spark, Airflow, ETL, ELT, Data Modelling, Database Schema, Data Warehousing, Data Lakes, Data Engineering, Udacity

## About The Nanodegree

The data engineering field is expected to continue growing rapidly over the next several years, and there’s huge demand for data engineers across industries. This Data Engineer Nanodegree program comprises content and curriculum supporting six (6) projects. The program is estimated to take five (5) months to complete, working 10 hours per week.

Each project is reviewed by the Udacity reviewer network, which provides feedback. A student who does not pass a project is asked to resubmit it until it passes.

The objective is to learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets.

At the end of the program, the student will combine the newly acquired skills by completing a capstone project.

Educational Objectives:
* Create user-friendly relational and NoSQL data models
* Create scalable and efficient data warehouses
* Work efficiently with massive datasets
* Build and interact with a cloud-based data lake
* Automate and monitor data pipelines
* Develop proficiency in Spark, Airflow, and AWS tools

## Certificate

TO BE ATTACHED!

## **Program Details**

During this program, the student will complete four courses and five projects, followed by the capstone project. Throughout the projects, he will play the role of a data engineer at a music streaming company. He will work with the same type of data in each project, but with increasing data volume, velocity, and complexity. Below is a course-by-course breakdown.

Associated notebooks for the courses can be found [here](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree/tree/master/Notebook-Exercises).

#### **Course 1 – Data Modeling**

In this course, the student will learn to meet the diverse needs of data consumers, understand the differences between data models, and choose the appropriate data model for a given situation. He will also build fluency in PostgreSQL and Apache Cassandra.

**Project 01 - Data Modeling with Postgres**

In this project, the student will model user activity data for a music streaming app called Sparkify. He will create a relational database and an ETL pipeline designed to optimize queries for understanding which songs users are listening to. In PostgreSQL, he will also define fact and dimension tables and insert data into them.

* Link for Project 01 - [Link](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree/tree/master/Project-1-Data-Modeling-with-Postgres)
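To make the fact/dimension idea concrete, here is a minimal sketch of a star schema with a small ETL step and one analytic query. It is not the project's code: it assumes Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a database server, and the table names (`songplays`, `users`, `songs`), columns, and sample records are hypothetical.

```python
import sqlite3

# Hypothetical star schema: a fact table (songplays) whose rows reference
# dimension tables (users, songs). sqlite3 stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    first_name TEXT,
    level TEXT
);
CREATE TABLE songs (
    song_id TEXT PRIMARY KEY,
    title TEXT,
    duration REAL
);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
    start_time INTEGER NOT NULL,
    user_id INTEGER REFERENCES users (user_id),
    song_id TEXT REFERENCES songs (song_id)
);
"""


def run_etl(conn, song_records, log_records):
    """Load the dimension rows, then one fact row per listening event."""
    cur = conn.cursor()
    # INSERT OR IGNORE skips rows whose primary key already exists.
    cur.executemany(
        "INSERT OR IGNORE INTO songs (song_id, title, duration) VALUES (?, ?, ?)",
        [(r["song_id"], r["title"], r["duration"]) for r in song_records],
    )
    for event in log_records:
        cur.execute(
            "INSERT OR IGNORE INTO users (user_id, first_name, level) VALUES (?, ?, ?)",
            (event["user_id"], event["first_name"], event["level"]),
        )
        # Resolve the song dimension key by title; None when the song is unknown.
        match = cur.execute(
            "SELECT song_id FROM songs WHERE title = ?", (event["song"],)
        ).fetchone()
        cur.execute(
            "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
            (event["ts"], event["user_id"], match[0] if match else None),
        )
    conn.commit()


def top_songs(conn, limit=5):
    """Answer "what are users listening to?" by joining fact to dimension."""
    return conn.execute(
        """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays AS sp
        JOIN songs AS s ON sp.song_id = s.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    songs = [{"song_id": "S1", "title": "Song A", "duration": 200.0}]
    logs = [  # hypothetical listening events
        {"ts": 1, "user_id": 10, "first_name": "Ana", "level": "free", "song": "Song A"},
        {"ts": 2, "user_id": 10, "first_name": "Ana", "level": "free", "song": "Song A"},
        {"ts": 3, "user_id": 11, "first_name": "Bo", "level": "paid", "song": "Unknown"},
    ]
    run_etl(conn, songs, logs)
    print(top_songs(conn))  # [('Song A', 2)]
```

Loading the dimension tables before the fact table lets each fact row point at existing dimension keys. In PostgreSQL, the `INSERT OR IGNORE` statements would typically become `INSERT ... ON CONFLICT DO NOTHING`.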

**Project 02 - Data Modeling with Apache Cassandra**

In this project, the student will again model user activity data for the Sparkify music streaming app, this time in Apache Cassandra. He will create a database and an ETL pipeline, modeling the data so that he can run the specific queries provided by the Sparkify analytics team.

* Link for Project 02 - [Link](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree/tree/master/Project-2-Data-Modeling-with-Apache-Cassandra)
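Cassandra modeling starts from the queries rather than from the entities: each table's primary key is chosen so that a query reads a single partition. The sketch below illustrates this with a hypothetical `songs_by_session` table. The CQL string is shown for reference only and is not executed; the `QueryTable` class is a toy in-memory model (an assumption, not the `cassandra-driver` API) of how a partition key groups rows and a clustering column orders them within a partition.

```python
from collections import defaultdict

# Hypothetical CQL for a table designed around one query:
# "which songs were played in session X, in order?" (shown only, not executed).
CREATE_SONGS_BY_SESSION = """
CREATE TABLE IF NOT EXISTS songs_by_session (
    session_id int,
    item_in_session int,
    artist text,
    song text,
    length float,
    PRIMARY KEY (session_id, item_in_session)
);
"""


class QueryTable:
    """Toy in-memory model of a Cassandra table with one partition key
    and one clustering column (no replication, no driver involved)."""

    def __init__(self, partition_key, clustering_key):
        self.partition_key = partition_key
        self.clustering_key = clustering_key
        # partition value -> {clustering value -> row}
        self._partitions = defaultdict(dict)

    def insert(self, row):
        # Cassandra writes are upserts: the same primary key overwrites the row.
        partition = self._partitions[row[self.partition_key]]
        partition[row[self.clustering_key]] = row

    def select(self, partition_value, clustering_value=None):
        # Every lookup names the partition key, so it touches a single partition.
        partition = self._partitions.get(partition_value, {})
        if clustering_value is not None:
            row = partition.get(clustering_value)
            return [row] if row is not None else []
        # Rows come back ordered by the clustering column (ascending).
        return [partition[key] for key in sorted(partition)]


if __name__ == "__main__":
    table = QueryTable("session_id", "item_in_session")
    events = [  # hypothetical sample events
        {"session_id": 1, "item_in_session": 2, "artist": "B", "song": "Y", "length": 180.0},
        {"session_id": 1, "item_in_session": 0, "artist": "A", "song": "X", "length": 200.0},
        {"session_id": 2, "item_in_session": 0, "artist": "C", "song": "Z", "length": 150.0},
    ]
    for event in events:
        table.insert(event)
    print([r["song"] for r in table.select(1)])  # ['X', 'Y']
    print(table.select(1, 2)[0]["artist"])  # B
```

A different question, such as listing the songs each user played, would get its own table with a different primary key, duplicating data on purpose. That denormalization is the usual trade-off of query-first modeling in Cassandra.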

## License


Distributed under the MIT License. See `LICENSE` for more information.

[MIT License](https://opensource.org/licenses/MIT)

## Contact

Djan Magno - djan.magno@gmail.com

Project Link - [https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree](https://github.com/djanmagno/Udacity-Data-Engineer-Nanodegree)

## Footer

Leave a star on GitHub, give a clap on Medium, and share this guide if you found it helpful.

![Footer](images/footer.png)