{"id":15056729,"url":"https://github.com/nareshk1290/udacity-data-engineering","last_synced_at":"2025-04-10T01:15:34.631Z","repository":{"id":41127110,"uuid":"181604841","full_name":"nareshk1290/Udacity-Data-Engineering","owner":"nareshk1290","description":"Udacity Data Engineering Nano Degree (DEND)","archived":false,"fork":false,"pushed_at":"2020-01-20T22:24:18.000Z","size":1812,"stargazers_count":184,"open_issues_count":1,"forks_count":168,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-10T01:15:28.539Z","etag":null,"topics":["airflow","aws","cassandra","etl","postgresql","redshift","s3","spark","star-schema","udacity-dend"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nareshk1290.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-16T03:05:07.000Z","updated_at":"2025-02-23T13:47:33.000Z","dependencies_parsed_at":"2022-07-21T04:49:05.850Z","dependency_job_id":null,"html_url":"https://github.com/nareshk1290/Udacity-Data-Engineering","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nareshk1290%2FUdacity-Data-Engineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nareshk1290%2FUdacity-Data-Engineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nareshk1290%2FUdacity-Data-Engineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nareshk1290%2FUdacity-Data-Engineering/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/nareshk1290","download_url":"https://codeload.github.com/nareshk1290/Udacity-Data-Engineering/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137891,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","aws","cassandra","etl","postgresql","redshift","s3","spark","star-schema","udacity-dend"],"created_at":"2024-09-24T21:55:53.105Z","updated_at":"2025-04-10T01:15:34.612Z","avatar_url":"https://github.com/nareshk1290.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Engineering Nanodegree\n\nProjects and resources developed in the [DEND Nanodegree](https://www.udacity.com/course/data-engineer-nanodegree--nd027) from Udacity.\n\n## Project 1: [Relational Databases - Data Modeling with PostgreSQL](https://github.com/nareshk1290/Udacity-Data-Engineering/tree/master/Data-Modeling/Project%201).\nDeveloped a relational database using PostgreSQL to model user activity data for a music streaming app. Skills include:\n* Created a relational database using PostgreSQL\n* Developed a Star Schema database using optimized definitions of Fact and Dimension tables. 
Normalized the tables.\n* Built out an ETL pipeline to optimize queries in order to understand what songs users listen to.\n\nProficiencies include: Python, PostgreSQL, Star Schema, ETL pipelines, Normalization\n\n\n## Project 2: [NoSQL Databases - Data Modeling with Apache Cassandra](https://github.com/nareshk1290/Udacity-Data-Engineering/tree/master/Data-Modeling/Project%202).\nDesigned a NoSQL database using Apache Cassandra based on the original schema outlined in Project 1. Skills include:\n* Created a NoSQL database using Apache Cassandra (both locally and with Docker containers)\n* Developed denormalized tables optimized for a specific set of queries and business needs\n\nProficiencies used: Python, Apache Cassandra, Denormalization\n\n\n## Project 3: [Data Warehouse - Amazon Redshift](https://github.com/nareshk1290/Udacity-Data-Engineering/tree/master/Cloud%20Data%20Warehouse/Project%20Data%20Warehouse%20with%20AWS).\nCreated a data warehouse using Amazon Redshift. Skills include:\n* Created a Redshift cluster, IAM roles, and security groups.\n* Developed an ETL pipeline that copies data from S3 buckets into staging tables to be processed into a star schema.\n* Developed a star schema optimized for the specific queries required by the data analytics team.\n\nProficiencies used: Python, Amazon Redshift, AWS CLI, AWS SDK, SQL, PostgreSQL\n\n## Project 4: [Data Lake - Spark](https://github.com/nareshk1290/Udacity-Data-Engineering/tree/master/Data%20Lakes%20with%20Spark/Project%20Data%20Lake%20with%20Spark)\nScaled up the current ETL pipeline by moving the data warehouse to a data lake. 
Skills include:\n* Created an EMR Hadoop cluster\n* Extended the ETL pipeline to copy datasets from S3 buckets, process the data with Spark, and write the results back to S3 buckets using efficient partitioning and Parquet formatting.\n* Fast-tracked the data lake buildout using (serverless) AWS Lambda and cataloged tables with AWS Glue Crawler.\n\nTechnologies used: Spark, S3, EMR, Athena, AWS Glue, Parquet.\n\n## Project 5: [Data Pipelines - Airflow](https://github.com/nareshk1290/Udacity-Data-Engineering/tree/master/Data%20Pipeline%20with%20Airflow/Project%20Data%20Pipeline%20with%20Airflow)\nAutomated the ETL pipeline and the creation of the data warehouse using Apache Airflow. Skills include:\n* Automated ETL pipelines using Airflow, Python, and Amazon Redshift.\n* Wrote custom operators to perform tasks such as staging data, filling the data warehouse, and validating data through quality checks.\n* Transformed data from various sources into a star schema optimized for the analytics team's use cases.\n\nTechnologies used: Apache Airflow, S3, Amazon Redshift, Python.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnareshk1290%2Fudacity-data-engineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnareshk1290%2Fudacity-data-engineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnareshk1290%2Fudacity-data-engineering/lists"}