{"id":15060607,"url":"https://github.com/smohanta23/uber_data-engineering_etl-project","last_synced_at":"2026-01-01T23:45:11.267Z","repository":{"id":253208603,"uuid":"842812356","full_name":"Smohanta23/Uber_Data-Engineering_ETL-Project","owner":"Smohanta23","description":"This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.","archived":false,"fork":false,"pushed_at":"2024-08-15T08:06:30.000Z","size":20563,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-21T22:09:56.783Z","etag":null,"topics":["big-data-analytics","bigquery","cloudcomputing","computeengine","dashboard-application","dataengineering","datainsights","datamodelling","datapipeline","datascience","datavisualization","etl-pipeline","gcp-project","googlecloudplatform","mage","opensource","python","uber","uber-api"],"latest_commit_sha":null,"homepage":"https://lookerstudio.google.com/u/0/reporting/f383c480-7dab-461f-b426-8bbb1df2f13d/page/8pqOD?s=nQI06ax2wMY","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Smohanta23.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-15T06:21:13.000Z","updated_at":"2024-08-15T08:06:33.000Z","dependencies_parsed_at":"2024-08-15T08:12:50.226Z","dependency_job_id":null,"html_url":"https://github.com/Smohanta23/Uber_Data-Engineering_ETL-Project","commit_stats":null,"previous_names":["smohanta23/uber_data-engineering_etl-project"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Smohanta23%2FUber_Data-Engineering_ETL-Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Smohanta23%2FUber_Data-Engineering_ETL-Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Smohanta23%2FUber_Data-Engineering_ETL-Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Smohanta23%2FUber_Data-Engineering_ETL-Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Smohanta23","download_url":"https://codeload.github.com/Smohanta23/Uber_Data-Engineering_ETL-Project/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243695589,"owners_count":20332629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data-analytics","bigquery","cloudcomputing","computeengine","dashboard-application","dataengineering","datainsights","datamodelling","datapipeline","datascience","datavisualization","etl-pipeline","gcp-project","googlecloudplatform","mage","opensource","python","uber","uber-api"],"created_at":"2024-09-24T23:01:08.290Z","updated_at":"2026-01-01T23:45:11.224Z","avatar_url":"https://github.com/Smohanta23.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Uber Data Analytics (Data Engineering ETL Project)\n![uber-logo](Uber-logo.jpg)\n\nIn this project, I designed a comprehensive data engineering solution using an Uber dataset to build a robust data model. I implemented data transformation by writing Python scripts to convert flat files into structured fact and dimension tables. The project was deployed on Google Cloud, utilizing Compute Engine for virtual machines, BigQuery for data warehousing, and Data Studio for creating interactive dashboards. Mage, an open-source tool, was employed for seamless data transformation and integration. 
This hands-on project not only demonstrates practical skills in Python and SQL but also highlights key data engineering concepts such as dimensional modeling and cloud integration for scalable data solutions.\n\n### Step 1: Designing a Process Flow on GCP\n\u003cimg width=\"861\" alt=\"process\" src=\"Process_Flow_GCP.png\"\u003e\n\n### Step 2: Building an ER Diagram for Uber Data-Flow\n![Uber Data Model](Uber_ERD.png)\n\n### Step 3: Analysing the data in Python (Feature Engineering)\nhttps://github.com/UmairThakur/Uber-Data-Analysis-ETL-PIPELINE-DATA-ANALYSIS_PROJECT/assets/81063457/4208599f-c0a2-4747-ae1b-87751a37de6f\n\n## Step 4: Developing the Uber project and a bucket on the Google Cloud Platform, extracting the data, selecting the server and setting up the required permissions.\n\u003cimg width=\"711\" alt=\"gcp_start\" src=\"gcp_start.png\"\u003e\n\nNote: Project ID and Project Number are hidden intentionally for copyright issues.\n\n## Step 5: Creating a Virtual Machine Instance in GCP using GCP Compute Engine.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"Google_Compute_Engine-Logo.png\" width=\"300\" alt=\"Compute Engine Logo\"/\u003e\n  \u003cimg src=\"pre-requisites for VM.png\" width=\"300\" alt=\"Pre-requisites for VM\"/\u003e\n  \u003cimg src=\"return-values_VM.png\" width=\"300\" alt=\"Converting Tables to Dictionary in Mage\"/\u003e\n\u003c/p\u003e\n\n## Step 6: Connecting the VM to the Mage Project using SSH Linux Terminal and creating the Mage project.\n\u003cimg width=\"944\" alt=\"mage_ai\" src=\"Mage_VM.png\"\u003e\n\n### Step 7: Building a data pipeline with Mage using blocks like data loader, transformer, and exporter (ETL). Incorporate your own extra transformation code into the data transformer, making the necessary adjustments.\n\n### Step 8: After setting up the pipeline, add your GCP credentials to the `io_config.yaml` file. 
You can obtain these credentials from the APIs and Services section in Google Cloud Console.\n\n### Step 9: Utilize BigQuery to perform ETL operations on the data, making it suitable for analysis such as creating dashboards and generating reports.\nhttps://github.com/UmairThakur/Uber-Data-Analysis-ETL-PIPELINE-DATA-ANALYSIS_PROJECT/assets/81063457/7d03f9af-28c2-405c-a7ea-55dd45cffa1f\n\n### Step 10: Finally, create a dashboard using your preferred dashboarding or reporting tool. I used Google Looker Studio, but you can also opt for other tools like Power BI, Tableau, or Qlik Sense.\n\u003cimg width=\"816\" alt=\"bttom_snap\" src=\"dashboard_snippet_.png\"\u003e\n\n\u003cimg width=\"815\" alt=\"cab_map\" src=\"Cab_Pickup_Locations.png\"\u003e\n\n\u003cimg width=\"816\" alt=\"bttom_snap\" src=\"Charts_distribution_snippet.png\"\u003e\n\n## Have a look at my Uber Dashboard: [https://lookerstudio.google.com/s/nQI06ax2wMY](https://lookerstudio.google.com/u/0/reporting/f383c480-7dab-461f-b426-8bbb1df2f13d/page/8pqOD?s=nQI06ax2wMY)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmohanta23%2Fuber_data-engineering_etl-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmohanta23%2Fuber_data-engineering_etl-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmohanta23%2Fuber_data-engineering_etl-project/lists"}