{"id":22385198,"url":"https://github.com/jasontanx/data-engineer-project-1","last_synced_at":"2026-04-27T21:32:10.301Z","repository":{"id":171382412,"uuid":"647857868","full_name":"jasontanx/data-engineer-project-1","owner":"jasontanx","description":"End-to-end data engineering project ","archived":false,"fork":false,"pushed_at":"2023-07-19T15:26:40.000Z","size":3554,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-26T20:26:06.857Z","etag":null,"topics":["airline","bigquery","data-engineering","etl-pipeline","looker-studio","mage-ai"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jasontanx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-31T17:09:43.000Z","updated_at":"2023-12-27T20:06:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"7a7e690c-a12f-445f-b81f-a8528dedb458","html_url":"https://github.com/jasontanx/data-engineer-project-1","commit_stats":null,"previous_names":["jasontanx/data-engineer-project-1"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jasontanx/data-engineer-project-1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasontanx%2Fdata-engineer-project-1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasontanx%2Fdata-engineer-project-1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasontanx%2Fdata-engineer-project-1/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasontanx%2Fdata-engineer-project-1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jasontanx","download_url":"https://codeload.github.com/jasontanx/data-engineer-project-1/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasontanx%2Fdata-engineer-project-1/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32356598,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"ssl_error","status_checked_at":"2026-04-27T20:07:00.910Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airline","bigquery","data-engineering","etl-pipeline","looker-studio","mage-ai"],"created_at":"2024-12-05T01:22:04.323Z","updated_at":"2026-04-27T21:32:10.285Z","avatar_url":"https://github.com/jasontanx.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# data-engineer-project-1\n\n![git2](https://github.com/jasontanx/data-engineer-project-1/assets/116934441/2455a2dd-28e0-4308-b64b-e9a1a300b70e)\n\n# European Airport \u0026 Sentiment Data Engineering Project ✈️\n\nNo. 
| Items | Date Updated \n--- | --- | ---\n1 | Repo Creation | 31 May 2023\n2 | Upload dataset relevant to the project | 31 May 2023\n3 | Design project data architecture | 01 June 2023\n4 | Develop BQ table with Terraform  | 03 June 2023\n5 | Update GitHub ETL script code | 05 June 2023\n6 | Update requirements.txt file | 06 June 2023\n7 | Create conda environment in local - experimentation | 08 June 2023\n8 | Minor changes on ETL code due to code error | 11 June 2023\n9 | Minor ETL code changes - Error resolved | 12 June 2023\n10 | Data successfully ingested into Google Sheet and BigQuery ✅ | 13 June 2023\n11 | Start exploring 2nd dataset (airport comment) - sentiment analysis | 15 June 2023\n12 | Data pre-processing \u0026 develop sentiment analysis | 16 June 2023\n13 | Looker Studio chart exploration - Day 1 | 18 June 2023\n14 | Looker Studio chart exploration - Day 2 | 20 June 2023\n15 | Looker Studio chart exploration - Day 3 | 23 June 2023\n16 | Looker Studio chart complete  - Day 4 ✅ | 24 June 2023\n17 | Explore orchestration tools - Airflow | 26 June 2023\n18 | Explore orchestration tools - Mage-ai | 01 July 2023\n19 | Update code - Mage-ai + created new repo | 02 July 2023\n20 | Update code - Mage-ai + ETL pipeline | 07 July 2023\n21 | Finalise all relevant code  | 19 July 2023\n\n# Data Architecture \n\nThe following diagram shows the overall data architecture of the project along with the tools involved.\n\n![data_architecture](https://github.com/jasontanx/data-engineer-project-1/assets/116934441/a900ebd4-2f16-48c3-9991-0dbfb13cebce)\n\nIn summary, the following tools will be involved:\n1. Mage.ai --\u003e Workflow orchestration tool\n2. GCP BigQuery --\u003e Data warehouse\n3. Terraform --\u003e Infrastructure as Code (IaC) \n4. Google Sheet --\u003e Spreadsheet\n5. Looker Studio --\u003e Visualisation \u0026 Business Intelligence tool \n6. 
Python --\u003e ETL code\n\n# Final Outcome\nData ingested into Google Sheet:\nhttps://docs.google.com/spreadsheets/d/1bTc2AKkWewiNziHm9L4RHIhepFIOvYJm6MVerFi6C0Y/edit#gid=0\n\nData ingested into BigQuery:\n![de_bq_ss](https://github.com/jasontanx/data-engineer-project-1/assets/116934441/18afe1fb-dd34-44fa-ba85-22d688522c9b)\n\n# Looker Studio\nEuropean Airport 2021 - Dashboard\n\n\u003cimg width=\"899\" alt=\"Screenshot 2023-06-24 at 3 20 30 PM\" src=\"https://github.com/jasontanx/data-engineer-project-1/assets/116934441/f278bf1d-2e3b-4cd4-8ec1-a225451491ad\"\u003e\n\n🌟Insights Generated🌟\n\n❇️ The dashboard covers **32** European countries and a total of **101** airports\n\n❇️ Passenger volume increased by **199k** from 2020 to 2021\n\n❇️ By **city**, **Moscow** has the highest passenger volume in 2021, followed by Paris and London\n\n❇️ By **country**, **Russia** has the highest passenger volume, followed by Spain and France\n\nSource:\nhttps://lookerstudio.google.com/reporting/084b0972-c6fd-4d16-87df-5274e9cffc32\n\n# Orchestration Tools - mage.ai\n\n- Mage-ai\n\nhttps://docs.mage.ai/introduction/overview\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjasontanx%2Fdata-engineer-project-1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjasontanx%2Fdata-engineer-project-1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjasontanx%2Fdata-engineer-project-1/lists"}