https://github.com/jasontanx/data-engineer-project-1
End-to-end data engineering project
- Host: GitHub
- URL: https://github.com/jasontanx/data-engineer-project-1
- Owner: jasontanx
- Created: 2023-05-31T17:09:43.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-19T15:26:40.000Z (almost 2 years ago)
- Last Synced: 2025-02-01T02:19:54.759Z (5 months ago)
- Topics: airline, bigquery, data-engineering, etl-pipeline, looker-studio, mage-ai
- Language: Jupyter Notebook
- Homepage:
- Size: 3.39 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# data-engineer-project-1

# European Airport & Sentiment Data Engineering Project ✈️
No. | Items | Date Updated
--- | --- | ---
1 | Repo Creation | 31 May 2023
2 | Upload dataset relevant to the project | 31 May 2023
3 | Design project data architecture | 01 June 2023
4 | Develop BigQuery table with Terraform | 03 June 2023
5 | Update GitHub ETL script code | 05 June 2023
6 | Update requirements.txt file | 06 June 2023
7 | Create conda environment in local - experimentation | 08 June 2023
8 | Minor changes on ETL code due to code error | 11 June 2023
9 | Minor ETL code changes - Error resolved | 12 June 2023
10 | Data successfully ingested into Google Sheet and BigQuery ✅ | 13 June 2023
11 | Start exploring 2nd dataset (airport comment) - sentiment analysis | 15 June 2023
12 | Data pre-processing & develop sentiment analysis | 16 June 2023
13 | Looker Studio chart exploration - Day 1 | 18 June 2023
14 | Looker Studio chart exploration - Day 2 | 20 June 2023
15 | Looker Studio chart exploration - Day 3 | 23 June 2023
16 | Looker Studio chart complete - Day 4 ✅ | 24 June 2023
17 | Explore orchestration tools - Airflow | 26 June 2023
18 | Explore orchestration tools - Mage-ai | 01 July 2023
19 | Update code - Mage-ai + created new repo | 02 July 2023
20 | Update code - Mage-ai + ETL pipeline | 07 July 2023
21 | Finalise all relevant code | 19 July 2023

# Data Architecture
The following diagram shows the overall data architecture of the project along with the tools involved.

In summary, the project uses the following tools:
1. Mage.ai --> Workflow orchestration tool
2. GCP BigQuery --> Data warehouse
3. Terraform --> Infrastructure as Code (IaC)
4. Google Sheets --> Spreadsheet
5. Looker Studio --> Visualisation & Business Intelligence tool
6. Python --> ETL code

# Final Outcome
Data ingested into Google Sheets:
https://docs.google.com/spreadsheets/d/1bTc2AKkWewiNziHm9L4RHIhepFIOvYJm6MVerFi6C0Y/edit#gid=0

Data ingested into BigQuery:
# Looker Studio
European Airport 2021 - Dashboard
🌟Insights Generated🌟
❇️ The dashboard covers **32** European countries and a total of **101** airports
❇️ Passenger volume increased by **199k** from 2020 to 2021
❇️ **Moscow** has the highest passenger volume by **city** in the year 2021, followed by Paris and London
❇️ **Country** wise, **Russia** has the highest passenger volume, followed by Spain and France
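
The insights above are simple aggregations: distinct counts, a year-over-year difference, and totals ranked by city or country. As a rough illustration of that arithmetic, here is a minimal sketch using only the Python standard library. The row layout, the function names, and every value in `ROWS` are hypothetical placeholders, not the project's data or code.

```python
from collections import defaultdict

# Hypothetical toy rows: (country, city, airport, year, passengers).
# Placeholder values for illustration only, not the project's dataset.
ROWS = [
    ("A-land", "Alpha", "AAA", 2020, 100),
    ("A-land", "Alpha", "AAA", 2021, 150),
    ("A-land", "Beta", "BBB", 2020, 80),
    ("A-land", "Beta", "BBB", 2021, 70),
    ("B-land", "Gamma", "CCC", 2020, 200),
    ("B-land", "Gamma", "CCC", 2021, 260),
]


def total_by(rows, key_index, year):
    """Sum passengers for one year, grouped by the column at key_index."""
    totals = defaultdict(int)
    for row in rows:
        if row[3] == year:
            totals[row[key_index]] += row[4]
    # Highest volume first, like a ranked bar chart.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def year_over_year_change(rows, start_year, end_year):
    """Difference in total passengers between two years."""
    def total(year):
        return sum(row[4] for row in rows if row[3] == year)
    return total(end_year) - total(start_year)


def coverage(rows):
    """Return (number of distinct countries, number of distinct airports)."""
    return len({row[0] for row in rows}), len({row[2] for row in rows})


by_city = total_by(ROWS, key_index=1, year=2021)
by_country = total_by(ROWS, key_index=0, year=2021)
change = year_over_year_change(ROWS, 2020, 2021)
countries, airports = coverage(ROWS)
```

The dashboard's own figures come from the Looker Studio report; this sketch only mirrors the kind of grouping involved.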
Source:
https://lookerstudio.google.com/reporting/084b0972-c6fd-4d16-87df-5274e9cffc32

# Orchestration Tools - mage.ai
- Mage-ai
https://docs.mage.ai/introduction/overview
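
An orchestrated ETL pipeline of this kind is usually split into a loader step, a transformer step, and an exporter step. The sketch below shows that shape in plain Python so it runs with the standard library only. It is not the repository's code: the function names, the CSV columns, and the sample values are all assumptions made for illustration. In Mage, each function would typically live in its own block and carry the `@data_loader`, `@transformer`, or `@data_exporter` decorator, and the exporter would write to BigQuery or Google Sheets instead of a list.

```python
import csv
import io

# Hypothetical sample input; column names and values are placeholders.
RAW_CSV = """airport,country,passengers_2020,passengers_2021
AAA,A-land,100,150
BBB,A-land,80,
CCC,B-land,200,260
"""


def load_data(source_text):
    """Loader step: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transformer step: drop incomplete rows and cast counts to int."""
    cleaned = []
    for row in rows:
        if not row["passengers_2020"] or not row["passengers_2021"]:
            continue  # skip rows with a missing passenger count
        cleaned.append({
            "airport": row["airport"],
            "country": row["country"],
            "passengers_2020": int(row["passengers_2020"]),
            "passengers_2021": int(row["passengers_2021"]),
        })
    return cleaned


def export_data(rows, sink):
    """Exporter step: append rows to a sink (a list stands in for a warehouse table)."""
    sink.extend(rows)
    return len(rows)


def run_pipeline(source_text, sink):
    """Chain the three steps in order, as an orchestrator would."""
    return export_data(transform(load_data(source_text)), sink)


warehouse_table = []
rows_written = run_pipeline(RAW_CSV, warehouse_table)
```

Keeping each step as a separate function with a single input and output is what lets an orchestrator run, retry, and test the steps independently.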