https://github.com/tom474/data-pipeline-with-aws
[RMIT 2024C] EEET2574 - Big Data for Engineering - Group Project
https://github.com/tom474/data-pipeline-with-aws
aws data-engineering data-science data-visualization machine-learning mongodb python spark
Last synced: about 1 month ago
JSON representation
[RMIT 2024C] EEET2574 - Big Data for Engineering - Group Project
- Host: GitHub
- URL: https://github.com/tom474/data-pipeline-with-aws
- Owner: tom474
- Created: 2025-02-16T15:59:24.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-02-19T04:08:41.000Z (12 months ago)
- Last Synced: 2025-02-19T05:20:51.717Z (12 months ago)
- Topics: aws, data-engineering, data-science, data-visualization, machine-learning, mongodb, python, spark
- Language: Jupyter Notebook
- Homepage:
- Size: 3.44 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Data Pipeline with AWS
This project builds a **data pipeline** using **AWS services** for real-time and batch data processing, integrating **machine learning** for predictive analytics.
## Tech Stack
- Python
- AWS S3
- AWS EC2
- AWS Kinesis
- AWS Lambda
- AWS SageMaker
- MongoDB
## System Architecture

## Features
### Streaming Data ETL Pipeline
- Real-time weather data ingestion from **OpenWeatherMap API**.
- Data processing using **AWS EC2, Kinesis, and Lambda**.
- Data storage in **MongoDB** for real-time analytics.
### Historical Data & Testing Data ETL Pipeline
- Integration of **U.S. Civil Flights & Weather Meteo datasets**.
- Data cleaning and transformation using **AWS SageMaker**.
- Processed data stored in **AWS S3** for analysis.
### Prediction Model Data Pipeline
- Training dataset creation using **historical flight & weather data**.
- Model training in **AWS SageMaker** using **Gradient Boosting, Decision Tree, and Random Forest**.
- Predictions generated for real-time flight delay forecasting.
### Visualization Data Pipeline
- Processed data stored in **MongoDB** for visualization.
- Dashboards built using **MongoDB Charts** for insights:
- [LAX Overview Dashboard](https://charts.mongodb.com/charts-big-data-for-engineering-lzggfsp/public/dashboards/2b54cb6f-21e0-4cba-b024-a898e3606f0f?fbclid=IwZXh0bgNhZW0CMTAAAR3SBTXZwQnorRUTwApbq4Z5FwMAzy2O3D82_Zm7hFOw6NYDMW2UijQFqeI_aem_TsqlkYcRpEPGmuLCP3pdYw)
- [LAX Weather Impact Dashboard](https://charts.mongodb.com/charts-big-data-for-engineering-lzggfsp/public/dashboards/2240afae-d905-4c0e-ad54-959e0f0a83ab?fbclid=IwZXh0bgNhZW0CMTAAAR3H2n8Hnx7prUGv0Ty__CRx8lBfiO88N0Sa-wPp-ThXvG1m6vdTFWIknnU_aem_ZSH8xQzG7SYU3poAiOjNiw)
- [LAX Performance Benchmark Dashboard](https://charts.mongodb.com/charts-big-data-for-engineering-lzggfsp/public/dashboards/40528c4b-1ef1-4a0b-ae93-d3e20911d42b?fbclid=IwZXh0bgNhZW0CMTAAAR3H2n8Hnx7prUGv0Ty__CRx8lBfiO88N0Sa-wPp-ThXvG1m6vdTFWIknnU_aem_ZSH8xQzG7SYU3poAiOjNiw)
- [LAX Time Series Dashboard](https://charts.mongodb.com/charts-big-data-for-engineering-lzggfsp/public/dashboards/5c7d6348-0dd8-4b77-a39b-18e75e3c3b44?fbclid=IwZXh0bgNhZW0CMTAAAR0kTTF7D3HchOVbwhk-3JTARp5I6U-lBYCHiZ-hWSpHinfgTprVWfLzAy8_aem_oH6mf1EOlemXOgAUZePIPg)
- [LAX Geographical Insights Dashboard](https://charts.mongodb.com/charts-big-data-for-engineering-lzggfsp/public/dashboards/b8929560-fa61-454e-95da-6ecc6190b764?fbclid=IwZXh0bgNhZW0CMTAAAR3lp61_82Zx_S2QHsDcdZRLtbtkJ9FRlw94QLAJRo69vY73N3NN-adq_BA_aem_qhm9l3CXJibgx7RiAskkxg)
- [LAX Aircraft Analysis Dashboard](https://charts.mongodb.com/charts-big-data-for-engineering-lzggfsp/public/dashboards/9f1b8081-1940-4b85-b5f1-84120b0c31c1?fbclid=IwZXh0bgNhZW0CMTAAAR3lp61_82Zx_S2QHsDcdZRLtbtkJ9FRlw94QLAJRo69vY73N3NN-adq_BA_aem_qhm9l3CXJibgx7RiAskkxg)