https://github.com/a3darekar/big-data-management-project-3
Data Analysis with Spark ML
- Host: GitHub
- URL: https://github.com/a3darekar/big-data-management-project-3
- Owner: a3darekar
- Created: 2022-05-22T21:51:44.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-05-22T21:55:12.000Z (about 3 years ago)
- Last Synced: 2025-01-22T16:46:15.646Z (4 months ago)
- Language: Jupyter Notebook
- Size: 512 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data analysis with Spark ML
The project aims to analyze the US domestic flight dataset with PySpark DataFrames and to predict which flights or carriers are most likely to be canceled or delayed.
The dataset, [Airline Delay and Cancellation Data 2009-2018](https://www.kaggle.com/yuanyuwendymu/airline-delay-and-cancellation-data-2009-2018/data), is available on Kaggle.