https://github.com/aravinda-1402/flight_delay_prediction_using_machine_learning
This research investigates flight delay trends, examining departure time, airline, and airport factors. Regression machine learning methods are utilized to predict delay contributions from various sources. Time-series models, including LSTM, Hybrid LSTM, and Bi-LSTM, are compared with baseline regression models such as Multiple Regression, Decisi
bilstm delay lstm lstm-neural-networks prediction regression timeseries-forecasting
- Host: GitHub
- URL: https://github.com/aravinda-1402/flight_delay_prediction_using_machine_learning
- Owner: aravinda-1402
- License: apache-2.0
- Created: 2024-04-20T23:47:59.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-08T14:09:17.000Z (about 1 year ago)
- Last Synced: 2025-03-12T16:17:52.309Z (7 months ago)
- Topics: bilstm, delay, lstm, lstm-neural-networks, prediction, regression, timeseries-forecasting
- Language: Jupyter Notebook
- Homepage:
- Size: 6.56 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Deciphering Air Travel Disruptions: A Machine Learning Approach ✈️
📄 **Paper link:** https://doi.org/10.48550/arXiv.2408.02802
## Conda Environment 📦
Provided as `conda_environment.yml`. Create using:
```bash
conda env create -f conda_environment.yml
```
(source: [Managing Conda Environments](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html))

⚠️ **NOTE**: Models trained with this environment may differ from the reported models, because the reported models were trained in Google Colab. The reported time-series models are saved under `models` as `bilstm_model.h5`, `lstm_model.h5`, and `hybrid_model.h5`.
## Notebooks 📓
### `src/preprocessing.ipynb`
Notebook for preprocessing. Dataset sourced from: [Flight Delay and Cancellation Dataset 2019-2023](https://www.kaggle.com/datasets/patrickzel/flight-delay-and-cancellation-dataset-2019-2023?select=flights_sample_3m.csv)

Creates processed label-encoded data and processed one-hot-encoded data, splits the data into train and test sets, and saves the results in CSV format. One-hot-encoded data was experimented with but ultimately not chosen, because there was not enough computational power to handle the large increase in the number of features.
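The trade-off between the two encodings can be illustrated with a small, dependency-free sketch. This is not the notebook's code: the column names `AIRLINE` and `ORIGIN` and the sample rows are hypothetical. Label encoding keeps one integer column per categorical feature, while one-hot encoding adds one column per distinct category, which is why high-cardinality columns such as airports inflate the feature count.

```python
# Minimal sketch (assumption: not the repository's implementation).
# Contrasts label encoding with one-hot encoding on hypothetical columns.

def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Expand each value into a 0/1 vector with one column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

# Hypothetical sample rows; column names and values are illustrative only.
rows = [
    {"AIRLINE": "Delta", "ORIGIN": "ATL"},
    {"AIRLINE": "United", "ORIGIN": "ORD"},
    {"AIRLINE": "Delta", "ORIGIN": "JFK"},
    {"AIRLINE": "Southwest", "ORIGIN": "ATL"},
]

for col in ("AIRLINE", "ORIGIN"):
    values = [r[col] for r in rows]
    codes, _ = label_encode(values)
    _, cats = one_hot_encode(values)
    # Label encoding: always 1 column. One-hot: one column per distinct value.
    print(f"{col}: label-encoded -> 1 column {codes}; one-hot -> {len(cats)} columns")
```

On the full dataset, the one-hot width of a column equals its number of distinct values, so the total feature count grows with every airline and airport present.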
### `src/baseline_models.ipynb`
Trains and evaluates baseline regression models: Linear Regression, Decision Tree, Random Forest, Neural Network, and XGBoost.

### `src/build_lstm_data.ipynb`
Creates datasets for time series models:
1. Combines `X_train` and `X_test`
2. Sorts by date-time
3. Temporally splits the data

This is necessary because the datasets used for the baseline models are shuffled.
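The three steps can be sketched as follows. This is a hedged illustration, not the notebook's implementation: the time column name `FL_DATETIME`, the target `ARR_DELAY`, and the 80/20 split fraction are all assumptions. Rows are kept as dicts that carry their own target, so features and labels stay aligned through the re-sort.

```python
from datetime import datetime

def temporal_split(X_train, X_test, time_key="FL_DATETIME", train_frac=0.8):
    """Rebuild a chronological train/test split from two shuffled splits.

    Assumption: each row is a dict holding both features and its target,
    so the label travels with the row when it is re-sorted.
    """
    # 1. Combine the shuffled train and test sets into one pool.
    combined = list(X_train) + list(X_test)
    # 2. Sort by date-time so earlier flights come first.
    combined.sort(key=lambda row: row[time_key])
    # 3. Split temporally: the earliest train_frac of flights form the training set.
    cut = int(len(combined) * train_frac)
    return combined[:cut], combined[cut:]

# Hypothetical shuffled splits (column names and values are illustrative).
X_train = [
    {"FL_DATETIME": datetime(2023, 1, 3, 9, 0), "ARR_DELAY": 12.0},
    {"FL_DATETIME": datetime(2023, 1, 1, 7, 30), "ARR_DELAY": -4.0},
    {"FL_DATETIME": datetime(2023, 1, 5, 18, 15), "ARR_DELAY": 40.0},
]
X_test = [
    {"FL_DATETIME": datetime(2023, 1, 2, 12, 45), "ARR_DELAY": 0.0},
    {"FL_DATETIME": datetime(2023, 1, 4, 6, 10), "ARR_DELAY": 7.0},
]

train, test = temporal_split(X_train, X_test)
print([r["FL_DATETIME"].date().isoformat() for r in train])
print([r["FL_DATETIME"].date().isoformat() for r in test])
```

Every training flight now precedes every test flight, so a sequence model is never evaluated on delays that occurred before the data it was trained on.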
### `src/time_series_models.ipynb`
Trains the time-series models.

### `src/evaluate_time_series_models.ipynb`
Evaluates the time-series models.
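The README does not state which metrics the evaluation notebook reports. As an assumption, the sketch below computes three common regression metrics (MAE, RMSE, and R²) for predicted versus actual delays. It uses only the standard library and is illustrative rather than the repository's code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for two equal-length sequences of delays."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n               # mean absolute error
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    rmse = math.sqrt(ss_res / n)                        # root mean squared error
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)     # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Hypothetical delays in minutes (illustrative values only).
actual = [10, 0, 25, 5]
predicted = [12, 1, 20, 5]
print(regression_metrics(actual, predicted))
```

With the saved Keras models, predictions would typically be flattened before this comparison, for example `model.predict(X_seq).ravel()`, where `X_seq` is a hypothetical windowed test array.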