Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/datpham0412/covid19-prediction-model
Machine learning project aimed at predicting new COVID-19 cases using historical COVID-19 and mobility data. The project involves data fetching, migration, preprocessing, exploratory data analysis (EDA), feature engineering, data splitting, model training, and evaluation.
cmake cplusplus-17 dill googletest jupyter-notebook matplotlib pandas python3 scikit-learn scikitlearn-machine-learning seaborn-python sqlite
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/datpham0412/covid19-prediction-model
- Owner: datpham0412
- License: mit
- Created: 2024-06-21T04:49:30.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-07-05T04:50:24.000Z (7 months ago)
- Last Synced: 2024-07-05T17:37:53.746Z (7 months ago)
- Topics: cmake, cplusplus-17, dill, googletest, jupyter-notebook, matplotlib, pandas, python3, scikit-learn, scikitlearn-machine-learning, seaborn-python, sqlite
- Language: Jupyter Notebook
- Homepage:
- Size: 10.5 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🦠 Covid 19 Prediction Model
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/datpham0412/Covid19_Prediction_Model/blob/main/LICENSE)
[![GitHub issues](https://img.shields.io/github/issues/datpham0412/Covid19_Prediction_Model)](https://github.com/datpham0412/Covid19_Prediction_Model/issues)
[![GitHub stars](https://img.shields.io/github/stars/datpham0412/Covid19_Prediction_Model)](https://github.com/datpham0412/Covid19_Prediction_Model/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/datpham0412/Covid19_Prediction_Model)](https://github.com/datpham0412/Covid19_Prediction_Model/network/members)
## 📋 Project Description
The **Covid 19 Prediction Model** predicts new Covid-19 cases from historical data using machine learning. It combines Covid-19 case data with mobility data to produce forecasts and insight into the pandemic's trends. The project aims to help policymakers, healthcare professionals, and the general public understand and respond to the Covid-19 crisis.
## 🛠 Technologies Used
- **Python**: Core programming language for data processing and model training.
- **C++**: For efficient data processing and handling large datasets.
- **SQLite**: Database management for storing and querying data.
- **Pandas**: Data manipulation and analysis.
- **Scikit-learn**: Machine learning library for building predictive models.
- **Matplotlib & Seaborn**: Data visualization.
- **CMake**: Cross-platform build system.
- **Google Test**: Unit testing framework for C++.
- **Dill**: For model serialization in Python.
- **Jupyter Notebook**: For interactive data analysis and visualization.
## 📚 Features
- Fetch and preprocess Covid-19 and mobility data from multiple sources.
- Integrate and clean data, ensuring consistency and accuracy.
- Create various date-based, lag, and rolling average features to enhance model performance.
- Train and evaluate machine learning models to predict new Covid-19 cases.
- Visualize actual vs. predicted cases, residuals, and other key metrics to interpret model performance.
- Generate detailed reports and visualizations for data exploration and model results.
- Support for user-defined country data extraction and analysis.
## 🚀 Installation and Running the Project
### Prerequisites
- Ensure you have `git` installed for cloning repositories.
- Ensure you have CMake installed and added to your system's PATH.
### Steps
1. **Clone the Repository**:
```sh
git clone https://github.com/datpham0412/Covid19_Prediction_Model.git
cd Covid19_Prediction_Model
```
2. **Install CMake**:
- Download CMake from [here](https://github.com/Kitware/CMake/releases/download/v3.30.0-rc3/cmake-3.30.0-rc3-windows-x86_64.msi)
- Add the CMake binary path (e.g., `C:\Program Files\CMake\bin`) to your `PATH` environment variable.
3. **Clone SQLiteCpp**:
```sh
cd external
git clone https://github.com/SRombauts/SQLiteCpp.git
```
4. **Modify SQLiteCpp CMakeLists.txt**:
- Open `CMakeLists.txt` in the `external/SQLiteCpp` folder.
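- Change the `SQLITECPP_RUN_CPPLINT` option (line 388 at the time of writing; the line number may differ between SQLiteCpp versions) from: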
```cmake
option(SQLITECPP_RUN_CPPLINT "Run cpplint.py tool for Google C++ StyleGuide." ON)
```
to:
```cmake
option(SQLITECPP_RUN_CPPLINT "Run cpplint.py tool for Google C++ StyleGuide." OFF)
```
5. **Build the Project**:
```sh
cd ..
mkdir build
cd build
cmake ..
cmake --build . --config Release
```
6. **Run the Application**:
```sh
cd Release
Covid19_Prediction.exe
```
### Python Dependencies
Install the required Python libraries (`sqlite3` is part of Python's standard library and is not installed with `pip`):
```sh
pip install pandas numpy scikit-learn matplotlib seaborn dill joblib notebook
```
## Running the scripts
1. **Fetch Data**
```sh
python scripts/fetch_data.py
```
This script fetches COVID-19 and mobility data. Fetching may take 10-20 minutes.
2. **Migrate Data**
```sh
python scripts/migrate_data.py
```
This script migrates COVID-19 and mobility data for a specified country from the raw datasets to processed CSV files.
3. **Build the project**
```sh
cd ..
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd Release
Covid19_Prediction.exe
```
These commands configure, build, and run the C++ project.
4. **Process Data**
```sh
python scripts/data_processing.py
```
This script processes the COVID-19 and mobility data for a specific country provided by the user.
5. **Perform EDA**
```sh
python scripts/eda_visualization.py
```
This script performs Exploratory Data Analysis on the processed data.
6. **Feature Engineering**
```sh
python scripts/feature_engineering.py
```
This script performs feature engineering on the processed data.
7. **Split Data**
```sh
python scripts/split_data.py
```
This script splits the data into training and testing sets.
8. **Model Training**
```sh
python scripts/model_training.py
```
This script trains the machine learning model.
9. **Model Evaluation**
```sh
python scripts/model_evaluation.py
```
This script evaluates the performance of the trained model.
10. **Interpret Predictions**
```sh
cd notebooks
jupyter notebook
```
Open `interpret_predictions.ipynb` in Jupyter Notebook to visualize and interpret the model's predictions.
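As a rough illustration of the feature engineering and data splitting steps above, the sketch below builds date-based, lag, and rolling-average features with pandas and then splits the rows chronologically. It is not taken from the repository's scripts: the column names `date` and `new_cases`, the lag choices (1, 7, and 14 days), the 7-day window, and the 80/20 split are all assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame, target: str = "new_cases") -> pd.DataFrame:
    """Add date-based, lag, and rolling-average features to a daily series."""
    out = df.sort_values("date").reset_index(drop=True).copy()

    # Date-based features derived from the timestamp column.
    out["day_of_week"] = out["date"].dt.dayofweek
    out["month"] = out["date"].dt.month

    # Lag features: the target value 1, 7, and 14 days earlier (assumed lags).
    for lag in (1, 7, 14):
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)

    # Rolling mean over the previous 7 days. Shifting by one day first keeps
    # the current day's value out of its own feature (avoids target leakage).
    out[f"{target}_roll_mean_7"] = out[target].shift(1).rolling(window=7).mean()

    # The earliest rows have no history, so their lag features are NaN.
    return out.dropna().reset_index(drop=True)


def chronological_split(df: pd.DataFrame, test_fraction: float = 0.2):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(df) * (1 - test_fraction))
    return df.iloc[:cut], df.iloc[cut:]


# Synthetic stand-in for the processed data (not real Covid-19 figures).
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=100, freq="D")
raw = pd.DataFrame({"date": dates, "new_cases": rng.integers(0, 500, size=100)})

features = add_features(raw)
train, test = chronological_split(features)
print(features.shape, len(train), len(test))
```

Splitting by time rather than at random keeps later observations out of the training set, which matters once lag features are present. The project's own scripts may make different choices.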
## 📷 Screenshots
![CorrelationHeatMatrix](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/0f403266-8b3d-4c3c-ab27-28e8f5963ce1)
![JupyterNotebook2](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/da923b42-92c2-493f-84c4-c6d08ffae03d)
![NewCases_NewDeathsOverTime](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/a51787f3-ef7f-4366-8849-ec641186a30e)
![CorrelationScatterPlot](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/3f1631c3-b8dc-4c8e-8732-92c6166ac63c)
![DistributionNewCases](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/59cddbbc-ca2c-4050-9b85-bd045380cac9)
![JupyterNotebook1](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/dc8a5a02-ab51-4044-98dc-4f17bd901111)
![JupyterNotebook2](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/49d71df3-e149-4238-919c-c7791d5420f3)
![JupyterNotebook3](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/b1db4032-039b-49c5-b804-4a8a780bcca4)
![JupyterNotebook4](https://github.com/datpham0412/Covid19_Prediction_Model/assets/100574389/787fbf82-9a6e-430c-83b7-e474e8af2eff)
## 📜 License
This project is licensed under the MIT License. See the [LICENSE](https://github.com/datpham0412/Covid19_Prediction_Model/blob/main/LICENSE) file for details.
## 📞 Contact
For any inquiries, please contact [[email protected]](mailto:[email protected]).
Made with ❤️ by [Dat Pham](https://github.com/datpham0412)