https://github.com/dizzydroid/asu_seniorproject_ml
An ASU | CSE375: Machine Learning project — COVID-19 Outcome Prediction using different ML models, and finding the optimal model for this classification task.
classification decision-trees knn-classification logistic-regression machine-learning machine-learning-algorithms naive-bayes-classifier svm
- Host: GitHub
- URL: https://github.com/dizzydroid/asu_seniorproject_ml
- Owner: dizzydroid
- License: mit
- Created: 2024-12-08T10:52:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-05T11:46:37.000Z (over 1 year ago)
- Last Synced: 2025-02-13T14:46:27.091Z (over 1 year ago)
- Topics: classification, decision-trees, knn-classification, logistic-regression, machine-learning, machine-learning-algorithms, naive-bayes-classifier, svm
- Language: Jupyter Notebook
- Homepage:
- Size: 1.32 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# COVID-19 Outcome Prediction
## Problem Overview
The goal of this project is to predict the outcome (recovery or death) for individuals infected with COVID-19. The prediction is based on a set of pre-defined symptoms and demographic factors, using time-series data provided by the World Health Organization (WHO).
Given the ongoing global pandemic, early detection of the likely outcome can help healthcare professionals prioritize resources and patient care.
The dataset covers cases reported from January 22, 2020, onward and provides features such as:
- **Country & Location**
- **Age Group**
- **Gender**
- **Symptoms**
- **History of Visit to Wuhan**
You are tasked with developing classifiers that can accurately predict the outcome for new hospital admissions.
---
## Models & Approach
### 1. **K-Nearest Neighbors (KNN)**
[K-Nearest Neighbors (KNN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm):
KNN is a simple, instance-based learning algorithm. It predicts the class of a new sample based on the majority class among its nearest neighbors in the feature space.
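The majority-vote idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's implementation: the function name `knn_predict`, the two toy features, and the labels are all hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: [scaled_age, visited_wuhan]
train_X = [[0.2, 0.0], [0.3, 0.0], [0.8, 1.0], [0.9, 1.0], [0.7, 1.0]]
train_y = ["recovered", "recovered", "death", "death", "death"]

prediction = knn_predict(train_X, train_y, [0.85, 1.0], k=3)
```

Because KNN compares raw distances, features on very different scales would normally be normalized first; the toy values here are assumed to be already scaled to [0, 1].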
### 2. **Logistic Regression**
[Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression):
Logistic regression is a statistical model that predicts the probability of a binary outcome by passing a linear combination of the input features through the logistic (sigmoid) function.
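As a rough sketch of how a fitted logistic regression turns features into a probability, the snippet below applies the sigmoid to a weighted sum. The weights, bias, and feature values are made up for illustration and are not taken from the project.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    # Linear combination of the input features
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical parameters for two features
weights = [2.0, 1.5]
bias = -1.0

p = predict_proba(weights, bias, [0.5, 0.0])  # linear score is exactly 0 here
```

A sample is usually assigned to the positive class when this probability exceeds a threshold such as 0.5, although the threshold can be moved to trade precision against recall.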
### 3. **Naïve Bayes**
[Naïve Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier):
A probabilistic classifier that applies Bayes' theorem under the assumption that features are conditionally independent given the class. It is particularly effective for text classification but can be applied to other types of data as well.
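The sketch below shows one possible variant, a Bernoulli Naïve Bayes for binary features with Laplace smoothing. The README does not say which variant the notebooks use, so the variant, the function names, and the toy data are assumptions for illustration only.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing."""
    n_features = len(X[0])
    priors, likelihoods = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return priors, likelihoods

def predict_nb(priors, likelihoods, x):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for c in priors:
        log_p = math.log(priors[c])
        # Independence assumption: per-feature log-likelihoods simply add up
        for p_j, x_j in zip(likelihoods[c], x):
            log_p += math.log(p_j if x_j == 1 else 1.0 - p_j)
        scores[c] = log_p
    return max(scores, key=scores.get)

# Hypothetical binary features: [has_fever, visited_wuhan]; label 1 = positive class
X = [[1, 1], [1, 0], [0, 0], [0, 1], [0, 0], [1, 1]]
y = [1, 1, 0, 0, 0, 1]

priors, likelihoods = train_bernoulli_nb(X, y)
label = predict_nb(priors, likelihoods, [1, 1])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.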
### 4. **Decision Trees**
[Decision Trees](https://en.wikipedia.org/wiki/Decision_tree_learning):
A decision tree is a flowchart-like tree structure used for classification. It splits the data based on feature values to make predictions. It's interpretable and simple to understand.
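To make the splitting step concrete, here is a small sketch that scores candidate thresholds on a single numeric feature using Gini impurity. Gini is only one common criterion and is assumed here; the function names and the toy ages are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: patient ages and a binary outcome
ages = [25, 32, 47, 61, 70, 78]
outcomes = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(ages, outcomes)
```

A full tree repeats this search over every feature and then recurses on the two resulting subsets until a stopping rule, such as a maximum depth, is reached.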
### 5. **Support Vector Machines (SVM)**
[Support Vector Machines](https://en.wikipedia.org/wiki/Support_vector_machine):
SVM is a powerful classifier that finds the hyperplane separating the classes with the largest possible margin. It is effective in high-dimensional spaces.
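The snippet below sketches only the prediction side of a linear SVM: given a weight vector and bias, it classifies a point by the sign of the decision function and reports the margin width. The parameters are invented for illustration, and training (solving for the weights) is not shown.

```python
import math

def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane parameters for two features
w, b = [3.0, 4.0], -5.0

side = classify(w, b, [1.0, 1.0])
width = margin_width(w)
```

Since the margin width is inversely proportional to the norm of the weight vector, maximizing the margin is equivalent to minimizing that norm during training.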
---
## Project Execution
The project consists of several phases:
1. **Data Preprocessing:** The dataset has already been cleaned and preprocessed.
2. **Model Training:** The data is split into training, validation, and test sets. Each model is trained and evaluated.
3. **Hyperparameter Tuning:** Each model's hyperparameters are tuned to maximize performance.
4. **Model Comparison:** The models are compared on precision, recall, F1-score, and ROC curves with their AUC values.
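The splitting and tuning phases listed above might look roughly like the following sketch. The 70/15/15 split ratios, the fixed seed, and the helper names `train_val_test_split` and `tune` are assumptions; the README does not state the values actually used.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle the rows and cut them into train, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

def tune(candidates, fit, score, train, val):
    """Fit one model per candidate value and keep the best validation score."""
    best_value, best_score = None, float("-inf")
    for value in candidates:
        model = fit(train, value)
        current = score(model, val)
        if current > best_score:
            best_value, best_score = value, current
    return best_value, best_score

# Stand-in data: 100 row indices in place of real records
train, val, test = train_val_test_split(range(100))
```

Here `fit` and `score` are caller-supplied functions, for example one that trains KNN with a given `k` and one that returns the F1-score on the validation set. The test set stays untouched until the final comparison so that the reported numbers are not biased by tuning.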
### Key Metrics
- **Precision:** How many predicted positives are actually positive.
- **Recall:** How many actual positives are correctly predicted.
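- **F1-Score:** The harmonic mean of precision and recall.
- **ROC/AUC:** The ROC curve plots the true positive rate against the false positive rate across decision thresholds; the area under it (AUC) summarizes how well the model separates the classes.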
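These metrics can be computed directly from labels and scores, as in the sketch below. The helper names and the toy labels and scores are hypothetical. AUC is computed through its ranking interpretation: the fraction of positive/negative pairs in which the positive sample receives the higher score.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores, positive=1):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, hard predictions, and predicted scores
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, y_score)
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; `roc_auc` also assumes both classes are present in `y_true`.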
---
## Results
The project ends with a performance comparison across all five models to identify the best classifier for COVID-19 outcome prediction. The final model is the one with the strongest combined performance across precision, recall, F1-score, and ROC/AUC.
---
## Acknowledgments
- World Health Organization (WHO) for the dataset.
- Contributors to the various machine learning algorithms and techniques.
---