Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ayaanjawaid/brain_stroke_prediction
This project aims to predict the likelihood of a stroke based on various health parameters using machine learning models. The dataset is preprocessed and analyzed, and multiple models are trained and compared to find the one with the best prediction accuracy.
- Host: GitHub
- URL: https://github.com/ayaanjawaid/brain_stroke_prediction
- Owner: Ayaanjawaid
- Created: 2023-04-11T08:15:22.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-28T22:41:28.000Z (2 months ago)
- Last Synced: 2024-10-28T23:27:12.369Z (2 months ago)
- Topics: decision-trees, exploratory-data-analysis, matplotlib, numpy, pandas, python, regression, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 4.52 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Project Overview
This project aims to predict the likelihood of a stroke based on various health parameters using machine learning models. The dataset is preprocessed and analyzed, and multiple models are trained and compared to find the one with the best prediction accuracy.

## Libraries Used
- Pandas: For data manipulation and analysis.
- NumPy: For numerical operations.
- Matplotlib and Seaborn: For data visualization.
- Scikit-learn: For implementing machine learning models.
- XGBoost: For the implementation of the XGBoost model.

## Data Preprocessing
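The four preprocessing steps listed in this section might look roughly like the following pandas sketch. It is not taken from the project's notebook: the column names (`id`, `gender`, `bmi`, `smoking_status`, `stroke`, ...), the sample rows, the median imputation of missing BMI values, and the `age_group` feature are all assumptions chosen for illustration. An in-memory CSV stands in for the real data file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real CSV file; column names are assumptions.
SAMPLE_CSV = """id,gender,age,hypertension,avg_glucose_level,bmi,smoking_status,stroke
1,Male,67,0,228.69,36.6,formerly smoked,1
2,Female,61,0,202.21,,never smoked,1
3,Female,49,0,171.23,34.4,smokes,0
4,Male,35,1,95.12,28.1,Unknown,0
"""


def preprocess(csv_source) -> pd.DataFrame:
    """Load, clean, extend and encode the data (illustrative sketch)."""
    # Data loading: read_csv accepts a file path or any file-like object.
    df = pd.read_csv(csv_source)

    # Data cleaning: drop the identifier column and fill missing BMI with the median.
    df = df.drop(columns=["id"])
    df["bmi"] = df["bmi"].fillna(df["bmi"].median())

    # Feature engineering (hypothetical example): bucket age into three groups.
    df["age_group"] = pd.cut(
        df["age"], bins=[0, 40, 60, 150], right=False, labels=["<40", "40-59", "60+"]
    )

    # Encoding: one-hot encode the categorical columns.
    return pd.get_dummies(df, columns=["gender", "smoking_status", "age_group"])


if __name__ == "__main__":
    clean = preprocess(io.StringIO(SAMPLE_CSV))
    print(clean.columns.tolist())
```

With the real dataset, `preprocess("path/to/data.csv")` would take the place of the `StringIO` call, and the cleaning rules would need to match its actual columns.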
- Data Loading: The dataset is loaded using Pandas.
- Data Cleaning: Missing values are handled, and unnecessary columns are removed.
- Feature Engineering: New features are created to enhance model performance.
- Encoding: Categorical variables are encoded using one-hot encoding.

## Exploratory Data Analysis (EDA)
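The statistical-analysis step in this section can be illustrated with a small pandas helper. This is a sketch under assumptions: the `summarize` function, the `stroke` target column, and the toy data are hypothetical and not the project's code. Plotting is left out; the correlation matrix it returns is the same table a correlation heatmap (for example, one drawn with `seaborn.heatmap`) would display.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str = "stroke") -> dict:
    """Collect summary statistics for a quick look at the data (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    return {
        # count, mean, std, min, quartiles and max of every numeric column
        "describe": numeric.describe(),
        # share of rows in each target class, which reveals class imbalance
        "class_balance": df[target].value_counts(normalize=True),
        # number of missing values per column
        "missing": df.isna().sum(),
        # full correlation matrix (what a correlation heatmap would show)
        "corr": numeric.corr(),
        # Pearson correlation of each numeric feature with the target
        "target_corr": numeric.corr()[target].drop(target),
    }


if __name__ == "__main__":
    # Hypothetical toy data; the column names are assumptions about the dataset.
    toy = pd.DataFrame({
        "age": [67, 61, 49, 35, 80, 22],
        "avg_glucose_level": [228.7, 202.2, 171.2, 95.1, 105.9, 88.0],
        "bmi": [36.6, None, 34.4, 28.1, 32.5, 21.0],
        "stroke": [1, 1, 0, 0, 1, 0],
    })
    stats = summarize(toy)
    print(stats["describe"])
    print(stats["target_corr"])
```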
- Visualization: Various plots (histograms, bar plots, correlation heatmaps) are used to understand the distribution and relationships of the data.
- Statistical Analysis: Summary statistics are computed to gain insights into the dataset.

## Model Training
Multiple machine learning models are trained to predict strokes. The models used include:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- XGBoost

## Model Evaluation
- Confusion Matrix: Used to evaluate the performance of the classification models.
- Accuracy, Precision, Recall, and F1-Score: Computed for each model to compare their performance.
- ROC Curve and AUC Score: Analyzed to understand the models' ability to distinguish between classes.
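The workflow described in the Model Training and Model Evaluation sections can be sketched with scikit-learn as below. This is an illustration rather than the project's notebook: it uses synthetic, imbalanced data from `make_classification` in place of the stroke dataset, default hyperparameters, a 75/25 stratified split, and hypothetical helper names (`build_models`, `evaluate_models`). Class 1 is assumed to mean "stroke", and XGBoost is added only when the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score,
    precision_score, recall_score, roc_auc_score, roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Return the classifiers named above with default hyperparameters."""
    models = {
        # Scale-sensitive models get a StandardScaler in front of them.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        # probability=True lets SVC return class probabilities for the ROC curve.
        "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=random_state)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    }
    try:
        from xgboost import XGBClassifier  # optional dependency
        models["XGBoost"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass  # skip XGBoost if the package is not installed
    return models


def evaluate_models(X, y, random_state=0):
    """Fit every model on a stratified split and return per-model metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    results = {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_score = model.predict_proba(X_test)[:, 1]  # estimated probability of class 1
        fpr, tpr, _ = roc_curve(y_test, y_score)  # points for plotting the ROC curve
        results[name] = {
            "confusion_matrix": confusion_matrix(y_test, y_pred),
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
            "roc_auc": roc_auc_score(y_test, y_score),
            "roc_points": (fpr, tpr),
        }
    return results


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the stroke data (about 5% positives).
    X, y = make_classification(n_samples=1000, n_features=8, weights=[0.95], random_state=0)
    for name, m in evaluate_models(X, y).items():
        print(f"{name:20s} acc={m['accuracy']:.3f} prec={m['precision']:.3f} "
              f"rec={m['recall']:.3f} f1={m['f1']:.3f} auc={m['roc_auc']:.3f}")
```

With an imbalanced target like the synthetic one here, accuracy alone can look high even for a model that rarely predicts the positive class, which is why the confusion matrix, precision, recall, F1, and ROC AUC are reported side by side.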