{"id":29287845,"url":"https://github.com/ifte-13/early-stage-brain-stroke-detection","last_synced_at":"2025-07-06T02:06:30.833Z","repository":{"id":289346268,"uuid":"954135492","full_name":"IFTE-13/Early-Stage-Brain-Stroke-Detection","owner":"IFTE-13","description":"Predictive Analysis \u0026 Early Detection of Brain stroke using Machine Learning Algorithm","archived":false,"fork":false,"pushed_at":"2025-04-22T19:43:11.000Z","size":2471,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-22T20:35:05.907Z","etag":null,"topics":["decision-tree","imbalanced-learn","knn","matplotlib","numpy","pandas","random-forest","scikit-learn","seaborn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IFTE-13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-24T16:02:26.000Z","updated_at":"2025-04-22T19:43:14.000Z","dependencies_parsed_at":"2025-04-22T20:40:33.483Z","dependency_job_id":"b5c40ae9-ad30-44e7-a6d1-1506bd50281a","html_url":"https://github.com/IFTE-13/Early-Stage-Brain-Stroke-Detection","commit_stats":null,"previous_names":["ifte-13/early-stage-brain-stroke-detection"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/IFTE-13/Early-Stage-Brain-Stroke-Detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IFTE-13%2FEarly-Stage-Brain-Stroke-Detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/IFTE-13%2FEarly-Stage-Brain-Stroke-Detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IFTE-13%2FEarly-Stage-Brain-Stroke-Detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IFTE-13%2FEarly-Stage-Brain-Stroke-Detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IFTE-13","download_url":"https://codeload.github.com/IFTE-13/Early-Stage-Brain-Stroke-Detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IFTE-13%2FEarly-Stage-Brain-Stroke-Detection/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263837431,"owners_count":23517948,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decision-tree","imbalanced-learn","knn","matplotlib","numpy","pandas","random-forest","scikit-learn","seaborn"],"created_at":"2025-07-06T02:06:28.518Z","updated_at":"2025-07-06T02:06:30.817Z","avatar_url":"https://github.com/IFTE-13.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Predictive Analysis \u0026 Early Detection of Brain stroke using Machine Learning Algorithm\nThis project aims to predict the likelihood of a stroke based on various health and lifestyle factors. The dataset used includes features such as age, average glucose level, BMI, gender, work type, residence type, and smoking status. 
The project involves exploratory data analysis, data preprocessing, feature engineering, model training, and evaluation.\n\n![Model Pipeline](model.png)\n\n## Dataset\nThe dataset is in the [data.csv](https://github.com/IFTE-13/Early-Stage-Brain-Stroke-Detection/blob/main/data.csv) file.\n### Dataset features:\n- age: Age of the patient.\n- avg_glucose_level: Average glucose level of the patient.\n- bmi: Body Mass Index of the patient.\n- gender: Gender of the patient.\n- work_type: Type of work the patient is engaged in.\n- Residence_type: Type of residence (urban or rural).\n- smoking_status: Smoking status of the patient.\n- stroke: Target variable indicating whether the patient had a stroke (1) or not (0).\n\n## Requirements\n### Libraries\n- pandas\n- numpy\n- matplotlib\n- seaborn\n- scikit-learn\n- imbalanced-learn\n- joblib\n\n## Getting Started\n* Clone the repository\n```bash\ngit clone https://github.com/IFTE-13/Early-Stage-Brain-Stroke-Detection.git\n```\n\n* Navigate to the project directory\n```bash\ncd Early-Stage-Brain-Stroke-Detection\n```\n\n* Install the libraries\n```bash\npip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn joblib\n```\n\n* Run the script\n```bash\npython src/stroke_prediction.py\n```\n\n## About the Model\n### Exploratory Data Analysis (EDA)\n- Checking the dataset shape.\n- Identifying data types.\n- Identifying missing values.\n- Visualizing distributions and relationships using box plots, histograms, pair plots, and class distribution plots.\n\n### Data Preprocessing\n- Handling missing values by filling them with the median or mode.\n- Encoding categorical variables using LabelEncoder.\n- Performing feature engineering (e.g., creating new features like bmi_log, bmi_category, glucose_category, age_group).\n- Dropping low-correlation features.\n\n### Principal Component Analysis (PCA)\nThe script applies PCA to reduce dimensionality and visualize the results.\n\n## Model Training\n- K-Nearest Neighbors (KNN)\n- Random Forest\n- Decision Tree\n- Support Vector Machine (SVM)\n\n**Hyperparameter tuning is performed using RandomizedSearchCV.**\n\n## Model Evaluation\n- Accuracy\n- Precision\n- Recall\n- F1 Score\n- ROC AUC\n- PR AUC\n\n## Results Visualization\nThe script visualizes the performance of each model using bar plots, confusion matrices, ROC curves, and Precision-Recall curves.\n\n## Final Model Selection\nThe best model is selected based on the F1 score and saved using joblib.\n\n## Results\nThe final results are displayed in a table format, and the best model is saved to results/best_stroke_model.pkl.\n\n## Conclusion\nThis project demonstrates a comprehensive approach to predicting stroke likelihood using machine learning. The steps involved include data preprocessing, feature engineering, model training, and evaluation. The best model is selected based on performance metrics, and the results are visualized for clarity.\n\n## License\n\u003e [!CAUTION]\n\u003e This project is licensed under the MIT License. Feel free to use and modify the code as per the terms of the license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fifte-13%2Fearly-stage-brain-stroke-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fifte-13%2Fearly-stage-brain-stroke-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fifte-13%2Fearly-stage-brain-stroke-detection/lists"}