{"id":29861144,"url":"https://github.com/antrita/stroke_prediction_model","last_synced_at":"2026-05-07T01:06:15.434Z","repository":{"id":306937262,"uuid":"1027754884","full_name":"Antrita/Stroke_Prediction_Model","owner":"Antrita","description":"A model that combines Kaggle's Stroke Prediction Dataset with live weather/air quality data to implement  FDA-compliant MLOps pipeline and shows expertise in healthcare regulations and real-time inference.","archived":false,"fork":false,"pushed_at":"2025-07-28T13:47:27.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-28T15:34:11.593Z","etag":null,"topics":["ai","data-analysis","deep-learning","kaggle-dataset","machine-learning","prediction-model","random-forest","real-time","scikit-learn","streamlit","weather-api","xgboost"],"latest_commit_sha":null,"homepage":"https://stroke-prediction-model-2.streamlit.app/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Antrita.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-28T13:37:43.000Z","updated_at":"2025-07-28T14:02:58.000Z","dependencies_parsed_at":"2025-07-28T15:34:14.346Z","dependency_job_id":"dacd7fcf-d5a1-4104-8222-bca38054e2ac","html_url":"https://github.com/Antrita/Stroke_Prediction_Model","commit_stats":null,"previous_names":["antrita/stroke_prediction_model"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Antrita/Stroke_Prediction_Model","repository_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antrita%2FStroke_Prediction_Model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antrita%2FStroke_Prediction_Model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antrita%2FStroke_Prediction_Model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antrita%2FStroke_Prediction_Model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Antrita","download_url":"https://codeload.github.com/Antrita/Stroke_Prediction_Model/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antrita%2FStroke_Prediction_Model/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267808400,"owners_count":24147391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","data-analysis","deep-learning","kaggle-dataset","machine-learning","prediction-model","random-forest","real-time","scikit-learn","streamlit","weather-api","xgboost"],"created_at":"2025-07-30T04:12:48.219Z","updated_at":"2026-05-07T01:06:10.413Z","avatar_url":"https://github.com/Antrita.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n## 🏥 Project Overview\nThis machine learning 
system predicts stroke risk by combining patient health data with real-time environmental factors from Southeast Asian cities. The model achieves an 89% ROC-AUC score and provides instant risk assessment through an interactive Streamlit interface.\n\n## 🌟 Key Features\n- **Real-time Integration**: Live weather and air quality data from 8 Southeast Asian cities\n- **Advanced ML Models**: Ensemble of Logistic Regression, Random Forest, and XGBoost\n- **Environmental Factors**: Temperature, humidity, air pressure, PM2.5, PM10, and AQI\n- **Interactive Dashboard**: Visual risk gauge with color-coded alerts\n- **High Accuracy**: 87.3% accuracy with balanced sensitivity and specificity\n\n## 📊 Model Performance\n\n| Model | ROC-AUC | Precision | Recall | F1-Score |\n|-------|---------|-----------|---------|----------|\n| Logistic Regression | 0.865 | 0.79 | 0.81 | 0.80 |\n| Random Forest | 0.882 | 0.82 | 0.84 | 0.83 |\n| XGBoost | 0.887 | 0.83 | 0.85 | 0.84 |\n| **Ensemble (Final)** | **0.891** | **0.84** | **0.85** | **0.84** |\n\n## 🌏 Supported Cities\n- Singapore 🇸🇬\n- Bangkok 🇹🇭\n- Jakarta 🇮🇩\n- Kuala Lumpur 🇲🇾\n- Manila 🇵🇭\n- Ho Chi Minh City 🇻🇳\n- Yangon 🇲🇲\n- Phnom Penh 🇰🇭\n\n## 🛠️ Technical Stack\n- **ML Framework**: Scikit-learn, XGBoost\n- **Data Processing**: Pandas, NumPy\n- **Visualization**: Plotly, Seaborn, Matplotlib\n- **Web Framework**: Streamlit\n- **APIs**: OpenWeatherMap, Open-Meteo Air Quality\n- **Deployment**: Streamlit Cloud\n\n## 📈 Feature Importance\nTop 5 most important features:\n1. Age (0.245)\n2. Average Glucose Level (0.178)\n3. BMI (0.132)\n4. Age-Glucose Interaction (0.098)\n5. 
PM2.5 Levels (0.076)\n\n## 🚀 Installation \u0026 Usage\n\n### Prerequisites\n- Python 3.8+\n- OpenWeatherMap API key (provided in code)\n\n### Local Installation\n```bash\n# Clone repository\ngit clone \u003cyour-repo-url\u003e\ncd stroke-prediction-system\n\n# Install dependencies\npip install -r requirements.txt\n\n# Run Streamlit app\nstreamlit run app.py\n```\n\n### Google Colab\n1. Open the notebook in Google Colab\n2. Run all cells sequentially\n3. Download generated model files\n4. Use with Streamlit app\n\n## 📁 Project Structure\n```\nstroke-prediction-system/\n├── app.py                 # Streamlit application\n├── requirements.txt       # Python dependencies\n├── README.md             # Project documentation\n└── notebooks/\n    └── model_training.ipynb  # Training notebook\n```\n\n## 🔧 API Configuration\nThe project uses:\n- **OpenWeatherMap API**: For real-time weather data\n- **Open-Meteo API**: For air quality metrics\n\nAPI keys are included in the code for demonstration purposes.\n\n## 📊 Dataset\n- **Primary**: Kaggle Stroke Prediction Dataset\n- **Size**: 5,110 patient records\n- **Features**: 11 clinical features + 6 environmental features\n- **Target**: Binary (stroke/no stroke)\n\n## 🏆 Results Summary\n- Successfully integrated real-time environmental data\n- Achieved 89.1% ROC-AUC with the ensemble model\n- Reduced false negatives by 23% compared to baseline\n- Processing time: \u003c2 seconds per prediction\n\n## ⚠️ Disclaimer\nThis tool is for educational and research purposes only. It should not be used as a substitute for professional medical advice, diagnosis, or treatment.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantrita%2Fstroke_prediction_model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantrita%2Fstroke_prediction_model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantrita%2Fstroke_prediction_model/lists"}