https://github.com/d4rk-lucif3r/ana-ml-demo
https://github.com/d4rk-lucif3r/ana-ml-demo
Last synced: 3 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/d4rk-lucif3r/ana-ml-demo
- Owner: d4rk-lucif3r
- Created: 2025-01-06T09:37:21.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-01-06T09:54:57.000Z (5 months ago)
- Last Synced: 2025-03-01T13:55:32.517Z (4 months ago)
- Language: Python
- Size: 1.24 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Healthcare Camp Attendance Predictor 🏥
## 👋 Created by Ana
[Ana](https://openana.ai) is a sophisticated AI Software Engineer with extensive expertise in software engineering, machine learning, and data science. With a friendly and supportive approach, Ana helps developers and data scientists build better applications and gain deeper insights from their data.
Visit [openana.ai](https://openana.ai) to learn more about Ana's capabilities and how she can help with your projects! 🚀
[](https://ana-ml-demo.streamlit.app/) [](https://www.kaggle.com/datasets/shivan118/healthcare-analytics)
An advanced machine learning application for predicting and analyzing healthcare camp attendance patterns. This project helps MedCamp optimize their healthcare camp resources and improve attendance rates through sophisticated ML predictions and comprehensive insights.
## 🎯 Project Overview
MedCamp organizes health camps in several cities targeting working professionals with low work-life balance. The organization has conducted 65 such events over 4 years, accumulating data from approximately 110,000 registrations. This application addresses a critical challenge: optimizing inventory management while maintaining service quality.
### Business Problem
MedCamp faces two key challenges:
1. **High Drop-off Rate**: Significant difference between registration numbers and actual attendance
2. **Inventory Management**:
- Excess inventory leads to unnecessary costs
- Insufficient inventory results in poor participant experience### Solution Approach
Our application provides:
- Predictive analytics for attendance probability
- Comprehensive data exploration tools
- Multiple ML model options with explainable AI
- Feature importance analysis for better decision-making## 📊 Dataset Details
Source: [Healthcare Analytics Dataset on Kaggle](https://www.kaggle.com/datasets/shivan118/healthcare-analytics)
### Files Structure
1. **Health_Camp_Detail.csv**
- Health_Camp_ID
- Camp_Start_Date
- Camp_End_Date
- Category details2. **Train.csv**
- Registration details
- Patient_ID
- Health_Camp_ID
- Registration_Date
- Anonymized variables3. **Patient_Profile.csv**
- Patient_ID
- Online_Follower
- Social media details
- Income
- Education
- Age
- First_Interaction_Date
- City_Type
- Employer_Category4. **First_Health_Camp_Attended.csv**
- Format 1 camp details
- Donation amounts
- Health_Score5. **Second_Health_Camp_Attended.csv**
- Format 2 camp details
- Health_Score6. **Third_Health_Camp_Attended.csv**
- Format 3 camp details
- Number_of_stall_visited
- Last_Stall_Visited_Number### Camp Formats
1. **Format 1**: Provides instantaneous health score + Donation based
2. **Format 2**: Provides instantaneous health score
3. **Format 3**: Awareness through various health stalls### Target Variable Definition
A "favorable outcome" is defined as:
- **Format 1 & 2**: Patient receives a health score
- **Format 3**: Patient visits at least one stall## 🔧 Technical Implementation
### Tech Stack
#### Frontend
- **Streamlit**: Interactive web interface
- **Plotly**: Dynamic visualizations
- **Matplotlib/Seaborn**: Statistical plots#### Backend/ML
- **Core**: Python 3.8+
- **Data Processing**:
- pandas
- numpy
- scikit-learn
- **ML Models**:
- Random Forest
- XGBoost
- LightGBM
- Logistic Regression
- CatBoost
- **Model Explainability**: SHAP (SHapley Additive exPlanations)### Key Features
1. **Exploratory Data Analysis** 📊
- Temporal attendance patterns
- Demographic analysis
- Registration vs. attendance correlation
- Category-wise performance metrics2. **Model Training Interface** 🤖
- Multiple algorithm options
- Hyperparameter tuning
- Cross-validation
- Performance metrics visualization3. **Feature Analysis** 🎯
- Importance rankings
- Correlation studies
- Interactive feature exploration4. **Model Explainability** 🔍
- SHAP value analysis
- Feature contribution plots
- Individual prediction explanations5. **Model Management** 💾
- Save/load functionality
- Version tracking
- Performance comparison## 🚀 Getting Started
### Prerequisites
```bash
Python 3.8+
pip
git
```### Installation
1. Clone the repository:
```bash
git clone [your-repo-url]
cd [repo-name]
```2. Install dependencies:
```bash
pip install -r requirements.txt
```3. Run the application:
```bash
streamlit run app.py
```## 📱 Usage Guide
### 1. Data Exploration
- View comprehensive camp statistics
- Analyze temporal patterns
- Explore demographic distributions
- Investigate feature correlations### 2. Model Training
- Select relevant features
- Choose ML algorithm
- Configure model parameters
- Train and evaluate performance### 3. Model Interpretation
- Examine global feature importance
- Analyze individual predictions
- Understand model decisions
- Export insights## 📈 Model Evaluation
### Metrics
- Primary: ROC-AUC Score
- Supporting metrics:
- Precision
- Recall
- F1 Score
- Confusion Matrix### Validation Strategy
- Train/Test split based on temporal cutoff
- Camps before March 31st, 2006: Training
- Camps after April 1st, 2006: Testing## 🌟 Live Demo
Experience the application: [Healthcare Camp Predictor](https://ana-ml-demo.streamlit.app/)
### Demo Features
- Interactive data exploration
- Real-time model training
- Dynamic visualizations
- Instant predictions## 🤝 Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request## 📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- Dataset provided by [Kaggle](https://www.kaggle.com/datasets/shivan118/healthcare-analytics)
- MedCamp for the valuable healthcare initiative
- Streamlit for the excellent web framework---
Built with ❤️ by [Ana](https://openana.ai) | [Live Demo](https://ana-ml-demo.streamlit.app/)