{"id":22585839,"url":"https://github.com/ahmed122000/ml_model_deployment","last_synced_at":"2026-05-09T01:12:14.092Z","repository":{"id":236979657,"uuid":"446485509","full_name":"Ahmed122000/ML_model_deployment","owner":"Ahmed122000","description":"The HR Analytics: Job Change Predictor is a Flask-based web application that uses machine learning to predict whether an employee will stay with a company or leave. It allows users to train models, evaluate their performance, and make predictions based on employee data, providing valuable insights for HR decision-making.","archived":false,"fork":false,"pushed_at":"2024-12-25T20:52:53.000Z","size":641,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T17:21:18.497Z","etag":null,"topics":["classification","flask","machine-learning","python3","rest-api","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ahmed122000.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-10T15:49:29.000Z","updated_at":"2024-12-25T20:52:56.000Z","dependencies_parsed_at":"2025-03-28T17:31:50.460Z","dependency_job_id":null,"html_url":"https://github.com/Ahmed122000/ML_model_deployment","commit_stats":null,"previous_names":["ahmed122000/ml_model_deployment"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Ahmed122000/ML_model_deployment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ahmed122000%2FML_model_dep
loyment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ahmed122000%2FML_model_deployment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ahmed122000%2FML_model_deployment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ahmed122000%2FML_model_deployment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ahmed122000","download_url":"https://codeload.github.com/Ahmed122000/ML_model_deployment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ahmed122000%2FML_model_deployment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32803650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"ssl_error","status_checked_at":"2026-05-08T08:22:45.650Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","flask","machine-learning","python3","rest-api","scikit-learn"],"created_at":"2024-12-08T07:09:44.115Z","updated_at":"2026-05-09T01:12:14.086Z","avatar_url":"https://github.com/Ahmed122000.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧠 ML Model Deployment - HR Analytics Job Change 
Predictor\n\n[![Python](https://img.shields.io/badge/Python-3.8%2B-blue?style=flat-square\u0026logo=python)](https://www.python.org/)\n[![Flask](https://img.shields.io/badge/Flask-2.0-black?style=flat-square\u0026logo=flask)](https://flask.palletsprojects.com/)\n[![scikit-learn](https://img.shields.io/badge/scikit--learn-1.0-orange?style=flat-square\u0026logo=scikit-learn)](https://scikit-learn.org/)\n[![Pandas](https://img.shields.io/badge/Pandas-1.3-green?style=flat-square\u0026logo=pandas)](https://pandas.pydata.org/)\n[![License](https://img.shields.io/badge/License-MIT-black?style=flat-square)](LICENSE)\n\n\u003e A production-ready Flask web application that predicts whether a data scientist will stay with a company or leave. Features machine learning model training, evaluation, and interactive predictions with data balancing techniques.\n\n---\n\n## 📑 Table of Contents\n\n- [Overview](#-overview)\n- [Features](#-features)\n- [Tech Stack](#-tech-stack)\n- [Project Structure](#-project-structure)\n- [Installation](#-installation)\n- [Usage](#-usage)\n- [Machine Learning Models](#-machine-learning-models)\n- [Dataset](#-dataset)\n- [Results \u0026 Performance](#-results--performance)\n- [API Endpoints](#-api-endpoints)\n- [Deployment](#-deployment)\n- [License](#-license)\n\n---\n\n## 📊 Overview\n\nThis project builds a predictive model to determine whether data scientists will remain with their current employer or leave for better opportunities. 
The application provides:\n\n- Comparison of **multiple ML algorithms**\n- **Data balancing** techniques (oversampling, undersampling)\n- **Interactive training interface** for experimentation\n- **Real-time predictions** on new employee data\n- **Detailed evaluation metrics** and classification reports\n\n**Business Value**: HR departments can identify at-risk employees and implement retention strategies.\n\n---\n\n## ✨ Features\n\n### 🤖 Machine Learning Capabilities\n\n| Feature | Description |\n|---------|-------------|\n| **Multiple Algorithms** | Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM) |\n| **Data Balancing** | Handle imbalanced classes with oversampling, undersampling, SMOTE |\n| **Cross-Validation** | K-Fold validation for robust model evaluation |\n| **Hyperparameter Tuning** | GridSearchCV for optimal parameters |\n| **Model Persistence** | Save/load trained models with joblib |\n| **Feature Scaling** | StandardScaler for optimal algorithm performance |\n\n### 📈 Analysis \u0026 Reporting\n\n| Feature | Description |\n|---------|-------------|\n| **Classification Metrics** | Precision, Recall, F1-Score, Accuracy, AUC-ROC |\n| **Confusion Matrix** | Confusion matrix visualization |\n| **Train/Test Scores** | Detailed performance on training and test sets |\n| **Classification Report** | Per-class precision, recall, F1-score |\n| **Feature Importance** | Identify most influential features |\n| **ROC Curves** | Receiver Operating Characteristic analysis |\n\n### 🎯 Prediction Features\n\n| Feature | Description |\n|---------|-------------|\n| **Batch Predictions** | Predict on multiple employees at once |\n| **Confidence Scores** | Probability of staying vs leaving |\n| **Feature-wise Explanation** | Understand prediction reasoning |\n| **Historical Comparisons** | Track prediction accuracy over time |\n\n### 🖥️ User Interface\n\n| Feature | Description |\n|---------|-------------|\n| **Interactive Dashboard** | 
Real-time model performance visualization |\n| **Model Comparison** | Compare different algorithms side-by-side |\n| **Training History** | Track all trained models and their metrics |\n| **Download Reports** | Export predictions and analysis as CSV/PDF |\n\n---\n\n## 🛠️ Tech Stack\n\n| Component | Technology |\n|-----------|-----------|\n| **Backend** | Python 3.8+, Flask 2.0 |\n| **ML Libraries** | scikit-learn, XGBoost, LightGBM |\n| **Data Processing** | Pandas, NumPy |\n| **Visualization** | Matplotlib, Seaborn, Plotly |\n| **Model Storage** | joblib |\n| **Frontend** | HTML5, CSS3, JavaScript, Bootstrap |\n| **Deployment** | Gunicorn, Docker |\n\n---\n\n## 📂 Project Structure\n\n```plaintext\nml-model-deployment/\n├── main.py                      # Flask application entry point\n├── train.py                     # Model training logic\n├── predict.py                   # Prediction logic\n├── evaluate.py                  # Model evaluation\n├── data_processor.py            # Data loading \u0026 preprocessing\n├── config.py                    # Configuration settings\n│\n├── requirements.txt             # Python dependencies\n├── Dockerfile                   # Container configuration\n├── docker-compose.yml           # Multi-container setup\n│\n├── data/                        # Training datasets\n│   ├── normal_data.csv         # Original (imbalanced) data\n│   ├── oversample.csv          # Oversampled data\n│   └── undersample_data.csv    # Undersampled data\n│\n├── models/                      # Saved trained models\n│   ├── lr_model.pkl            # Logistic Regression\n│   ├── knn_model.pkl           # KNN model\n│   ├── svm_model.pkl           # SVM model\n│   └── scalers/                # Feature scalers\n│\n├── templates/                   # HTML templates\n│   ├── base.html               # Base template\n│   ├── index.html              # Home page\n│   ├── train.html              # Training interface\n│   ├── predict.html            # Prediction 
interface\n│   ├── results.html            # Results display\n│   └── dashboard.html          # Analytics dashboard\n│\n├── static/                      # Static files\n│   ├── css/\n│   │   ├── style.css           # Custom styling\n│   │   └── bootstrap.min.css\n│   ├── js/\n│   │   ├── script.js           # Client-side logic\n│   │   └── charts.js           # Chart generation\n│   └── images/                 # UI images\n│\n└── README.md                    # This file\n```\n\n---\n\n## 🚀 Installation\n\n### Prerequisites\n\n- Python 3.8 or higher\n- pip (Python package manager)\n- Virtual environment (recommended)\n- 2GB RAM minimum\n\n### Step-by-Step Setup\n\n1. **Clone repository**:\n   ```bash\n   git clone https://github.com/Ahmed122000/ML_model_deployment.git\n   cd ML_model_deployment\n   ```\n\n2. **Create virtual environment**:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   ```\n\n3. **Install dependencies**:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. **Prepare datasets**:\n   ```bash\n   # Ensure these files exist in data/ directory:\n   # - normal_data.csv\n   # - oversample.csv\n   # - undersample_data.csv\n   ```\n\n5. **Run application**:\n   ```bash\n   python main.py\n   ```\n\n6. **Access application**:\n   ```\n   http://localhost:5000\n   ```\n\n---\n\n## 💡 Usage\n\n### Web Interface Navigation\n\n#### 1️⃣ Home Page\n- Project overview\n- Quick links to train/predict\n- Model statistics\n\n#### 2️⃣ Training Models\n\n**Steps**:\n1. Navigate to \"Train Models\" tab\n2. **Select Dataset**:\n   - Normal (original data)\n   - Oversampled (more minority class samples)\n   - Undersampled (fewer majority class samples)\n3. **Choose Algorithm**:\n   - Logistic Regression\n   - K-Nearest Neighbors (KNN)\n   - Support Vector Machine (SVM)\n4. **Optional**: Adjust hyperparameters\n5. Click \"Train Model\"\n6. 
View results:\n   - Train/Test scores\n   - Classification report\n   - Confusion matrix\n   - Feature importance\n\n**Training Output**:\n```\nModel: Logistic Regression\nDataset: Oversampled\nTrain Score: 0.8245\nTest Score: 0.7893\nPrecision: 0.8102\nRecall: 0.7654\nF1-Score: 0.7873\n```\n\n#### 3️⃣ Making Predictions\n\n**Steps**:\n1. Navigate to \"Predict\" tab\n2. Fill employee information:\n   - City Development Index (0.0 - 1.0)\n   - Gender (M/F)\n   - Relevant Experience (Yes/No)\n   - Enrolled in University (Yes/No)\n   - Education Level (High School/Bachelor/Master/PhD)\n   - Major Discipline\n   - Experience (years)\n   - Company Size\n   - Company Type\n   - Last New Job (years)\n   - Training Hours\n\n3. Click \"Predict\"\n4. View prediction result:\n   - **Will Stay** or **Will Leave**\n   - Confidence percentage\n   - Feature contributions\n\n#### 4️⃣ Dashboard\n\n- Compare all trained models\n- View training history\n- Analyze feature importance across models\n- Export reports\n\n---\n\n## 🤖 Machine Learning Models\n\n### 1. Logistic Regression\n\n**When to use**: Baseline model, interpretable results\n\n**Parameters**:\n```python\nLogisticRegression(\n    max_iter=1000,\n    random_state=42,\n    class_weight='balanced'\n)\n```\n\n**Pros**:\n- Fast training\n- Highly interpretable\n- Good for linearly separable data\n\n**Cons**:\n- Assumes linear relationship\n- Less effective with complex patterns\n\n---\n\n### 2. K-Nearest Neighbors (KNN)\n\n**When to use**: Non-linear patterns, small-medium datasets\n\n**Parameters**:\n```python\nKNeighborsClassifier(\n    n_neighbors=5,\n    weights='distance',\n    metric='euclidean'\n)\n```\n\n**Pros**:\n- Captures non-linear patterns\n- No training phase\n- Effective for local patterns\n\n**Cons**:\n- Slow prediction time\n- Sensitive to feature scaling\n- Memory intensive\n\n---\n\n### 3. 
Support Vector Machine (SVM)\n\n**When to use**: High-dimensional data, maximum margin classification\n\n**Parameters**:\n```python\nSVC(\n    kernel='rbf',\n    C=1.0,\n    gamma='scale',\n    probability=True,\n    random_state=42\n)\n```\n\n**Pros**:\n- Effective in high dimensions\n- Robust to outliers\n- Strong theoretical foundation\n\n**Cons**:\n- Slower training\n- Requires feature scaling\n- Hard to interpret\n\n---\n\n### Data Balancing Techniques\n\n#### Original Distribution\n```\nStaying: 75% (majority)\nLeaving: 25% (minority)\n```\n\n#### Oversampling\n```\nRandomly duplicate minority class samples\nResult: both classes at the majority-class count (balanced 50/50 distribution)\n```\n\n#### Undersampling\n```\nRandomly remove majority class samples\nResult: both classes at the minority-class count (balanced 50/50 distribution)\n```\n\n---\n\n## 📊 Dataset\n\n### Features (11 input features + target)\n\n| Feature | Type | Range/Values | Description |\n|---------|------|--------------|-------------|\n| city_development_index | float | 0.0 - 1.0 | City development level |\n| gender | categorical | M/F | Employee gender |\n| relevant_experience | binary | Yes/No | Has relevant experience |\n| enrolled_university | categorical | Full-time/Part-time/No | University enrollment |\n| education_level | categorical | HS/Bachelor/Master/PhD | Highest education |\n| major_discipline | categorical | STEM/Business/Humanities | Field of study |\n| experience | integer | 0-50 | Years of experience |\n| company_size | categorical | Startup/MNC/Unicorn | Company size |\n| company_type | categorical | IT/Service/Healthcare | Industry type |\n| last_new_job | integer | 0-5 | Years at current job |\n| training_hours | integer | 0-500 | Professional training hours |\n| **target** | **binary** | **0/1** | **0=Stays, 1=Leaves** |\n\n### Dataset Size\n\n- **Total Records**: 19,158 employees\n- **Training Set**: 70% (13,410 records)\n- **Test Set**: 30% (5,748 records)\n- **Missing Values**: \u003c 2% (handled)\n- **Class Imbalance**: 75% staying vs 25% leaving\n\n### Data 
Preprocessing\n\n```python\n# Steps applied:\n# 1. Load CSV data\n# 2. Handle missing values (mean/mode imputation)\n# 3. Encode categorical variables (LabelEncoder)\n# 4. Scale numerical features (StandardScaler)\n# 5. Split train/test (80/20)\n# 6. Handle class imbalance (oversample/undersample)\n```\n\n---\n\n## 📈 Results \u0026 Performance\n\n### Model Comparison (on test set)\n\n| Metric | Logistic Regression | KNN (k=5) | SVM (RBF) |\n|--------|-------------------|-----------|----------|\n| **Accuracy** | 78.23% | 76.45% | 79.12% |\n| **Precision** | 0.7891 | 0.7654 | 0.8023 |\n| **Recall** | 0.7456 | 0.7234 | 0.7789 |\n| **F1-Score** | 0.7667 | 0.7440 | 0.7904 |\n| **AUC-ROC** | 0.8234 | 0.8012 | 0.8456 |\n| **Training Time** | 2.3s | 0.5s | 45.2s |\n\n### Best Performing Model: SVM\n- Highest accuracy and F1-score\n- Good balance between precision and recall\n- Acceptable training time\n\n---\n\n## 🔌 API Endpoints\n\n### Flask Routes\n\n| Endpoint | Method | Purpose |\n|----------|--------|---------|\n| `/` | GET | Home page |\n| `/train` | GET, POST | Training interface |\n| `/predict` | GET, POST | Prediction interface |\n| `/results` | GET | View training results |\n| `/dashboard` | GET | Analytics dashboard |\n| `/api/train-model` | POST | Train model (JSON API) |\n| `/api/predict` | POST | Make prediction (JSON API) |\n| `/api/models` | GET | List trained models |\n| `/api/export` | GET | Export results as CSV |\n\n### API Examples\n\n**Train Model**:\n```bash\ncurl -X POST http://localhost:5000/api/train-model \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"algorithm\": \"svm\",\n    \"dataset\": \"oversample\"\n  }'\n```\n\n**Make Prediction**:\n```bash\ncurl -X POST http://localhost:5000/api/predict \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"city_development_index\": 0.92,\n    \"gender\": \"M\",\n    \"relevant_experience\": \"Yes\",\n    \"experience\": 3,\n    \"training_hours\": 40\n  }'\n```\n\n---\n\n## 🐳 
Deployment\n\n### Docker Setup\n\n1. **Build image**:\n   ```bash\n   docker build -t ml-predictor:latest .\n   ```\n\n2. **Run container**:\n   ```bash\n   docker run -p 5000:5000 ml-predictor:latest\n   ```\n\n3. **Using Docker Compose**:\n   ```bash\n   docker-compose up\n   ```\n\n### Production Deployment\n\n**Using Gunicorn**:\n```bash\ngunicorn --workers 4 --bind 0.0.0.0:5000 main:app\n```\n\n**On Heroku**:\n```bash\nheroku login\nheroku create ml-predictor\ngit push heroku main\n```\n\n---\n\n## 🧪 Testing\n\n### Run Tests\n```bash\npython -m pytest tests/\n```\n\n### Test Coverage\n- Unit tests for model training\n- Integration tests for API endpoints\n- Data preprocessing tests\n- Prediction accuracy tests\n\n---\n\n## 🐛 Troubleshooting\n\n### Issue: \"ModuleNotFoundError\"\n**Solution**: Install requirements\n```bash\npip install -r requirements.txt\n```\n\n### Issue: \"FileNotFoundError: data files\"\n**Solution**: Ensure CSV files exist in `data/` directory\n\n### Issue: \"Port 5000 already in use\"\n**Solution**: Use different port\n```bash\npython main.py --port 5001\n```\n\n---\n\n## 📈 Future Enhancements\n\n- [ ] Deep learning models (Neural Networks)\n- [ ] Real-time data streaming\n- [ ] Advanced feature engineering\n- [ ] Model explainability (SHAP, LIME)\n- [ ] A/B testing framework\n- [ ] Automated retraining pipeline\n- [ ] Mobile app integration\n- [ ] Multi-language support\n- [ ] Advanced visualization dashboards\n- [ ] REST API v2\n\n---\n\n## 📝 Contributing\n\n1. Fork repository\n2. Create feature branch (`git checkout -b feature/improvement`)\n3. Commit changes (`git commit -m 'Add improvement'`)\n4. Push to branch (`git push origin feature/improvement`)\n5. 
Open Pull Request\n\n---\n\n## 📄 License\n\nThis project is licensed under the **MIT License** - see [LICENSE](LICENSE) for details.\n\n---\n\n## 🙏 Acknowledgments\n\n- [Kaggle](https://www.kaggle.com/) - HR Analytics dataset\n- [scikit-learn](https://scikit-learn.org/) - ML algorithms\n- [Flask](https://flask.palletsprojects.com/) - Web framework\n- [Pandas](https://pandas.pydata.org/) - Data processing\n\n---\n\n## 👨‍💻 Author\n\n**Ahmed Hesham** - [@Ahmed122000](https://github.com/Ahmed122000)\n\n**Built with ❤️ for HR Analytics \u0026 ML Deployment**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmed122000%2Fml_model_deployment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahmed122000%2Fml_model_deployment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmed122000%2Fml_model_deployment/lists"}