https://github.com/ayan6943/employee-attrition-prediction-with-machine-learning
Employee Attrition Prediction with Machine Learning | Analyzing HR data to predict employee turnover using Random Forest. Includes EDA, feature engineering, model training, and evaluation. Achieved 90% accuracy.
- Host: GitHub
- URL: https://github.com/ayan6943/employee-attrition-prediction-with-machine-learning
- Owner: ayan6943
- Created: 2025-03-23T07:09:50.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-23T07:18:56.000Z (7 months ago)
- Last Synced: 2025-03-23T08:19:30.196Z (7 months ago)
- Topics: attrition, employee, machine-learning, matplotlib, numpy, pandas, python, randomforestclassifier, scikit-learn, seaborn, smote
- Language: Jupyter Notebook
- Homepage:
- Size: 242 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Employee Attrition Prediction with Machine Learning (Random Forest & XGBoost)
This project focuses on predicting employee attrition using two machine learning approaches, **Random Forest** and **XGBoost**, trained on the IBM HR Analytics dataset. It aims to help HR departments proactively identify at-risk employees and develop effective retention strategies.
By incorporating **SHAP explainability** with XGBoost, the project not only achieves high accuracy but also provides transparent insights into why employees may leave.
---
## Dataset
- **Source**: [IBM HR Analytics Employee Attrition Dataset](https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset)
- **Features**: Employee demographics, job role, income, overtime, satisfaction, etc.
- **Target**: `Attrition` (Yes/No)

---
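The notebook's own loading code isn't reproduced here, but the two standard preparation steps for this dataset, mapping the Yes/No target to 1/0 and one-hot encoding the categorical features, can be sketched like this (the tiny inline frame is illustrative; the real Kaggle CSV has ~1,470 rows):

```python
import pandas as pd

# Tiny stand-in for the IBM HR dataset (illustrative values only).
df = pd.DataFrame({
    "Age": [29, 41, 35, 50],
    "OverTime": ["Yes", "No", "Yes", "No"],
    "JobRole": ["Sales Executive", "Research Scientist", "Sales Executive", "Manager"],
    "Attrition": ["Yes", "No", "Yes", "No"],   # target column
})

# Map the Yes/No target to 1/0.
y = df["Attrition"].map({"Yes": 1, "No": 0})

# One-hot encode categorical predictors; drop_first avoids redundant columns.
X = pd.get_dummies(df.drop(columns=["Attrition"]), drop_first=True)

print(X.columns.tolist())
```

With `drop_first=True`, binary columns like `OverTime` collapse to a single `OverTime_Yes` indicator.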
## Tools & Technologies
| Component | Tools Used |
|------------------|-------------|
| Language | Python |
| Libraries | Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, SHAP |
| Models | Random Forest, XGBoost |
| Explainability | SHAP (for XGBoost) |
| Imbalance Handling | SMOTE |

---
## Project Workflow
1. **EDA & Data Cleaning**
   - Visualized attrition patterns by role, overtime, and satisfaction
   - Removed irrelevant/constant columns like `StandardHours`
   - Handled missing values and categorical variables
2. **Feature Engineering**
   - Identified key predictors
   - Encoded categorical variables and scaled numerical ones
3. **Modeling**
   - Trained both **Random Forest** and **XGBoost**
   - Addressed class imbalance using **SMOTE**
   - Evaluated using Accuracy, F1-Score, Precision, Recall, ROC-AUC
4. **Explainability (XGBoost only)**
   - Used **SHAP** to visualize feature importance and explain individual predictions

---
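The modeling step can be sketched end-to-end as below. To keep the sketch free of extra dependencies, plain random oversampling of the minority class stands in for SMOTE (the project itself uses `imblearn`'s SMOTE, which synthesizes new minority samples rather than duplicating them); the synthetic data and all parameter values are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic imbalanced data standing in for the encoded HR features
# (roughly 16% positives, similar to the dataset's attrition rate).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Random oversampling of the minority class on the TRAINING split only
# (a simple stand-in for SMOTE; never resample the test set).
minority = np.where(y_train == 1)[0]
majority = np.where(y_train == 0)[0]
rng = np.random.default_rng(42)
resampled = rng.choice(minority, size=len(majority), replace=True)
idx = np.concatenate([majority, resampled])
X_bal, y_bal = X_train[idx], y_train[idx]

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_bal, y_bal)
print(classification_report(y_test, clf.predict(X_test)))
```

Resampling only the training split matters: balancing before the split leaks duplicated minority rows into the test set and inflates the reported metrics.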
## Model Performance Comparison
| Metric | Random Forest | XGBoost + SHAP |
|---------------|---------------|----------------|
| Accuracy | 90% | **92%** |
| Precision | 0.88 / 0.93 | **0.91 / 0.92** |
| Recall | 0.93 / 0.87 | **0.93 / 0.91** |
| F1-Score | 0.90 | **0.92** |
| ROC-AUC       | 0.90          | **0.971**      |

> **XGBoost** outperformed Random Forest, especially in ROC-AUC and class balance, making it more reliable for real-world deployment.
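The table's metrics all come from scikit-learn; the snippet below shows how each one is computed on a toy set of labels and predicted probabilities (the numbers here are illustrative, not the notebook's results):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

# Toy ground truth, hard predictions, and predicted P(attrition).
y_true  = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred  = [0, 0, 1, 1, 1, 1, 0, 0]
y_score = [0.1, 0.2, 0.6, 0.9, 0.8, 0.7, 0.4, 0.3]

print(accuracy_score(y_true, y_pred))    # fraction of correct predictions
print(precision_score(y_true, y_pred))   # of predicted leavers, how many actually left
print(recall_score(y_true, y_pred))      # of actual leavers, how many were caught
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))    # threshold-free ranking quality
```

Note that ROC-AUC is computed from the predicted probabilities rather than the hard labels, which is why it can differ noticeably from accuracy on imbalanced data.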
---
## SHAP Explainability (XGBoost)
- **SHAP Summary Plot**: Visualizes global feature importance
- Most influential features:
  - `OverTime`, `MonthlyIncome`, `JobSatisfaction`, `JobRole`

---
## License
This project is licensed under the MIT License.