https://github.com/ayan6943/employee-attrition-prediction-with-machine-learning

Employee Attrition Prediction with Machine Learning | Analyzing HR data to predict employee turnover using Random Forest. Includes EDA, feature engineering, model training, and evaluation. Achieved 90% accuracy.

# 🧠 Employee Attrition Prediction with Machine Learning (Random Forest & XGBoost)

This project predicts employee attrition using two machine learning approaches, **Random Forest** and **XGBoost**, both trained on the IBM HR Analytics dataset. It aims to help HR departments proactively identify at-risk employees and develop effective retention strategies.

Pairing XGBoost with **SHAP explainability** means the project not only achieves high accuracy but also shows, feature by feature, why employees may leave.

---

## 📊 Dataset

- **Source**: [IBM HR Analytics Employee Attrition Dataset](https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset)
- **Features**: Employee demographics, job role, income, overtime, satisfaction, etc.
- **Target**: `Attrition` (Yes/No)
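
As a minimal sketch of how the `Attrition` target might be prepared, the snippet below maps Yes/No to 1/0 and checks the class balance. The rows are made up for illustration rather than taken from the real CSV, and the `AttritionFlag` column name is a hypothetical choice, not something the repository is known to use.

```python
import pandas as pd

# Tiny made-up sample standing in for the real CSV (values are illustrative only).
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
    "OverTime": ["Yes", "No", "No", "Yes", "Yes", "No"],
    "MonthlyIncome": [2100, 5400, 6100, 3900, 2500, 7200],
})

# Map the Yes/No target to 1/0 so classifiers and metrics can consume it.
# "AttritionFlag" is a hypothetical column name used only in this sketch.
df["AttritionFlag"] = df["Attrition"].map({"Yes": 1, "No": 0})

# Share of each class; a skewed split is the reason imbalance handling is needed.
class_share = df["Attrition"].value_counts(normalize=True)
print(class_share)
```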

---

## 🧰 Tools & Technologies

| Component | Tools Used |
|------------------|-------------|
| Language | Python |
| Libraries | Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, SHAP |
| Models | Random Forest, XGBoost |
| Explainability | SHAP (for XGBoost) |
| Imbalance Handling | SMOTE |

---

## 🧪 Project Workflow

1. **EDA & Data Cleaning**
   - Visualized attrition patterns by role, overtime, and satisfaction
   - Removed irrelevant/constant columns like `StandardHours`
   - Handled missing values and categorical variables

2. **Feature Engineering**
   - Identified key predictors
   - Encoded categorical variables and scaled numerical ones

3. **Modeling**
   - Trained both **Random Forest** and **XGBoost**
   - Addressed class imbalance using **SMOTE**
   - Evaluated using Accuracy, F1-Score, Precision, Recall, and ROC-AUC

4. **Explainability (XGBoost only)**
   - Used **SHAP** to visualize feature importance and explain individual predictions

---

## 📈 Model Performance Comparison

| Metric | Random Forest | XGBoost + SHAP |
|---------------|---------------|----------------|
| Accuracy | 90% | **92%** |
| Precision | 0.88 / 0.93 | **0.91 / 0.92** |
| Recall | 0.93 / 0.87 | **0.93 / 0.91** |
| F1-Score | 0.90 | **0.92** |
| ROC-AUC | 0.90 | **0.971** ✅ |

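> 🚀 **XGBoost** outperformed Random Forest on accuracy, F1-score, and especially ROC-AUC (0.971 vs. 0.90), and its precision and recall are more evenly balanced across the two classes, which makes it the more reliable choice for real-world deployment.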
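
The paired precision and recall values in the table appear to be per-class scores, although the table does not say which class comes first. The sketch below shows how such per-class figures, plus accuracy and ROC-AUC, can be computed with scikit-learn. The labels and scores are a hand-made toy example, not the project's predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    roc_auc_score,
)

# Toy ground truth and predicted probabilities for the positive class (1 = left).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.9, 0.8, 0.7, 0.55]

# Hard predictions at the usual 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

# One value per class, in the order given by `labels` (class 0, then class 1).
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1]
)
accuracy = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # uses scores, not hard predictions

print("precision per class:", precision)
print("recall per class:   ", recall)
print("accuracy:", accuracy, "ROC-AUC:", auc)
```

ROC-AUC is computed from the predicted probabilities rather than the thresholded labels, so two models can have similar accuracy and still differ noticeably on ROC-AUC.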

---

## 🔍 SHAP Explainability (XGBoost)

- **SHAP Summary Plot**: Visualizes global feature importance
- **Most influential features**: `OverTime`, `MonthlyIncome`, `JobSatisfaction`, `JobRole`

---

## 📄 License

This project is licensed under the MIT License.