https://github.com/ayan6943/employee-attrition-prediction-with-machine-learning

Employee Attrition Prediction with Machine Learning | Analyzing HR data to predict employee turnover using Random Forest. Includes EDA, feature engineering, model training, and evaluation. Achieved 90% accuracy.

Host: GitHub
URL: https://github.com/ayan6943/employee-attrition-prediction-with-machine-learning
Owner: ayan6943
Created: 2025-03-23T07:09:50.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-03-23T07:18:56.000Z (7 months ago)
Last Synced: 2025-03-23T08:19:30.196Z (7 months ago)
Topics: attrition, employee, machine-learning, matplotlib, numpy, pandas, python, randomforestclassifier, scikit-learn, seaborn, smote
Language: Jupyter Notebook
Homepage:
Size: 242 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 🧠 Employee Attrition Prediction with Machine Learning (Random Forest & XGBoost)

This project focuses on predicting employee attrition using two machine learning approaches — **Random Forest** and **XGBoost** — trained on the IBM HR Analytics dataset. It aims to help HR departments proactively identify at-risk employees and develop effective retention strategies.

By pairing XGBoost with **SHAP explainability**, the project achieves high accuracy and also shows which features drive each prediction, giving transparent insight into the factors associated with employees leaving.

---

## 📊 Dataset

- **Source**: [IBM HR Analytics Employee Attrition Dataset](https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset)
- **Features**: Employee demographics, job role, income, overtime, satisfaction, etc.
- **Target**: `Attrition` (Yes/No)
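A minimal preparation sketch, assuming the Kaggle CSV has already been read into a pandas DataFrame (for example with `pd.read_csv`). The helper name `prepare_attrition_frame` is hypothetical and this is not the notebook's code. It encodes the `Attrition` target as 1 (Yes) / 0 (No) and drops single-valued columns such as `StandardHours`, which the workflow below describes removing.

```python
import pandas as pd


def prepare_attrition_frame(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: split an IBM HR-style frame into features and a 0/1 target."""
    # Encode the target: "Yes" (employee left) -> 1, "No" (stayed) -> 0.
    y = df["Attrition"].map({"Yes": 1, "No": 0}).astype(int)

    X = df.drop(columns=["Attrition"])
    # Constant columns (a single unique value) carry no predictive signal.
    constant_cols = [c for c in X.columns if X[c].nunique(dropna=False) <= 1]
    X = X.drop(columns=constant_cols)
    return X, y


if __name__ == "__main__":
    # Tiny made-up frame, only to show the behaviour.
    toy = pd.DataFrame({
        "Age": [41, 49, 37],
        "OverTime": ["Yes", "No", "Yes"],
        "StandardHours": [80, 80, 80],
        "Attrition": ["Yes", "No", "Yes"],
    })
    X, y = prepare_attrition_frame(toy)
    print(list(X.columns))  # ['Age', 'OverTime']
    print(y.tolist())       # [1, 0, 1]
```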

---

## 🧰 Tools & Technologies

| Component | Tools Used |
|------------------|-------------|
| Language | Python |
| Libraries | Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, SHAP |
| Models | Random Forest, XGBoost |
| Explainability | SHAP (for XGBoost) |
| Imbalance Handling | SMOTE |
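SMOTE balances the classes by synthesizing new minority-class rows instead of duplicating existing ones: each synthetic row lies on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below is a from-scratch NumPy/scikit-learn illustration of that idea, not the `imbalanced-learn` `SMOTE` class commonly used for this; the function name `smote_oversample` is hypothetical, and it assumes a binary target with purely numeric (already encoded) features.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_oversample(X, y, minority_label=1, k=5, random_state=0):
    """Illustrative SMOTE for a binary target with numeric features.

    Adds synthetic minority rows until both classes have the same size.
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum()) - len(X_min)
    if n_needed <= 0:
        return X, y  # nothing to do: the "minority" class is not smaller
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")

    # Ask for k_eff + 1 neighbours because each point is its own nearest neighbour.
    k_eff = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k_eff + 1).fit(X_min)
    _, neighbour_idx = nn.kneighbors(X_min)

    base = rng.integers(0, len(X_min), size=n_needed)  # random minority "seed" rows
    pick = rng.integers(1, k_eff + 1, size=n_needed)   # a neighbour, skipping column 0 (self)
    neighbours = X_min[neighbour_idx[base, pick]]
    gap = rng.random((n_needed, 1))                    # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (neighbours - X_min[base])

    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_res, y_res
```

Resampling should be applied to the training split only (for example `X_train_res, y_train_res = smote_oversample(X_train, y_train)` after `train_test_split`), so that synthetic rows never leak into the data used for evaluation.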

---

## 🧪 Project Workflow

1. **EDA & Data Cleaning**
   - Visualized attrition patterns by job role, overtime and satisfaction
   - Removed irrelevant or constant columns such as `StandardHours`
   - Handled missing values and categorical variables

2. **Feature Engineering**
   - Identified key predictors
   - Encoded categorical variables and scaled numerical ones

3. **Modeling**
   - Trained both **Random Forest** and **XGBoost**
   - Addressed class imbalance using **SMOTE**
   - Evaluated using Accuracy, F1-Score, Precision, Recall and ROC-AUC

4. **Explainability (XGBoost only)**
   - Used **SHAP** to visualize feature importance and explain individual predictions
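The encoding, scaling and Random Forest steps can be chained in a scikit-learn `Pipeline`. This is a minimal sketch, not the notebook's code: the `build_rf_pipeline` name, the column lists and the hyperparameters are assumptions, the data is synthetic, and SMOTE is left out here. A stratified split keeps the Yes/No ratio the same in the training and test sets.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_rf_pipeline(categorical_cols, numeric_cols, random_state=42):
    """Hypothetical helper: one-hot encode categoricals, scale numerics, fit a Random Forest."""
    preprocess = ColumnTransformer([
        # handle_unknown="ignore" avoids errors if the test split has an unseen category.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        # Trees do not need scaling; it is kept only to mirror the workflow above.
        ("num", StandardScaler(), numeric_cols),
    ])
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    return Pipeline([("preprocess", preprocess), ("model", model)])


if __name__ == "__main__":
    # Synthetic stand-in data: column names mimic the IBM dataset, values are random.
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "OverTime": rng.choice(["Yes", "No"], size=n),
        "JobRole": rng.choice(["Sales Executive", "Research Scientist"], size=n),
        "MonthlyIncome": rng.integers(1000, 20000, size=n),
    })
    y = ((df["OverTime"] == "Yes") & (df["MonthlyIncome"] < 8000)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        df, y, test_size=0.25, stratify=y, random_state=42
    )
    pipe = build_rf_pipeline(["OverTime", "JobRole"], ["MonthlyIncome"])
    pipe.fit(X_train, y_train)
    print("test accuracy:", pipe.score(X_test, y_test))
```

Wrapping preprocessing in the pipeline means the encoder and scaler are fitted on the training split only, which avoids leaking test-set statistics into the model.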

---

## 📈 Model Performance Comparison

| Metric | Random Forest | XGBoost |
|-----------------------|---------------|----------------|
| Accuracy | 90% | **92%** |
| Precision (per class) | 0.88 / 0.93 | **0.91 / 0.92** |
| Recall (per class) | 0.93 / 0.87 | **0.93 / 0.91** |
| F1-Score | 0.90 | **0.92** |
| ROC-AUC | 0.90 | **0.971** ✅ |

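> 🚀 **XGBoost** outperformed Random Forest on accuracy, F1-score and especially ROC-AUC (0.971 vs. 0.90), and its per-class precision and recall are more evenly balanced, which makes it the stronger candidate for real-world deployment.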
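A sketch of how the table's metrics can be computed with scikit-learn. It assumes class 0 = No and class 1 = Yes, that `y_score` holds predicted probabilities for class 1 (e.g. `predict_proba(X_test)[:, 1]`), and that the single F1-Score is a macro average; the README does not say which average was used, and the `attrition_metrics` name is hypothetical. ROC-AUC is computed from scores rather than hard labels because it measures ranking quality across all thresholds.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_recall_fscore_support,
    roc_auc_score,
)


def attrition_metrics(y_true, y_pred, y_score):
    """Hypothetical helper returning the metrics used in the comparison table.

    Precision and recall are per class, ordered (class 0 = No, class 1 = Yes).
    F1 is the macro average, which is an assumption about how the table was built.
    """
    precision, recall, _, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[0, 1], zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": tuple(np.round(precision, 2)),
        "recall": tuple(np.round(recall, 2)),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        # ROC-AUC must use continuous scores, not hard 0/1 predictions.
        "roc_auc": roc_auc_score(y_true, y_score),
    }
```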

---

## 🔍 SHAP Explainability (XGBoost)

- **SHAP Summary Plot**: Visualizes global feature importance
- **Most influential features**: `OverTime`, `MonthlyIncome`, `JobSatisfaction`, `JobRole`
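The SHAP summary plot orders features by the magnitude of their SHAP values across all rows. Assuming a SHAP value matrix has already been computed (for a binary tree model this is typically an `(n_samples, n_features)` array from `shap.TreeExplainer(model).shap_values(X)`), the same global ranking can be reproduced with pandas as the mean absolute SHAP value per feature. `rank_features_by_shap` is a hypothetical helper, and the numbers in the example are made up for illustration.

```python
import numpy as np
import pandas as pd


def rank_features_by_shap(shap_values, feature_names):
    """Hypothetical helper: global importance = mean |SHAP value| per feature, largest first."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("expected an (n_samples, n_features) SHAP matrix")
    importance = np.abs(shap_values).mean(axis=0)
    return pd.Series(importance, index=feature_names).sort_values(ascending=False)


if __name__ == "__main__":
    # Made-up SHAP values for three employees, only to show the computation.
    sv = [[0.8, -0.3, 0.1],
          [-0.6, 0.2, -0.1],
          [0.7, -0.4, 0.0]]
    print(rank_features_by_shap(sv, ["OverTime", "MonthlyIncome", "JobSatisfaction"]))
```

Taking absolute values matters: a feature such as `OverTime` can push some predictions towards attrition and others away from it, and those contributions would cancel out in a plain mean. If categoricals like `JobRole` were one-hot encoded, the SHAP values of their dummy columns can be summed per row before ranking, since SHAP values are additive.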

---

## 📄 License

This project is licensed under the MIT License.
