https://github.com/sevilaymuni/project-no.6-tree-based-models
Random Forest Assisted Suggestions for Salifort Motors Employee Retention: Plan, Analyze, Construct and Execute
data-science decision-trees evaluation-metrics gridsearchcv logistic-regression machine-learning matplotlib python random-forest-classifier scikit-learn seaborn-plots
- Host: GitHub
- URL: https://github.com/sevilaymuni/project-no.6-tree-based-models
- Owner: SevilayMuni
- Created: 2024-05-27T16:34:39.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-09-16T11:07:14.000Z (4 months ago)
- Last Synced: 2024-09-16T12:44:38.561Z (4 months ago)
- Topics: data-science, decision-trees, evaluation-metrics, gridsearchcv, logistic-regression, machine-learning, matplotlib, python, random-forest-classifier, scikit-learn, seaborn-plots
- Language: Jupyter Notebook
- Homepage:
- Size: 5.99 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Random Forest Assisted Suggestions for Salifort Motors Employee Retention
Salifort Motors wants to improve employee retention and learn what determines whether employees leave or stay with the company.
I built logistic regression and tree-based machine learning models. The final random forest model achieved:
- precision of 97.7%
- recall of 92.5%
- accuracy of 98.4%

All in all, I created powerful decision tree and random forest models and **uncovered important variables for employee retention**:
- satisfaction level of the employee
- performance review of the employee
- number of projects the employee contributes to
- tenure
- overwork

[Feature importances: random forest champion model](https://github.com/SevilayMuni/Project-No.6-XGBoost-RandomForest/blob/main/graphs/Feature-Importances-Random-Forest-ChampionModel.png)

[Feature importances: decision tree](https://github.com/SevilayMuni/Project-No.6-XGBoost-RandomForest/blob/main/graphs/Feature-Importances-Decision-Tree.png)

## Business Understanding
The HR department at Salifort Motors wants to improve employee satisfaction levels and answer the following question: what’s likely to make an employee leave the company?

The project goals:
- analyze the data collected by the HR department
- build a model that predicts whether or not an employee will leave the company
- identify factors that contribute to their leaving

#### Since finding, interviewing, and hiring new employees is time-consuming and expensive, increasing employee retention will be beneficial to the company.
## Modeling and Evaluation
#### Identifying the prediction type and model
The purpose of the model is to predict whether an employee leaves the company, which is a categorical outcome. The outcome variable is either 1 (employee left) or 0 (employee stayed).
- This is a binary classification task

Since the outcome variable is categorical and the task is binary classification, the appropriate models are:
- Logistic Regression
- Tree-based Machine Learning models

#### Model Evaluation Metrics
- AUC: area under the ROC curve; it can also be interpreted as the probability that the model ranks a random positive example higher than a random negative example.
- Precision: the proportion of data points predicted as True that are actually True.
- Recall: the proportion of actually True data points that are predicted as True. In other words, the proportion of positives that are correctly classified.
- Accuracy: the proportion of data points that are correctly classified.
- F1-score: the harmonic mean of precision and recall.

## Model Results 🎉
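As a rough illustration of how metric columns like the ones reported in this section can be produced, the sketch below tunes a random forest with `GridSearchCV` and scores it on a held-out split. It runs on synthetic data from `make_classification`, not the HR dataset, and the parameter grid, split sizes, and the `metrics` dictionary are assumptions made for the example rather than the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the HR data (assumption): an imbalanced binary target,
# where class 1 plays the role of "employee left".
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical, deliberately tiny grid so the example runs quickly.
param_grid = {"max_depth": [3, None], "n_estimators": [50]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# Hard predictions feed precision/recall/F1/accuracy;
# predicted probabilities of the positive class feed AUC.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]

metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Scoring the held-out split rather than the training data keeps the numbers from being inflated by overfitting.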
| Model | Precision | Recall | F1 | Accuracy | AUC |
| --- | --- | --- | --- | --- | --- |
| `Decision Tree` | 0.977 | 0.915 | 0.945 | 0.982 | 0.967 |
| `Random Forest` | 0.983 | 0.913 | 0.947 | 0.983 | 0.980 |
| `2nd Decision Tree` | 0.959 | 0.909 | 0.933 | 0.978 | 0.959 |
| `2nd Random Forest` | 0.976 | 0.908 | 0.941 | 0.981 | 0.977 |

## Data Understanding
![image1](https://github.com/SevilayMuni/Project-No.6-XGBoost-RandomForest/blob/main/graphs/Number-of-Project-feature-graphs.png)
- Everyone with seven projects left the company, and the average working hours of this group and of those who left with six projects were noticeably higher than those of any other group.
- Mean working hours increase with the number of projects worked.
- There are two groups of employees who left the company: 1) those who worked considerably less than their colleagues with the same number of projects, and 2) those who worked much more. For 1), it’s possible that they were fired. This group might also include employees who had already given their notice and were assigned fewer hours. For 2), they probably quit.
- The optimal number of projects for employees to work on is 3; the ratio of left/stayed is very small for this group.

![image2](https://github.com/SevilayMuni/Project-No.6-XGBoost-RandomForest/blob/main/graphs/Monthly-Working-Hours-vs-Promotion.png)
- Very few employees were promoted in the last 5 years. In spite of working very long hours, few of those employees were promoted.
- All of the employees who left the company were working the longest hours.
- Employees in the company are overworked.

![image3](https://github.com/SevilayMuni/Project-No.6-XGBoost-RandomForest/blob/main/graphs/Satistaction-Level-and-Performance-Evaluation-Boxplots.png)
- The mean and median satisfaction scores of employees who left are lower than those of employees who stayed.

![image4](https://github.com/SevilayMuni/Project-No.6-XGBoost-RandomForest/blob/main/graphs/Satisfaction-by-Tenure.png)
- Employees who left can be grouped under 2 categories: 1) Dissatisfied employees w/ shorter tenures 2) Very satisfied employees w/ medium-length tenures.
- Four-year employees who left have an unusually low satisfaction level.
- The longest-tenured employees didn’t leave.
- If an employee has spent more than six years at the company, they stay.

- **Leaving** is linked to **long working hours, contributing to many projects, and low satisfaction levels.**
- *Working long hours without receiving promotions or good evaluation scores might be chasing employees away*
- Employees are leaving the company as a result of poor management.
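
The project-count observations in this section come down to a group-by over `number_project`. A minimal sketch of that aggregation follows, using a small made-up frame; the rows are invented for illustration and are not taken from the HR data, and `summary` is a hypothetical name.

```python
import pandas as pd

# Made-up rows (assumption) that only mimic the column names from the dataset.
df = pd.DataFrame(
    {
        "number_project": [2, 2, 3, 3, 3, 4, 6, 6, 7, 7],
        "average_monthly_hours": [140, 150, 190, 200, 195, 205, 260, 250, 290, 300],
        "left": [1, 0, 0, 0, 0, 0, 1, 0, 1, 1],
    }
)

# Per project count: head count, share of employees who left, and mean hours.
summary = df.groupby("number_project").agg(
    employees=("left", "size"),
    attrition_rate=("left", "mean"),
    mean_hours=("average_monthly_hours", "mean"),
)
print(summary)
```

Because `left` is coded 0/1, its mean within each group is the share of employees who left.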
## Insights
➢ Cap the number of projects that employees contribute to
➢ Consider promoting employees who have been working for at least 4 years
➢ Conduct further analysis on why four-year tenured employees are so dissatisfied
➢ Either provide compensation to employees for working longer hours, or don’t require them to do so
➢ If employees aren’t familiar with the company’s overtime pay policies, inform them about these policies
➢ High evaluation scores should not be restricted to employees who work 200+ h/month
➢ Improve the performance review process
## Acknowledgements
Data Source:
https://www.kaggle.com/giripujar/hr-analytics

CC0: Public Domain
readme.so
## Variables
Variable|Description|
-----|-----|
satisfaction_level|Employee-reported job satisfaction level [0–1]|
last_evaluation|Score of employee's last performance review [0–1]|
number_project|Number of projects employee contributes to|
average_monthly_hours|Average number of hours employee worked per month|
time_spend_company|How long the employee has been with the company (years)|
Work_accident|Whether or not the employee experienced an accident while at work|
left|Whether or not the employee left the company|
promotion_last_5years|Whether or not the employee was promoted in the last 5 years|
Department|The employee's department|
salary|The employee's salary (U.S. dollars)|
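
Before modeling, variables like these usually need consistent names and numeric encodings. The sketch below shows one way to do that on a tiny made-up frame. Several things here are assumptions and not confirmed by this README: the renamed columns (`work_accident`, `department`, `tenure`), the idea that `salary` is stored as `low`/`medium`/`high` categories rather than dollar amounts, and the choice of ordinal and one-hot encoding.

```python
import pandas as pd

# Made-up rows (assumption) using the column names from the table above.
df = pd.DataFrame(
    {
        "satisfaction_level": [0.38, 0.80, 0.11],
        "last_evaluation": [0.53, 0.86, 0.88],
        "number_project": [2, 5, 7],
        "average_monthly_hours": [157, 262, 272],
        "time_spend_company": [3, 6, 4],
        "Work_accident": [0, 0, 0],
        "left": [1, 0, 1],
        "promotion_last_5years": [0, 0, 0],
        "Department": ["sales", "IT", "sales"],
        "salary": ["low", "medium", "high"],  # assumed categorical levels
    }
)

# Hypothetical renames to get uniform snake_case column names.
df = df.rename(
    columns={
        "Work_accident": "work_accident",
        "Department": "department",
        "time_spend_company": "tenure",
    }
)

# Ordinal encoding for salary: low=0, medium=1, high=2 (assumed ordering).
salary_order = ["low", "medium", "high"]
df["salary"] = pd.Categorical(
    df["salary"], categories=salary_order, ordered=True
).codes

# One-hot encoding for department, which has no natural order.
encoded = pd.get_dummies(df, columns=["department"], dtype=int)

# Features and target for a classifier.
X = encoded.drop(columns="left")
y = encoded["left"]
print(X.columns.tolist())
```

Tree-based models can work with the ordinal salary codes directly, while one-hot columns avoid implying an order among departments.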