Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nickenshidqia/credit_scorecard_model_home_credit_indonesia
Build a machine learning model that can automatically assess loans with goal to predict client’s repayment abilities and speed up inspection filing without spending more money.
https://github.com/nickenshidqia/credit_scorecard_model_home_credit_indonesia
credit-risk credit-score credit-scoring home-credit home-credit-default-risk logistic-regression machine-learning
Last synced: 1 day ago
JSON representation
Build a machine learning model that can automatically assess loans with goal to predict client’s repayment abilities and speed up inspection filing without spending more money.
- Host: GitHub
- URL: https://github.com/nickenshidqia/credit_scorecard_model_home_credit_indonesia
- Owner: nickenshidqia
- Created: 2023-12-10T06:31:32.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-10T07:32:48.000Z (about 1 year ago)
- Last Synced: 2024-11-05T21:47:19.307Z (about 2 months ago)
- Topics: credit-risk, credit-score, credit-scoring, home-credit, home-credit-default-risk, logistic-regression, machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 2.2 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Machine Learning Project Credit Scorecard Model (Using Logistic Regression) on Home Credit Indonesia
## Project Description
**Problem :**
The main risk for loan companies is failure to assess credit risk accurately and efficiently. Disadvantage of Manual credit risk assessment :- Subjectivity can introduce bias and inconsistency in decision-making
- Time-consuming especially when dealing with a large number of loan applications.
- Humans errors, such as data entry mistakes, miscalculations, or oversight of important details**Challenges :**
Build a machine learning model that can automatically assess loans## Project Goal
- Predict client’s repayment abilities
- Speed up inspection filing without spending more money## Tools & Library Used
[](https://www.python.org/)
[](https://jupyter.org/)## Project Result
[Click here to get full code](https://github.com/nickenshidqia/Credit_Scorecard_Model_Home_Credit_Indonesia/blob/231cfbdf0f00f1c50a5f480c4622abf3797fe353/Home_Credit_Scorecard_Nicken_Shidqia.ipynb)
## Data Preprocessing
### A. Data Cleaning
There are 40 columns that have null values. And handling missing value by drop feature that have missing value > 50%, and replace missing values on numerical category with median & categorical with mode.
### B. Feature Selection
Split Data Train (80:20).
Categorical & Numerical Selection with criteria :
- Low cardinality (unique)
- No null values
- p-value < 0.05 (using chi square for categorical & ANOVA for numerical)
- Correlation coefficient <= 0.7**Before** Feature Selection = 122 columns
**After** Feature Selection = 16 columns### C. Feature Engineering
**Weight of Evidence (WOE) & Information Value (IV)**
- WOE generally described as a measure of the separation of good and bad customers
- IV helps to rank variables on the basis of their importance.Drop Feature No Needed :
- IV < 0.02, The variable is Not useful for prediction
- IV> 0.5, The variable is Suspicious Predictive Power**Before** Feature Engineering = 16 columns
**After** Feature Engineering = 14 columns## 2 Top Data Visualization & Insight
### A. Clients Repayment Abilities by Gender
- 61.23% customers that do not have payment difficulty are female, and 30.70% are Male
- In UK, Women account for 65% of the home credit industry's customers (Bermeo, 2018)
- Recommendation : Start a campaign to encourage more women to apply for credit### B. Clients Repayment Abilities by Occupation Type
- 23.37% customers that do not have payment difficulty are laborers, then followed by staff and managers.
- Recommendation : Start a campaign to encourage more laborers, staff, and managers to apply for credit## Machine Learning Implementation
### A. Evaluation Score
- Mean AUROC of 0.7304 is generally considered good, indicating that the logistic regression model is effective at distinguishing between the positive and negative classes.
- Based on (Trifonova, 2012) An AUC - ROC 0.7–0.8 is considered good.
- Gini coefficient of 0.4608 indicates a relatively strong separation between the model's performance and random chance.
- It suggests that the logistic regression model has a good discriminatory ability.
- Based on (Teng, 2011) Gini coefficient 0.4 - 0.5 considered big gap.### B. Score Card
- Base (Intercept) = 555
- Min Score = 300 (FICO)
- Max Score = 850 (FICO)### C. Confusion Matrix with Threshold = 0.5
- Precision = Out of all the loan status that the model predicted would get good loan, only 96% actually did.
- Recall = Out of all the loan status that actually did get good loan, the model only predicted this outcome correctly for 68% ofthose loan status.
- F1 Score = 0.8. F1 score of 0.7 or higher is often considered good (spotintelligence.com, 2023)
The accuracy is not really good because we've got 0.68 out of 1
62.46% got correct variable of good loan### D. Confusion Matrix with Best Threshold
- Best threshold = 0.353918 (Using Youden J-Statistic)
- Best threshold is used to minimized the False Positive Rate and maximize the True Positive Rate
- Precision = Out of all the loan status that the model predicted would get good loan, only 94% actually did.
- Recall = Out of all the loan status that actually did get good loan, the model only predicted this outcome correctly for 88% ofthose loan status.
- F1 Score = 0.9. F1 score of 0.7 or higher is often considered good (spotintelligence.com, 2023)
The accuracy increased significantly from 0.68 to 0.84
80.44% got correct variable of good loan.### E. Approval & Rejection Rate
**(Threshold = 0.5)**
- Choosing a 0.5 threshold might mean rejecting a lot of applicants with rejection rate 34%, which could lead to losing business
**(Best Threshold = 0.353918)**
- With best threshold, we've got rejection rate 14%
- So, we've decided to keep our preferred threshold = 0.353918 and Credit Score of 516## Business Recommendation
**Partial Auto Reject & Auto Approve**
- If a submission seems bad, it is rejected right away.
- If a submission appears to be very good, it is accepted immediately.
- If there's uncertainty, it is manually checked by the assessment team.**Create targeted campaign**
- We should launch additional campaigns targeting women, laborers, staff, and managers to encourage them to apply for credit.