Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fedesgh/building_credit_risk_classifier_using_bagging_kneighbors
Problem statement about modeling the target vector and an attempt to improve metrics
feature-selection imblearn information-value sklearn
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/fedesgh/building_credit_risk_classifier_using_bagging_kneighbors
- Owner: Fedesgh
- License: apache-2.0
- Created: 2024-09-11T14:37:47.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-25T21:55:12.000Z (about 2 months ago)
- Last Synced: 2024-11-21T16:15:03.400Z (about 1 month ago)
- Topics: feature-selection, imblearn, information-value, sklearn
- Language: Python
- Homepage:
- Size: 44.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Motivation
The motivation for this repository is the set of difficulties the dataset presents when we define the target and the features. One of these problems involves **several data leakages**.
Several attempts on Kaggle report **low metrics**, particularly when the training set is restricted to features that contain information available before the loan was granted. We want to try to improve on them:
https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data
We use various data preprocessing techniques, such as **SelectKBest with information value**, **binning**, **up-sampling with imblearn**, **one-hot encoding** and **imputers**.
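As a rough illustration of the information-value idea mentioned above, here is a minimal sketch that scores one already-binned feature against a binary target, using the standard definition IV = Σ (%good − %bad) · ln(%good / %bad). The function name `information_value` and the smoothing constant `eps` are assumptions made for this example and are not taken from the repository.

```python
import math
from collections import Counter


def information_value(bins, target, eps=0.5):
    """Information value of one binned feature against a binary target.

    bins   : sequence of bin labels, one per loan
    target : sequence of 0/1 labels (1 = good loan, 0 = bad loan)
    eps    : smoothing count added to every bin (an assumption of this
             sketch, used only to avoid log(0) and division by zero)
    """
    good = Counter(b for b, y in zip(bins, target) if y == 1)
    bad = Counter(b for b, y in zip(bins, target) if y == 0)
    labels = set(good) | set(bad)

    total_good = sum(good.values()) + eps * len(labels)
    total_bad = sum(bad.values()) + eps * len(labels)

    iv = 0.0
    for label in labels:
        pct_good = (good[label] + eps) / total_good
        pct_bad = (bad[label] + eps) / total_bad
        # Each term is (difference of shares) * (weight of evidence).
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


# Toy data: the first feature separates good from bad loans, the second does not.
iv_informative = information_value(["a", "a", "b", "b"], [1, 1, 0, 0])
iv_uninformative = information_value(["a", "a", "b", "b"], [1, 0, 1, 0])
```

A per-feature score like this could, in principle, be wrapped into a scoring function for a selector such as scikit-learn's `SelectKBest`, which accepts a callable that returns one score per feature. How the repository actually wires this together is not shown here.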
## Problems in defining the target
**loan_status** (our target) has the following values:
- Current
- Fully Paid
- Charged Off
- Late (31-120 days)
- In Grace Period
- Does not meet the credit policy. Status:Fully Paid
- Late (16-30 days)
- Default
- Does not meet the credit policy. Status:Charged Off
**The main point to consider is that these values belong to different moments in the life span of a loan.**
Those that belong to the end of a loan:
- Fully Paid
- Charged Off
- Does not meet the credit policy. Status:Fully Paid
- Default
- Does not meet the credit policy. Status:Charged Off
Those that belong to the middle of a loan's term:
- Current
- Late (31-120 days)
- Late (16-30 days)
while "In Grace Period" belongs to the beginning.
On top of this, we should consider:
**All loans, regardless of how they ended, were at some earlier point "In Grace Period".**
**All loans, regardless of how they ended, were at some earlier point "Current" and/or "Late".**
## Our target
"Good loans" **(1)**:
- Fully Paid
"Bad loans" **(0)**:
- Charged Off
- Does not meet the credit policy. Status:Fully Paid
- Default
- Does not meet the credit policy. Status:Charged Off
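The mapping above can be sketched as a small helper that returns 1 for good loans, 0 for bad loans, and `None` for statuses that are not end-of-loan states and are therefore left out of the target. The names `GOOD_STATUSES`, `BAD_STATUSES` and `encode_loan_status` are hypothetical and chosen only for this example.

```python
# Status groups as listed in this README; the variable names are illustrative.
GOOD_STATUSES = {"Fully Paid"}
BAD_STATUSES = {
    "Charged Off",
    "Does not meet the credit policy. Status:Fully Paid",
    "Default",
    "Does not meet the credit policy. Status:Charged Off",
}


def encode_loan_status(status):
    """Return 1 for a good loan, 0 for a bad loan, None for an unfinished loan."""
    if status in GOOD_STATUSES:
        return 1
    if status in BAD_STATUSES:
        return 0
    # "Current", "Late (...)" and "In Grace Period" are not end states.
    return None


# Toy example: keep only the loans whose status is an end-of-loan state.
statuses = ["Fully Paid", "Current", "Charged Off", "Default", "In Grace Period"]
labeled = [
    (status, encode_loan_status(status))
    for status in statuses
    if encode_loan_status(status) is not None
]
```

With a pandas DataFrame (assumed here to be called `df`), the same helper could be applied with `df["loan_status"].map(encode_loan_status)`, followed by dropping the rows where the result is missing.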
We include only end-of-loan categories in the target, and the X_train set should contain only features whose information was available **before** the loan was granted.
## Result metrics
![result.jpg](result.jpg)