https://github.com/fedesgh/credit_risk_clf
Problem statement on modeling the target vector, and an attempt to improve metrics
- Host: GitHub
- URL: https://github.com/fedesgh/credit_risk_clf
- Owner: Fedesgh
- License: apache-2.0
- Created: 2024-09-11T14:37:47.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-11T19:59:25.000Z (2 months ago)
- Last Synced: 2024-09-29T09:04:17.421Z (about 2 months ago)
- Topics: feature-selection, imblearn, information-value, sklearn
- Language: Python
- Homepage:
- Size: 44 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
The motivation for this repository is the difficulty the dataset presents when we define the target and the features.
Also, there are several attempts on Kaggle with low metrics, particularly when the training set is restricted to features containing information available before the loan was granted, and we want to try to improve on them: https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data
## Problems in defining the target
"loan_status" has the following values:
- Current
- Fully Paid
- Charged Off
- Late (31-120 days)
- In Grace Period
- Does not meet the credit policy. Status:Fully Paid
- Late (16-30 days)
- Default
- Does not meet the credit policy. Status:Charged Off
The main point to consider is that these values belong to different moments in the loan's life span.
Those that belong to the end of a loan:
- Fully Paid
- Charged Off
- Does not meet the credit policy. Status:Fully Paid
- Default
- Does not meet the credit policy. Status:Charged Off
Those that belong to the middle of a loan:
- Current
- Late (31-120 days)
- Late (16-30 days)
while In Grace Period belongs to the beginning.
On top of this, we should consider:
- All loans, regardless of how they ended, were at some earlier point "In Grace Period"
- All loans, regardless of how they ended, were at some earlier point Current and/or Late
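The grouping above can be written down as a lookup table. This is only a minimal sketch: the names `STAGE_BY_STATUS`, `lifecycle_stage` and `is_terminal` are illustrative and are not taken from the repository's code.

```python
# Lifecycle stage of each loan_status value, following the grouping above.
# The stage labels ("beginning", "middle", "end") are illustrative names.
STAGE_BY_STATUS = {
    "Fully Paid": "end",
    "Charged Off": "end",
    "Default": "end",
    "Does not meet the credit policy. Status:Fully Paid": "end",
    "Does not meet the credit policy. Status:Charged Off": "end",
    "Current": "middle",
    "Late (31-120 days)": "middle",
    "Late (16-30 days)": "middle",
    "In Grace Period": "beginning",
}


def lifecycle_stage(status):
    """Return 'beginning', 'middle' or 'end'; raise on an unknown status."""
    try:
        return STAGE_BY_STATUS[status]
    except KeyError:
        raise ValueError(f"unexpected loan_status: {status!r}") from None


def is_terminal(status):
    """True when the status describes a loan that has already ended."""
    return lifecycle_stage(status) == "end"
```

If the data is held in a pandas DataFrame `df`, `df["loan_status"].map(STAGE_BY_STATUS)` should give the same stage per row. Raising on an unknown status is a deliberate choice here, so that a new category in the data is noticed instead of being silently dropped.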
FIRST MODEL (STRICT):
"Good loans":
- Fully Paid
"Bad loans":
- Charged Off
- Does not meet the credit policy. Status:Fully Paid
- Default
- Does not meet the credit policy. Status:Charged Off
We consider only end-of-loan categories in the target, and the X_train set should contain only features known before the loan was granted.
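One possible way to build the strict target is sketched below. It rests on some assumptions: the encoding (0 for good, 1 for bad) is chosen here for illustration, `strict_target` and `build_strict_dataset` are hypothetical helper names, and the list of pre-grant features has to be supplied by the caller because it is not derived from the data.

```python
# Status groups of the strict model, as listed above.
GOOD_STATUSES = {"Fully Paid"}
BAD_STATUSES = {
    "Charged Off",
    "Default",
    "Does not meet the credit policy. Status:Fully Paid",
    "Does not meet the credit policy. Status:Charged Off",
}


def strict_target(status):
    """Return 0 for a good loan, 1 for a bad loan, None if the loan has not ended.

    The 0/1 encoding is an assumption made for this sketch.
    """
    if status in GOOD_STATUSES:
        return 0
    if status in BAD_STATUSES:
        return 1
    return None


def build_strict_dataset(rows, pre_grant_features):
    """Keep finished loans only and restrict X to pre-grant features.

    rows: iterable of dicts, one per loan, each with a 'loan_status' key.
    pre_grant_features: column names known before the loan was granted.
    """
    X, y = [], []
    for row in rows:
        label = strict_target(row["loan_status"])
        if label is None:
            continue  # Current, Late, In Grace Period: outcome not known yet
        X.append({name: row[name] for name in pre_grant_features})
        y.append(label)
    return X, y
```

Loans that are still running are dropped and not labeled, since their final outcome is unknown. Restricting `X` to an explicit list of columns keeps information recorded after the loan was granted out of the training set.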