https://github.com/fedesgh/credit_risk_clf

Problem statement about modeling the target vector, and an attempt to improve metrics
feature-selection imblearn information-value sklearn

The motivation for this repository is the difficulty the dataset presents when we define the target and the features.
In addition, several attempts on Kaggle report low metrics, particularly when the training set is restricted to features whose information was available before the loan was granted, and we want to try to improve on them:

https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data

## Problems in defining the target

"loan_status" has the following values:


  1. Current

  2. Fully Paid

  3. Charged Off

  4. Late (31-120 days)

  5. In Grace Period

  6. Does not meet the credit policy. Status:Fully Paid

  7. Late (16-30 days)

  8. Default

  9. Does not meet the credit policy. Status:Charged Off

The main point to consider is that these values belong to different moments in the loan life span.

Values that mark the end of a loan:


  1. Fully Paid

  2. Charged Off

  3. Does not meet the credit policy. Status:Fully Paid

  4. Default

  5. Does not meet the credit policy. Status:Charged Off

Values from the middle of a loan's term:


  1. Current

  2. Late (31-120 days)

  3. Late (16-30 days)

In Grace Period, by contrast, belongs to the beginning of the loan.
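The grouping above can be written down as a lookup table. The following is a minimal sketch in plain Python; the names `STATUS_STAGE` and `count_by_stage` are hypothetical and only illustrate the classification described in the text, not code taken from the repository.

```python
from collections import Counter

# Stage of the loan life span for each "loan_status" value,
# following the grouping given in the text.
STATUS_STAGE = {
    "Fully Paid": "end",
    "Charged Off": "end",
    "Does not meet the credit policy. Status:Fully Paid": "end",
    "Default": "end",
    "Does not meet the credit policy. Status:Charged Off": "end",
    "Current": "middle",
    "Late (31-120 days)": "middle",
    "Late (16-30 days)": "middle",
    "In Grace Period": "beginning",
}


def count_by_stage(statuses):
    """Count how many loans fall in each stage of the life span."""
    return Counter(STATUS_STAGE[s] for s in statuses)


# Toy example with made-up rows.
sample = ["Current", "Fully Paid", "In Grace Period", "Default", "Current"]
stage_counts = count_by_stage(sample)
```

Counting by stage is a quick way to see how many rows would be left if only finished loans were kept.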

On top of this, we should consider:


  1. All loans, regardless of how they ended, were at some earlier time "In Grace Period"

  2. All loans, regardless of how they ended, were at some earlier time Current and/or Late

FIRST MODEL (STRICT):

"Good loans":


  1. Fully Paid

"Bad loans":


  1. Charged Off

  2. Does not meet the credit policy. Status:Fully Paid

  3. Default

  4. Does not meet the credit policy. Status:Charged Off
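As an illustration, the strict labelling can be sketched with pandas as follows. The function name `build_strict_target`, the `target` column, and the encoding (1 for bad, 0 for good) are assumptions made for this example; rows whose status is in neither list are dropped.

```python
import pandas as pd

# Status lists copied from the "Good loans" / "Bad loans" definition above.
GOOD_STATUSES = ["Fully Paid"]
BAD_STATUSES = [
    "Charged Off",
    "Does not meet the credit policy. Status:Fully Paid",
    "Default",
    "Does not meet the credit policy. Status:Charged Off",
]


def build_strict_target(df, status_col="loan_status"):
    """Keep only good/bad rows and add a binary target (assumed: 1 = bad, 0 = good)."""
    keep = df[status_col].isin(GOOD_STATUSES + BAD_STATUSES)
    finished = df[keep].copy()
    finished["target"] = finished[status_col].isin(BAD_STATUSES).astype(int)
    return finished


# Toy example with made-up rows.
toy = pd.DataFrame(
    {"loan_status": ["Fully Paid", "Current", "Charged Off", "Default", "In Grace Period"]}
)
labeled = build_strict_target(toy)
```

In this toy frame the "Current" and "In Grace Period" rows are removed, since they are not end-of-loan statuses.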

We consider only the end-of-loan categories in the target, and the X_train set should contain only features
that were known before the loan was granted.
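A possible way to enforce the feature restriction is to drop every column that is only known after the loan is granted. The sketch below assumes a hand-made list, `POST_GRANT_COLUMNS`; the column names in it and in the toy frame are illustrative assumptions, and the actual list has to be decided by inspecting the dataset.

```python
import pandas as pd

# Hypothetical list: columns assumed to be recorded only after the loan is granted.
POST_GRANT_COLUMNS = ["total_pymnt", "recoveries", "last_pymnt_d"]


def pre_grant_features(df, target_col="target", status_col="loan_status"):
    """Return the feature matrix without post-grant columns, the status, or the target."""
    to_drop = [
        c for c in POST_GRANT_COLUMNS + [target_col, status_col] if c in df.columns
    ]
    return df.drop(columns=to_drop)


# Toy example with made-up values.
toy = pd.DataFrame(
    {
        "loan_amnt": [5000, 12000],
        "annual_inc": [40000.0, 85000.0],
        "total_pymnt": [5600.0, 3100.0],
        "recoveries": [0.0, 250.0],
        "loan_status": ["Fully Paid", "Charged Off"],
        "target": [0, 1],
    }
)
X = pre_grant_features(toy)
y = toy["target"]
```

Dropping `loan_status` together with the target keeps the label from leaking into the features.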