https://github.com/fedesgh/building_credit_risk_classifier_using_bagging_kneighbors

Problem statement: modeling the target vector and attempting to improve the metrics

feature-selection imblearn information-value sklearn


## Motivation

This repository is motivated by the difficulties the dataset presents when we define the target and the features. One of the problems involves **several sources of data leakage**.

Several attempts on Kaggle report **low metrics**, particularly when the training set is restricted to features whose information was available before the loan was granted. We try to improve on those results using this dataset:

https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data

We use several data preprocessing techniques, including **SelectKBest with information value**, **binning**, **up-sampling with imblearn**, **one-hot encoding** and **imputers**.
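Information value is commonly defined, for a binned feature and a binary target, as the sum over bins of `(share of good loans - share of bad loans) * ln(share of good / share of bad)`. The block below is a minimal sketch of that formula and not the code used in this repository: the function name `information_value`, the `eps` smoothing term, and the convention 1 = good, 0 = bad are assumptions made for illustration.

```python
import math
from collections import Counter


def information_value(bins, target, eps=0.5):
    """Information value of one binned feature against a binary target.

    bins   : sequence of bin labels, one per loan
    target : sequence of 0/1 values (assumed here: 1 = good loan, 0 = bad loan)
    eps    : additive smoothing (an assumption of this sketch) so that a bin
             with no good or no bad loans does not lead to log(0)
    """
    good = Counter(b for b, t in zip(bins, target) if t == 1)
    bad = Counter(b for b, t in zip(bins, target) if t == 0)
    labels = set(good) | set(bad)

    total_good = sum(good.values()) + eps * len(labels)
    total_bad = sum(bad.values()) + eps * len(labels)

    iv = 0.0
    for label in labels:
        p_good = (good[label] + eps) / total_good  # smoothed share of good loans in this bin
        p_bad = (bad[label] + eps) / total_bad     # smoothed share of bad loans in this bin
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv
```

A feature whose bins separate good from bad loans gets a high score, while a feature with the same good/bad mix in every bin scores zero. A function like this, applied column by column, could serve as a custom scoring function for a top-k feature selector.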

## Problems in defining the target

**loan_status** (our target) has the following values:


  1. Current

  2. Fully Paid

  3. Charged Off

  4. Late (31-120 days)

  5. In Grace Period

  6. Does not meet the credit policy. Status:Fully Paid

  7. Late (16-30 days)

  8. Default

  9. Does not meet the credit policy. Status:Charged Off

**The main point to consider is that these values belong to different moments in the life span of a loan.**

Statuses that mark the end of a loan:


  1. Fully Paid

  2. Charged Off

  3. Does not meet the credit policy. Status:Fully Paid

  4. Default

  5. Does not meet the credit policy. Status:Charged Off

Statuses from the middle of a loan's term:


  1. Current

  2. Late (31-120 days)

  3. Late (16-30 days)

In Grace Period, by contrast, belongs to the beginning of a loan.

On top of this, we should consider:

**Every loan, regardless of how it ended, was at some earlier point "In Grace Period".**

**Every loan, regardless of how it ended, was at some earlier point "Current" and/or "Late".**

## Our target

"Good loans" **(1)**:


  1. Fully Paid

"Bad loans" **(0)**:


  1. Charged Off

  2. Does not meet the credit policy. Status:Fully Paid

  3. Default

  4. Does not meet the credit policy. Status:Charged Off

The target only uses the categories that mark the end of a loan, and the X_train set should only contain features whose information was available **before** the loan was granted.
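The target definition above can be sketched as a simple lookup. This is an illustrative sketch rather than the repository's implementation: it assumes each loan is a dictionary with a `loan_status` key (for example, rows read with `csv.DictReader`), and the names `TARGET_BY_STATUS` and `build_target` are hypothetical.

```python
# Status labels copied from the lists above; 1 = good loan, 0 = bad loan.
TARGET_BY_STATUS = {
    "Fully Paid": 1,
    "Charged Off": 0,
    "Does not meet the credit policy. Status:Fully Paid": 0,
    "Default": 0,
    "Does not meet the credit policy. Status:Charged Off": 0,
}


def build_target(rows):
    """Keep only loans that reached an end status and attach a binary target."""
    labelled = []
    for row in rows:
        status = row["loan_status"].strip()
        if status in TARGET_BY_STATUS:
            # Copy the row so the caller's data is not modified.
            labelled.append({**row, "target": TARGET_BY_STATUS[status]})
        # Loans that are Current, Late or In Grace Period have no known
        # outcome yet, so they are left out of the modeling set.
    return labelled
```

Dropping the unfinished loans, instead of assigning them to one of the two classes, keeps the target limited to outcomes that have actually been observed.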

## Result metrics

![result.jpg](result.jpg)