
https://github.com/fedesgh/building_credit_risk_classifier_using_bagging_kneighbors

Problem statement about modeling the target vector and an attempt to improve metrics

feature-selection imblearn information-value sklearn



Host: GitHub
URL: https://github.com/fedesgh/building_credit_risk_classifier_using_bagging_kneighbors
Owner: Fedesgh
License: apache-2.0
Created: 2024-09-11T14:37:47.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-10-25T21:55:12.000Z (about 2 months ago)
Last Synced: 2024-11-21T16:15:03.400Z (about 1 month ago)
Topics: feature-selection, imblearn, information-value, sklearn
Language: Python
Homepage:
Size: 44.1 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

## Motivation

The motivation for this repository is the difficulty the dataset presents when we define the target and the features. One of the problems involves **several data leakages**.

There are several attempts on Kaggle with **low metrics**, particularly when the training set is restricted to features whose information was available before the loan was granted, and we want to try to improve on them:

https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data

We use various data preprocessing techniques such as **SelectKBest with information value**, **binning**, **up-sampling with imblearn**, **one-hot encoding** and **imputers**.
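To make the idea of ranking features by information value concrete, here is a minimal sketch in plain Python. It is not the repository's code: the helper names `information_value` and `select_k_best`, the smoothing constant `eps` and the toy data are all assumptions made for illustration. The hypothetical `select_k_best` only mimics what scikit-learn's `SelectKBest` does when it is given a custom `score_func`.

```python
import math
from collections import Counter


def information_value(bins, target, eps=0.5):
    """Information value of one binned feature against a binary target.

    bins   -- bin label of each row (any hashable value)
    target -- 1 for a good loan, 0 for a bad loan
    eps    -- smoothing count added to every cell so that empty cells do not
              cause log(0) or a division by zero (an assumption of this sketch)
    """
    good = Counter(b for b, t in zip(bins, target) if t == 1)
    bad = Counter(b for b, t in zip(bins, target) if t == 0)
    labels = set(good) | set(bad)
    total_good = sum(good.values()) + eps * len(labels)
    total_bad = sum(bad.values()) + eps * len(labels)

    iv = 0.0
    for label in labels:
        pct_good = (good[label] + eps) / total_good  # share of good loans in this bin
        pct_bad = (bad[label] + eps) / total_bad     # share of bad loans in this bin
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


def select_k_best(features, target, k):
    """Return the names of the k features with the highest information value."""
    scores = {name: information_value(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: `grade` separates good from bad loans, `noise` does not.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "grade": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
best = select_k_best(features, target, k=1)
```

Numeric features would first have to be binned (for example into quantiles) before being passed to `information_value`; the sketch assumes that step has already happened.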

## Problems at defining the target

**loan_status** (our target) has the following values:

Current

Fully Paid

Charged Off

Late (31-120 days)

In Grace Period

Does not meet the credit policy. Status:Fully Paid

Late (16-30 days)

Default

Does not meet the credit policy. Status:Charged Off

**The main point we must consider is that the values belong to different moments in the loan life span.**

Values that mark the end of a loan:

Fully Paid

Charged Off

Does not meet the credit policy. Status:Fully Paid

Default

Does not meet the credit policy. Status:Charged Off

Values that belong to the middle of a loan's term:

Current

Late (31-120 days)

Late (16-30 days)

In Grace Period, by contrast, belongs to the beginning.

On top of this we should consider:

**All loans, regardless of how they ended, were previously "In Grace Period"**

**All loans, regardless of how they ended, were previously Current and/or Late**

## Our target

"Good loans" **(1)**:

Fully Paid

"Bad loans" **(0)**:

Charged Off

Does not meet the credit policy. Status:Fully Paid

Default

Does not meet the credit policy. Status:Charged Off
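The labeling above can be sketched as a lookup table applied to `loan_status`. This is an illustrative sketch, not the repository's implementation: the names `TARGET_MAP` and `build_target`, the `id` field and the sample rows are assumptions. Statuses that do not mark the end of a loan are absent from the table, so their rows are dropped.

```python
GOOD = 1
BAD = 0

# Only statuses that mark the end of a loan receive a label.
TARGET_MAP = {
    "Fully Paid": GOOD,
    "Charged Off": BAD,
    "Default": BAD,
    "Does not meet the credit policy. Status:Fully Paid": BAD,
    "Does not meet the credit policy. Status:Charged Off": BAD,
}


def build_target(rows):
    """Keep rows whose loan_status is final and attach the 0/1 target."""
    labeled = []
    for row in rows:
        status = row["loan_status"]
        if status in TARGET_MAP:
            labeled.append({**row, "target": TARGET_MAP[status]})
    return labeled


# Hypothetical sample rows
rows = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": "Current"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "In Grace Period"},
    {"id": 5, "loan_status": "Does not meet the credit policy. Status:Fully Paid"},
]
labeled = build_target(rows)
```

With pandas, the same dictionary could be passed to `Series.map`, which leaves unmapped statuses as missing values that can then be dropped.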

We consider only end-of-loan categories in the target, and the X_train set should contain only features whose information was available **before** the loan was granted.
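One way to enforce the "before the loan was granted" rule is to keep an explicit list of columns that are only known after origination and to remove them from every row. The sketch below is an assumption-laden illustration: the helper `drop_leaky_columns` is hypothetical, and the column names in `POST_ORIGINATION` are examples that should be checked against the dataset's own data dictionary.

```python
# Columns assumed (for illustration) to be recorded after the loan was granted.
# They describe payments and recoveries, so they would leak the outcome.
POST_ORIGINATION = {"total_pymnt", "recoveries", "last_pymnt_d", "out_prncp"}

TARGET_SOURCE = "loan_status"  # used to build the target, never as a feature


def drop_leaky_columns(row, leaky=POST_ORIGINATION):
    """Return a copy of the row without post-origination columns or the target source."""
    return {
        name: value
        for name, value in row.items()
        if name not in leaky and name != TARGET_SOURCE
    }


# Hypothetical sample row
row = {
    "loan_amnt": 10000,
    "annual_inc": 52000,
    "total_pymnt": 11250.0,
    "recoveries": 0.0,
    "loan_status": "Fully Paid",
}
features_row = drop_leaky_columns(row)
```

Keeping the list explicit makes each exclusion reviewable, which matters because a single leaked column can inflate the metrics.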

## Result metrics

![result.jpg](result.jpg)