Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/s1dewalker/credit-risk-modeling-in-python

Exploratory data analysis on credit data and risk modeling | Python | SQL

credit-card credit-risk credit-risk-analysis exploratory-data-analysis python risk risk-modelling sql



Host: GitHub
URL: https://github.com/s1dewalker/credit-risk-modeling-in-python
Owner: s1dewalker
Created: 2024-10-30T00:45:52.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-12-14T13:21:39.000Z (about 2 months ago)
Last Synced: 2024-12-14T14:22:31.170Z (about 2 months ago)
Topics: credit-card, credit-risk, credit-risk-analysis, exploratory-data-analysis, python, risk, risk-modelling, sql
Language: Jupyter Notebook
Homepage:
Size: 1.55 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Credit Risk Modeling in Python
![](pics/crr.JPG)

## Exploratory data analysis (EDA) on credit data and credit risk modeling

### [Python](https://github.com/s1dewalker/Credit-Risk-Modeling-in-Python/blob/main/credit_risk_modeling-2.ipynb) : EDA + Credit Risk Modeling + Model Validation + Tuning

### [SQL](https://github.com/s1dewalker/Credit-Risk-Modeling-in-Python/blob/main/SQLQuery_cr_loan2.sql) : EDA + Data Cleaning

**EDA**: Exploring the data, removing duplicate rows with `drop_duplicates()`, finding anomalies or outliers, handling missing values with `fillna()` or `dropna()`, and using `crosstab` to build pivot tables.
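The EDA steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's code: the toy DataFrame and its column names (`person_age`, `person_emp_length`, `loan_intent`, `loan_status`) are hypothetical, and the outlier rule (age above 100) and median imputation are assumed choices.

```python
import pandas as pd

# Hypothetical toy loan data; column names are assumptions for illustration.
df = pd.DataFrame({
    "person_age": [25, 25, 41, 33, 144, 52],
    "person_emp_length": [3.0, 3.0, None, 7.0, 2.0, None],
    "loan_intent": ["EDUCATION", "EDUCATION", "MEDICAL", "VENTURE", "MEDICAL", "VENTURE"],
    "loan_status": [0, 0, 1, 0, 1, 1],  # 1 = default (assumed encoding)
})

# 1. Remove exact duplicate rows (rows 0 and 1 are identical).
df = df.drop_duplicates()

# 2. Treat an implausible age as an anomaly and drop it (assumed rule: age > 100).
outliers = df[df["person_age"] > 100]
df = df.drop(outliers.index)

# 3a. Option A: drop rows with a missing employment length.
dropped = df.dropna(subset=["person_emp_length"])

# 3b. Option B: impute missing employment length with the column median.
df["person_emp_length"] = df["person_emp_length"].fillna(df["person_emp_length"].median())

# 4. Pivot-style summary: count of each loan_status per loan intent.
table = pd.crosstab(df["loan_intent"], df["loan_status"])
print(table)
```

Whether to drop or impute depends on how many rows are affected; imputing keeps the sample size at the cost of injecting an estimate.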

**Risk Modeling**: Using `RandomForestClassifier`, evaluated with metrics such as recall and F1-score. Dealing with underfitting (high training error) and overfitting (testing error >> training error). Validating models with cross-validation.
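The modeling workflow described above can be sketched with scikit-learn. This is an assumed setup, not the notebook's pipeline: `make_classification` generates a synthetic, imbalanced stand-in for the loan data, and the hyperparameters (`n_estimators=100`, `max_depth=6`) are illustrative values only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic, imbalanced stand-in for credit data (roughly 20% positives).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Capping tree depth is one way to limit overfitting; the value is illustrative.
model = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42)
model.fit(X_train, y_train)

# Compare training vs. test scores: low scores on both hint at underfitting,
# a much lower test score than training score hints at overfitting.
train_recall = recall_score(y_train, model.predict(X_train))
test_pred = model.predict(X_test)
test_recall = recall_score(y_test, test_pred)
test_f1 = f1_score(y_test, test_pred)

# 5-fold stratified cross-validation on the training set, scored by recall.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_recall = cross_val_score(model, X_train, y_train, cv=cv, scoring="recall")

print(f"train recall={train_recall:.3f}  test recall={test_recall:.3f}  test F1={test_f1:.3f}")
print(f"CV recall: mean={cv_recall.mean():.3f}  std={cv_recall.std():.3f}")
```

Stratified folds keep the default rate roughly constant across splits, which matters when the positive class is rare.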

## Analysing the 5 Cs of credit

- Character - borrower's credit history / creditworthiness, customer segmentation, demographics, card type, usage
- Capacity - income (history of stable income)
- Capital - savings, investments
- Collateral - loan, tenure
- Conditions - purpose of credit, economy, employment type

## Error metrics

Confusion matrix:
![](pics/recall.JPG)

For credit card data, **recall** is the most important metric, since we want to minimize false negatives (FN): actual frauds that the model fails to flag.
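As a worked illustration of the confusion-matrix counts and of recall = TP / (TP + FN), here is a small pure-Python sketch. The labels are hypothetical (1 = fraud/default, 0 = good); in practice `sklearn.metrics.recall_score` computes the same quantity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # fraud correctly flagged
            else:
                fp += 1  # good case wrongly flagged
        else:
            if t == positive:
                fn += 1  # fraud missed: the error recall penalizes
            else:
                tn += 1  # good case correctly passed
    return tp, fp, fn, tn


def recall_precision_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives caught
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged cases that are real
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


# Hypothetical labels: 4 actual frauds, 6 good cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 5)
recall, precision, f1 = recall_precision_f1(y_true, y_pred)
print(f"recall={recall:.2f} precision={precision:.2f} F1={f1:.2f}")
# recall=0.50 precision=0.67 F1=0.57
```

Here two of the four frauds are missed, so recall is only 0.50 even though two of the three flagged cases are genuine.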