https://github.com/s1dewalker/credit-risk-modeling-in-python
Exploratory data analysis on credit data and risk modeling | Python | SQL
credit-card credit-risk credit-risk-analysis exploratory-data-analysis python risk risk-modelling sql
- Host: GitHub
- URL: https://github.com/s1dewalker/credit-risk-modeling-in-python
- Owner: s1dewalker
- Created: 2024-10-30T00:45:52.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-12-14T13:21:39.000Z (about 2 months ago)
- Last Synced: 2024-12-14T14:22:31.170Z (about 2 months ago)
- Topics: credit-card, credit-risk, credit-risk-analysis, exploratory-data-analysis, python, risk, risk-modelling, sql
- Language: Jupyter Notebook
- Homepage:
- Size: 1.55 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Credit Risk Modeling in Python
![](pics/crr.JPG)
## Exploratory data analysis (EDA) on credit data and credit risk modeling
### [Python](https://github.com/s1dewalker/Credit-Risk-Modeling-in-Python/blob/main/credit_risk_modeling-2.ipynb) : EDA + Credit Risk Modeling + Model Validation + Tuning
### [SQL](https://github.com/s1dewalker/Credit-Risk-Modeling-in-Python/blob/main/SQLQuery_cr_loan2.sql) : EDA + Data Cleaning
**EDA**: Exploring the data, removing duplicate rows with `drop_duplicates()`, finding anomalies and outliers, handling missing values with `fillna()` or `dropna()`, and using `crosstab` for pivot tables.
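
The EDA steps listed above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy data, the column names (`person_age`, `person_income`, `loan_intent`, `loan_status`) and the age cutoff of 100 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the repo's schema.
loans = pd.DataFrame({
    "person_age": [22, 35, 35, 144, 41, 29],
    "person_income": [30000, 52000, 52000, 48000, np.nan, 61000],
    "loan_intent": ["EDUCATION", "MEDICAL", "MEDICAL", "PERSONAL", "EDUCATION", None],
    "loan_status": [0, 1, 1, 0, 0, 1],  # 1 = default, 0 = non-default (assumed coding)
})

# Remove exact duplicate rows.
loans = loans.drop_duplicates()

# Drop an implausible outlier (the age threshold is an assumption).
loans = loans[loans["person_age"] <= 100].copy()

# Fill missing numeric values with the column median,
# and drop rows where the categorical field is missing.
loans["person_income"] = loans["person_income"].fillna(loans["person_income"].median())
loans = loans.dropna(subset=["loan_intent"])

# Pivot-style count of loan status per loan intent.
table = pd.crosstab(loans["loan_intent"], loans["loan_status"])
print(table)
```

Whether to fill or drop missing values depends on the column: a median fill keeps the row at the cost of some distortion, while `dropna()` is the safer choice when there is no sensible default.
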
**Risk Modeling**: Using `RandomForestClassifier` with error metrics such as recall and F1-score. Dealing with underfitting (high training error) and overfitting (testing error >> training error). Validating models with cross-validation methods.
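
As a rough sketch of the modeling workflow described above, the following fits a `RandomForestClassifier` with scikit-learn, compares training and testing scores to spot overfitting, and cross-validates on recall. The synthetic dataset from `make_classification` and the hyperparameters (`n_estimators=100`, `max_depth=5`) are assumptions chosen for illustration; the notebook's data and tuned settings may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic, imbalanced stand-in for credit data (an assumption for this sketch).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Limiting max_depth is one common way to curb overfitting.
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# A low training score suggests underfitting; a training score far above
# the testing score suggests overfitting.
train_recall = recall_score(y_train, model.predict(X_train))
test_recall = recall_score(y_test, model.predict(X_test))
test_f1 = f1_score(y_test, model.predict(X_test))
print(f"train recall={train_recall:.2f} test recall={test_recall:.2f} test F1={test_f1:.2f}")

# 5-fold cross-validation scored on recall.
cv_recall = cross_val_score(model, X_train, y_train, cv=5, scoring="recall")
print("CV recall per fold:", cv_recall.round(2))
```
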
## Analysing the 5 Cs of credit
- Character - borrower's credit history / creditworthiness, customer segmentation, demographics, card type, usage
- Capacity - income (history of stable income)
- Capital - savings, investments
- Collateral - loan, tenure
- Conditions - purpose of credit, economy, employment type
## Error metrics
Confusion matrix:
![](pics/recall.JPG)

For credit card data, **recall** is the most important metric, since we want to minimize false negatives (FN), meaning the actual frauds that the model failed to flag.
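
To make the relationship between the confusion matrix and recall concrete, here is a small pure-Python sketch. The labels are made up for illustration, and the helper names `confusion_counts` and `recall` are hypothetical, not taken from the repository.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN), treating label 1 as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Recall = TP / (TP + FN): the share of actual positives that were caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: 1 = positive case (e.g. fraud), 0 = negative case.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} recall={recall(tp, fn):.2f}")
```

In this toy case the accuracy looks acceptable while half of the positive cases are missed, which is why recall is tracked separately on imbalanced data.
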