https://github.com/steviecurran/credit-risk
Credit Risk Modelling For Dummies: But With Fewer Dummies
- Host: GitHub
- URL: https://github.com/steviecurran/credit-risk
- Owner: steviecurran
- Created: 2023-05-02T21:36:58.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-10T22:42:58.000Z (over 2 years ago)
- Last Synced: 2025-01-14T08:27:36.431Z (10 months ago)
- Topics: binary-classification, credit-risk, decision-trees, finance, logistic-regression, machine-learning, nearest-neighbors, suport-vec
- Language: Jupyter Notebook
- Homepage:
- Size: 4.19 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# credit-risk
Credit Risk Modelling For Dummies: But With Fewer Dummies
Standard credit risk analysis utilises scorecards, which are built from datasets containing only
categorical variables. This requires the continuous numerical features to be fine-classed (grouped
into discrete bands) and converted to dummy variables. Although the point of the scorecard is to
present the model simply, this practice demands a great deal of convoluted pre-processing, which
greatly bloats the dataset and makes it more susceptible to errors. Most importantly, though, I find
that retaining the numerical features greatly improves the predictive power of the data, giving a
Gini coefficient of 0.95 (cf. 0.40 with fine-classing) and a Kolmogorov-Smirnov statistic of
KS = 0.85 (cf. 0.30).
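As a rough illustration of the two ideas above, the sketch below fine-classes a continuous feature into dummy variables with pandas, then computes the Gini coefficient (2 × AUC − 1) and the KS statistic (maximum TPR−FPR gap) from scikit-learn's ROC outputs. The data and score function here are entirely synthetic stand-ins, not taken from this repository:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic stand-in data: a continuous feature and binary default labels
rng = np.random.default_rng(0)
income = rng.normal(50, 15, 1000)
p_default = 1 / (1 + np.exp((income - 45) / 5))  # default more likely at low income
default = (rng.random(1000) < p_default).astype(int)

# Fine-classing: bin the continuous feature into 5 bands, then expand
# each band into its own dummy (0/1) column - note the bloat in width
bands = pd.cut(pd.Series(income, name="income"), bins=5)
dummies = pd.get_dummies(bands, prefix="income")

# Gini and KS from predicted default probabilities (here the synthetic
# p_default stands in for a fitted model's scores)
gini = 2 * roc_auc_score(default, p_default) - 1
fpr, tpr, _ = roc_curve(default, p_default)
ks = np.max(tpr - fpr)
print(f"Gini = {gini:.2f}, KS = {ks:.2f}")
```

With real data, the same two metrics computed on a model fitted to the raw numerical features versus the dummy-encoded bands gives a direct comparison of the two approaches.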
Although all of the processing was done in the Python scripts described in report.pdf, a simplified
run-through is included as the Jupyter notebooks PP.ipynb and ML.ipynb.