https://github.com/jebin1999/creditriskmodel
A machine learning project for credit risk prediction using the UCI Default of Credit Card Clients dataset. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.
- Host: GitHub
- URL: https://github.com/jebin1999/creditriskmodel
- Owner: Jebin1999
- Created: 2024-12-16T11:49:42.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-16T12:01:30.000Z (10 months ago)
- Last Synced: 2025-05-17T18:08:39.280Z (5 months ago)
- Topics: credit, credit-risk, creditriskmodeling, quantitative-analysis, quantitative-finance, risk-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 65.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# **Predictive Credit Risk Model**
This repository contains a machine learning project for **credit risk prediction** using the **UCI Default of Credit Card Clients dataset**. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.
---
## **Overview**
Credit risk assessment is crucial for financial institutions to minimize losses. This project uses a **Random Forest Classifier** to predict the likelihood that a client defaults, and evaluates the model with **Accuracy**, **ROC AUC Score**, and a per-class **Classification Report**.
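As a rough illustration of that evaluation setup, the sketch below trains a Random Forest on synthetic data and computes the three metrics named above. The synthetic data from `make_classification`, the 80/20 split, and the hyperparameters are assumptions made for the example, not the notebook's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the credit data (23 features, binary target).
X, y = make_classification(
    n_samples=1000, n_features=23, n_informative=8, random_state=42
)

# Assumption: a stratified 80/20 split; the notebook's split may differ.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Hard labels feed accuracy and the classification report.
y_pred = model.predict(X_test)
# ROC AUC needs a score per sample: the predicted probability of class 1.
y_score = model.predict_proba(X_test)[:, 1]

accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_score)
report = classification_report(
    y_test, y_pred, target_names=["No Default", "Default"]
)

print(f"Accuracy: {accuracy:.3f}")
print(f"ROC AUC:  {roc_auc:.3f}")
print(report)
```

ROC AUC is computed from probabilities rather than hard labels, so it reflects how well the model ranks clients by risk independently of the 0.5 decision threshold.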
---
## **Dataset**
The dataset used is sourced from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients) and contains:
- **30,000 samples** of credit card clients.
- **23 features** including:
  - **Demographic information**: `SEX`, `AGE`, `EDUCATION`, `MARRIAGE`
  - **Payment history**: `PAY_0`, `PAY_2` to `PAY_6` (the source data has no `PAY_1`)
  - **Bill statements**: `BILL_AMT1` to `BILL_AMT6`
  - **Payment amounts**: `PAY_AMT1` to `PAY_AMT6`
- **Target variable**: `default` (1 = Default, 0 = No Default)

---
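A quick way to check a loaded table against the feature groups listed above is to build the expected column names and compare. This is a minimal sketch assuming the column names of the published UCI file, where the repayment-status columns run `PAY_0`, `PAY_2`, ..., `PAY_6` and the credit limit is `LIMIT_BAL`; the helper names `expected_feature_columns` and `missing_columns` are hypothetical and do not come from the repository.

```python
def expected_feature_columns():
    """Return the 23 feature column names (assumed UCI naming)."""
    demographics = ["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]
    # The repayment-status columns skip PAY_1 in the published file.
    pay_status = ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)]
    bills = [f"BILL_AMT{i}" for i in range(1, 7)]
    payments = [f"PAY_AMT{i}" for i in range(1, 7)]
    return demographics + pay_status + bills + payments


def missing_columns(columns):
    """List the expected feature columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in expected_feature_columns() if name not in present]


if __name__ == "__main__":
    print(len(expected_feature_columns()))
    print(missing_columns(["SEX", "AGE"]))
```

With a pandas DataFrame, `missing_columns(df.columns)` would report any feature that was renamed or dropped before training.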
## **Workflow**
1. **Data Preprocessing**:
   - Filling missing values with column means.
   - Standardizing numeric features using `StandardScaler`.
   - Encoding categorical variables using `LabelEncoder`.
2. **Class Balance Check**:
   - The evaluated data has an equal distribution of `Default` and `No Default` classes (4,673 samples each), so no resampling techniques are applied.
3. **Model Training**:
   - A **Random Forest Classifier** is trained.
   - Hyperparameter tuning is performed using `GridSearchCV`.
4. **Model Evaluation**:
   - **Accuracy**: 85.4%
   - **ROC AUC Score**: 0.924
   - Detailed **Classification Report** and **Confusion Matrix** are generated.
5. **Feature Importance**:
   - The top predictors of credit default are identified, including `LIMIT_BAL`, `PAY_0`, and `BILL_AMT` features.

---
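The workflow steps above can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic (only a handful of the dataset's column names are reused), `SimpleImputer(strategy="mean")` stands in for filling missing values with column means, the parameter grid is invented for the example, and the `LabelEncoder` step is omitted because every synthetic column is already numeric.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Assumption: synthetic stand-in that borrows a few of the dataset's column names.
df = pd.DataFrame({
    "LIMIT_BAL": rng.integers(10_000, 500_000, n).astype(float),
    "AGE": rng.integers(21, 70, n).astype(float),
    "PAY_0": rng.integers(-1, 4, n).astype(float),
    "BILL_AMT1": rng.normal(50_000, 20_000, n),
    "PAY_AMT1": rng.exponential(5_000, n),
})
# Synthetic target driven by PAY_0 plus noise, so there is a signal to learn.
target = (df["PAY_0"] + rng.normal(0, 1, n) > 1).astype(int).rename("default")

# Blank out a few values so the mean-imputation step has work to do.
df.loc[rng.choice(n, 20, replace=False), "BILL_AMT1"] = np.nan

# Step 2: class balance check.
print(target.value_counts())

# Steps 1 and 3: preprocessing and model in one pipeline, so each
# cross-validation fold fits the imputer and scaler on its own training part.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Assumption: a tiny illustrative grid, not the grid used in the notebook.
param_grid = {
    "forest__n_estimators": [25, 50],
    "forest__max_depth": [3, None],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="roc_auc")
search.fit(df, target)
print(search.best_params_, round(search.best_score_, 3))

# Step 5: impurity-based feature importances from the best forest.
importances = pd.Series(
    search.best_estimator_.named_steps["forest"].feature_importances_,
    index=df.columns,
).sort_values(ascending=False)
print(importances)
```

Wrapping the preprocessing inside the pipeline keeps statistics such as column means and scaling factors from leaking across cross-validation folds during the grid search.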
## **Results**
### Key Metrics:
| Metric | Value |
|-----------------|---------|
| **Accuracy** | 85.4% |
| **ROC AUC** | 0.924 |
| **Precision** | 0.85–0.86 |
| **Recall** | 0.85–0.86 |

### **Confusion Matrix**:
The confusion matrix highlights the prediction performance for both classes:
| **Actual/Predicted** | **No Default** | **Default** |
|-----------------------|----------------|-------------|
| **No Default** | 4024 | 649 |
| **Default** | 711 | 3962 |

---
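The headline metrics can be recomputed directly from the confusion matrix counts above, which is a useful consistency check. The short sketch below does the arithmetic in plain Python; the variable names are chosen for the example, with `Default` treated as the positive class.

```python
# Confusion matrix counts from the table above (rows = actual, columns = predicted).
tn, fp = 4024, 649   # actual No Default: predicted No Default, predicted Default
fn, tp = 711, 3962   # actual Default:    predicted No Default, predicted Default

total = tn + fp + fn + tp

# Accuracy: share of all predictions that are correct.
accuracy = (tn + tp) / total

# Precision: of the samples predicted as a class, how many truly belong to it.
precision_default = tp / (tp + fp)
precision_no_default = tn / (tn + fn)

# Recall: of the samples truly in a class, how many were predicted as it.
recall_default = tp / (tp + fn)
recall_no_default = tn / (tn + fp)

print(f"total samples        = {total}")
print(f"accuracy             = {accuracy:.3f}")
print(f"precision (Default)  = {precision_default:.3f}")
print(f"recall (Default)     = {recall_default:.3f}")
print(f"precision (No Def.)  = {precision_no_default:.3f}")
print(f"recall (No Def.)     = {recall_no_default:.3f}")
```

The accuracy works out to 7986 / 9346, about 85.4%, and the per-class precision and recall values all round to 0.85 or 0.86, in line with the Key Metrics table.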
## **Installation**
To run this project locally, follow these steps:
1. **Clone the Repository**:
   ```bash
   git clone https://github.com//.git
   cd
   ```
2. **Install Dependencies**:
   Install the required Python libraries using `pip`:
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the Jupyter Notebook**:
   Open the Jupyter Notebook to explore the code:
   ```bash
   jupyter notebook
   ```

---
## **Requirements**
- Python 3.8+
- Libraries:
  - pandas
  - numpy
  - matplotlib
  - seaborn
  - scikit-learn
  - imbalanced-learn (if SMOTE is applied in future versions)

---
## **Visualizations**
1. **Confusion Matrix**:
2. **Feature Importance**:
---
## **Next Steps**
- Compare performance with other models like **XGBoost** and **LightGBM**.
- Deploy the model as an API for real-time predictions.
- Add visualization dashboards for better insights.

---
## **Contributions**
Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request.
---
## **License**
This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).
---
## **Author**
- **Jebin Larosh Jervis**
- Connect with me: [LinkedIn](https://www.linkedin.com/in/jebin-larosh-jervis-a52938123/)