An open API service indexing awesome lists of open source software.

https://github.com/jebin1999/creditriskmodel

A machine learning project for credit risk prediction using the UCI Default of Credit Card Clients dataset. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.
https://github.com/jebin1999/creditriskmodel

credit credit-risk creditriskmodeling quantitative-analysis quantitative-finance risk-analysis

Last synced: 3 months ago
JSON representation

A machine learning project for credit risk prediction using the UCI Default of Credit Card Clients dataset. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.

Awesome Lists containing this project

README

          

# **Predictive Credit Risk Model**

This repository contains a machine learning project for **credit risk prediction** using the **UCI Default of Credit Card Clients dataset**. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.

---

## **Overview**

Credit risk assessment is crucial for financial institutions to minimize losses. This project utilizes a **Random Forest Classifier** to predict the likelihood of a client defaulting, with results evaluated using metrics like **Accuracy**, **ROC AUC Score**, and **Classification Report**.

---

## **Dataset**

The dataset used is sourced from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients) and contains:

- **30,000 samples** of credit card clients.
- **23 features** including:
- **Demographic information**: `SEX`, `AGE`, `EDUCATION`, `MARRIAGE`
- **Payment history**: `PAY_0` to `PAY_6`
- **Bill statements**: `BILL_AMT1` to `BILL_AMT6`
- **Payment amounts**: `PAY_AMT1` to `PAY_AMT6`
- **Target variable**: `default` (1 = Default, 0 = No Default)

---

## **Workflow**

1. **Data Preprocessing**:
- Filling missing values with column means.
- Standardizing numeric features using `StandardScaler`.
- Encoding categorical variables using `LabelEncoder`.

2. **Class Balance Check**:
- The dataset has an equal distribution of `Default` and `No Default` classes (4673 samples each), ensuring no need for resampling techniques.

3. **Model Training**:
- A **Random Forest Classifier** is trained.
- Hyperparameter tuning performed using `GridSearchCV`.

4. **Model Evaluation**:
- **Accuracy**: 85.4%
- **ROC AUC Score**: 0.924
- Detailed **Classification Report** and **Confusion Matrix** are generated.

5. **Feature Importance**:
- The top predictors of credit default are identified, including `LIMIT_BAL`, `PAY_0`, and `BILL_AMT` features.

---

## **Results**

### Key Metrics:
| Metric | Value |
|-----------------|---------|
| **Accuracy** | 85.4% |
| **ROC AUC** | 0.924 |
| **Precision** | 0.85–0.86 |
| **Recall** | 0.85–0.86 |

### **Confusion Matrix**:
The confusion matrix highlights the prediction performance for both classes:
| **Actual/Predicted** | **No Default** | **Default** |
|-----------------------|----------------|-------------|
| **No Default** | 4024 | 649 |
| **Default** | 711 | 3962 |

---

## **Installation**

To run this project locally, follow these steps:

1. **Clone the Repository**:
```bash
git clone https://github.com//.git
cd
```

2. **Install Dependencies**:
Install the required Python libraries using `pip`:
```bash
pip install -r requirements.txt
```

3. **Run the Jupyter Notebook**:
Open the Jupyter Notebook to explore the code:
```bash
jupyter notebook
```

---

## **Requirements**

- Python 3.8+
- Libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- imbalanced-learn (if SMOTE is applied in future versions)

---

## **Visualizations**

1. **Confusion Matrix**:
![Confusion Matrix](/ConfusionMatrix.png)

2. **Feature Importance**:
![Feature Importance](/FeatureImportance.png)

---

## **Next Steps**

- Compare performance with other models like **XGBoost** and **LightGBM**.
- Deploy the model as an API for real-time predictions.
- Add visualization dashboards for better insights.

---

## **Contributions**

Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request.

---

## **License**

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

---

## **Author**

- **Jebin Larosh Jervis**
- Connect with me: [LinkedIn](https://www.linkedin.com/in/jebin-larosh-jervis-a52938123/)