https://github.com/syedzaheerabbas/loantap-logistic-regression

A credit risk prediction project for LoanTap using machine learning to classify loan repayment behavior. It focuses on data preprocessing, handling imbalance, and optimizing model performance for real-world lending decisions.

data-balancing eda f1-score logistic-regression precision recall smote visualization

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/syedzaheerabbas/loantap-logistic-regression
Owner: Syedzaheerabbas
Created: 2025-06-18T08:03:09.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-06-18T08:11:31.000Z (4 months ago)
Last Synced: 2025-06-18T09:23:26.950Z (4 months ago)
Topics: data-balancing, eda, f1-score, logistic-regression, precision, recall, smote, visualization
Language: Jupyter Notebook
Homepage:
Size: 16.7 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

![image](https://github.com/user-attachments/assets/a63fdc21-09e3-4f3f-be5d-64cc0f751e46)

# 💼 LoanTap Credit Risk Modeling Project

## 📌 Introduction

**LoanTap** is a digital lending platform that provides flexible loan products to salaried professionals. With the rise of fintech-driven credit solutions, underwriting accuracy becomes crucial to minimize default risk while ensuring timely loan disbursement. This project builds a predictive model to assess credit risk and assist LoanTap in making data-driven lending decisions.

---

## 🧠 Project Overview

The objective of this project is to develop a machine learning model that predicts whether a borrower is likely to **repay the loan (Fully Paid)** or **default (Charged Off)**. The model supports LoanTap’s credit risk team in automating and improving the efficiency of their underwriting process.

---

## 📊 Dataset

The dataset includes borrower-level and loan-level features such as:

- **Loan Amount**
- **Annual Income**
- **Interest Rate**
- **EMI**
- **Credit Score**
- **Loan Tenure**
- **Purpose of Loan**
- **Employment Details**
- **Repayment Status (Target Variable)**

**Target Variable:**
- `Fully Paid` → 1
- `Charged Off` → 0

The dataset was imbalanced, with a majority of loans marked as “Fully Paid.”
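A minimal sketch of the target encoding above, together with a quick check of the class balance. The `loan_status` values are a hypothetical sample for illustration, not rows from the actual dataset.

```python
from collections import Counter

# Hypothetical sample of raw repayment statuses (not taken from the real dataset).
loan_status = ["Fully Paid", "Fully Paid", "Charged Off", "Fully Paid", "Fully Paid"]

# Encoding stated in this README: Fully Paid -> 1, Charged Off -> 0.
TARGET_MAP = {"Fully Paid": 1, "Charged Off": 0}
y = [TARGET_MAP[status] for status in loan_status]

# Share of each class; a heavily skewed split signals imbalance.
counts = Counter(y)
class_share = {label: n / len(y) for label, n in counts.items()}
print(class_share)
```

With a skewed split like this, plain accuracy can look good even if the model rarely flags a default, which is why the evaluation below relies on precision, recall, and F1.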

---

## 🔬 Methodology

### 1. Exploratory Data Analysis (EDA)
- Identified skewness and outliers in numeric variables.
- Detected important patterns between features and repayment behavior.
- Handled missing values and ensured clean formatting.
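One common way to flag outliers in a numeric column is the interquartile-range (IQR) rule. The sketch below is an illustration only: the README does not say which outlier rule the notebook uses, and the `incomes` values and the `iqr_outliers` helper are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical annual incomes; the last value is an obvious outlier.
incomes = [42_000, 48_000, 51_000, 55_000, 60_000, 63_000, 70_000, 950_000]
print(iqr_outliers(incomes))
```

Flagged values can then be capped, transformed (for example with a log), or inspected by hand, depending on how the feature is used.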

### 2. Data Preprocessing
- Encoded categorical variables.
- Normalized numerical features.
- Addressed data imbalance using:
  - **SMOTE (Synthetic Minority Over-sampling Technique)**
  - **Class Weighting**
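To make the idea behind SMOTE concrete, here is a dependency-free sketch: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the feature values are hypothetical; in practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used, and oversampling should be applied to the training split only.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical scaled features for the minority (Charged Off) class.
minority = [(0.80, 0.60), (0.85, 0.70), (0.90, 0.65), (0.75, 0.72)]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Because every synthetic point is a blend of two real minority samples, the new points stay inside the region the minority class already occupies rather than being exact duplicates.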

### 3. Model Building
Built multiple Logistic Regression models:
- Baseline Logistic Regression
- Logistic Regression with Class Weights
- Logistic Regression with SMOTE
- SMOTE + Class Weights
- Threshold-tuned model for best F1-score
- Regularized model
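Class weighting changes how much each class contributes to the logistic loss. The sketch below computes weights with the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, which is what scikit-learn's `class_weight="balanced"` option uses. Whether the notebook uses that exact setting is an assumption, and the labels and helper names here are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}

def weighted_log_loss(y_true, p_fully_paid, weights):
    """Mean log loss with each sample scaled by its class weight.
    p_fully_paid is the predicted probability of class 1 (Fully Paid)."""
    total = 0.0
    for t, p in zip(y_true, p_fully_paid):
        sample_loss = -(t * math.log(p) + (1 - t) * math.log(1 - p))
        total += weights[t] * sample_loss
    return total / len(y_true)

# Hypothetical labels: 8 Fully Paid (1) and 2 Charged Off (0).
y = [1] * 8 + [0] * 2
weights = balanced_class_weights(y)
print(weights)
```

The rarer class receives the larger weight, so an equally confident mistake on a Charged Off loan costs more than one on a Fully Paid loan.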

### 4. Model Evaluation
- Evaluated using **Confusion Matrix**, **F1-Score**, **Precision**, **Recall**, **ROC-AUC**, and **PR Curve**.
- Tuned the classification threshold by maximizing F1 to improve performance on the minority class.
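Threshold tuning for F1 can be sketched as a simple search over candidate cut-offs. In this illustration the minority class (Charged Off) is treated as the positive class, so the labels are the README's target flipped; the data, the candidate grid, and the helper names are all hypothetical. The threshold should be chosen on a validation split rather than on the final test set.

```python
def f1_at_threshold(y_true, scores, threshold):
    """Precision, recall and F1 when scores >= threshold are predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def best_f1_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the highest F1."""
    return max(candidates, key=lambda th: f1_at_threshold(y_true, scores, th)[2])

# Hypothetical validation data: 1 = default (Charged Off), 0 = Fully Paid.
y_default = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_default = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
best = best_f1_threshold(y_default, p_default, candidates)
print(best, f1_at_threshold(y_default, p_default, best))
print(0.5, f1_at_threshold(y_default, p_default, 0.5))
```

On this toy data, lowering the cut-off below the default 0.5 trades a little precision for full recall on defaults and raises F1, which is the effect the tuned model aims for.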

---

## 📈 Results and Insights

- **Best Model**: Logistic Regression with SMOTE + Class Weighting + Threshold Tuning
- **Key Features Impacting Default Risk**:
  - **Zip Code** (geographical presence)
  - High **EMI** relative to income
  - Low **Credit Score**
  - High **Interest Rate**
  - Purpose categories like “Debt Consolidation” showed higher risk
- **F1-score improved** significantly after addressing imbalance and threshold tuning.

---

## ✅ Recommendations

- **Prioritize 36-Month Loan Terms**: Given the higher default rates on 60-month loans, encourage 36-month loans by offering slightly better terms (e.g., lower interest or processing fees) to reduce long-term risk exposure.
- Implement regional risk scoring by incorporating PIN-code-level default trends. High-risk areas could be subject to stricter eligibility criteria or additional checks.
- Limit loan size in risk bands.
- Incorporate external credit bureau data for enhanced accuracy.
- Regularly retrain the model to account for shifts in applicant behavior and economic conditions.

---

## 🔭 Future Improvements

- Experiment with advanced models like **XGBoost**, **Random Forest**, and **LightGBM**.
- Deploy the model using **Flask** or **Streamlit** to create an interactive loan approval dashboard.
- Integrate explainability tools like **SHAP** or **LIME** for transparent decision-making.
- Monitor model drift and performance using a feedback loop from live loan outcomes.

---

## Colab Notebook
- You can access the full Python analysis on Google Colab using the following link: [View the notebook](https://colab.research.google.com/drive/11MP_rUCVyKrtoH_NQa3tq6GFzMe9Xq8T#scrollTo=WTCNvu7F-D68)

## PDF Report

A detailed analysis report is available in the following PDF file: [View Report](Loan_Tap.pdf).

## Contact

Syed Zaheer Abbas - syedzaheer.c@gmail.com
