https://github.com/rubydamodar/loan-approval-prediction-

Loan approval prediction is a popular machine learning project, especially in the banking and finance industry. The goal of this project is to build a predictive model that can determine whether a loan application will be approved or not based on the applicant's information such as income, credit history, and loan amount.

# 🌟 Loan Approval Prediction Project 🌟

## 📖 Overview

This project aims to develop a predictive model that determines the likelihood of loan approval based on various features of loan applicants. By leveraging machine learning techniques, we analyze historical loan data to predict outcomes for new loan applications.

## 📁 Project Structure

```plaintext
Loan-approval-prediction/
│
├── 📓 LoanApprovalEDA.ipynb           # Jupyter Notebook for Exploratory Data Analysis
├── 📦 final_model.pkl                 # Trained model for loan approval predictions
├── 📦 loan_approval_model.pkl         # Model used for predictions
├── 📊 loan_approval_predictions.csv   # Predictions made on new data
├── 🧪 test_Y3wMUE5_7gLdaTN.csv        # Test dataset
└── 📚 train_u6lujuX_CVtuZ9i.csv       # Training dataset
```

## 📊 Data Description

### Training Data
The training dataset consists of 614 entries and the following columns:

| **Column** | **Description** |
|------------------------|-----------------------------------------------------------|
| **Loan_ID** | Unique identifier for each loan |
| **Gender** | Gender of the applicant |
| **Married** | Marital status of the applicant |
| **Dependents** | Number of dependents |
| **Education** | Educational qualification |
| **Self_Employed** | Employment status of the applicant |
| **ApplicantIncome** | Income of the applicant |
| **CoapplicantIncome** | Income of the coapplicant |
| **LoanAmount** | Amount of loan applied for |
| **Loan_Amount_Term** | Duration of the loan in months |
| **Credit_History** | Credit history of the applicant |
| **Property_Area** | Area of residence |
| **Loan_Status** | Approval status (Y/N) |

### Test Data
The test dataset has the same structure as the training dataset but does not contain the **Loan_Status** column.
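
The relationship between the two files can be checked in a few lines. The sketch below uses a handful of made-up rows rather than the real CSVs, and the helper names `split_features_target` and `same_feature_columns` are hypothetical, not part of the repository.

```python
import pandas as pd

TARGET = "Loan_Status"


def split_features_target(train: pd.DataFrame, target: str = TARGET):
    """Separate the feature columns from the target column."""
    return train.drop(columns=[target]), train[target]


def same_feature_columns(train: pd.DataFrame, test: pd.DataFrame, target: str = TARGET) -> bool:
    """Return True when test has exactly the training columns minus the target."""
    expected = [col for col in train.columns if col != target]
    return list(test.columns) == expected


# Toy rows, made up for illustration (not taken from the real dataset).
train = pd.DataFrame({
    "Loan_ID": ["LP1", "LP2"],
    "ApplicantIncome": [5000, 3000],
    "Loan_Status": ["Y", "N"],
})
test = pd.DataFrame({"Loan_ID": ["LP3"], "ApplicantIncome": [4200]})

X, y = split_features_target(train)
columns_match = same_feature_columns(train, test)
```

With the actual files, `train` and `test` would instead come from `pd.read_csv("train_u6lujuX_CVtuZ9i.csv")` and `pd.read_csv("test_Y3wMUE5_7gLdaTN.csv")`.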

## 📈 Exploratory Data Analysis (EDA)

- Conducted statistical analysis to understand the distribution of features.
- Visualized the loan approval status distribution, highlighting class imbalance.
- Identified categorical and numerical columns for further processing.

![EDA Visualization](path/to/eda_visualization.png)
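
The EDA steps listed above could look roughly like the following. This is a minimal sketch on made-up rows; the `summarize` helper is hypothetical and the notebook may well do this differently.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str = "Loan_Status"):
    """Return class proportions plus categorical and numerical feature names."""
    class_share = df[target].value_counts(normalize=True)
    features = df.drop(columns=[target])
    numerical = features.select_dtypes(include="number").columns.tolist()
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    return class_share, categorical, numerical


# Toy rows, made up for illustration (not taken from the real dataset).
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "Credit_History": [1.0, 0.0, 1.0, 1.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

class_share, categorical, numerical = summarize(df)
```

A `class_share` that is far from an even split is the class imbalance mentioned above; plotting it with `class_share.plot(kind="bar")` gives the corresponding chart.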

## 🔍 Data Preprocessing

1. **Missing Value Imputation**:
   - Imputed missing values in `LoanAmount`, `Loan_Amount_Term`, and `Credit_History` using appropriate strategies.

2. **Encoding Categorical Variables**:
   - Converted categorical features into numerical representations using one-hot encoding.

3. **Scaling**:
   - Scaled numerical features to normalize their distributions.

4. **Feature Selection**:
   - Dropped unnecessary columns such as `Loan_ID` before model training.
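
One way to express these four steps is a scikit-learn `ColumnTransformer`. The sketch below is an illustration under assumptions: the README does not name the imputation strategy or the scaler, so median imputation and `StandardScaler` are stand-ins, `build_preprocessor` is a hypothetical helper, and the rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numerical, categorical):
    """Impute and scale numerical columns, one-hot encode categorical ones."""
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed strategy
        ("scale", StandardScaler()),                   # assumed scaler
    ])
    return ColumnTransformer(
        [
            ("num", numeric_steps, numerical),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ],
        sparse_threshold=0,  # always return a dense array
    )


# Toy rows, made up for illustration (not taken from the real dataset).
toy = pd.DataFrame({
    "Loan_ID": ["LP1", "LP2", "LP3", "LP4"],
    "LoanAmount": [120.0, np.nan, 100.0, 140.0],
    "Credit_History": [1.0, 0.0, np.nan, 1.0],
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban"],
})

features = toy.drop(columns=["Loan_ID"])  # identifier carries no signal
preprocessor = build_preprocessor(["LoanAmount", "Credit_History"], ["Property_Area"])
X = preprocessor.fit_transform(features)
```

Fitting the transformer on the training data only and reusing it for new data keeps the imputation values, category set, and scaling parameters consistent between training and prediction.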

## 🏗️ Model Building

1. **Train-Test Split**:
   - Split the training data into training and validation sets.

2. **Model Selection**:
   - Evaluated various models, including Logistic Regression, Decision Tree, and Random Forest.

3. **Hyperparameter Tuning**:
   - Utilized GridSearchCV for hyperparameter tuning of the Random Forest model.

4. **Model Evaluation**:
   - Evaluated models based on accuracy, precision, recall, and F1-score.
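
The workflow above can be sketched end to end as follows. Everything specific here is an assumption: the data is synthetic (`make_classification`) rather than the loan dataset, the 80/20 split and `random_state` values are arbitrary, and the parameter grid is a small illustrative one because the README does not list the grid that was actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed loan features (not the real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# 1. Hold out a validation set, keeping class proportions equal in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2. Candidate models named in the README.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
}


def evaluate(model):
    """Fit on the training split and score on the validation split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    return {
        "accuracy": accuracy_score(y_val, pred),
        "precision": precision_score(y_val, pred),
        "recall": recall_score(y_val, pred),
        "f1": f1_score(y_val, pred),
    }


# 4. Accuracy, precision, recall, and F1 for every candidate.
scores = {name: evaluate(model) for name, model in models.items()}

# 3. Grid search over an illustrative (assumed) Random Forest grid.
param_grid = {"n_estimators": [10, 50], "min_samples_split": [2, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)
best_params = search.best_params_
```

`search.best_estimator_` is the refitted model with the winning parameters and can be passed to `evaluate`-style scoring on the validation split like any other classifier.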

## 📊 Results

| **Model** | **Accuracy** | **Key Parameters** |
|---------------------|---------------|----------------------------------------------------------|
| Logistic Regression | 79% | - |
| Decision Tree | 68% | - |
| Random Forest | 78% | `max_depth`: None, `max_features`: 'sqrt', `min_samples_leaf`: 1, `min_samples_split`: 10, `n_estimators`: 50 |

**Final Model Accuracy**: 78.86%

## 🚀 Future Work

- **Address Class Imbalance**: Implement techniques to address class imbalance to improve model performance.
- **Feature Engineering**: Explore additional features that could enhance prediction accuracy.
- **Deployment**: Create a web application for real-time predictions of loan approvals.
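
For the class imbalance item, one commonly used option (an assumption here, not a technique the project has committed to) is class weighting, which scikit-learn supports through the `class_weight` parameter. The sketch below computes the "balanced" weights for a made-up 7:3 label split to show what the setting does.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Toy labels, made up for illustration: 7 approvals and 3 rejections.
labels = np.array(["Y"] * 7 + ["N"] * 3)
classes = np.unique(labels)

# "balanced" weights are n_samples / (n_classes * count_of_class).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
weight_by_class = {str(c): float(w) for c, w in zip(classes, weights)}

# The same weighting can be requested directly when building a model.
weighted_forest = RandomForestClassifier(class_weight="balanced", random_state=42)
```

The minority class receives the larger weight, so misclassifying a rejected application costs more during training than misclassifying an approved one.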

## 🛠️ Usage

To make predictions with the trained model, load `final_model.pkl` and apply the same preprocessing steps to your input data that were applied to the training data.

```python
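import joblib
import pandas as pd

# Load the trained model
model = joblib.load('final_model.pkl')

# Load new applications (here, the test set shipped with the repository)
new_data = pd.read_csv('test_Y3wMUE5_7gLdaTN.csv')

# Apply the same preprocessing used during training.
# `preprocess` is a placeholder: it is not defined in this README and must
# reproduce the imputation, encoding, and scaling steps described above.
new_data_processed = preprocess(new_data)

# Make predictions
predictions = model.predict(new_data_processed)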
```
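
Because the snippet above depends on a separate preprocessing step, one possible alternative is to save preprocessing and model together as a single scikit-learn `Pipeline`. The sketch below is an illustration on synthetic data, not how `final_model.pkl` was produced; the file name `pipeline_demo.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for loan features (not the real data).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Scaling and the classifier travel together in one object.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Round-trip through joblib, as the usage example does for the model file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline_demo.pkl")  # hypothetical file name
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

same_predictions = np.array_equal(pipeline.predict(X), restored.predict(X))
```

With this layout, the loaded object accepts raw feature rows directly, so callers cannot accidentally skip or mismatch the preprocessing.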

## 📝 Acknowledgments

- [Pandas](https://pandas.pydata.org/) for data manipulation
- [Scikit-learn](https://scikit-learn.org/stable/) for machine learning algorithms
- [Matplotlib](https://matplotlib.org/) and [Seaborn](https://seaborn.pydata.org/) for data visualization