https://github.com/dvelkow/loan_prediction_classification

A Classification Model which predicts whether a loan will be approved based on applicant information, it leverages various classification algorithms to do so, featuring data preprocessing, exploratory data analysis, and model comparison to provide insights into the loan approval process
https://github.com/dvelkow/loan_prediction_classification

data-engineering machine-learning portfolio

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/dvelkow/loan_prediction_classification
Owner: dvelkow
Created: 2024-05-26T13:52:26.000Z (12 months ago)
Default Branch: main
Last Pushed: 2024-06-07T02:16:48.000Z (12 months ago)
Last Synced: 2025-02-24T01:11:30.150Z (3 months ago)
Topics: data-engineering, machine-learning, portfolio
Language: Jupyter Notebook
Homepage:
Size: 8.3 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

## Loan Classification Project
Welcome to my loan classification project! This repository contains my work on predicting loan approval outcomes using various machine learning techniques. This project is based on a dataset from Kaggle.

## Table of Contents
- [Overview](#Overview)
- [Dataset Description](#dataset-description)
- [Setup Instructions](#setup-instructions)
- [How to Use](#how-to-use)
- [Approach](#approach)
- [Results](#results)

## Overview
In this project, I aim to predict whether a loan application will be approved based on a set of features provided in the dataset. This involves several steps including data cleaning, exploratory data analysis, feature engineering, model building, and evaluation.

## Dataset Description
The dataset consists of various features related to loan applicants. Here's a quick rundown of the files:

loan.csv: The main dataset containing applicant information and loan approval status.
Data_Dictionary.xlsx: An Excel file that provides detailed descriptions of the features in the dataset.

## Setup Instructions
To get started with the project, you'll need to set up your Python environment with the required libraries. Here's how you can do it:

Clone the repository:
```bash
git clone https://github.com/dvelkow/loan_prediction_classification.git
cd Solution
pip install pandas numpy matplotlib seaborn scikit-learn xgboost
```
## How to use
Place the dataset files (loan.csv and Data_Dictionary.xlsx) in the project directory. Then open and run the Jupyter notebook.

```bash
jupyter notebook Loan_Classification_Solution.ipynb
```
In the notebook, you'll find all the steps for data preprocessing, model training, and evaluation.

## Approach
Here's a brief overview of my approach to solving the loan classification problem:

**Data Preprocessing:**

-Checked for and handled missing values appropriately.

-Encoded categorical variables using one-hot encoding.

-Scaled numerical features to ensure uniformity.

**Exploratory Data Analysis:**

-Visualized the distribution of various features.

-Identified correlations and relationships between features.

**Feature Engineering:**

-Created new features to improve model performance.

-Selected the most relevant features based on exploratory analysis.

**Model Building:**

-Experimented with several algorithms including RandomForestClassifier.

-Performed hyperparameter tuning using GridSearchCV.

-Used cross-validation to ensure model stability.

**Model Evaluation:**

-Evaluated models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.

-Analyzed the confusion matrix and ROC curves to assess performance.

## Results
The final model achieved an accuracy of 92% on the test dataset. For more detailed results and insights, please refer to the Jupyter notebook.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/dvelkow/loan_prediction_classification

Awesome Lists containing this project

README