Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dvelkow/loan_prediction_classification
A Classification Model which predicts whether a loan will be approved based on applicant information, it leverages various classification algorithms to do so, featuring data preprocessing, exploratory data analysis, and model comparison to provide insights into the loan approval process
https://github.com/dvelkow/loan_prediction_classification
data-engineering machine-learning portfolio
Last synced: 25 days ago
JSON representation
A Classification Model which predicts whether a loan will be approved based on applicant information, it leverages various classification algorithms to do so, featuring data preprocessing, exploratory data analysis, and model comparison to provide insights into the loan approval process
- Host: GitHub
- URL: https://github.com/dvelkow/loan_prediction_classification
- Owner: dvelkow
- Created: 2024-05-26T13:52:26.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-06-07T02:16:48.000Z (8 months ago)
- Last Synced: 2025-01-11T04:07:05.296Z (26 days ago)
- Topics: data-engineering, machine-learning, portfolio
- Language: Jupyter Notebook
- Homepage:
- Size: 8.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## Loan Classification Project
Welcome to my loan classification project! This repository contains my work on predicting loan approval outcomes using various machine learning techniques. This project is based on a dataset from Kaggle.## Table of Contents
- [Overview](#Overview)
- [Dataset Description](#dataset-description)
- [Setup Instructions](#setup-instructions)
- [How to Use](#how-to-use)
- [Approach](#approach)
- [Results](#results)## Overview
In this project, I aim to predict whether a loan application will be approved based on a set of features provided in the dataset. This involves several steps including data cleaning, exploratory data analysis, feature engineering, model building, and evaluation.## Dataset Description
The dataset consists of various features related to loan applicants. Here's a quick rundown of the files:loan.csv: The main dataset containing applicant information and loan approval status.
Data_Dictionary.xlsx: An Excel file that provides detailed descriptions of the features in the dataset.## Setup Instructions
To get started with the project, you'll need to set up your Python environment with the required libraries. Here's how you can do it:Clone the repository:
```bash
git clone https://github.com/dvelkow/loan_prediction_classification.git
cd Solution
pip install pandas numpy matplotlib seaborn scikit-learn xgboost
```
## How to use
Place the dataset files (loan.csv and Data_Dictionary.xlsx) in the project directory. Then open and run the Jupyter notebook.```bash
jupyter notebook Loan_Classification_Solution.ipynb
```
In the notebook, you'll find all the steps for data preprocessing, model training, and evaluation.## Approach
Here's a brief overview of my approach to solving the loan classification problem:**Data Preprocessing:**
-Checked for and handled missing values appropriately.
-Encoded categorical variables using one-hot encoding.
-Scaled numerical features to ensure uniformity.
**Exploratory Data Analysis:**
-Visualized the distribution of various features.
-Identified correlations and relationships between features.
**Feature Engineering:**
-Created new features to improve model performance.
-Selected the most relevant features based on exploratory analysis.
**Model Building:**
-Experimented with several algorithms including RandomForestClassifier.
-Performed hyperparameter tuning using GridSearchCV.
-Used cross-validation to ensure model stability.
**Model Evaluation:**
-Evaluated models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.
-Analyzed the confusion matrix and ROC curves to assess performance.
## Results
The final model achieved an accuracy of 92% on the test dataset. For more detailed results and insights, please refer to the Jupyter notebook.