https://github.com/mahnoorsheikh16/loan-default-prediction

Credit risk is the borrower’s inability to repay a loan. Machine learning models can predict risky customers and reduce lender losses. Insights from the behavior and demographics of past customers can be applied to future applicants to support better loan decisions. This study aims to find the most suitable model for predicting loan defaults.

Host: GitHub
URL: https://github.com/mahnoorsheikh16/loan-default-prediction
Owner: mahnoorsheikh16
Created: 2024-09-19T19:13:49.000Z
Default Branch: main
Last Pushed: 2024-09-19T19:36:38.000Z
Last Synced: 2024-12-29T01:37:32.214Z
Topics: auc-score, binary-classification-algorithms, credit-card-fraud-detection, data-cleaning, data-science, decision-tree-classifier, exploratory-data-analysis, loan-default-prediction, logistic-regression, machine-learning, naive-bayes-classifier, pre-processing, python, random-forest-classifier, support-vector-machines, xgboost-classifier
Language: Jupyter Notebook
Homepage:
Size: 2.38 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

#### Problem Statement
Lending has become an important part of daily life for organizations and individuals alike, and growing financial constraints have made it more or less unavoidable. Although lending benefits both lenders and borrowers and is an essential activity for financial organizations, it carries significant risk. Credit risk is the borrower's inability to repay the loan by the date agreed between lender and borrower in the loan agreement. This is a major concern for financial institutions because it can lead to "credit default", which can be severe for the lending party and may even result in bankruptcy. A thorough evaluation and verification of a borrower's ability to repay within the agreed period can minimize credit risk, and would therefore benefit financial institutions around the world.

#### Datasets
I used two datasets to benchmark my results. The first contains information from Dream Housing Finance, a company that provides housing loans, and has 614 instances and 13 attributes. The second contains information on credit card clients in Taiwan from April 2005 to September 2005, with 30,000 instances and 25 attributes. Both datasets are multivariate, and their attributes include integer, categorical, and real-valued data types.
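The README does not include the notebook's cleaning code. As a hedged illustration of how mixed integer, categorical, and real-valued attributes with gaps (such as the missing values noted for the ‘loan data set’) might be prepared for the classifiers, the sketch below fills numeric gaps with the column median and categorical gaps with the most frequent value, then one-hot encodes the categorical columns. The column names (`Gender`, `LoanAmount`, `Property_Area`), the toy rows, and the choice of median/mode imputation are assumptions for illustration, not the author's pipeline.

```python
from statistics import median, mode

# Hypothetical rows loosely modelled on a loan-application table; the column
# names and values are illustrative assumptions, not the real dataset schema.
rows = [
    {"Gender": "Male", "LoanAmount": 128.0, "Property_Area": "Urban"},
    {"Gender": None, "LoanAmount": 66.0, "Property_Area": "Rural"},
    {"Gender": "Female", "LoanAmount": None, "Property_Area": "Urban"},
    {"Gender": "Male", "LoanAmount": 120.0, "Property_Area": "Semiurban"},
]

NUMERIC = ["LoanAmount"]
CATEGORICAL = ["Gender", "Property_Area"]


def impute(rows):
    """Fill missing numeric values with the column median and missing
    categorical values with the column mode (most frequent value)."""
    fills = {}
    for col in NUMERIC:
        fills[col] = median(r[col] for r in rows if r[col] is not None)
    for col in CATEGORICAL:
        fills[col] = mode(r[col] for r in rows if r[col] is not None)
    return [{c: (fills[c] if v is None else v) for c, v in r.items()} for r in rows]


def one_hot(rows):
    """Replace each categorical column with 0/1 indicator columns."""
    levels = {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}
    out = []
    for r in rows:
        enc = {c: r[c] for c in NUMERIC}
        for c in CATEGORICAL:
            for level in levels[c]:
                enc[f"{c}_{level}"] = int(r[c] == level)
        out.append(enc)
    return out


clean = one_hot(impute(rows))
print(clean[2])
```

In a real workflow the medians, modes, and category levels would be computed on the training split only and then reused on the test split, so that no test information leaks into pre-processing.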

#### Methodology
I applied multiple models to each dataset and made an informed choice through a comparative analysis of all of them. Because the target (whether or not a loan defaults) is binary, I used classification models. Before the models were applied, each dataset went through thorough data cleaning, exploratory data analysis, and pre-processing. Six models were applied to each dataset: Logistic Regression, Decision Tree, Random Forest Classifier, Support Vector Machine (SVM), Naïve Bayes, and XGBoost Classifier.
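The comparison described above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the repository's notebook: the data are synthetic (`make_classification` with roughly 20% positives standing in for defaulters), the hyperparameters are illustrative, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example runs without the `xgboost` package (with it installed, `xgboost.XGBClassifier()` could be placed in the same dictionary). Logistic Regression and the SVM are wrapped with `StandardScaler` because both are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for a cleaned default dataset (assumption:
# about 80% non-defaulters, 20% defaulters).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "Naive Bayes": GaussianNB(),
    # Stand-in for xgboost.XGBClassifier so the sketch needs only scikit-learn.
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the default rate roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
          for name, model in models.items()}

# Rank the models by mean cross-validated AUC, best first.
for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} mean AUC = {auc:.3f}")
```

Scoring every model on the same folds with the same metric is what makes the ranking a fair comparison; accuracy can be added by running `cross_val_score` again with `scoring="accuracy"`.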

#### Results
This study set out to find an algorithm that predicts loan default accurately, so that financial institutions can avoid lending to customers likely to default and the losses that follow. XGBoost, a gradient-boosting classifier that has built a reputation for winning Kaggle competitions, proved itself once again: it was the best classifier for predicting whether a loan would default on both the small and the large dataset, with Random Forest Classifier a close second. XGBoost outperformed the other classifiers on both accuracy and AUC score.

Results on the smaller dataset were far better than on the larger one. For example, the highest AUC score on the ‘loan data set’ was 0.89, whereas on ‘default of credit card clients’ it was only 0.78. This is likely due to the many discrepancies in the ‘default of credit card clients’ dataset: it contained errors, values not listed in the data description, and trends that could not be explained logically. We could not remove these values either, because they carried important information and removing them reduced model accuracy. The ‘loan data set’, on the other hand, gave better results even though it contained missing values. This illustrates that a model is only as good as the data it is trained on.
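To help read the AUC figures: ROC AUC is the probability that a model scores a randomly chosen defaulter higher than a randomly chosen non-defaulter, so 0.5 is chance level and 1.0 is a perfect ranking. The function below is a small standard-library sketch of that pairwise definition (ties count as half a win); `roc_auc` is a hypothetical helper name, and library routines such as scikit-learn's `roc_auc_score` compute the same quantity without the quadratic pairwise loop.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Labels: 1 = defaulted, 0 = repaid; scores are predicted default probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Here three of the four defaulter/non-defaulter pairs are ranked correctly, giving 0.75. Because AUC depends only on ranking, it is less affected by class imbalance than accuracy, which is one reason to report both.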
