https://github.com/tinaland101/credit-risk-classification

The purpose of this project is to build a credit risk classification model using machine learning techniques. This model helps identify the creditworthiness of borrowers based on historical lending data. Specifically, it uses a logistic regression model to predict whether a loan is healthy (0) or high-risk (1).

numpy pandas pathlib scikit-learn


Host: GitHub
URL: https://github.com/tinaland101/credit-risk-classification
Owner: tinaland101
Created: 2025-02-23T19:06:58.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-02-23T19:12:49.000Z (over 1 year ago)
Last Synced: 2025-02-23T20:22:39.838Z (over 1 year ago)
Topics: numpy, pandas, pathlib, scikit-learn
Language: Jupyter Notebook
Homepage:
Size: 894 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


Step 1: Import Required Libraries

# Import necessary libraries

import numpy as np  # Used for numerical operations

import pandas as pd  # Used for handling tabular data

from pathlib import Path  # Used for handling file paths

from sklearn.metrics import confusion_matrix, classification_report  # Evaluation metrics

from sklearn.model_selection import train_test_split  # Splitting dataset

from sklearn.linear_model import LogisticRegression  # Logistic regression model

numpy supports numerical computations.

pandas loads and manipulates tabular data (here, a CSV file).

Path builds file paths in an operating-system-independent way.

confusion_matrix and classification_report are used to evaluate the predictions in Step 7.

train_test_split splits the dataset into training and testing sets.

LogisticRegression is the classification model.

Step 2: Load and Inspect Data

# Read the CSV file into a Pandas DataFrame

df = pd.read_csv(Path("Resources/lending_data.csv"))

# Display the first 5 rows of the dataset

df.head()

pd.read_csv reads the CSV file containing the lending data into a DataFrame.

df.head() displays the first five rows for review.
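Beyond df.head(), a few extra checks can catch missing values or unexpected column types before modelling. The sketch below is illustrative only: df_demo and its column names (feature_a, feature_b) are hypothetical placeholders, not the real schema of lending_data.csv.

```python
import pandas as pd

# Hypothetical miniature frame; the column names are placeholders,
# not the actual columns of lending_data.csv.
df_demo = pd.DataFrame(
    {
        "feature_a": [10.0, 12.5, None, 9.0],
        "feature_b": [1, 0, 2, 1],
        "loan_status": [0, 0, 1, 0],
    }
)

# Number of rows and columns
print(df_demo.shape)

# Data type of each column
print(df_demo.dtypes)

# Count of missing values per column
missing_per_column = df_demo.isna().sum()
print(missing_per_column)
```

The same three calls can be run on df once the real file is loaded.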

Step 3: Define Features (X) and Labels (y)

# Separate the y variable (loan_status column)

y = df["loan_status"]

# Separate the X variable (all columns except loan_status)

X = df.drop(columns=["loan_status"])

# Review the y variable

print(y.value_counts())  # Check how many healthy/high-risk loans

# Review the X variable

print(X.head())  # Display first 5 rows of features

y (target variable) contains loan status:

0 = Healthy Loan

1 = High-Risk Loan

X (features) contains every column other than loan_status, i.e. the borrower and loan attributes.

value_counts() shows how many loans fall into each class, which reveals any class imbalance.
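To read the class distribution as proportions rather than raw counts, value_counts() accepts normalize=True. This is a minimal sketch on made-up labels; y_example is a hypothetical stand-in for y, and the 8-to-2 split is not taken from the real data.

```python
import pandas as pd

# Hypothetical labels standing in for df["loan_status"];
# the real class counts are not reproduced here.
y_example = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], name="loan_status")

# Absolute number of loans per class
counts = y_example.value_counts()

# Share of each class (the shares sum to 1.0)
shares = y_example.value_counts(normalize=True)

print(counts)
print(shares)
```

A small share for class 1 would mean that plain accuracy can look high even when few high-risk loans are detected, which is why Step 7 also reports precision and recall.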

Step 4: Split Data into Training and Testing Sets

# Split the data into training and testing sets (80% training, 20% testing)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Display dataset shapes

print("Training set shape:", X_train.shape, y_train.shape)

print("Testing set shape:", X_test.shape, y_test.shape)

train_test_split divides data:

80% training (X_train, y_train)

20% testing (X_test, y_test)

random_state=1 fixes the shuffling seed, so the same split is reproduced on every run.
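If the two classes are imbalanced, a purely random split can leave the test set with a different share of high-risk loans than the training set. One optional variation, not used in the code above, is to pass stratify=y so both sets keep the same class proportions. The data below (X_demo, y_demo) is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 90 healthy (0) and 10 high-risk (1) loans
X_demo = pd.DataFrame({"feature": np.arange(100)})
y_demo = pd.Series([0] * 90 + [1] * 10, name="loan_status")

# stratify=y_demo keeps the 90/10 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

# Share of high-risk loans in each split
print("train share:", y_tr.mean())
print("test share:", y_te.mean())
```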

Step 5: Train the Logistic Regression Model

# Instantiate the Logistic Regression model with random_state=1

model = LogisticRegression(random_state=1)

# Train the model using the training data

model.fit(X_train, y_train)

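Creates a logistic regression model with scikit-learn's default settings. Note that random_state only affects the sag, saga, and liblinear solvers; with the default lbfgs solver it has no effect.

Fits (trains) the model on the training data.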
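Logistic regression can be slow to converge when features are on very different scales. A possible variation, which is an assumption here and not part of the original notebook, is to standardize the features in a pipeline and raise max_iter. The sketch uses synthetic data from make_classification in place of the lending data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the lending data (not the real dataset)
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=1)

# Standardize each feature, then fit logistic regression
scaled_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=1),
)
scaled_model.fit(X_syn, y_syn)

# Mean accuracy on the data the model was fitted on
train_accuracy = scaled_model.score(X_syn, y_syn)
print("Training accuracy:", train_accuracy)
```

Wrapping the scaler in the pipeline means the scaling parameters are learned from the training data only and then reused when predicting.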

Step 6: Make Predictions

# Make predictions using the testing dataset

y_pred = model.predict(X_test)

# Display first 10 predictions

print("Predicted labels:", y_pred[:10])

Uses the trained model to predict loan status on test data.
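predict() returns hard 0/1 labels. When a probability is more useful, for example to flag loans more cautiously, predict_proba() returns one column per class. The sketch below uses a hypothetical one-feature toy dataset, and the 0.3 threshold is an arbitrary illustration rather than a recommended value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, three loans per class
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_model = LogisticRegression().fit(X_toy, y_toy)

# Column 0 = P(healthy), column 1 = P(high-risk); each row sums to 1
proba = toy_model.predict_proba(X_toy)

# Default labels, equivalent to thresholding P(high-risk) at 0.5
labels = toy_model.predict(X_toy)

# A lower, arbitrary threshold flags at least as many loans as high-risk
high_risk_flag = (proba[:, 1] >= 0.3).astype(int)

print(proba.round(3))
print(labels)
print(high_risk_flag)
```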

Step 7: Evaluate the Model

Confusion Matrix

# Generate a confusion matrix

cm = confusion_matrix(y_test, y_pred)

print("Confusion Matrix:\n", cm)

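Shows how many predictions were correct and incorrect for each class. Rows are the actual classes and columns are the predicted classes, so for labels 0 and 1 the layout is [[true negatives, false positives], [false negatives, true positives]].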
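For a binary problem, the four cells of the confusion matrix can be unpacked into named counts with ravel(). The labels below are made up for illustration and are not results from the lending data.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (0 = healthy, 1 = high-risk)
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

# labels=[0, 1] fixes the row/column order; ravel() flattens row by row
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1]).ravel()

print("True negatives:", tn)    # healthy loans predicted healthy
print("False positives:", fp)   # healthy loans predicted high-risk
print("False negatives:", fn)   # high-risk loans predicted healthy
print("True positives:", tp)    # high-risk loans predicted high-risk
```

In a credit-risk setting the false negatives are usually the costliest cell, since they are high-risk loans that the model treated as healthy.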

Classification Report

# Generate a classification report

report = classification_report(y_test, y_pred)

print("Classification Report:\n", report)

Gives precision, recall, F1-score, and support for each class (0 and 1), along with the overall accuracy.
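When individual metrics are needed as numbers rather than as printed text, classification_report can return a dictionary via output_dict=True. The sketch below uses made-up labels and checks the high-risk precision and recall against their definitions; none of the values are results from the lending data.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical actual and predicted labels (0 = healthy, 1 = high-risk)
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

# Dictionary form of the report; class keys are the labels as strings
report_dict = classification_report(y_true_demo, y_pred_demo, output_dict=True)

# Recompute the class-1 metrics by hand from the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1]).ravel()
manual_precision = tp / (tp + fp)  # of loans flagged high-risk, share that truly are
manual_recall = tp / (tp + fn)     # of truly high-risk loans, share that were flagged

print("Precision (class 1):", report_dict["1"]["precision"], manual_precision)
print("Recall (class 1):", report_dict["1"]["recall"], manual_recall)
print("Accuracy:", report_dict["accuracy"])
```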
