https://github.com/sharkb8t/credit-risk-classification
Demonstrates my abilities to use Jupyter Notebook with scikit-learn to train and evaluate a machine learning model.
jupyter-notebook numpy pandas pathlib python scikit-learn
- Host: GitHub
- URL: https://github.com/sharkb8t/credit-risk-classification
- Owner: Sharkb8t
- Created: 2025-03-12T22:36:02.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-12T22:52:28.000Z (over 1 year ago)
- Last Synced: 2025-03-12T23:29:22.785Z (over 1 year ago)
- Topics: jupyter-notebook, numpy, pandas, pathlib, python, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Credit Risk Classification
## Overview:
This project builds a **logistic regression model** to predict **credit risk** from historical loan data. The goal is to classify loans as either **healthy (0)** or **high-risk (1)** using scikit-learn's `LogisticRegression`.
## Technologies Used:
- Python
- Pandas
- Scikit-Learn
- Jupyter Notebook
## Dataset:
The dataset (`lending_data.csv`) contains **loan records** with various financial attributes.
- **Label Variable (`loan_status`)**:
  - `0`: Healthy Loan
  - `1`: High-Risk Loan
- **Features (`X`)**: Various financial indicators (e.g., income, loan amount, credit history).
## Installation and Usage:
### 1. Clone the Repository
```bash
git clone https://github.com/Sharkb8t/credit-risk-classification.git
cd credit-risk-classification
```
### 2. Install Required Libraries
```bash
pip install pandas scikit-learn notebook
```
### 3. Run the Jupyter Notebook
```bash
jupyter notebook
```
Open `credit_risk-classification.ipynb` and run the cells in sequential order.
> Update the file path in cell *5* before running.
## Steps in the Analysis
#### 1.) Data Preprocessing
Load and inspect the dataset:
- Read `lending_data.csv` into a Pandas DataFrame.
- Check for missing values and data types.

Define features and target variables:
- Separate `loan_status` as the target variable (`y`).
- Drop `loan_status` from the dataset to create the features (`X`).

Split the dataset:
- Use `train_test_split()` to divide the data into 80% training and 20% testing.
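
The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the inline CSV and its feature columns (`loan_size`, `borrower_income`) are hypothetical stand-ins for `lending_data.csv`, and `random_state=1` on the split is an assumption borrowed from the model settings.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for lending_data.csv; the real feature columns may differ.
SAMPLE_CSV = """loan_size,borrower_income,loan_status
9000,48000,0
9500,51000,0
10000,52000,0
10200,53500,0
9800,50500,0
9700,49900,0
10100,52800,0
9900,51200,0
18500,84000,1
19000,86500,1
"""

# In the notebook, pass the path to lending_data.csv instead of a StringIO buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Inspect the data: missing values per column and column dtypes.
missing_per_column = df.isna().sum()
column_types = df.dtypes

# Separate the target (y) from the features (X).
y = df["loan_status"]
X = df.drop(columns=["loan_status"])

# 80% training / 20% testing; random_state=1 is an assumed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

print(missing_per_column)
print(column_types)
print(X_train.shape, X_test.shape)
```

Setting `test_size=0.2` is what produces the 80/20 split; without it, `train_test_split()` defaults to 75/25.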
#### 2.) Model Training
Train a Logistic Regression Model:
- Import `LogisticRegression` from scikit-learn.
- Instantiate the model with `random_state=1`.
- Train the model using `X_train` and `y_train`.
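
A minimal sketch of the training step, assuming synthetic data in place of `lending_data.csv`. The `make_classification` call and its parameters are illustrative assumptions; only `LogisticRegression(random_state=1)` and the `fit` call on `X_train` and `y_train` come from the steps above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the loan data (about 5% "high-risk" labels).
X, y = make_classification(
    n_samples=1000,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    weights=[0.95, 0.05],
    random_state=1,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Instantiate the model with random_state=1 and fit it on the training split.
model = LogisticRegression(random_state=1)
model.fit(X_train, y_train)

print(model.classes_)      # the labels the model learned to separate
print(model.coef_.shape)   # one row of coefficients, one column per feature
```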
#### 3.) Predictions and Evaluation
Generate Predictions:
- Use `predict()` on `X_test` to generate loan risk predictions (`y_pred`).

Evaluate Model Performance:
- Compute the Confusion Matrix to analyze misclassifications.
- Generate a Classification Report to measure:
  - Precision (Positive Predictive Value)
  - Recall (Sensitivity)
  - F1-Score (harmonic mean of Precision and Recall)
  - Overall Accuracy
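
The prediction and evaluation steps might look like the sketch below. It again uses synthetic data as a stand-in for the loan records, so the printed numbers will not match the results reported in this README. The `stratify=y` argument and the `target_names` labels are assumptions added for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the loan data.
X, y = make_classification(
    n_samples=1000,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    weights=[0.95, 0.05],
    random_state=1,
)

# stratify=y (an assumption) keeps the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

model = LogisticRegression(random_state=1).fit(X_train, y_train)

# Generate loan risk predictions for the held-out test set.
y_pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels:
# cm[0, 1] counts healthy loans flagged as high-risk (false positives),
# cm[1, 0] counts high-risk loans predicted as healthy (false negatives).
cm = confusion_matrix(y_test, y_pred)

# Precision, recall, F1-score per class, plus overall accuracy.
report = classification_report(
    y_test, y_pred, target_names=["0 (Healthy Loan)", "1 (High-Risk Loan)"]
)

print(cm)
print(report)
```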
## Results Summary
| Metric | Class 0 (Healthy Loan) | Class 1 (High-Risk Loan) |
|--------------|------------------------|--------------------------|
| **Precision** | 100% | 86% |
| **Recall** | 100% | 91% |
| **F1-Score** | 100% | 88% |
| **Accuracy** | **99%** | - |
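
As a quick consistency check on the table, the F1-score is the harmonic mean of precision and recall, so the reported 88% for class 1 can be recomputed from the reported 86% and 91%. The helper name `f1_from` is hypothetical and used only for this illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Class 1 (High-Risk Loan): precision 86%, recall 91%.
f1_high_risk = f1_from(0.86, 0.91)

# Class 0 (Healthy Loan): precision and recall both reported as 100%.
f1_healthy = f1_from(1.00, 1.00)

print(f"{f1_high_risk:.2f}")  # 0.88
print(f"{f1_healthy:.2f}")    # 1.00
```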
### Key Observations
- The model reaches 99% overall accuracy, a figure dominated by the majority class (healthy loans).
- Precision and recall for healthy loans (0) both round to 100%.
- Recall for high-risk loans (1) is good at 91%, but precision is lower at 86%: about 14% of the loans flagged as high-risk are actually healthy (false positives).
### Recommendation
The model is highly effective for predicting healthy loans, but to improve high-risk loan precision, we could:
- Adjust class weights via the `class_weight` parameter (since high-risk loans are underrepresented).
- Experiment with higher `max_iter` values for better convergence.
- Try an alternative model such as Random Forest, or rebalance the training data with SMOTE (Synthetic Minority Oversampling Technique).
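
The sketch below shows one way to try the first three ideas side by side. It runs on synthetic data, so it says nothing about how these variants behave on `lending_data.csv`, and `max_iter=1000` is an arbitrary illustrative value. Class weighting can trade precision for recall, so both metrics are printed. SMOTE is omitted because it ships in the separate imbalanced-learn package rather than in scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the loan data.
X, y = make_classification(
    n_samples=1000,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    weights=[0.95, 0.05],
    random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# Candidate models: the baseline plus the variants suggested above.
candidates = {
    "baseline": LogisticRegression(random_state=1),
    "balanced weights, max_iter=1000": LogisticRegression(
        class_weight="balanced", max_iter=1000, random_state=1
    ),
    "random forest": RandomForestClassifier(random_state=1),
}

scores = {}
for name, model in candidates.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # Score the high-risk class (label 1) only.
    scores[name] = {
        "precision": precision_score(y_test, y_pred, pos_label=1, zero_division=0),
        "recall": recall_score(y_test, y_pred, pos_label=1, zero_division=0),
    }
    print(f"{name}: {scores[name]}")
```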