{"id":19680118,"url":"https://github.com/blleshi/credit_risk_classification","last_synced_at":"2026-05-10T12:45:10.787Z","repository":{"id":251860614,"uuid":"817550889","full_name":"blleshi/Credit_Risk_Classification","owner":"blleshi","description":"Credit Risk Classification","archived":false,"fork":false,"pushed_at":"2024-08-06T05:21:52.000Z","size":923,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-10T05:22:30.639Z","etag":null,"topics":["classification-report","confusion-matrix","credit-risk","credit-risk-classification","data-testing","data-training","imbalanced-learning","lending","loans","logistic-regression","logistic-regression-model","pandas","randomoversampler","resampled-data","target-classification"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blleshi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-20T01:02:32.000Z","updated_at":"2024-08-08T16:21:00.000Z","dependencies_parsed_at":"2024-08-06T07:32:23.189Z","dependency_job_id":"afddac45-7a5f-460c-b719-151e72e0496d","html_url":"https://github.com/blleshi/Credit_Risk_Classification","commit_stats":null,"previous_names":["blleshi/module-12-challenge","blleshi/credit_risk_classification"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blleshi%2FCredit_Risk_Classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/blleshi%2FCredit_Risk_Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blleshi%2FCredit_Risk_Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blleshi%2FCredit_Risk_Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blleshi","download_url":"https://codeload.github.com/blleshi/Credit_Risk_Classification/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240989249,"owners_count":19889655,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification-report","confusion-matrix","credit-risk","credit-risk-classification","data-testing","data-training","imbalanced-learning","lending","loans","logistic-regression","logistic-regression-model","pandas","randomoversampler","resampled-data","target-classification"],"created_at":"2024-11-11T18:04:03.315Z","updated_at":"2026-05-10T12:45:10.721Z","avatar_url":"https://github.com/blleshi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Credit Risk Classification Challenge\n\n## Background\nCredit risk classification presents a significant challenge due to the inherent imbalance in the dataset, where healthy loans significantly outnumber risky loans. This challenge involves using various techniques to train and evaluate models on imbalanced classes. 
The dataset comprises historical lending activity from a peer-to-peer lending services company, and the objective is to build a model that identifies borrowers' creditworthiness.\n\n## What You’re Creating\nYou will leverage the `imbalanced-learn` library to train a logistic regression model on two versions of the dataset: the original dataset and a resampled version using the `RandomOverSampler` module from `imbalanced-learn`.\n\nFor both datasets, you will:\n- Count the target classes\n- Train a logistic regression classifier\n- Calculate the balanced accuracy score\n- Generate a confusion matrix\n- Produce a classification report\n\nAdditionally, you will document a credit risk analysis report based on a provided template.\n\n## Files\nTo get started, download the following:\n\n- Module 12 Challenge files\n\n## Instructions\nThe instructions are divided into the following sections:\n\n### Split the Data into Training and Testing Sets\n1. Open the starter code notebook.\n2. Read `lending_data.csv` from the Resources folder into a Pandas DataFrame.\n3. Create the labels set (`y`) from the “loan_status” column and the features set (`X`) from the remaining columns.\n   - Note: A value of 0 in the “loan_status” column indicates a healthy loan, while 1 indicates a high-risk loan.\n4. Check the balance of the labels using the `value_counts` function.\n5. Split the data into training and testing datasets using `train_test_split`.\n\n### Create a Logistic Regression Model with the Original Data\n1. Fit a logistic regression model using the training data (`X_train` and `y_train`).\n2. Predict the labels for the testing data using `X_test` and the trained model.\n3. Evaluate the model’s performance:\n   - Calculate the accuracy score.\n   - Generate a confusion matrix.\n   - Print the classification report.\n4. 
Answer: How well does the logistic regression model predict both the 0 (healthy loan) and 1 (high-risk loan) labels?\n\n### Fit a Logistic Regression Model with Resampled Training Data\nTo potentially improve model performance, you will resample the training data using `RandomOverSampler`:\n1. Resample the training data with `RandomOverSampler` so that both labels have an equal number of samples.\n2. Fit the `LogisticRegression` classifier on the resampled data and predict the labels for the testing data.\n3. Evaluate the model’s performance:\n   - Calculate the accuracy score.\n   - Generate a confusion matrix.\n   - Print the classification report.\n4. Answer: How well does the logistic regression model, trained with oversampled data, predict both the 0 (healthy loan) and 1 (high-risk loan) labels?\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblleshi%2Fcredit_risk_classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblleshi%2Fcredit_risk_classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblleshi%2Fcredit_risk_classification/lists"}