https://github.com/saikrishnauppaluri/home-credit-default-risk

Home Credit Default Risk using LGBM

analysis homecredit lgbm multivariate-analysis


Host: GitHub
URL: https://github.com/saikrishnauppaluri/home-credit-default-risk
Owner: saikrishnauppaluri
Created: 2020-11-26T23:04:58.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2021-01-21T00:48:15.000Z (over 4 years ago)
Last Synced: 2025-03-31T07:31:31.100Z (3 months ago)
Topics: analysis, homecredit, lgbm, multivariate-analysis
Language: Jupyter Notebook
Homepage:
Size: 13.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Home-Credit-Default-Risk
Link: https://www.kaggle.com/c/home-credit-default-risk

# Home Credit Default Risk using LGBM

**Salient Project Features:**
> The datasets were joined to form an Accumulated Dataset (ADS) for both Train
and Test, which was then used for analysis.
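
The join described above can be sketched as follows. This is a minimal illustration with toy data, not the repository's actual code: it assumes the tables are linked by the competition's `SK_ID_CURR` key and that child tables are aggregated to one row per applicant before joining. The helper `build_ads` and the aggregated column names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for two competition tables; the real files are far larger.
application = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "AMT_CREDIT": [100000.0, 250000.0, 80000.0],
})
bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 3],
    "AMT_CREDIT_SUM": [5000.0, 7000.0, 12000.0],
})


def build_ads(main: pd.DataFrame, child: pd.DataFrame,
              key: str = "SK_ID_CURR") -> pd.DataFrame:
    """Aggregate the child table to one row per key, then left-join onto main.

    Hypothetical helper: the aggregations chosen here are assumptions.
    """
    agg = child.groupby(key).agg(
        BUREAU_LOAN_COUNT=("AMT_CREDIT_SUM", "count"),
        BUREAU_CREDIT_SUM_MEAN=("AMT_CREDIT_SUM", "mean"),
    ).reset_index()
    # A left join keeps every applicant, even those with no child records.
    return main.merge(agg, on=key, how="left")


ads = build_ads(application, bureau)
```

Applying the same function to the Train and Test application tables would keep the two ADS frames structurally identical, apart from the target column.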

> Exploratory Data Analysis (EDA) was done using both univariate and multivariate
analysis to understand the relationships and associations among the variables.

**Missing Value treatment**
> Features with more than 50% missing values were dropped, which also reduced
the dimensionality of the dataset; any imputation strategy applied to such
features would have been misleading.

> For the remaining numerical features, missing values were imputed with the
median, which was preferred over the mean because the variables were skewed.

> For the remaining categorical features, missing values were imputed with a
constant value that marks them as missing. The mode could have been used
instead, but this was avoided because the target variable in the dataset is
highly imbalanced.
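
The three missing-value rules above can be sketched in pandas as follows. The function name `treat_missing`, the `"MISSING"` token, and the toy columns are assumptions made for illustration; only the 50% threshold, the median, and the use of a constant come from the description above.

```python
import numpy as np
import pandas as pd


def treat_missing(df: pd.DataFrame, threshold: float = 0.5,
                  fill_token: str = "MISSING") -> pd.DataFrame:
    """Drop sparse columns, median-impute numerics, constant-fill the rest."""
    out = df.copy()
    # 1. Drop columns whose share of missing values exceeds the threshold.
    missing_share = out.isna().mean()
    out = out.loc[:, missing_share <= threshold].copy()
    # 2. Numeric columns: fill with the column median (robust to skew).
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # 3. Non-numeric columns: fill with an explicit token so that
    #    missingness remains visible as its own category.
    cat_cols = out.select_dtypes(exclude="number").columns
    out[cat_cols] = out[cat_cols].fillna(fill_token)
    return out


demo = pd.DataFrame({
    "income": [10.0, np.nan, 30.0, 1000.0],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
    "occupation": ["a", None, "b", "a"],
})
clean = treat_missing(demo)
```

In a Train/Test setting, the medians would normally be computed on Train only and reused for Test; the sketch computes them on whatever frame it is given.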

> Anomaly and outlier detection and treatment were applied to features with
counterintuitive values and/or extremely small or large values.
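
One possible way to handle both cases is sketched below. The README does not say which features or rules were used, so everything here is an assumption: the helper names are hypothetical, the 1st/99th percentile bounds are arbitrary, and the `365243` value in `DAYS_EMPLOYED` is used only as a commonly cited example of a placeholder in the competition data.

```python
import numpy as np
import pandas as pd


def flag_and_null_sentinel(s: pd.Series, sentinel):
    """Replace a counterintuitive placeholder with NaN and return a flag."""
    flag = s == sentinel
    return s.where(~flag, np.nan), flag


def clip_to_quantiles(s: pd.Series, lower: float = 0.01,
                      upper: float = 0.99) -> pd.Series:
    """Cap extreme values at the given quantiles (winsorizing)."""
    lo, hi = s.quantile(lower), s.quantile(upper)
    return s.clip(lower=lo, upper=hi)


# Days employed are negative offsets from the application date, so a large
# positive value is treated here as an anomaly rather than a real duration.
days_employed = pd.Series([-1200, -300, 365243, -4500])
cleaned, was_anomalous = flag_and_null_sentinel(days_employed, 365243)
```

Keeping the boolean flag as an extra feature lets a model still use the fact that the value was anomalous, while the nulled value can go through the same imputation as other missing data.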

> Feature engineering was applied to derive new variables based on learnings
from industry research, correlation analysis, and multivariate analysis.
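
The README does not list the derived variables. As a hedged illustration only, ratio features of the kind often built for credit data could look like this; the input columns exist in the competition's application table, while `add_ratio_features` and the output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add illustrative affordability ratios (assumed, not from the repo)."""
    out = df.copy()
    # Treat zero income as missing to avoid division by zero.
    income = out["AMT_INCOME_TOTAL"].replace(0, np.nan)
    out["CREDIT_TO_INCOME"] = out["AMT_CREDIT"] / income
    out["ANNUITY_TO_INCOME"] = out["AMT_ANNUITY"] / income
    return out


sample = pd.DataFrame({
    "AMT_INCOME_TOTAL": [100000.0, 0.0],
    "AMT_CREDIT": [200000.0, 50000.0],
    "AMT_ANNUITY": [10000.0, 5000.0],
})
features = add_ratio_features(sample)
```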

> The LightGBM algorithm was used to predict the probability of default after
appropriately treating the variables, imputing missing values, and scaling the
most important features.
