https://github.com/saikrishnauppaluri/home-credit-default-risk
Home Credit Default Risk using LGBM
- Host: GitHub
- URL: https://github.com/saikrishnauppaluri/home-credit-default-risk
- Owner: saikrishnauppaluri
- Created: 2020-11-26T23:04:58.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-01-21T00:48:15.000Z (over 4 years ago)
- Last Synced: 2025-03-31T07:31:31.100Z (3 months ago)
- Topics: analysis, homecredit, lgbm, multivariate-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Home-Credit-Default-Risk
Link: https://www.kaggle.com/c/home-credit-default-risk

# Home Credit Default Risk using LGBM
**Salient Project Features:**

> The datasets were joined together to form an Accumulated Dataset (ADS) for both Train and Test to carry out the analysis.

> Exploratory Data Analysis (EDA) was done using both univariate and multivariate analysis to understand the relationships and associations among the variables.

**Missing Value treatment**
> Missing-value treatment was also leveraged to reduce the dimensionality of the dataset by dropping the variables with more than 50% missing values, as any imputation strategy for these features would have been misleading.

> For the remaining numerical features, missing values were imputed with the median. The variables were skewed, so the median was preferred over the mean.

> For the remaining categorical features, missing values were imputed with a constant value so that they stay identifiable. They could have been imputed with the mode, but this was avoided because the target variable in the dataset is highly imbalanced.

> Anomaly and outlier detection and treatment was done for features with counterintuitive values and/or extremely small or large values.

> Feature engineering was applied to derive new variables based on learnings from industry research and from correlation and multivariate analysis.

> The LightGBM algorithm was used to predict the probabilities of default after appropriately treating the variables, imputing, and scaling the most important features.
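The ADS step (joining the datasets into one row per applicant) can be sketched as below. This is a minimal, dependency-free illustration under assumptions: the README does not say which tables or aggregations were used, so the sketch assumes a main table keyed by `SK_ID_CURR` and a child table (here the competition's bureau data, field `AMT_CREDIT_SUM`) with several rows per applicant. The output columns `BUREAU_COUNT` and `BUREAU_CREDIT_MEAN` are hypothetical names. In a pandas notebook the same idea is usually a `groupby` aggregation followed by `merge(..., how="left")`.

```python
from collections import defaultdict


def build_ads(applications, bureau):
    """Left-join per-applicant aggregates of a child table onto the main table.

    applications: list of dicts, one per applicant, keyed by SK_ID_CURR.
    bureau: list of dicts, possibly many rows per SK_ID_CURR.
    BUREAU_COUNT / BUREAU_CREDIT_MEAN are hypothetical output column names.
    """
    # Aggregate the child rows per applicant: row count and sum of AMT_CREDIT_SUM
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in bureau:
        key = row["SK_ID_CURR"]
        totals[key] += row["AMT_CREDIT_SUM"]
        counts[key] += 1

    ads = []
    for app in applications:
        key = app["SK_ID_CURR"]
        merged = dict(app)  # copy so the input rows are not mutated
        n = counts.get(key, 0)
        merged["BUREAU_COUNT"] = n
        # Applicants without child rows get None (left-join semantics)
        merged["BUREAU_CREDIT_MEAN"] = totals[key] / n if n else None
        ads.append(merged)
    return ads
```

Applicants with no child rows keep a count of 0 and a missing mean, which the missing-value treatment can then handle like any other gap.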
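The missing-value rules described above (drop variables with more than 50% missing values, median for numerical features, a constant for categorical features) might look like this in plain Python. It is a sketch that assumes rows are dicts with `None` marking a missing value; the fill label `"MISSING"` is a hypothetical choice, since the project's actual constant is not stated.

```python
import statistics

MISSING_THRESHOLD = 0.5       # drop columns with more than 50% missing values
CATEGORICAL_FILL = "MISSING"  # hypothetical constant that marks imputed categories


def treat_missing(rows, categorical):
    """Drop sparse columns, median-impute numeric ones, constant-fill categorical ones.

    rows: list of dicts sharing the same keys (None = missing).
    categorical: set of column names treated as categorical.
    """
    columns = list(rows[0].keys())
    n = len(rows)

    # Keep only columns whose missing share does not exceed the threshold
    kept = [
        col for col in columns
        if sum(1 for r in rows if r[col] is None) / n <= MISSING_THRESHOLD
    ]

    fills = {}
    for col in kept:
        if col in categorical:
            fills[col] = CATEGORICAL_FILL
        else:
            observed = [r[col] for r in rows if r[col] is not None]
            # The median is less affected by skewed distributions than the mean
            fills[col] = statistics.median(observed)

    return [
        {col: (r[col] if r[col] is not None else fills[col]) for col in kept}
        for r in rows
    ]
```

In a train/test setting, the medians would typically be computed on the training rows and reused for the test rows so both sets are imputed consistently.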
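The README does not state which outlier rule was applied. One common option, sketched here as an assumption, is to treat a counterintuitive placeholder value as missing and then clip extreme values to Tukey's fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR). The `sentinel` argument and the value 365243 in the usage example are illustrative only.

```python
import statistics


def treat_outliers(values, sentinel=None, k=1.5):
    """Mark a counterintuitive sentinel as missing, then clip to Tukey's fences.

    values: list of numbers (None = missing).
    sentinel: hypothetical placeholder value judged counterintuitive.
    k: fence multiplier; 1.5 is Tukey's usual choice.
    """
    # Step 1: a counterintuitive placeholder is treated as missing, not as data
    cleaned = [None if v == sentinel else v for v in values]

    # Step 2: compute quartiles on the observed values only
    observed = [v for v in cleaned if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    # Step 3: clip extreme values to the fences instead of dropping rows
    return [v if v is None else min(max(v, low), high) for v in cleaned]


# Usage: the placeholder becomes None and 100 is clipped to the upper fence
print(treat_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100, 365243], sentinel=365243))
```

Clipping keeps every applicant in the data, which matters when the positive (default) class is rare.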
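As an illustration of the feature-engineering step, ratio features such as credit-to-income are often derived from the application table. The input column names (`AMT_CREDIT`, `AMT_INCOME_TOTAL`, `AMT_ANNUITY`) come from the Kaggle application data, but the derived features below are assumptions, not the project's documented list.

```python
def add_ratio_features(row):
    """Return a copy of an application row with a few ratio features added.

    The derived feature names are hypothetical examples, not necessarily
    the ones engineered in this project.
    """
    out = dict(row)

    def ratio(num, den):
        # Guard against missing values and division by zero
        if num is None or den in (None, 0):
            return None
        return num / den

    out["CREDIT_INCOME_RATIO"] = ratio(row["AMT_CREDIT"], row["AMT_INCOME_TOTAL"])
    out["ANNUITY_INCOME_RATIO"] = ratio(row["AMT_ANNUITY"], row["AMT_INCOME_TOTAL"])
    # Annuity as a share of the credit amount approximates the repayment rate
    out["CREDIT_TERM"] = ratio(row["AMT_ANNUITY"], row["AMT_CREDIT"])
    return out
```

Ratios like these relate a loan's size to the applicant's capacity to repay, which raw amounts alone do not capture.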
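The final step mentions scaling the most important features before fitting LightGBM. The sketch below assumes importance scores are available as a dict (for example, built from a fitted `lightgbm.LGBMClassifier`'s `feature_importances_`) and that z-score standardization is the scaling used; the README specifies neither.

```python
import statistics


def scale_top_features(rows, importances, k):
    """Standardize (z-score) the k most important numeric features.

    rows: list of dicts of numeric values (already imputed).
    importances: hypothetical mapping feature -> importance score.
    Returns the scaled rows and the (mean, std) used per feature.
    """
    # Pick the k features with the highest importance scores
    top = sorted(importances, key=importances.get, reverse=True)[:k]

    params = {}
    for col in top:
        col_values = [r[col] for r in rows]
        mean = statistics.fmean(col_values)
        std = statistics.pstdev(col_values)
        params[col] = (mean, std if std > 0 else 1.0)  # avoid division by zero

    scaled = []
    for r in rows:
        out = dict(r)  # columns outside the top k are left unchanged
        for col, (mean, std) in params.items():
            out[col] = (r[col] - mean) / std
        scaled.append(out)
    return scaled, params
```

Returning `params` lets the same means and standard deviations be applied to the test rows.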