https://github.com/krishnaura45/insurelead
Predictive Modeling for Insurance Cross-Selling Response: Deep-ensemble approach
ann binary-classification blending boosting ensemble-learning insurance kaggle-competition ml stacking
- Host: GitHub
- URL: https://github.com/krishnaura45/insurelead
- Owner: krishnaura45
- License: gpl-3.0
- Created: 2025-04-20T06:18:09.000Z
- Default Branch: main
- Last Pushed: 2025-04-24T22:14:05.000Z
- Last Synced: 2025-06-13T00:40:45.635Z
- Topics: ann, binary-classification, blending, boosting, ensemble-learning, insurance, kaggle-competition, ml, stacking
- Language: Jupyter Notebook
- Homepage:
- Size: 269 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# InsureLead
Predicting Customer Responses to Insurance Offers Using ML








### Project Duration: Jul 15, 2024 - Aug 1, 2024
---
## Introduction
The objective is to predict which customers will respond positively to a vehicle insurance offer. The project was built for a binary classification competition hosted on Kaggle, where submissions were evaluated by **Area Under the ROC Curve (AUC)**.

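
To make the metric concrete, here is a minimal sketch of AUC computed straight from its definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the sample labels and scores are illustrative assumptions, not code from the notebook. The pairwise loop is quadratic in the number of rows, so in practice a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = customer responded) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```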
---
## Top Approach
Explore the full implementation here: [PS4E7 - Stacking Boosters with ANN](https://github.com/krishnaura45/InsureLead/blob/main/ps4e7-stacking-boosters-and-ann.ipynb)
- **Data Integration & Inspection**
  - Combined the official training dataset with the original insurance dataset for feature enrichment.
- **Preprocessing Pipelines**
  - Used Scikit-learn pipelines with scalers, transformers, and encoders: `StandardScaler`, `PowerTransformer`, `OneHotEncoder`, `OrdinalEncoder`.
- **Feature Engineering & Selection**
  - Applied mutual information filtering to retain informative features.
- **Modeling with Ensembles**
  - Trained and validated XGBoost, CatBoost, and LightGBM classifiers using Stratified K-Fold CV.
  - Tuned hyperparameters with Optuna and visual exploration tools.
- **Submission Strategy**
  - Ensembled test-set predictions by averaging the models' outputs.
---
## Results / Outcomes
- Public Leaderboard Scores: ranging from *0.50060* to *0.89727*
- Best Private Score: **0.89690**
- Rank Achieved: 70th among 2,425 participants (2,234 teams), competing solo

---
## References
- Kaggle Competition: Binary Classification of Insurance Cross Selling
- Original Dataset: Health Insurance Cross Sell Prediction Data
---
## Tech Stack
- Language: Python
- Libraries:
  - `pandas`, `polars`, `numpy` for data handling
  - `matplotlib`, `seaborn` for EDA and plotting
  - `scikit-learn`, `xgboost`, `catboost`, `lightgbm` for modeling
  - `optuna` for hyperparameter tuning
- Tools:
  - Jupyter Notebook / Kaggle Notebooks for experimentation
  - Custom pipelines and scoring functions for AUC optimization
---