https://github.com/krishnaura45/insurelead

🧪 Predictive Modeling for Insurance Cross-Selling Response 🔥 Deep-ensemble approach

ann binary-classification blending boosting ensemble-learning insurance kaggle-competition ml stacking

# InsureLead
Predicting Customer Responses to Insurance Offers Using ML

![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)
![Scikit-Learn](https://img.shields.io/badge/Scikit--Learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)
![Kaggle](https://img.shields.io/badge/Kaggle-20BEFF?style=for-the-badge&logo=kaggle&logoColor=white)
![ROC AUC Optimized](https://img.shields.io/badge/Optimized--for-ROC%20AUC-yellowgreen?style=for-the-badge)
![Optuna](https://img.shields.io/badge/Optuna-Tuning-blueviolet?style=for-the-badge)
![AUC Score](https://img.shields.io/badge/Best%20AUC-0.89690-2ECC71?style=for-the-badge)
![Rank](https://img.shields.io/badge/Rank-70%20of%202425-brightgreen?style=for-the-badge)
![Solo](https://img.shields.io/badge/Submission-Type%3A%20Solo-orange?style=for-the-badge)

### Project Duration: Jul 15, 2024 - Aug 1, 2024
---

## 🌟 Introduction

The goal is to predict which customers will respond positively to a vehicle insurance offer. The project was built for a binary classification challenge hosted on Kaggle, where submissions were evaluated on **Area Under the ROC Curve (AUC)**.
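
For context, ROC AUC measures how well predicted scores rank positive cases above negative ones, so only the ordering of the predictions matters. A minimal sketch using scikit-learn's `roc_auc_score`; the labels and scores are made-up toy values, not competition data:

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted probabilities (illustrative values only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC equals the fraction of (positive, negative) pairs ranked correctly.
# Here 3 of the 4 pairs are ordered correctly, so AUC = 0.75.
auc = roc_auc_score(y_true, y_score)
print(f"ROC AUC: {auc:.2f}")
```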

---

## 🥉 Top Approach

Explore full implementation here: 🔗 [PS4E7 - Stacking Boosters with ANN](https://github.com/krishnaura45/InsureLead/blob/main/ps4e7-stacking-boosters-and-ann.ipynb)

- 📊 **Data Integration & Inspection**
  - Combined the official training dataset with the original insurance dataset for feature enrichment.

- 🛠️ **Preprocessing Pipelines**
  - Built Scikit-learn pipelines that combine scalers and transformers (`StandardScaler`, `PowerTransformer`) with encoders (`OneHotEncoder`, `OrdinalEncoder`).

- 🔍 **Feature Engineering & Selection**
  - Applied mutual information filtering to retain informative features.

- 🧰 **Modeling with Ensembles**
  - Trained and validated XGBoost, CatBoost, and LightGBM classifiers using Stratified K-Fold CV.
  - Tuned hyperparameters with Optuna and visual exploration tools.

- 🏋️ **Submission Strategy**
  - Ensembled predictions via model averaging on test data.
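
The preprocessing and feature-selection steps above could be wired together roughly as follows. This is a minimal sketch, not the notebook's code: the data is synthetic, the column names, value ranges, and `k=3` are assumptions, and `LogisticRegression` is only a placeholder final estimator.

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    OneHotEncoder,
    OrdinalEncoder,
    PowerTransformer,
    StandardScaler,
)

# Synthetic stand-in data; column names and values are illustrative assumptions.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Age": rng.integers(20, 80, n),
    "Annual_Premium": rng.lognormal(3.0, 0.5, n),
    "Gender": rng.choice(["Male", "Female"], n),
    "Vehicle_Age": rng.choice(["< 1 Year", "1-2 Year", "> 2 Years"], n),
})
# Toy target driven mostly by Age, so Age should carry the most information.
y = (X["Age"] + rng.normal(0, 10, n) > 50).astype(int)

# One transformer per column type; sparse_threshold=0 forces a dense output.
preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["Age"]),
        ("power", PowerTransformer(), ["Annual_Premium"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),
        ("ordinal",
         OrdinalEncoder(categories=[["< 1 Year", "1-2 Year", "> 2 Years"]]),
         ["Vehicle_Age"]),
    ],
    sparse_threshold=0,
)

# Keep the k features with the highest mutual information with the target.
selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=3)

model = Pipeline([
    ("prep", preprocess),
    ("select", selector),
    ("clf", LogisticRegression(max_iter=1000)),  # placeholder estimator
])
model.fit(X, y)

# Output column order: Age, Annual_Premium, Gender (2 one-hot), Vehicle_Age.
mask = model.named_steps["select"].get_support()
print("kept columns:", mask)
```

Putting the transformers and the selector inside one `Pipeline` means that, under cross-validation, scaling parameters and mutual information scores are estimated on the training folds only.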

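The cross-validation and averaging steps listed above can be sketched as follows. To keep the example dependency-light, scikit-learn's `HistGradientBoostingClassifier` and `GradientBoostingClassifier` stand in for XGBoost, CatBoost, and LightGBM, whose scikit-learn-style classifiers expose the same `fit` / `predict_proba` interface. The data, fold count, and model settings are assumptions rather than the notebook's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    HistGradientBoostingClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic, imbalanced stand-in data (not the competition dataset).
X, y = make_classification(
    n_samples=1500, n_features=10, weights=[0.85], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def make_models():
    """Return fresh, unfitted stand-in boosters for each fold."""
    return {
        "hist_gb": HistGradientBoostingClassifier(max_iter=50, random_state=0),
        "gb": GradientBoostingClassifier(n_estimators=50, random_state=0),
    }


skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
names = list(make_models())
oof = {name: np.zeros(len(y_train)) for name in names}        # out-of-fold preds
test_pred = {name: np.zeros(len(y_test)) for name in names}   # fold-averaged preds

for train_idx, valid_idx in skf.split(X_train, y_train):
    for name, model in make_models().items():
        model.fit(X_train[train_idx], y_train[train_idx])
        oof[name][valid_idx] = model.predict_proba(X_train[valid_idx])[:, 1]
        # Average each model's test predictions over the folds.
        test_pred[name] += model.predict_proba(X_test)[:, 1] / skf.get_n_splits()

for name in names:
    print(f"{name}: OOF AUC = {roc_auc_score(y_train, oof[name]):.4f}")

# Simple model averaging: unweighted mean of the per-model test predictions.
blend = np.mean([test_pred[name] for name in names], axis=0)
print(f"blend: holdout AUC = {roc_auc_score(y_test, blend):.4f}")
```
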
---

## 📊 Results / Outcomes

- ✅ Public Leaderboard Scores: ranging from *0.50060* to *0.89727*

- 🏁 Best Private Score: **0.89690**

- 🥇 Rank Achieved: 70 / 2425 participants (2234 teams), competing solo

![Score Progression Plot](https://github.com/user-attachments/assets/989ab79b-db3e-40ae-a7fb-ed3a040bba09)

---

## 🔗 References

- Kaggle Competition: Binary Classification of Insurance Cross Selling

- 📂 Original Dataset: Health Insurance Cross Sell Prediction Data

---

## 🛠️ Tech Stack

- Language: Python 🐍

- Libraries:
  - `pandas`, `polars`, `numpy` for data handling
  - `matplotlib`, `seaborn` for EDA and plotting
  - `scikit-learn`, `xgboost`, `catboost`, `lightgbm` for modeling
  - `optuna` for hyperparameter tuning

- Tools:
  - Jupyter Notebook / Kaggle Notebooks for experimentation
  - Custom pipelines and scoring functions for AUC optimization

---