https://github.com/rachkat/predictive-modeling-customer-targeting
Predictive analytics project (Logistic Regression + Random Forest) using CoIL Challenge 2000 dataset to optimize customer targeting for insurance acquisition.
- Host: GitHub
- URL: https://github.com/rachkat/predictive-modeling-customer-targeting
- Owner: rachkat
- License: mit
- Created: 2025-09-24T19:16:44.000Z (13 days ago)
- Default Branch: main
- Last Pushed: 2025-09-24T19:27:19.000Z (13 days ago)
- Last Synced: 2025-09-24T21:25:26.920Z (13 days ago)
- Topics: crm-integration, data-science, machine-learning, marketing-analytics, r, random-forest
- Homepage:
- Size: 3.65 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Improving Customer Targeting Through Predictive Modeling


---
## Executive Summary
This project applies **predictive analytics** to improve customer acquisition strategies for **The Insurance Company (TIC)**, using the **CoIL Challenge 2000 dataset** (5,822 customers × 86 variables).

By testing **Logistic Regression** and **Random Forest**, the study demonstrates how predictive modeling can:
- Improve **customer targeting** for mobile home insurance.
- Reduce **marketing acquisition costs** by prioritizing high-probability leads.
- Provide **data-driven insights** for long-term customer relationship management.

Key takeaway: **Logistic Regression** provided interpretability for deployment, while **Random Forest** delivered higher predictive accuracy by capturing complex interactions.

**Full report PDF**: [Improving-Customer-Targeting-Through-Predictive-Modeling.pdf](./Improving-Customer-Targeting-Through-Predictive-Modeling.pdf)
---
## Project Context
**Business Challenge**
- Traditional marketing = high cost + low conversion.
- TIC needed to identify **which customers are most likely to purchase** mobile home insurance policies.

**Solution Approach**
- Apply **predictive modeling** to customer demographic & product ownership data.
- Compare **Logistic Regression** (interpretable baseline) vs. **Random Forest** (higher complexity).
- Evaluate models on **accuracy, precision, recall, F1-score, ROC-AUC**.

---
## Dataset
- **Source**: CoIL Challenge 2000 (real-world business data).
- **Size**: 5,822 rows × 86 columns.
- **Features**:
  - Demographic (age, household size, income, education).
  - Product ownership (car, life, home policies).
  - Purchasing power & socio-economic indicators.
- **Target**: `CARAVAN`, a binary indicator of mobile home insurance ownership (1 = yes, 0 = no).

---
## Methods
### Data Preparation
- Converted categorical/factor variables.
- Split into 80/20 training vs. testing sets.
- Addressed severe **class imbalance** (only ~6% policyholders).

### Algorithms
1. **Logistic Regression (GLM)**
   - Probability-based, interpretable, fast to deploy.
2. **Random Forest**
   - Ensemble method, improved accuracy, captured nonlinear relationships.

### Evaluation
- Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC.
- Visuals: Decision trees, feature importance, confusion matrix, ROC curve.

---
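The Methods steps above (80/20 split, logistic regression, confusion-matrix metrics, ROC-AUC) can be sketched end to end in base R. This is a minimal illustration on simulated data, not the project's actual pipeline: the data frame `dat`, its columns `x1`, `x2`, `y`, and the helper names `classification_metrics` and `auc_rank` are all assumptions made for the example, and the split is stratified by class, which the README does not confirm.

```r
set.seed(42)

# Simulated stand-in for the CoIL data (assumption): two predictors, rare binary target.
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(x1 = x1, x2 = x2,
                  y = rbinom(n, 1, plogis(-3.3 + 1.0 * x1 + 0.5 * x2)))

# Stratified 80/20 split: sample 80% of the row indices within each class
# so the rare positive class keeps the same share in both sets.
idx_by_class <- split(seq_len(n), dat$y)
train_idx <- unlist(lapply(idx_by_class,
                           function(i) sample(i, floor(0.8 * length(i)))))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Logistic regression scored on the held-out 20%.
fit  <- glm(y ~ x1 + x2, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob >= 0.5)

# Accuracy, precision, recall, and F1 from the confusion-matrix counts.
classification_metrics <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  tn <- sum(predicted == 0 & actual == 0)
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall    <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  f1 <- if (is.na(precision) || is.na(recall) || precision + recall == 0) {
    NA_real_
  } else {
    2 * precision * recall / (precision + recall)
  }
  c(accuracy = (tp + tn) / (tp + fp + fn + tn),
    precision = precision, recall = recall, f1 = f1)
}

# ROC-AUC via the rank (Mann-Whitney) formula; ties receive average ranks.
auc_rank <- function(actual, score) {
  r <- rank(score)
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

print(table(actual = test$y, predicted = pred))
print(classification_metrics(test$y, pred))
print(auc_rank(test$y, prob))
```

With a rare positive class, accuracy at the default 0.5 cutoff tends to look high even when recall is poor, which is why precision, recall, and ROC-AUC are reported alongside it.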
## Results & Insights
- **Logistic Regression**:
  - Strong interpretability, useful for pilot deployment.
  - High **precision (78.1%)** but low **recall (7.1%)**, so it missed many true buyers.
- **Random Forest**:
  - Better handling of complex feature interactions.
  - **Feature importance**: Fire policy contributions (PBRAND), customer subtype (MOSTYPE), car insurance policies (APERSAUT, PPERSAUT), and purchasing power class (MKOOPKLA) drove predictions.
- **Business Value**:
  - Improved targeting efficiency, meaning fewer wasted marketing efforts.
  - Actionable insights for segmentation & cross-selling.
  - Foundation for **CRM integration** and **predictive lead scoring**.

---
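To make the lead-scoring idea concrete, the sketch below ranks customers by predicted purchase probability and contacts only the top share, then reports how many buyers that campaign would reach. The function `score_leads`, its arguments, and the toy vectors are hypothetical and not taken from the project; in practice `score` would come from a fitted model's predicted probabilities on held-out customers.

```r
# Rank customers by model score and evaluate a campaign that contacts
# only the top `fraction` of them.
score_leads <- function(actual, score, fraction = 0.2) {
  stopifnot(length(actual) == length(score), fraction > 0, fraction <= 1)
  n_contact <- ceiling(fraction * length(score))
  contacted <- order(score, decreasing = TRUE)[seq_len(n_contact)]
  buyers_reached <- sum(actual[contacted] == 1)
  conversion <- buyers_reached / n_contact
  c(contacted       = n_contact,
    conversion_rate = conversion,                          # precision among contacted
    share_of_buyers = buyers_reached / sum(actual == 1),   # recall of the campaign
    lift            = conversion / mean(actual == 1))      # vs. contacting at random
}

# Toy example (made-up values): 10 customers, 2 of whom are buyers.
actual <- c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0)
score  <- c(0.90, 0.80, 0.10, 0.70, 0.20, 0.30, 0.05, 0.15, 0.25, 0.35)
print(score_leads(actual, score, fraction = 0.5))
```

Choosing the contact fraction is the business-side counterpart of choosing a classification threshold: a larger fraction reaches more buyers at a lower conversion rate.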
## Key Skills Demonstrated
- **Predictive Modeling**: Logistic Regression & Random Forest.
- **Data Wrangling**: Handling imbalanced data, feature engineering, correlation analysis.
- **Model Evaluation**: Confusion matrices, precision-recall tradeoffs, ROC-AUC.
- **Business Translation**: Linking analytics results to marketing efficiency & acquisition cost reduction.
- **Reproducibility**: Structured process with R Markdown, version control in GitHub.

---
## Reproducibility
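**Environment**: RStudio, R 4.2.1

**Packages**: `stats`, `randomForest`, `caret`, `ggplot2`

```r
library(randomForest)

# Assumes `train` is the 80% training split. CARAVAN must be a factor;
# otherwise randomForest() fits a regression forest instead of a classifier.
train$CARAVAN <- as.factor(train$CARAVAN)

# Example: Logistic Regression
glm_model <- glm(CARAVAN ~ ., data = train, family = "binomial")
summary(glm_model)

# Example: Random Forest
rf_model <- randomForest(CARAVAN ~ ., data = train, ntree = 500, importance = TRUE)
varImpPlot(rf_model)
```

---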
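`summary()` on a binomial GLM reports coefficients on the log-odds scale, so a common way to use the model's interpretability is to exponentiate them into odds ratios. The sketch below does this on simulated data. The data frame `toy` and its columns `has_car_policy`, `income_level`, and `buys` are invented for illustration and are not variables from the CoIL dataset.

```r
set.seed(1)

# Simulated data (assumption): one binary and one continuous predictor.
n <- 2000
toy <- data.frame(has_car_policy = rbinom(n, 1, 0.5),
                  income_level   = rnorm(n))
toy$buys <- rbinom(n, 1,
                   plogis(-2 + 0.8 * toy$has_car_policy + 0.4 * toy$income_level))

fit <- glm(buys ~ has_car_policy + income_level, data = toy, family = binomial)

# exp(coefficient) is the multiplicative change in the odds of buying
# for a one-unit increase in the predictor, holding the others fixed.
odds_ratios <- exp(coef(fit))

# Wald 95% confidence intervals, also moved to the odds-ratio scale.
ci <- exp(confint.default(fit))

print(round(cbind(odds_ratio = odds_ratios, ci), 2))
```

An odds ratio above 1 means the predictor is associated with higher odds of purchase, and an interval that excludes 1 suggests the association is unlikely to be noise.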
## Limitations & Next Steps
- Severe class imbalance limited recall; explore **SMOTE** or cost-sensitive learning.
- Add hyperparameter tuning (**grid search, Bayesian optimization**).
- Deploy models with **PMML** for integration into CRM systems.
- Pilot test with TIC's customer data before full-scale deployment.

---
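As a starting point for the class-imbalance item above, the sketch below uses plain random oversampling in base R. This is a simpler stand-in for SMOTE (it duplicates existing minority rows rather than synthesizing new ones), and the helper `oversample_minority` and the toy data frame are hypothetical, not part of the project.

```r
set.seed(7)

# Duplicate randomly chosen minority-class rows until both classes have
# the same number of rows. Apply this to the training split only, so the
# test set keeps the real-world class balance.
oversample_minority <- function(data, target) {
  counts <- table(data[[target]])
  minority <- names(counts)[which.min(counts)]
  minority_rows <- which(data[[target]] == minority)
  n_extra <- max(counts) - min(counts)
  extra <- minority_rows[sample.int(length(minority_rows), n_extra, replace = TRUE)]
  data[c(seq_len(nrow(data)), extra), , drop = FALSE]
}

# Toy training set (made-up): 6% positives, mirroring the imbalance described above.
toy_train <- data.frame(x = 1:100, y = c(rep(1, 6), rep(0, 94)))
balanced <- oversample_minority(toy_train, "y")
print(table(balanced$y))
```

Because the duplicated rows carry no new information, oversampling mainly shifts the decision boundary toward higher recall; precision usually drops, so the trade-off should be checked on an untouched test set.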
## License
Released under the **MIT License**. See [LICENSE](./LICENSE).

---
## Tags
`predictive-analytics, logistic-regression, random-forest, insurance, customer-targeting, machine-learning, data-science, coil-challenge, r, marketing-analytics`