https://github.com/gourav052003/predicting-customer-engagement-in-financial-products-insights-from-marketing-campaigns

The purpose is to train a predictive model that can determine if a given customer will subscribe to a term deposit based on these various features. By analyzing historical data on successful and unsuccessful subscription outcomes, patterns can be identified which help predict future subscription behavior.

bank campaign-finance catboost-classifier classification investment-analysis


Host: GitHub
URL: https://github.com/gourav052003/predicting-customer-engagement-in-financial-products-insights-from-marketing-campaigns
Owner: Gourav052003
License: mit
Created: 2024-01-20T13:09:55.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-08T10:04:24.000Z (7 months ago)
Last Synced: 2025-01-03T04:31:36.162Z (5 months ago)
Topics: bank, campaign-finance, catboost-classifier, classification, investment-analysis
Language: Jupyter Notebook
Homepage:
Size: 39.2 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# **Predicting-Customer-Engagement-in-Financial-Products-Insights-from-Marketing-Campaigns**

## **1. Project Overview**
This project leverages machine learning to predict whether a customer will subscribe to a bank's term deposit based on data collected from direct marketing campaigns. By analyzing features such as customer demographics, previous interactions, and financial data, we aim to optimize marketing strategies for future campaigns.

This repository contains the complete pipeline: data preprocessing, feature engineering, model building, hyperparameter tuning, and model evaluation.

## **2. Dataset**
The dataset used is from a Portuguese banking institution, consisting of 41,188 instances and 20 features. It contains customer data and outcomes from direct marketing campaigns involving phone calls. Key features include:

- **Customer Attributes**: `age`, `job`, `marital`, `education`, `balance`, `housing`, `loan`, etc.
- **Contact Attributes**: contact type (`telephone`, `cellular`), last contact day, `duration`, etc.
- **Previous Campaign Data**: `pdays`, `previous`, `poutcome` (outcome of the previous campaign).
- **Target Variable**: `subscribed` (whether the customer subscribed to a term deposit).

### **Dataset Preprocessing**
- Handled missing values using median imputation and default values for categorical features.
- Encoded categorical variables using One-Hot Encoding.
- Applied **Min-Max scaling** to normalize continuous features.
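
The preprocessing steps above could be wired together roughly as follows. This is a minimal sketch using scikit-learn, not the notebook's actual code: the column split, the `"unknown"` fill value, and the toy rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical column split; adjust to the columns actually present.
numeric_cols = ["age", "balance", "duration"]
categorical_cols = ["job", "marital"]

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # median imputation
    ("scale", MinMaxScaler()),                     # rescale each column to [0, 1]
])
categorical_steps = Pipeline([
    # "unknown" is an assumed default value for missing categories
    ("impute", SimpleImputer(strategy="constant", fill_value="unknown")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Toy rows standing in for the real data.
toy = pd.DataFrame({
    "age": [30, 45, np.nan, 60],
    "balance": [100.0, np.nan, 2500.0, 800.0],
    "duration": [120, 300, 60, 240],
    "job": ["admin.", np.nan, "technician", "admin."],
    "marital": ["single", "married", "married", np.nan],
})
features = preprocessor.fit_transform(toy)
```

Fitting the transformer on the training split only, then reusing it on validation and test data, keeps the imputation medians and scaling ranges from leaking information across splits.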

## **3. Exploratory Data Analysis (EDA)**
**Objective**: Identify key patterns and relationships between features and the target variable.

- **Correlation Matrix**: Assessed correlations between numerical features and the target variable.
- **Univariate and Bivariate Analysis**: Visualized distributions of important features (e.g., age, balance) and their relationships with the target.
- **Class Imbalance**: The dataset is highly imbalanced: only about 11% of customers belong to the positive class (i.e., subscribed). The imbalance was addressed using **SMOTE (Synthetic Minority Over-sampling Technique)**.
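
To illustrate what SMOTE does, here is a simplified, standard-library sketch of its core idea: each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote_sketch` and the toy points are hypothetical; a real pipeline would more likely rely on the `SMOTE` class from the `imbalanced-learn` package.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class points with two numeric features, not real data.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Oversampling is usually applied to the training split only, so that evaluation data keeps the original class distribution.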

## **4. Feature Engineering**
- **Interaction Features**: Created new interaction terms between `balance` and `pdays` to capture potential non-linear relationships.
- **Domain-Specific Features**: Developed features such as `contact rate per campaign` and `balance-duration ratio`.
- **Temporal Features**: Derived features based on the day of the week and time of contact to account for possible temporal effects on subscription likelihood.
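
A short pandas sketch of the interaction and domain-specific features follows. The README does not give the exact formulas, so the definitions of `contact_rate_per_campaign` and `balance_duration_ratio` below, as well as the new column names, are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows; column names follow the features listed in this README.
df = pd.DataFrame({
    "balance": [1200.0, 300.0, 0.0],
    "pdays": [10, 5, 30],
    "duration": [180, 0, 240],
    "campaign": [2, 1, 4],
    "previous": [1, 0, 2],
})

# Interaction term between balance and pdays.
df["balance_x_pdays"] = df["balance"] * df["pdays"]

# Assumed definition: earlier contacts per contact in the current campaign.
df["contact_rate_per_campaign"] = df["previous"] / df["campaign"]

# Assumed definition: balance per second of call. Zero durations become NaN
# instead of producing a division by zero.
df["balance_duration_ratio"] = df["balance"] / df["duration"].replace(0, np.nan)
```

If `pdays` uses a sentinel value for customers who were never contacted, that value should be handled before building the interaction term, since multiplying by a sentinel produces meaningless magnitudes.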

## **5. Model Development**
### **Baseline Models**:
1. **Logistic Regression**: As a baseline for comparison.
2. **Decision Trees**: For interpretable predictions.

### **Advanced Models**:
1. **Random Forest**: Robust model for handling non-linear relationships and feature importance analysis.
2. **XGBoost**: Gradient boosting for better generalization and handling of imbalanced classes.
3. **CatBoost**: Evaluated due to its efficiency in handling categorical features without explicit encoding.

### **Hyperparameter Tuning**:
Utilized **GridSearchCV** and **RandomizedSearchCV** for hyperparameter optimization:
- **Random Forest**: Tuned `n_estimators`, `max_depth`, and `min_samples_split`.
- **XGBoost**: Tuned `learning_rate`, `max_depth`, `n_estimators`, and `subsample`.
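
The Random Forest search might look like the sketch below. The synthetic data, the candidate values, the `f1` scoring choice, and the small `n_iter` are all assumptions chosen to keep the example fast; they are not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.89, 0.11], random_state=0
)

# Candidate values for the three parameters named above (illustrative only).
param_distributions = {
    "n_estimators": [20, 40, 60],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5, 10],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # sample 5 of the 27 possible combinations
    scoring="f1",      # assumed metric; accuracy is misleading on imbalanced data
    cv=3,
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
```

`GridSearchCV` takes the same estimator and a `param_grid` argument, and tries every combination instead of a random sample.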

### **Model Evaluation Metrics**:
- **Accuracy**: Simple baseline comparison.
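- **Precision**: Share of predicted subscribers who actually subscribe; high precision limits marketing effort wasted on false positives.
- **Recall**: Share of actual subscribers the model identifies; high recall avoids missing customers who are likely to subscribe.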
- **F1-Score**: Harmonic mean of precision and recall to balance both.
- **ROC-AUC Score**: Evaluated the model's ability to discriminate between the classes.
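
The count-based metrics above can be computed directly from a confusion matrix, as in this small sketch. The helper name `classification_metrics` and all counts are hypothetical and are not results from this project. ROC-AUC is omitted because it needs predicted scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a reasonable classifier.
metrics = classification_metrics(tp=70, fp=30, fn=40, tn=860)

# A model that always predicts "no subscription" on data with 11% positives:
# accuracy looks high while recall and F1 are zero.
all_negative = classification_metrics(tp=0, fp=0, fn=110, tn=890)
```

The second call shows why accuracy serves only as a baseline here: with roughly 11% positives, predicting the majority class everywhere already yields about 89% accuracy.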

### **Handling Imbalance**:
- Implemented **SMOTE** to oversample the minority class and improve recall.
- Tested **class weights adjustment** to further balance precision and recall.
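
As a rough illustration of class weighting, the sketch below computes two commonly used quantities from a label list: the "balanced" weights `n_samples / (n_classes * class_count)`, which is the heuristic behind scikit-learn's `class_weight="balanced"`, and the negative-to-positive ratio typically passed to XGBoost as `scale_pos_weight`. The helper names and the toy labels are assumptions; the README does not state which weighting was actually tested.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def negative_to_positive_ratio(labels, positive=1):
    """Ratio of negative to positive examples."""
    n_pos = sum(1 for label in labels if label == positive)
    return (len(labels) - n_pos) / n_pos


# Toy labels with 11% positives, mirroring the imbalance described above.
labels = [1] * 11 + [0] * 89
weights = balanced_class_weights(labels)
ratio = negative_to_positive_ratio(labels)
```

Unlike SMOTE, weighting leaves the training data unchanged and only increases the penalty for errors on the minority class.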

## **6. Results and Insights**
- The best-performing model was **XGBoost**, achieving:
  - **Accuracy**: 90.5%
  - **Precision**: 75.6%
  - **Recall**: 68.3%
  - **F1-Score**: 71.8%
  - **ROC-AUC**: 92.2%
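
As a quick consistency check, the reported F1-score follows from the reported precision $P$ and recall $R$:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.756 \times 0.683}{0.756 + 0.683} = \frac{1.0327}{1.439} \approx 0.718
$$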

- **Feature Importance** (from XGBoost):
  1. `duration`: The duration of the last contact.
  2. `pdays`: Number of days since the client was last contacted.
  3. `balance`: Customer's account balance.
  4. `campaign`: Number of contacts during the current campaign.
  5. `job`: Customer's occupation.

- The **duration of the last contact** was the most influential predictor, indicating the importance of engagement time in a successful subscription.

## **7. Deployment and Next Steps**
### **Model Deployment**:
- The final model is deployed via a **Flask API**. It accepts customer data as input and returns the likelihood of subscription.
- **Dockerized the API** for easy integration with other banking systems.
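
A minimal sketch of what such an endpoint could look like is shown below. The `/predict` route, the response key, and the `predict_probability` placeholder are assumptions for illustration; the actual `app/app.py` may be organized differently, and a real service would load the saved model and call its `predict_proba` method instead of returning a constant.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_probability(customer):
    """Placeholder scorer: a real service would transform `customer`
    and pass it to the trained model's predict_proba."""
    return 0.5


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid body
    customer = request.get_json(silent=True)
    if not isinstance(customer, dict):
        return jsonify({"error": "expected a JSON object"}), 400
    return jsonify({"subscription_probability": predict_probability(customer)})
```

During development such an app can be started with `flask run` or by calling `app.run()`, and a client would send the customer's features as a JSON object in a POST request.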

### **Potential Improvements**:
1. Experiment with **neural networks** to capture more complex patterns in high-dimensional data.
2. Integrate **real-time data** to make the model adaptive to changing customer behaviors and market trends.
3. Implement an **A/B testing framework** to continuously validate and improve the model in production.

## **8. Repository Structure**
```text
├── data/ # Dataset and data processing scripts
├── notebooks/ # Jupyter notebooks for EDA and model building
├── models/ # Saved models and model training scripts
├── app/ # Flask app for deployment
├── Dockerfile # Docker configuration
├── README.md # Project documentation
└── requirements.txt # List of dependencies
```

## **9. How to Run the Project**
Clone the repository:
```bash
git clone https://github.com/Gourav052003/Predicting-Customer-Engagement-in-Financial-Products-Insights-from-Marketing-Campaigns.git
cd Predicting-Customer-Engagement-in-Financial-Products-Insights-from-Marketing-Campaigns
```

Install dependencies:
```bash
pip install -r requirements.txt
```

Run the Jupyter notebook to train models:
```bash
jupyter notebook notebooks/Bank_Term_Deposit_Prediction.ipynb
```

Run the Flask API for predictions:
```bash
cd app
python app.py
```

