https://github.com/udityamerit/customer-segmentation-and-churn-prediction-model

machine-learning machine-learning-algorithms scikit-learn scikitlearn-machine-learning

Last synced: 7 months ago
JSON representation

Host: GitHub
URL: https://github.com/udityamerit/customer-segmentation-and-churn-prediction-model
Owner: udityamerit
License: mit
Created: 2024-11-27T15:58:26.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-11-28T17:14:13.000Z (10 months ago)
Last Synced: 2025-01-27T06:43:00.103Z (8 months ago)
Topics: machine-learning, machine-learning-algorithms, scikit-learn, scikitlearn-machine-learning
Language: Python
Homepage:
Size: 154 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

## **Customer Segmentation and Churn Prediction Model**

### **Project Overview**
This project aims to:
1. Segment customers into clusters based on their demographic, spending, and engagement patterns.
2. Predict customer churn to help businesses focus their marketing efforts on retaining at-risk customers.

The app is built with **Streamlit**, using clustering (KMeans) and machine learning techniques for churn prediction. The model provides actionable insights to improve customer engagement and retention strategies.

---

### **Features**
1. **Dynamic Clustering**:
- Users can choose the number of clusters (2 to 10) for segmentation.
- Clustering helps group customers with similar characteristics for targeted campaigns.
2. **Churn Prediction**:
- Key features like spending habits, age, income, and purchase frequency are used to predict churn probabilities.
3. **Visualization**:
- PCA-based scatter plots to visualize customer segments.
- Histograms for feature distribution across clusters.
4. **Marketing Strategy Recommendations**:
- Tailored recommendations for each customer cluster.

---

### **How to Run**
1. Install dependencies:
```bash
pip install streamlit pandas numpy matplotlib seaborn scikit-learn
```
2. Save the script as `customer segmentation.py`.
3. Run the app:
```bash
streamlit run customer segmentation.py
```
4. Upload your dataset (`.csv` file) and explore clusters or churn predictions.

---

### **Model Description**
1. **Clustering**:
- Features used: Age, Income, Spending, Purchases, Campaign Response Rates, etc.
- Preprocessing includes missing value handling, scaling, and one-hot encoding.
- PCA for dimensionality reduction and visualization.

2. **Churn Prediction**:
- Supervised learning techniques (e.g., Logistic Regression, Random Forest) can be integrated for churn prediction.
- Important features include customer spending, campaign response rates, and frequency of purchases.

---

### **Answers to Key Questions**

#### **1. How would you handle imbalanced data if churned customers are fewer than active ones?**
- **Oversampling**: Use techniques like SMOTE (Synthetic Minority Oversampling Technique) to generate synthetic samples of the minority class.
- **Undersampling**: Randomly sample from the majority class to balance the dataset.
- **Class Weighting**: Modify the loss function of the model to penalize misclassifications of the minority class more heavily.
- **Evaluation Metrics**: Use metrics like F1-score, precision, recall, and ROC-AUC instead of accuracy to evaluate the model performance on imbalanced data.

#### **2. What features are the most important predictors of churn?**
- **Total Spending**: High spending may correlate with loyalty, while erratic spending patterns may indicate potential churn.
- **Campaign Response Rate**: Customers who do not respond to marketing campaigns are more likely to churn.
- **Number of Purchases**: Lower purchase frequency can be an early indicator of churn.
- **Income**: Disposable income impacts purchasing behavior, which may relate to churn risk.
- **Web Visits**: High web visits without purchases may indicate dissatisfaction or exploration of alternatives.

#### **3. How would you explain the model's predictions to a non-technical business team?**
- **Cluster Analysis**: Use visualizations (e.g., PCA plots) to show how customers are grouped based on similar characteristics.
- **Key Metrics**: Present insights such as average age, spending, and income per cluster in simple terms.
- **Churn Insights**: Explain churn probabilities using concrete examples, such as "Customers who spend less than $50 per month and do not respond to campaigns are 3x more likely to churn."
- **Business Impact**: Highlight how retaining customers in high-risk clusters can increase revenue and customer lifetime value.

#### **4. What steps would you take to deploy this model into production?**
1. **Model Packaging**:
- Save the trained model using a format like `pickle` or `joblib`.
- Create a robust pipeline for preprocessing and predictions.

2. **API Development**:
- Use **FastAPI** or **Flask** to expose the model as an API endpoint.
- Ensure secure and scalable API endpoints.

3. **Integration**:
- Embed the API in the existing CRM or marketing platforms.
- Ensure the application can handle real-time inputs and predictions.

4. **Monitoring**:
- Track model performance using dashboards (e.g., Grafana).
- Monitor drift in data distribution and retrain the model as needed.

5. **Feedback Loop**:
- Use feedback from business teams and customers to iteratively improve the model.

---

### **Folder Structure**
```
📂 Customer-Segmentation
├── 📄 customer segmentation.py
├── 📄 README.md
└── 📂 data
└── marketing_campaign.csv
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/udityamerit/customer-segmentation-and-churn-prediction-model

Awesome Lists containing this project

README