https://github.com/tszon/end-to-end_ds_ml_project

I built an end-to-end customer churn segmentation and prediction project.

containerisation data-science docker explianable-ai exploratory-data-analysis feature-engineering hdbscan-clustering kmeans-clustering machine-learning mlflow preprocessing-data scikit-learn shap statistical-test statistical-tests streamlit supervised-learning visualisation vscode


# 📊 End-to-End ML Deployment: Telco Customer Churn Project

## ๐ŸŒ Live Demo

Click 👉 [![Open in Streamlit](https://img.shields.io/badge/Streamlit-App-red?logo=streamlit)](https://tszontseng-telco-end2end-customer-churn-project.streamlit.app/)

---

## 📖 Project Overview

Customer churn is a major challenge for telecom companies: retaining customers is often more cost-effective than acquiring new ones.
This project builds an **end-to-end machine learning pipeline** to predict churn, explain drivers of churn, and segment customers into actionable groups for better retention strategies.

The project includes:

* **EDA** → Explore churn patterns, tenure, contracts, charges.
* **Customer Segmentation** → KMeans (baseline) vs HDBSCAN (tuned).
* **Churn Prediction** → Logistic Regression baseline vs advanced ensemble models (Random Forest, XGBoost, Voting Classifier).
* **Explainability** → SHAP summary & waterfall plots.
* **Interactive App** → Built with **Streamlit**, deployed on Streamlit Cloud.

---

## 📂 Dataset

Dataset: [WA_Fn-UseC_-Telco-Customer-Churn.csv](https://www.kaggle.com/blastchar/telco-customer-churn)

* **Target**: `"Churn"` (Yes/No)
* **Features include**:
  * **Demographics** → Gender, Senior Citizen, Dependents, Partner
  * **Services** → Phone, Internet, Tech Support, Streaming, Security
  * **Account Info** → Tenure, Contract, Billing, Payment Method
  * **Charges** → Monthly & Total Charges
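Before modelling, the raw columns need light cleaning. A minimal sketch, assuming the column names of the Kaggle CSV linked above: in the commonly distributed file, `TotalCharges` is typically read as text because a few zero-tenure rows hold a blank string, so it is coerced to numeric here. Filling those blanks with 0 is our assumption (not necessarily what this project does), and `prepare_telco` is a hypothetical helper name.

```python
import pandas as pd


def prepare_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: numeric TotalCharges and a 0/1 Churn target."""
    out = df.copy()
    # Blank strings cannot be parsed, so errors="coerce" turns them into NaN.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Assumption: a blank total means nothing has been billed yet, so use 0.
    out["TotalCharges"] = out["TotalCharges"].fillna(0.0)
    # Encode the Yes/No target as 1/0 for the classifiers.
    out["Churn"] = out["Churn"].map({"Yes": 1, "No": 0}).astype(int)
    return out


if __name__ == "__main__":
    # With the real data: raw = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
    raw = pd.DataFrame(
        {"tenure": [0, 12], "TotalCharges": [" ", "842.5"], "Churn": ["Yes", "No"]}
    )
    print(prepare_telco(raw))
```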

---

## 🧪 Methods & Models

### 🔎 Exploratory Data Analysis (EDA)

* Customers with **fiber optic internet churn the most**, possibly because of pricing or service-quality issues.
* **DSL customers churn less**, possibly due to stable pricing or loyalty.
* **Churn is highest in the first 5 months** → onboarding is a critical phase.
* Long-tenure customers (>24 months) show **significantly lower churn rates**.
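The tenure and internet-service findings can be checked with a simple group-by on the cleaned frame. This sketch assumes `Churn` is already encoded as 0/1; the band edges (0-5, 6-24, over 24 months) mirror the bullets above, and both helper names are hypothetical.

```python
import pandas as pd


def add_tenure_band(df: pd.DataFrame) -> pd.DataFrame:
    """Bucket tenure (in months) into the bands used in the EDA notes."""
    out = df.copy()
    out["tenure_band"] = pd.cut(
        out["tenure"],
        bins=[-1, 5, 24, float("inf")],  # intervals (-1, 5], (5, 24], (24, inf]
        labels=["0-5 months", "6-24 months", ">24 months"],
    )
    return out


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Fraction of churners (Churn == 1) within each level of `column`, highest first."""
    return (
        df.groupby(column, observed=True)["Churn"]
        .mean()
        .sort_values(ascending=False)
    )


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "tenure": [1, 3, 10, 30, 50],
            "InternetService": ["Fiber optic", "Fiber optic", "DSL", "DSL", "No"],
            "Churn": [1, 1, 0, 1, 0],
        }
    )
    print(churn_rate_by(add_tenure_band(demo), "tenure_band"))
    print(churn_rate_by(demo, "InternetService"))
```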

### 👥 Customer Segmentation (Unsupervised Learning)

* **Cluster 0: Budget Loyalists** → Minimal services, mailed check payments, stable.
* **Cluster 1: At-Risk Premiums** → Fiber optic, month-to-month contracts, electronic check payments, highest churn risk.
* **Cluster 2: Balanced Mainstream** → Moderate DSL usage, mixed services, mid-range spenders.
* **Cluster -1: Drifters** (HDBSCAN's noise label) → DSL, no phone service, low commitment.

### 📊 Churn Prediction Models

* Logistic Regression (baseline)
* Random Forest (ensemble)
* XGBoost (boosted trees)
* Voting Classifier (combined)
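A cross-validated comparison of the four model families might look like the sketch below. It runs on synthetic data, and several pieces are our assumptions rather than the project's setup: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that only scikit-learn is required (`xgboost.XGBClassifier` can be swapped in if installed), soft voting and ROC AUC are our choices, and the hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models(seed: int = 0) -> dict:
    """Baseline, two ensembles, and a soft-voting combination of all three."""
    logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Stand-in for XGBoost; xgboost.XGBClassifier is a drop-in alternative if installed.
    boosted = GradientBoostingClassifier(random_state=seed)
    voting = VotingClassifier(
        estimators=[("lr", logreg), ("rf", forest), ("gb", boosted)],
        voting="soft",  # average the predicted churn probabilities
    )
    return {"logreg": logreg, "random_forest": forest, "boosted": boosted, "voting": voting}


def compare(X, y, cv: int = 5) -> dict:
    """Mean cross-validated ROC AUC per model (cross_val_score clones each estimator)."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in build_models().items()
    }


if __name__ == "__main__":
    # Synthetic, mildly imbalanced data in place of the encoded Telco features.
    X, y = make_classification(n_samples=600, n_features=12, weights=[0.75], random_state=0)
    for name, auc in compare(X, y).items():
        print(f"{name:>14}: {auc:.3f}")
```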

### ๐Ÿ” Explainability (SHAP)

* Feature importance ranking.
* SHAP summary plots + waterfall plots for individual predictions.
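For the logistic-regression baseline, SHAP values have a closed form when features are treated as independent: feature `j` contributes `phi_j = beta_j * (x_j - mean(x_j))` in log-odds units, and these contributions plus the base value `E[f(X)]` add up exactly to the model's raw output. A waterfall plot draws this decomposition for one customer, and a summary plot orders features by mean `|phi_j|`. The sketch computes it with NumPy rather than the `shap` package (for tree ensembles, the package's `TreeExplainer` is the usual route); the helper names and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def linear_shap(model: LogisticRegression, X_background: np.ndarray, X: np.ndarray):
    """Exact SHAP values (log-odds units) for a binary linear model, features assumed independent."""
    coef = model.coef_.ravel()
    mean = X_background.mean(axis=0)
    # phi[i, j] = beta_j * (x_ij - E[x_j])
    phi = (X - mean) * coef
    # Base value E[f(X)]: for a linear model this equals the log-odds at the background mean.
    base = float(model.intercept_[0] + mean @ coef)
    return phi, base


def rank_features(phi: np.ndarray, names: list) -> list:
    """Global importance as mean |SHAP| per feature, largest first."""
    importance = np.abs(phi).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(names[j], float(importance[j])) for j in order]


if __name__ == "__main__":
    names = [f"feature_{j}" for j in range(4)]  # synthetic features, not Telco columns
    X, y = make_classification(n_samples=400, n_features=4, n_informative=3, n_redundant=0, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    phi, base = linear_shap(model, X, X)
    print(rank_features(phi, names))
    # Waterfall view of customer 0: base value plus each feature's contribution.
    print(f"base {base:+.3f}", dict(zip(names, np.round(phi[0], 3))))
```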

---

## 🚀 Deployment

* **Streamlit App** for interactive visualization and prediction.
* **Dockerized** for reproducibility.
* **Deployed on Streamlit Cloud** with a public link.

---

## โš™๏ธ Installation & Usage

### 1. Clone Repo

```bash
git clone https://github.com//Customer_Churn_Prediction.git
cd Customer_Churn_Prediction
```

### 2. Install Requirements

```bash
pip install -r requirements.txt
```

### 3. Run Locally

```bash
streamlit run scripts/app.py
```

App runs at: [http://localhost:8501](http://localhost:8501)

### 4. Run with Docker

```bash
docker build -t churn-app .
docker run -p 8501:8501 churn-app
```

---

## ๐Ÿ“ฆ Project Structure

```
Customer_Churn_Prediction/
│
├── data/ # feature store JSON (not raw data)
├── models/ # saved ML models (.joblib)
├── reports_app/ # plots & visualizations
├── scripts/ # Streamlit app (app.py) & utilities
├── src/ # preprocessing, feature engineering, utils
├── config.json # config settings
├── requirements.txt # dependencies
├── Dockerfile # container setup
└── README.md # this file
```
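The `models/` folder holds `.joblib` files. A minimal sketch of saving and reloading a fitted estimator with `joblib` follows; the helper names and the `logreg` file name are hypothetical, and the project may organise its files differently.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

MODELS_DIR = Path("models")  # folder name from the project tree


def save_model(model, name: str, models_dir: Path = MODELS_DIR) -> Path:
    """Persist a fitted estimator as <models_dir>/<name>.joblib and return the path."""
    models_dir.mkdir(parents=True, exist_ok=True)
    path = models_dir / f"{name}.joblib"
    joblib.dump(model, path)
    return path


def load_model(name: str, models_dir: Path = MODELS_DIR):
    """Load an estimator saved by save_model (joblib unpickles, so only load trusted files)."""
    return joblib.load(models_dir / f"{name}.joblib")


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    fitted = LogisticRegression(max_iter=1000).fit(X, y)
    # Use a temporary folder so the demo does not write into the repo.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_model(fitted, "logreg", Path(tmp))
        restored = load_model("logreg", Path(tmp))
        print(path.name, (restored.predict(X) == fitted.predict(X)).all())
```

Loading once at start-up, rather than on every prediction, keeps an interactive app responsive.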

---

## ๐Ÿ› ๏ธ Tech Stack

* **Python**: `pandas`, `numpy`, `scikit-learn`, `xgboost`, `shap`, `hdbscan`, `umap`
* **Visualization**: `matplotlib`, `seaborn`, `streamlit`
* **MLOps Tools**: `Docker`, `GitHub`, `MLflow` (experiment tracking)
* **Deployment**: `Streamlit Cloud`

---

## 📌 Next Steps

* Extend segmentation with deep embeddings.
* Add hyperparameter search with Optuna.
* Deploy with a custom domain using Render or Railway.

---

## 👤 Author

Developed by **[Tszon Tseng](https://github.com/Tszontseng)**

* 💼 Passionate about Data Science & AI
* 🚀 Building end-to-end ML pipelines
* 🌐 [LinkedIn Profile](https://www.linkedin.com/in/tszon-tseng-a381aa297/)

---

✨ With this app, telecom providers can **predict churn, understand why customers leave, and design better retention strategies.**