https://github.com/hq969/customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment
A complete end-to-end machine learning project that predicts customer churn using the Telco dataset. It includes data preprocessing, exploratory data analysis (EDA), model training with Random Forest, hyperparameter tuning, evaluation, and deployment via a Flask API.
- Host: GitHub
- URL: https://github.com/hq969/customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment
- Owner: hq969
- License: mit
- Created: 2025-07-03T08:24:10.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-07-03T09:07:15.000Z (12 months ago)
- Last Synced: 2026-01-03T15:30:44.219Z (6 months ago)
- Topics: flask, numpy, pandas, python, scikit-learn, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 399 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Customer Churn Prediction ML Pipeline
This project provides a **production-ready, end-to-end machine learning pipeline** for predicting customer churn with scikit-learn and XGBoost classifiers. It covers **data preprocessing**, **EDA**, **model training**, **hyperparameter tuning**, **evaluation**, and **deployment via a Flask API**.
---
## Project Structure
```
Customer-Churn-Prediction-with-Hyperparameter-Optimization-and-Model-Deployment/
│
├── data/                         # Raw dataset
│   └── churn_data.csv
│
├── src/                          # Core Python scripts
│   ├── preprocessing.py          # Data loading and preprocessing
│   ├── eda_visualization.py      # Data visualization functions
│   ├── model_training.py         # Base ML training script
│   ├── hyperparameter_tuning.py  # GridSearchCV optimization
│   ├── model_evaluation.py       # Evaluation metrics and reports
│   └── utils.py                  # Optional helper functions
│
├── app/
│   └── app.py                    # Flask REST API for predictions
│
├── notebooks/
│   └── churn_eda.ipynb           # Jupyter Notebook for EDA
│
├── models/
│   └── churn_model.pkl           # Trained model saved with joblib
│
├── requirements.txt              # Python dependencies
└── README.md                     # You're here
```
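The tree lists `src/model_evaluation.py` for evaluation metrics, but the README does not say which metrics it reports. As a hedged illustration (not the repository's code), the sketch below computes accuracy plus precision, recall, and F1 for the churn class using only the standard library. Because churners are a minority in the Telco data (roughly a quarter of customers), accuracy alone can look good even when few churners are caught, so the per-class numbers matter. `confusion_counts` and `churn_metrics` are hypothetical names; scikit-learn's `classification_report` and `confusion_matrix` provide the same information.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives, treating `positive` as 'churned'."""
    counts = Counter()  # missing keys read as 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def churn_metrics(y_true, y_pred, positive=1):
    """Return accuracy plus precision, recall and F1 for the churn class."""
    c = confusion_counts(y_true, y_pred, positive)
    total = sum(c.values())
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(churn_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

Zero denominators return 0.0 rather than raising, which mirrors scikit-learn's default behaviour of reporting 0 for undefined precision or recall.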
---
## Objective
To develop a machine learning pipeline capable of predicting customer churn with high accuracy. The pipeline supports:
- Feature engineering
- Visualization
- Model selection
- Hyperparameter tuning
- API deployment for real-world inference
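For the model-selection step, one common approach is to compare candidate classifiers with stratified cross-validation before tuning any of them. The sketch below is an assumption, not the repository's procedure: it presumes the features are already numerically encoded, uses ROC AUC because churn classes are imbalanced, and substitutes `make_classification` for the real data. `compare_models` is a hypothetical helper; `xgboost.XGBClassifier` could be added to `candidates` if XGBoost is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def compare_models(X, y, cv_splits=5, seed=42):
    """Return the mean cross-validated ROC AUC of each candidate classifier."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    # Stratified folds keep the churn/no-churn ratio the same in every fold.
    cv = StratifiedKFold(n_splits=cv_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in candidates.items()
    }


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the encoded Telco features.
    X, y = make_classification(n_samples=500, n_features=8, weights=[0.75], random_state=0)
    for name, auc in compare_models(X, y).items():
        print(f"{name}: mean ROC AUC = {auc:.3f}")
```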
---
## Use Case
**Industry Example:** Telecom or subscription-based services
**Business Value:** Helps reduce churn by identifying at-risk customers early, so the business can respond with retention measures such as targeted offers, feedback requests, and personalized communication.
---
## Tech Stack
| Category | Tools Used |
|------------------------|-------------------------------------|
| Programming Language | Python |
| Data Manipulation | pandas, numpy |
| Visualization | seaborn, matplotlib |
| ML Algorithms | scikit-learn, XGBoost |
| Hyperparameter Tuning | GridSearchCV |
| Model Serialization | joblib |
| Deployment | Flask |
| Notebook Environment | Jupyter Notebook |
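The table names GridSearchCV for tuning, but the README does not document the search space. A minimal sketch of grid-searching a Random Forest, assuming numerically encoded features; the grid values in `DEFAULT_GRID` are illustrative assumptions, not the repository's settings, and `tune_random_forest` is a hypothetical name. Scoring on F1 rather than accuracy keeps the search focused on the churn class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Illustrative search space (assumed); 2 * 2 * 2 = 8 combinations.
DEFAULT_GRID = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}


def tune_random_forest(X_train, y_train, param_grid=None, cv=5, seed=42):
    """Try every combination in param_grid with k-fold CV and refit the best one."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid=param_grid or DEFAULT_GRID,
        scoring="f1",  # F1 of the positive (churn) class
        cv=cv,         # an int means stratified k-fold for classifiers
    )
    # refit=True (the default) retrains best_estimator_ on all of X_train.
    search.fit(X_train, y_train)
    return search


if __name__ == "__main__":
    X, y = make_classification(n_samples=400, n_features=10, weights=[0.75], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    search = tune_random_forest(X_train, y_train)
    print("Best parameters:", search.best_params_)
    print(f"Best CV F1: {search.best_score_:.3f}")
    print(f"Hold-out F1: {search.score(X_test, y_test):.3f}")
```

The grid grows multiplicatively, so each added value multiplies the number of fits by the fold count; passing `n_jobs=-1` to `GridSearchCV` parallelizes the fits on multi-core machines.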
---
## Model Training
Currently uses **Random Forest** and **XGBoost** as base classifiers. The training script can be extended to include other models.
`src/model_training.py` trains the model and saves it to `models/churn_model.pkl`.
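The training-and-saving step can be sketched as follows. This is a hedged illustration, not the contents of `src/model_training.py`: it assumes numerically encoded features, uses a stratified 80/20 hold-out split, and persists the fitted model with `joblib` at the path named above. `train_and_save` is a hypothetical helper, and `make_classification` stands in for the real data.

```python
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_and_save(X, y, model_path="models/churn_model.pkl", seed=42):
    """Fit a Random Forest, report hold-out accuracy, and save it with joblib."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    print(f"Hold-out accuracy: {model.score(X_test, y_test):.3f}")

    path = Path(model_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create models/ if it is missing
    joblib.dump(model, path)
    return model, path


if __name__ == "__main__":
    X, y = make_classification(n_samples=500, n_features=8, weights=[0.75], random_state=0)
    # Writes models/churn_model.pkl relative to the current working directory.
    train_and_save(X, y)
```

`joblib.load("models/churn_model.pkl")` restores the same estimator, which is what the Flask API needs at startup.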
---
## Example Prediction API
Run the API:
```bash
cd app/
python app.py
```
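The README shows how to start the API but not what `app/app.py` contains. A minimal sketch of such an endpoint, assuming the saved model is a scikit-learn classifier and that the `features` list already matches the column order and encoding used in training (raw Telco columns would first need the same preprocessing). `create_app` is a hypothetical factory that keeps the model injectable, which makes the endpoint easy to test without a model file.

```python
import numpy as np
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app that serves predictions from an already-fitted classifier."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)  # None if the body is not valid JSON
        features = payload.get("features") if isinstance(payload, dict) else None
        if not isinstance(features, list) or not features:
            return jsonify({"error": "'features' must be a non-empty list"}), 400
        try:
            # One sample per request: shape (1, n_features), as scikit-learn expects.
            row = np.asarray(features, dtype=float).reshape(1, -1)
        except (TypeError, ValueError):
            return jsonify({"error": "'features' must contain only numbers"}), 400
        prediction = int(model.predict(row)[0])  # NumPy int -> JSON-serializable int
        return jsonify({"churn_prediction": prediction})

    return app
```

To match `python app.py`, the module could end with `if __name__ == "__main__": create_app(joblib.load("../models/churn_model.pkl")).run(port=5000)`; the relative path assumes the server is started from `app/` as in the commands above.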
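Test it with `curl` or Postman. The `features` array must contain numeric values in the same order and encoding as the columns the model was trained on: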
```bash
curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" \
-d '{"features": [0.0, 1.0, 45.0, 5000.0, 60.0, 1.0, 0.0]}'
```
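For scripted checks, the same request can be sent from Python with only the standard library. A sketch; `build_request` and `predict_churn` are hypothetical helpers, and `API_URL` assumes Flask's default development address used in the `curl` example.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/predict"  # Flask development-server default (assumed)


def build_request(features, url=API_URL):
    """Build the same JSON POST request as the curl example."""
    body = json.dumps({"features": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict_churn(features, url=API_URL, timeout=5.0):
    """Send the request and return the API's churn_prediction value."""
    with urllib.request.urlopen(build_request(features, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["churn_prediction"]


if __name__ == "__main__":
    # Requires the API from the previous step to be running.
    print(predict_churn([0.0, 1.0, 45.0, 5000.0, 60.0, 1.0, 0.0]))
```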
Response:
```json
{
  "churn_prediction": 1
}
```
---
## Installation
### 1. Clone the Repository
```bash
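git clone https://github.com/hq969/customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment.git
cd customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment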
```
### 2. Create Virtual Environment
```bash
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
```
### 3. Install Requirements
```bash
pip install -r requirements.txt
```
### 4. Run EDA Notebook
```bash
jupyter notebook notebooks/churn_eda.ipynb
```
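The notebook's contents are not described here, but a typical first EDA question is how the churn rate differs across a categorical column such as `Contract`. A small pandas sketch, assuming `Churn` has already been encoded as 0/1; `churn_rate_by` is a hypothetical helper.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str, target: str = "Churn") -> pd.Series:
    """Share of churned customers per category of `column`, highest first."""
    # The mean of a 0/1 target within each group is that group's churn rate.
    return df.groupby(column)[target].mean().sort_values(ascending=False)


if __name__ == "__main__":
    toy = pd.DataFrame({
        "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
        "Churn": [1, 0, 0, 0],
    })
    print(churn_rate_by(toy, "Contract"))
```

With matplotlib installed, `churn_rate_by(df, "Contract").plot.bar()` turns the result into a quick bar chart.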
---
## Dataset
You can use the **Telco Customer Churn dataset** from Kaggle or IBM:
[Download here (GitHub)](https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv)
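Save it as `data/churn_data.csv`, for example:
```bash
mkdir -p data
curl -L -o data/churn_data.csv https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv
```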
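The raw Telco file needs a little cleaning before modelling. A minimal pandas sketch, assuming the standard IBM/Kaggle column names (`customerID`, `TotalCharges`, `Churn`): in that file `TotalCharges` is read as text because a handful of zero-tenure customers have a blank value. Dropping those rows is one choice; filling them with 0 is another. `clean_telco` and `load_telco` are hypothetical names, not the repository's `preprocessing.py`.

```python
import pandas as pd


def clean_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Fix TotalCharges, drop the ID column, and encode Churn as 0/1."""
    df = df.copy()
    # Blank strings cannot be parsed as numbers, so they become NaN here.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # customerID identifies a row; it carries no predictive signal.
    df = df.dropna(subset=["TotalCharges"]).drop(columns=["customerID"])
    df["Churn"] = (df["Churn"] == "Yes").astype(int)  # "Yes" -> 1, "No" -> 0
    return df.reset_index(drop=True)


def load_telco(path: str = "data/churn_data.csv") -> pd.DataFrame:
    """Read the CSV saved in the step above and clean it."""
    return clean_telco(pd.read_csv(path))
```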
---
## Model Insights
* Handles both numerical and categorical features
* Supports hyperparameter tuning
* Extensible to more complex models (e.g., neural networks)
* Modular structure for experimentation
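One way to get mixed-type handling and a swappable model in a single object is a scikit-learn `Pipeline` built around a `ColumnTransformer`. This is an assumed design, since the README does not say how `src/preprocessing.py` encodes features; `build_pipeline` and the made-up `TOY` rows are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows with Telco-style column names, for illustration only.
TOY = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65, 89.10],
    "Contract": ["Month-to-month", "One year", "Month-to-month",
                 "One year", "Month-to-month", "Two year"],
    "Churn": [0, 0, 1, 0, 1, 0],
})


def build_pipeline(numeric_cols, categorical_cols, estimator=None):
    """Scale numeric columns, one-hot encode categorical ones, then classify."""
    preprocess = ColumnTransformer(transformers=[
        ("num", StandardScaler(), numeric_cols),
        # Categories unseen during fit are encoded as all zeros instead of raising.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    # Any scikit-learn-compatible classifier can be passed in as `estimator`.
    model = estimator if estimator is not None else RandomForestClassifier(random_state=42)
    return Pipeline(steps=[("preprocess", preprocess), ("model", model)])


if __name__ == "__main__":
    pipe = build_pipeline(["tenure", "MonthlyCharges"], ["Contract"])
    pipe.fit(TOY.drop(columns="Churn"), TOY["Churn"])
    print(pipe.predict_proba(TOY.drop(columns="Churn"))[:, 1])
```

Because preprocessing lives inside the pipeline, passing the whole pipeline to `GridSearchCV` (with parameter names such as `model__max_depth`) fits the scaler and encoder on each training fold only, which avoids leaking validation data into preprocessing.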
---
## Author
**Harsh Sonkar**
Machine Learning Engineer | Data Scientist
[LinkedIn](https://www.linkedin.com/in/harsh-sonkar/) | [GitHub](https://github.com/harsh-sonkar)
---
## Contributions
Pull requests are welcome! Please open an issue first to discuss what you would like to change.
---
## License
This project is licensed under the MIT License.