https://github.com/hq969/customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment

A complete end-to-end machine learning project that predicts customer churn using the Telco dataset. It includes data preprocessing, exploratory data analysis (EDA), model training with Random Forest, hyperparameter tuning, evaluation, and deployment via a Flask API.

flask numpy pandas python scikit-learn xgboost


### 🔍 Customer Churn Prediction ML Pipeline

This project provides a **production-ready, end-to-end machine learning pipeline** for predicting customer churn using classification algorithms and modern data science tools. It includes **data preprocessing**, **EDA**, **model training**, **hyperparameter tuning**, **evaluation**, and **deployment via Flask API**.

---

## 📂 Project Structure

```
Customer-Churn-Prediction-with-Hyperparameter-Optimization-and-Model-Deployment/
│
├── data/                          # Raw dataset
│   └── churn_data.csv
│
├── src/                           # Core Python scripts
│   ├── preprocessing.py           # Data loading and preprocessing
│   ├── eda_visualization.py       # Data visualization functions
│   ├── model_training.py          # Base ML training script
│   ├── hyperparameter_tuning.py   # GridSearchCV optimization
│   ├── model_evaluation.py        # Evaluation metrics and reports
│   └── utils.py                   # Optional helper functions
│
├── app/
│   └── app.py                     # Flask REST API for predictions
│
├── notebooks/
│   └── churn_eda.ipynb            # Jupyter Notebook for EDA
│
├── models/
│   └── churn_model.pkl            # Trained model saved with joblib
│
├── requirements.txt               # Python dependencies
└── README.md                      # You're here
```

---

## 📌 Objective

To develop a machine learning pipeline capable of predicting customer churn with high accuracy. The pipeline supports:
- Feature engineering
- Visualization
- Model selection
- Hyperparameter tuning
- API deployment for real-world inference

---

## 💼 Use Case

**Industry Example:** Telecom or subscription-based services

**Business Value:** Helps reduce churn by identifying at-risk customers and enabling retention strategies like offers, feedback, and targeted communication.

---

## 🛠️ Tech Stack

| Category | Tools Used |
|------------------------|-------------------------------------|
| Programming Language | Python |
| Data Manipulation | pandas, numpy |
| Visualization | seaborn, matplotlib |
| ML Algorithms | scikit-learn, XGBoost |
| Hyperparameter Tuning | GridSearchCV |
| Model Serialization | joblib |
| Deployment | Flask |
| Notebook Environment | Jupyter Notebook |

---

## 📈 Model Training

Currently uses **Random Forest** and **XGBoost** as base classifiers. The training script can be extended to include other models.

📂 `src/model_training.py` trains the model and saves it to `models/churn_model.pkl`.
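
As a rough illustration of what a training step of this kind might look like, the sketch below fits a Random Forest, reports hold-out metrics, and saves the model with joblib. The synthetic data from `make_classification`, the hyperparameter values, and the temporary output path are assumptions made for illustration; the actual `src/model_training.py` may be organized differently.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed churn features (assumption):
# 7 numeric features and a binary target.
X, y = make_classification(n_samples=500, n_features=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the classifier and evaluate it on the held-out split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Hold-out accuracy: {test_accuracy:.3f}")
print(classification_report(y_test, model.predict(X_test)))

# Persist the fitted model. The project stores it at models/churn_model.pkl;
# a temporary directory is used here so the sketch has no side effects.
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.pkl")
joblib.dump(model, model_path)

# Reload to confirm the saved artifact can be used for inference.
reloaded = joblib.load(model_path)
```

Swapping in another classifier with the scikit-learn estimator interface should only require changing the line that constructs `model`.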

---

## 🧪 Example Prediction API

Run the API:
```bash
cd app/
python app.py
```

Test using `curl` or Postman:

```bash
curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" \
-d '{"features": [0.0, 1.0, 45.0, 5000.0, 60.0, 1.0, 0.0]}'
```

Response:

```json
{
  "churn_prediction": 1
}
```
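
The request and response above could be served by an endpoint along the following lines. This is a minimal sketch, not the contents of `app/app.py`: the `create_app` factory, the `StubModel` class, and the error message are hypothetical names introduced for illustration. Flask's test client is used so the example runs without starting a server.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app exposing POST /predict for a fitted classifier."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not payload or "features" not in payload:
            return jsonify({"error": "expected JSON body with a 'features' list"}), 400
        # scikit-learn style models expect a 2D input: one row per sample.
        prediction = model.predict([payload["features"]])[0]
        return jsonify({"churn_prediction": int(prediction)})

    return app


class StubModel:
    """Hypothetical stand-in for the model loaded from models/churn_model.pkl."""

    def predict(self, rows):
        # Always predicts churn; a real model would use the feature values.
        return [1 for _ in rows]


app = create_app(StubModel())

# Exercise the endpoint in-process with Flask's test client.
client = app.test_client()
response = client.post(
    "/predict", json={"features": [0.0, 1.0, 45.0, 5000.0, 60.0, 1.0, 0.0]}
)
print(response.status_code, response.get_json())
```

In a real deployment, the stub would presumably be replaced by a model loaded with `joblib.load`, and the order of values in `features` would have to match the column order used during training.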

---

## 📦 Installation

### 1. Clone the Repository

```bash
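git clone https://github.com/hq969/customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment.git
cd customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment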
```

### 2. Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
```

### 3. Install Requirements

```bash
pip install -r requirements.txt
```

### 4. Run EDA Notebook

```bash
jupyter notebook notebooks/churn_eda.ipynb
```

---

## 📊 Dataset

You can use the **Telco Customer Churn dataset** from Kaggle or IBM:

🔗 [Download Here (GitHub)](https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv)

Save as:

```bash
data/churn_data.csv
```
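
Once the file is in place, a first preprocessing pass might look like the sketch below. The column names follow the commonly distributed Telco schema and are an assumption here, as is the choice to fill blank `TotalCharges` values with 0. A tiny in-memory sample replaces `data/churn_data.csv` so the example runs on its own.

```python
import io

import pandas as pd

# Tiny in-memory sample imitating a few Telco columns (assumed schema).
# In the project this would be pd.read_csv("data/churn_data.csv").
sample_csv = io.StringIO(
    "customerID,gender,tenure,Contract,MonthlyCharges,TotalCharges,Churn\n"
    "0001-A,Female,1,Month-to-month,29.85,29.85,No\n"
    "0002-B,Male,34,One year,56.95,1889.5,No\n"
    "0003-C,Male,0,Two year,52.55, ,No\n"
    "0004-D,Female,2,Month-to-month,70.70,151.65,Yes\n"
)
df = pd.read_csv(sample_csv)

# Blank TotalCharges strings are coerced to NaN, then filled with 0 (assumption).
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0.0)

# Encode the target as 0/1 and drop the identifier column.
y = df["Churn"].map({"No": 0, "Yes": 1})
X = df.drop(columns=["customerID", "Churn"])

# One-hot encode the categorical columns.
X = pd.get_dummies(X, columns=["gender", "Contract"])
print(X.columns.tolist())
```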

---

## 🧠 Model Insights

* Handles numerical + categorical data
* Supports hyperparameter tuning
* Extensible to more complex models (e.g., neural networks)
* Modular structure for experimentation
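
The first two points can be combined in a single scikit-learn pipeline, sketched below under stated assumptions: the synthetic data, the column names, the labeling rule, and the parameter grid are all illustrative and are not taken from `src/hyperparameter_tuning.py`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic mixed-type data (assumption): two numeric columns, one categorical.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "tenure": rng.integers(0, 72, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
# Illustrative label: short tenure or a month-to-month contract counts as churn.
y = ((X["tenure"] < 24) | (X["Contract"] == "Month-to-month")).astype(int)

# Scale numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Small grid over the classifier step; "clf__" routes parameters to that step.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [3, None],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping preprocessing inside the pipeline means each cross-validation fold fits the scaler and encoder on its own training portion, which avoids leaking information from the validation fold.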

---

## 👨‍💻 Author

**Harsh Sonkar**
Machine Learning Engineer | Data Scientist
[LinkedIn](https://www.linkedin.com/in/harsh-sonkar/) | [GitHub](https://github.com/harsh-sonkar)

---

## 🤝 Contributions

Pull requests are welcome! Please open an issue first to discuss what you would like to change.

---

## 📜 License

This project is licensed under the MIT License.