Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aravinda-1402/customer-churn-prediction
End-to-end project predicting customer churn using ML, Apache Airflow, Docker, Flask and AWS services.
- Host: GitHub
- URL: https://github.com/aravinda-1402/customer-churn-prediction
- Owner: aravinda-1402
- Created: 2024-08-10T00:48:39.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-08-11T15:27:16.000Z (about 2 months ago)
- Last Synced: 2024-09-26T02:04:52.552Z (1 day ago)
- Topics: airflow, apache, aws, aws-s3, classification, docker, docker-container, ecr-repositories, flask, machine-learning, sagemaker
- Language: Python
- Homepage:
- Size: 361 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Customer Churn Prediction 🚀
This project leverages machine learning to predict customer churn, a critical task for businesses aiming to retain their customers. Businesses can take proactive steps to improve customer satisfaction and reduce churn rates by predicting which customers are likely to leave.
## 🌟 Project Overview
The project uses a logistic regression model to predict customer churn. The model is trained on the data in the `WA_Fn-UseC_-Telco-Customer-Churn.csv` file, using features such as tenure, monthly charges, total charges, and other customer attributes.
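As a quick illustration, loading the dataset and inspecting the prediction target might look like the following (a minimal sketch; the actual preprocessing lives in the project's training script):

```python
import pandas as pd

# Load the Telco churn dataset and inspect the prediction target
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
print(df.shape)                        # rows x columns
print(df["Churn"].value_counts())     # "Yes" = churned, "No" = retained
print(df[["tenure", "MonthlyCharges", "TotalCharges"]].head())
```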
### Key Features:
- **End-to-end Pipeline**: The project includes an end-to-end pipeline managed with Apache Airflow, from data extraction and preprocessing to model training and deployment.
- **AWS Integration**: The project leverages AWS services such as S3 and SageMaker to perform model training in a scalable and efficient environment.
- **Dockerized Environment**: The project environment is fully containerized using Docker, ensuring easy setup and consistent performance across different environments.
- **Flask Application**: A Flask web application is included to serve predictions, making the model accessible through a simple web interface.
- **REST API**: The project provides a REST API endpoint for making predictions programmatically, enabling easy integration with other applications.

## 🛠 Pipeline Visualization
The pipeline follows a simple yet effective workflow: it starts by downloading the dataset, then moves on to training the model, and finally, a Flask application is deployed to serve predictions. Each task is managed and orchestrated by Apache Airflow, ensuring smooth and automated execution.
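A condensed sketch of how such a DAG might be defined in `airflow/dags/churn_prediction_dag.py` (assuming a recent Airflow 2.x; the task callables and deployment command here are illustrative stubs, not taken from the repository, though the DAG id matches the one referenced later in this README):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

# Hypothetical task callables; the real implementations live in the repository
def download_dataset():
    ...  # pull WA_Fn-UseC_-Telco-Customer-Churn.csv from S3

def train_model():
    ...  # preprocess the data and fit the logistic regression model

with DAG(
    dag_id="churn_prediction_pipeline",
    start_date=datetime(2024, 8, 1),
    schedule=None,   # triggered manually from the UI
    catchup=False,
) as dag:
    download = PythonOperator(task_id="download_dataset", python_callable=download_dataset)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    deploy = BashOperator(
        task_id="deploy_flask_app",
        bash_command="docker compose up -d flask_app",  # illustrative deployment step
    )

    download >> train >> deploy
```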
## 🔧 Technologies & Tools
- **Python**: The core language used for data processing and model building.
- **Pandas & Scikit-Learn**: Essential libraries for data manipulation and machine learning.
- **Boto3**: AWS SDK for Python, used for interaction with AWS services.
- **Apache Airflow**: Workflow management platform for orchestrating the pipeline.
- **AWS S3 & SageMaker**: Used for storing data and running machine learning jobs in the cloud.
- **Docker**: Containerization of the environment to ensure consistent execution.
- **Flask**: A lightweight web framework for serving predictions.
- **GitHub**: Version control and collaboration platform.

## 📁 Project Structure
```bash
Customer Churn Prediction Project/
├── Dockerfile                            # Docker configuration for setting up the environment
├── airflow/
│   ├── dags/
│   │   └── churn_prediction_dag.py       # Airflow DAG definition
│   └── docker-compose.yaml               # Configuration for running Airflow and other services
├── flask_app/                            # Flask application for serving predictions
│   ├── static/
│   │   └── styles.css                    # CSS for the web interface
│   ├── templates/
│   │   └── index.html                    # HTML template for the web interface
│   ├── app.py                            # Flask application script
│   ├── Dockerfile                        # Docker configuration for the Flask app
│   └── requirements.txt                  # Python dependencies for the Flask app
├── train.py                              # Model training script
├── WA_Fn-UseC_-Telco-Customer-Churn.csv  # Dataset
└── README.md                             # Project documentation (you're reading it now!)
```
## 🚀 Getting Started

### Prerequisites
Ensure the following are installed on the local machine:
- Docker & Docker Compose
- Git

### Installation
**Clone the repository:**
```bash
git clone https://github.com/aravinda-1402/Customer-Churn-Prediction.git
cd Customer-Churn-Prediction/airflow
```
**Build and run the Docker containers:**
```bash
docker-compose up -d
```

**Access Apache Airflow:**
Once the containers are up and running, access the Airflow web UI by navigating to http://localhost:8080 in the browser.
### Running the Pipeline
1. **Trigger the DAG**: In the Airflow web UI, navigate to the `churn_prediction_pipeline` DAG and trigger it manually to start the pipeline.
2. **Monitor the DAG**: Follow the progress and logs of each task in the pipeline through the Airflow interface.
## 🧠 Training the Model
The model training is handled by the `train_model` task in the Airflow DAG. This task (sketched below):
* Downloads the dataset from S3.
* Preprocesses the data (handling missing values, encoding categorical variables, etc.).
* Trains a logistic regression model.
* Uploads the trained model back to S3 for future predictions.
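A compact sketch of what such a task might do (the bucket name, object keys, and preprocessing details are illustrative assumptions, not taken from the repository):

```python
import boto3
import joblib
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

BUCKET = "my-churn-bucket"  # illustrative bucket name

def train_model():
    s3 = boto3.client("s3")

    # 1. Download the dataset from S3
    s3.download_file(BUCKET, "data/WA_Fn-UseC_-Telco-Customer-Churn.csv", "/tmp/churn.csv")
    df = pd.read_csv("/tmp/churn.csv")

    # 2. Preprocess: coerce TotalCharges to numeric, drop missing rows, encode target
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna()
    y = df["Churn"].map({"Yes": 1, "No": 0})
    X = df.drop(columns=["customerID", "Churn"])

    # 3. Train a logistic regression model, one-hot encoding the categorical columns
    categorical = X.select_dtypes(include="object").columns.tolist()
    preprocess = make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), categorical),
        remainder="passthrough",
    )
    model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model.fit(X_train, y_train)

    # 4. Upload the trained model back to S3
    joblib.dump(model, "/tmp/model.joblib")
    s3.upload_file("/tmp/model.joblib", BUCKET, "model-output/model.joblib")
```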
## 🌐 Deploying the Flask App
The Flask app is deployed as a Docker container and serves predictions from the trained model (a minimal sketch of the app follows these steps):
1. **Run the Flask App**: The app can be run manually using Docker or automatically as part of the Airflow pipeline.
2. **Access the Web Interface**: Navigate to http://localhost:5000 in the browser to access the prediction interface.
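A minimal sketch of what `flask_app/app.py` might look like (model loading and feature handling here are illustrative assumptions):

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, render_template, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # assumes the trained model was fetched from S3

@app.route("/")
def index():
    # Simple web form for entering customer attributes
    return render_template("index.html")

@app.route("/api/predict", methods=["POST"])
def predict():
    # Build a one-row DataFrame from the JSON payload and predict churn
    features = pd.DataFrame([request.get_json()])
    prediction = int(model.predict(features)[0])
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```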
## 🌐 Using the REST API
In addition to the web interface, the project exposes a REST API for making predictions programmatically. The API is available at the following endpoint:
**Endpoint**: `/api/predict`
**Method**: `POST`
**Content-Type**: `application/json`
**Request Body Example**:
```json
{
  "tenure": 12,
  "MonthlyCharges": 70.5,
  "TotalCharges": 800,
  "gender": "Male",
  "SeniorCitizen": "No",
  "Partner": "Yes",
  "Dependents": "No",
  "PhoneService": "Yes",
  "MultipleLines": "No",
  "InternetService": "Fiber optic",
  "OnlineSecurity": "No",
  "OnlineBackup": "Yes",
  "DeviceProtection": "Yes",
  "TechSupport": "No",
  "StreamingTV": "No",
  "StreamingMovies": "No",
  "Contract": "Month-to-month",
  "PaperlessBilling": "Yes",
  "PaymentMethod": "Electronic check"
}
```
**Response Example**:
```json
{
  "prediction": 1
}
```
A prediction of `1` indicates the customer is likely to churn; `0` indicates they are not.

**Accessing the API**:
You can access the API by making a POST request to `http://localhost:5000/api/predict` with the appropriate JSON payload.
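For example, using Python's `requests` library (assuming the service is running locally on port 5000):

```python
import requests

payload = {
    "tenure": 12,
    "MonthlyCharges": 70.5,
    "TotalCharges": 800,
    "gender": "Male",
    # ... remaining fields as in the request body example above
    "Contract": "Month-to-month",
    "PaperlessBilling": "Yes",
    "PaymentMethod": "Electronic check",
}

response = requests.post("http://localhost:5000/api/predict", json=payload)
print(response.json())  # e.g. {"prediction": 1}
```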
## 📊 Results & Evaluation
Once the model is trained, it is stored in the `model-output/` directory of the specified S3 bucket. The model's performance can be evaluated with standard scikit-learn classification metrics: accuracy, precision, recall, and F1-score, as sketched below.
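A minimal evaluation sketch (assuming the held-out split and fitted `model` from the training sketch above):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Predict on the held-out test split and report standard classification metrics
y_pred = model.predict(X_test)

print(f"Accuracy:  {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}")
print(f"Recall:    {recall_score(y_test, y_pred):.3f}")
print(f"F1-score:  {f1_score(y_test, y_pred):.3f}")
```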
## 💡 Future Work
* **Model Improvement**: Experiment with other models like Random Forest, Gradient Boosting, or even deep learning models to improve performance.
* **Hyperparameter Tuning**: Implement hyperparameter tuning using `GridSearchCV` or `RandomizedSearchCV` (see the sketch below).
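A small sketch of how such tuning might be wired up with scikit-learn's `GridSearchCV` (the parameter grid is an illustrative assumption; `model`, `X_train`, and `y_train` come from the training sketch above):

```python
from sklearn.model_selection import GridSearchCV

# Tune the regularization strength of the logistic regression step in the pipeline
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(model, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```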