Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kfrural/customer-churn-prediction
Customer churn prediction using machine learning. The project follows CRISP-DM and KDD methodologies, including data preprocessing, feature engineering, modeling, and evaluation. It also features an interactive dashboard for visualizing results.
https://github.com/kfrural/customer-churn-prediction
crisp-dm data jupyter kdd python
Last synced: 14 days ago
JSON representation
Customer churn prediction using machine learning. The project follows CRISP-DM and KDD methodologies, including data preprocessing, feature engineering, modeling, and evaluation. It also features an interactive dashboard for visualizing results.
- Host: GitHub
- URL: https://github.com/kfrural/customer-churn-prediction
- Owner: kfrural
- License: mit
- Created: 2024-10-23T22:03:23.000Z (22 days ago)
- Default Branch: main
- Last Pushed: 2024-10-24T11:35:57.000Z (21 days ago)
- Last Synced: 2024-10-25T13:35:53.582Z (20 days ago)
- Topics: crisp-dm, data, jupyter, kdd, python
- Language: Jupyter Notebook
- Homepage:
- Size: 59.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Customer Churn Prediction
This project aims to predict customer churn for a telecom company using machine learning techniques. By leveraging two methodologies, **CRISP-DM** and **KDD**, we ensure a structured and insightful approach to the data mining process. The project includes data preprocessing, feature engineering, modeling, and evaluation phases, and is supported by a deployable dashboard for presenting the results.
## Project Overview
- **Objective**: Predict whether a customer will churn (leave the company) based on historical customer data.
- **Dataset**: Telecom customer churn data containing demographic information, customer behavior, and churn status.
- **Methodologies**:
- **CRISP-DM**: Cross Industry Standard Process for Data Mining.
- **KDD**: Knowledge Discovery in Databases.## Directory Structure
```plaintext
/customer-churn-prediction
│
├── /data
│ ├── telecom_churn_data.csv # Dataset for customer churn
│
├── /notebooks
│ ├── 01_data_understanding.ipynb # Data exploration and understanding
│ ├── 02_data_preprocessing.ipynb # Data cleaning and preprocessing
│ ├── 03_feature_engineering.ipynb # Feature selection and engineering
│ ├── 04_modeling.ipynb # Model training and tuning
│ ├── 05_evaluation.ipynb # Model evaluation and validation
│
├── /src
│ ├── data_preprocessing.py # Data preprocessing script
│ ├── feature_engineering.py # Feature engineering script
│ ├── modeling.py # Modeling script
│ ├── evaluation.py # Model evaluation script
│
├── /dashboard
│ ├── app.py # Dashboard for result visualization
│
├── /docs
│ ├── CRISP_DM_overview.md # Overview of CRISP-DM methodology
│ ├── KDD_overview.md # Overview of KDD methodology
│ ├── Project_Plan.md # Project plan and methodology documentation
│
├── README.md # Project overview and instructions
├── requirements.txt # List of Python dependencies
└── LICENSE # License for the project
```## Workflow
### 1. **Data Understanding**
The initial phase involves exploring the data to understand key attributes and relationships. The dataset includes features such as:- **Customer Demographics**: Gender, SeniorCitizen, tenure.
- **Service Information**: MonthlyCharges, total service usage.
- **Churn Status**: Whether the customer left the company or not.Exploration results are documented in `01_data_understanding.ipynb`.
### 2. **Data Preprocessing**
Cleaning and preparing the data for modeling. This includes:- Handling missing values.
- Encoding categorical variables.
- Normalizing or scaling numeric data.
Details in `02_data_preprocessing.ipynb` and `data_preprocessing.py`.### 3. **Feature Engineering**
New features are derived to improve model performance:- Interaction features (e.g., tenure * monthly charges).
- Aggregations or groupings based on customer type.More details in `03_feature_engineering.ipynb` and `feature_engineering.py`.
### 4. **Modeling**
Several machine learning models are tested, including:- Logistic Regression
- Random Forest
- XGBoostModel training, cross-validation, and hyperparameter tuning are performed to identify the best model.
Check `04_modeling.ipynb` and `modeling.py` for implementation.
### 5. **Evaluation**
The best-performing models are evaluated using:- Accuracy, Precision, Recall, and F1-Score.
- Confusion Matrix.
- ROC and AUC metrics.Results are recorded in `05_evaluation.ipynb` and `evaluation.py`.
### 6. **Dashboard**
A simple interactive dashboard is developed using `Plotly Dash` to visualize the model's performance, important features, and predictions.Check the `dashboard/app.py` for the dashboard setup.
## Technologies Used
- **Python**: Core programming language.
- **Pandas**: Data manipulation and analysis.
- **Scikit-learn**: Machine learning models and utilities.
- **XGBoost**: Advanced gradient boosting models.
- **Plotly Dash**: Dashboard development.
- **Jupyter Notebooks**: For interactive data exploration and model building.## Setup Instructions
1. Clone the repository:
```bash
git clone https://github.com/your-username/customer-churn-prediction.git
cd customer-churn-prediction
```2. Create a virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```3. Download the dataset and place it in the `/data` folder (or use the example `telecom_churn_data.csv` provided).
4. Run the notebooks to explore the data and train the models:
```bash
jupyter notebook
```5. Run the dashboard for visualizing the results:
```bash
python dashboard/app.py
```## Results
- **Best Model**: XGBoost
- **Key Metrics**:
- Accuracy: 85%
- Precision: 82%
- Recall: 78%
- F1-Score: 80%
- **Feature Importance**: Tenure, Monthly Charges, Contract Type were found to be the most predictive features.## Future Work
- Improve model interpretability using SHAP or LIME.
- Extend dashboard for real-time monitoring of churn predictions.
- Experiment with deep learning models for further improvements.