https://github.com/jatin-mehra119/churn_modeling

This repository is dedicated to predicting customer churn using machine learning techniques. It includes comprehensive scripts for data preprocessing, model training, and evaluation, along with detailed visualizations and insights.
https://github.com/jatin-mehra119/churn_modeling

classification-model datavisualization pandas scikit-learn

Last synced: about 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/jatin-mehra119/churn_modeling
Owner: Jatin-Mehra119
Created: 2024-06-21T17:54:35.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-06-26T16:18:48.000Z (about 2 years ago)
Last Synced: 2025-03-10T18:54:42.167Z (over 1 year ago)
Topics: classification-model, datavisualization, pandas, scikit-learn
Language: Python
Homepage:
Size: 35.1 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Churn Modeling

This repository contains scripts for building and evaluating a model to predict customer churn. Customer churn refers to the loss of clients or customers. The project details the steps involved in data preprocessing, exploratory data analysis (EDA), feature engineering, model building, and evaluation.

# Notebook Link-
https://drive.google.com/file/d/1MrB7uxLFmWTElEld3xBp4pO1tReAGHr-/view?usp=sharing

## Table of Contents

1. [Installation](#installation)
2. [Data Description](#data-description)
3. [Script Overview](#script-overview)
4. [Modeling Process](#modeling-process)
5. [Results](#results)
6. [Conclusion](#conclusion)
7. [References](#references)

## Installation

To run the scripts locally, ensure you have the following dependencies installed:

- Python 3.x
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn

You can install the required packages using the following command:

pip install pandas numpy scikit-learn matplotlib seaborn

## Data Description

The dataset contains customer-related information. Features include:

- **CustomerID**: Unique identifier for each customer.
- **Geography**: Customer's geographical location.
- **Gender**: Customer's gender.
- **Age**: Customer's age.
- **Tenure**: Number of years the customer has been with the company.
- **Balance**: Account balance of the customer.
- **ProductsNumber**: Number of products the customer has with the company.
- **HasCrCard**: Whether the customer has a credit card.
- **IsActiveMember**: Whether the customer is an active member.
- **EstimatedSalary**: Estimated salary of the customer.
- **Exited**: Whether the customer has churned (target variable).

## Script Overview

The project consists of the following scripts:

1. **LoadData.py**:
- Script for loading and preparing the dataset.
2. **Preprocessor.py**:
- Script for preprocessing the data.
3. **Trainer.py**:
- Script for training various machine learning models.
4. **Main.py**:
- Main script that orchestrates the workflow, including loading data, preprocessing, training models, and making predictions.

## Modeling Process

The modeling process includes:

- **Data Splitting**: Dividing the data into training and testing sets to evaluate model performance.
- **Model Training**: Training various machine learning algorithms to find the best model.
- **Hyperparameter Tuning**: Using techniques like Random Search to optimize model parameters.
- **Model Evaluation**: Assessing models using cross-validation and performance metrics.

## Results

### Logistic Regression

- **Accuracy**: 0.80
- **Precision (Positive Class)**: 0.58
- **Recall (Positive Class)**: 0.18
- **F1-Score (Positive Class)**: 0.28
- **ROC-AUC**: 0.77

### Random Forest Classifier

- **Accuracy**: 0.85
- **Precision (Positive Class)**: 0.72
- **Recall (Positive Class)**: 0.47
- **F1-Score (Positive Class)**: 0.57
- **ROC-AUC**: 0.85

### Gradient Boosting Classifier

- **Accuracy**: 0.87
- **Precision (Positive Class)**: 0.78
- **Recall (Positive Class)**: 0.49
- **F1-Score (Positive Class)**: 0.60
- **ROC-AUC**: 0.87

### AdaBoost Classifier

- **Accuracy**: 0.85
- **Precision (Positive Class)**: 0.71
- **Recall (Positive Class)**: 0.47
- **F1-Score (Positive Class)**: 0.57
- **ROC-AUC**: 0.85

### Support Vector Machine (SVM)

- **Accuracy**: 0.85
- **Precision (Positive Class)**: 0.79
- **Recall (Positive Class)**: 0.42
- **F1-Score (Positive Class)**: 0.55
- **ROC-AUC**: 0.83

### K-Nearest Neighbors Classifier

- **Accuracy**: 0.82
- **Precision (Positive Class)**: 0.60
- **Recall (Positive Class)**: 0.44
- **F1-Score (Positive Class)**: 0.51
- **ROC-AUC**: 0.78

### Extra Trees Classifier

- **Accuracy**: 0.86
- **Precision (Positive Class)**: 0.75
- **Recall (Positive Class)**: 0.46
- **F1-Score (Positive Class)**: 0.57
- **ROC-AUC**: 0.85

### MLP Classifier (Neural Networks)

- **Accuracy**: 0.86
- **Precision (Positive Class)**: 0.78
- **Recall (Positive Class)**: 0.45
- **F1-Score (Positive Class)**: 0.58
- **ROC-AUC**: 0.86

## Conclusion

The **Gradient Boosting Classifier** is the most promising model with 87% accuracy. Random Forest, AdaBoost, SVM, Extra Trees, and MLP (neural network) also perform well, scoring around 85% to 86% accuracy. Logistic Regression and K-Nearest Neighbors Classifier score around 80% to 82% accuracy.

## References

- [Scikit-learn Documentation](https://scikit-learn.org/stable/supervised_learning.html)
- [Seaborn Documentation](https://seaborn.pydata.org/tutorial.html)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jatin-mehra119/churn_modeling

Awesome Lists containing this project

README