https://github.com/stephen-adwini-badu/10.-bank-churn-project



Host: GitHub
URL: https://github.com/stephen-adwini-badu/10.-bank-churn-project
Owner: Stephen-Adwini-Badu
Created: 2025-01-15T19:21:56.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2025-01-15T19:24:00.000Z (about 1 month ago)
Last Synced: 2025-01-15T21:44:26.031Z (about 1 month ago)
Topics: churn-prediction, data-science, jupyter-notebook, machine-learning, predictive-modeling
Language: Jupyter Notebook
Size: 6.87 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Bank Churn Prediction Project

## Project Objective
This project aims to predict customer churn in a banking environment using machine learning models. The goal is to identify patterns in customer behavior that lead to churn and use predictive analytics to classify customers as churners or non-churners.

## Dataset Description
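The project uses two datasets:
- **Training Dataset:** Contains labeled data for model training.
- **Testing Dataset:** Unlabeled data on which the trained model generates predictions. Because it has no labels, performance is measured on a validation split of the training data.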

## Methodology
### 1. Data Exploration and Preprocessing
- **Data Loading:** Importing datasets for analysis.
- **Data Cleaning:** Handling missing values and duplicate records.
- **Feature Transformation:** Encoding categorical features for machine learning.
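
The cleaning and encoding steps above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the column names (`Age`, `Geography`, `Exited`), the median/mode imputation, and one-hot encoding are all assumptions chosen for illustration.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, categorical_cols: list[str]) -> pd.DataFrame:
    """Drop duplicates, fill missing values, and one-hot encode categoricals."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if col in categorical_cols:
            # Assumption: fill missing categories with the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
        else:
            # Assumption: fill missing numeric values with the column median.
            out[col] = out[col].fillna(out[col].median())
    # One-hot encode the categorical columns so models receive numeric input.
    return pd.get_dummies(out, columns=categorical_cols)


# Hypothetical toy data; the real dataset's columns may differ.
raw = pd.DataFrame({
    "Age": [42, 35, None, 42],
    "Geography": ["France", "Spain", None, "France"],
    "Exited": [1, 0, 0, 1],
})
clean = preprocess(raw, categorical_cols=["Geography"])
print(clean)
```

The last row of `raw` duplicates the first, so it is dropped before imputation and encoding.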

### 2. Exploratory Data Analysis (EDA)
- **Statistical Analysis:** Summary statistics to understand feature distributions.
- **Visualizations:** Using Seaborn and Matplotlib to explore trends and relationships in the data.
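
As a rough illustration of the statistical part of this step, the sketch below computes summary statistics and a per-group churn rate with pandas. The data and the column names (`Age`, `IsActiveMember`, `Exited`) are hypothetical; the same grouped values could be passed to a Seaborn or Matplotlib bar plot.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's columns may differ.
df = pd.DataFrame({
    "Age": [25, 47, 52, 31, 60, 38],
    "IsActiveMember": [1, 0, 0, 1, 0, 1],
    "Exited": [0, 1, 1, 0, 1, 0],
})

# Summary statistics (count, mean, std, quartiles) for every numeric feature.
summary = df.describe()

# Churn rate per group: the mean of a 0/1 target is the share of churners.
churn_by_activity = df.groupby("IsActiveMember")["Exited"].mean()

print(summary.loc[["mean", "std"]])
print(churn_by_activity)
```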

### 3. Feature Engineering
- Selection of relevant features to improve model accuracy.
- Transformation of categorical variables using encoding techniques.

### 4. Model Training and Evaluation
- Splitting data into training and validation subsets.
- Training various models, including:
  - Random Forest Classifier
  - Gradient Boosting Classifier
  - XGBoost Classifier
- Evaluation metrics include:
  - Accuracy
  - Precision, Recall, F1-Score
  - ROC-AUC Score
  - Confusion Matrix Visualization
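
The split, train, and evaluate loop described above might be sketched as follows with scikit-learn. The data is synthetic (`make_classification`), and the 80/20 stratified split and hyperparameters are assumptions, not values taken from the notebook. XGBoost's `XGBClassifier` exposes the same `fit`/`predict`/`predict_proba` interface and could be added to the `models` dictionary if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed churn features and labels.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Assumption: hold out 20% for validation, keeping the class ratio (stratify).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)               # hard class labels
    proba = model.predict_proba(X_val)[:, 1]  # probability of the positive class
    scores[name] = {
        "accuracy": accuracy_score(y_val, pred),
        "f1": f1_score(y_val, pred),
        "roc_auc": roc_auc_score(y_val, proba),
    }

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```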

## Machine Learning Models
- **Random Forest Classifier:** Ensemble method that builds multiple decision trees.
- **Gradient Boosting Classifier:** Boosting technique for sequential model improvement.
- **XGBoost Classifier:** Optimized gradient boosting implementation.

## Results and Metrics
- **Model Performance:** Metrics for evaluating the effectiveness of predictions, including confusion matrices.

![Image](https://github.com/user-attachments/assets/ca74cb4c-7afe-4a05-a012-b5fe5ab0da26)
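
To show how the reported metrics relate to a confusion matrix, here is a small dependency-free sketch. The labels and predictions are made up for illustration and are not the project's results; `1` is assumed to mean "churned".

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 means churn."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


def precision_recall_f1(tn, fp, fn, tp):
    """Derive precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(*counts)
accuracy = (counts[0] + counts[3]) / len(y_true)
print(counts, precision, recall, f1, accuracy)
```

Precision answers "of the customers flagged as churners, how many actually churned", while recall answers "of the customers who churned, how many were flagged".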

- **Insights:** The key factors contributing to customer churn are **Age**, **Active Membership**, and **Number of Products**.

![Image](https://github.com/user-attachments/assets/9506dfc1-3f45-4770-b3f1-169628f452bf)
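
The README does not state how the key factors were identified. One common approach, shown here purely as an assumption, is to rank the impurity-based feature importances of a fitted tree ensemble. The data below is synthetic: churn is constructed to depend only on a hypothetical `Age` column, so that column should rank first.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Hypothetical features: one informative column and one pure-noise column.
age = rng.integers(18, 80, size=n)
noise = rng.normal(size=n)

# Synthetic target: churn depends only on age in this toy setup.
churn = (age > 50).astype(int)

X = np.column_stack([age, noise])
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, churn)

# Impurity-based importances; they are normalized to sum to 1.
importances = dict(zip(["Age", "Noise"], model.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)
print(ranked, importances)
```

Impurity-based importances can favor high-cardinality features, so permutation importance (`sklearn.inspection.permutation_importance`) is a useful cross-check.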