https://github.com/alessandrobasigli/telco-customer-churn-prediction-ibm-dataset
This project explores customer churn trends for a company in California using an IBM dataset. Built in a Jupyter Notebook, it employs pandas, NumPy, matplotlib, seaborn, plotly, and scipy to clean, analyze, and visualize data. Predictive models were trained with scikit-learn using three algorithms: Decision Tree, Naive Bayes, and Random Forest.
- Host: GitHub
- URL: https://github.com/alessandrobasigli/telco-customer-churn-prediction-ibm-dataset
- Owner: alessandrobasigli
- License: mit
- Created: 2025-04-10T12:09:29.000Z (26 days ago)
- Default Branch: main
- Last Pushed: 2025-04-10T22:35:06.000Z (25 days ago)
- Last Synced: 2025-04-10T22:48:43.626Z (25 days ago)
- Topics: churn-prediction-models, customer-churn-prediction, decision-tree, ibm-dataset, jupyter-notebook, matplotlib, naive-bayes, numpy, pandas, plotly, predictive-modeling, random-forest, scipy, seaborn
- Language: Jupyter Notebook
- Size: 3.09 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 📊 Telco Customer Churn Prediction 📊
Welcome to the **Telco Customer Churn Prediction** project! This repository explores customer churn trends for a telecommunications company in California using an IBM dataset. The project leverages data analysis and machine learning techniques to predict customer churn effectively.
## Table of Contents
- [Project Overview](#project-overview)
- [Dataset](#dataset)
- [Technologies Used](#technologies-used)
- [Getting Started](#getting-started)
- [Analysis and Visualization](#analysis-and-visualization)
- [Predictive Modeling](#predictive-modeling)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
## Project Overview
Customer churn refers to the loss of clients or customers. Understanding the reasons behind churn can help companies develop strategies to retain customers. This project focuses on analyzing customer data to identify patterns and predict churn.
The project is built using a Jupyter Notebook, making it easy to follow along with the analysis. It includes data cleaning, analysis, and visualization steps to provide insights into customer behavior. Additionally, it employs various machine learning algorithms to build predictive models.
## Dataset
The dataset used in this project is sourced from IBM. It contains information about customers, including demographics, account information, and service usage. The dataset is rich in features that help in understanding customer behavior and predicting churn.
You can download the dataset from the IBM website or access it through the repository.
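As a rough sketch of a first look at the data, the snippet below loads a CSV with pandas and checks its shape, column types, and churn split. The inline sample and the column names (`CustomerID`, `Gender`, `Tenure Months`, `Monthly Charges`, `Churn Label`) are assumptions made for illustration; the actual file name and schema in the repository may differ.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the real file; column names are assumed.
SAMPLE_CSV = """CustomerID,Gender,Tenure Months,Monthly Charges,Churn Label
0001-AAAA,Female,2,53.85,Yes
0002-BBBB,Male,34,56.95,No
0003-CCCC,Male,45,42.30,No
0004-DDDD,Female,8,99.65,Yes
"""

# With the real dataset, pass the path of the CSV file to pd.read_csv instead.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # number of rows and columns
print(df.dtypes)  # helps spot columns parsed with the wrong type
print(df["Churn Label"].value_counts(normalize=True))  # share of each churn label
```

Checking `dtypes` early is useful because numeric columns stored as text are a common reason for later cleaning steps.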
## Technologies Used
This project utilizes the following technologies:
- **Python**: The primary programming language for data analysis and modeling.
- **Jupyter Notebook**: For interactive data exploration and visualization.
- **Pandas**: For data manipulation and analysis.
- **NumPy**: For numerical computations.
- **Matplotlib**: For static data visualization.
- **Seaborn**: For enhanced data visualization.
- **Plotly**: For interactive visualizations.
- **SciPy**: For scientific and technical computing.
- **Scikit-learn**: For machine learning algorithms.
## Getting Started
To get started with this project, follow these steps:
1. **Clone the Repository**:
```bash
git clone https://github.com/alessandrobasigli/Telco-Customer-Churn-Prediction-IBM-Dataset.git
```
2. **Navigate to the Project Directory**:
```bash
cd Telco-Customer-Churn-Prediction-IBM-Dataset
```
3. **Install Required Packages**:
Make sure you have Python installed. Then, install the required packages using pip:
```bash
pip install -r requirements.txt
```
4. **Open the Jupyter Notebook**:
Launch Jupyter Notebook:
```bash
jupyter notebook
```
5. **Run the Notebook**:
Open the notebook file and run the cells to explore the analysis and models.
## Analysis and Visualization
The analysis section includes data cleaning, exploration, and visualization. Key steps include:
- **Data Cleaning**: Handling missing values, correcting data types, and removing duplicates.
- **Exploratory Data Analysis (EDA)**: Analyzing customer demographics, account information, and service usage to identify trends.
- **Visualizations**: Creating plots to illustrate customer behavior, churn rates, and feature importance.
### Sample Visualizations
The notebook includes visualizations such as:
- **Churn Rate by Gender**
- **Service Usage Patterns**
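The cleaning steps and the churn-rate-by-gender view described in this section could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are made up, the column names (`CustomerID`, `Gender`, `Total Charges`, `Churn Label`) and the helper `clean` are hypothetical, and median imputation is just one possible way to handle missing values.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
raw = pd.DataFrame(
    {
        "CustomerID": ["A1", "A2", "A2", "A3", "A4", "A5"],
        "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
        "Total Charges": ["108.15", "1889.50", "1889.50", " ", "820.50", "3046.05"],
        "Churn Label": ["Yes", "No", "No", "No", "Yes", "No"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, fix dtypes, and fill missing numeric values."""
    out = df.drop_duplicates().copy()
    # Blank strings cannot be parsed as numbers; coerce them to NaN.
    out["Total Charges"] = pd.to_numeric(out["Total Charges"], errors="coerce")
    # Median imputation is one simple choice among several.
    out["Total Charges"] = out["Total Charges"].fillna(out["Total Charges"].median())
    # Encode the target as 0/1 so that its mean equals the churn rate.
    out["Churned"] = (out["Churn Label"] == "Yes").astype(int)
    return out


cleaned = clean(raw)

# Churn rate per gender: the mean of the 0/1 target within each group.
churn_by_gender = cleaned.groupby("Gender")["Churned"].mean()
print(churn_by_gender)
```

The resulting series can be passed to a bar chart in matplotlib, seaborn, or plotly to produce a churn-rate-by-gender plot.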
## Predictive Modeling
The project implements three main algorithms to predict customer churn:
1. **Decision Tree**: A simple yet effective model that splits data based on feature values.
2. **Naive Bayes**: A probabilistic model that assumes features are conditionally independent given the class.
3. **Random Forest**: An ensemble method that combines multiple decision trees for better accuracy.
### Model Training
Each model is trained using a training dataset, and performance is evaluated using metrics such as accuracy, precision, and recall. The results help in understanding which model performs best for this specific dataset.
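A minimal sketch of this training and evaluation loop is shown below. It runs on synthetic data from `make_classification` rather than the churn dataset, and the 80/20 split, the hyperparameters, and the choice of `GaussianNB` as the Naive Bayes variant are assumptions for illustration, not settings confirmed by the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded churn features (imbalanced on purpose).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, weights=[0.73], random_state=42
)

# Hold out 20% of the rows for evaluation, keeping the class ratio the same.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Because churners are usually the minority class, precision and recall are worth reading alongside accuracy: a model that always predicts "no churn" can score high accuracy while catching no churners at all.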
## Results
After training and evaluating the models, the results show that accuracy varies across the three algorithms. Random Forest typically performs best: averaging the predictions of many decision trees reduces variance, so it captures complex patterns in the data while overfitting less than a single tree.
The results can be visualized through various plots to showcase model performance and feature importance.
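As one way to obtain the feature importance mentioned above, a fitted scikit-learn Random Forest exposes impurity-based scores through its `feature_importances_` attribute. The sketch below uses synthetic data and placeholder feature names (`feature_0`, `feature_1`, ...), so the ranking it prints says nothing about the real churn features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names (assumptions for illustration).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances; scikit-learn normalizes them to sum to 1.
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

The sorted series maps directly onto a horizontal bar chart. Impurity-based importances can favor high-cardinality features, so permutation importance is a common cross-check.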
### Sample Results Visualization
- **Model Accuracy Comparison**
## Contributing
Contributions are welcome! If you have suggestions or improvements, please fork the repository and submit a pull request.
1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Make your changes.
4. Commit your changes (`git commit -m 'Add new feature'`).
5. Push to the branch (`git push origin feature-branch`).
6. Open a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
## Contact
For questions or feedback, please reach out:
- **Name**: Alessandro Basigli
- **Email**: [email protected]
- **GitHub**: [alessandrobasigli](https://github.com/alessandrobasigli)

For more information, visit the [Releases](https://github.com/alessandrobasigli/Telco-Customer-Churn-Prediction-IBM-Dataset/releases) section for updates and downloadable content.
Thank you for checking out the **Telco Customer Churn Prediction** project! We hope you find it informative and useful in understanding customer churn trends.