https://github.com/laizaparizotto/churn-prediction-kedro


Host: GitHub
URL: https://github.com/laizaparizotto/churn-prediction-kedro
Owner: laizaparizotto
Created: 2023-06-19T03:32:12.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-07-07T10:10:00.000Z (almost 2 years ago)
Last Synced: 2024-08-01T10:19:12.928Z (10 months ago)
Language: Jupyter Notebook
Size: 5.16 MB
Stars: 8
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-kedro - Churn Prediction with Kedro - ready machine learning model. (Example projects)

README

# Churn Prediction with Kedro Framework

This is a Kedro repository that tackles a data science challenge of **predicting customer churn** for a fictional financial institution. The goal is to build an effective pipeline for a production-ready Machine Learning model to forecast customer churn accurately.

To approach this problem, I first carried out exploratory data analysis (EDA), feature engineering, and model training and evaluation in Jupyter Notebooks. The notebooks are located in `churn-prediction-kedro/churn-prediction/notebooks/`. Feel free to browse them and check the reasoning behind the solution before running the pipeline. :)

[Exploratory Data Analysis](churn-prediction/notebooks/EDA.ipynb)

[Feature Engineering](churn-prediction/notebooks/feature_engineering.ipynb)

[Model Training and Evaluation](churn-prediction/notebooks/model_training.ipynb)

### Data Understanding:
- The first dataset, `Abandono_clientes`, contains 10,000 rows and 13 columns, including the binary target column `Exited` (1 if the customer has churned, 0 if not).
- The second dataset, `Abandono_teste`, contains 1,000 rows and 12 columns; it omits the `Exited` column.

### Key Concepts:
**Customer Churn:** Churn refers to the phenomenon of customers discontinuing their relationship with a company or service. In this context, it represents customers who have abandoned the financial institution.

**Features:** The dataset contains various features or attributes that provide information about the customers. Features include `Row Number`, `Customer Id`, `Surname`, `Credit Score`, `Geography`, `Gender`, `Age`, `Tenure` _(duration of the customer's relationship with the bank)_, `Balance`, `Number of Products Held`, `Has a Credit Card`, `Is Active Member` and `Estimated salary`.

**Exited:** The target variable `Exited` indicates whether a customer has churned (1) or not (0).

**Performance Metrics:** The model is evaluated with accuracy, precision, recall, F1-score, and the area under the ROC curve (ROC AUC). Together, these metrics gauge the model's overall predictive capability and its ability to correctly identify customers who are likely to churn.
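The metrics above can be sketched from their definitions. This is a minimal, standard-library-only illustration, not the code used in the notebooks (which may rely on a library such as scikit-learn); the function names `confusion_counts`, `classification_metrics`, and `roc_auc`, as well as the toy labels and scores, are assumptions made for this example.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating 1 (churned) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random churner is scored above a random non-churner.

    Ties count as half a win. This pairwise form is quadratic in the number
    of samples, so it is only meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): true labels and predicted churn scores.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.3, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 threshold is an assumption

print(classification_metrics(y_true, y_pred))
print("roc_auc:", roc_auc(y_true, y_score))
```

Precision and recall matter more than accuracy when churners are a minority of customers, because a model that predicts "no churn" for everyone can still score a high accuracy.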

## Getting started
Please note that this project was developed with Python 3.10.6 on Ubuntu.

**Clone the repository**

To clone the repository and set up the development environment, follow the steps below:

1. Clone the repository using the command:
```
git clone https://github.com/laizaparizotto/churn-prediction-kedro.git
```

2. Change to the cloned repository directory:
```
cd churn-prediction-kedro
```

3. Create a virtual environment using `venv`:
```
python -m venv .venv
```

4. Activate the virtual environment:
- For Windows:
```
.venv\Scripts\activate
```
- For macOS and Linux:
```
source .venv/bin/activate
```

Now you have successfully cloned the repository and set up the virtual environment. You can proceed with the next steps as described in the project documentation.
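To confirm that the virtual environment is active before installing anything, a quick check along these lines can help. It is a small sketch using only the standard library; the helper names `in_virtual_environment` and `python_version_string` are made up for this example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_version_string() -> str:
    """Return the running interpreter version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


print("virtual environment active:", in_virtual_environment())
print("python version:", python_version_string())
```

Run it with the `python` from the activated environment; if the first line prints `False`, the activation step did not take effect in the current shell.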

**Install Kedro**

To install Kedro, move into the project directory and run the commands below. For more information, see the [Kedro Installation Documentation](https://docs.kedro.org/en/stable/get_started/install.html).

```
cd churn-prediction/
pip install kedro
```

**Install dependencies**

All necessary dependencies are located in `src/requirements.txt`.

To install them, run:

```
pip install -r src/requirements.txt
```

## How to run the pipeline

You can run the Kedro project with:

```
kedro run
```

This runs the full pipeline: data loading, preprocessing, training and evaluating a `RandomForestClassifier`, and finally predicting on the test set.

**Final results will be stored at `/churn-prediction/data/07_model_output/resultado_teste.csv`**
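After the run, a quick sanity check on the output file might look like the sketch below. It assumes the CSV has a header row and one row per test customer (the test set has 1,000 rows), and that the path is relative to the repository root; the column layout of the file is not stated here, so the check only counts rows. The helper name `count_data_rows` is made up for this example.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path: Path) -> int:
    """Count the rows that follow the header line of a CSV file."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to exist)
        return sum(1 for _ in reader)


# Output path from the README, interpreted relative to the repository root (assumption).
RESULTS = Path("churn-prediction/data/07_model_output/resultado_teste.csv")

if RESULTS.exists():
    print(f"{RESULTS} holds {count_data_rows(RESULTS)} prediction rows")
else:
    print(f"{RESULTS} not found; run `kedro run` first")
```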

## Interactive Visualization

You can access the interactive visualization with:

```
kedro viz
```

![The final pipeline can be seen below:](churn-prediction/docs/pipeline.png)
