Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/shubhamsoni98/prediction-with-binomial-logistic-regression

To predict client subscription to term deposits and optimize marketing strategies by identifying potential subscribers.

binomial data data-science eda machine-learning matplotlib pipeline python scikit-learn seaborn sklearn sql visualization

Host: GitHub
URL: https://github.com/shubhamsoni98/prediction-with-binomial-logistic-regression
Owner: shubhamsoni98
Created: 2024-09-17T06:45:17.000Z (5 months ago)
Default Branch: main
Last Pushed: 2024-09-17T07:53:15.000Z (5 months ago)
Last Synced: 2024-11-21T16:14:49.881Z (3 months ago)
Topics: binomial, data, data-science, eda, machine-learning, matplotlib, pipeline, python, scikit-learn, seaborn, sklearn, sql, visualization
Language: Jupyter Notebook
Homepage:
Size: 11.2 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Predicting Term Deposit Subscription

## Project Overview

This project predicts whether a bank client will subscribe to a term deposit, using a logistic regression model trained on client and campaign data. The predictions are intended to help target marketing campaigns at likely subscribers.

## Objective

- **Predict Client Subscription**: Identify whether a client is likely to subscribe to a term deposit, based on the demographic, financial, and campaign-contact features listed below.

- **Optimize Marketing**: Improve the targeting of marketing efforts to reduce costs and increase campaign efficiency.

## Dataset

The dataset used for this project is `bank-full.csv`, which contains information about bank clients and their interactions with marketing campaigns. The features include:

- `age`: Age of the client

- `job`: Type of job

- `marital`: Marital status

- `education`: Level of education

- `default`: Whether the client has credit in default

- `balance`: Account balance

- `housing`: Whether the client has a housing loan

- `loan`: Whether the client has a personal loan

- `contact`: Type of communication used to contact the client

- `day`: Last contact day of the month

- `month`: Last contact month of the year

- `duration`: Duration of the last contact, in seconds

- `campaign`: Number of contacts performed during this campaign

- `pdays`: Number of days since the client was last contacted in a previous campaign (`-1` means the client was not previously contacted)

- `previous`: Number of contacts performed before this campaign

- `poutcome`: Outcome of the previous marketing campaign

- `y`: Whether the client subscribed to a term deposit (target variable)
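
As a quick sanity check on the column layout above, the following minimal sketch parses semicolon-delimited text and confirms that every listed column is present. The two rows are made up for illustration and are not records from `bank-full.csv`; the name `EXPECTED_COLUMNS` is likewise hypothetical.

```python
import io

import pandas as pd

# Column names as listed in the Dataset section.
EXPECTED_COLUMNS = [
    "age", "job", "marital", "education", "default", "balance", "housing",
    "loan", "contact", "day", "month", "duration", "campaign", "pdays",
    "previous", "poutcome", "y",
]

# Two made-up rows in the same semicolon-delimited layout; not real records.
sample = io.StringIO(
    "age;job;marital;education;default;balance;housing;loan;contact;day;"
    "month;duration;campaign;pdays;previous;poutcome;y\n"
    "41;technician;married;secondary;no;1500;yes;no;cellular;5;may;180;1;-1;0;unknown;no\n"
    "29;student;single;tertiary;no;300;no;no;cellular;12;aug;420;2;90;1;success;yes\n"
)

# For the real file, pass the path instead: pd.read_csv("bank-full.csv", sep=";")
bank = pd.read_csv(sample, sep=";")

# Fail early if any expected column is absent.
missing = set(EXPECTED_COLUMNS) - set(bank.columns)
assert not missing, f"missing columns: {missing}"

# Share of each target class; on real data this shows how balanced `y` is.
print(bank["y"].value_counts(normalize=True))
```

Running the same check against the real file before modeling makes a wrong delimiter or a renamed column fail loudly instead of surfacing later inside the pipeline.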

## Solution

### Data Exploration

- **Initial Exploration**: Analyzed feature distributions and relationships.

- **Data Cleaning**: Handled missing values and outliers.
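
The README does not say how missing values and outliers were handled, so the following is only a sketch of one common way to detect them. It assumes that missing categorical values are recorded as the string `unknown` (as in the public UCI bank marketing data) and uses the 1.5 × IQR rule for outliers. The helper names `missing_summary` and `iqr_outlier_mask` are hypothetical, and the rows are made up.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing cells per column: NaN plus the literal string 'unknown'."""
    nan_counts = df.isna().sum()
    unknown_counts = df.isin(["unknown"]).sum()
    return nan_counts + unknown_counts


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values lying more than k * IQR below Q1 or above Q3."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up rows for illustration only; not taken from bank-full.csv.
toy = pd.DataFrame({
    "age": [30, 35, 41, 38, 33, 95],
    "job": ["admin.", "unknown", "technician", "services", "unknown", "retired"],
    "balance": [1200, 300, None, 450, 800, 500],
})

print(missing_summary(toy))
print(toy.loc[iqr_outlier_mask(toy["age"]), "age"])
```

Whether flagged rows should be dropped, capped, or kept depends on the feature; the sketch only identifies them.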

### Preprocessing

- **Feature Engineering**: One-hot encoded categorical variables and min-max scaled numerical features.

- **Train-Test Split**: Divided the dataset into training and testing sets.

### Modeling

- **Algorithm**: Trained a scikit-learn `LogisticRegression` classifier inside a `Pipeline` that also applies the preprocessing steps.

- **Evaluation**: Assessed model performance with accuracy, precision, recall, and a confusion matrix.
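
The relationship between these metrics and the confusion matrix can be sketched by hand. The labels below are made up for illustration and are not results from this project.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only (1 = subscribed, 0 = did not subscribe).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# For binary labels [0, 1], ravel() yields the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)  # of predicted subscribers, how many really subscribed
recall = tp / (tp + fn)     # of real subscribers, how many the model found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.70 precision=0.67 recall=0.50
```

When subscribers are a minority of the clients, accuracy alone can look good even if recall is poor, which is why the three metrics are read together.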

### Code

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, OneHotEncoder

# Load data (the file is semicolon-delimited)
bank = pd.read_csv('bank-full.csv', delimiter=';')

# Separate features and target; 'contact' and 'day' are excluded from the features
X = bank.drop(columns=['contact', 'day', 'y'])
y = bank['y']

# Encode the target labels as integers (no -> 0, yes -> 1)
le = LabelEncoder()
y = le.fit_transform(y)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the listed numeric columns to [0, 1] and one-hot encode the categorical ones.
# Columns not listed here ('duration') pass through unchanged.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', MinMaxScaler(), ['age', 'balance', 'campaign', 'pdays', 'previous']),
        ('cat', OneHotEncoder(drop='first'), ['job', 'marital', 'education', 'default', 'housing', 'loan', 'month', 'poutcome'])
    ],
    remainder='passthrough'
)

# Chain preprocessing and the classifier so both are fitted on the training data only
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
])

# Train the model
model.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)

# Evaluation: accuracy, per-class precision/recall, and a confusion matrix heatmap
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print(classification_report(y_test, y_pred))

conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d')
plt.title("Confusion Matrix")
plt.show()
```