Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/gpsyrou/binary_classification_of_bank_marketing_campaigns

Exploratory data analysis (EDA) and development of classification algorithms (Logistic Regression, Random Forest) to predict clients that are most likely to subscribe to a bank's product, as a result of marketing campaigns.

classification eda logistic-regression python random-forest

Host: GitHub
URL: https://github.com/gpsyrou/binary_classification_of_bank_marketing_campaigns
Owner: gpsyrou
Created: 2020-08-08T09:00:46.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2024-11-30T11:32:06.000Z (2 months ago)
Last Synced: 2024-11-30T12:24:49.720Z (2 months ago)
Topics: classification, eda, logistic-regression, python, random-forest
Language: Jupyter Notebook
Homepage:
Size: 5.94 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md

README

# Binary Classification of Direct Marketing Campaign Subscriptions: A Logistic Regression & Random Forest Approach

## Project Description

The purpose of this project is to analyze a dataset of marketing campaigns that a Portuguese banking institution conducted via phone calls to its clients. The main goal of these campaigns was to persuade clients to subscribe to a specific financial product of the bank (a term deposit). After each call, the client had to tell the institution whether they intended to subscribe to the product (a successful campaign) or not (an unsuccessful campaign).

Our main task in this project is to build machine learning models that predict the probability of a client subscribing to the bank's product. Note that, even though the models calculate probabilities, we will build classification algorithms: the final output of each model is a binary result indicating whether the client subscribed to the product ('yes') or not ('no').
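The step from a probability to the final 'yes'/'no' label comes down to a decision threshold. The sketch below assumes a logistic model whose linear score for each call has already been computed; the scores and the 0.3 threshold are hypothetical illustration values, not outputs of the project's models.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(score: float, threshold: float = 0.5) -> str:
    """Convert a model score into the binary 'yes'/'no' output via a cut-off."""
    probability = sigmoid(score)
    return "yes" if probability >= threshold else "no"


# Hypothetical linear scores for three calls (illustration only).
scores = [-2.0, -0.5, 1.5]

probabilities = [round(sigmoid(s), 3) for s in scores]
labels = [predict_label(s) for s in scores]                       # default 0.5 cut-off
lenient_labels = [predict_label(s, threshold=0.3) for s in scores]  # lower cut-off

print(probabilities)   # [0.119, 0.378, 0.818]
print(labels)          # ['no', 'no', 'yes']
print(lenient_labels)  # ['no', 'yes', 'yes']
```

A 0.5 cut-off is the usual default. If subscribers are the minority class, lowering the threshold flags more potential subscribers at the cost of more false positives, so the threshold is a business choice as much as a modelling one.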

The dataset has 41,188 rows (individual calls to clients) and 21 columns (variables) describing various aspects of each call. Some clients were contacted multiple times; in practice this does not affect our analysis, because each call is treated as independent of the others, even when the client is the same.

Useful Links:
1. https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74
2. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
3. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
4. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score
5. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
6. https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/
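The scikit-learn pages listed above (`GridSearchCV`, `RandomForestClassifier`, `roc_auc_score`) fit together into a tuning loop. This is a minimal sketch, assuming synthetic data from `make_classification` in place of the bank dataset; the sample size, class weights and parameter grid are illustrative choices, not the project's settings. If the real columns include categorical fields, they would need to be encoded before this step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the call data: 20 features, roughly 10% positive class.
# These sizes and weights are assumptions for illustration only.
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=8,
    weights=[0.9, 0.1],
    random_state=42,
)

# Stratify so both splits keep the same share of positive ('yes') cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Small hypothetical grid; a real search would usually cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    # ROC AUC is threshold-independent; accuracy can look high on
    # imbalanced data just by predicting the majority class.
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)  # refits the best combination on all training data

# Evaluate the refitted best model on held-out data using P(class 1).
test_proba = search.best_estimator_.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_proba)

print("Best parameters:", search.best_params_)
print(f"CV ROC AUC: {search.best_score_:.3f}, test ROC AUC: {test_auc:.3f}")
```

The same loop applies to `LogisticRegression` by swapping the estimator and the grid (for example over the regularization strength `C`).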

The project's introduction picture has been taken from here.