Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gpsyrou/binary_classification_of_bank_marketing_campaigns
Exploratory data analysis (EDA) and development of classification algorithms (Logistic Regression, Random Forest) to predict clients that are most likely to subscribe to a bank's product, as a result of marketing campaigns.
classification eda logistic-regression python random-forest
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/gpsyrou/binary_classification_of_bank_marketing_campaigns
- Owner: gpsyrou
- Created: 2020-08-08T09:00:46.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-11-30T11:32:06.000Z (2 months ago)
- Last Synced: 2024-11-30T12:24:49.720Z (2 months ago)
- Topics: classification, eda, logistic-regression, python, random-forest
- Language: Jupyter Notebook
- Homepage:
- Size: 5.94 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# Binary Classification of Direct Marketing Campaign Subscriptions: A Logistic Regression & Random Forest Approach
Exploratory data analysis (EDA) and development of classification algorithms (Logistic Regression, Random Forest) to predict clients that are most likely to subscribe to a bank's product, as a result of marketing campaigns.
## Project Description
The purpose of this project is to analyze a dataset of marketing campaigns that a Portuguese banking institution conducted by phone with its clients. The main goal of these campaigns was to prompt clients to subscribe to a specific financial product of the bank (a term deposit). After each call, the client had to inform the institution whether they intended to subscribe to the product (a successful campaign) or not (an unsuccessful campaign).
Our main task in this project is to build machine learning models that predict the probability of a client subscribing to the bank's product. Note that, even though we talk about calculating probabilities, we will build classification algorithms: the final output of our models is a binary result indicating whether the client subscribed ('yes') to the product or not ('no').
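To make the distinction between a predicted probability and a binary label concrete, here is a minimal sketch using scikit-learn's `LogisticRegression`. It is not the project's actual code: the data is synthetic (generated with `make_classification` and an arbitrary class imbalance), and the 0.5 cut-off is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (NOT the bank marketing dataset).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.89, 0.11], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of class 1 ("yes").
proba_yes = model.predict_proba(X_test)[:, 1]

# Assumed cut-off: probabilities at or above it become 'yes', the rest 'no'.
THRESHOLD = 0.5
labels = np.where(proba_yes >= THRESHOLD, "yes", "no")

# ROC AUC is computed from the probabilities, not from the thresholded labels.
auc = roc_auc_score(y_test, proba_yes)
print(f"ROC AUC: {auc:.3f}, predicted 'yes': {(labels == 'yes').sum()} of {len(labels)}")
```

With imbalanced classes, the cut-off is a choice in its own right: lowering it flags more clients as likely subscribers at the cost of more false positives.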
The dataset has 41188 rows (instances of calls to clients) and 21 columns (variables) that describe certain aspects of each call. Note that in some cases the same client was contacted multiple times. This does not affect our analysis in practice, because each call is treated as independent of the others, even when the client is the same.
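For the Random Forest side of the project, the scikit-learn pieces referenced in this README (`RandomForestClassifier`, `GridSearchCV`, ROC AUC scoring) could fit together roughly as in the sketch below. The data is synthetic and the parameter grid is a deliberately tiny, hypothetical one; neither is taken from the project's notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data (NOT the bank marketing dataset).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Hypothetical grid, kept small so that the search runs quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Each parameter combination is scored by cross-validated ROC AUC.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the forest refit on all of the supplied data with the best-scoring parameters, so it can be used directly for `predict` or `predict_proba`.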
Useful Links:
1. https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74
2. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
3. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
4. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score
5. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
6. https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/

The project's introduction picture has been taken from here.