https://github.com/heschmat/bertelsmann-arvato-project
Analyze demographics data for customers of a mail-order sales company in Germany.
- Host: GitHub
- URL: https://github.com/heschmat/bertelsmann-arvato-project
- Owner: heschmat
- Created: 2021-02-27T06:24:33.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-03-20T22:42:55.000Z (about 4 years ago)
- Last Synced: 2025-01-08T19:26:13.648Z (4 months ago)
- Topics: class-imbalance, classification, customer-segmentation, unsupervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 26.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Udacity + Arvato Financial Solutions: Identify Customers from a Mailout Campaign
## Overview
In this project, a mail-order sales company in Germany wants to identify segments of the general population to target with its marketing in order to grow. Demographic information is provided both for the general population and for prior customers of the mail-order company, so that a model of the company's customer base can be built. The target dataset contains demographic information for the targets of a mailout marketing campaign. The objective is __to identify which individuals are most likely to respond to the campaign and become customers of the mail-order company__.

As part of the project, half of the mailout data is provided with the response column included. For the competition, the remaining half has had its response column withheld; the competition is scored on the predictions for that half of the data.
## Analysis
### Part 1: Customer Segmentation Report
Here, I use unsupervised learning techniques to describe the relationship between the demographics of the company's existing customers and the general population of Germany. The aim is to describe which parts of the general population are more likely to be part of the mail-order company's main customer base, and which parts are less so.
### Part 2: Supervised Learning Model
After investigating which parts of the population are more likely to be customers of the mail-order company, it's time to build a prediction model. Each row in the `MAILOUT` data files represents an individual who was targeted for a mailout campaign. Ideally, we should be able to use each individual's demographic information to decide whether it is worth including that person in the campaign. The `MAILOUT` data has been split into two approximately equal parts, each with almost 43,000 rows.
## Challenges
There is a large class imbalance in the output: more than 98% of individuals did not respond to the mailout.
Thus, predicting individual classes and scoring them with accuracy is not an appropriate way to evaluate performance, since a model that predicts "no response" for everyone would already be more than 98% accurate. Instead, the competition uses AUC (area under the ROC curve) to evaluate performance.
## Note about demographics data
In accordance with the Terms & Conditions associated with the project on Udacity, and due to the sensitive and proprietary nature of the demographics data used in the project, the data may not be shared. It is to be considered exclusive to the project and must not be used for any other purpose.
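
Because the real data cannot be shared, the effect of the class imbalance on the evaluation metric can only be illustrated here with synthetic data. The sketch below is not taken from the project notebooks: the response rate (1.5%), the sample size, the score distributions, and the helper names `roc_auc` and `accuracy` are all assumptions made for illustration. It compares a trivial "nobody responds" baseline with a hypothetical score that carries some signal.

```python
import random


def roc_auc(labels, scores):
    """AUC via rank statistics: the probability that a random positive
    receives a higher score than a random negative (ties count as half)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Assign 1-based ranks by ascending score, averaging ranks within ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


random.seed(0)
n = 10_000

# Synthetic labels with roughly 1.5% responders (assumed rate, not project data).
labels = [1 if random.random() < 0.015 else 0 for _ in range(n)]

# Baseline: predict "no response" for everyone, i.e. one constant score.
always_no = [0] * n
constant_scores = [0.0] * n

# Hypothetical informative score: responders score higher on average.
informative_scores = [random.gauss(1.0 if y else 0.0, 1.0) for y in labels]

acc_always_no = accuracy(labels, always_no)
auc_constant = roc_auc(labels, constant_scores)
auc_informative = roc_auc(labels, informative_scores)

print(f"accuracy, always 'no response': {acc_always_no:.3f}")
print(f"AUC, constant score:            {auc_constant:.3f}")
print(f"AUC, informative score:         {auc_informative:.3f}")
```

The constant baseline reaches a very high accuracy while its AUC stays at 0.5, the value of a ranking with no information; only the informative score moves the AUC above that level. In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written `roc_auc`.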