https://github.com/faraazarsath/customer-segmentation_e_commerce
Customer Segmentation of E commerce purchase database
https://github.com/faraazarsath/customer-segmentation_e_commerce
kmeans-clustering latent-semantic-analysis natural-language-processing tfidf-vectorizer
Last synced: 7 months ago
JSON representation
Customer Segmentation of E commerce purchase database
- Host: GitHub
- URL: https://github.com/faraazarsath/customer-segmentation_e_commerce
- Owner: FaraazArsath
- Created: 2023-01-18T08:43:57.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-15T05:20:14.000Z (over 1 year ago)
- Last Synced: 2025-02-15T06:26:25.758Z (over 1 year ago)
- Topics: kmeans-clustering, latent-semantic-analysis, natural-language-processing, tfidf-vectorizer
- Language: Jupyter Notebook
- Homepage:
- Size: 815 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Customer-segmentation_E_commerce_data
Customer Segmentation of E commerce purchase database

## Problem Statement
As an e-commerce platform, it is very important to profile your customers, dividing your clientele base into groups based on their needs and expectations. Grouping will help us come up with dedicated marketing strategies and will aid us in recommending products to different user bases. In this project, we are interested in analyzing the content of an E-commerce database that lists purchases made by ∼4000 customers over a period of one year (1/12/2010 to 9/12/2011). Based on this analysis, we would like to develop models to group the 4000 customers into different buckets. Such a model must take into account the similarity between the products purchased between the users (i.e. a user might purchase 2 different products which are very similar to each other), the spending patterns of a user, their meta information, etc.
## Minimum Requirements
The end objective of the participant is to come up with customer segmentations that take into account all the information that is presented in the dataset. The participant is expected to use NLP techniques to find similarity between the products.
## Project Summary:
In this project I used NLP (Natural Language Processing) techniques to extract frequent words in the 'Description' column of Products purchased and done Clustering text documents using k-means to obtain Product Categories.
Used Feature Extaction method TfidfVectorizer whch uses an in-memory vocabulary (a Python dict) to map the most frequent words to features indices and hence compute a word occurrence frequency (sparse) matrix. The word frequencies are then reweighted using the Inverse Document Frequency (IDF) vector collected feature-wise over the corpus. Performed dimensionality reduction using LSA (Latent semantic analysis) to this extracted features before fitting into K- means.
Finally with the Product Categories we clustered Customers based on thier purchases and spending pattern.
### EDA


### Product Category:

#### Monthly Sales per category

### Customer Cluster :


Out of total 4.35 million Sales in United Kingdom.
- The customers in cluster 1 are contributing 2.15 million Sales
- The customers in cluster 2 are contributing 0.87 million Sales
- The customers in cluster 3 are contributing 0.78 million Sales.
- The customers in cluster 0 are least contributing 0.53 million Sales