https://github.com/razalkr70/customer-segmentation-using-dataset
A data science project that segments mall customers using K-Means clustering. Based on age, income, and spending score, it identifies customer groups and visualizes them with 2D and 3D plots for targeted marketing insights.
clustering customer-segmentation data-science data-visualization kmeans machine-learning pca python scikit-learn
- Host: GitHub
- URL: https://github.com/razalkr70/customer-segmentation-using-dataset
- Owner: Razalkr70
- Created: 2025-05-13T11:51:14.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-14T04:32:52.000Z (about 1 year ago)
- Last Synced: 2025-05-14T06:05:49.183Z (about 1 year ago)
- Topics: clustering, customer-segmentation, data-science, data-visualization, kmeans, machine-learning, pca, python, scikit-learn
- Language: Python
- Homepage:
- Size: 382 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🛍️ Customer Segmentation using K-Means Clustering
This project performs customer segmentation on a mall customer dataset using the K-Means clustering algorithm. It identifies groups based on features like age, income, and spending score, and visualizes the clusters using pair plots and PCA in 3D.
## 📌 Overview
Customer segmentation is a key technique in marketing and business analytics. In this project, the K-Means algorithm is applied to group customers based on their demographics and spending patterns.
### ✨ Features
- Data preprocessing and feature scaling
- Gender encoding
- Elbow method to determine optimal `k`
- Cluster formation using K-Means
- Cluster-wise statistical summary
- Visualizations using seaborn and matplotlib
- 3D PCA for better insight into clusters
- Customer labeling using custom logic
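The preprocessing, gender encoding, and scaling features could look roughly like the sketch below. The README does not say which encoding or scaler the project uses, so the `Male`/`Female` to 0/1 mapping, the use of `StandardScaler`, and the `preprocess` helper name are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Column names follow the Dataset section of this README.
FEATURES = ["Gender", "Age", "Annual Income (k$)", "Spending Score (1-100)"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Encode Gender as 0/1 and standardize all feature columns."""
    out = df.copy()
    # Assumed encoding; the project may use a different mapping.
    out["Gender"] = out["Gender"].map({"Male": 0, "Female": 1})
    # StandardScaler is an assumption: zero mean, unit variance per column.
    scaled = StandardScaler().fit_transform(out[FEATURES])
    return pd.DataFrame(scaled, columns=FEATURES, index=out.index)


# Made-up rows for illustration only, not taken from Mall_Customers.csv.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 52, 41],
    "Annual Income (k$)": [15, 60, 85, 120],
    "Spending Score (1-100)": [39, 81, 20, 55],
})
scaled = preprocess(sample)
```

Scaling matters here because K-Means relies on Euclidean distance, so an unscaled income column measured in thousands would otherwise dominate the 0/1 gender column.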
## 🛠️ Tech Stack
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- PCA (Principal Component Analysis)
## 📊 How It Works
1. The dataset is preprocessed and the gender column is encoded numerically.
2. The elbow method is used to choose the number of clusters `k`.
3. K-Means is applied to group customers into `k` clusters.
4. The clusters are visualized with seaborn pair plots and a 3D PCA projection.
5. Each cluster is labeled with an intuitive name such as "Young Spenders" or "Savers".
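As a rough illustration of steps 2 and 3, the sketch below computes the K-Means inertia for a range of `k` values and then fits a final model. It runs on synthetic blobs instead of the mall data, and the `inertia_by_k` helper, the `n_init` and `random_state` settings, and the choice of `k = 3` are assumptions for this example, not values taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(X, k_values):
    """Fit K-Means once per k and return {k: inertia} for an elbow plot."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias[k] = model.inertia_
    return inertias


# Synthetic stand-in for the scaled customer features: three separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

inertias = inertia_by_k(X, range(1, 7))

# After reading the elbow off the curve, fit the final model with that k.
# k = 3 is picked here only because the synthetic data has three blobs.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Plotting `inertias` against `k` gives the elbow curve: the inertia drops sharply until `k` reaches the natural number of groups and flattens afterwards.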
## 📂 Dataset
Place `Mall_Customers.csv` in your working directory. It contains the following columns:
- CustomerID
- Gender
- Age
- Annual Income (k$)
- Spending Score (1-100)
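One way to load the file and confirm it has the columns listed above is sketched below. The `load_customers` helper is hypothetical and not part of the project, and the two in-memory rows are invented so the example runs without the real CSV.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "CustomerID",
    "Gender",
    "Age",
    "Annual Income (k$)",
    "Spending Score (1-100)",
]


def load_customers(source) -> pd.DataFrame:
    """Read the CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df


# In the project this would be load_customers("Mall_Customers.csv").
# The two rows below are invented so the example runs without the file.
demo_csv = io.StringIO(
    "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,19,15,39\n"
    "2,Female,35,60,81\n"
)
customers = load_customers(demo_csv)
```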
## 🚀 Run the Code
```bash
pip install pandas numpy matplotlib seaborn scikit-learn
python customer_segmentation.py
```
### 📈 Sample Output
- Cluster visualization via pairplots
- 3D PCA cluster plot
- Cluster statistics
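The cluster statistics and the 3D PCA plot could be produced along the lines of the sketch below. The helper names (`cluster_summary`, `project_3d`, `plot_clusters_3d`) are hypothetical, and the input is random data with hand-assigned labels, so the numbers it yields say nothing about the real customers.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def cluster_summary(df: pd.DataFrame, labels) -> pd.DataFrame:
    """Mean of every numeric column per cluster, plus the cluster size."""
    grouped = df.assign(Cluster=labels).groupby("Cluster")
    summary = grouped.mean(numeric_only=True)
    summary["Count"] = grouped.size()
    return summary


def project_3d(X) -> np.ndarray:
    """Reduce the feature matrix to three principal components."""
    return PCA(n_components=3).fit_transform(X)


def plot_clusters_3d(coords, labels):
    """Scatter the three PCA components, colored by cluster label."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c=labels)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_zlabel("PC3")
    return fig


# Random numbers standing in for customer features and K-Means labels.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(60, 4)),
    columns=["Gender", "Age", "Annual Income (k$)", "Spending Score (1-100)"],
)
labels = np.repeat([0, 1, 2], 20)

summary = cluster_summary(features, labels)
coords = project_3d(features.to_numpy())
```

Calling `plot_clusters_3d(coords, labels)` followed by `matplotlib.pyplot.show()` would display the scatter plot; it is left out of the example so the sketch runs without opening a window.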