https://github.com/preethi2805/customer-segmentation
This project applies Recency, Frequency, and Monetary (RFM) Analysis along with K-Means Clustering to segment customers based on their purchasing behavior. The goal is to identify distinct customer groups and develop targeted marketing strategies.
customer-segmentation kmeans-clustering python-3 rfm-analysis
- Host: GitHub
- URL: https://github.com/preethi2805/customer-segmentation
- Owner: Preethi2805
- Created: 2025-01-28T04:50:43.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-20T05:27:31.000Z (over 1 year ago)
- Last Synced: 2025-02-20T05:29:56.688Z (over 1 year ago)
- Topics: customer-segmentation, kmeans-clustering, python-3, rfm-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 22.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Customer Segmentation using RFM and K-Means Clustering
## Problem Statement
Understanding customer behavior is crucial for businesses. By segmenting customers based on their purchasing patterns, businesses can:
- Identify **high-value** customers.
- Recognize **at-risk** customers.
- Personalize marketing strategies for **customer retention** and **acquisition**.
## Dataset
The dataset consists of transaction records from an e-commerce platform with the following key attributes:
- `InvoiceNo`: Unique invoice identifier.
- `StockCode`: Product identifier.
- `Description`: Product description.
- `Quantity`: Number of units purchased.
- `InvoiceDate`: Date of purchase.
- `UnitPrice`: Price per unit.
- `CustomerID`: Unique customer identifier.
- `Country`: Country of purchase.
## Methodology
### 1. Data Preprocessing
- Handled missing values and duplicates.
- Converted `InvoiceDate` to a datetime format.
- Removed outliers using **Z-score analysis**.
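The preprocessing steps above could look roughly like the sketch below. The README does not say which columns were filtered or which cutoff was used, so the `Quantity`/`UnitPrice` columns, the `|z| <= 3` threshold, and the function name `preprocess` are assumptions made for illustration.

```python
import pandas as pd


def preprocess(df, cols=("Quantity", "UnitPrice"), z_thresh=3.0):
    """Clean raw transactions (illustrative; columns and threshold are assumed)."""
    # Drop rows without a customer and exact duplicate rows.
    out = df.dropna(subset=["CustomerID"]).drop_duplicates().copy()
    # Parse the invoice timestamp into a datetime column.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])

    # Z-score filter: keep rows within z_thresh standard deviations on each column.
    keep = pd.Series(True, index=out.index)
    for col in cols:
        std = out[col].std(ddof=0)
        if std > 0:  # skip constant columns, where the z-score is undefined
            z = (out[col] - out[col].mean()) / std
            keep &= z.abs() <= z_thresh
    return out[keep].reset_index(drop=True)
```

Filtering on raw transaction columns is only one option; the same filter could instead be applied to the per-customer RFM values.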
### 2. RFM Analysis
- **Recency (R)**: Days since the customer's last purchase.
- **Frequency (F)**: Number of transactions by each customer.
- **Monetary Value (M)**: Total spend per customer.
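As a minimal sketch, the three metrics can be derived from the transaction table with one `groupby`. The reference ("snapshot") date, the helper column `TotalPrice`, and the function name `compute_rfm` are assumptions; the README does not state how Recency was anchored, so one day after the last invoice is used here.

```python
import pandas as pd


def compute_rfm(df, snapshot_date=None):
    """Build a per-customer RFM table (illustrative sketch)."""
    df = df.copy()
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    # Line-item revenue; TotalPrice is a helper column introduced here.
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

    # Assumed reference date: the day after the most recent invoice.
    if snapshot_date is None:
        snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)
    else:
        snapshot = pd.Timestamp(snapshot_date)

    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda s: (snapshot - s.max()).days),
        Frequency=("InvoiceNo", "nunique"),  # distinct invoices, not line items
        Monetary=("TotalPrice", "sum"),
    )
```

Counting distinct `InvoiceNo` values rather than rows keeps a multi-item order from inflating Frequency.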
### 3. K-Means Clustering
- Determined the optimal number of clusters using the **Elbow Method** and **Silhouette Score**.
- Segmented customers into **4 clusters**.
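One way to compare candidate cluster counts is to record both the inertia (the quantity plotted for the Elbow Method) and the Silhouette Score for each `k`. The sketch below assumes the RFM features are standardized first; the range of `k`, the seed, and the function name `evaluate_k` are illustrative choices, not values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k(X, k_values=range(2, 7), seed=42):
    """Return inertia and silhouette score for each candidate k (sketch)."""
    # K-Means is distance based, so features are put on a common scale first.
    X_scaled = StandardScaler().fit_transform(X)
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_scaled)
        scores[k] = {
            "inertia": model.inertia_,  # within-cluster sum of squares
            "silhouette": silhouette_score(X_scaled, model.labels_),
        }
    return scores
```

Plotting `inertia` against `k` gives the elbow curve, while the `k` with the highest `silhouette` is a common second opinion. Once a value is chosen, a final `KMeans(n_clusters=k).fit_predict(X_scaled)` yields the cluster labels.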
### 4. Insights from Clusters
- **Cluster 3:** Loyal and high-spending customers.
- **Cluster 2:** Less engaged customers with high Recency values (a long time since their last purchase).
- **Clusters 0 & 1:** Intermediate customers who can be nurtured.
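Descriptions like these are usually read off a per-cluster summary table. A minimal sketch follows, assuming the RFM table has `Recency`, `Frequency`, and `Monetary` columns plus a `Cluster` label column; these column names and the function name `profile_clusters` are hypothetical.

```python
import pandas as pd


def profile_clusters(rfm, label_col="Cluster"):
    """Average RFM values and customer count per cluster (sketch)."""
    grouped = rfm.groupby(label_col)
    profile = grouped[["Recency", "Frequency", "Monetary"]].mean()
    profile["Customers"] = grouped.size()
    # Highest-spending clusters first, to make the ranking easy to scan.
    return profile.sort_values("Monetary", ascending=False)
```

A cluster with low average Recency and high Frequency and Monetary values would correspond to the loyal, high-spending group, while a high average Recency marks less engaged customers.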
## Results & Visualizations
- **Pair plots** to analyze feature distributions across clusters.
- **Box plots** to compare `Recency`, `Frequency`, and `Monetary Value` across clusters.
- **Bar charts** showing average RFM values per cluster.
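As an illustration of the bar charts, the sketch below draws one panel per metric with the cluster averages. It uses only pandas plotting on top of Matplotlib; the column names, the `Cluster` label column, and the function name `plot_cluster_means` are assumptions rather than the notebook's actual code.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_cluster_means(rfm, label_col="Cluster"):
    """Bar chart of average Recency, Frequency, and Monetary per cluster (sketch)."""
    means = rfm.groupby(label_col)[["Recency", "Frequency", "Monetary"]].mean()
    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    for ax, col in zip(axes, means.columns):
        # One panel per metric, because the three scales differ widely.
        means[col].plot(kind="bar", ax=ax, title=f"Average {col}")
        ax.set_xlabel(label_col)
    fig.tight_layout()
    return fig, means
```

Separate panels avoid the large Monetary values flattening the Recency and Frequency bars on a shared axis.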


## Technologies Used
- **Python**
- **Pandas, NumPy**: Data Manipulation
- **Matplotlib, Seaborn**: Data Visualization
- **Scikit-Learn**: Machine Learning (K-Means Clustering)
- **StandardScaler** (from Scikit-Learn): Feature Scaling
## Future Enhancements
- Implement **Hierarchical Clustering** for better interpretability.
- Develop **automated customer insights** with dashboards.
- Integrate with **real-time e-commerce data**.