An open API service indexing awesome lists of open source software.

https://github.com/preethi2805/customer-segmentation

This project applies Recency, Frequency, and Monetary (RFM) Analysis along with K-Means Clustering to segment customers based on their purchasing behavior. The goal is to identify distinct customer groups and develop targeted marketing strategies.
https://github.com/preethi2805/customer-segmentation

customer-segmentation kmeans-clustering python-3 rfm-analysis

Last synced: 24 days ago
JSON representation

This project applies Recency, Frequency, and Monetary (RFM) Analysis along with K-Means Clustering to segment customers based on their purchasing behavior. The goal is to identify distinct customer groups and develop targeted marketing strategies.

Awesome Lists containing this project

README

          

# πŸ“Š Customer Segmentation using RFM and K-Means Clustering

## πŸ” Problem Statement
Understanding customer behavior is crucial for businesses. By segmenting customers based on their purchasing patterns, businesses can:
- Identify **high-value** customers.
- Recognize **at-risk** customers.
- Personalize marketing strategies for **customer retention** and **acquisition**.

## πŸ“Œ Dataset
The dataset consists of transaction records from an e-commerce platform with the following key attributes:
- `InvoiceNo` – Unique invoice identifier.
- `StockCode` – Product identifier.
- `Description` – Product description.
- `Quantity` – Number of units purchased.
- `InvoiceDate` – Date of purchase.
- `UnitPrice` – Price per unit.
- `CustomerID` – Unique customer identifier.
- `Country` – Country of purchase.

## πŸ› οΈ Methodology
### 1️⃣ Data Preprocessing
- Handled missing values and duplicates.
- Converted `InvoiceDate` to a datetime format.
- Removed outliers using **Z-score analysis**.

### 2️⃣ RFM Analysis
- **Recency (R)**: Days since the customer’s last purchase.
- **Frequency (F)**: Number of transactions by each customer.
- **Monetary Value (M)**: Total spend per customer.

### 3️⃣ K-Means Clustering
- Determined the optimal number of clusters using the **Elbow Method** and **Silhouette Score**.
- Segmented customers into **4 clusters**.

### 4️⃣ Insights from Clusters
- **Cluster 3:** Loyal and high-spending customers.
- **Cluster 2:** Less engaged customers with high recency.
- **Cluster 1 & 0:** Intermediate customers who can be nurtured.

## πŸ“ˆ Results & Visualizations
- **Pair plots** to analyze feature distributions across clusters.
- **Box plots** to compare `Recency`, `Frequency`, and `Monetary Value` across clusters.
- **Bar charts** showing average RFM values per cluster.

![Pairplot - relationships between features and their cluster distribution.](pairplot.png)
![Boxplot](boxplot.png)

## πŸ”§ Technologies Used
- **Python**
- **Pandas, NumPy** – Data Manipulation
- **Matplotlib, Seaborn** – Data Visualization
- **Scikit-Learn** – Machine Learning (K-Means Clustering)
- **StandardScaler** – Feature Scaling

## πŸ“Œ Future Enhancements
- Implement **Hierarchical Clustering** for better interpretability.
- Develop **automated customer insights** with dashboards.
- Integrate with **real-time e-commerce data**.