Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aryansk/customer-segmentation-analysis
Advanced customer segmentation project using K-Means clustering to analyze customer behavior based on annual income, spending score, and age.
elbow-method exploratory-data-analysis machine-learning machine-learning-algorithms python scikit-learn sentiment-analysis sentiment-classification
- Host: GitHub
- URL: https://github.com/aryansk/customer-segmentation-analysis
- Owner: aryansk
- Created: 2025-01-24T02:19:35.000Z (11 days ago)
- Default Branch: main
- Last Pushed: 2025-01-31T19:16:10.000Z (3 days ago)
- Last Synced: 2025-01-31T20:23:45.231Z (3 days ago)
- Topics: elbow-method, exploratory-data-analysis, machine-learning, machine-learning-algorithms, python, scikit-learn, sentiment-analysis, sentiment-classification
- Language: Jupyter Notebook
- Homepage:
- Size: 290 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Customer Segmentation Analysis 📊🔍
![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)
![scikit-learn](https://img.shields.io/badge/scikit--learn-1.0+-green.svg)
![Pandas](https://img.shields.io/badge/Pandas-1.3+-red.svg)
![NumPy](https://img.shields.io/badge/NumPy-1.20+-yellow.svg)
![License](https://img.shields.io/badge/License-MIT-yellow.svg)
![Maintenance](https://img.shields.io/badge/Maintenance-Active-brightgreen.svg)

An advanced customer segmentation analysis project that uses K-Means clustering to discover patterns in customer behavior based on annual income, spending score, and age.
## 📖 Table of Contents
- [Project Overview](#-project-overview)
- [Technical Architecture](#-technical-architecture)
- [Installation & Setup](#-installation--setup)
- [Analysis Pipeline](#-analysis-pipeline)
- [Visualization Gallery](#-visualization-gallery)
- [Clustering Results](#-clustering-results)
- [Development](#-development)
- [Contributing](#-contributing)
- [License](#-license)

## 🎯 Project Overview
### 🔍 Analysis Objectives
- **Customer Profiling**
  - Behavioral pattern identification
  - Spending habit analysis
  - Income-based segmentation
  - Age group categorization
- **Business Insights**
  - Target market identification
  - Marketing strategy optimization
  - Product recommendation enhancement
  - Customer retention analysis

### 📊 Data Dimensions
- **Key Variables**
  - Annual Income
  - Spending Score (1-100)
  - Age
  - Gender
  - Customer ID

## 🛠 Technical Architecture
### Analysis Flow
```mermaid
graph TD
A[Raw Customer Data] --> B[Data Preprocessing]
B --> C[Exploratory Analysis]
C --> D[Feature Engineering]
D --> E[K-Means Clustering]
E --> F[Cluster Analysis]
F --> G[Visualization]
G --> H[Business Insights]
```

### Dependencies
```text
# requirements.txt
numpy>=1.20.0
pandas>=1.3.0
scikit-learn>=1.0.0
matplotlib>=3.4.0
seaborn>=0.11.0
plotly>=5.3.0
```

## 💻 Installation & Setup
### System Requirements
- **Minimum Specifications**
  - Python 3.8+
  - 4 GB RAM
  - 2 GB storage
- **Recommended Specifications**
  - Python 3.9+
  - 8 GB RAM
  - 5 GB storage
  - Multi-core processor

### Quick Start
```bash
# Clone repository
git clone https://github.com/aryansk/customer-segmentation-analysis.git

# Navigate to project
cd customer-segmentation-analysis

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate     # Linux/macOS
.\venv\Scripts\activate      # Windows (run this instead of the line above)

# Install dependencies
pip install -r requirements.txt
```

## 🔬 Analysis Pipeline
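The pipeline functions in this section expect the raw data to contain columns named `Annual_Income`, `Spending_Score`, and `Age`. A minimal input check before preprocessing might look like the sketch below; `validate_customer_data` and `REQUIRED_COLUMNS` are hypothetical names (not part of the repository), and the 1-100 bound comes from the Spending Score definition under Data Dimensions.

```python
import pandas as pd

# Hypothetical: column names mirror the preprocessing code in this README.
REQUIRED_COLUMNS = ["Annual_Income", "Spending_Score", "Age"]


def validate_customer_data(df: pd.DataFrame) -> pd.DataFrame:
    """Check that the clustering features exist and hold plausible values."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Spending Score is documented as a 1-100 scale; NaNs are ignored here
    # because preprocess_data drops incomplete rows later.
    scores = df["Spending_Score"].dropna()
    if not scores.between(1, 100).all():
        raise ValueError("Spending_Score values must lie in the range 1-100")

    # Income and age cannot be negative in a valid record.
    if (df["Age"].dropna() < 0).any() or (df["Annual_Income"].dropna() < 0).any():
        raise ValueError("Age and Annual_Income must be non-negative")
    return df
```

For example, `validate_customer_data(pd.read_csv("data/raw/customers.csv"))`, where the CSV file name is a placeholder. Failing early here gives a clearer error than a `KeyError` raised deep inside scaling.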
### Data Preprocessing
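```python
from sklearn.preprocessing import StandardScaler


def preprocess_data(df):
    """
    Preprocesses customer data for analysis.

    Args:
        df (pandas.DataFrame): Raw customer data

    Returns:
        pandas.DataFrame: Processed data ready for clustering
    """
    # Handle missing values: drop incomplete rows, then take an explicit copy
    # so the column assignment below works on an independent frame
    df = df.dropna().copy()

    # Feature scaling: standardize each feature to zero mean, unit variance
    scaler = StandardScaler()
    features = ['Annual_Income', 'Spending_Score', 'Age']
    df[features] = scaler.fit_transform(df[features])
    return df
```

### Clustering Implementation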
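The `perform_kmeans` function takes `n_clusters` as an input, so the number of clusters has to be chosen first. The project is tagged with the elbow method; the sketch below records the within-cluster sum of squares (`inertia_`) for each candidate k and, as an added cross-check that is not taken from the repository, the silhouette score. It assumes `X` is the scaled feature matrix; `evaluate_k` is a hypothetical helper.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def evaluate_k(X, k_values=range(2, 11), random_state=42):
    """Return {k: (inertia, silhouette)} for each candidate number of clusters.

    Hypothetical helper; every k must be at least 2 for the silhouette score.
    """
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, init="k-means++", n_init=10,
                       random_state=random_state)
        labels = model.fit_predict(X)
        # inertia_: within-cluster sum of squared distances (the elbow curve)
        # silhouette_score: in [-1, 1], higher means better-separated clusters
        results[k] = (model.inertia_, silhouette_score(X, labels))
    return results
```

Plotting the inertia values against k and looking for the bend gives the elbow; a k near that bend that also has a comparatively high silhouette score is a reasonable candidate to pass to `perform_kmeans`.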
```python
from sklearn.cluster import KMeans


def perform_kmeans(data, n_clusters):
    """
    Performs K-means clustering on customer data.

    Args:
        data (numpy.ndarray): Preprocessed customer data
        n_clusters (int): Number of clusters

    Returns:
        tuple: Cluster labels and cluster centers
    """
    kmeans = KMeans(
        n_clusters=n_clusters,
        init='k-means++',   # spread-out initial centroids
        n_init=10,          # 10 initializations; keep the lowest-inertia run
        max_iter=300,
        random_state=42     # reproducible results
    )
    # One label per row; cluster_centers_ are in the scaled feature space
    return kmeans.fit_predict(data), kmeans.cluster_centers_
```

## 📊 Visualization Gallery
### Distribution Analysis
```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_distributions(df):
    """
    Creates distribution plots for key variables.

    Pass the unscaled DataFrame so the histograms show original units.
    """
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    # Income distribution with a kernel density estimate overlay
    sns.histplot(df['Annual_Income'], kde=True, ax=axes[0])
    axes[0].set_title('Annual Income Distribution')
    # Spending Score distribution
    sns.histplot(df['Spending_Score'], kde=True, ax=axes[1])
    axes[1].set_title('Spending Score Distribution')
    # Age distribution
    sns.histplot(df['Age'], kde=True, ax=axes[2])
    axes[2].set_title('Age Distribution')
    plt.tight_layout()
    return fig
```

### 3D Cluster Visualization
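```python
import matplotlib.pyplot as plt


def plot_3d_clusters(data, labels):
    """
    Creates 3D visualization of customer clusters.

    Expects a NumPy array whose columns are ordered
    [Annual_Income, Spending_Score, Age], e.g. df[features].to_numpy().
    """
    fig = plt.figure(figsize=(10, 8))
    # The '3d' projection is registered automatically in Matplotlib >= 3.2
    ax = fig.add_subplot(111, projection='3d')
    scatter = ax.scatter(
        data[:, 0], data[:, 1], data[:, 2],
        c=labels,
        cmap='viridis'
    )
    ax.set_xlabel('Annual Income (Standardized)')
    ax.set_ylabel('Spending Score (Standardized)')
    ax.set_zlabel('Age (Standardized)')
    fig.colorbar(scatter, ax=ax, label='Cluster')
    ax.set_title('3D Customer Segments')
    return fig
```

## ⚡ Clustering Results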
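Because `preprocess_data` overwrites the feature columns with standardized values, per-segment averages in original units (as in the profile table) have to be computed on an unscaled copy of the same rows. A minimal sketch, assuming `raw_df` holds those unscaled rows in the same order as the cluster labels; `profile_segments` is a hypothetical helper. K-Means label numbers are arbitrary, so they need not match the cluster numbering in the table.

```python
import pandas as pd


def profile_segments(raw_df: pd.DataFrame, labels) -> pd.DataFrame:
    """Summarize each cluster's size and mean feature values in original units."""
    # Attach one label per row; raw_df must contain exactly the clustered rows
    profiled = raw_df.assign(Cluster=labels)
    summary = profiled.groupby("Cluster").agg(
        Size=("Age", "size"),
        Avg_Income=("Annual_Income", "mean"),
        Avg_Spending=("Spending_Score", "mean"),
        Avg_Age=("Age", "mean"),
    )
    return summary.round(1)
```

If `preprocess_data` dropped rows with missing values, apply the same `dropna()` to the raw frame before profiling so the rows line up with the labels.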
### Segment Profiles
| Cluster | Size | Avg Annual Income | Avg Spending Score (1-100) | Avg Age | Description |
|---------|------|-------------------|----------------------------|---------|-------------|
| 1 | 89 | $75,000 | 85 | 28 | Young High-Spenders |
| 2 | 98 | $45,000 | 45 | 42 | Middle-Income Adults |
| 3 | 77 | $120,000 | 25 | 55 | Wealthy Conservatives |
| 4 | 82 | $35,000 | 75 | 32 | Budget-Conscious Shoppers |
| 5 | 54 | $85,000 | 65 | 38 | Balanced Spenders |

### Key Insights
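Insights drawn from five segment averages can be checked against customer-level data. A sketch, assuming the unscaled DataFrame uses the column names from the preprocessing code; `feature_correlations` is a hypothetical helper built on pandas' `DataFrame.corr`.

```python
import pandas as pd


def feature_correlations(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Pairwise correlations between the three clustering features.

    method="pearson" measures linear association; method="spearman" uses
    ranks and captures any monotonic relationship.
    """
    return df[["Annual_Income", "Spending_Score", "Age"]].corr(method=method)
```

Spearman correlation is a natural fit for a relationship like age versus spending score, where the trend may be monotonic without being linear.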
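- Five distinct customer segments identified
- Spending score falls consistently with age across the segment averages, from 85 in the youngest segment (average age 28) to 25 in the oldest (average age 55)
- Income does not track spending score: the highest-income segment (Cluster 3, $120,000) has the lowest average spending score (25)
- Younger customers show a higher propensity to spend

## 👨‍💻 Development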
### Project Structure
```
customer-segmentation/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── exploration.ipynb
│   └── analysis.ipynb
├── src/
│   ├── preprocessing.py
│   ├── clustering.py
│   └── visualization.py
├── reports/
│   └── figures/
├── config.py
├── requirements.txt
└── README.md
```

### Analysis Workflow
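Steps 3 and 5 of the workflow (feature scaling and K-means clustering) can be bundled into a single scikit-learn `Pipeline`, which keeps the fitted scaler together with the model so that new customers are scaled the same way before being assigned to a segment. A minimal sketch, assuming the `Annual_Income`, `Spending_Score`, and `Age` column names used elsewhere in this README; `build_segmentation_pipeline` and `FEATURES` are hypothetical names, not part of the repository.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical: feature order matches preprocess_data
FEATURES = ["Annual_Income", "Spending_Score", "Age"]


def build_segmentation_pipeline(n_clusters=5, random_state=42):
    """Chain standardization and K-Means into one estimator."""
    return make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
               random_state=random_state),
    )
```

Usage: `labels = pipe.fit_predict(df[FEATURES])` fits both steps on the training data, and `pipe.predict(new_df[FEATURES])` reuses the stored scaling parameters, which the standalone `preprocess_data` function cannot do because it discards its scaler.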
1. Data cleaning and preprocessing
2. Exploratory data analysis
3. Feature scaling and selection
4. Optimal cluster determination
5. K-means clustering
6. Result visualization
7. Insight generation

## 🤝 Contributing
### Development Process
1. Fork repository
2. Create feature branch
3. Implement changes
4. Add documentation
5. Submit pull request

### Code Style Guidelines
- Follow PEP 8
- Document functions
- Use meaningful variable names
- Include visualization labels

## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- scikit-learn community
- Seaborn visualization library
- Customer dataset providers
- Open source contributors