{"id":24896752,"url":"https://github.com/aryansk/customer-segmentation-analysis","last_synced_at":"2026-04-29T16:32:05.195Z","repository":{"id":273964830,"uuid":"921470870","full_name":"aryansk/Customer-Segmentation-Analysis","owner":"aryansk","description":"Advanced customer segmentation project using K-Means clustering to analyze customer behavior based on annual income, spending score, and age.","archived":false,"fork":false,"pushed_at":"2025-01-31T19:16:10.000Z","size":297,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T16:49:12.416Z","etag":null,"topics":["elbow-method","exploratory-data-analysis","machine-learning","machine-learning-algorithms","python","scikit-learn","sentiment-analysis","sentiment-classification"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aryansk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-24T02:19:35.000Z","updated_at":"2025-01-31T19:16:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"757674c0-9a80-4fa5-b6ca-71c8db7e7e39","html_url":"https://github.com/aryansk/Customer-Segmentation-Analysis","commit_stats":null,"previous_names":["aryansk/customer-segmentation-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aryansk/Customer-Segmentation-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryansk%2FCustomer-Segmentation-Analysis","tags_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/aryansk%2FCustomer-Segmentation-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryansk%2FCustomer-Segmentation-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryansk%2FCustomer-Segmentation-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aryansk","download_url":"https://codeload.github.com/aryansk/Customer-Segmentation-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryansk%2FCustomer-Segmentation-Analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264748911,"owners_count":23658095,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elbow-method","exploratory-data-analysis","machine-learning","machine-learning-algorithms","python","scikit-learn","sentiment-analysis","sentiment-classification"],"created_at":"2025-02-01T20:15:07.083Z","updated_at":"2025-10-07T18:16:41.799Z","avatar_url":"https://github.com/aryansk.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Customer Segmentation Analysis 
📊🔍\n\n![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)\n![scikit-learn](https://img.shields.io/badge/scikit--learn-1.0+-green.svg)\n![Pandas](https://img.shields.io/badge/Pandas-1.3+-red.svg)\n![NumPy](https://img.shields.io/badge/NumPy-1.20+-yellow.svg)\n![License](https://img.shields.io/badge/License-MIT-yellow.svg)\n![Maintenance](https://img.shields.io/badge/Maintenance-Active-brightgreen.svg)\n\nAn advanced customer segmentation analysis project utilizing K-Means clustering to discover patterns in customer behavior based on multiple dimensions including annual income, spending score, and age.\n\n## 📖 Table of Contents\n- [Project Overview](#-project-overview)\n- [Technical Architecture](#-technical-architecture)\n- [Installation \u0026 Setup](#-installation--setup)\n- [Analysis Pipeline](#-analysis-pipeline)\n- [Visualization Gallery](#-visualization-gallery)\n- [Clustering Results](#-clustering-results)\n- [Development](#-development)\n- [Contributing](#-contributing)\n- [License](#-license)\n\n## 🎯 Project Overview\n\n### 🔍 Analysis Objectives\n- **Customer Profiling**\n  - Behavioral pattern identification\n  - Spending habit analysis\n  - Income-based segmentation\n  - Age group categorization\n- **Business Insights**\n  - Target market identification\n  - Marketing strategy optimization\n  - Product recommendation enhancement\n  - Customer retention analysis\n\n### 📊 Data Dimensions\n- **Key Variables**\n  - Annual Income\n  - Spending Score (1-100)\n  - Age\n  - Gender\n  - Customer ID\n\n## 🛠 Technical Architecture\n\n### Analysis Flow\n```mermaid\ngraph TD\n    A[Raw Customer Data] --\u003e B[Data Preprocessing]\n    B --\u003e C[Exploratory Analysis]\n    C --\u003e D[Feature Engineering]\n    D --\u003e E[K-Means Clustering]\n    E --\u003e F[Cluster Analysis]\n    F --\u003e G[Visualization]\n    G --\u003e H[Business Insights]\n```\n\n### Dependencies\n```python\n# 
requirements.txt\nnumpy\u003e=1.20.0\npandas\u003e=1.3.0\nscikit-learn\u003e=1.0.0\nmatplotlib\u003e=3.4.0\nseaborn\u003e=0.11.0\nplotly\u003e=5.3.0\n```\n\n## 💻 Installation \u0026 Setup\n\n### System Requirements\n- **Minimum Specifications**\n  - Python 3.8+\n  - 4GB RAM\n  - 2GB storage\n- **Recommended Specifications**\n  - Python 3.9+\n  - 8GB RAM\n  - 5GB storage\n  - Multi-core processor\n\n### Quick Start\n```bash\n# Clone repository\ngit clone https://github.com/aryansk/Customer-Segmentation-Analysis.git\n\n# Navigate to project\ncd Customer-Segmentation-Analysis\n\n# Create virtual environment\npython -m venv venv\nsource venv/bin/activate  # Linux/Mac\n.\\venv\\Scripts\\activate   # Windows\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n## 🔬 Analysis Pipeline\n\n### Data Preprocessing\n```python\nfrom sklearn.preprocessing import StandardScaler\n\n\ndef preprocess_data(df):\n    \"\"\"\n    Preprocesses customer data for analysis.\n    \n    Args:\n        df (pandas.DataFrame): Raw customer data\n        \n    Returns:\n        pandas.DataFrame: Processed data ready for clustering\n    \"\"\"\n    # Handle missing values; the explicit copy guards against\n    # pandas' SettingWithCopyWarning on the assignment below\n    df = df.dropna().copy()\n    \n    # Feature scaling (zero mean, unit variance per feature)\n    scaler = StandardScaler()\n    features = ['Annual_Income', 'Spending_Score', 'Age']\n    df[features] = scaler.fit_transform(df[features])\n    \n    return df\n```\n\n### Clustering Implementation\n```python\nfrom sklearn.cluster import KMeans\n\n\ndef perform_kmeans(data, n_clusters):\n    \"\"\"\n    Performs K-means clustering on customer data.\n    \n    Args:\n        data (numpy.ndarray): Preprocessed customer data\n        n_clusters (int): Number of clusters\n        \n    Returns:\n        tuple: Cluster labels and cluster centers\n    \"\"\"\n    kmeans = KMeans(\n        n_clusters=n_clusters,\n        init='k-means++',\n        n_init=10,\n        max_iter=300,\n        random_state=42\n    )\n    \n    # Fit once; labels come from fit_predict, centers from the fitted model\n    labels = kmeans.fit_predict(data)\n    return labels, kmeans.cluster_centers_\n```\n\n## 📊 Visualization Gallery\n\n### Distribution Analysis\n```python\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\n\ndef 
plot_distributions(df):\n    \"\"\"\n    Creates distribution plots for key variables.\n    \"\"\"\n    fig, axes = plt.subplots(1, 3, figsize=(18, 6))\n    \n    # Income Distribution\n    sns.histplot(df['Annual_Income'], kde=True, ax=axes[0])\n    axes[0].set_title('Annual Income Distribution')\n    \n    # Spending Score Distribution\n    sns.histplot(df['Spending_Score'], kde=True, ax=axes[1])\n    axes[1].set_title('Spending Score Distribution')\n    \n    # Age Distribution\n    sns.histplot(df['Age'], kde=True, ax=axes[2])\n    axes[2].set_title('Age Distribution')\n    \n    plt.tight_layout()\n```\n\n### 3D Cluster Visualization\n```python\ndef plot_3d_clusters(data, labels):\n    \"\"\"\n    Creates 3D visualization of customer clusters.\n    \"\"\"\n    fig = plt.figure(figsize=(10, 8))\n    ax = fig.add_subplot(111, projection='3d')\n    \n    scatter = ax.scatter(\n        data[:, 0], data[:, 1], data[:, 2],\n        c=labels,\n        cmap='viridis'\n    )\n    \n    ax.set_xlabel('Annual Income (Normalized)')\n    ax.set_ylabel('Spending Score (Normalized)')\n    ax.set_zlabel('Age (Normalized)')\n    \n    plt.colorbar(scatter)\n    plt.title('3D Customer Segments')\n```\n\n## ⚡ Clustering Results\n\n### Segment Profiles\n| Cluster | Size | Avg Income | Avg Spending | Avg Age | Description |\n|---------|------|------------|--------------|---------|-------------|\n| 1 | 89 | $75,000 | 85 | 28 | Young High-Spenders |\n| 2 | 98 | $45,000 | 45 | 42 | Middle-Income Adults |\n| 3 | 77 | $120,000 | 25 | 55 | Wealthy Conservatives |\n| 4 | 82 | $35,000 | 75 | 32 | Budget-Conscious Shoppers |\n| 5 | 54 | $85,000 | 65 | 38 | Balanced Spenders |\n\n### Key Insights\n- Five distinct customer segments identified\n- Clear correlation between age and spending patterns\n- Income not directly proportional to spending score\n- Young customers show higher spending propensity\n\n## 👨‍💻 Development\n\n### Project Structure\n```\ncustomer-segmentation/\n├── data/\n│   
├── raw/\n│   └── processed/\n├── notebooks/\n│   ├── exploration.ipynb\n│   └── analysis.ipynb\n├── src/\n│   ├── preprocessing.py\n│   ├── clustering.py\n│   └── visualization.py\n├── reports/\n│   └── figures/\n├── config.py\n├── requirements.txt\n└── README.md\n```\n\n### Analysis Workflow\n1. Data cleaning and preprocessing\n2. Exploratory data analysis\n3. Feature scaling and selection\n4. Optimal cluster determination\n5. K-means clustering\n6. Result visualization\n7. Insight generation\n\n## 🤝 Contributing\n\n### Development Process\n1. Fork repository\n2. Create feature branch\n3. Implement changes\n4. Add documentation\n5. Submit pull request\n\n### Code Style Guidelines\n- Follow PEP 8\n- Document functions\n- Use meaningful variable names\n- Include visualization labels\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- scikit-learn community\n- Seaborn visualization library\n- Customer dataset providers\n- Open source contributors\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryansk%2Fcustomer-segmentation-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faryansk%2Fcustomer-segmentation-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryansk%2Fcustomer-segmentation-analysis/lists"}