{"id":30028967,"url":"https://github.com/zenklinov/customer-segmentation","last_synced_at":"2026-05-17T11:33:09.574Z","repository":{"id":307036887,"uuid":"1028096065","full_name":"zenklinov/Customer-Segmentation","owner":"zenklinov","description":"An interactive Streamlit app that performs customer segmentation using RFM analysis (Recency, Frequency, Monetary) and K‑Means clustering. You can use a built‑in sample dataset or upload your own transactions, explore distributions, determine the optimal number of clusters, visualize segments, and download the results.","archived":false,"fork":false,"pushed_at":"2025-07-29T03:48:59.000Z","size":1903,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-06T17:03:48.219Z","etag":null,"topics":["customer-segmentation","k-means-clustering","streamlit"],"latest_commit_sha":null,"homepage":"https://customer-segmentation-eqpmr72b6iaecr3wyyjx5s.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zenklinov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-29T03:09:14.000Z","updated_at":"2025-07-29T03:49:02.000Z","dependencies_parsed_at":"2025-07-29T05:38:11.759Z","dependency_job_id":"1d608766-8b8b-4dd0-a98b-9c9f3f73fbb1","html_url":"https://github.com/zenklinov/Customer-Segmentation","commit_stats":null,"previous_names":["zenklinov/customer-segmentation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zenklinov/Customer-Segmentation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenklinov%2FCustomer-Segmentation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenklinov%2FCustomer-Segmentation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenklinov%2FCustomer-Segmentation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenklinov%2FCustomer-Segmentation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zenklinov","download_url":"https://codeload.github.com/zenklinov/Customer-Segmentation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenklinov%2FCustomer-Segmentation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33136746,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T09:28:26.183Z","status":"ssl_error","status_checked_at":"2026-05-17T09:27:52.702Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["customer-segmentation","k-means-clustering","streamlit"],"created_at":"2025-08-06T16:47:16.278Z","updated_at":"2026-05-17T11:33:09.570Z","avatar_url":"https://github.com/zenklinov.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Customer Segmentation (RFM + K‑Means)\n![License](https://img.shields.io/badge/license-MIT-blue.svg)\n![Python](https://img.shields.io/badge/python-3.9%2B-3776AB?logo=python\u0026logoColor=white)\n![RFM](https://img.shields.io/badge/Method-RFM%20Analysis-6A5ACD)\n![KMeans](https://img.shields.io/badge/Clustering-K--Means-8B4513)\n![scikit-learn](https://img.shields.io/badge/ML-scikit--learn-F7931E?logo=scikit-learn\u0026logoColor=white)\n![Streamlit](https://img.shields.io/badge/Streamlit-Interactive%20App-FF4B4B?logo=streamlit\u0026logoColor=white)\n![Business](https://img.shields.io/badge/Use%20Case-Customer%20Analytics-2E8B57)\n![Status](https://img.shields.io/badge/status-stable-success)\n\n\n\nAn interactive Streamlit app that performs **customer segmentation** using **RFM analysis** (Recency, Frequency, Monetary) and **K‑Means clustering**. You can use a built‑in sample dataset or upload your own transactions, explore distributions, determine the optimal number of clusters, visualize segments, and download the results.\n\n- **Live demo:** https://customer-segmentation-eqpmr72b6iaecr3wyyjx5s.streamlit.app/  \n- **Source code:** https://github.com/zenklinov/Customer-Segmentation  \n- **Main app file:** `customer-segmentation.py`\n\n---\n\n## Features\n\n- **Two data sources**\n  - Use a **sample dataset** (synthetic transactions).\n  - **Upload CSV** with your own transaction data.\n- **RFM calculation**\n  - Computes Recency (days since last purchase), Frequency (number of invoices), and Monetary (total spend).\n- **Exploratory visuals**\n  - Histograms (Recency/Frequency/Monetary), summary tables, and dataset info.\n- **Clustering**\n  - Preprocessing: outlier removal (IQR), **standardization**.\n  - **Elbow method** to help choose the optimal number of clusters.\n  - **K‑Means** (2–6 clusters) with interactive plots (Plotly scatter, counts, pie chart).\n- **Business insights**\n  - Automatic labeling of common segments (e.g., **Champions**, **Loyal Customers**, **Recent Customers**, **Lost Customers**, **Need Attention**) with actionable recommendations.\n- **Export**\n  - Download segmentation results as **CSV**.\n\n---\n\n## What is RFM?\n\n- **Recency** – How recently a customer purchased (lower is better).\n- **Frequency** – How often they purchase (higher is better).\n- **Monetary** – How much they spend (higher is better).\n\nThis app computes RFM from your transaction data and applies K‑Means to group customers into data‑driven segments.\n\n---\n\n## Expected CSV Columns\n\nYour uploaded CSV should contain at least:\n\n- `CustomerID` – Unique customer identifier  \n- `InvoiceDate` – Transaction date (parseable as a datetime)  \n- `InvoiceNo` – Invoice number  \n- `Quantity` – Units purchased  \n- `UnitPrice` – Price per unit  \n\nIf `TotalAmount` is missing, the app will compute it as `Quantity * UnitPrice`.\n\n\u003e **Note:** You can **map columns** from your CSV to these fields inside the app (so your column names do not have to match exactly).\n\n---\n\n## Quick Start (Local)\n\n### 1) Clone the repo\n```bash\ngit clone https://github.com/zenklinov/Customer-Segmentation.git\ncd Customer-Segmentation\n```\n\n### 2) Create a virtual environment (recommended)\n```bash\npython -m venv .venv\n# Windows\n.venv\\Scripts\\activate\n# macOS/Linux\nsource .venv/bin/activate\n```\n\n### 3) Install dependencies\n```txt\nstreamlit\npandas\nnumpy\nmatplotlib\nseaborn\nscikit-learn\nplotly\n```\nThen install:\n```bash\npip install -r requirements.txt\n```\n\n_Alternatively_\n```bash\npip install streamlit pandas numpy matplotlib seaborn scikit-learn plotly\n```\n\n### 4) Run the app\n```bash\nstreamlit run customer-segmentation.py\n```\nOpen the URL printed in the terminal (usually http://localhost:8501).\n\n## 🧭 App Workflow\n\nThe sidebar lets you navigate through four sections:\n\n### 1) **Data Source Configuration**\n- Choose **Use Sample Dataset** or **Upload Your Data (CSV)**.\n- Preview your data, see basic info and summary stats.\n- If uploading, map your columns (CustomerID, InvoiceDate, InvoiceNo, Quantity, UnitPrice).\n- Proceed to RFM analysis.\n\n### 2) **RFM Analysis**\n- The app computes RFM per customer:\n  - **Recency**: days since the most recent purchase (relative to max date + 1 day in your data).\n  - **Frequency**: count of invoices.\n  - **Monetary**: sum of `TotalAmount`.\n- View histograms for each metric and a data preview.\n- Choose a scoring approach:\n  - **K‑Means Clustering (Recommended)** – proceed to preprocessing \u0026 clustering.\n  - **Manual Scoring (1–4)** – placeholder UI (implementation stub for custom scoring).\n\n### 3) **Clustering Results**\n- **Outlier removal**: IQR method (1.5×IQR) for R, F, M.\n- **Scaling**: Standardization (`StandardScaler`).\n- **Elbow plot**: Inspect inertia for k ∈ [1..9].\n- Select **number of clusters (2–6)** and run **K‑Means**.\n- Explore results:\n  - Cluster size chart and pie chart.\n  - **Cluster profile heatmap** (mean R/F/M by cluster).\n  - **Interactive scatter** (choose axes, colored by cluster).\n- **Download** the full customer‑level results as CSV.\n\n### 4) **Business Insights**\n- Automatically assigns human‑readable labels (e.g., **Champions**, **Loyal Customers**, **Recent Customers**, **Lost Customers**, **Need Attention**) based on relative R/F/M.\n- For each segment:\n  - Shows **metrics** (Recency, Frequency, Monetary).\n  - Lists **actionable recommendations** (e.g., win‑back, loyalty, upsell/cross‑sell).\n  - Displays **sample customers**.\n- Includes high‑level **marketing strategy** suggestions.\n\n---\n\n## How RFM is Computed (Conceptual)\n\nFor each `CustomerID`:\n- `Recency = (reference_date - last(InvoiceDate)).days`, where `reference_date = max(InvoiceDate) + 1 day`\n- `Frequency = number of InvoiceNo`\n- `Monetary = sum(TotalAmount)`\n\n---\n\n## Notes \u0026 Limitations\n\n- **Manual Scoring** is currently a placeholder in the UI. If you need it, you can extend the code to assign quartile‑based (or custom) scores for R/F/M.\n- **Outlier removal** uses a simple IQR rule; consider tuning the factor or method for your data.\n- The **sample dataset** is synthetic and generated for demonstration (randomized transactions across a fixed period). Real data will behave differently.\n- For **very large datasets**, you may want to optimize memory usage or switch to chunked processing.\n\n---\n\n## Tech Stack\n\n- **Frontend**: Streamlit  \n- **Data**: pandas, numpy  \n- **Visualization**: matplotlib, seaborn, Plotly  \n- **ML**: scikit‑learn (StandardScaler, KMeans)\n\n---\n\n## Contributing\n\nPull requests and issues are welcome:\n1. Fork the repository.\n2. Create a feature branch.\n3. Commit changes with clear messages.\n4. Open a PR describing your changes.\n\n---\n\n## License\n\nPlease refer to the repository for licensing information.\n\n---\n\n## 🙌 Acknowledgements\n\nThanks to the open‑source community behind Streamlit, pandas, scikit‑learn, matplotlib, seaborn, and Plotly.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenklinov%2Fcustomer-segmentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzenklinov%2Fcustomer-segmentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenklinov%2Fcustomer-segmentation/lists"}