{"id":41202587,"url":"https://github.com/databricks-industry-solutions/segmentation","last_synced_at":"2026-01-22T21:30:08.596Z","repository":{"id":72601778,"uuid":"521841413","full_name":"databricks-industry-solutions/segmentation","owner":"databricks-industry-solutions","description":"Create advanced customer segments to drive better purchasing predictions based on behaviors. Using sales data, campaigns and promotions systems, this solution helps derive a number of features that capture the behavior of various households. Build useful customer clusters to target with different promos and offers.","archived":false,"fork":false,"pushed_at":"2025-08-28T22:30:22.000Z","size":147,"stargazers_count":10,"open_issues_count":3,"forks_count":7,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-29T03:50:04.005Z","etag":null,"topics":["cme","databricks-industry-solutions","rcg","serverless"],"latest_commit_sha":null,"homepage":"https://www.databricks.com/solutions/accelerators/customer-segmentation","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks-industry-solutions.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-06T03:59:43.000Z","updated_at":"2025-07-21T15:28:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"c54a188a-99ed-4805-b2a4-0c918521a689","html_url":"https://github.com/databricks-industry-solutions/segmentation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":"databricks-industry-sol
utions/industry-solutions-blueprints","purl":"pkg:github/databricks-industry-solutions/segmentation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-industry-solutions%2Fsegmentation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-industry-solutions%2Fsegmentation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-industry-solutions%2Fsegmentation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-industry-solutions%2Fsegmentation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks-industry-solutions","download_url":"https://codeload.github.com/databricks-industry-solutions/segmentation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks-industry-solutions%2Fsegmentation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28671719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T20:48:19.482Z","status":"ssl_error","status_checked_at":"2026-01-22T20:48:14.968Z","response_time":144,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cme","databricks-industry-solutions","rcg","serverless"],"created_at":"2026-01-22T21:30:07.639Z","updated_at":"2026-01-22T21:30:08.589Z","avatar_url":"https://github.com/databricks-industry-solutions.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Customer Segmentation Solution Accelerator\n\n[![Databricks](https://img.shields.io/badge/Databricks-Solution_Accelerator-FF3621?style=for-the-badge\u0026logo=databricks)](https://databricks.com)\n[![Unity Catalog](https://img.shields.io/badge/Unity_Catalog-Enabled-00A1C9?style=for-the-badge)](https://docs.databricks.com/en/data-governance/unity-catalog/index.html)\n[![Serverless](https://img.shields.io/badge/Serverless-Compute-00C851?style=for-the-badge)](https://docs.databricks.com/en/compute/serverless.html)\n\n**Transform customer data into actionable business insights with modern RFM analysis and behavioral segmentation.**\n\n## 🚀 What is Customer Segmentation?\n\nCustomer segmentation divides your customer base into distinct groups based on shared characteristics and behaviors. This solution creates **6 distinct customer segments**:\n\n1. **Champions** - Premium customers generating highest revenue with frequent purchasing patterns\n2. **Loyal** - High value customers with consistent purchase patterns and high revenue\n3. **Regular** - Regular customers, with normal purchasing patterns and revenue\n4. **New Customers** - New customers, only having made one purchase\n5. 
**At Risk** - Customers who are at risk of churning, no recent activity\n6. **Churned** - Customers who have already churned, need to win back\n\n## 📦 Installation\n\nThis solution uses [Databricks Asset Bundle](https://docs.databricks.com/en/dev-tools/bundles/index.html) for deployment:\n\n```bash\n# Clone the repository\ngit clone https://github.com/databricks-industry-solutions/customer-segmentation.git\ncd customer-segmentation\n\n# Deploy to Databricks\ndatabricks bundle deploy\n\n# Run the complete workflow\ndatabricks bundle run customer_segmentation_demo_install\n```\n\n### Prerequisites\n- Databricks workspace with Unity Catalog enabled\n- Databricks CLI installed and configured\n- Ability to use Serverless compute (or Cluster creation permissions)\n\n## 🏗️ Project Structure\n\n```\ncustomer-segmentation/\n├── databricks.yml                 # Databricks Asset Bundle configuration\n├── src/\n│   ├── customer_segmentation.lvdash.json          The AI/BI dashboard. Make sure to change the catalog and schema names in this file to your catalog and schema\n├── notebooks/\n│   ├── 01_Data_Setup.py          # Synthetic data generation\n│   ├── 02a_Segmentation_Lakeflow.py    # Lakeflow Declarative Pipelines for segmentation\n│   ├── 02b_Segmentation_MLflow.py    # Unsupervised clustering with MLflow for segmentation (builds off of 02a_Segmentation_Lakeflow)\n│   └── 03_Business_Insights.py   # Business visualizations\n└── .github/workflows/             # CI/CD automation\n```\n\n## 🔄 Segmentation Pipeline\n\nThe solution implements a **3-stage customer segmentation pipeline**:\n\n### Stage 1: Data Setup\n- Generates **1,000 synthetic customers** with realistic demographics\n- Creates **transaction history** with seasonal patterns and behavioral variety\n- Stores data in **Unity Catalog managed tables**\n\n### Stage 2: Segmentation Analysis (Lakeflow Declarative Pipelines or Unsupervised Clustering)\n- **RFM Analysis**: Calculates Recency, Frequency, and Monetary 
scores\n- **Behavioral Clustering**: Groups customers by purchase patterns\n- **Segment Profiles**: Creates business-ready segment characteristics\n\n### Stage 3: Business Insights\n- **AI/BI Dashboard**: A dashboard for viewing RFM scores, trends, and customer demographics\n\n## ⚙️ Configuration\nEither:\n1. Create a `.env` file based on `.env.example`:\n```yaml\n# databricks.yml variables\nvariables:\n  catalog_name: your_catalog_name\n  schema_name: your_schema_name\n  warehouse_id: your_warehouse_id\n```\nor \n2. Create a variable-overrides.json file under .databricks \u003e bundle \u003e {your target}\n```json\n// variable-overrides.json variables\n{\n  \"catalog_name\": \"your_catalog_name\",\n  \"schema_name\": \"your_schema_name\",\n  \"warehouse_id\": \"your_warehouse_id\"\n}\n```\n\n## 📊 Expected Business Impact\n\nBased on industry benchmarks, implementing this segmentation strategy delivers:\n- **20% average revenue lift** through targeted campaigns\n- **15-30% improvement** in customer lifetime value\n- **40% increase** in marketing campaign effectiveness\n- **25% reduction** in customer acquisition costs\n\n## 🎨 Visualization Highlights\n\nThe solution includes 5 essential visualizations:\n1. **Customer Distribution** - Segment size analysis\n2. **Revenue Distribution** - Revenue concentration by segment\n3. **Performance Metrics** - Customer value benchmarks\n4. **Lifetime Value** - CLV projections by segment\n5. **ROI Analysis** - Business impact projections\n\n## 🔧 Technical Architecture\n\n- **Unity Catalog**: Data governance and managed tables\n- **Lakeflow Declarative Pipelines**: Declarative data pipelines\n- **Serverless Compute**: Cost-effective processing\n- **Plotly Express**: Accessible, interactive visualizations\n- **Synthetic Data**: Faker\n\n## 🤝 Contributing\n\nWe welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n## 📄 Third-Party Package Licenses\n\n\u0026copy; 2025 Databricks, Inc. 
All rights reserved. The source in this project is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third-party libraries are subject to the licenses set forth below.\n\n| Package | License | Copyright |\n|---------|---------|-----------|\n| plotly\u003e=5.15.0 | MIT | Copyright (c) 2016-2023 Plotly, Inc |\n| numpy\u003e=1.21.0 | BSD-3-Clause | Copyright (c) 2005-2023, NumPy Developers |\n| pandas\u003e=1.5.0 | BSD-3-Clause | Copyright (c) 2008-2023, AQR Capital Management, LLC |\n| scikit-learn\u003e=1.3.0 | BSD-3-Clause | Copyright (c) 2007-2023 The scikit-learn developers |\n| Faker | MIT | Copyright (c) 2012-2023 joke2k |\n\n## 📜 License\n\nThis project is licensed under the Databricks License - see the [LICENSE](LICENSE) file for details.\n\n## ⚠️ Disclaimer\n\nPlease note that the code in this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of this project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks-industry-solutions%2Fsegmentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks-industry-solutions%2Fsegmentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks-industry-solutions%2Fsegmentation/lists"}