{"id":34095650,"url":"https://github.com/frauddi/dataspot","last_synced_at":"2026-04-09T00:02:44.851Z","repository":{"id":300883246,"uuid":"1007413469","full_name":"frauddi/dataspot","owner":"frauddi","description":"Find data concentration patterns and hotspots. Built for fraud detection and risk analysis.","archived":false,"fork":false,"pushed_at":"2026-01-26T03:22:37.000Z","size":8175,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-26T17:54:33.618Z","etag":null,"topics":["anomalies","anomalies-detection","data-analysis","data-science","fraud-detection","hotspots","pattern-mining","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/frauddi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"docs/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-24T00:51:29.000Z","updated_at":"2026-01-26T03:22:38.000Z","dependencies_parsed_at":"2026-01-26T05:03:50.155Z","dependency_job_id":null,"html_url":"https://github.com/frauddi/dataspot","commit_stats":null,"previous_names":["frauddi/hotspot","frauddi/dataspot"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/frauddi/dataspot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frauddi%2Fdataspot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frauddi%2Fdataspot/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frauddi%2Fdataspot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frauddi%2Fdataspot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/frauddi","download_url":"https://codeload.github.com/frauddi/dataspot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frauddi%2Fdataspot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31579058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomalies","anomalies-detection","data-analysis","data-science","fraud-detection","hotspots","pattern-mining","python"],"created_at":"2025-12-14T15:16:58.573Z","updated_at":"2026-04-09T00:02:44.846Z","avatar_url":"https://github.com/frauddi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dataspot 🔥\n\n\u003e **Find data concentration patterns and dataspots in your datasets**\n\n[![PyPI version](https://img.shields.io/pypi/v/dataspot.svg)](https://pypi.org/project/dataspot/)\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Maintained by Frauddi](https://img.shields.io/badge/Maintained%20by-Frauddi-blue.svg)](https://frauddi.com)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)\n\nDataspot automatically discovers **where your data concentrates**, helping you identify patterns, anomalies, and insights in datasets. Originally developed for fraud detection at Frauddi, now available as open source.\n\n## ✨ Why Dataspot?\n\n- 🎯 **Purpose-built** for finding data concentrations, not just clustering\n- 🔍 **Fraud detection ready** - spot suspicious behavior patterns\n- ⚡ **Simple API** - get insights in 3 lines of code\n- 📊 **Hierarchical analysis** - understand data at multiple levels\n- 🔧 **Flexible filtering** - customize analysis with powerful options\n- 📈 **Field-tested** - validated in real fraud detection systems\n\n## 🚀 Quick Start\n\n```bash\npip install dataspot\n```\n\n```python\nfrom dataspot import Dataspot\nfrom dataspot.models.finder import FindInput, FindOptions\n\n# Sample transaction data\ndata = [\n    {\"country\": \"US\", \"device\": \"mobile\", \"amount\": \"high\", \"user_type\": \"premium\"},\n    {\"country\": \"US\", \"device\": \"mobile\", \"amount\": \"medium\", \"user_type\": \"premium\"},\n    {\"country\": \"EU\", \"device\": \"desktop\", \"amount\": \"low\", \"user_type\": \"free\"},\n    {\"country\": \"US\", \"device\": \"mobile\", \"amount\": \"high\", \"user_type\": \"premium\"},\n]\n\n# Find concentration patterns\ndataspot = Dataspot()\nresult = dataspot.find(\n    FindInput(data=data, fields=[\"country\", \"device\", \"user_type\"]),\n    FindOptions(min_percentage=10.0, limit=5)\n)\n\n# Results show where data concentrates\nfor pattern in result.patterns:\n    print(f\"{pattern.path} → {pattern.percentage}% ({pattern.count} records)\")\n\n# Output:\n# country=US \u003e device=mobile \u003e 
user_type=premium → 75.0% (3 records)\n# country=US \u003e device=mobile → 75.0% (3 records)\n# device=mobile → 75.0% (3 records)\n```\n\n## 🎯 Real-World Use Cases\n\n### 🚨 Fraud Detection\n\n```python\nfrom dataspot.models.finder import FindInput, FindOptions\n\n# Find suspicious transaction patterns\nresult = dataspot.find(\n    FindInput(\n        data=transactions,\n        fields=[\"country\", \"payment_method\", \"time_of_day\"]\n    ),\n    FindOptions(min_percentage=15.0, contains=\"crypto\")\n)\n\n# Spot unusual concentrations that might indicate fraud\nfor pattern in result.patterns:\n    if pattern.percentage \u003e 30:\n        print(f\"⚠️ High concentration: {pattern.path}\")\n```\n\n### 📊 Business Intelligence\n\n```python\nfrom dataspot.models.analyzer import AnalyzeInput, AnalyzeOptions\n\n# Discover customer behavior patterns\ninsights = dataspot.analyze(\n    AnalyzeInput(\n        data=customer_data,\n        fields=[\"region\", \"device\", \"product_category\", \"tier\"]\n    ),\n    AnalyzeOptions(min_percentage=10.0)\n)\n\nprint(f\"📈 Found {len(insights.patterns)} concentration patterns\")\nprint(f\"🎯 Top opportunity: {insights.patterns[0].path}\")\n```\n\n### 🔍 Temporal Analysis\n\n```python\nfrom dataspot.models.compare import CompareInput, CompareOptions\n\n# Compare patterns between time periods\ncomparison = dataspot.compare(\n    CompareInput(\n        current_data=this_month_data,\n        baseline_data=last_month_data,\n        fields=[\"country\", \"payment_method\"]\n    ),\n    CompareOptions(\n        change_threshold=0.20,\n        statistical_significance=True\n    )\n)\n\nprint(f\"📊 Changes detected: {len(comparison.changes)}\")\nprint(f\"🆕 New patterns: {len(comparison.new_patterns)}\")\n```\n\n### 🌳 Hierarchical Visualization\n\n```python\nfrom dataspot.models.tree import TreeInput, TreeOptions\n\n# Build hierarchical tree for data exploration\ntree = dataspot.tree(\n    TreeInput(\n        data=sales_data,\n        
fields=[\"region\", \"product_category\", \"sales_channel\"]\n    ),\n    TreeOptions(min_value=10, max_depth=3, sort_by=\"value\")\n)\n\nprint(f\"🌳 Total records: {tree.value}\")\nprint(f\"📊 Main branches: {len(tree.children)}\")\n\n# Navigate the hierarchy\nfor region in tree.children:\n    print(f\"  📍 {region.name}: {region.value} records\")\n    for product in region.children:\n        print(f\"    📦 {product.name}: {product.value} records\")\n```\n\n### 🤖 Auto Discovery\n\n```python\nfrom dataspot.models.discovery import DiscoverInput, DiscoverOptions\n\n# Automatically discover important patterns\ndiscovery = dataspot.discover(\n    DiscoverInput(data=transaction_data),\n    DiscoverOptions(max_fields=3, min_percentage=15.0)\n)\n\nprint(f\"🎯 Top patterns discovered: {len(discovery.top_patterns)}\")\nfor field_ranking in discovery.field_ranking[:3]:\n    print(f\"📈 {field_ranking.field}: {field_ranking.score:.2f}\")\n```\n\n## 🛠️ Core Methods\n\n| Method | Purpose | Input Model | Options Model | Output Model |\n|--------|---------|-------------|---------------|--------------|\n| `find()` | Find concentration patterns | `FindInput` | `FindOptions` | `FindOutput` |\n| `analyze()` | Statistical analysis | `AnalyzeInput` | `AnalyzeOptions` | `AnalyzeOutput` |\n| `compare()` | Temporal comparison | `CompareInput` | `CompareOptions` | `CompareOutput` |\n| `discover()` | Auto pattern discovery | `DiscoverInput` | `DiscoverOptions` | `DiscoverOutput` |\n| `tree()` | Hierarchical visualization | `TreeInput` | `TreeOptions` | `TreeOutput` |\n\n### Advanced Filtering Options\n\n```python\n# Complex analysis with multiple criteria\nresult = dataspot.find(\n    FindInput(\n        data=data,\n        fields=[\"country\", \"device\", \"payment\"],\n        query={\"country\": [\"US\", \"EU\"]}  # Pre-filter data\n    ),\n    FindOptions(\n        min_percentage=10.0,      # Only patterns with \u003e10% concentration\n        max_depth=3,             # Limit hierarchy 
depth\n        contains=\"mobile\",       # Must contain \"mobile\" in pattern\n        min_count=50,           # At least 50 records\n        sort_by=\"percentage\",   # Sort by concentration strength\n        limit=20                # Top 20 patterns\n    )\n)\n```\n\n## ⚡ Performance\n\nDataspot delivers consistent, predictable performance with exceptionally efficient memory usage and linear scaling.\n\n### 🚀 Real-World Performance\n\n| Dataset Size | Processing Time | Memory Usage | Patterns Found |\n|--------------|-----------------|---------------|----------------|\n| 1,000 records | **~5ms** | **~1.4MB** | 12 patterns |\n| 10,000 records | **~43ms** | **~2.8MB** | 12 patterns |\n| 100,000 records | **~375ms** | **~2.9MB** | 20 patterns |\n| 1,000,000 records | **~3.7s** | **~3.0MB** | 20 patterns |\n\n\u003e **Benchmark Methodology**: Performance measured using validated testing with 5 iterations per dataset size on a MacBook Pro (M-series). Test data specifications:\n\u003e\n\u003e - **JSON Size**: ~164 bytes per JSON record (~0.16 KB each)\n\u003e - **JSON Structure**: 8 keys per JSON record (`country`, `device`, `payment_method`, `amount`, `user_type`, `channel`, `status`, `id`)\n\u003e - **Analysis Scope**: 4 fields analyzed simultaneously (`country`, `device`, `payment_method`, `user_type`)\n\u003e - **Configuration**: `min_percentage=5.0`, `limit=50` patterns\n\u003e - **Results**: Finds 12 concentration patterns at 1,000 and 10,000 records and 20 at 100,000 and 1,000,000 records, as listed in the table above\n\u003e - **Variance**: Minimal timing variance (±1-6ms), demonstrating algorithmic stability\n\u003e - **Memory Efficiency**: Near-constant memory usage regardless of dataset size\n\n### 💡 Performance Tips\n\n```python\n# Optimize for speed\nresult = dataspot.find(\n    FindInput(data=large_dataset, fields=fields),\n    FindOptions(\n        min_percentage=10.0,    # Skip low-concentration patterns\n        max_depth=3,           # Limit hierarchy depth\n        limit=100             # Cap results\n    
)\n)\n\n# Memory efficient processing\nfrom dataspot.models.tree import TreeInput, TreeOptions\n\ntree = dataspot.tree(\n    TreeInput(data=data, fields=[\"country\", \"device\"]),\n    TreeOptions(min_value=10, top=5)  # Simplified tree\n)\n```\n\n## 📈 What Makes Dataspot Different?\n\n| **Traditional Clustering** | **Dataspot Analysis** |\n|---------------------------|---------------------|\n| Groups similar data points | **Finds concentration patterns** |\n| Equal-sized clusters | **Identifies where data accumulates** |\n| Distance-based | **Percentage and count based** |\n| Hard to interpret | **Business-friendly hierarchy** |\n| Generic approach | **Built for real-world analysis** |\n\n## 🎬 Dataspot in Action\n\n[View the algorithm](https://frauddi.github.io/dataspot/algorithm-dataspot.html)\n![Dataspot in action - Finding data concentration patterns](algorithm-dataspot.gif)\n\nSee Dataspot discover concentration patterns and dataspots in real-time with hierarchical analysis and statistical insights.\n\n## 📊 API Structure\n\n### Input Models\n\n- `FindInput` - Data and fields for pattern finding\n- `AnalyzeInput` - Statistical analysis configuration\n- `CompareInput` - Current vs baseline data comparison\n- `DiscoverInput` - Automatic pattern discovery\n- `TreeInput` - Hierarchical tree visualization\n\n### Options Models\n\n- `FindOptions` - Filtering and sorting for patterns\n- `AnalyzeOptions` - Statistical analysis parameters\n- `CompareOptions` - Change detection thresholds\n- `DiscoverOptions` - Auto-discovery constraints\n- `TreeOptions` - Tree structure customization\n\n### Output Models\n\n- `FindOutput` - Pattern discovery results with statistics\n- `AnalyzeOutput` - Enhanced analysis with insights and confidence scores\n- `CompareOutput` - Change detection results with significance tests\n- `DiscoverOutput` - Auto-discovery findings with field rankings\n- `TreeOutput` - Hierarchical tree structure with navigation\n\n## 🔧 Installation \u0026 
Requirements\n\n```bash\n# Install from PyPI\npip install dataspot\n\n# Development installation\ngit clone https://github.com/frauddi/dataspot.git\ncd dataspot\npip install -e \".[dev]\"\n```\n\n**Requirements:**\n\n- Python 3.9+\n- No heavy dependencies (just standard library + optional speedups)\n\n## 🛠️ Development Commands\n\n| Command | Description |\n|---------|-------------|\n| `make lint` | Check code for style and quality issues |\n| `make lint-fix` | Automatically fix linting issues where possible |\n| `make tests` | Run all tests with coverage reporting |\n| `make check` | Run both linting and tests |\n| `make clean` | Remove cache files, build artifacts, and temporary files |\n| `make install` | Create virtual environment and install dependencies |\n\n## 📚 Documentation \u0026 Examples\n\n- 📖 [User Guide](docs/user-guide.md) - Complete usage documentation\n- 💡 [Examples](examples/) - Real-world usage examples:\n  - `01_basic_query_filtering.py` - Query and filtering basics\n  - `02_pattern_filtering_basic.py` - Pattern-based filtering\n  - `06_real_world_scenarios.py` - Business use cases\n  - `08_auto_discovery.py` - Automatic pattern discovery\n  - `09_temporal_comparison.py` - A/B testing and change detection\n  - `10_stats.py` - Statistical analysis\n- 🤝 [Contributing](docs/CONTRIBUTING.md) - How to contribute\n\n## 🌟 Why Open Source?\n\nDataspot was born from real-world fraud detection needs at Frauddi. We believe powerful pattern analysis shouldn't be locked behind closed doors. By open-sourcing Dataspot, we hope to:\n\n- 🎯 **Advance fraud detection** across the industry\n- 🤝 **Enable collaboration** on pattern analysis techniques\n- 🔍 **Help companies** spot issues in their data\n- 📈 **Improve data quality** everywhere\n\n## 🤝 Contributing\n\nWe welcome contributions! 
Whether you're:\n\n- 🐛 Reporting bugs\n- 💡 Suggesting features\n- 📝 Improving documentation\n- 🔧 Adding new analysis methods\n\nSee our [Contributing Guide](docs/CONTRIBUTING.md) for details.\n\n## 📄 License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- **Created by [@3l1070r](https://github.com/3l1070r)** - Original algorithm and implementation\n- **Sponsored by [Frauddi](https://frauddi.com)** - Field testing and open source support\n- **Inspired by real fraud detection challenges** - Built to solve actual problems\n\n## 🔗 Links\n\n- 🏠 [Homepage](https://github.com/frauddi/dataspot)\n- 📦 [PyPI Package](https://pypi.org/project/dataspot/)\n- 🐛 [Issue Tracker](https://github.com/frauddi/dataspot/issues)\n\n---\n\n**Find your data's dataspots. Discover what others miss.**\nBuilt with ❤️ by [Frauddi](https://frauddi.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrauddi%2Fdataspot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrauddi%2Fdataspot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrauddi%2Fdataspot/lists"}