{"id":33901102,"url":"https://github.com/gawryco/fate","last_synced_at":"2025-12-11T23:46:21.691Z","repository":{"id":324085903,"uuid":"1095962806","full_name":"gawryco/fate","owner":"gawryco","description":"High-performance probabilistic data structures for Elixir","archived":false,"fork":false,"pushed_at":"2025-11-13T19:07:45.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-13T19:25:11.633Z","etag":null,"topics":["bloom-filter","cuckoo-filter","elixir","elixir-lang","hyperloglog","probabilistic-data-structures"],"latest_commit_sha":null,"homepage":"https://hexdocs.pm/fate/","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gawryco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-13T18:48:53.000Z","updated_at":"2025-11-13T19:07:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gawryco/fate","commit_stats":null,"previous_names":["gawryco/fate"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/gawryco/fate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gawryco%2Ffate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gawryco%2Ffate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gawryco%2Ffate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/gawryco%2Ffate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gawryco","download_url":"https://codeload.github.com/gawryco/fate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gawryco%2Ffate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27672385,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-11T02:00:11.302Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","cuckoo-filter","elixir","elixir-lang","hyperloglog","probabilistic-data-structures"],"created_at":"2025-12-11T23:46:21.100Z","updated_at":"2025-12-11T23:46:21.677Z","avatar_url":"https://github.com/gawryco.png","language":"Elixir","readme":"# Fate\n\n[![Hex.pm](https://img.shields.io/hexpm/v/fate.svg)](https://hex.pm/packages/fate)\n[![Hex.pm](https://img.shields.io/hexpm/dt/fate.svg)](https://hex.pm/packages/fate)\n[![CI](https://github.com/gawryco/fate/workflows/CI/badge.svg)](https://github.com/gawryco/fate/actions)\n[![Coverage](https://img.shields.io/badge/coverage-87%25-green)](https://github.com/gawryco/fate)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/assets/fate-logo.png\" alt=\"Fate logo\" width=\"260\"\u003e\n\u003c/p\u003e\n\n**High-performance probabilistic data structures for 
Elixir**\n\nFate provides concurrent, space-efficient implementations of probabilistic data structures backed by `:atomics` for thread-safe operations. It is well suited to membership testing and cardinality estimation at scale.\n\n## Table of Contents\n\n- [Features](#features)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Data Structures](#data-structures)\n  - [Bloom Filter](#bloom-filter)\n  - [Cuckoo Filter](#cuckoo-filter)\n  - [HyperLogLog](#hyperloglog)\n- [Configuration](#configuration)\n- [Performance](#performance)\n- [When to Use What](#when-to-use-what)\n- [Documentation](#documentation)\n- [Testing](#testing)\n- [Implementation Details](#implementation-details)\n- [Roadmap](#roadmap)\n- [Contributing](#contributing)\n- [License](#license)\n- [References](#references)\n- [Acknowledgments](#acknowledgments)\n\n## Features\n\n- 🚀 **High Performance**: Optimized implementations matching or exceeding Erlang reference libraries\n- 🔒 **Concurrent**: Thread-safe operations using `:atomics` with lock-free reads\n- 🎯 **Pluggable Hashing**: Support for multiple hash functions (xxHash, Murmur3, XXH3, FNV1a)\n- 📊 **Accurate Cardinality**: HyperLogLog delivers lock-free distinct counting at sub-millisecond speeds\n- 📦 **Zero Dependencies**: Core functionality requires no external dependencies\n- 🧪 **Well Tested**: Comprehensive test coverage (~87%) with 61+ tests\n- 🔄 **Production Ready**: Battle-tested algorithms with comprehensive error handling\n\n## Installation\n\nAdd `fate` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [\n    {:fate, \"~\u003e 0.1.0\"}\n  ]\nend\n```\n\nThen run `mix deps.get` to install.\n\n### Optional Hash Function Dependencies\n\nFor better performance, install one or more hash function libraries:\n\n```elixir\ndef deps do\n  [\n    {:fate, \"~\u003e 0.1.0\"},\n    # Choose one or more:\n    {:xxh3, \"~\u003e 0.3\"},      # Fastest for most workloads\n    {:xxhash, \"~\u003e 0.3\"},    # Fast general-purpose hash\n    {:murmur, \"~\u003e 2.0\"}     # Good distribution\n  ]\nend\n```\n\nFate will automatically select the best 
available hash function, or you can specify one explicitly.\n\n## Quick Start\n\n```elixir\n# Bloom Filter - Fast membership testing\nbloom = Fate.Filter.Bloom.new(1000, false_positive_probability: 0.01)\nFate.Filter.Bloom.put(bloom, \"user:123\")\nFate.Filter.Bloom.member?(bloom, \"user:123\")  # =\u003e true\n\n# Cuckoo Filter - Membership testing with deletion\ncuckoo = Fate.Filter.Cuckoo.new(1000)\nFate.Filter.Cuckoo.put(cuckoo, \"session:abc\")\nFate.Filter.Cuckoo.delete(cuckoo, \"session:abc\")  # =\u003e :ok\nFate.Filter.Cuckoo.member?(cuckoo, \"session:abc\")  # =\u003e false\n```\n\n## Data Structures\n\n### Bloom Filter\n\nSpace-efficient probabilistic set membership testing with configurable false-positive rates.\n\n**Key Features:**\n- Configurable false-positive probability\n- Cardinality estimation\n- Serialization/deserialization\n- Set operations (merge, intersection)\n- No deletion support\n\n**Example:**\n\n```elixir\nalias Fate.Filter.Bloom\n\n# Create a filter for ~1000 items with 1% false positive rate\nbloom = Bloom.new(1000, false_positive_probability: 0.01)\n\n# Insert items\nBloom.put(bloom, \"user:123\")\nBloom.put(bloom, \"user:456\")\n\n# Check membership\nBloom.member?(bloom, \"user:123\")  # =\u003e true\nBloom.member?(bloom, \"user:789\")  # =\u003e false (probably)\n\n# Get statistics\nBloom.cardinality(bloom)  # =\u003e ~2\nBloom.false_positive_probability(bloom)  # =\u003e ~0.01\nBloom.bits_info(bloom)    # =\u003e %{total_bits: ..., set_bits_count: ..., set_ratio: ...}\n\n# Serialize for storage\nbinary = Bloom.serialize(bloom)\nrestored = Bloom.deserialize(binary)\n\n# Merge multiple filters\nmerged = Bloom.merge([bloom1, bloom2, bloom3])\nintersected = Bloom.intersection([bloom1, bloom2])\n```\n\n**Performance**: ~2x faster than existing Elixir implementations\n\n### Cuckoo Filter\n\nCompact filter with deletion support, making it more versatile than Bloom filters.\n\n**Key Features:**\n- Item deletion support (unlike Bloom 
filters)\n- Dynamic insertion with bounded relocation\n- Exact item count tracking\n- Serialization/deserialization\n- Set operations (merge, intersection)\n- Statistics and analytics\n\n**Example:**\n\n```elixir\nalias Fate.Filter.Cuckoo\n\n# Create a filter for ~1000 items\ncuckoo = Cuckoo.new(1000)\n\n# Insert items\n:ok = Cuckoo.put(cuckoo, \"session:abc\")\n:ok = Cuckoo.put(cuckoo, \"session:def\")\n\n# Check membership\nCuckoo.member?(cuckoo, \"session:abc\")  # =\u003e true\nCuckoo.member?(cuckoo, \"session:xyz\")  # =\u003e false\n\n# Delete items (unique to Cuckoo filters!)\n:ok = Cuckoo.delete(cuckoo, \"session:abc\")\nCuckoo.member?(cuckoo, \"session:abc\")  # =\u003e false\n\n# Check capacity and load\nCuckoo.size(cuckoo)         # =\u003e 1\nCuckoo.capacity(cuckoo)     # =\u003e 1000\nCuckoo.load_factor(cuckoo)  # =\u003e 0.0002...\n\n# Get statistics\nCuckoo.bits_info(cuckoo)              # =\u003e %{total_slots: ..., occupied_slots: ..., ...}\nCuckoo.cardinality(cuckoo)            # =\u003e 1 (same as size for Cuckoo)\nCuckoo.false_positive_probability(cuckoo)  # =\u003e ~0.0001\n\n# Serialize for storage\nbinary = Cuckoo.serialize(cuckoo)\nrestored = Cuckoo.deserialize(binary)\n\n# Merge multiple filters\nmerged = Cuckoo.merge([cuckoo1, cuckoo2, cuckoo3])\nintersected = Cuckoo.intersection([cuckoo1, cuckoo2])\n\n# Handle full filter\ncase Cuckoo.put(cuckoo, item) do\n  :ok -\u003e :inserted\n  {:error, :full} -\u003e :filter_full\nend\n```\n\n**Performance**: On par with Erlang reference implementation when using the same hash function\n\n### HyperLogLog\n\nApproximate distinct counter with configurable precision and lock-free updates, ideal for large-scale analytics.\n\n**Key Features:**\n- Precision range 4–18 with \u003c1% typical relative error at default settings\n- Lock-free concurrent updates backed by `:atomics`\n- Supports `add_hashed/2` for workloads that already have 64-bit hashes\n- Mergeable sketches with serialization 
support\n\n**Example:**\n\n```elixir\nalias Fate.Cardinality.HyperLogLog\n\n# Create a sketch with default precision (14)\nhll = HyperLogLog.new()\n\n# Add raw items using the selected hash module\nEnum.each(1..1_000, \u0026HyperLogLog.add(hll, \u00261))\n\n# Or skip hashing if you already have 64-bit hashes\nEnum.each(1..1_000, fn value -\u003e\n  hash = :erlang.phash2({value, 0})\n  HyperLogLog.add_hashed(hll, hash)\nend)\n\n# Estimate distinct count\nHyperLogLog.cardinality(hll)  # =\u003e ~1000\n\n# Merge sketches\nmerged = HyperLogLog.merge([hll, HyperLogLog.new()])\n```\n\n**Performance**: Outruns Erlang `HLL` and Elixir `Hypex` when fed pre-hashed values (see [Performance](#performance))\n\n## Configuration\n\n### Custom Hash Functions\n\n```elixir\n# Specify a hash function explicitly\nbloom = Fate.Filter.Bloom.new(1000, hash_module: Fate.Hash.XXH3)\ncuckoo = Fate.Filter.Cuckoo.new(1000, hash_module: Fate.Hash.Murmur3)\n\n# Or let Fate choose the best available\nhash_module = Fate.Hash.module()  # Auto-selects best available\n```\n\n### Advanced Configuration\n\n```elixir\n# Bloom filter with custom parameters\nbloom = Fate.Filter.Bloom.new(10_000,\n  false_positive_probability: 0.001,  # 0.1% FPP\n  hash_count: 10,                     # Override optimal k\n  hash_module: Fate.Hash.XXH3\n)\n\n# Cuckoo filter with custom parameters\ncuckoo = Fate.Filter.Cuckoo.new(10_000,\n  bucket_size: 4,          # Slots per bucket (default: 4)\n  fingerprint_bits: 16,   # Bits per fingerprint (default: 16)\n  load_factor: 0.95,      # Target load before failures (default: 0.95)\n  max_kicks: 100,         # Max relocation attempts (default: 100)\n  hash_module: Fate.Hash.Default\n)\n```\n\n## Performance\n\nBenchmarks on Intel Xeon E5-2695 v4 @ 2.10GHz:\n\n### Bloom Filter vs Talan.BloomFilter\n\n```\n# Insert 10k items\nFate.Bloom put/2         12.82 ips (78.00 ms)\nTalan.BloomFilter put/2   5.74 ips (174.21 ms) - 2.23x slower\n\n# Lookup 10k items\nFate.Bloom member?/2  
   12.89 ips (77.59 ms)\nTalan.BloomFilter        6.09 ips (164.29 ms) - 2.12x slower\n```\n\n### Cuckoo Filter vs :cuckoo_filter (Erlang)\n\n```\n# Insert 10k items (with phash2)\nFate.Cuckoo put/2         ~50 ips (~20 ms)\n:cuckoo_filter.add/2      ~55 ips (~18 ms) - 1.1x faster\n\n# Lookup 10k items (with phash2)\nFate.Cuckoo member?/2     121 ips (8.24 ms)\n:cuckoo_filter.contains   133 ips (7.53 ms) - 1.09x faster\n```\n\n**Note**: Performance is on par when using the same hash function. Differences are within measurement variance.\n\nRun benchmarks locally:\n\n```bash\nmix run bench/bloom_cmp.exs\nmix run bench/cuckoo_cmp.exs\n```\n\n### HyperLogLog vs HLL / Hypex\n\n```\n# Insert 1M hashed values (phash2)\nFate.HyperLogLog add_hashed/2   5.22 ips (191.75 ms)\nHLL.add/2                       3.19 ips (313.88 ms) - 1.64x slower\nHypex.update/2                  2.68 ips (373.43 ms) - 1.95x slower\n\n# Insert 1k raw values\nFate.HyperLogLog add/2          1.92 K ips (520.71 µs)\nHLL.add/2                       2.86 K ips (349.85 µs) - 1.49x faster\nHypex.update/2                  1.15 K ips (872.18 µs) - 1.67x slower\n\n# Cardinality (1M values)\nFate.HyperLogLog cardinality    1.24 K ips (805.81 µs)\nHLL.cardinality                 0.72 K ips (1397.38 µs) - 1.73x slower\nHypex.cardinality               1.15 K ips (867.12 µs) - 1.08x slower\n```\n\n## When to Use What\n\n### Use Bloom Filter when:\n- ✅ You only need membership testing (no deletions)\n- ✅ You want to minimize memory usage\n- ✅ You need set operations (merge/intersection)\n- ✅ False positives are acceptable\n- ✅ You need cardinality estimation\n\n### Use Cuckoo Filter when:\n- ✅ You need to delete items\n- ✅ You want bounded false-positive rates\n- ✅ You need exact item counts\n- ✅ You need set operations with deletion support\n- ✅ Slightly higher memory usage than Bloom is acceptable\n\n### Use HyperLogLog when:\n- ✅ You need approximate distinct counts with fixed memory\n- ✅ Lock-free concurrent 
updates are important\n- ✅ Sub-millisecond cardinality queries matter\n- ✅ You can tolerate small relative error (≈1% by default)\n- ✅ You already hash keys and want to reuse those 64-bit hashes (via `add_hashed/2`)\n\n## Documentation\n\nFull documentation is available on [HexDocs](https://hexdocs.pm/fate) (when published).\n\nGenerate local documentation:\n\n```bash\nmix docs\n```\n\n## Testing\n\nThe project includes comprehensive test coverage:\n\n```bash\n# Run tests\nmix test\n\n# Run tests with coverage\nmix coveralls\n\n# Generate HTML coverage report\nmix coveralls.html\n```\n\n**Current Coverage**: ~87% overall\n- `Fate.Filter.Bloom`: 97.3%\n- `Fate.Filter.Cuckoo`: 89.4%\n- `Fate.Hash`: 52.3% (expected - many hash modules are optional)\n\n## Implementation Details\n\n- **Bloom Filter**: Uses double hashing with configurable hash functions, stores bits in `:atomics` words with lock-free CAS operations. Optimized with direct recursion to avoid list allocations.\n- **Cuckoo Filter**: Follows the Erlang reference implementation with bitpacked bucket storage, eviction caching, and bounded relocation. Uses fast bit-mixing for alternate index calculation.\n- **Hash Functions**: Pluggable via `Fate.Hash` behaviour, with runtime availability checking. Supports XXH3, XXHash, Murmur3, FNV1a, and Erlang's `phash2`.\n\n## Roadmap\n\nFuture data structures planned:\n- Count-Min Sketch (frequency estimation)\n- Quotient Filter (compact alternative to Cuckoo)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n### Development Setup\n\n```bash\n# Clone the repository\ngit clone https://github.com/YOUR_USERNAME/fate.git\ncd fate\n\n# Install dependencies\nmix deps.get\n\n# Run tests\nmix test\n\n# Run benchmarks\nmix run bench/bloom_cmp.exs\nmix run bench/cuckoo_cmp.exs\n\n# Format code\nmix format\n\n# Check formatting\nmix format --check-formatted\n```\n\n### Contribution Guidelines\n\n1. Fork the repository\n2. 
Create your feature branch (`git checkout -b feature/amazing-feature`)\n3. Make sure tests pass (`mix test`)\n4. Ensure code is formatted (`mix format`)\n5. Add tests for new functionality\n6. Update documentation as needed\n7. Commit your changes (`git commit -m 'Add some amazing feature'`)\n8. Push to the branch (`git push origin feature/amazing-feature`)\n9. Open a Pull Request\n\n### Code Style\n\n- Follow the Elixir style guide\n- Use `mix format` before committing\n- Write descriptive commit messages\n- Add tests for new features\n- Update documentation for API changes\n\n## License\n\nCopyright (c) 2025 Gustavo Gawryszeski\n\nLicensed under the MIT License. See [LICENSE](LICENSE) for details.\n\n## References\n\n- [Bloom Filter (Wikipedia)](https://en.wikipedia.org/wiki/Bloom_filter)\n- [Cuckoo Filter: Practically Better Than Bloom](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)\n- [Erlang cuckoo_filter](https://github.com/farhadi/cuckoo_filter)\n\n## Acknowledgments\n\n- Inspired by the Erlang `cuckoo_filter` implementation\n- Built with performance and concurrency in mind\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgawryco%2Ffate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgawryco%2Ffate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgawryco%2Ffate/lists"}