# Stable Bloom Filter (SBF) for Go

A **Stable Bloom Filter (SBF)** implementation in Go. It is an approximate membership data structure that supports element decay over time.

It lets you efficiently test whether an element is likely present in a set, with a configurable false positive rate. It also automatically forgets old elements through a decay mechanism.

## Features

- **Approximate Membership Testing**: Quickly check whether an element is likely in the set.
- **Element Decay**: Automatically removes old elements over time to prevent filter saturation.
- **Concurrent Access**: Safe for use by multiple goroutines simultaneously.
- **Customizable Parameters**: Configure the false positive rate, decay rate, and decay interval.
- **Optimized for Performance**: Efficient memory usage and fast operations.
- **Scalability**: Designed to handle large data sets and high-throughput applications.

## Table of Contents

- [Stable Bloom Filter (SBF) for Go](#stable-bloom-filter-sbf-for-go)
  - [Features](#features)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
  - [Quick Start](#quick-start)
  - [Usage Examples](#usage-examples)
    - [Detecting Duplicates Among Users](#detecting-duplicates-among-users)
  - [When to Use](#when-to-use)
  - [When Not to Use](#when-not-to-use)
  - [Parameters Explanation](#parameters-explanation)
  - [Scalability](#scalability)
    - [Concurrent Access](#concurrent-access)
    - [Memory Efficiency](#memory-efficiency)
    - [Horizontal Scaling](#horizontal-scaling)
    - [Example: Scaling with Multiple Filters](#example-scaling-with-multiple-filters)
    - [Considerations](#considerations)
  - [Performance Considerations](#performance-considerations)
  - [Limitations](#limitations)
  - [License](#license)

## Installation

```bash
go get github.com/Alfex4936/sbf-go
```

## Quick Start

Here's how to create a new Stable Bloom Filter and use it to add and check elements:

```go
package main

import (
    "fmt"
    "time"

    sbf "github.com/Alfex4936/sbf-go"
)

func main() {
    // Parameters for the Stable Bloom Filter
    expectedItems := uint32(1_000_000) // Expected number of items
    falsePositiveRate := 0.01          // Desired false positive rate (1%)

    // Create a Stable Bloom Filter; the last two arguments (0, 0) select the default decay settings
    sbfInstance, err := sbf.NewDefaultStableBloomFilter(expectedItems, falsePositiveRate, 0, 0)
    if err != nil {
        panic(err)
    }
    defer sbfInstance.StopDecay() // Stop the background decay goroutine on exit

    // Add an element
    element := []byte("example_element")
    sbfInstance.Add(element)

    // Check whether the element is in the filter
    if sbfInstance.Check(element) {
        fmt.Println("Element is probably in the set.")
    } else {
        fmt.Println("Element is not in the set (it was never added or has already decayed).")
    }

    // Wait for some time to let decay happen
    time.Sleep(2 * time.Minute)

    // Check again after decay
    if sbfInstance.Check(element) {
        fmt.Println("Element is probably still in the set.")
    } else {
        fmt.Println("Element has likely decayed from the set.")
    }
}
```

## Usage Examples

### Detecting Duplicates Among Users

In this example, we use the Stable Bloom Filter to detect duplicates in a stream of user registrations. This is useful for preventing duplicate entries, detecting replay attacks, or filtering repeated events.

```go
package main

import (
    "fmt"
    "math/rand"

    sbf "github.com/Alfex4936/sbf-go"
)

func main() {
    // Parameters for the Stable Bloom Filter
    expectedItems := uint32(1_000_000) // Expected number of items
    falsePositiveRate := 0.01          // Desired false positive rate (1%)

    // Create a Stable Bloom Filter with default decay settings
    sbfInstance, err := sbf.NewDefaultStableBloomFilter(expectedItems, falsePositiveRate, 0, 0)
    if err != nil {
        panic(err)
    }
    defer sbfInstance.StopDecay() // Stop the background decay goroutine on exit

    // No explicit seeding is needed: since Go 1.20 the global math/rand
    // generator is seeded randomly at startup (rand.Seed is deprecated).

    // Simulate a stream of usernames with potential duplicates
    totalUsers := 1_000_000
    maxUserID := totalUsers / 2 // Half as many IDs as registrations, so duplicates are guaranteed
    duplicateCount := 0

    for i := 0; i < totalUsers; i++ {
        // Generate a username (e.g., "user_123456")
        userID := rand.Intn(maxUserID)
        username := fmt.Sprintf("user_%d", userID)

        // Check whether the username might have been seen before
        if sbfInstance.Check([]byte(username)) {
            // Likely a duplicate
            duplicateCount++
        } else {
            // New username, add it to the filter
            sbfInstance.Add([]byte(username))
        }
    }

    fmt.Printf("Total users processed: %d\n", totalUsers)
    fmt.Printf("Potential duplicates detected: %d\n", duplicateCount)

    // Estimate the false positive rate after processing
    estimatedFPR := sbfInstance.EstimateFalsePositiveRate()
    fmt.Printf("Estimated false positive rate: %.6f\n", estimatedFPR)
}
```

**Example output** (exact numbers vary between runs because the user IDs are random):

```
Total users processed: 1000000
Potential duplicates detected: 567435
Estimated false positive rate: 0.000107
```

**Explanation:**

- We simulate one million user registrations with user IDs drawn uniformly from 0 to 499,999. Because there are twice as many registrations as possible IDs, many usernames must repeat.
- We use the SBF to check for duplicates:
  - If `Check` returns `true`, we increment `duplicateCount`.
  - If `Check` returns `false`, we add the username to the filter.
- The high number of duplicates is expected from the limited range of user IDs, not from false positives. With 1,000,000 draws from 500,000 IDs, about 500,000 × (1 - e^-2) ≈ 432,000 distinct IDs are expected, leaving roughly 568,000 repeats, which is close to the count reported above.
- The estimated false positive rate is very low (`0.0107%`), indicating that almost all detected duplicates are real duplicates.

## When to Use

- **High Throughput Systems**: Applications that require fast insertions and queries with minimal memory overhead.
- **Streaming Data**: Scenarios where data flows continuously and old data becomes less relevant over time.
- **Duplicate Detection**: Identifying duplicate events or entries without storing every element.
- **Cache Expiration**: Probabilistically determining whether an item is still fresh or should be re-fetched.
- **Approximate Membership Testing**: When exact membership is less important than speed and memory usage.

## When Not to Use

- **Exact Membership Required**: Applications that cannot tolerate false positives (or the false negatives introduced by decay), or that require exact deletion of elements.
- **Small Data Sets**: When the data set is small enough to be stored and managed with exact data structures.
- **High-Stakes Checks**: Scenarios where the cost of a false positive is too high (e.g., financial transactions, critical security checks).
- **Complex Deletion Requirements**: If you need to delete specific elements immediately, a Stable Bloom Filter is not suitable.

## Parameters Explanation

- **`expectedItems`**: The estimated number of unique items you expect to store in the filter. It is used to calculate the optimal size of the filter.
- **`falsePositiveRate`**: The desired probability of false positives. Lowering this value reduces false positives but increases memory usage.
- **`decayRate`**: The probability that each bit in the filter decays (is unset) during each decay interval. Default is `0.01` (1%).
- **`decayInterval`**: The time between decay operations. Default is `1 * time.Minute`.

**Choosing `decayRate` and `decayInterval`:**

- **Element Retention Time**: If you want elements to persist longer in the filter, decrease `decayRate` or increase `decayInterval`.
- **High Insertion Rate**: For applications with high insertion rates, you may need a higher `decayRate` or a shorter `decayInterval` to prevent the filter from becoming saturated.

## Scalability

The Stable Bloom Filter is designed to handle large data sets and high-throughput applications efficiently:

### Concurrent Access

- **Thread-Safe Operations**: The SBF implementation uses atomic operations, so multiple goroutines can use it concurrently without external locking.
- **High Throughput**: Insertion (`Add`) and query (`Check`) run in `O(k)` time, where `k` is the number of hash functions, independent of how many elements have been added. This lets the filter sustain a high rate of operations per second.

### Memory Efficiency

- **Low Memory Footprint**: The SBF represents large sets in little memory. Memory usage grows with the expected number of items and as the target false positive rate is lowered. For reference, an optimally sized standard Bloom filter needs about 9.6 bits per item for a 1% false positive rate; this implementation's exact sizing may differ.
- **Configurable Parameters**: Adjusting `falsePositiveRate` and `expectedItems` lets you fit the filter to your application's memory constraints and performance requirements.

### Horizontal Scaling

- **Sharding Filters**: For extremely large data sets or to distribute load, you can partition your data across multiple SBF instances (shards). Each shard handles a subset of the data, allowing the system to scale horizontally.
- **Distributed Systems**: In distributed environments, you can deploy SBF instances across multiple nodes, either with each node maintaining its own filter or with filters shared through a coordination mechanism.

### Example: Scaling with Multiple Filters

```go
package main

import (
    "fmt"
    "hash/fnv"

    sbf "github.com/Alfex4936/sbf-go"
)

// Number of shards (e.g., based on the number of CPUs or nodes)
const numShards = 10

// getShardIndex picks a shard by hashing the element with FNV-1a.
func getShardIndex(element []byte) int {
    h := fnv.New32a()
    h.Write(element) // Write on a hash.Hash never returns an error
    return int(h.Sum32() % numShards)
}

func main() {
    expectedItems := uint32(1_000_000) // Expected number of items across all shards
    falsePositiveRate := 0.01          // Desired false positive rate (1%)

    sbfShards := make([]*sbf.StableBloomFilter, numShards)

    // Initialize each shard with an equal share of the expected items
    for i := 0; i < numShards; i++ {
        sbfInstance, err := sbf.NewDefaultStableBloomFilter(expectedItems/numShards, falsePositiveRate, 0, 0)
        if err != nil {
            panic(err)
        }
        sbfShards[i] = sbfInstance
        defer sbfInstance.StopDecay() // Stop each shard's decay goroutine when main returns
    }

    // Add and Check for the same element are routed to the same shard
    element := []byte("example_element")
    shard := sbfShards[getShardIndex(element)]
    shard.Add(element)

    if shard.Check(element) {
        fmt.Println("Element is probably in the set.")
    }
}
```

**Explanation:**

- **Sharding Logic**: Elements are distributed among shards by a hash of the element (FNV-1a from the standard library's `hash/fnv` here; any well-distributed hash works). Because `Add` and `Check` for the same element always map to the same shard, each query consults only one filter. This reduces the load on individual filters and lets the system handle more data and higher throughput.
- **Scalability**: By adding more shards, you can scale horizontally to accommodate growing data volumes or higher performance demands.

### Considerations

- **Consistent Hashing**: The modulo scheme above remaps most elements whenever the number of shards changes. Use consistent hashing to minimize this redistribution when adding or removing shards.
- **Synchronization**: In some cases you might need to synchronize filters or handle cross-shard queries, which adds complexity.
- **Monitoring and Balancing**: Monitor the load on each shard to ensure an even distribution, and adjust the sharding strategy if necessary.

## Performance Considerations

- **Memory Efficiency**: Bloom filters are space-efficient, representing large sets in little memory.
- **Fast Operations**: Both insertion (`Add`) and query (`Check`) run in `O(k)` time, where `k` is the number of hash functions, regardless of how many elements the filter holds.
- **Concurrency**: The implementation is safe for concurrent use by multiple goroutines without external locking.
- **Decay Overhead**: The decay process runs in a separate goroutine. Its overhead is small but should be considered in resource-constrained environments.

## Limitations

- **No Deletion of Specific Elements**: You cannot remove specific elements from the filter. Elements decay over time according to the decay parameters, which also means an element that was added can later test negative.
- **False Positives**: The filter can return false positives (i.e., it may indicate that an element is present when it is not). The false positive rate is configurable but cannot be eliminated entirely.
- **Not Suitable for Counting**: If you need to count occurrences of elements, consider using a Counting Bloom Filter instead.
- **Sharding Complexity**: While sharding allows horizontal scaling, it adds the complexity of managing multiple filters and ensuring consistent hashing.

## License

This project is licensed under the MIT License.