{"id":27156026,"url":"https://github.com/poabob/counting-bloom-filter","last_synced_at":"2025-09-10T05:09:59.096Z","repository":{"id":285356116,"uuid":"957838299","full_name":"POABOB/counting-bloom-filter","owner":"POABOB","description":"A Counting Bloom Filter Easy to Use.","archived":false,"fork":false,"pushed_at":"2025-03-31T08:13:10.000Z","size":7,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-01T00:30:35.016Z","etag":null,"topics":["bloom-filter","counting-bloom-filter","go","golang","memory-management"],"latest_commit_sha":null,"homepage":"https://pkg.go.dev/github.com/POABOB/counting-bloom-filter","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/POABOB.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-31T08:08:43.000Z","updated_at":"2025-04-01T03:20:41.000Z","dependencies_parsed_at":"2025-03-31T09:24:32.062Z","dependency_job_id":"db6fd19a-05ab-48b9-8134-dfb0643ffa89","html_url":"https://github.com/POABOB/counting-bloom-filter","commit_stats":null,"previous_names":["poabob/counting-bloom-filter"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/POABOB/counting-bloom-filter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POABOB%2Fcounting-bloom-filter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POABOB%2Fcounting-bloom-filter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POABOB%2Fcounting-bloom-filter/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POABOB%2Fcounting-bloom-filter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/POABOB","download_url":"https://codeload.github.com/POABOB/counting-bloom-filter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POABOB%2Fcounting-bloom-filter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274416132,"owners_count":25280888,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-10T02:00:12.551Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","counting-bloom-filter","go","golang","memory-management"],"created_at":"2025-04-08T19:57:36.911Z","updated_at":"2025-09-10T05:09:59.048Z","avatar_url":"https://github.com/POABOB.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Counting Bloom Filter\n\n`Counting Bloom Filter` is a `Bloom Filter that uses counters`: it can `store elements` and supports `element expiration`.\nThe filter suits scenarios that need to `store large amounts of data efficiently` and `query whether an element exists`, and it `allows expired elements to be cleaned up`.\n\n## 1. 
Usage\n\n### Installation\n\n```bash\ngo get github.com/POABOB/counting-bloom-filter\n```\n\n### Creating a Counting Bloom Filter\n\nYou can create a new Counting Bloom Filter with either `NewCountingBloomFilter` or `NewDefaultCountingBloomFilter`.\n\n#### Counting Bloom Filter with a custom size and expiration strategy\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"time\"\n\n\tbloom \"github.com/POABOB/counting-bloom-filter\"\n)\n\nfunc main() {\n\t// Create a 1MB Counting Bloom Filter that lazily expires about 1/10 of its elements every 30 seconds\n\tcbf := bloom.NewCountingBloomFilter(1*1024*1024, bloom.WithExpiryDuration(bloom.LAZY_EXPIRATION, 30*time.Second))\n\n\t// Add elements\n\tcbf.Add(\"item1\")\n\tcbf.Add(\"item2\")\n\n\t// Check whether elements exist\n\tfmt.Println(cbf.Check(\"item1\")) // true\n\tfmt.Println(cbf.Check(\"item2\")) // true\n\tfmt.Println(cbf.Check(\"item3\")) // false (barring a false positive)\n}\n```\n\n#### Counting Bloom Filter with default settings\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\tbloom \"github.com/POABOB/counting-bloom-filter\"\n)\n\nfunc main() {\n\t// Create a Counting Bloom Filter with the default size and settings\n\tcbf := bloom.NewDefaultCountingBloomFilter()\n\n\t// Add elements\n\tcbf.Add(\"item1\")\n\tcbf.Add(\"item2\")\n\n\t// Check whether elements exist\n\tfmt.Println(cbf.Check(\"item1\")) // true\n\tfmt.Println(cbf.Check(\"item2\")) // true\n\tfmt.Println(cbf.Check(\"item3\")) // false (barring a false positive)\n}\n```\n\n### Methods\n\n- `Add(item string)`: adds `item` to the Counting Bloom Filter.\n- `Check(item string) bool`: reports whether `item` is in the Bloom Filter. Returns `false` if the element has expired or was never added.\n- `Remove(item string)`: removes `item` from the Bloom Filter.\n- `RemoveAll()`: removes all elements from the Bloom Filter.\n\n## 2. 
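Estimating the False Positive Rate\n\nThe `false positive rate (FPR)` of a Bloom Filter is the `probability that the filter wrongly reports an element as present`. By design, the FPR depends on:\n- `m`, the size of the Bloom Filter in bits.\n- `K`, the number of hash functions.\n- `n`, the number of inserted elements.\n\nThe FPR can be computed from these factors. Assume:\n\n- `12` hash functions (`K = 12`).\n- An average of `33,333` elements.\n- A size of `1MB` (that is, `1024 * 1024` bits).\n\nFor this configuration, the FPR is given by:\n\n```text\nFPR ≈ (1 - e^(-K * n / m))^K\n```\n\nEvaluating the formula gives an FPR of about `1.04e-06` for `m = 1024 * 1024`. The figure `0.00000165` (`1.65e-06`) corresponds to the rounded ratio `m = 30 * n`, that is `m = 1,000,000`. Either way, the FPR is very low at an average of 33,333 inserted elements.\n\nYou can therefore size `m (number of bits)` from `n (average number of inserted elements)` to balance performance and accuracy.\n\nTo reach an error rate of `1.65e-06` with `an average of 1 million elements`, set `m = 1_000_000 * 30`, which this README refers to as an `m` of `30MiB`.\n\n## 3. Defaults\n- `m`: the default size is `1MB` (`1024 * 1024` bits); with about 33,333 stored elements the FPR is on the order of `1e-06` (see section 2).\n- `K`: `12` hash functions are used by default, which is intended to keep the FPR within a reasonable range even under high concurrency.\n\n## 4. Configuration Options\n\nWhen creating a `CountingBloomFilter`, you can pass custom options to control its behavior, such as the expiration strategy and the cleanup interval.\n\n### Supported expiration strategies:\n- `NO_EXPIRATION`: elements never expire.\n- `LAZY_EXPIRATION`: simulates lazy expiration (implemented by decrementing counters); each pass `removes about 1/10 of the elements`.\n- `RESET_EVERY_PERIOD`: periodically `resets the counters of all elements`.\n- `EXPIRY_DURATION`: time-based expiration; `an element is removed once it expires`. Memory usage is higher, so it is not recommended for `GB`-scale data.\n\nFor example, to set a custom expiration strategy and cleanup interval:\n```go\npackage main\n\nimport (\n\t\"time\"\n\n\tbloom \"github.com/POABOB/counting-bloom-filter\"\n)\n\nfunc main() {\n\t// Lazy expiration with a 60-second cleanup interval\n\topts := bloom.WithExpiryDuration(bloom.LAZY_EXPIRATION, 60*time.Second)\n\tcbf := bloom.NewCountingBloomFilter(1*1024*1024, opts)\n\t_ = cbf // use cbf.Add / cbf.Check as in section 1\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpoabob%2Fcounting-bloom-filter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpoabob%2Fcounting-bloom-filter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpoabob%2Fcounting-bloom-filter/lists"}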