{"id":47300536,"url":"https://github.com/tdortman/Cuckoo-GPU","last_synced_at":"2026-03-31T06:00:41.022Z","repository":{"id":331092839,"uuid":"1125497686","full_name":"tdortman/Cuckoo-GPU","owner":"tdortman","description":"High-Performance GPU Cuckoo Filter","archived":false,"fork":false,"pushed_at":"2026-03-15T01:32:01.000Z","size":4140,"stargazers_count":35,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-15T13:27:02.649Z","etag":null,"topics":["bloom-filter","cuckoo-filter","cuda","gpu","membership-query","probabilistic-data-structures","probabilistic-filters"],"latest_commit_sha":null,"homepage":"http://tdortman.github.io/Cuckoo-GPU/","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tdortman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-30T20:49:50.000Z","updated_at":"2026-03-15T01:32:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/tdortman/Cuckoo-GPU","commit_stats":null,"previous_names":["tiltedtoast/cuckoo-filter","tdortman/cuckoo-filter","tdortman/cuckoo-gpu"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tdortman/Cuckoo-GPU","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tdortman%2FCuckoo-GPU","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tdortman%2FCuckoo-GPU/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/tdortman%2FCuckoo-GPU/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tdortman%2FCuckoo-GPU/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tdortman","download_url":"https://codeload.github.com/tdortman/Cuckoo-GPU/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tdortman%2FCuckoo-GPU/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31223286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-31T04:08:55.938Z","status":"ssl_error","status_checked_at":"2026-03-31T04:08:47.883Z","response_time":111,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","cuckoo-filter","cuda","gpu","membership-query","probabilistic-data-structures","probabilistic-filters"],"created_at":"2026-03-17T01:38:19.960Z","updated_at":"2026-03-31T06:00:41.017Z","avatar_url":"https://github.com/tdortman.png","language":"Cuda","funding_links":[],"categories":["Cuda"],"sub_categories":[],"readme":"# Cuckoo-GPU\n\n[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://tdortman.github.io/Cuckoo-GPU/)\n[![arXiv](https://img.shields.io/badge/arXiv-2603.15486-b31b1b.svg)](https://arxiv.org/abs/2603.15486)\n\nA high-performance, lock-free CUDA implementation of the Cuckoo Filter. 
This library is the companion code for the paper **\"Cuckoo-GPU: Accelerating Cuckoo Filters on Modern GPUs\"**.\n\n## Overview\n\nThis library provides a GPU-accelerated Cuckoo Filter implementation optimized for high-throughput batch operations. Cuckoo Filters are space-efficient probabilistic data structures that support insertion, lookup, and deletion operations with a configurable false positive rate.\n\n## Features\n\n- CUDA-accelerated batch insert, lookup, and delete operations\n- Configurable fingerprint size and bucket size\n- Multiple eviction policies (DFS, BFS)\n- Sorted insertion mode for improved memory coalescing\n- Multi-GPU support via [gossip](https://github.com/Funatiq/gossip)\n- Experimental IPC support for cross-process filter sharing\n- Header-only library design\n\n## Performance\n\n![image](./docs/load_factor_bar_95.png)\n\nBenchmarks at 95% load factor on an NVIDIA GH200 (H100 HBM3, 3.4 TB/s) with 16-bit fingerprints and equivalent space allocation for the Blocked Bloom Filter. 
The PCF runs on an Intel Xeon W9-3595X CPU (120 threads).\n\nCuckoo-GPU is compared against:\n\n- [CPU Partitioned Cuckoo Filter (PCF)](https://github.com/tum-db/partitioned-filters)\n- [GPU Bulk Two-Choice Filter (TCF)](https://github.com/saltsystemslab/gpu-filters/tree/main/bulk-tcf)\n- [GPU Counting Quotient Filter (GQF)](https://github.com/saltsystemslab/gpu-filters/tree/main/gqf)\n- [GPU Blocked Bloom Filter (GBBF)](https://github.com/NVIDIA/cuCollections)\n- [GPU Bucketed Cuckoo Hash Table (BCHT)](https://github.com/owensgroup/BGHT)\n\n### L2-Resident (4M items, ~8 MiB)\n\n| Comparison         | Insert         | Query        | Delete      | FPR              |\n| ------------------ | -------------- | ------------ | ----------- | ---------------- |\n| Cuckoo-GPU vs PCF  | 175× faster    | 351× faster  | N/A         | 0.046% vs 0.011% |\n| Cuckoo-GPU vs TCF  | 4× faster      | 35× faster   | 108× faster | 0.046% vs 0.409% |\n| Cuckoo-GPU vs GQF  | 378× faster    | 6× faster    | 258× faster | 0.046% vs 0.001% |\n| Cuckoo-GPU vs GBBF | 0.35× (slower) | 1.2× faster  | N/A         | 0.046% vs 2.503% |\n| Cuckoo-GPU vs BCHT | 11.3× faster   | 40.9× faster | N/A         | 0.046% vs 0%     |\n\n### DRAM-Resident (268M items, ~512 MiB)\n\n| Comparison         | Insert        | Query         | Delete       | FPR              |\n| ------------------ | ------------- | ------------- | ------------ | ---------------- |\n| Cuckoo-GPU vs PCF  | 69× faster    | 143× faster   | N/A          | 0.044% vs 0.010% |\n| Cuckoo-GPU vs TCF  | 2.1× faster   | 9.9× faster   | 44.9× faster | 0.044% vs 0.467% |\n| Cuckoo-GPU vs GQF  | 10.1× faster  | 2.6× faster   | 3.6× faster  | 0.044% vs 0.001% |\n| Cuckoo-GPU vs GBBF | 0.7× (slower) | 0.9× (slower) | N/A          | 0.044% vs 6.092% |\n| Cuckoo-GPU vs BCHT | 8.5× faster   | 15.9× faster  | N/A          | 0.044% vs 0%     |\n\n\u003e [!NOTE]\n\u003e A much more comprehensive evaluation, including additional systems and analyses, is 
presented in the [accompanying thesis](https://tdortman.github.io/thesis/thesis.pdf).\n\n## Requirements\n\n- CUDA Toolkit (\u003e= 12.9)\n- C++20 compatible compiler\n- Meson build system (\u003e= 1.3.0)\n\n## Building\n\n```bash\nmeson setup build\nmeson compile -C build\n```\n\nBenchmarks and tests are built by default. To disable them:\n\n```bash\nmeson setup build -DBUILD_BENCHMARKS=false -DBUILD_TESTS=false\n```\n\n## Usage\n\n```cpp\n#include \u003ccuckoogpu/CuckooFilter.cuh\u003e\n\n// Configure the filter: key type, fingerprint bits, max evictions, block size, bucket size\nusing Config = cuckoogpu::Config\u003cuint64_t, 16, 500, 256, 16\u003e;\n\n// Create a filter with the desired capacity\ncuckoogpu::Filter\u003cConfig\u003e filter(1 \u003c\u003c 20);  // capacity for ~1M items\n\n// Insert keys (d_keys is a device pointer)\nfilter.insertMany(d_keys, numKeys);\n\n// Or use sorted insertion\nfilter.insertManySorted(d_keys, numKeys);\n\n// Check membership\nfilter.containsMany(d_keys, numKeys, d_results);\n\n// Delete keys\nfilter.deleteMany(d_keys, numKeys, d_results);\n```\n\n### Configuration Options\n\nThe `Config` template accepts the following parameters:\n\n| Parameter         | Description                              | Default              |\n| ----------------- | ---------------------------------------- | -------------------- |\n| `T`               | Key type                                 | -                    |\n| `bitsPerTag`      | Fingerprint size in bits (8, 16, 32)     | -                    |\n| `maxEvictions`    | Maximum eviction attempts before failure | 500                  |\n| `blockSize`       | CUDA block size                          | 256                  |\n| `bucketSize`      | Slots per bucket (must be power of 2)    | 16                   |\n| `AltBucketPolicy` | Alternate bucket calculation policy      | `XorAltBucketPolicy` |\n| `evictionPolicy`  | Eviction strategy (DFS or BFS)           | `BFS`                |\n| 
`WordType`        | Atomic type (uint32_t or uint64_t)       | `uint64_t`           |\n\n## Multi-GPU Support\n\nFor workloads that exceed the capacity of a single GPU:\n\n```cpp\n#include \u003ccuckoogpu/CuckooFilterMultiGPU.cuh\u003e\n\ncuckoogpu::FilterMultiGPU\u003cConfig\u003e filter(numGPUs, totalCapacity);\nfilter.insertMany(h_keys, numKeys);\nfilter.containsMany(h_keys, numKeys, h_results);\n```\n\n## Project Structure\n\n```\ninclude/cuckoogpu/   - Header files\n  CuckooFilter.cuh           - Main filter implementation\n  CuckooFilterMultiGPU.cuh   - Multi-GPU implementation\n  CuckooFilterIPC.cuh        - IPC support\n  bucket_policies.cuh        - Alternate bucket policies\n  helpers.cuh                - Helper functions\nsrc/                 - Example applications\nbenchmarks/          - Benchmarks\ntests/               - Unit tests\nscripts/             - Scripts for running/plotting benchmarks\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftdortman%2FCuckoo-GPU","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftdortman%2FCuckoo-GPU","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftdortman%2FCuckoo-GPU/lists"}