{"id":18009174,"url":"https://github.com/maawad/ptx_bcht","last_synced_at":"2026-05-01T13:32:24.186Z","repository":{"id":43015397,"uuid":"465913350","full_name":"maawad/PTX_BCHT","owner":"maawad","description":"Bucketed Cuckoo hash set written in PTX and JIT-compiled.","archived":false,"fork":false,"pushed_at":"2022-04-03T21:18:32.000Z","size":487,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-04T12:51:27.660Z","etag":null,"topics":["cuckoo","cuda","gpu","hash","hashset","ptx"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maawad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-03T23:08:39.000Z","updated_at":"2022-04-05T17:10:02.000Z","dependencies_parsed_at":"2022-08-27T12:41:35.606Z","dependency_job_id":null,"html_url":"https://github.com/maawad/PTX_BCHT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/maawad/PTX_BCHT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maawad%2FPTX_BCHT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maawad%2FPTX_BCHT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maawad%2FPTX_BCHT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maawad%2FPTX_BCHT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maawad","download_url":"https://codeload.github.com/maawad/PTX_BCHT/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/maawad%2FPTX_BCHT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32499681,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuckoo","cuda","gpu","hash","hashset","ptx"],"created_at":"2024-10-30T02:08:54.665Z","updated_at":"2026-05-01T13:32:24.163Z","avatar_url":"https://github.com/maawad.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PTX BCHT\nBucketed Cuckoo hash table written in PTX and JIT-compiled.\n\n## About\nThis repository is an experiment in JIT compilation of hand-written CUDA PTX files (we also love assembly). We implemented the [insertion](./ptx/bcht_insert_kernel.ptx) and [query](./ptx/bcht_find_kernel.ptx) kernels for a bucketed cuckoo hash set entirely in NVIDIA's PTX.\n\n\nCuckoo hashing is a probing scheme that achieves a very low number of probes at very high load factors. 
This implementation is probably the fastest GPU hash set implementation, but if you are interested in the state-of-the-art cuckoo hashing implementation, check out our bucketed cuckoo hashing [implementation](https://github.com/owensgroup/BGHT) and [paper](https://arxiv.org/abs/2108.07232).\n\n## Build\n```bash\ngit clone https://github.com/maawad/PTX_BCHT.git\ncd PTX_BCHT\nmkdir build \u0026\u0026 cd build\ncmake ..\nmake\n```\n\n## Run\nRequirement: NVIDIA GPU with Volta or later microarchitecture and CUDA.\n\nUsage\n```bash\n./ptx_cuckoo_hashtable\n--num-keys                    Number of keys\n--load-factor                 Load factor of the hash set\n--exist-ratio                 Ratio of queries that exist in the hash set\n--device_id                   GPU device ID\n```\n\nExample:\n```bash\n# from the build directory:\n./ptx_cuckoo_hashtable --num-keys=1'000'000 --load-factor=0.99 --exist-ratio=1.0\n```\nOutput:\n```bash\nnum_keys: 1000000\nload_factor: 0.99\ndevice_id: 0\nDevice[0]: NVIDIA TITAN V\n--------------------------------------------------------\nCUDA Link Completed in 0.000000 ms.\nLinker Output:\nptxas info    : 0 bytes gmem\nptxas info    : Compiling entry function 'bcht_insert' for 'sm_70'\nptxas info    : Function properties for bcht_insert\nptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads\nptxas info    : Used 28 registers, 412 bytes cmem[0]\ninfo    : 0 bytes gmem\ninfo    : Function properties for 'bcht_insert':\ninfo    : used 28 registers, 0 stack, 0 bytes smem, 412 bytes cmem[0], 0 bytes lmem\n--------------------------------------------------------\nCUDA kernel: bcht_insert launched\n--------------------------------------------------------\nCUDA Link Completed in 0.000000 ms.\nLinker Output:\nptxas info    : 98 bytes gmem\nptxas info    : Compiling entry function 'bcht_find' for 'sm_70'\nptxas info    : Function properties for bcht_find\nptxas         .     
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads\nptxas info    : Used 28 registers, 424 bytes cmem[0]\ninfo    : 98 bytes gmem\ninfo    : Function properties for 'bcht_find':\ninfo    : used 28 registers, 0 stack, 0 bytes smem, 424 bytes cmem[0], 0 bytes lmem\n--------------------------------------------------------\nCUDA kernel: bcht_find launched\nSuccess!\nFind rate:      3160.4 million keys/s\nInsert rate:    1650.99 million keys/s\nFind ratio was: 100%\n```\n\n## Performance\nThe results below are for building a hash set with a given number of keys and then issuing as many queries as there are keys. Queries are run at different positive-query ratios, i.e., the fraction of queried keys that exist in the hash set (100% means every queried key is present).\n```\n   Millions    |      %      |                  Million keys/s\nNumber of keys | Load factor | Insertion rate |             Find rate\n               |             |                |    0%        50%        100%\n      50       |     60      |     1461.81    |  3708.09    3800.74     3897.83\n      50       |     65      |     1462.44    |  3663.85    3779.85     3889.45\n      50       |     70      |     1462.18    |  3791.30    3736.75     3888.21\n      50       |     75      |     1458.97    |  3487.22    3660.28     3991.88\n      50       |     80      |     1451.99    |  3258.47    3538.27     3864.82\n      50       |     82      |     1446.38    |  3302.10    3454.41     3965.90\n      50       |     84      |     1440.71    |  3017.43    3378.41     3822.16\n      50       |     86      |     1431.60    |  2855.28    3262.82     3808.15\n      50       |     88      |     1419.21    |  2684.79    3264.78     3766.44\n      50       |     90      |     1404.71    |  2593.79    2987.34     3727.34\n      50       |     91      |     1395.24    |  2400.24    2917.55     3712.32\n      50       |     92      |     1383.33    |  2381.16    2822.76     3684.59\n      50       |     93      |     1371.44    |  
2190.39    2740.23     3652.07\n      50       |     94      |     1356.04    |  2153.77    2724.78     3724.78\n      50       |     95      |     1337.41    |  1969.59    2601.53     3668.25\n      50       |     96      |     1314.53    |  1850.32    2431.92     3621.73\n      50       |     97      |     1285.49    |  1735.31    2316.00     3537.24\n      50       |     98      |     1243.96    |  1622.30    2188.13     3340.51\n      50       |     99      |     1178.17    |  1499.96    2045.16     3178.91\n```\n\n## PTX Editor\n\nIf you have Qt 6.3 or higher, you can also build a text editor that loads the PTX kernels and lets you edit them, with an option to hot-reload and recompile the kernels after each edit.\n\n![](/img/ptx_editor.png)\n\n## Authors\n[Muhammad Awad](https://github.com/maawad) and [Serban Porumbescu](https://github.com/porumbes).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaawad%2Fptx_bcht","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaawad%2Fptx_bcht","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaawad%2Fptx_bcht/lists"}