{"id":21833665,"url":"https://github.com/upsj/gpu_selection","last_synced_at":"2025-04-14T07:52:00.999Z","repository":{"id":69058777,"uuid":"203400364","full_name":"upsj/gpu_selection","owner":"upsj","description":"Parallel selection on GPUs","archived":false,"fork":false,"pushed_at":"2021-03-23T00:14:00.000Z","size":537,"stargazers_count":15,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-06T20:53:04.617Z","etag":null,"topics":["algorithms","gpgpu","gpu-computing","quickselect"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/upsj.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-20T15:08:26.000Z","updated_at":"2024-12-07T02:08:08.000Z","dependencies_parsed_at":"2023-02-22T17:45:56.001Z","dependency_job_id":null,"html_url":"https://github.com/upsj/gpu_selection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/upsj%2Fgpu_selection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/upsj%2Fgpu_selection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/upsj%2Fgpu_selection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/upsj%2Fgpu_selection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/upsj","download_url":"https://codeload.github.com/upsj/gpu_selection/tar.gz/ref
s/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248844075,"owners_count":21170486,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","gpgpu","gpu-computing","quickselect"],"created_at":"2024-11-27T19:32:36.130Z","updated_at":"2025-04-14T07:52:00.990Z","avatar_url":"https://github.com/upsj.png","language":"C++","readme":"This library implements a bucket-based selection algorithm on GPUs\n\nMore details can be found in\n\n* T. Ribizel and H. Anzt, \"Approximate and Exact Selection on GPUs,\" 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, 2019, pp. 471-478.\ndoi: 10.1109/IPDPSW.2019.00088\n* T. Ribizel, H. 
Anzt, \"Parallel selection on GPUs,\" Parallel Computing, Volume 91, 2020, doi: 10.1016/j.parco.2019.102588\n\nIt uses Catch2 as a test framework and the CUB library as a reference implementation for sorting.\n\nThe tests can be run by executing `app/unittests`. The benchmarks can be run by executing `app/benchmark` with one of the following parameters:\n\n[full]             The full benchmark for exact single and multiple selection and the individual kernels (sample, count, reduce, filter)\n[full-multionly]   The full benchmark for multiple selection only\n[approx]           The full benchmark for approximate selection with shared-memory atomics\n[approx-g]         The full benchmark for approximate selection with global-memory atomics\n[multi]            The full benchmark for multiple selection with different numbers of ranks\n[test]             A small benchmark that only executes a single benchmark with small input size\n\nThe output of these tests is the following:\nOn stdout, they print error messages if the algorithm produces invalid results. For the approx tests, they additionally print the exact and approximate ranks in CSV format.\nOn stderr, they print the individual timings of the kernels in CSV format for different input sizes given by the first CSV field. 
Runtime breakdowns are listed within parentheses ().\n\n`app/benchmark-sort` contains a benchmark for the CUB radix sort implementation as a performance baseline for the multiple selection.\n\nStructure of the project\n\ninclude/cpu_reference.hpp     - Reference implementations for testing\ninclude/verification.hpp      - Validation functions for testing\ninclude/cuda_definitions.cuh  - Type definitions and hardware limits\ninclude/cuda_error.cuh        - Wrapper for CUDA error handling\ninclude/cuda_memory.cuh       - Wrapper for CUDA memory allocations\ninclude/cuda_timer.cuh        - Wrapper for CUDA timing measurements\ninclude/kernel_config.cuh     - Configuration struct for kernel templates\ninclude/launcher_fwd.cuh      - Forward-declarations of launcher and kernel templates\n\nlib/generated/*               - Explicit template instantiations to parallelize compilation\nlib/cpu_reference.cpp         - Reference implementations for testing\nlib/verification.cpp          - Validation functions for testing\nlib/qs_launchers.cuh          - Wrappers for quickselect kernels\nlib/qs_recursion.cuh          - Kernels for quickselect single-selection\nlib/qs_recursion_multi.cuh    - Kernels for quickselect multi-selection\nlib/qs_reduce.cuh             - Kernels for reducing quickselect partial sums\nlib/qs_scan.cuh               - Kernels for quickselect bipartitioning\nlib/ssss_build_searchtree.cuh - Kernels for sampleselect sampling\nlib/ssss_collect.cuh          - Kernels for sampleselect single-selection filtering\nlib/ssss_collect_multi.cuh    - Kernels for sampleselect multi-selection filtering\nlib/ssss_count.cuh            - Kernels for sampleselect counting\nlib/ssss_launchers.cuh        - Wrappers for sampleselect kernels\nlib/ssss_merged.cuh           - Kernels for multiple simultaneous sampleselects\nlib/ssss_merged_memory.cuh    - Auxiliary data structure for sampleselect multi-selection\nlib/ssss_recursion.cuh        - Kernels for sampleselect 
single-selection\nlib/ssss_recursion_multi.cuh  - Kernels for sampleselect multi-selection\nlib/ssss_reduce.cuh           - Kernels for reducing sampleselect partial sums\nlib/utils_basecase.cuh        - Kernels for recursion basecase\nlib/utils_bytestorage.cuh     - Auxiliary functions for reading/writing unaligned bytes\nlib/utils_mask.cuh            - Auxiliary functions for bitmasks\nlib/utils_prefixsum.cuh       - Auxiliary functions for tree-based partial sums\nlib/utils_sampling.cuh        - Auxiliary functions for sampling\nlib/utils_search.cuh          - Auxiliary functions for binary and warp-ary searches\nlib/utils_sort.cuh            - Auxiliary functions for bitonic sorting\nlib/utils_warpaggr.cuh        - Auxiliary functions for warp-aggregation\nlib/utils_work.cuh            - Auxiliary functions for work-distribution\nlib/utils.cuh                 - Auxiliary wrappers for basic operations\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fupsj%2Fgpu_selection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fupsj%2Fgpu_selection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fupsj%2Fgpu_selection/lists"}