{"id":31288993,"url":"https://github.com/numpy/x86-simd-sort","last_synced_at":"2025-09-24T13:03:29.601Z","repository":{"id":61770572,"uuid":"554411496","full_name":"numpy/x86-simd-sort","owner":"numpy","description":"C++ template library for high performance SIMD based sorting algorithms","archived":false,"fork":false,"pushed_at":"2025-09-16T04:47:56.000Z","size":1130,"stargazers_count":977,"open_issues_count":24,"forks_count":69,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-09-23T00:37:58.197Z","etag":null,"topics":["argsort","avx2","avx512","partialsort","quickselect","quicksort","sort","x86"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/numpy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"open_collective":"numpy","tidelift":"pypi/numpy","custom":"https://numpy.org/about/#donate"}},"created_at":"2022-10-19T19:14:04.000Z","updated_at":"2025-09-22T18:57:50.000Z","dependencies_parsed_at":"2023-02-16T23:01:19.503Z","dependency_job_id":"f816c927-33eb-401f-9d60-799b17a19917","html_url":"https://github.com/numpy/x86-simd-sort","commit_stats":null,"previous_names":["numpy/x86-simd-sort"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/numpy/x86-simd-sort","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/numpy%2Fx86-simd-sort","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/numpy%2Fx86-simd-sort/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/numpy%2Fx86-simd-sort/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/numpy%2Fx86-simd-sort/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/numpy","download_url":"https://codeload.github.com/numpy/x86-simd-sort/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/numpy%2Fx86-simd-sort/sbom","scorecard":{"id":490977,"data":{"date":"2025-08-19T12:06:38Z","repo":{"name":"github.com/intel/x86-simd-sort","commit":"058f9132b87bb11ab7dd1aa27e07b070c6ba0f4b"},"scorecard":{"version":"v5.0.0","commit":"ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4"},"score":6,"checks":[{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#binary-artifacts"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#branch-protection"}},{"name":"CI-Tests","score":10,"reason":"9 out of 9 merged PRs checked by a CI test -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project runs tests before pull requests are merged.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#ci-tests"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn 
an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#cii-best-practices"}},{"name":"Code-Review","score":8,"reason":"Found 8/9 approved changesets -- score normalized to 8","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#code-review"}},{"name":"Contributors","score":3,"reason":"project has 1 contributing companies or organizations -- score normalized to 3","details":["Info: intel corporation contributor org/company found, "],"documentation":{"short":"Determines if the project has a set of contributors from multiple organizations (e.g., companies).","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#contributors"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#dangerous-workflow"}},{"name":"Dependency-Update-Tool","score":0,"reason":"no update tool detected","details":["Warn: no dependency update tool configurations found"],"documentation":{"short":"Determines if the project uses a dependency update tool.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#dependency-update-tool"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.md:0","Info: FSF or OSI recognized license: BSD 3-Clause \"New\" or \"Revised\" License: LICENSE.md:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#license"}},{"name":"Maintained","score":0,"reason":"1 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":4,"reason":"dependency not pinned by hash detected -- score normalized to 4","details":["Warn: pipCommand not pinned by hash: .github/workflows/build-test-on-32bit.sh:6","Warn: pipCommand not pinned by hash: .github/workflows/build-numpy.yml:123","Warn: pipCommand not pinned by hash: .github/workflows/build-numpy.yml:124","Warn: pipCommand not pinned by hash: .github/workflows/build-numpy.yml:57","Warn: pipCommand not pinned by hash: .github/workflows/build-numpy.yml:58","Info:  20 out of  20 GitHub-owned GitHubAction dependencies pinned","Info:   1 out of   1 third-party GitHubAction dependencies pinned","Info:   0 out of   5 pipCommand dependencies 
pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 30 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#sast"}},{"name":"Security-Policy","score":10,"reason":"security policy file detected","details":["Info: security policy file detected: SECURITY.md:1","Info: Found linked content: SECURITY.md:1","Info: Found disclosure, vulnerability, and/or timelines in security policy: SECURITY.md:1","Info: Found text in security policy: SECURITY.md:1"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#security-policy"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#signed-releases"}},{"name":"Token-Permissions","score":10,"reason":"GitHub workflow tokens follow principle of least privilege","details":["Info: topLevel permissions set to 'read-all': .github/workflows/build-numpy.yml:11","Info: topLevel permissions set to 'read-all': .github/workflows/c-cpp.yml:9","Info: topLevel permissions set to 'read-all': .github/workflows/linting.yml:9","Info: topLevel permissions set to 'read-all': .github/workflows/scorecard.yml:18","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows 
follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#token-permissions"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/ea7e27ed41b76ab879c862fa0ca4cc9c61764ee4/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-19T19:08:21.566Z","repository_id":61770572,"created_at":"2025-08-19T19:08:21.567Z","updated_at":"2025-08-19T19:08:21.567Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276754055,"owners_count":25698832,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-24T02:00:09.776Z","response_time":97,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["argsort","avx2","avx512","partialsort","quickselect","quicksort","sort","x86"],"created_at":"2025-09-24T13:01:54.949Z","updated_at":"2025-09-24T13:03:29.596Z","avatar_url":"https://github.com/numpy.png","language":"C++","readme":"# x86-simd-sort\n\nC++ template library for high performance SIMD based sorting routines for\nbuilt-in integers and floats (16-bit, 32-bit and 64-bit data types) and custom\ndefined C++ objects. The sorting routines are accelerated using AVX-512/AVX2\nwhen available. 
The library automatically picks the best version depending on the\nprocessor it is run on. If you are looking for the AVX-512 or AVX2 specific\nimplementations, please see the\n[README](https://github.com/intel/x86-simd-sort/blob/main/src/README.md) file\nunder the `src/` directory. The following routines are currently supported:\n\n## Sort an array of custom defined class objects (uses `O(N)` space)\n```cpp\ntemplate \u003ctypename T, typename U, typename Func\u003e\nvoid x86simdsort::object_qsort(T *arr, U arrsize, Func key_func)\n```\n`T` is any user defined struct or class and `arr` is a pointer to the first\nelement in the array of objects of type `T`. The `arrsize` parameter can be any\n32-bit or 64-bit integer type. `Func` is a lambda function that computes the\n`key` value for each object, which is the metric used to sort the objects.\n`Func` needs to have the following signature:\n\n```cpp\n[] (T obj) -\u003e key_t { key_t key; /* compute key for obj */ return key; }\n```\n\nNote that the return type of the key `key_t` needs to be one of the following:\n`[float, uint32_t, int32_t, double, uint64_t, int64_t]`. `object_qsort` has a\nspace complexity of `O(N)`. Specifically, it requires `arrsize * sizeof(key_t)`\nbytes to store a vector with all the keys and an additional `arrsize *\nsizeof(uint32_t)` bytes to store the indexes of the object array.  For\nperformance reasons, we recommend using `object_qsort` when the array size\nis less than or equal to `UINT32_MAX`. An example usage of `object_qsort` is\nprovided in the [examples](#Sort-an-array-of-Points-using-object_qsort)\nsection.  
Refer to the [performance section](#Performance-of-object_qsort) to get a sense of\nhow fast this is relative to `std::sort`.\n\n## Sort an array of built-in integers and floats\n```cpp\nvoid x86simdsort::qsort(T* arr, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::qselect(T* arr, size_t k, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::partial_qsort(T* arr, size_t k, size_t size, bool hasnan, bool descending);\n```\nSupported datatypes: `T` $\\in$ `[_Float16, uint16_t, int16_t, float, uint32_t,\nint32_t, double, uint64_t, int64_t]`\n\n## Key-value sort routines on pairs of arrays\n```cpp\nvoid x86simdsort::keyvalue_qsort(T1* key, T2* val, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::keyvalue_select(T1* key, T2* val, size_t k, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::keyvalue_partial_sort(T1* key, T2* val, size_t k, size_t size, bool hasnan, bool descending);\n```\nSupported datatypes: `T1`, `T2` $\\in$ `[float, uint32_t, int32_t, double,\nuint64_t, int64_t]`. Note that keyvalue sort is not yet supported for 16-bit\ndata types.\n\n## Arg sort routines on arrays\n```cpp\nstd::vector\u003csize_t\u003e arg = x86simdsort::argsort(T* arr, size_t size, bool hasnan, bool descending);\nstd::vector\u003csize_t\u003e arg = x86simdsort::argselect(T* arr, size_t k, size_t size, bool hasnan);\n```\nSupported datatypes: `T` $\\in$ `[_Float16, uint16_t, int16_t, float, uint32_t, int32_t, double,\nuint64_t, int64_t]`. Note that argsort and argselect are not accelerated with SIMD when using 16-bit\ndata types.\n\n## Build/Install\n\n[meson](https://github.com/mesonbuild/meson) is the build system used. 
Commands\nto build and install the library:\n\n```\nmeson setup --buildtype release builddir \u0026\u0026 cd builddir\nmeson compile\nsudo meson install\n```\n\nOnce installed, you can use `pkg-config --cflags --libs x86simdsortcpp` to\npopulate the right cflags and ldflags to compile and link your C++ program.\nThis repository also contains a test suite and a benchmarking suite, which are\nwritten using the [googletest](https://github.com/google/googletest) and [google\nbenchmark](https://github.com/google/benchmark) (\u003e= v1.9.2) frameworks\nrespectively. You can configure meson to build them both by using\n`-Dbuild_tests=true` and `-Dbuild_benchmarks=true`.\n\n## Build using OpenMP\n\n`qsort`, `argsort`, and `keyvalue_qsort` can achieve even greater performance\n(up to 3x speedup) through parallelization with\n[OpenMP](https://www.openmp.org/). By default, OpenMP support is disabled; to\nenable it, set the `-Duse_openmp=true` flag when configuring Meson. If you are\nusing only the static SIMD implementations, compile with `-fopenmp\n-DXSS_USE_OPENMP`.\n\nOpenMP-based parallel sorting routines are used for arrays larger than a\nspecific threshold where threading makes sense. The number of threads is\nlimited to a maximum of 16.  You can control the number of threads by setting\nthe `OMP_NUM_THREADS` environment variable.\n\n## Using x86-simd-sort as a Meson subproject\n\nIf you would like to use this as a Meson subproject, create a `subprojects`\ndirectory and copy `x86-simd-sort` into it. 
Add these two lines\nto your `meson.build`:\n```\nxss = subproject('x86-simd-sort')\nxss_dep = xss.get_variable('x86simdsortcpp_dep')\n```\n\nFor more detailed instructions please refer to the Meson\n[documentation](https://mesonbuild.com/Subprojects.html#using-a-subproject).\n\n## Example usage\n\n#### Sort an array of floats\n\n```cpp\n#include \"x86simdsort.h\"\n#include \u003cvector\u003e\n\nint main() {\n    // 1000 zero-initialized floats; note that arr{1000} would create a single element\n    std::vector\u003cfloat\u003e arr(1000);\n    // hasnan = true: NaNs are sorted to the end of the array\n    x86simdsort::qsort(arr.data(), 1000, true);\n    return 0;\n}\n```\n\n#### Sort an array of Points using object_qsort\n```cpp\n#include \"x86simdsort.h\"\n#include \u003ccmath\u003e\n#include \u003cvector\u003e\n\nstruct Point {\n    double x, y, z;\n};\n\nint main() {\n    std::vector\u003cPoint\u003e arr(1000);\n    // Sort an array of Points by its x value:\n    x86simdsort::object_qsort(arr.data(), 1000, [](Point p) { return p.x; });\n    // Sort an array of Points by its distance from origin:\n    x86simdsort::object_qsort(arr.data(), 1000, [](Point p) {\n        return std::sqrt(p.x*p.x+p.y*p.y+p.z*p.z);\n        });\n    return 0;\n}\n```\n\n## Details\n\n- `x86simdsort::qsort` is equivalent to `qsort` in\n  [C](https://www.tutorialspoint.com/c_standard_library/c_function_qsort.htm)\n  or `std::sort` in [C++](https://en.cppreference.com/w/cpp/algorithm/sort).\n- `x86simdsort::qselect` is equivalent to `std::nth_element` in\n  [C++](https://en.cppreference.com/w/cpp/algorithm/nth_element) or\n  `np.partition` in\n  [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.partition.html).\n- `x86simdsort::partial_qsort` is equivalent to `std::partial_sort` in\n  [C++](https://en.cppreference.com/w/cpp/algorithm/partial_sort).\n- `x86simdsort::argsort` is equivalent to `np.argsort` in\n  [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html).\n- `x86simdsort::argselect` is equivalent to `np.argpartition` in\n  [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html).\n\nSupported datatypes: `uint16_t, int16_t, _Float16, 
uint32_t, int32_t, float,\nuint64_t, int64_t, double`. Note that `_Float16` will require building this\nlibrary with g++ \u003e= 12.x. All the functions have an optional argument `bool\nhasnan` set to `false` by default (it is relevant to floating point data\ntypes only).  If your array contains NaNs and `hasnan` is `false`, the behaviour of the sorting routine\nis undefined. If `hasnan` is set to `true`, NaNs are always sorted to the end of\nthe array. In addition to that, qsort will replace all your NaNs with\n`std::numeric_limits\u003cT\u003e::quiet_NaN()`. The original bit-exact NaNs in\nthe input are not preserved. Also note that the arg methods (argsort and\nargselect) will not use the SIMD based algorithms if they detect NaNs in the\narray. You can read details of all the implementations\n[here](https://github.com/intel/x86-simd-sort/blob/main/src/README.md).\n\n## Performance comparison on AVX-512: `object_qsort` v/s `std::sort`\nPerformance of `object_qsort` can vary significantly depending on the definition\nof the custom class and we highly recommend benchmarking before using it. For\nthe sake of illustration, we provide a few examples in\n[./benchmarks/bench-objsort.hpp](./benchmarks/bench-objsort.hpp) which measure the\nperformance of `object_qsort` relative to `std::sort` when sorting an array of\n3D points represented by the class: `struct Point {double x, y, z;}` and\n`struct Point {float x, y, z;}`. We sort these points based on several\ndifferent metrics:\n\n+ sort by coordinate `x`\n+ sort by Manhattan distance (relative to origin): `abs(x) + abs(y) + abs(z)`\n+ sort by Euclidean distance (relative to origin): `sqrt(x*x + y*y + z*z)`\n+ sort by Chebyshev distance (relative to origin): `max(abs(x), abs(y), abs(z))`\n\nThe performance data (shown in the plot below) can be collected by building the\nbenchmark suite and running `./builddir/benchexe --benchmark_filter==*obj*`.\nThe data plot shown below was collected on a processor with AVX-512. 
For the\nsimplest of cases where we want to sort an array of structs by one of its\nmembers, `object_qsort` can be up to 5x faster for 32-bit data types and about\n4x faster for 64-bit data types.  It tends to do even better when the metric to sort by\ngets more complicated. Sorting by Euclidean distance can be up to 10x faster.\n\n![alt text](./misc/object_qsort-perf.jpg?raw=true)\n\n## Downstream projects using x86-simd-sort\n\n- NumPy uses this as a [submodule](https://github.com/numpy/numpy/pull/22315) to accelerate `np.sort`, `np.argsort`, `np.partition` and `np.argpartition`.\n- PyTorch uses this as a [submodule](https://github.com/pytorch/pytorch/pull/127936) to accelerate `torch.sort` and `torch.argsort`.\n- A slightly modified version of this library has been integrated into [OpenJDK](https://github.com/openjdk/jdk/pull/14227).\n- [GRAPE](https://github.com/alibaba/libgrape-lite.git): C++ library for parallel graph processing.\n- An AVX-512 version of the key-value sort has been submitted to [OceanBase](https://github.com/oceanbase/oceanbase/pull/1325).\n","funding_links":["https://opencollective.com/numpy","https://tidelift.com/funding/github/pypi/numpy","https://numpy.org/about/#donate"],"categories":["Sorting","HarmonyOS","C++"],"sub_categories":["Windows Manager"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnumpy%2Fx86-simd-sort","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnumpy%2Fx86-simd-sort","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnumpy%2Fx86-simd-sort/lists"}