{"id":13442101,"url":"https://github.com/intel/x86-simd-sort","last_synced_at":"2025-03-20T13:32:23.318Z","repository":{"id":61770572,"uuid":"554411496","full_name":"intel/x86-simd-sort","owner":"intel","description":"C++ template library for high performance SIMD based sorting algorithms","archived":false,"fork":false,"pushed_at":"2024-10-22T04:22:50.000Z","size":1064,"stargazers_count":851,"open_issues_count":16,"forks_count":56,"subscribers_count":22,"default_branch":"main","last_synced_at":"2024-10-23T06:36:55.625Z","etag":null,"topics":["argsort","avx2","avx512","partialsort","quickselect","quicksort","sort","x86"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/intel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-19T19:14:04.000Z","updated_at":"2024-10-22T04:22:54.000Z","dependencies_parsed_at":"2023-02-16T23:01:19.503Z","dependency_job_id":"f816c927-33eb-401f-9d60-799b17a19917","html_url":"https://github.com/intel/x86-simd-sort","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fx86-simd-sort","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fx86-simd-sort/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fx86-simd-sort/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fx86-
simd-sort/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/intel","download_url":"https://codeload.github.com/intel/x86-simd-sort/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221768461,"owners_count":16877642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["argsort","avx2","avx512","partialsort","quickselect","quicksort","sort","x86"],"created_at":"2024-07-31T03:01:41.707Z","updated_at":"2025-03-20T13:32:23.313Z","avatar_url":"https://github.com/intel.png","language":"C++","readme":"# x86-simd-sort\n\nC++ template library for high performance SIMD based sorting routines for\nbuilt-in integers and floats (16-bit, 32-bit and 64-bit data types) and custom\ndefined C++ objects. The sorting routines are accelerated using AVX-512/AVX2\nwhen available. The library auto picks the best version depending on the\nprocessor it is run on. If you are looking for the AVX-512 or AVX2 specific\nimplementations, please see\n[README](https://github.com/intel/x86-simd-sort/blob/main/src/README.md) file\nunder `src/` directory. The following routines are currently supported:\n\n## Sort an array of custom defined class objects (uses `O(N)` space)\n``` cpp\ntemplate \u003ctypename T, typename Func\u003e\nvoid x86simdsort::object_qsort(T *arr, uint32_t arrsize, Func key_func)\n```\n`T` is any user defined struct or class and `arr` is a pointer to the first\nelement in the array of objects of type `T`. 
`Func` is a lambda function that\ncomputes the `key` value for each object, which is the metric used to sort the\nobjects. `Func` needs to have the following signature:\n\n```cpp\n[] (T obj) -\u003e key_t { key_t key; /* compute key for obj */ return key; }\n```\n\nNote that the return type of the key `key_t` needs to be one of the\nfollowing: `[float, uint32_t, int32_t, double, uint64_t, int64_t]`. `object_qsort` has a\nspace complexity of `O(N)`. Specifically, it requires `arrsize *\nsizeof(key_t)` bytes to store a vector with all the keys and an additional\n`arrsize * sizeof(uint32_t)` bytes to store the indexes of the object array.\nFor performance reasons, we support `object_qsort` only when the array size is\nless than or equal to `UINT32_MAX`.  An example usage of `object_qsort` is\nprovided in the [examples](#Sort-an-array-of-Points-using-object_qsort)\nsection.  Refer to the [performance section](#Performance-of-object_qsort) to get a sense of\nhow fast this is relative to `std::sort`.\n\n## Sort an array of built-in integers and floats\n```cpp\nvoid x86simdsort::qsort(T* arr, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::qselect(T* arr, size_t k, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::partial_qsort(T* arr, size_t k, size_t size, bool hasnan, bool descending);\n```\nSupported datatypes: `T` $\\in$ `[_Float16, uint16_t, int16_t, float, uint32_t,\nint32_t, double, uint64_t, int64_t]`\n\n## Key-value sort routines on pairs of arrays\n```cpp\nvoid x86simdsort::keyvalue_qsort(T1* key, T2* val, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::keyvalue_select(T1* key, T2* val, size_t k, size_t size, bool hasnan, bool descending);\nvoid x86simdsort::keyvalue_partial_sort(T1* key, T2* val, size_t k, size_t size, bool hasnan, bool descending);\n```\nSupported datatypes: `T1`, `T2` $\\in$ `[float, uint32_t, int32_t, double,\nuint64_t, int64_t]`. Note that keyvalue sort is not yet supported for 16-bit\ndata types.\n\n## Arg sort 
routines on arrays\n```cpp\nstd::vector\u003csize_t\u003e arg = x86simdsort::argsort(T* arr, size_t size, bool hasnan, bool descending);\nstd::vector\u003csize_t\u003e arg = x86simdsort::argselect(T* arr, size_t k, size_t size, bool hasnan);\n```\nSupported datatypes: `T` $\\in$ `[_Float16, uint16_t, int16_t, float, uint32_t,\nint32_t, double, uint64_t, int64_t]`\n\n## Build/Install\n\nThe library is built with [meson](https://github.com/mesonbuild/meson). Commands\nto build and install the library:\n\n```\nmeson setup --buildtype release builddir \u0026\u0026 cd builddir\nmeson compile\nsudo meson install\n```\n\nOnce installed, you can use `pkg-config --cflags --libs x86simdsortcpp` to\npopulate the right cflags and ldflags to compile and link your C++ program.\nThis repository also contains a test suite and a benchmarking suite, which are\nwritten using the [googletest](https://github.com/google/googletest) and [google\nbenchmark](https://github.com/google/benchmark) frameworks respectively. You\ncan configure meson to build them both by using `-Dbuild_tests=true` and\n`-Dbuild_benchmarks=true`.\n\n## Example usage\n\n#### Sort an array of floats\n\n```cpp\n#include \u003cvector\u003e\n#include \"x86simdsort.h\"\n\nint main() {\n    // Parentheses create 1000 elements; arr{1000} would create a single element.\n    std::vector\u003cfloat\u003e arr(1000);\n    // hasnan = true: NaNs, if any, are sorted to the end of the array.\n    x86simdsort::qsort(arr.data(), 1000, true);\n    return 0;\n}\n```\n\n#### Sort an array of Points using object_qsort\n```cpp\n#include \u003ccmath\u003e\n#include \u003cvector\u003e\n#include \"x86simdsort.h\"\n\nstruct Point {\n    double x, y, z;\n};\n\nint main() {\n    std::vector\u003cPoint\u003e arr(1000);\n    // Sort an array of Points by its x value:\n    x86simdsort::object_qsort(arr.data(), 1000, [](Point p) { return p.x; });\n    // Sort an array of Points by its distance from origin:\n    x86simdsort::object_qsort(arr.data(), 1000, [](Point p) {\n        return std::sqrt(p.x*p.x+p.y*p.y+p.z*p.z);\n        });\n    return 0;\n}\n```\n\n## Details\n\n- `x86simdsort::qsort` is equivalent to `qsort` in\n  
[C](https://www.tutorialspoint.com/c_standard_library/c_function_qsort.htm)\n  or `std::sort` in [C++](https://en.cppreference.com/w/cpp/algorithm/sort).\n- `x86simdsort::qselect` is equivalent to `std::nth_element` in\n  [C++](https://en.cppreference.com/w/cpp/algorithm/nth_element) or\n  `np.partition` in\n  [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.partition.html).\n- `x86simdsort::partial_qsort` is equivalent to `std::partial_sort` in\n  [C++](https://en.cppreference.com/w/cpp/algorithm/partial_sort).\n- `x86simdsort::argsort` is equivalent to `np.argsort` in\n  [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html).\n- `x86simdsort::argselect` is equivalent to `np.argpartition` in\n  [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html).\n\nSupported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,\nuint64_t, int64_t, double`. Note that `_Float16` will require building this\nlibrary with g++ \u003e= 12.x. All the functions have an optional argument `bool\nhasnan` set to `false` by default (it is relevant to floating point data\ntypes only).  If your array contains NaNs and `hasnan` is left as `false`, the\nbehaviour of the sorting routine is undefined. If `hasnan` is set to `true`,\nNaNs are always sorted to the end of the array. In addition to that, qsort will\nreplace all your NaNs with `std::numeric_limits\u003cT\u003e::quiet_NaN()`. The original bit-exact NaNs in\nthe input are not preserved. Also note that the arg methods (argsort and\nargselect) will not use the SIMD based algorithms if they detect NaNs in the\narray. You can read details of all the implementations\n[here](https://github.com/intel/x86-simd-sort/blob/main/src/README.md).\n\n## Performance comparison on AVX-512: `object_qsort` v/s `std::sort`\nPerformance of `object_qsort` can vary significantly depending on the definition\nof the custom class and we highly recommend benchmarking before using it. 
For\nthe sake of illustration, we provide a few examples in\n[./benchmarks/bench-objsort.hpp](./benchmarks/bench-objsort.hpp) which measure\nthe performance of `object_qsort` relative to `std::sort` when sorting an array of\n3D points represented by the class: `struct Point {double x, y, z;}` and\n`struct Point {float x, y, z;}`. We sort these points based on several\ndifferent metrics:\n\n+ sort by coordinate `x`\n+ sort by Manhattan distance (relative to origin): `abs(x) + abs(y) + abs(z)`\n+ sort by Euclidean distance (relative to origin): `sqrt(x*x + y*y + z*z)`\n+ sort by Chebyshev distance (relative to origin): `max(abs(x), abs(y), abs(z))`\n\nThe performance data (shown in the plot below) can be collected by building the\nbenchmark suite and running `./builddir/benchexe --benchmark_filter==*obj*`.\nThe data plot shown below was collected on a processor with AVX-512. For the\nsimplest of cases where we want to sort an array of structs by one of its\nmembers, `object_qsort` can be up to 5x faster for 32-bit data types and about\n4x faster for 64-bit data types.  It tends to do even better when the metric to sort by\ngets more complicated. 
Sorting by Euclidean distance can be up to 10x faster.\n\n![alt text](./misc/object_qsort-perf.jpg?raw=true)\n\n## Downstream projects using x86-simd-sort\n\n- NumPy uses this as a [submodule](https://github.com/numpy/numpy/pull/22315) to accelerate `np.sort, np.argsort, np.partition and np.argpartition`.\n- PyTorch uses this as a [submodule](https://github.com/pytorch/pytorch/pull/127936) to accelerate `torch.sort, torch.argsort`.\n- A slightly modified version of this library has been integrated into [OpenJDK](https://github.com/openjdk/jdk/pull/14227).\n- [GRAPE](https://github.com/alibaba/libgrape-lite.git): C++ library for parallel graph processing.\n- An AVX-512 version of the key-value sort has been submitted to [OceanBase](https://github.com/oceanbase/oceanbase/pull/1325).\n","funding_links":[],"categories":["HarmonyOS","C++"],"sub_categories":["Windows Manager"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintel%2Fx86-simd-sort","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintel%2Fx86-simd-sort","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintel%2Fx86-simd-sort/lists"}