{"id":45038635,"url":"https://github.com/dsharlet/array","last_synced_at":"2026-02-19T07:19:34.560Z","repository":{"id":37951122,"uuid":"146819401","full_name":"dsharlet/array","owner":"dsharlet","description":"C++ multidimensional arrays in the spirit of the STL","archived":false,"fork":false,"pushed_at":"2025-05-05T00:57:57.000Z","size":10055,"stargazers_count":202,"open_issues_count":24,"forks_count":16,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-05-05T01:30:50.108Z","etag":null,"topics":["cpp","cpp14","header-only","multidimensional-arrays","performance","stl-containers","template-metaprogramming","tensors"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dsharlet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-08-31T00:02:10.000Z","updated_at":"2025-05-05T00:58:00.000Z","dependencies_parsed_at":"2023-02-05T05:47:00.469Z","dependency_job_id":"6b9dcdc3-41ee-41c9-b6ac-232adfb152d0","html_url":"https://github.com/dsharlet/array","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dsharlet/array","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsharlet%2Farray","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsharlet%2Farray/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsharlet%2Farray/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/dsharlet%2Farray/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dsharlet","download_url":"https://codeload.github.com/dsharlet/array/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsharlet%2Farray/sbom","scorecard":{"id":357687,"data":{"date":"2025-08-11","repo":{"name":"github.com/dsharlet/array","commit":"416e90b00b9cf1a3dad5bdd3816b31f44c2c43f4"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.7,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":3,"reason":"Found 11/30 approved changesets -- score normalized to 3","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are 
merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/.ci.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/.ci.yml:15: update your workflow using https://app.stepsecurity.io/secureworkflow/dsharlet/array/.ci.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/.ci.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/dsharlet/array/.ci.yml/master?enable=pin","Info:   0 out of   1 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared 
and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on 
development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 13 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T10:02:29.065Z","repository_id":37951122,"created_at":"2025-08-18T10:02:29.065Z","updated_at":"2025-08-18T10:02:29.065Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29606200,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T06:47:36.664Z","status":"ssl_error","status_checked_at":"2026-02-19T06:45:47.551Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cpp14","header-only","multidimensional-arrays","performance","stl-containers","template-metaprogramming","tensors"],"created_at":"2026-02-19T07:19:34.481Z","updated_at":"2026-02-19T07:19:34.545Z","avatar_url":"https://github.com/dsharlet.png","language":"C++","readme":"## About\n\nThis library provides a multidimensional array class for C++, with the following design goals:\n* Enable specification of array parameters as [compile-time constants](#compile-time-constant-shapes) per parameter, enabling more efficient code generation, while retaining run-time flexibility where needed.\n* Provide an API following the conventions of the C++ STL where possible.\n* Minimal dependencies and requirements (the library is currently a single header file, and depends only on the C++ STL).\n\nThe library uses some ideas established in other existing projects, such as [numpy](https://numpy.org/doc/1.17/reference/arrays.ndarray.html), [Halide](https://halide-lang.org/docs/class_halide_1_1_runtime_1_1_buffer.html), and [Eigen](http://eigen.tuxfamily.org).\nArray shapes are specified as a list of N dimensions, where each dimension has parameters such as an extent and a stride.\nArray references and objects use shape objects to map N-dimensional indices to a flat index.\nN-dimensional indices are mapped to flat offsets with the following formula:\n```\nflat_offset = (x0 - min0)*stride0 + (x1 - min1)*stride1 + ... 
+ (xN - minN)*strideN\n```\nwhere:\n* `xN` are the indices in each dimension.\n* `minN` are the mins in each dimension. The min is the value of the first in-range index in this dimension (the max is `minN + extentN - 1`).\n* `strideN` are the distances in the flat offsets between elements in each dimension.\n\nArrays efficiently support advanced manipulations like [cropping, slicing, and splitting](#slicing-cropping-and-splitting) arrays and loops, all while preserving compile-time constant parameters when possible.\nAlthough it is a heavily templated library, incorrect usage generates informative and helpful error messages.\nTypically, an issue will result in only one error message, located at the site of the problem in user code.\nThis is something [we test for](https://github.com/dsharlet/array/blob/master/test/errors.cpp#L18-L19).\n\nMany other libraries offering multi-dimensional arrays or tensors allow compile-time constant shapes.\n*However*, most if not all of them only allow either all of the shape parameters to be compile-time constant, or none of them.\nThis is really limiting; often only a few key parameters of a shape need to be compile-time constant for performance, while other dimensions need flexibility to accommodate runtime-valued shape parameters.\nSome examples of this are:\n* '[Chunky](https://en.wikipedia.org/wiki/Packed_pixel)' image formats with a small fixed number of channels.\n* Matrices where one dimension represents variables intrinsic to the problem, while the other dimension represents a number of samples of data.\n* Algorithms optimized by splitting or tiling intermediate stages will have intermediate buffers that have a constant extent in the dimensions that are split or tiled.\n\nSome other features of the library are:\n* [CUDA support](#cuda-support) for use in `__device__` functions.\n* [Einstein reduction](#einstein-reductions) helpers, enabling many kinds of reductions and other array operations to be expressed safely.\n\nFor 
more detailed documentation, see the generated [documentation](https://dsharlet.github.io/array/files.html).\n\n## Usage\n\n### Shapes\n\nThe basic types provided by the library are:\n* `dim\u003cMin, Extent, Stride\u003e`, a description of a single dimension. The template parameters specify a compile-time constant min, extent, or stride, or are `dynamic` (meaning unknown) and are specified at runtime.\n* `shape\u003cDim0, Dim1, ...\u003e`, a description of multiple dimensions. `Dim0` is referred to as the innermost dimension.\n* `array\u003cT, Shape, Allocator\u003e`, a container following the conventions of `std::vector` where possible. This container manages the allocation of a buffer associated with a `Shape`.\n* `array_ref\u003cT, Shape\u003e`, a wrapper for addressing existing memory with a shape `Shape`.\n\nTo define an array, define a shape type, and use it to define an array object:\n```c++\n  using my_3d_shape_type = shape\u003cdim\u003c\u003e, dim\u003c\u003e, dim\u003c\u003e\u003e;\n  constexpr int width = 16;\n  constexpr int height = 10;\n  constexpr int depth = 3;\n  my_3d_shape_type my_3d_shape(width, height, depth);\n  array\u003cint, my_3d_shape_type\u003e my_array(my_3d_shape);\n```\n\nGeneral shapes and arrays like this have the following built-in aliases:\n* `shape_of_rank\u003cN\u003e`, an N-dimensional shape.\n* `array_ref_of_rank\u003cT, N\u003e` and `array_of_rank\u003cT, N, Allocator\u003e`, N-dimensional arrays with a shape of `shape_of_rank\u003cN\u003e`.\n\n### Access and iteration\n\nAccessing `array` or `array_ref` is done via `operator(...)` and `operator[index_type]`.\nThere are both variadic and `index_type` overloads of `operator()`.\n`index_type` is a specialization of `std::tuple` defined by `shape` (and `array` and `array_ref`), e.g. 
`my_3d_shape_type::index_type`.\n```c++\n  for (int z = 0; z \u003c depth; z++) {\n    for (int y = 0; y \u003c height; y++) {\n      for (int x = 0; x \u003c width; x++) {\n        // Variadic version:\n        my_array(x, y, z) = 5;\n        // Or the index_type version:\n        my_array[{x, y, z}] = 5;\n      }\n    }\n  }\n```\n\n`array::for_each_value` and `array_ref::for_each_value` call a function with a reference to each value in the array.\n```c++\n  my_array.for_each_value([](int\u0026 value) {\n    value = 5;\n  });\n```\n\n`for_all_indices` is a free function taking a shape object and a function to call with every index in the shape.\n`for_each_index` is similar, calling the function with the index as an instance of the index type `my_3d_shape_type::index_type`.\n```c++\n  for_all_indices(my_3d_shape, [\u0026](int x, int y, int z) {\n    my_array(x, y, z) = 5;\n  });\n  for_each_index(my_3d_shape, [\u0026](my_3d_shape_type::index_type i) {\n    my_array[i] = 5;\n  });\n```\n\nThe order in which each of `for_each_value`, `for_each_index`, and `for_all_indices` executes its traversal over the shape is defined by `shape_traits\u003cShape\u003e`.\nThe default implementation of `shape_traits\u003cShape\u003e::for_each_index` iterates over the innermost dimension as the innermost loop, and proceeds in order to the outermost dimension.\n```c++\n  my_3d_shape_type my_shape(2, 2, 2);\n  for_all_indices(my_shape, [](int x, int y, int z) {\n    std::cout \u003c\u003c x \u003c\u003c \", \" \u003c\u003c y \u003c\u003c \", \" \u003c\u003c z \u003c\u003c std::endl;\n  });\n  // Output:\n  // 0, 0, 0\n  // 1, 0, 0\n  // 0, 1, 0\n  // 1, 1, 0\n  // 0, 0, 1\n  // 1, 0, 1\n  // 0, 1, 1\n  // 1, 1, 1\n```\n\nThe default implementation of `shape_traits\u003cShape\u003e::for_each_value` iterates over a dynamically optimized shape.\nThe order will vary depending on the properties of the shape.\n\nThere are overloads of `for_all_indices` and `for_each_index` accepting a 
permutation to indicate the loop order. In this example, the permutation `\u003c2, 0, 1\u003e` iterates over the `z` dimension as the innermost loop, then `x`, then `y`.\n```c++\n  for_all_indices\u003c2, 0, 1\u003e(my_shape, [](int x, int y, int z) {\n    std::cout \u003c\u003c x \u003c\u003c \", \" \u003c\u003c y \u003c\u003c \", \" \u003c\u003c z \u003c\u003c std::endl;\n  });\n  // Output:\n  // 0, 0, 0\n  // 0, 0, 1\n  // 1, 0, 0\n  // 1, 0, 1\n  // 0, 1, 0\n  // 0, 1, 1\n  // 1, 1, 0\n  // 1, 1, 1\n```\n\n### Compile-time constant shapes\n\nIn the previous examples, no array parameters are compile-time constants, so all of these accesses and loops expand to a `flat_offset` expression where the mins, extents, and strides are runtime variables.\nThis can prevent the compiler from generating efficient code.\nFor example, the compiler may be able to auto-vectorize these loops, but if the stride of the dimension accessed by the vectorized loop is a runtime variable, the compiler will have to generate gathers and scatters instead of vector load and store instructions, even if the stride is one at runtime.\n\nTo avoid this, we need to make array parameters compile-time constants.\nHowever, while making array parameters compile-time constants helps the compiler generate efficient code, it also makes the program less flexible.\n\nThis library helps balance this tradeoff by enabling any of the array parameters to be compile-time constants, without requiring it.\nWhich parameters should be made into compile-time constants will vary depending on the use case.\nA common case is to make the innermost dimension have stride 1:\n```c++\n  using my_dense_3d_shape_type = shape\u003c\n      dim\u003c/*Min=*/dynamic, /*Extent=*/dynamic, /*Stride=*/1\u003e,\n      dim\u003c\u003e,\n      dim\u003c\u003e\u003e;\n  array\u003cchar, my_dense_3d_shape_type\u003e my_dense_array({16, 3, 3});\n  for (auto x : my_dense_array.x()) {\n    // The compiler knows that each loop iteration 
accesses\n    // elements that are contiguous in memory for contiguous x.\n    // (y and z are indices defined by enclosing loops, omitted here.)\n    my_dense_array(x, y, z) = 0;\n  }\n```\n\nA dimension with unknown min and extent, and stride 1, is common enough that it has a built-in alias `dense_dim\u003c\u003e`, and shapes with a dense first dimension are common enough that shapes and arrays have the following built-in aliases:\n* `dense_shape\u003cN\u003e`, an N-dimensional shape with the first dimension being dense.\n* `dense_array_ref\u003cT, N\u003e` and `dense_array\u003cT, N, Allocator\u003e`, N-dimensional arrays with a shape of `dense_shape\u003cN\u003e`.\n\nThere are other common examples that are easy to support.\nA very common array is an image where 3-channel RGB or 4-channel RGBA pixels are stored together in a 'chunky' format.\n```c++\ntemplate \u003cint Channels, int XStride = Channels\u003e\nusing chunky_image_shape = shape\u003c\n    strided_dim\u003c/*Stride=*/XStride\u003e,\n    dim\u003c\u003e,\n    dense_dim\u003c/*Min=*/0, /*Extent=*/Channels\u003e\u003e;\narray\u003cuint8_t, chunky_image_shape\u003c3\u003e\u003e my_chunky_image({1920, 1080, {}});\n```\n\n`strided_dim\u003c\u003e` is another alias for `dim\u003c\u003e` where the min and extent are unknown, and the stride may be a compile-time constant.\n[`image.h`](include/array/image.h) is a small helper library of typical image shape and object types defined using arrays, including `chunky_image_shape`.\n\nAnother common example is matrices indexed `(row, column)` with the column dimension stored densely:\n```c++\n  using matrix_shape = shape\u003cdim\u003c\u003e, dense_dim\u003c\u003e\u003e;\n  array\u003cdouble, matrix_shape\u003e my_matrix({10, 4});\n  for (auto i : my_matrix.i()) {\n    for (auto j : my_matrix.j()) {\n      // This loop ordering is efficient for this type.\n      my_matrix(i, j) = 0.0;\n    }\n  }\n```\n\nThere are also many use cases for matrices with small constant sizes.\nThis library provides `auto_allocator\u003cT, N\u003e`, an 
`std::allocator`-compatible allocator that only allocates buffers of `N` small fixed-size objects with automatic storage.\nThis makes it possible to define a small matrix type that will not use any dynamic memory allocation:\n```c++\ntemplate \u003cint M, int N\u003e\nusing small_matrix_shape = shape\u003c\n    dim\u003c0, M\u003e,\n    dense_dim\u003c0, N\u003e\u003e;\ntemplate \u003ctypename T, int M, int N\u003e\nusing small_matrix = array\u003cT, small_matrix_shape\u003cM, N\u003e, auto_allocator\u003cT, M*N\u003e\u003e;\nsmall_matrix\u003cfloat, 4, 4\u003e my_small_matrix;\n// my_small_matrix is only one fixed size allocation, no new/delete calls\n// happen. sizeof(small_matrix) = sizeof(float) * 4 * 4 + (overhead)\n```\n\n[`matrix.h`](include/array/matrix.h) is a small helper library of typical matrix shape and object types defined using arrays, including the examples above.\n\n### Slicing, cropping, and splitting\n\nShapes and arrays can be sliced and cropped using `interval\u003cMin, Extent\u003e` objects, which are similar to `dim\u003c\u003e`s.\nThey can have either a compile-time constant or runtime-valued min and extent.\n`range(begin, end)` is a helper function to construct an `interval`.\n```c++\n  // Slicing\n  array_ref_of_rank\u003cint, 2\u003e channel1 = my_array(_, _, 1);\n  array_ref_of_rank\u003cint, 1\u003e row4_channel2 = my_array(_, 4, 2);\n\n  // Cropping\n  array_ref_of_rank\u003cint, 3\u003e top_left = my_array(interval\u003c\u003e{0, 2}, interval\u003c\u003e{0, 4}, _);\n  array_ref_of_rank\u003cint, 2\u003e center_channel0 = my_array(interval\u003c\u003e{1, 2}, interval\u003c\u003e{2, 4}, 0);\n```\nThe `_` or `all` constants are placeholders indicating the entire dimension should be preserved.\nDimensions that are sliced are removed from the shape of the array.\n\nWhen iterating a `dim`, it is possible to `split` it first by either a compile-time constant or a runtime-valued split factor.\nA split `dim` produces an iterator range that 
produces `interval\u003c\u003e` objects.\nThis allows easy tiling of algorithms:\n```c++\n  constexpr index_t x_split_factor = 3;\n  const index_t y_split_factor = 5;\n  for (auto yo : split(my_array.y(), y_split_factor)) {\n    for (auto xo : split\u003cx_split_factor\u003e(my_array.x())) {\n      auto tile = my_array(xo, yo, _);\n      for (auto z : tile.z()) {\n        for (auto y : tile.y()) {\n          for (auto x : tile.x()) {\n            // The compiler knows this loop has a fixed extent x_split_factor!\n            tile(x, y, z) = x;\n          }\n        }\n      }\n    }\n  }\n```\n\nBoth loops may have extents that are not evenly divided by their split factors.\nTo avoid generating an `array_ref` referencing data out of bounds of the original array, the split iterators modify the last iteration.\nThe behavior of each kind of split is different:\n* Because the extent of `yo` can vary, it is reduced on the last iteration. This strategy can accommodate dimensions of any extent.\n* Because the extent of `xo` must be a constant, the last iteration will be shifted to overlap the previous iteration. 
This strategy requires that the extent of the dimension being split be greater than the split factor, though it need not be a multiple of it.\n\nCompile-time constant split factors produce ranges with compile-time extents, and shapes and arrays cropped with these ranges will have a corresponding `dim\u003c\u003e` with a compile-time constant extent.\nThis allows potentially significant optimizations to be expressed relatively easily!\n\n### Einstein reductions\n\nThe [`ein_reduce.h`](include/array/ein_reduce.h) header provides [Einstein notation](https://en.wikipedia.org/wiki/Einstein_notation) reductions and summation helpers, similar to [np.einsum](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html) or [tf.einsum](https://www.tensorflow.org/api_docs/python/tf/einsum).\nThese are zero-cost abstractions for describing loops that allow expressing a wide variety of array operations.\nEinstein notation expression operands are constructed using the `ein\u003ci, j, ...\u003e(x)` helper function, where `x` can be any callable object, including an `array\u003c\u003e` or `array_ref\u003c\u003e`.\n`i, j, ...` are `constexpr` integers indicating which dimensions of the reduction operation are used to evaluate `x`.\nTherefore, the number of arguments of `x` must match the number of dimensions provided to `ein`.\nOperands can be combined into larger expressions using typical binary operators.\n\nEinstein notation expressions can be evaluated using one of the following functions:\n* `ein_reduce(expression)`, evaluate an arbitrary Einstein notation `expression`.\n* `lhs = make_ein_sum\u003cT, i, j, ...\u003e(rhs)`, evaluate the summation `ein\u003ci, j, ...\u003e(lhs) += rhs`, and return `lhs`. 
The shape of `lhs` is inferred from the expression.\n\nHere are some examples using these reduction operations to compute summations:\n```c++\n  // Name the dimensions we use in Einstein reductions.\n  enum { i = 0, j = 1, k = 2, l = 3 };\n\n  // Dot product dot1 = dot2 = x.y:\n  vector\u003cfloat\u003e x({10});\n  vector\u003cfloat\u003e y({10});\n  float dot1 = make_ein_sum\u003cfloat\u003e(ein\u003ci\u003e(x) * ein\u003ci\u003e(y));\n  float dot2 = 0.0f;\n  ein_reduce(ein(dot2) += ein\u003ci\u003e(x) * ein\u003ci\u003e(y));\n\n  // Matrix multiply C1 = C2 = A*B:\n  matrix\u003cfloat\u003e A({10, 10});\n  matrix\u003cfloat\u003e B({10, 15});\n  matrix\u003cfloat\u003e C1({10, 15});\n  fill(C1, 0.0f);\n  ein_reduce(ein\u003ci, j\u003e(C1) += ein\u003ci, k\u003e(A) * ein\u003ck, j\u003e(B));\n  auto C2 = make_ein_sum\u003cfloat, i, j\u003e(ein\u003ci, k\u003e(A) * ein\u003ck, j\u003e(B));\n```\n\nWe can use arbitrary functions as expression operands:\n```c++\n  // Cross product array crosses_n = x_n x y_n:\n  using vector_array = array\u003cfloat, shape\u003cdim\u003c0, 3\u003e, dense_dim\u003c\u003e\u003e\u003e;\n  vector_array xs({3, 100});\n  vector_array ys({3, 100});\n  vector_array crosses({3, 100});\n  auto epsilon3 = [](int i, int j, int k) { return sgn(j - i) * sgn(k - i) * sgn(k - j); };\n  ein_reduce(ein\u003ci, l\u003e(crosses) += ein\u003ci, j, k\u003e(epsilon3) * ein\u003cj, l\u003e(xs) * ein\u003ck, l\u003e(ys));\n```\n\nThese operations generally produce loop nests that are as readily optimized by the compiler as hand-written loops.\nIn this example, `crosses`, `xs`, and `ys` have shape `shape\u003cdim\u003c0, 3\u003e, dense_dim\u003c\u003e\u003e`, so the compiler will see small constant loops and likely be able to optimize this to similar efficiency as hand-written code, by unrolling and evaluating the function at compile time.\nThe compiler also should be able to efficiently vectorize the `l` dimension of the `ein_reduce`, because that dimension 
has a constant stride 1.\n\nThe expression can be another kind of reduction, or not a reduction at all:\n```c++\n  // Matrix transpose AT = A^T:\n  matrix\u003cfloat\u003e AT({10, 10});\n  ein_reduce(ein\u003ci, j\u003e(AT) = ein\u003cj, i\u003e(A));\n\n  // Maximum of each x-y plane of a 3D volume:\n  dense_array\u003cfloat, 3\u003e volume({8, 12, 20});\n  dense_array\u003cfloat, 1\u003e max_xy({20});\n  auto r = ein\u003ck\u003e(max_xy);\n  ein_reduce(r = max(r, ein\u003ci, j, k\u003e(volume)));\n```\n\nReductions can have a mix of result and operand types:\n```c++\n  // Compute X1 = X2 = DFT[x]:\n  using complex = std::complex\u003cfloat\u003e;\n  dense_array\u003ccomplex, 2\u003e W({10, 10});\n  for_all_indices(W.shape(), [\u0026](int j, int k) {\n    W(j, k) = exp(-2.0f * pi * complex(0, 1) * (static_cast\u003cfloat\u003e(j * k) / 10));\n  });\n  // Using `make_ein_sum`, returning the result:\n  auto X1 = make_ein_sum\u003ccomplex, j\u003e(ein\u003cj, k\u003e(W) * ein\u003ck\u003e(x));\n  // Using `ein_reduce`, computing the result in place:\n  vector\u003ccomplex\u003e X2({10}, 0.0f);\n  ein_reduce(ein\u003cj\u003e(X2) += ein\u003cj, k\u003e(W) * ein\u003ck\u003e(x));\n```\n\nThese reductions also compose well with loop transformations like `split` and array operations like [slicing and cropping](#slicing-cropping-and-splitting).\nFor example, a matrix multiplication can be tiled like so:\n```c++\n  // Adjust this depending on the target architecture. 
For AVX2, vectors are 256-bit.\n  constexpr index_t vector_size = 32 / sizeof(float);\n\n  // We want the tiles to be big without spilling the accumulators to the stack.\n  constexpr index_t tile_rows = 4;\n  constexpr index_t tile_cols = vector_size * 3;\n\n  for (auto io : split\u003ctile_rows\u003e(C.i())) {\n    for (auto jo : split\u003ctile_cols\u003e(C.j())) {\n      auto C_ijo = C(io, jo);\n      fill(C_ijo, 0.0f);\n      ein_reduce(ein\u003ci, j\u003e(C_ijo) += ein\u003ci, k\u003e(A) * ein\u003ck, j\u003e(B));\n    }\n  }\n```\n\nThis generates the following machine code(\\*) for the inner loop using clang 18 with -O2 -ffast-math:\n```assembly\nvbroadcastss\t(%rsi,%rdi,4), %ymm12\nvmovups\t-64(%r12,%r15,4), %ymm13\nvmovups\t-32(%r12,%r15,4), %ymm14\nvmovups\t(%r12,%r15,4), %ymm15\naddq\t%rbx, %r15\nvfmadd231ps\t%ymm12, %ymm13, %ymm11\nvfmadd231ps\t%ymm12, %ymm14, %ymm10\nvfmadd231ps\t%ymm12, %ymm15, %ymm9\nvbroadcastss\t(%r8,%rdi,4), %ymm12\nvfmadd231ps\t%ymm12, %ymm13, %ymm8\nvfmadd231ps\t%ymm12, %ymm14, %ymm7\nvfmadd231ps\t%ymm12, %ymm15, %ymm6\nvbroadcastss\t(%r10,%rdi,4), %ymm12\nvfmadd231ps\t%ymm12, %ymm13, %ymm5\nvfmadd231ps\t%ymm12, %ymm14, %ymm4\nvfmadd231ps\t%ymm12, %ymm15, %ymm3\nvbroadcastss\t(%rdx,%rdi,4), %ymm12\nincq\t%rdi\nvfmadd231ps\t%ymm13, %ymm12, %ymm2\nvfmadd231ps\t%ymm14, %ymm12, %ymm1\nvfmadd231ps\t%ymm12, %ymm15, %ymm0\ncmpq\t%rdi, %r13\njne\t.LBB8_12\n```\nThis is **40-50x** faster than a naive C implementation of nested loops on my machine, and it should be within a factor of 2 of the peak possible performance.\nA [similar example](examples/linear_algebra/matrix.cpp#L265-L271) that is only a little more complicated achieves around 90% of the peak possible performance.\n\n(\\*) Unfortunately, this doesn't generate performant code currently and requires a few tweaks to work around an [issue](https://bugs.llvm.org/show_bug.cgi?id=45863) in LLVM.\nSee the [matrix example](examples/linear_algebra/matrix.cpp) for the code that produces 
the above assembly.\nTo summarise, it is currently necessary to perform the accumulation into a temporary buffer instead of accumulating directly into the output.\n\n### CUDA support\n\nMost of the functions in this library are marked with `__device__`, enabling them to be used in CUDA code.\nThis includes `array_ref\u003cT, Shape\u003e` and most of its helper functions.\nThe exceptions to this are functions and classes that allocate memory, primarily `array\u003cT, Shape, Alloc\u003e`.\n\n### Try it on Compiler Explorer\n\nThis library is available on [Compiler Explorer](https://godbolt.org/) as `Array`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsharlet%2Farray","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdsharlet%2Farray","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsharlet%2Farray/lists"}