{"id":31730751,"url":"https://github.com/tttapa/batmat","last_synced_at":"2026-03-03T22:07:14.838Z","repository":{"id":314895920,"uuid":"1008452718","full_name":"tttapa/batmat","owner":"tttapa","description":"Fast linear algebra routines for batches of small matrices.","archived":false,"fork":false,"pushed_at":"2026-03-01T21:56:30.000Z","size":5648,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-02T01:20:24.927Z","etag":null,"topics":["cpp","linear-algebra","simd"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tttapa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-25T15:06:45.000Z","updated_at":"2026-02-20T17:55:47.000Z","dependencies_parsed_at":"2025-09-15T14:33:57.038Z","dependency_job_id":"ee332b65-2655-4d34-9a09-0b4269bb5e00","html_url":"https://github.com/tttapa/batmat","commit_stats":null,"previous_names":["tttapa/batmat"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/tttapa/batmat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tttapa%2Fbatmat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tttapa%2Fbatmat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tttapa%2Fbatmat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tttapa%2Fbatmat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tttapa","download_url":"https://codeload.github.com/tttapa/batmat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tttapa%2Fbatmat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30063421,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-03T18:21:05.932Z","status":"ssl_error","status_checked_at":"2026-03-03T18:20:59.341Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","linear-algebra","simd"],"created_at":"2025-10-09T07:40:00.592Z","updated_at":"2026-03-03T22:07:14.833Z","avatar_url":"https://github.com/tttapa.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\u003cimg src=\"docs/batmat-small.png\" alt=\"batmat logo\" width=160\u003e\u003c/p\u003e\n\n# batmat\n\nFast linear algebra routines for batches of small matrices.\n\nBatmat is used as the linear algebra backend for the [Cyqlone](https://github.com/kul-optec/cyqlone) solver,\nwhere it is used to perform vectorized operations across multiple stages in an optimal control problem.\n\nTo enable vectorization, batmat stores batches of small matrices in an interleaved “compact” format in memory, where the corresponding elements of all matrices in a batch are stored together, as shown in the figure below (for a batch size of two).\nCustom linear algebra routines then operate on all matrices in a batch simultaneously using SIMD instructions. These routines are built on top of highly optimized micro-kernels.\n\n\u003cp align=\"center\"\u003e\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs/interleaved-dark.svg\"\u003e\n  \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"docs/interleaved-light.svg\"\u003e\n  \u003cimg src=\"docs/interleaved-light.svg\" alt=\"visualization of batched matrices\"\u003e\n\u003c/picture\u003e\n\u003c/p\u003e\n\n## Supported routines\n\n| Routine                        | Operation                                        | Notes                                                                    |\n|-------------------------------:|:-------------------------------------------------|:-------------------------------------------------------------------------|\n| `gemm(A, B, D)`                | $D_i = A_i B_i$                                  |                                                                          |\n| `gemm_neg(A, B, D)`            | $D_i = -A_i B_i$                                 |                                                                          |\n| `gemm_add(A, B, C, D)`         | $D_i = C_i + A_i B_i$                            |                                                                          |\n| `gemm_sub(A, B, C, D)`         | $D_i = C_i - A_i B_i$                            |                                                                          |\n| `syrk(A, D)`                   | $D_i = A_i A_i^\\top$                             | $D_i$ symmetric                                                          |\n| `syrk_neg(A, D)`               | $D_i = -A_i A_i^\\top$                            | $D_i$ symmetric                                                          |\n| `syrk_add(A, C, D)`            | $D_i = C_i + A_i A_i^\\top$                       | $C_i, D_i$ symmetric                                                     |\n| `syrk_sub(A, C, D)`            | $D_i = C_i - A_i A_i^\\top$                       | $C_i, D_i$ symmetric                                                     |\n| `trmm(A, B, D)`                | $D_i = A_i B_i$                                  | $A_i$ and/or $B_i$ triangular                                            |\n| `trmm_neg(A, B, D)`            | $D_i = -A_i B_i$                                 | $A_i$ and/or $B_i$ triangular                                            |\n| `trmm_add(A, B, C, D)`         | $D_i = C_i + A_i B_i$                            | $A_i$ and/or $B_i$ triangular                                            |\n| `trmm_sub(A, B, C, D)`         | $D_i = C_i - A_i B_i$                            | $A_i$ and/or $B_i$ triangular                                            |\n| `potrf(C, D)`                  | $D_i = \\mathrm{chol}(C_i)$                       | $C_i$ symmetric positive definite, $D_i$ lower triangular                |\n| `syrk_add_potrf(A, C, D)`      | $D_i = \\mathrm{chol}(C_i + A_i A_i^\\top)$        | $C_i + A_i A_i^\\top$ symmetric positive definite, $D_i$ lower triangular |\n| `syrk_sub_potrf(A, C, D)`      | $D_i = \\mathrm{chol}(C_i - A_i A_i^\\top)$        | $C_i - A_i A_i^\\top$ symmetric positive definite, $D_i$ lower triangular |\n| `trsm(A, B, D)`                | $D_i = A_i^{-1} B_i$ or $D_i = A_i B_i^{-1}$     | $A_i$ or $B_i$ triangular                                                |\n| `trtri(A, D)`                  | $D_i = A_i^{-1}$                                 | $A_i$ triangular                                                         |\n| `gemm_diag(A, B, D, d)`        | $D_i = A_i \\mathrm{diag}(d_i) B_i$               |                                                                          |\n| `gemm_diag_add(A, B, C, D, d)` | $D_i = C_i + A_i \\mathrm{diag}(d_i) B_i$         |                                                                          |\n| `syrk_diag_add(A, C, D, d)`    | $D_i = C_i + A_i \\mathrm{diag}(d_i) A_i^\\top$    | $C_i, D_i$ symmetric                                                     |\n| `symm_add(A, B, C, D)`         | $D_i = C_i + A_i B_i$                            | $A_i$ symmetric                                                          |\n| `copy(A, B)`                   | $B_i = A_i$                                      |                                                                          |\n| `fill(a, B)`                   | $B_i = \\mathrm{broadcast}(a)$                    |                                                                          |\n\nA selection of these routines also support masking, shifting, or rotating the arguments (for example, $D_{i+1} = C_i + A_i B_i$).\n\n## Example usage\n\n```cpp\n#include \u003cbatmat/linalg/copy.hpp\u003e\n#include \u003cbatmat/linalg/gemm.hpp\u003e\n#include \u003cbatmat/linalg/potrf.hpp\u003e\n#include \u003cbatmat/matrix/matrix.hpp\u003e\n#include \u003cguanaqo/print.hpp\u003e\n#include \u003calgorithm\u003e\n#include \u003ccmath\u003e\n#include \u003ciostream\u003e\n#include \u003climits\u003e\n#include \u003crandom\u003e\n\nusing batmat::index_t;\nusing batmat::real_t;\nnamespace la = batmat::linalg;\n\nint main() {\n    using batch_size             = std::integral_constant\u003cindex_t, 4\u003e;\n    constexpr auto storage_order = batmat::matrix::StorageOrder::ColMajor;\n    // Class representing a batch of four matrices.\n    using Mat = batmat::matrix::Matrix\u003creal_t, index_t, batch_size, batch_size, storage_order\u003e;\n    // Allocate some batches of matrices (initialized to zero).\n    index_t n = 3, m = n + 5;\n    Mat C{{.rows = n, .cols = n}}, A{{.rows = n, .cols = m}};\n    // Fill A with random values.\n    std::mt19937 rng{12345};\n    std::uniform_real_distribution\u003creal_t\u003e uni{-1.0, 1.0};\n    std::ranges::generate(A, [\u0026] { return uni(rng); });\n    // Compute C = AAᵀ to make it symmetric positive definite (lower triangular part only).\n    la::syrk(A, la::tril(C));\n    // Allocate L for the Cholesky factors.\n    Mat L{{.rows = n, .cols = n}, batmat::matrix::uninitialized};\n    // Compute the Cholesky factors L of C (lower triangular).\n    la::fill(0, la::triu(L));\n    la::potrf(la::tril(C), la::tril(L));\n    // Print the results.\n    for (index_t l = 0; l \u003c C.depth(); ++l) {\n        guanaqo::print_python(std::cout \u003c\u003c \"C[\" \u003c\u003c l \u003c\u003c \"] =\\n\", C(l));\n        guanaqo::print_python(std::cout \u003c\u003c \"L[\" \u003c\u003c l \u003c\u003c \"] =\\n\", L(l));\n    }\n    // Compute LLᵀ (in-place).\n    la::syrk(la::tril(L));\n    // Check that LLᵀ == C.\n    int errors     = 0;\n    const auto eps = std::numeric_limits\u003creal_t\u003e::epsilon();\n    for (index_t l = 0; l \u003c C.depth(); ++l)\n        for (index_t c = 0; c \u003c C.cols(); ++c)\n            for (index_t r = c; r \u003c C.rows(); ++r)\n                errors += std::abs(C(l, r, c) - L(l, r, c)) \u003c 10 * eps ? 0 : 1;\n    return errors;\n}\n```\n\n## Installation\n\n**Dependencies:** [guanaqo](https://github.com/tttapa/guanaqo)\n\nBatmat can be installed using the standard CMake workflow.\nTo install the necessary dependencies, it is recommended to use the [Conan](https://conan.io/) package manager:\n```sh\ngit clone https://github.com/tttapa/conan-recipes.git\nconan remote add tttapa-conan-recipes \"$PWD/conan-recipes\"\nconan install . --build=missing\ncmake --fresh --preset conan-release\ncmake --build --preset conan-release\n```\n\n\u003e [!TIP]\n\u003e Batmat makes extensive use of vectorization (in fact, that's kind of the whole point).\n\u003e Be sure to enable SIMD ISA extension support in your Conan profile's\n\u003e compiler flags for the best performance.\n\u003e See [scripts/dev/profiles/laptop](scripts/dev/profiles/laptop) for an example.\n\nGCC 14 or later is required (and can be installed through Conan).\nClang is also supported, but this requires the `-o\\\u0026:with_gsi_hpc_simd=True` option to be passed to Conan\nto enable support for the GSI-HPC SIMD library (Clang does not support libstdc++'s `\u003cexperimental/simd\u003e` header).\n\n## Benchmarks\n\nBatmat performs exceptionally well on matrices smaller than around 100×100 (that fit in the L2 cache), where it outperforms traditional scalar linear algebra libraries such as Intel MKL, OpenBLAS, and BLASFEO (especially for triangular or symmetric matrices).\n\n```sh\n. ~/intel/oneapi/setvars.sh  # For the Intel MKL\npython3 -m pip install -r benchmarks/scripts/requirements.txt\nconan install . --build=missing -o guanaqo/\\*:with_mkl=True -o\\\u0026:with_benchmarks=True -o\\\u0026:with_blasfeo=True\ncmake --fresh --preset conan-release -DBATMAT_WITH_ACCURATE_BUILD_TIME=Off\ncmake --build --preset conan-release -t viz-benchmark-potrf  # or gemm, syrk, trsm, syrk-potrf, trmm, trtri, hyh\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftttapa%2Fbatmat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftttapa%2Fbatmat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftttapa%2Fbatmat/lists"}