{"id":16922539,"url":"https://github.com/drbh/simd-alphatensor-rs","last_synced_at":"2025-03-22T11:30:59.287Z","repository":{"id":64844496,"uuid":"548650786","full_name":"drbh/simd-alphatensor-rs","owner":"drbh","description":"🧮 alphatensor matrix breakthrough algorithms + simd + rust.","archived":false,"fork":false,"pushed_at":"2022-10-10T02:23:35.000Z","size":1875,"stargazers_count":59,"open_issues_count":0,"forks_count":6,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-01T15:37:31.190Z","etag":null,"topics":["ai","matrix-multiplication","rust","simd"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/drbh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-10T01:10:44.000Z","updated_at":"2025-02-28T09:56:22.000Z","dependencies_parsed_at":"2022-12-16T05:30:32.678Z","dependency_job_id":null,"html_url":"https://github.com/drbh/simd-alphatensor-rs","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drbh%2Fsimd-alphatensor-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drbh%2Fsimd-alphatensor-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drbh%2Fsimd-alphatensor-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/drbh%2Fsimd-alphatensor-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/drbh","download_url":"https://codeload.github.com/drbh/simd-alphatensor-rs/tar.gz/refs/heads/main","host":{"name":
"GitHub","url":"https://github.com","kind":"github","repositories_count":244207715,"owners_count":20416104,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","matrix-multiplication","rust","simd"],"created_at":"2024-10-13T19:55:50.989Z","updated_at":"2025-03-22T11:30:58.293Z","avatar_url":"https://github.com/drbh.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# simd-alphatensor-rs\n\n\u003e tldr; alphatensor matrix breakthrough algorithims + simd + rust.\n\nThis repo contains the cutting edge matrix multiplication algorithms that were found by [alphatensor](https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor), and as far as I know is the first novel machine imagined algorithm ever 🦾 🧠 \n\nThis repo/library by default includes the first 25 algorithms and the one we are most interested in `multiply_4_by_4_matrix_a_with_4_by_4_matrix_b`. Additionally this implementations aims to minimize the number of multiplication steps and attempts to aggregate as many multiplication steps into a single SIMD vector when possible.\n\n### ELI5\n\nA super smart computer figured out how to do an important math problem in less steps than we knew was possible. By doing this math in less steps we can save time and electricity every time these things are used. And they're used trillions, yes trillions of times a day. 
\n\n### Example use\n\n```rust\nuse simd_alphatensor_rs::{\n    multiply_2_by_2_matrix_a_with_2_by_2_matrix_b, multiply_4_by_4_matrix_a_with_4_by_4_matrix_b,\n};\n\n// all input and output matrices are flat arrays, unrolled row-wise\nfn main() {\n    let result = multiply_2_by_2_matrix_a_with_2_by_2_matrix_b(\n        [1000, 2000, 3000, 4000],\n        [3000, 4000, 5000, 6000],\n    );\n    println!(\"Example of 2x2 * 2x2 {:?}\", result);\n\n    let result = multiply_4_by_4_matrix_a_with_4_by_4_matrix_b(\n        [\n            1000, 2000, 3000, 4000, 1000, 2000, 3000, 4000, 1000, 2000, 3000, 4000, 1000, 2000,\n            3000, 4000,\n        ],\n        [\n            1000, 2000, 3000, 4000, 1000, 2000, 3000, 4000, 1000, 2000, 3000, 4000, 1000, 2000,\n            3000, 4000,\n        ],\n    );\n    println!(\"Example of 4x4 * 4x4 {:?}\", result);\n}\n```\n\n### Benchmarks\n\nRun with:\n\n```bash\ncargo bench\n```\n\n| Benchmark              | Average Runtime |\n| ---------------------- | --------------- |\n| ndarry_2x2x2           | 314.53 ns       |\n| simd_alphatensor_2x2x2 | 13.821 ns       |\n| ndarry_4x4x4           | 402.60 ns       |\n| simd_alphatensor_4x4x4 | 92.732 ns       |\n\n```\n➜  simd-alphatensor-rs (main) ✗ cargo bench\n    Finished bench [optimized] target(s) in 0.13s\n     Running unittests src/lib.rs (target/release/deps/simd_alphatensor_rs-115473bd3321baa9)\n\nrunning 0 tests\n\ntest result: ok. 
0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s\n\nBenchmarking simd_alphatensor_2x2x2: Collecting 100 samples in estimated 5.0000 s (3\nsimd_alphatensor_2x2x2  time:   [13.798 ns 13.821 ns 13.847 ns]\n                        change: [-4.8455% -3.8407% -2.9273%] (p = 0.00 \u003c 0.05)\n                        Performance has improved.\nFound 7 outliers among 100 measurements (7.00%)\n  4 (4.00%) high mild\n  3 (3.00%) high severe\n\nBenchmarking ndarry_2x2x2: Collecting 100 samples in estimated 5.0003 s (16M iterati\nndarry_2x2x2            time:   [311.79 ns 314.53 ns 317.32 ns]\n                        change: [-10.524% -8.5993% -6.6801%] (p = 0.00 \u003c 0.05)\n                        Performance has improved.\n\nBenchmarking simd_alphatensor_4x4x4: Collecting 100 samples in estimated 5.0002 s (5\nsimd_alphatensor_4x4x4  time:   [92.532 ns 92.732 ns 92.946 ns]\n                        change: [-2.1432% -1.3174% -0.6179%] (p = 0.00 \u003c 0.05)\n                        Change within noise threshold.\nFound 5 outliers among 100 measurements (5.00%)\n  2 (2.00%) high mild\n  3 (3.00%) high severe\n\nBenchmarking ndarry_4x4x4: Collecting 100 samples in estimated 5.0006 s (12M iterati\nndarry_4x4x4            time:   [400.73 ns 402.60 ns 404.58 ns]\n                        change: [-1.2978% -0.4004% +0.5315%] (p = 0.41 \u003e 0.05)\n                        No change in performance detected.\nFound 3 outliers among 100 measurements (3.00%)\n  1 (1.00%) high mild\n  2 (2.00%) high severe\n```\n\n### Generation\n\nYou can manually generate this code via some scripts. 
Currently the algorithms are parsed by first converting the released numpy matrix files into Python code. 🙏 Big shout-out to [@99991](https://github.com/99991) for this gem: [here](https://github.com/deepmind/alphatensor/issues/3). Once the Python code is generated, we parse it into Rust code, which can then be used as a library.\n\n```bash\n# fetch files from deepmind's github\nmake get-data\n\n# convert alphatensor's output\nmake gen\n\n# parse python file and write to lib\ncargo run -p codegen \u003e src/gen.rs\n\n# format the generated code\ncargo fmt\n\n# make sure it works with small example\ncargo run --example simple\n\n# profit 🙌\n```\n\n### Known limitations\n\n1. This code is all experimental and is 2 steps removed from the output file shared by DeepMind. We've tried our best to preserve accuracy, and the output is validated between steps. However, please use this at your own risk, and please, please do not use this in any production system!\n2. Not all of the algorithms are ported to Rust. Since the output results in very large expanded Rust functions, for demonstration purposes this library only ships with a subset of the algorithms. 
This subset was selected since the specific `4x4x4` algorithm is likely the most applicable and is also one of the algorithms that is superior to any previously known method.\n\n### Implementation considerations\n\n```rust\n// explicit funcs for each algo, with explicitly sized input and output arrays\n// since everything is statically sized we can keep everything on the stack\npub fn multiply_2_by_2_matrix_a_with_2_by_2_matrix_b(a: [i32; 4], b: [i32; 4]) -\u003e [i32; 4] {\n    let [a11, a12, a21, a22] = a;\n    let [b11, b12, b21, b22] = b;\n\n    // we pack all of our multiplication steps into as few fixed-size\n    // 512-bit SIMD vectors (16 elements of type i32) as possible.\n    // we don't always need all of the elements, but in this impl we skipped\n    // dynamically resizing the vector, so unused lanes are padded with 0.\n    let lefts = [i32x16::from([\n        (a21 - a22),\n        (a11 + a21 - a22),\n        (a11 - a12 + a21 - a22),\n        a12,\n        (a11 + a21),\n        a11,\n        a22,\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n    ])];\n    let rights = [i32x16::from([\n        b12,\n        (b12 + b21 + b22),\n        (b21 + b22),\n        b21,\n        (b11 + b12 + b21 + b22),\n        b11,\n        (b12 + b22),\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n        0,\n    ])];\n\n    // here we do all of the multiplications above in only as many steps as there are\n    // SIMD vectors. in this case we only need one instruction (multiplication)\n    // to perform all 7 multiplications needed for this algorithm\n    let hs = [lefts[0] * rights[0]];\n\n    // do the final summation steps\n    let c11 = (hs[0][3] + hs[0][5]);\n    let c12 = (-hs[0][1] + hs[0][4] - hs[0][5] - hs[0][6]);\n    let c21 = (-hs[0][0] + hs[0][1] - hs[0][2] - hs[0][3]);\n    let c22 = (hs[0][0] + hs[0][6]);\n\n    // return new array\n    [c11, c12, c21, c22]\n}\n```\n\n### TODOS\n\n1. 
parse numpy directly in rust\n2. include all (novel) algorithms\n3. add a lot of benchmarking\n4. clean up codegen\n5. explain SIMD better\n6. adjust SIMD array to minimize trailing\n7. maybe SIMD config?\n8. recursively make into SIMD only steps?\n9. tests tests tests","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrbh%2Fsimd-alphatensor-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrbh%2Fsimd-alphatensor-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrbh%2Fsimd-alphatensor-rs/lists"}