# `[sd]gemm` benchmark

## Introduction

This is a small `[sd]gemm` benchmark, similar to
[ACES DGEMM](https://www.lanl.gov/projects/crossroads/benchmarks-performance-analysis.php),
implemented in Rust. It supports the following BLAS libraries:

- Accelerate (macOS)
- BLIS
- Intel MKL
- OpenBLAS

## Building

### Build with Accelerate (macOS)

```shell
$ cargo install gemm-benchmark --features accelerate
```

### Build with BLIS

```shell
$ cargo install gemm-benchmark --features blis
```

### Build with Intel MKL

To build the benchmark with Intel MKL statically linked, use:

```shell
$ cargo install gemm-benchmark --features intel-mkl
```

On AMD Zen CPUs, Intel MKL uses Zen-specific `[sd]gemm` kernels.
However, on many Zen CPUs these kernels are slower than the AVX2
kernels. You can build the benchmark to override Intel CPU
detection, so that MKL uses the AVX2 kernels on Zen CPUs as well.
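
One way such an override can work is sketched below. This is an assumption for illustration, not taken from this crate's source: MKL is widely reported to consult an internal function, `mkl_serv_intel_cpu_true`, when deciding whether it runs on an Intel CPU. If a replacement that always returns 1 lives in a shared library loaded before MKL (for example via `LD_PRELOAD` on Linux), the dynamic linker resolves MKL's call to the replacement. Whether MKL looks up this symbol at all depends on the MKL version, so treat the name as an assumption.

```rust
use std::os::raw::c_int;

// ASSUMPTION: MKL calls this internal function to ask "is this an Intel CPU?".
// When this definition is loaded ahead of MKL's own (e.g. from a preloaded
// shared library), symbol interposition makes the check always succeed.
#[no_mangle]
pub extern "C" fn mkl_serv_intel_cpu_true() -> c_int {
    1 // non-zero means "yes, Intel CPU"
}

fn main() {
    // Stand-alone check of the override itself; this does not load MKL.
    println!("mkl_serv_intel_cpu_true() = {}", mkl_serv_intel_cpu_true());
}
```

When built as a shared library (for example with `crate-type = ["cdylib"]` in `Cargo.toml`), the `main` function is not needed; it is included only so the snippet runs on its own.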

This override requires dynamic linking, since modifying the MKL
binaries is not permitted. To enable it, use the `intel-mkl-amd`
feature:

```shell
$ cargo install gemm-benchmark --features intel-mkl-amd
```

### Build with OpenBLAS

```shell
$ cargo install gemm-benchmark --features openblas
```

Set `OPENBLAS_NUM_THREADS=1` before running.

## Benchmarking

By default, `sgemm` is benchmarked using _256 x 256_ matrices, for
_1,000_ iterations and _1_ thread. The matrix dimension (`-d`), the number
of iterations (`-i`), and the number of threads (`-t`) can be set
with command-line flags. For example:

```shell
$ gemm-benchmark -d 1024 -i 2000 -t 4
```

This runs the benchmark using _1024 x 1024_ matrices, for _2,000_ iterations,
and _4_ threads. To benchmark `dgemm` instead, use the `--dgemm` option:

```shell
$ gemm-benchmark -d 1024 -i 2000 -t 4 --dgemm
```

## Example results

The following table shows `sgemm` GFLOPS for various CPUs using 1 to 16 threads
on _768 x 768_ matrices, measured over the default 1,000 iterations
(`gemm-benchmark -d 768 -t NTHREADS`).

| Threads | M1 Accelerate | M1 Pro Accelerate | M1 Ultra Accelerate | M2 Accelerate | i7-13700K |
| ------- | ------------- | ----------------- | ------------------- | ------------- | --------- |
| 1       | 1340          | 2061              | 2177                | 1475          | 165       |
| 2       | 1226          | 2583              | 3427                | 1639          | 323       |
| 4       | 1102          | 2685              | 3788                | 1730          | 646       |
| 8       | 1253          | 2381              | 4344                | 1601          | 1279      |
| 12      | 1225          | 2248              | 4261                | 1456          | 1148      |
| 16      | 1217          | 2254              | 4376                | 1388          | 1524      |
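
As a rough illustration of what these GFLOPS figures measure, the sketch below times a naive single-threaded `sgemm` and converts the elapsed time to GFLOPS. It assumes the conventional count of 2n³ floating-point operations per n x n matrix product (one multiply and one add per inner-loop step); that the benchmark reports GFLOPS this way is an assumption, not something stated above. The helpers `gemm_flops`, `gflops`, and `naive_sgemm` are hypothetical, and the triple loop is a plain reference implementation, not one of the BLAS kernels the benchmark links against.

```rust
use std::time::Instant;

/// Conventional operation count for one n x n gemm:
/// n^3 multiply-add steps, counted as 2 * n^3 flops.
fn gemm_flops(n: usize) -> f64 {
    2.0 * (n as f64).powi(3)
}

/// Converts a wall-clock time into GFLOPS for `iterations` products of size n x n.
fn gflops(n: usize, iterations: usize, seconds: f64) -> f64 {
    gemm_flops(n) * iterations as f64 / seconds / 1e9
}

/// Naive single-threaded reference: C = A * B for row-major n x n matrices.
/// Not a BLAS kernel; it only illustrates the operation being timed.
fn naive_sgemm(n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    assert_eq!(c.len(), n * n);
    for i in 0..n {
        let c_row = &mut c[i * n..(i + 1) * n];
        c_row.fill(0.0);
        // i-k-j loop order walks rows of B and C contiguously (row-major friendly).
        for k in 0..n {
            let aik = a[i * n + k];
            let b_row = &b[k * n..(k + 1) * n];
            for (cij, &bkj) in c_row.iter_mut().zip(b_row) {
                *cij += aik * bkj;
            }
        }
    }
}

fn main() {
    // Same matrix size as the benchmark's default (256 x 256), but only
    // 10 iterations so the naive loop finishes quickly.
    let n = 256;
    let iterations = 10;
    let a = vec![1.0f32; n * n];
    let b = vec![2.0f32; n * n];
    let mut c = vec![0.0f32; n * n];

    let start = Instant::now();
    for _ in 0..iterations {
        naive_sgemm(n, &a, &b, &mut c);
    }
    let seconds = start.elapsed().as_secs_f64();

    // Every entry of C is the sum of n products 1.0 * 2.0.
    assert_eq!(c[0], 2.0 * n as f32);
    println!(
        "{n} x {n}, {iterations} iterations: {:.2} GFLOPS",
        gflops(n, iterations, seconds)
    );
}
```

Under this convention, one 768 x 768 `sgemm` is about 0.91 GFLOP, so 1,000 iterations at 1,340 GFLOPS (the single-threaded M1 result) would take roughly 0.68 seconds.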