{"id":14995734,"url":"https://github.com/dev0x13/gemm-benchmark-2023","last_synced_at":"2025-03-21T12:25:31.970Z","repository":{"id":213728906,"uuid":"732169080","full_name":"dev0x13/gemm-benchmark-2023","owner":"dev0x13","description":"Benchmarks for some modern (2023) high-performance floating-point GEMM implementations compared to Mojo language","archived":false,"fork":false,"pushed_at":"2024-06-18T14:52:17.000Z","size":705,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-26T08:28:05.787Z","etag":null,"topics":["benchmark","gemm","mojo"],"latest_commit_sha":null,"homepage":"","language":"Mojo","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dev0x13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-15T20:25:35.000Z","updated_at":"2024-06-18T14:54:38.000Z","dependencies_parsed_at":"2024-09-24T16:22:49.405Z","dependency_job_id":"6a905af4-3fe3-48dd-97e5-086ac4e45ba0","html_url":"https://github.com/dev0x13/gemm-benchmark-2023","commit_stats":null,"previous_names":["dev0x13/gemm-benchmark-2023"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dev0x13%2Fgemm-benchmark-2023","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dev0x13%2Fgemm-benchmark-2023/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dev0x13%2Fgemm-benchmark-2023/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/dev0x13%2Fgemm-benchmark-2023/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dev0x13","download_url":"https://codeload.github.com/dev0x13/gemm-benchmark-2023/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244796834,"owners_count":20511758,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","gemm","mojo"],"created_at":"2024-09-24T16:19:46.914Z","updated_at":"2025-03-21T12:25:31.946Z","avatar_url":"https://github.com/dev0x13.png","language":"Mojo","funding_links":[],"categories":["Performance Benchmark"],"sub_categories":[],"readme":"# gemm-benchmark-2023\n\nThis repository hosts a set of benchmarks for modern (2023) high-performance floating-point GEMM implementations.\n\nImplementations covered:\n* Naive C++\n* Intel MKL 2020.4.304\n* Eigen 3.4.0\n* OpenBLAS 0.3.25\n* Mojo 0.6.0\n\nThe results and background were discussed in the following articles:\n* [Medium (EN)](https://medium.com/@dev0x13/a-first-look-at-the-performance-of-floating-point-gemm-implementation-for-cpu-in-mojo-d7d05299657a)\n* [Habr (RU)](https://habr.com/ru/articles/783138/)\n\n## Running benchmarks\n\n1. Build the Docker image:\n```shell\ndocker build --build-arg=\"MODULAR_AUTH_TOKEN=\u003ctoken\u003e\" -t gemm-benchmark-2023 .\n```\nThe `MODULAR_AUTH_TOKEN` argument is optional. If set, the Docker image will also include the benchmark for the Mojo language. The Modular auth token can be obtained from [Modular's website](https://developer.modular.com/download).\n\n2. 
Run the Docker container:\n```shell\ndocker run -it gemm-benchmark-2023\n```\nThe container will output benchmark results in GFLOPS to the console.\n\n## Reference numbers\n\nHere are some reference results obtained on Intel Xeon Platinum 8124M (16 cores, `c5.4xlarge` AWS EC2 instance) running Ubuntu 22.04:\n\n### Multithread\n\n| Problem size   | Mojo (\"Swizzled\") | Eigen | MKL    | OpenBLAS |\n|----------------|-------------------|-------|--------|----------|\n| 128x128x128    |              24.4 | 109.0 |  543.8 |    143.6 |\n| 256x256x256    |             129.6 | 190.3 |  939.5 |    380.9 |\n| 256x1024x4096  |             835.4 | 378.8 | 1063.6 |    860.6 |\n| 256x4096x1024  |             818.0 | 428.0 | 1001.6 |    770.9 |\n| 256x1024x1024  |             690.8 | 387.8 | 1037.7 |    806.8 |\n| 128x1024x4096  |             820.5 | 390.8 | 1078.6 |    784.4 |\n| 128x4096x1024  |             795.0 | 404.6 | 1044.2 |    688.9 |\n| 128x1024x1024  |             679.6 | 380.9 | 1028.9 |    707.4 |\n| 256x768x768    |             582.2 | 351.1 | 1051.7 |    818.5 |\n| 128x768x768    |             579.0 | 342.1 |  893.4 |    707.5 |\n| 128x3072x768   |             783.4 | 396.7 |  990.2 |    755.0 |\n| 128x768x3072   |             814.3 | 381.2 | 1085.7 |    784.2 |\n| 256x3072x768   |             794.7 | 424.3 | 1096.7 |    846.4 |\n| 256x768x3072   |             819.6 | 356.1 | 1116.0 |    865.4 |\n| 128x768x2304   |             797.8 | 381.0 | 1089.7 |    826.5 |\n| 1024x2560x1024 |             808.8 | 437.3 | 1176.1 |   1126.2 |\n| 1024x1024x512  |             556.7 | 399.0 | 1227.9 |    933.6 |\n| 1024x352x512   |               0.0 | 206.3 | 1185.6 |    575.1 |\n| 1024x512x256   |             245.4 | 313.5 | 1231.7 |    749.2 |\n\n### Single thread\n\n| Problem size   | Mojo (\"Vectorized\") | Eigen | MKL   | OpenBLAS | Naive |\n|----------------|---------------------|-------|-------|----------|-------|\n| 128x128x128    |                18.6 |  67.8 | 166.9 |    
 98.7 |  21.1 |\n| 256x256x256    |                20.5 |  57.1 | 149.6 |    133.5 |  19.5 |\n| 256x1024x4096  |                10.0 |  56.0 | 136.4 |    125.4 |  11.0 |\n| 256x4096x1024  |                 8.3 |  56.5 | 138.6 |    128.0 |  11.0 |\n| 256x1024x1024  |                11.9 |  57.7 | 152.0 |    140.5 |  13.2 |\n| 128x1024x4096  |                10.3 |  51.0 |  80.5 |    112.3 |  11.1 |\n| 128x4096x1024  |                 8.5 |  51.0 |  77.4 |    112.3 |  10.8 |\n| 128x1024x1024  |                11.5 |  53.4 |  91.4 |    127.9 |  13.2 |\n| 256x768x768    |                26.0 |  58.2 | 155.0 |    151.0 |  13.1 |\n| 128x768x768    |                12.5 |  54.6 | 163.3 |    137.7 |  13.2 |\n| 128x3072x768   |                 9.6 |  52.1 |  87.7 |    124.5 |  12.5 |\n| 128x768x3072   |                11.2 |  51.9 | 157.2 |    134.6 |  12.6 |\n| 256x3072x768   |                 9.7 |  58.6 | 147.4 |    137.0 |  12.4 |\n| 256x768x3072   |                12.0 |  59.6 | 148.4 |    143.3 |  12.6 |\n| 128x768x2304   |                12.4 |  52.4 | 162.7 |    135.2 |  12.7 |\n| 1024x2560x1024 |                10.7 |  58.8 | 160.3 |    151.9 |  12.6 |\n| 1024x1024x512  |                11.7 |  58.1 | 163.5 |    141.8 |  13.2 |\n| 1024x352x512   |                 0.0 |  56.6 | 162.3 |    159.4 |  17.1 |\n| 1024x512x256   |                25.6 |  57.5 | 167.7 |    143.8 |  20.3 |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdev0x13%2Fgemm-benchmark-2023","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdev0x13%2Fgemm-benchmark-2023","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdev0x13%2Fgemm-benchmark-2023/lists"}