{"id":17278327,"url":"https://github.com/projectphysx/opencl-benchmark","last_synced_at":"2025-04-04T17:05:55.182Z","repository":{"id":159390068,"uuid":"634619703","full_name":"ProjectPhysX/OpenCL-Benchmark","owner":"ProjectPhysX","description":"A small OpenCL benchmark program to measure peak GPU/CPU performance.","archived":false,"fork":false,"pushed_at":"2025-03-20T06:03:52.000Z","size":307,"stargazers_count":194,"open_issues_count":6,"forks_count":27,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-28T16:04:49.951Z","etag":null,"topics":["bandwidth","benchmark","benchmarking","flops","gpgpu","gpu","gpu-computing","high-performance-computing","hpc","opencl","tool","tools"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ProjectPhysX.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-30T18:02:00.000Z","updated_at":"2025-03-25T06:40:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"8b1e7c0e-9e05-46ae-9adf-97a2478fb830","html_url":"https://github.com/ProjectPhysX/OpenCL-Benchmark","commit_stats":{"total_commits":35,"total_committers":2,"mean_commits":17.5,"dds":"0.34285714285714286","last_synced_commit":"1ece450876d8b7c4f1d7784911190d5c3b48e37d"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FOpenCL-Benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FOpenCL-
Benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FOpenCL-Benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FOpenCL-Benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ProjectPhysX","download_url":"https://codeload.github.com/ProjectPhysX/OpenCL-Benchmark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247217174,"owners_count":20903008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bandwidth","benchmark","benchmarking","flops","gpgpu","gpu","gpu-computing","high-performance-computing","hpc","opencl","tool","tools"],"created_at":"2024-10-15T09:11:25.969Z","updated_at":"2025-04-04T17:05:55.148Z","avatar_url":"https://github.com/ProjectPhysX.png","language":"C++","readme":"# OpenCL-Benchmark\n\nA small [OpenCL](https://github.com/ProjectPhysX/OpenCL-Wrapper \"OpenCL-Wrapper\") benchmark program to measure peak GPU/CPU performance.\n\nWorks with any GPU in Windows, Linux, macOS and Android.\n\n\n\n## Measurements\n- compute performance (`FP64` (scalar), `FP32` (scalar), `FP16` (half2), `INT64` (scalar), `INT32` (scalar), `INT16` (short2), `INT8` (dp4a))\n  - closest possible fraction/multiplicator of `measured compute performance` divided by `reported theoretical FP32 performance` is shown in `(round brackets)`\n    - for example when OpenCL reports `19.492` TFLOPs/s theoretical FP32, and the benchmark measures `9.512` TFLOPs/s for FP64, the ratio of `(measured 
FP64)/(theoretical FP32) = 9.512/19.492 = 1/2.05` is rounded to the nearest possible value of `1/2` and reported as such\n    - for any GPU/CPU architecture, these ratios can only be one of `1/64`, `1/32`, `1/24`, `1/16`, `1/12`, `1/8`, `1/4`, `1/3`, `1/2`, `2/3`, `1x`, `2x`, `4x`, `8x`, `16x`, `32x`, `64x`, and nothing in between\n- memory bandwidth (`coalesced`/`misaligned` `read`/`write`)\n- PCIe bandwidth (`send`/`receive`/`bidirectional`)\n  - the PCIe generation is estimated from the measured PCIe bandwidth, assuming an x16 link width\n\n\n\n## How to use?\n\n### Windows\n- Download and install [Visual Studio Community](https://visualstudio.microsoft.com/de/vs/community/). In Visual Studio Installer, add:\n  - Desktop development with C++\n  - MSVC v142\n  - Windows 10 SDK\n- Open [`OpenCL-Benchmark.sln`](OpenCL-Benchmark.sln) in [Visual Studio Community](https://visualstudio.microsoft.com/de/vs/community/).\n- Compile and run by clicking the \u003ckbd\u003e► Local Windows Debugger\u003c/kbd\u003e button.\n- To run outside of [Visual Studio Community](https://visualstudio.microsoft.com/de/vs/community/), open Windows CMD in the `OpenCL-Benchmark` folder (type `cmd` in the address bar of File Explorer and press \u003ckbd\u003eEnter\u003c/kbd\u003e), then run:\n  ```\n  OpenCL-Benchmark.exe\n  ```\n\n### Linux / macOS / Android\n- Download, compile and run:\n  ```\n  git clone https://github.com/ProjectPhysX/OpenCL-Benchmark.git\n  cd OpenCL-Benchmark\n  chmod +x make.sh\n  ./make.sh\n  ```\n- Run:\n  ```\n  bin/OpenCL-Benchmark\n  ```\n\n### Run only for a specified list of devices\n- Call `bin\\OpenCL-Benchmark.exe 0 2 5` (Windows) or `bin/OpenCL-Benchmark 0 2 5` (Linux/macOS), where the numbers are the IDs of the devices to benchmark.\n\n\n\n## Examples\n```\n|----------------.------------------------------------------------------------|\n| Device ID      | 0                                                          |\n| Device Name    | NVIDIA H100 80GB HBM3                 
                     |\n| Device Vendor  | NVIDIA Corporation                                         |\n| Device Driver  | 565.57.01 (Linux)                                          |\n| OpenCL Version | OpenCL C 3.0                                               |\n| Compute Units  | 132 at 1980 MHz (16896 cores, 66.908 TFLOPs/s)             |\n| Memory, Cache  | 81105 MB VRAM, 4224 KB global / 48 KB local                |\n| Buffer Limits  | 20276 MB global, 64 KB constant                            |\n|----------------'------------------------------------------------------------|\n| Info: OpenCL C code successfully compiled.                                  |\n| FP64  compute                                        31.184 TFLOPs/s (1/2 ) |\n| FP32  compute                                        62.908 TFLOPs/s ( 1x ) |\n| FP16  compute                                       123.749 TFLOPs/s ( 2x ) |\n| INT64 compute                                         3.227  TIOPs/s (1/24) |\n| INT32 compute                                        32.946  TIOPs/s (1/2 ) |\n| INT16 compute                                        30.901  TIOPs/s (1/2 ) |\n| INT8  compute                                       103.204  TIOPs/s ( 2x ) |\n| Memory Bandwidth ( coalesced read      )                       3025.53 GB/s |\n| Memory Bandwidth ( coalesced      write)                       3055.98 GB/s |\n| Memory Bandwidth (misaligned read      )                       2102.44 GB/s |\n| Memory Bandwidth (misaligned      write)                        314.25 GB/s |\n| PCIe   Bandwidth (send                 )                         10.53 GB/s |\n| PCIe   Bandwidth (   receive           )                         11.47 GB/s |\n| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   10.91 GB/s |\n|-----------------------------------------------------------------------------|\n```\n```\n|----------------.------------------------------------------------------------|\n| Device ID      | 0  
                                                        |\n| Device Name    | AMD Instinct MI300X                                        |\n| Device Vendor  | Advanced Micro Devices, Inc.                               |\n| Device Driver  | 3635.0 (HSA1.1,LC) (Linux)                                 |\n| OpenCL Version | OpenCL C 2.0                                               |\n| Compute Units  | 304 at 2100 MHz (19456 cores, 81.715 TFLOPs/s)             |\n| Memory, Cache  | 196592 MB VRAM, 32 KB global / 64 KB local                 |\n| Buffer Limits  | 196592 MB global, 201310208 KB constant                    |\n|----------------'------------------------------------------------------------|\n| Info: OpenCL C code successfully compiled.                                  |\n| FP64  compute                                        54.944 TFLOPs/s (2/3 ) |\n| FP32  compute                                       130.000 TFLOPs/s ( 2x ) |\n| FP16  compute                                       141.320 TFLOPs/s ( 2x ) |\n| INT64 compute                                         3.666  TIOPs/s (1/24) |\n| INT32 compute                                        47.736  TIOPs/s (2/3 ) |\n| INT16 compute                                        69.022  TIOPs/s ( 1x ) |\n| INT8  compute                                       106.178  TIOPs/s ( 1x ) |\n| Memory Bandwidth ( coalesced read      )                       3756.64 GB/s |\n| Memory Bandwidth ( coalesced      write)                       4686.31 GB/s |\n| Memory Bandwidth (misaligned read      )                       3881.24 GB/s |\n| Memory Bandwidth (misaligned      write)                       2491.25 GB/s |\n| PCIe   Bandwidth (send                 )                         54.57 GB/s |\n| PCIe   Bandwidth (   receive           )                         55.79 GB/s |\n| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   55.21 GB/s 
|\n|-----------------------------------------------------------------------------|\n```\n```\n|----------------.------------------------------------------------------------|\n| Device ID      | 0                                                          |\n| Device Name    | Intel(R) Arc(TM) B580 Graphics                             |\n| Device Vendor  | Intel(R) Corporation                                       |\n| Device Driver  | 32.0.101.6559 (Windows)                                    |\n| OpenCL Version | OpenCL C 3.0                                               |\n| Compute Units  | 160 at 2850 MHz (2560 cores, 14.592 TFLOPs/s)              |\n| Memory, Cache  | 12187 MB VRAM, 18432 KB global / 128 KB local              |\n| Buffer Limits  | 11944 MB global, 12230900 KB constant                      |\n|----------------'------------------------------------------------------------|\n| Info: OpenCL C code successfully compiled.                                  |\n| FP64  compute                                         0.896 TFLOPs/s (1/16) |\n| FP32  compute                                        14.249 TFLOPs/s ( 1x ) |\n| FP16  compute                                        26.547 TFLOPs/s ( 2x ) |\n| INT64 compute                                         0.636  TIOPs/s (1/24) |\n| INT32 compute                                         4.556  TIOPs/s (1/3 ) |\n| INT16 compute                                        37.082  TIOPs/s ( 2x ) |\n| INT8  compute                                        48.668  TIOPs/s ( 4x ) |\n| Memory Bandwidth ( coalesced read      )                        574.09 GB/s |\n| Memory Bandwidth ( coalesced      write)                        468.07 GB/s |\n| Memory Bandwidth (misaligned read      )                        796.23 GB/s |\n| Memory Bandwidth (misaligned      write)                        383.15 GB/s |\n| PCIe   Bandwidth (send                 )                          4.99 GB/s |\n| PCIe   Bandwidth (   receive           ) 
                         4.87 GB/s |\n| PCIe   Bandwidth (        bidirectional)            (Gen3 x16)    5.11 GB/s |\n|-----------------------------------------------------------------------------|\n|----------------.------------------------------------------------------------|\n```\n```\n|----------------.------------------------------------------------------------|\n| Device ID      | 0                                                          |\n| Device Name    | AMD EPYC 9554 64-Core Processor                            |\n| Device Vendor  | Intel(R) Corporation                                       |\n| Device Driver  | 2024.18.10.0.08_160000 (Linux)                             |\n| OpenCL Version | OpenCL C 3.0                                               |\n| Compute Units  | 128 at 0 MHz (64 cores, 0.000 TFLOPs/s)                    |\n| Memory, Cache  | 386363 MB RAM, 1024 KB global / 256 KB local               |\n| Buffer Limits  | 386363 MB global, 128 KB constant                          |\n|----------------'------------------------------------------------------------|\n| Info: OpenCL C code successfully compiled.                                  
|\n| FP64  compute                                         3.739 TFLOPs/s (1/64) |\n| FP32  compute                                         3.842 TFLOPs/s (1/64) |\n| FP16  compute                                         0.863 TFLOPs/s (1/64) |\n| INT64 compute                                         1.506  TIOPs/s (1/64) |\n| INT32 compute                                         4.240  TIOPs/s (1/64) |\n| INT16 compute                                         8.592  TIOPs/s (1/64) |\n| INT8  compute                                         2.774  TIOPs/s (1/64) |\n| Memory Bandwidth ( coalesced read      )                        391.09 GB/s |\n| Memory Bandwidth ( coalesced      write)                        167.26 GB/s |\n| Memory Bandwidth (misaligned read      )                        248.65 GB/s |\n| Memory Bandwidth (misaligned      write)                        156.18 GB/s |\n|-----------------------------------------------------------------------------|\n```\n```\n|----------------.------------------------------------------------------------|\n| Device ID      | 1                                                          |\n| Device Name    | Intel(R) UHD Graphics 630                                  |\n| Device Vendor  | Intel(R) Corporation                                       |\n| Device Driver  | 31.0.101.2130 (Windows)                                    |\n| OpenCL Version | OpenCL C 3.0                                               |\n| Compute Units  | 24 at 1200 MHz (192 cores, 0.461 TFLOPs/s)                 |\n| Memory, Cache  | 6500 MB RAM, 768 KB global / 64 KB local                   |\n| Buffer Limits  | 3250 MB global, 3328048 KB constant                        |\n|----------------'------------------------------------------------------------|\n| Info: OpenCL C code successfully compiled.                                  
|\n| FP64  compute                                         0.112 TFLOPs/s (1/4 ) |\n| FP32  compute                                         0.437 TFLOPs/s ( 1x ) |\n| FP16  compute                                         0.801 TFLOPs/s ( 2x ) |\n| INT64 compute                                         0.016  TIOPs/s (1/32) |\n| INT32 compute                                         0.149  TIOPs/s (1/3 ) |\n| INT16 compute                                         0.863  TIOPs/s ( 2x ) |\n| INT8  compute                                         0.213  TIOPs/s (1/2 ) |\n| Memory Bandwidth ( coalesced read      )                         20.98 GB/s |\n| Memory Bandwidth ( coalesced      write)                         25.18 GB/s |\n| Memory Bandwidth (misaligned read      )                         35.16 GB/s |\n| Memory Bandwidth (misaligned      write)                         16.18 GB/s |\n|-----------------------------------------------------------------------------|\n```\n```\n|----------------.------------------------------------------------------------|\n| Device ID      | 2                                                          |\n| Device Name    | Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz                   |\n| Device Vendor  | Intel(R) Corporation                                       |\n| Device Driver  | 2024.17.3.0.08_160000 (Windows)                            |\n| OpenCL Version | OpenCL C 3.0                                               |\n| Compute Units  | 12 at 3700 MHz (6 cores, 0.710 TFLOPs/s)                   |\n| Memory, Cache  | 16250 MB RAM, 256 KB global / 32 KB local                  |\n| Buffer Limits  | 16250 MB global, 128 KB constant                           |\n|----------------'------------------------------------------------------------|\n| Info: OpenCL C code successfully compiled.                                  
|\n| FP64  compute                                         0.151 TFLOPs/s (1/4 ) |\n| FP32  compute                                         0.158 TFLOPs/s (1/4 ) |\n| FP16  compute                                          not supported        |\n| INT64 compute                                         0.042  TIOPs/s (1/16) |\n| INT32 compute                                         0.063  TIOPs/s (1/12) |\n| INT16 compute                                         0.224  TIOPs/s (1/3 ) |\n| INT8  compute                                         0.059  TIOPs/s (1/12) |\n| Memory Bandwidth ( coalesced read      )                         16.92 GB/s |\n| Memory Bandwidth ( coalesced      write)                          8.08 GB/s |\n| Memory Bandwidth (misaligned read      )                         40.02 GB/s |\n| Memory Bandwidth (misaligned      write)                         13.69 GB/s |\n|-----------------------------------------------------------------------------|\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprojectphysx%2Fopencl-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprojectphysx%2Fopencl-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprojectphysx%2Fopencl-benchmark/lists"}