{"id":18021274,"url":"https://github.com/ho-cooh/clbench","last_synced_at":"2025-04-04T17:23:13.238Z","repository":{"id":112726944,"uuid":"321569071","full_name":"HO-COOH/CLBench","owner":"HO-COOH","description":null,"archived":false,"fork":false,"pushed_at":"2020-12-15T06:11:40.000Z","size":313,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-10T02:45:37.589Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HO-COOH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-12-15T06:03:24.000Z","updated_at":"2020-12-15T06:11:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"ebde22c6-81fe-4270-85e6-0a96116daf91","html_url":"https://github.com/HO-COOH/CLBench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HO-COOH%2FCLBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HO-COOH%2FCLBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HO-COOH%2FCLBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HO-COOH%2FCLBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HO-COOH","download_url":"https://codeload.github.com/HO-COOH/CLBench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","
repositories_count":247217903,"owners_count":20903160,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-30T06:09:17.811Z","updated_at":"2025-04-04T17:23:13.223Z","avatar_url":"https://github.com/HO-COOH.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CLBench\nThis is an OpenCL benchmark that examines the end-to-end performance of a typical OpenCL application. It is part of my master's degree project.\n\n## What is benchmarked?\nThe benchmark runs the following tests on **your default GPU**. On a laptop, this is usually your integrated GPU.\n- Data transfer\n  + host -\u003e device\n  + device -\u003e host\n- Kernel compilation\n  + compile from source string (both single-threaded \u0026 multi-threaded)\n  + compile from saved binary (both single-threaded \u0026 multi-threaded)\n- Some mathematical operations\n  - Reduction\n  - Matrix multiplication\n  - Convolution\nNote: Some of the benchmarks may fail on your GPU. 
Do NOT use the kernels in this project for real-world applications; they are only naive implementations.\n## Dependency\nI packaged the dependencies (dll and lib) in [./dependency](./dependency) for Windows 10 64-bit, so it should build and run without any additional steps.\n\nOtherwise, use [vcpkg](https://github.com/microsoft/vcpkg) to install `OpenCL` with the command:\n```\nvcpkg install OpenCL\n```\nThen do your usual `CMAKE_TOOLCHAIN_FILE` stuff, which I will not bother to write here :)\n\n## Sample output\nBelow is an example of running the project on my 1660 Super:\n```\n///////////Host -\u003e Device///////////\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 0.00390625 MB -\u003e 312.5 MB/s\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 1 MB -\u003e 4089.98 MB/s\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 32 MB -\u003e 4744.33 MB/s\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 512 MB -\u003e 5391.13 MB/s\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 1024 MB -\u003e 5265.75 MB/s\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 2048 MB -\u003e 5275.61 MB/s\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 4096 MB -\u003e 5367.32 MB/s\nTesting \u003cCL_MEM_COPY_HOST_PTR\u003e 6144 MB -\u003e 5288.88 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 0.00390625 MB -\u003e 5.66698 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 1 MB -\u003e 2070.82 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 32 MB -\u003e 4873.29 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 512 MB -\u003e 5614.5 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 1024 MB -\u003e 5644.04 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 2048 MB -\u003e 5539.07 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 4096 MB -\u003e 5788.36 MB/s\nTesting \u003cclEnqueueWriteBuffer\u003e 6144 MB -\u003e\nTesting \u003cclEnqueueWriteBuffer\u003e failed: Code: -4 clEnqueueWriteBuffer\nTesting \u003cclEnqueueMapBuffer\u003e 0.00390625 MB -\u003e 2.86005 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 1 MB -\u003e 768.226 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 32 
MB -\u003e 1819.25 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 512 MB -\u003e 2161.07 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 1024 MB -\u003e 2397.11 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 2048 MB -\u003e 2462.09 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 0.00390625 MB -\u003e 5.85118 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 1 MB -\u003e 1533.74 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 32 MB -\u003e 4796.31 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 512 MB -\u003e 5353.68 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 1024 MB -\u003e 4712.38 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 2048 MB -\u003e 5553.98 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 4096 MB -\u003e 5474.66 MB/s\nTesting \u003callocate + clEnqueueWriteBuffer\u003e 6144 MB -\u003e\nTesting \u003cclEnqueueWriteBuffer\u003e failed: Code: -4 clEnqueueWriteBuffer\nTesting \u003callocate + clEnqueueMapBuffer\u003e 0.00390625 MB -\u003e 2.27081 MB/s\nTesting \u003callocate + clEnqueueMapBuffer\u003e 1 MB -\u003e 554.508 MB/s\nTesting \u003callocate + clEnqueueMapBuffer\u003e 32 MB -\u003e 1493.6 MB/s\nTesting \u003callocate + clEnqueueMapBuffer\u003e 512 MB -\u003e 1523.53 MB/s\nTesting \u003callocate + clEnqueueMapBuffer\u003e 1024 MB -\u003e 1669.59 MB/s\nTesting \u003callocate + clEnqueueMapBuffer\u003e 2048 MB -\u003e 1749.95 MB/s\n\n\n////////////Device -\u003e Host///////////\nTesting \u003cclEnqueueReadBuffer\u003e 0.00390625 MB -\u003e 65.8727 MB/s\nTesting \u003cclEnqueueReadBuffer\u003e 1 MB -\u003e 3411.8 MB/s\nTesting \u003cclEnqueueReadBuffer\u003e 32 MB -\u003e 6107.8 MB/s\nTesting \u003cclEnqueueReadBuffer\u003e 512 MB -\u003e 6304.11 MB/s\nTesting \u003cclEnqueueReadBuffer\u003e 1024 MB -\u003e 6257.96 MB/s\nTesting \u003cclEnqueueReadBuffer\u003e 2048 MB -\u003e 6023.15 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 0.00390625 MB -\u003e 5.17247 MB/s\nTesting 
\u003cclEnqueueMapBuffer\u003e 1 MB -\u003e 1181.75 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 32 MB -\u003e 2465.22 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 512 MB -\u003e 2416.68 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 1024 MB -\u003e 2369.02 MB/s\nTesting \u003cclEnqueueMapBuffer\u003e 2048 MB -\u003e 2476.68 MB/s\nTesting \u003cCompileSingleThread\u003e -\u003e 1239.43 kernels /s\n14037 microsec\nTesting \u003cCompilingMultiThreadAsync\u003e -\u003e 1528.34 kernels /s\n11465 microsec\nTesting \u003cCompilingMultiThread\u003e -\u003e 1439.45 kernels /s\n12092 microsec\nTesting \u003cCompileSingleThread\u003e -\u003e 424.445 kernels /s\n40500 microsec\nTesting \u003cLoadingBinarySingleThread\u003e -\u003e Loaded 17 kernels from binary 2606.28 kernels /s\n6736 microsec\nTesting \u003cCompileSingleThread\u003e -\u003e 415.394 kernels /s\n41397 microsec\nTesting \u003cLoadingBinaryMultiThread\u003e -\u003e Loaded 17 kernels from binary 2571.43 kernels /s\n6809 microsec\nTesting \u003cReduceInterleaved\u003e with 262144\n1.06997 GB/s Round = 3\n1253 microsec\nReduce result: -338.325\nTesting \u003cReduceInterleaved\u003e with 524288\n1.29406 GB/s Round = 3\n1862 microsec\nReduce result: 225.024\nTesting \u003cReduceInterleaved\u003e with 1048576\n1.84509 GB/s Round = 3\n2607 microsec\nReduce result: -182.728\nTesting \u003cReduceInterleaved\u003e with 2097152\n2.55879 GB/s Round = 3\n3549 microsec\nReduce result: 120.716\nTesting \u003cReduceInterleaved\u003e with 4194304\n3.08496 GB/s Round = 3\n5531 microsec\nReduce result: -489.62\nTesting \u003cReduceInterleaved\u003e with 8388608\n3.53403 GB/s Round = 3\n9343 microsec\nReduce result: 400.092\nTesting \u003cReduceInterleaved\u003e with 16777216\n3.78492 GB/s Round = 4\n17107 microsec\nReduce result: -2265.84\nTesting \u003cReduceInterleaved\u003e with 33554432\n3.95854 GB/s Round = 4\n32094 microsec\nReduce result: 2501.33\nTesting \u003cReduceInterleaved\u003e with 67108864\n4.09666 GB/s Round = 
4\n61556 microsec\nReduce result: -1839.39\nTesting \u003cReduceInterleavedNonDivergent\u003e with 262144\n1.14944 GB/s Round = 3\n1205 microsec\nReduce result: -664.753\nTesting \u003cReduceInterleavedNonDivergent\u003e with 524288\n1.22017 GB/s Round = 3\n2113 microsec\nReduce result: 391.283\nTesting \u003cReduceInterleavedNonDivergent\u003e with 1048576\n1.93724 GB/s Round = 3\n2427 microsec\nReduce result: 583.619\nTesting \u003cReduceInterleavedNonDivergent\u003e with 2097152\n2.49314 GB/s Round = 3\n3775 microsec\nReduce result: -51.6634\nTesting \u003cReduceInterleavedNonDivergent\u003e with 4194304\n3.12063 GB/s Round = 3\n5484 microsec\nReduce result: 51.8917\nTesting \u003cReduceInterleavedNonDivergent\u003e with 8388608\n3.53651 GB/s Round = 3\n9346 microsec\nReduce result: -70.9239\nTesting \u003cReduceInterleavedNonDivergent\u003e with 16777216\n3.76654 GB/s Round = 4\n17086 microsec\nReduce result: 51.5029\nTesting \u003cReduceInterleavedNonDivergent\u003e with 33554432\n4.03851 GB/s Round = 4\n31471 microsec\nReduce result: -2708.48\nTesting \u003cReduceInterleavedNonDivergent\u003e with 67108864\n4.17352 GB/s Round = 4\n60432 microsec\nReduce result: -3104.14\nTesting \u003cReduceSequential\u003e with 262144\n1.21102 GB/s Round = 3\n1105 microsec\nReduce result: -334.107\nTesting \u003cReduceSequential\u003e with 524288\n1.39899 GB/s Round = 3\n1711 microsec\nReduce result: -471.559\nTesting \u003cReduceSequential\u003e with 1048576\n1.8424 GB/s Round = 3\n2599 microsec\nReduce result: -48.9131\nTesting \u003cReduceSequential\u003e with 2097152\n2.3791 GB/s Round = 3\n3797 microsec\nReduce result: 291.875\nTesting \u003cReduceSequential\u003e with 4194304\n3.09388 GB/s Round = 3\n5592 microsec\nReduce result: 318.68\nTesting \u003cReduceSequential\u003e with 8388608\n3.55461 GB/s Round = 3\n9293 microsec\nReduce result: 116.345\nTesting \u003cReduceSequential\u003e with 16777216\n3.72765 GB/s Round = 4\n17272 microsec\nReduce result: 
510.172\nTesting \u003cReduceSequential\u003e with 33554432\n4.06113 GB/s Round = 4\n31294 microsec\nReduce result: -6119.86\nTesting \u003cReduceSequential\u003e with 67108864\n4.21989 GB/s Round = 4\n59780 microsec\nReduce result: -11.2635\nTesting \u003cFirstAddDuringLoad\u003e with 262144\n665 microsec\nTesting \u003cNaiveConv\u003e with 4096 x 4096channel = 1 with filter = 3\n12.0911 GFlops\nTesting \u003cNaiveConv\u003e with 4096 x 4096channel = 1 with filter = 5\n26.9653 GFlops\nTesting \u003cNaiveConv\u003e with 4096 x 4096channel = 1 with filter = 7\n33.088 GFlops\nTesting \u003cNaiveConv\u003e with 4096 x 4096channel = 1 with filter = 9\n35.1942 GFlops\n```\n\n## Further development\n- Add GNUPlot for sexy output\n- Enable Multi-GPU testing\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fho-cooh%2Fclbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fho-cooh%2Fclbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fho-cooh%2Fclbench/lists"}