https://github.com/harrism/cuda_event_benchmark
Unit benchmarks of CUDA event APIs.
- Host: GitHub
- URL: https://github.com/harrism/cuda_event_benchmark
- Owner: harrism
- License: apache-2.0
- Created: 2020-06-19T01:07:31.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-04-23T04:47:11.000Z (10 months ago)
- Last Synced: 2024-10-14T14:39:17.927Z (4 months ago)
- Topics: benchmarks, cuda
- Language: C++
- Homepage:
- Size: 23.4 KB
- Stars: 17
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# CUDA Event Benchmarks
Simple benchmarks of `cudaEvent_t` APIs:
- `cudaEventCreate`
- `cudaEventRecord`
- `cudaEventQuery`
- `cudaStreamWaitEvent`
- `cudaEventDestroy`
- A simulated event pool that maintains a list of free events (more of a benchmark of `std::list`
  push/pop, for cost comparison with `cudaEventCreate`).

Each test is performed once using default-created events (which support timing) and once with events
that do not support timing, so each benchmark appears twice in the output below.

Here are the results from running on a single GPU (`CUDA_VISIBLE_DEVICES` is set to only that GPU)
of an NVIDIA DGX1 (with Tesla V100 GPUs with 32GB each).
- OS: `Ubuntu 18.04`.
- CUDA: `10.2`.
- NVIDIA Driver: `440.64.00`.

```
(cudf_dev_10.2) mharris@dgx02:~/github/cuda_event_benchmark/build$ CUDA_VISIBLE_DEVICES=3 ./cuda_event_bench
2020-06-18T18:57:26-07:00
Running ./cuda_event_bench
Run on (80 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x40)
  L1 Instruction 32 KiB (x40)
  L2 Unified 256 KiB (x40)
  L3 Unified 51200 KiB (x2)
Load Average: 1.85, 1.80, 1.17
------------------------------------------------------------------------------------
Benchmark                 Time         CPU   Iterations   UserCounters...
------------------------------------------------------------------------------------
BM_EventCreate          782 us      782 us          923   items_per_second=1.27941M/s
BM_EventCreate          422 us      422 us         1558   items_per_second=2.36813M/s
BM_EventPool           14.6 us     14.6 us        48904   items_per_second=68.6765M/s
BM_EventPool           13.0 us     13.0 us        53373   items_per_second=76.659M/s
BM_EventRecord         2499 us     2499 us          278   items_per_second=400.15k/s
BM_EventRecord          244 us      244 us         2762   items_per_second=4.09725M/s
BM_EventQuery          1046 us     1046 us          707   items_per_second=956.295k/s
BM_EventQuery          1016 us     1016 us          665   items_per_second=984.674k/s
BM_StreamWaitEvent      258 us      258 us         2706   items_per_second=3.88102M/s
BM_StreamWaitEvent      254 us      254 us         2752   items_per_second=3.93252M/s
BM_EventDestroy         121 us      121 us         5894   items_per_second=8.28793M/s
BM_EventDestroy         119 us      119 us         5959   items_per_second=8.4334M/s
```
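
The simulated event pool in the benchmark list can be pictured as a free list: released events are pushed onto a `std::list`, and a new event is created only when the list is empty. The sketch below is a hypothetical illustration, not the code from this repository. The names `EventPool`, `acquire`, `release`, and `free_count` are assumptions, and the event type is a template parameter so the example compiles without a GPU.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <utility>

// Hypothetical free-list pool (names are illustrative, not from the repository).
// Event stands in for a handle type such as cudaEvent_t.
template <typename Event>
class EventPool {
 public:
  using Factory = std::function<Event()>;

  // create is called only when no released event is available for reuse.
  explicit EventPool(Factory create) : create_(std::move(create)) {}

  // Pop a previously released event if there is one; otherwise create a new one.
  Event acquire() {
    if (free_.empty()) { return create_(); }
    Event e = free_.front();
    free_.pop_front();
    return e;
  }

  // Push an event back onto the free list so a later acquire() can reuse it.
  void release(Event e) { free_.push_back(e); }

  // Number of events currently waiting for reuse.
  std::size_t free_count() const { return free_.size(); }

 private:
  Factory create_;
  std::list<Event> free_;
};
```

With CUDA available, one could instantiate `EventPool<cudaEvent_t>` with a factory that calls `cudaEventCreate`, or `cudaEventCreateWithFlags` with `cudaEventDisableTiming` for events without timing support. Once the pool is warm, `acquire` and `release` cost only a list pop and push, which is the comparison the `BM_EventPool` rows are meant to give against `BM_EventCreate`.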