https://github.com/harrism/cuda_event_benchmark

Unit benchmarks of CUDA event APIs.

# CUDA Event Benchmarks

Simple benchmarks of `cudaEvent_t` APIs:

- `cudaEventCreate`
- `cudaEventRecord`
- `cudaEventQuery`
- `cudaStreamWaitEvent`
- `cudaEventDestroy`
- A simulated event pool that maintains a list of free events. This is effectively a benchmark of
`std::list` push/pop, included for cost comparison with `cudaEventCreate`.
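
To make the event pool idea concrete, here is a minimal sketch of a free-list pool built on `std::list`. It is not the benchmark's actual code: the `FreeListPool` class, its method names, and the `create` factory are hypothetical, and `Handle` is a stand-in for `cudaEvent_t` so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>

// Hypothetical free-list pool. `Handle` stands in for cudaEvent_t
// (an assumption made so this compiles without CUDA headers).
template <typename Handle>
class FreeListPool {
 public:
  // `create` is called only when the free list is empty. With CUDA it
  // would presumably wrap cudaEventCreate or cudaEventCreateWithFlags.
  explicit FreeListPool(std::function<Handle()> create)
      : create_(std::move(create)) {
    assert(static_cast<bool>(create_));  // a factory callable is required
  }

  // Reuse a free handle if one exists, otherwise create a new one.
  Handle acquire() {
    if (free_.empty()) { return create_(); }
    Handle h = free_.front();
    free_.pop_front();
    return h;
  }

  // Return a handle to the pool for later reuse.
  void release(Handle h) { free_.push_front(h); }

  // Number of handles currently waiting for reuse.
  std::size_t free_count() const { return free_.size(); }

 private:
  std::function<Handle()> create_;
  std::list<Handle> free_;
};
```

With a pool like this, the steady-state cost of getting an event is one `std::list` pop instead of a driver call, which is the comparison the pool benchmark is meant to illustrate.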

Each test runs twice: once with default-created events, which support timing, and once with events
that do not support timing. Each benchmark name therefore appears twice in the results.

The results below come from a single GPU of an NVIDIA DGX-1 (Tesla V100 GPUs with 32 GB each), with
`CUDA_VISIBLE_DEVICES` set to expose only that GPU.
- OS: `Ubuntu 18.04`.
- CUDA: `10.2`.
- NVIDIA Driver: `440.64.00`.

```
(cudf_dev_10.2) mharris@dgx02:~/github/cuda_event_benchmark/build$ CUDA_VISIBLE_DEVICES=3 ./cuda_event_bench
2020-06-18T18:57:26-07:00
Running ./cuda_event_bench
Run on (80 X 3600 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x40)
L1 Instruction 32 KiB (x40)
L2 Unified 256 KiB (x40)
L3 Unified 51200 KiB (x2)
Load Average: 1.85, 1.80, 1.17
------------------------------------------------------------------------------------
Benchmark                 Time         CPU  Iterations UserCounters...
------------------------------------------------------------------------------------
BM_EventCreate          782 us      782 us         923 items_per_second=1.27941M/s
BM_EventCreate          422 us      422 us        1558 items_per_second=2.36813M/s
BM_EventPool           14.6 us     14.6 us       48904 items_per_second=68.6765M/s
BM_EventPool           13.0 us     13.0 us       53373 items_per_second=76.659M/s
BM_EventRecord         2499 us     2499 us         278 items_per_second=400.15k/s
BM_EventRecord          244 us      244 us        2762 items_per_second=4.09725M/s
BM_EventQuery          1046 us     1046 us         707 items_per_second=956.295k/s
BM_EventQuery          1016 us     1016 us         665 items_per_second=984.674k/s
BM_StreamWaitEvent      258 us      258 us        2706 items_per_second=3.88102M/s
BM_StreamWaitEvent      254 us      254 us        2752 items_per_second=3.93252M/s
BM_EventDestroy         121 us      121 us        5894 items_per_second=8.28793M/s
BM_EventDestroy         119 us      119 us        5959 items_per_second=8.4334M/s
```
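
The `Time` column is per benchmark iteration, not per API call, so it can help to convert `items_per_second` into a per-item cost. The helpers below are a small sketch of that arithmetic. The function names are hypothetical, and the interpretation assumes each "item" is one API call (or one pool operation), which the output itself does not state.

```cpp
#include <cassert>

// Nanoseconds spent per item, given an items_per_second counter value.
double ns_per_item(double items_per_second) {
  assert(items_per_second > 0.0);
  return 1e9 / items_per_second;
}

// Approximate number of items processed in one benchmark iteration,
// given the Time column in microseconds and the items_per_second value.
double items_per_iteration(double time_us, double items_per_second) {
  return time_us * 1e-6 * items_per_second;
}
```

For the first `BM_EventCreate` row, `ns_per_item(1.27941e6)` is roughly 782 ns, and `items_per_iteration(782.0, 1.27941e6)` is roughly 1000. Under the assumption above, that suggests each iteration handles about 1000 events, so the reported microseconds per iteration read approximately as nanoseconds per call.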