https://github.com/pwhiddy/webgpu-atomics-benchmark

Atomics Benchmark using WebGPU

Host: GitHub
URL: https://github.com/pwhiddy/webgpu-atomics-benchmark
Owner: PWhiddy
License: mit
Created: 2024-04-03T21:21:16.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-29T20:23:20.000Z (11 months ago)
Last Synced: 2025-03-17T19:11:37.881Z (7 months ago)
Language: HTML
Homepage: https://pwhiddy.github.io/webgpu-atomics-benchmark/
Size: 29.3 KB
Stars: 5
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

### WebGPU Atomics Benchmark

A simple test of the throughput of atomics on your GPU using WebGPU.

While building a highly customized GPU memory allocator for my game engine, I've been relying heavily on atomics. This benchmark was inspired by the old [CUDA blog post on warp-aggregated atomics](https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/), which demonstrated that compiler magic can, counterintuitively, make certain GPU atomics extremely fast (faster than a CPU). I've been very curious to know whether the same holds true for modern APIs and GPUs in general. The results indicate that these optimizations are clearly available even on non-NVIDIA systems and high-level APIs such as WebGPU!
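To make the idea of warp aggregation concrete, here is a minimal CPU-side sketch in Python. It is only a model of the concept, not GPU code and not what any particular compiler or driver emits: the warp size of 32 and the function names are assumptions chosen for illustration. It counts how many atomic operations reach the shared counter when each thread adds on its own versus when each warp first sums its contributions and issues a single add.

```python
WARP_SIZE = 32  # assumed group width for illustration


def naive_atomic_adds(values):
    """Every thread issues its own atomic add to the shared counter."""
    counter = 0
    atomic_ops = 0
    for v in values:
        counter += v      # stands in for one atomicAdd on global memory
        atomic_ops += 1
    return counter, atomic_ops


def warp_aggregated_adds(values, warp_size=WARP_SIZE):
    """Each warp sums its values locally, then issues one atomic add."""
    counter = 0
    atomic_ops = 0
    for start in range(0, len(values), warp_size):
        warp_total = sum(values[start:start + warp_size])  # in-warp reduction
        counter += warp_total  # single atomic add for the whole warp
        atomic_ops += 1
    return counter, atomic_ops


values = [1] * 1024  # 1024 hypothetical threads, each adding 1
print(naive_atomic_adds(values))
print(warp_aggregated_adds(values))
```

Both versions produce the same final counter value, but the aggregated version touches the shared counter once per warp instead of once per thread.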

*PRs adding results for your GPU are welcome!*

----

The current configuration performs 32 atomic adds per thread across a total of 15M threads, all writing to a single global memory address.
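As a rough sense of scale, the workload size follows directly from those two numbers. The sketch below assumes "15M" means exactly 15,000,000 threads (the precise count is not stated here) and uses hypothetical helper names.

```python
ADDS_PER_THREAD = 32
TOTAL_THREADS = 15_000_000  # assumption: "15M" taken as exactly 15,000,000


def total_atomic_adds(adds_per_thread, total_threads):
    """Total number of atomic adds issued in one benchmark run."""
    return adds_per_thread * total_threads


def run_time_seconds(total_ops, ops_per_second):
    """Time one run would take at a given measured throughput."""
    return total_ops / ops_per_second


total_ops = total_atomic_adds(ADDS_PER_THREAD, TOTAL_THREADS)
print(total_ops)                          # atomic adds per run
print(run_time_seconds(total_ops, 20e9))  # seconds at 20B ops/s
```

Under these assumptions a run issues 480 million atomic adds, which takes on the order of tens of milliseconds at the throughputs listed in the table.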

| GPU | Max Bandwidth | Ops/s | Bandwidth Utilization* |
| ----- | ----- | ----- | ----- |
| M1 Max | 400 GB/s | 20B | 40% |
| RTX 4090 | 1008 GB/s | 62B | 49% |

\* This may not be the actual global memory utilization; it is the utilization that would be required if operations were not aggregated before reaching global memory.

----
### Find out your GPU's performance

1. Go to https://pwhiddy.github.io/webgpu-atomics-benchmark/

2. Copy the result: `Operations per second`

3. Calculate results using this formula:

```python
operations_per_second = 20e9  # your result here (example value: 20B ops/s)
gpu_max_bandwidth = 400e9  # your GPU's max bandwidth in bytes/s (look this up online; example: 400 GB/s)
# 1 read + 1 write for a 4-byte u32
bandwidth_utilized = ((operations_per_second * 4 * 2) / gpu_max_bandwidth) * 100
print(f"{bandwidth_utilized:.0f}%")
```
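If you want to check several GPUs at once, the same formula can be wrapped in a small helper. This is just a sketch: the function name is hypothetical, and it assumes vendor bandwidth figures use decimal gigabytes (1 GB/s = 1e9 bytes/s). The two calls use the rows from the results table.

```python
BYTES_PER_U32 = 4
ACCESSES_PER_ATOMIC_ADD = 2  # 1 read + 1 write, as in the formula above


def bandwidth_utilization_percent(ops_per_second, max_bandwidth_gb_per_s):
    """Percent of peak bandwidth needed if no atomic adds were aggregated."""
    bytes_per_second = ops_per_second * BYTES_PER_U32 * ACCESSES_PER_ATOMIC_ADD
    max_bytes_per_second = max_bandwidth_gb_per_s * 1e9  # assumes decimal GB
    return bytes_per_second / max_bytes_per_second * 100


# Rows from the results table: (GPU, ops/s, max bandwidth in GB/s)
for name, ops, bandwidth in [("M1 Max", 20e9, 400), ("RTX 4090", 62e9, 1008)]:
    print(f"{name}: {bandwidth_utilization_percent(ops, bandwidth):.0f}%")
```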
