Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bd2720/accesspatterns
Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.
c cache cuda cuda-toolkit performance-analysis performance-testing profiling
Last synced: 13 days ago
- Host: GitHub
- URL: https://github.com/bd2720/accesspatterns
- Owner: bd2720
- Created: 2024-05-12T21:11:07.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-05-12T21:12:26.000Z (9 months ago)
- Last Synced: 2024-12-04T11:07:59.244Z (2 months ago)
- Topics: c, cache, cuda, cuda-toolkit, performance-analysis, performance-testing, profiling
- Language: Cuda
- Homepage:
- Size: 1.95 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.txt
README
access-cpu: Uses pthreads to demonstrate that chunked memory access is faster
than striped access on the CPU. Each CPU thread is scheduled to run by itself
for a period of time, one thread after the next, and it loads memory one cache
line at a time. Cache usage is therefore maximized in a given thread when
memory is accessed in a sequential pattern (chunked), because the thread uses
every element of each cache line it loads. Striped access is slow because it
only allows a given thread to use a fraction (1 / NTHREADS) of each cache line.

access-gpu: Uses CUDA to demonstrate that striped memory access is faster
than chunked access on the GPU. GPU threads in a block execute together (in
lockstep groups called warps), so neighboring threads issue their memory
accesses at the same time. Since they share the same cache, an interleaved
(striped) memory access pattern allows those threads to read from the same
cache line, while a chunked pattern sends each thread to a different line.

General Findings:
pthread speedup 1 -> 10 (bad access, striped): 1.15x
pthread speedup 1 -> 10 (good access, chunked): 4-5x

cuda speedup <<<1,1>>> -> <<<8,64>>> (bad access, chunked): 9x
cuda speedup <<<1,1>>> -> <<<8,64>>> (good access, striped): 256x
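
The two access patterns differ only in how a thread's index is computed, which
the following host-only C sketch tries to make concrete. It is not taken from
the repository: the function names, N, and LINE_INTS are hypothetical, NTHREADS
is set to 8 for illustration, and a 64-byte cache line holding 16 four-byte
ints is assumed (real GPU transaction sizes differ). It counts how many cache
lines one thread touches over its whole run (the CPU concern) and how many
lines all threads touch in a single step (the GPU concern).

```c
#include <assert.h>

#define N         1024  /* total int elements (hypothetical size) */
#define NTHREADS  8     /* thread count (hypothetical; N must divide evenly) */
#define LINE_INTS 16    /* ints per cache line, assuming 64-byte lines */

typedef int (*index_fn)(int t, int step);

/* Chunked: thread t owns one contiguous block of N / NTHREADS elements. */
static int chunked_index(int t, int step) {
    assert(t >= 0 && t < NTHREADS);
    return t * (N / NTHREADS) + step;
}

/* Striped: thread t takes every NTHREADS-th element, starting at element t. */
static int striped_index(int t, int step) {
    assert(t >= 0 && t < NTHREADS);
    return step * NTHREADS + t;
}

/* CPU view: distinct cache lines that one thread touches across all steps.
 * Indices grow with step, so comparing against the previous line suffices. */
static int lines_per_thread(index_fn index, int t) {
    int count = 0, last = -1;
    for (int step = 0; step < N / NTHREADS; step++) {
        int line = index(t, step) / LINE_INTS;
        if (line != last) { count++; last = line; }
    }
    return count;
}

/* GPU view: distinct cache lines that all threads touch in the same step.
 * Indices grow with t, so the same previous-line comparison works. */
static int lines_per_step(index_fn index, int step) {
    int count = 0, last = -1;
    for (int t = 0; t < NTHREADS; t++) {
        int line = index(t, step) / LINE_INTS;
        if (line != last) { count++; last = line; }
    }
    return count;
}
```

Under these assumptions, lines_per_thread gives 8 for chunked_index and 64 for
striped_index, so a striped CPU thread pulls in eight times as many lines for
the same amount of work. lines_per_step gives 8 for chunked_index and 1 for
striped_index, so striped threads running together can share a single line.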