https://github.com/puzzlef/vector-max-cuda
Performance of sequential vs CUDA-based vector element max.
- Host: GitHub
- URL: https://github.com/puzzlef/vector-max-cuda
- Owner: puzzlef
- License: mit
- Created: 2022-10-26T18:40:34.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-04-08T18:02:50.000Z (about 1 year ago)
- Last Synced: 2025-09-05T15:28:19.972Z (10 months ago)
- Topics: basics, cuda, element, experiment, max, vector
- Language: C++
- Homepage:
- Size: 57.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
Comparing the performance of *sequential* vs *CUDA-based* **vector element max**.
For each experiment given below, we run each approach on a range of vector
sizes, repeating each approach 5 times per size to obtain a stable time
measurement. Note that the time taken to copy data to and from the GPU is not
measured, and the sequential approach does not use *SIMD instructions*. The
experiments were done with guidance from [Prof. Kishore Kothapalli] and
[Prof. Dip Sankar Banerjee].
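
The measurement procedure described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `maxSequential` and `measureDurationMs` are hypothetical, and averaging the 5 runs is an assumption about how the repeated runs are combined.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>
#include <vector>

// Sequential max of a 32-bit integer vector (single thread, plain loop).
// Returns the smallest int32_t for an empty vector (an assumed convention).
int32_t maxSequential(const std::vector<int32_t>& x) {
  int32_t a = std::numeric_limits<int32_t>::min();
  for (int32_t v : x) a = std::max(a, v);
  return a;
}

// Runs fn() `repeat` times and returns the mean time per run in milliseconds.
// Averaging over the runs is an assumption made for this sketch.
template <class F>
double measureDurationMs(F fn, int repeat = 5) {
  using Clock = std::chrono::steady_clock;
  auto start = Clock::now();
  for (int r = 0; r < repeat; ++r) fn();
  auto stop = Clock::now();
  std::chrono::duration<double, std::milli> total = stop - start;
  return total.count() / repeat;
}
```

A call such as `measureDurationMs([&] { a = maxSequential(x); })` would time only the computation itself, which mirrors the choice above to exclude host-to-device copies from the CUDA timings.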
### Comparison with Sequential approach
This experiment ([compare-sequential], [main]) compares the performance of
finding `max(x)` using a single thread (**sequential**) and using **CUDA**
(*not power-of-2* and *power-of-2* reduce). Here `x` is a 32-bit integer
vector. While the **CUDA** approach might seem to be a clear winner, the
results show that this depends on the workload: **from 10^5 elements onward,
the CUDA approach performs better** than the sequential one. Both CUDA
approaches (*not power-of-2*/*power-of-2* reduce) show similar performance.
All outputs are saved in a [gist]. Some [charts], generated from [sheets], are
also included below.
[][sheetp]
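
To illustrate what the *power-of-2* and *not power-of-2* reduce variants might look like, here is a host-side C++ emulation of an in-block tree reduction. It is a sketch under stated assumptions, not the repository's kernels: the function names are hypothetical, the inner `for` loop stands in for the threads of a CUDA block (which would synchronize between passes), and the exact folding scheme used in the repository may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Power-of-2 variant: assumes a.size() is a non-zero power of 2.
// Each pass folds the upper half of the active range onto the lower half.
int32_t maxReducePow2(std::vector<int32_t> a) {
  for (std::size_t s = a.size() / 2; s > 0; s /= 2) {
    // On a GPU, each i would be one thread, with a barrier after each pass.
    for (std::size_t i = 0; i < s; ++i)
      a[i] = std::max(a[i], a[i + s]);
  }
  return a[0];
}

// General variant: works for any size >= 1.
// When the active range is odd, the middle element is carried to the next pass.
int32_t maxReduceAnySize(std::vector<int32_t> a) {
  for (std::size_t n = a.size(); n > 1;) {
    std::size_t half = (n + 1) / 2;  // ceil(n / 2)
    for (std::size_t i = 0; i < n / 2; ++i)
      a[i] = std::max(a[i], a[half + i]);
    n = half;
  }
  return a[0];
}
```

Both variants perform about log2(n) passes over a shrinking range, which is consistent with the similar performance reported above; the general variant only adds a little index arithmetic per pass.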
[compare-sequential]: https://github.com/puzzlef/vector-max-cuda/tree/compare-sequential
[main]: https://github.com/puzzlef/vector-max-cuda
## References
- [CUDA by Example :: Jason Sanders, Edward Kandrot](https://gist.github.com/wolfram77/72c51e494eaaea1c21a9c4021ad0f320)
- [Managed memory vs cudaHostAlloc - TK1](https://forums.developer.nvidia.com/t/managed-memory-vs-cudahostalloc-tk1/34281)
- [How to enable C++17 code generation in VS2019 CUDA project](https://stackoverflow.com/a/63057409/1413259)
- ["More than one operator + matches these operands" error](https://stackoverflow.com/a/10343618/1413259)
- [How to import VSCode keybindings into Visual Studio?](https://stackoverflow.com/a/62417446/1413259)
- [Explicit conversion constructors (C++ only)](https://www.ibm.com/docs/en/i/7.3?topic=only-explicit-conversion-constructors-c)
- [Configure X11 Forwarding with PuTTY and Xming](https://www.centlinux.com/2019/01/configure-x11-forwarding-putty-xming-windows.html)
- [code-server setup and configuration](https://coder.com/docs/code-server/latest/guide)
- [Installing snap on CentOS](https://snapcraft.io/docs/installing-snap-on-centos)
[](https://www.youtube.com/watch?v=E0_Ic1P-Hzg)
[](https://puzzlef.github.io)
[](https://zenodo.org/badge/latestdoi/558019967)

[Prof. Dip Sankar Banerjee]: https://sites.google.com/site/dipsankarban/
[Prof. Kishore Kothapalli]: https://faculty.iiit.ac.in/~kkishore/
[gist]: https://gist.github.com/wolfram77/57ea86e0e71fb88f2dfd925b7fb753cd
[charts]: https://imgur.com/a/AO4iYAB
[sheets]: https://docs.google.com/spreadsheets/d/1TSEh0slMEZg47Rp01LzoPVvG9kVJZLP2RbGJdwsqmP0/edit?usp=sharing
[sheetp]: https://docs.google.com/spreadsheets/d/e/2PACX-1vTOsNQOXDX3K7nQ256HHwKRnIydERHPoYA7IFmNlH58pTQb7sGBSMu1fAjA-Tk_VEs4tfm9iXb22_FS/pubhtml