https://github.com/puzzlef/vector-multiplication-cuda
Comparing approaches for CUDA-based vector multiplication.
algorithm cuda map multiply operation pagerank primitive
- Host: GitHub
- URL: https://github.com/puzzlef/vector-multiplication-cuda
- Owner: puzzlef
- License: mit
- Created: 2021-06-08T16:20:25.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2025-04-08T18:00:13.000Z (about 1 year ago)
- Last Synced: 2025-09-05T15:28:36.446Z (9 months ago)
- Topics: algorithm, cuda, map, multiply, operation, pagerank, primitive
- Language: C++
- Homepage: https://gist.github.com/wolfram77/4ef16ab9699ac03a617b8731dd240e1f
- Size: 218 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
Comparing approaches for *CUDA-based* **vector multiplication**.
In each of the experiments below, we multiply two floating-point vectors `x`
and `y`, with the number of **elements** ranging from `10^6` to `10^9`, using
CUDA. Each element count is attempted with various approaches, and each approach
is run 5 times to get a reliable time measurement. Multiplication here stands in
for any memory-aligned independent operation, i.e. a `map()` operation.
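As a point of reference, the `map()`-style operation and the repeated timing described above might be sketched on the host as follows. This is a sequential C++ sketch, not the repository's code: the names `multiplyValues` and `measureDurationMs` are hypothetical, and averaging the 5 runs is an assumption (the text only states that each approach is run 5 times).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Element-wise product a[i] = x[i] * y[i]: a map() over independent elements.
template <class T>
void multiplyValues(std::vector<T>& a, const std::vector<T>& x, const std::vector<T>& y) {
  assert(x.size() == y.size());
  a.resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    a[i] = x[i] * y[i];
}

// Run fn() `repeat` times and return the mean wall-clock time in milliseconds.
// Taking the mean is an assumption made for this sketch.
template <class F>
double measureDurationMs(F fn, int repeat = 5) {
  assert(repeat > 0);
  using Clock = std::chrono::steady_clock;
  double total = 0;
  for (int r = 0; r < repeat; ++r) {
    auto start = Clock::now();
    fn();
    auto stop = Clock::now();
    total += std::chrono::duration<double, std::milli>(stop - start).count();
  }
  return total / repeat;
}
```

A call such as `measureDurationMs([&] { multiplyValues(a, x, y); })` would then time one approach; a GPU variant would additionally need to synchronize with the device before stopping the clock.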
### Adjusting Launch config
In this experiment ([adjust-launch]), we multiply two floating-point vectors `x`
and `y` using CUDA. Each element count is attempted with various **CUDA launch**
**configs**. Results indicate that a **grid_limit** of `16384` or `32768` and a
**block_size** of `128` or `256` are suitable for both **float** and **double**.
Using a **grid_limit** of `MAX` and a **block_size** of `256` could be a decent
choice.
[adjust-launch]: https://github.com/puzzlef/vector-multiplication-cuda/tree/adjust-launch
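To make the roles of **grid_limit** and **block_size** concrete, here is a minimal host-side C++ emulation of a grid-stride multiply kernel. It assumes that `grid_limit` caps the number of blocks launched and that each thread strides over the vector by the total thread count; the function names are hypothetical and the actual kernel in the repository may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest integer >= n / d.
inline std::size_t ceilDiv(std::size_t n, std::size_t d) { return (n + d - 1) / d; }

// Number of blocks to launch: enough to cover N elements, capped at gridLimit.
inline std::size_t gridSize(std::size_t N, std::size_t blockSize, std::size_t gridLimit) {
  return std::min(ceilDiv(N, blockSize), gridLimit);
}

// Host-side emulation of a grid-stride multiply kernel.
// On a GPU, b and t would be blockIdx.x and threadIdx.x,
// and all (b, t) pairs would run concurrently instead of in loops.
template <class T>
void multiplyGridStride(std::vector<T>& a, const std::vector<T>& x, const std::vector<T>& y,
                        std::size_t blockSize, std::size_t gridLimit) {
  assert(x.size() == y.size() && blockSize > 0 && gridLimit > 0);
  const std::size_t N = x.size();
  const std::size_t B = blockSize;
  const std::size_t G = gridSize(N, B, gridLimit);
  a.resize(N);
  for (std::size_t b = 0; b < G; ++b)                       // block index
    for (std::size_t t = 0; t < B; ++t)                     // thread index within the block
      for (std::size_t i = B * b + t; i < N; i += G * B)    // grid-stride loop
        a[i] = x[i] * y[i];
}
```

Under these assumptions, `10^9` elements with a `block_size` of `256` and a `grid_limit` of `32768` give `32768 * 256 = 8388608` threads, so each thread handles about 119 elements.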
### Adjusting Thread duty
In this experiment ([adjust-duty]), we compare various *per-thread duty numbers*
for CUDA-based vector multiplication. Each element count is attempted with
various CUDA launch configs and per-thread duties. Results indicate no
significant difference between the [adjust-launch] approach and this one.
[adjust-duty]: https://github.com/puzzlef/vector-multiplication-cuda/tree/adjust-duty
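One way to read a *per-thread duty* is as the number of elements each thread is expected to process, which shrinks the grid accordingly. The sketch below computes such a launch config in C++. The `LaunchConfig` struct, the function names, and the formula `ceilDiv(N, blockSize * duty)` are assumptions for illustration, not taken from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Smallest integer >= n / d.
inline std::size_t ceilDiv(std::size_t n, std::size_t d) { return (n + d - 1) / d; }

// Hypothetical launch-config record; the field names are illustrative.
struct LaunchConfig {
  std::size_t gridSize;   // number of blocks to launch
  std::size_t blockSize;  // threads per block
};

// Pick a grid so that each thread is responsible for about `duty` elements,
// while never exceeding gridLimit blocks.
inline LaunchConfig launchConfigForDuty(std::size_t N, std::size_t blockSize,
                                        std::size_t duty, std::size_t gridLimit) {
  assert(blockSize > 0 && duty > 0 && gridLimit > 0);
  std::size_t blocks = std::min(ceilDiv(N, blockSize * duty), gridLimit);
  return {blocks, blockSize};
}

// Elements each thread ends up handling (rounded up) under a given config.
inline std::size_t elementsPerThread(std::size_t N, const LaunchConfig& c) {
  return c.gridSize == 0 ? 0 : ceilDiv(N, c.gridSize * c.blockSize);
}
```

For example, `launchConfigForDuty(1000000, 256, 4, 32768)` yields 977 blocks of 256 threads. For large vectors the grid limit dominates, so the requested duty and the effective per-thread work can diverge.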
## References
- [CUDA by Example :: Jason Sanders, Edward Kandrot](https://www.slideshare.net/SubhajitSahu/cuda-by-example-notes)
- [Git pulling a branch from another repository?](https://stackoverflow.com/a/46289324/1413259)
[DOI](https://zenodo.org/badge/latestdoi/375073607)

[Prof. Dip Sankar Banerjee]: https://sites.google.com/site/dipsankarban/
[Prof. Kishore Kothapalli]: https://faculty.iiit.ac.in/~kkishore/