https://github.com/puzzlef/vector-multiplication-cuda

Comparing approaches for CUDA-based vector multiplication.

algorithm cuda map multiply operation pagerank primitive


Host: GitHub
URL: https://github.com/puzzlef/vector-multiplication-cuda
Owner: puzzlef
License: mit
Created: 2021-06-08T16:20:25.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2025-04-08T18:00:13.000Z (over 1 year ago)
Last Synced: 2025-09-05T15:28:36.446Z (10 months ago)
Topics: algorithm, cuda, map, multiply, operation, pagerank, primitive
Language: C++
Homepage: https://gist.github.com/wolfram77/4ef16ab9699ac03a617b8731dd240e1f
Size: 218 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff


README

Comparing approaches for *CUDA-based* **vector multiplication**.

In each of the experiments given below, we multiply two floating-point vectors
`x` and `y`, with the number of **elements** ranging from `10^6` to `10^9`, using
CUDA. Each element count is attempted with various approaches, and each approach
is run 5 times to obtain a stable time measurement. Multiplication here stands in
for any memory-aligned, independent per-element operation, i.e., a `map()`
operation.
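For reference, the operation being timed can be written as a plain sequential C++ map. This is a minimal CPU sketch, not the repository's CUDA code; `multiplyValues` and `measureDuration` are hypothetical names, and averaging over 5 runs with `std::chrono` only mirrors the repetition described above.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Sequential reference for the element-wise "map": z[i] = x[i] * y[i].
// Each output depends only on the inputs at the same index.
// Assumes z, x and y all have the same size.
template <class T>
void multiplyValues(std::vector<T>& z, const std::vector<T>& x, const std::vector<T>& y) {
  std::transform(x.begin(), x.end(), y.begin(), z.begin(),
                 [](T a, T b) { return a * b; });
}

// Run fn() `repeat` times and return the mean wall-clock time in milliseconds.
template <class F>
double measureDuration(F fn, int repeat = 5) {
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repeat; ++r) fn();
  auto stop = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::milli> total = stop - start;
  return total.count() / repeat;
}
```

Because every output element is independent, the same loop body can be split across GPU threads in any order, which is what makes it a good proxy for other `map()` operations.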




### Adjusting Launch config

In this experiment ([adjust-launch]), we multiply two floating-point vectors `x`
and `y` using CUDA. Each element count is attempted with various **CUDA launch
configs**. Results indicate that a **grid_limit** of `16384`/`32768` and a
**block_size** of `128`/`256` are suitable for both **float** and **double**.
Using a **grid_limit** of `MAX` and a **block_size** of `256` could be a decent
choice.
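A **grid_limit** caps only the number of blocks, so when `N` exceeds `grid_limit × block_size` each thread must handle more than one element. The sketch below assumes a grid-stride loop for this (the README does not say which indexing the kernels use) and emulates it on the host in plain C++, so coverage can be checked without a GPU. `gridSize`, `emulateGridStride`, and the use of `0` to mean `MAX` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Ceiling division for positive divisors.
inline std::size_t ceilDiv(std::size_t n, std::size_t d) { return (n + d - 1) / d; }

// Blocks to launch: enough for one element per thread, capped at gridLimit.
// gridLimit == 0 is a hypothetical stand-in for "MAX" (no cap beyond N).
inline std::size_t gridSize(std::size_t N, std::size_t blockSize, std::size_t gridLimit) {
  std::size_t g = ceilDiv(N, blockSize);
  return gridLimit == 0 ? g : std::min(g, gridLimit);
}

// Host-side emulation of a CUDA grid-stride loop:
//   for (i = blockIdx.x * blockDim.x + threadIdx.x; i < N; i += gridDim.x * blockDim.x)
// Returns how many times each index is visited (should be exactly 1).
inline std::vector<int> emulateGridStride(std::size_t N, std::size_t blockSize, std::size_t gridLimit) {
  std::size_t G = gridSize(N, blockSize, gridLimit);
  std::size_t stride = G * blockSize;
  std::vector<int> visits(N, 0);
  for (std::size_t b = 0; b < G; ++b)            // blockIdx.x
    for (std::size_t t = 0; t < blockSize; ++t)  // threadIdx.x
      for (std::size_t i = b * blockSize + t; i < N; i += stride)
        ++visits[i];
  return visits;
}
```

For example, with `N = 10^9` and `block_size = 256`, an uncapped grid needs 3,906,250 blocks, which is within the `gridDim.x` limit of 2^31 - 1 on compute capability 3.0 and later; a **grid_limit** of `16384` instead makes each thread loop over roughly 239 elements.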

[adjust-launch]: https://github.com/puzzlef/vector-multiplication-cuda/tree/adjust-launch




### Adjusting Thread duty

In this experiment ([adjust-duty]), we compare various *per-thread duty numbers*
for CUDA-based vector multiplication. Each element count is attempted with
various CUDA launch configs and per-thread duties. Results indicate no
significant difference between the [adjust-launch] approach and this one.
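The README does not spell out how a thread's duty maps to element indices. One plausible layout, assumed here, gives each thread `duty` elements spaced one full grid apart, so neighboring threads still access neighboring addresses. `gridSizeWithDuty` and `emulateDuty` are hypothetical helpers, again emulated on the host in C++.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Ceiling division for positive divisors.
inline std::size_t ceilDiv(std::size_t n, std::size_t d) { return (n + d - 1) / d; }

// Blocks needed when each thread handles `duty` elements, capped at gridLimit
// (gridLimit == 0 is a hypothetical stand-in for "no cap").
inline std::size_t gridSizeWithDuty(std::size_t N, std::size_t blockSize,
                                    std::size_t duty, std::size_t gridLimit) {
  std::size_t g = ceilDiv(N, blockSize * duty);
  return gridLimit == 0 ? g : std::min(g, gridLimit);
}

// Host-side emulation: thread `tid` visits tid, tid + T, ..., tid + (duty-1)*T
// (T = total threads), then jumps ahead by T * duty until it passes N.
// Returns how many times each index is visited (should be exactly 1).
inline std::vector<int> emulateDuty(std::size_t N, std::size_t blockSize,
                                    std::size_t duty, std::size_t gridLimit) {
  std::size_t T = gridSizeWithDuty(N, blockSize, duty, gridLimit) * blockSize;
  std::vector<int> visits(N, 0);
  for (std::size_t tid = 0; tid < T; ++tid)
    for (std::size_t base = tid; base < N; base += T * duty)
      for (std::size_t d = 0; d < duty; ++d) {
        std::size_t i = base + d * T;  // stays coalesced across threads
        if (i < N) ++visits[i];
      }
  return visits;
}
```

Under this layout a larger duty simply means fewer blocks doing more work each, with the same memory access pattern, which is consistent with the small difference observed against [adjust-launch].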

[adjust-duty]: https://github.com/puzzlef/vector-multiplication-cuda/tree/adjust-duty







## References

- [CUDA by Example :: Jason Sanders, Edward Kandrot](https://www.slideshare.net/SubhajitSahu/cuda-by-example-notes)

- [Git pulling a branch from another repository?](https://stackoverflow.com/a/46289324/1413259)







[![](https://i.imgur.com/azEBS7Y.png)](https://www.youtube.com/watch?v=vTdodyhhjww)

[![ORG](https://img.shields.io/badge/org-puzzlef-green?logo=Org)](https://puzzlef.github.io)

[![DOI](https://zenodo.org/badge/375073607.svg)](https://zenodo.org/badge/latestdoi/375073607)

![](https://ga-beacon.deno.dev/G-KD28SG54JQ:hbAybl6nQFOtmVxW4if3xw/github.com/puzzlef/vector-multiplication-cuda)

[Prof. Dip Sankar Banerjee]: https://sites.google.com/site/dipsankarban/

[Prof. Kishore Kothapalli]: https://faculty.iiit.ac.in/~kkishore/
