https://github.com/avicted/gpu-time-measurement

Cuda Template for AMD GPUs on Linux

Host: GitHub
URL: https://github.com/avicted/gpu-time-measurement
Owner: Avicted
License: mit
Created: 2022-08-06T21:12:59.000Z (almost 4 years ago)
Default Branch: master
Last Pushed: 2023-07-04T11:27:33.000Z (almost 3 years ago)
Last Synced: 2025-01-13T03:42:09.602Z (over 1 year ago)
Language: Cuda
Homepage:
Size: 6.84 KB
Stars: 0
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Cuda Template for AMD GPUs on Linux

The template launches an empty kernel and measures how long the kernel takes to start up, execute, and finish.
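The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual `main.cpp`: the kernel name `emptyKernel` and the use of `std::chrono::steady_clock` for host-side timing are assumptions. The launch configuration (one block of 1024 threads) mirrors the example output further down.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel: it does no work, so the measured time is
// dominated by launch and synchronization overhead.
__global__ void emptyKernel() {}

int main() {
    const dim3 blocksInGrid(1, 1, 1);
    const dim3 threadsInBlock(1024, 1, 1);

    const auto start = std::chrono::steady_clock::now();

    emptyKernel<<<blocksInGrid, threadsInBlock>>>();
    // Kernel launches are asynchronous, so wait for the kernel to finish
    // before stopping the clock.
    const cudaError_t err = cudaDeviceSynchronize();

    const auto stop = std::chrono::steady_clock::now();

    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Report the same elapsed time in three units.
    const long long us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    std::printf("The program took %lld microseconds\n", us);
    std::printf("The program took %lld milliseconds\n", us / 1000);
    std::printf("The program took %f seconds\n", us / 1e6);
    return 0;
}
```

Because the timer wraps the first kernel launch in the process, the result likely includes one-time runtime initialization as well as the kernel itself, which may be why an empty kernel can take far longer than a few microseconds.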

```bash
# Install the required packages
sudo pacman -S opencl-amd opencl-amd-dev

# Add the ROCm compiler, scripts, and executables to the user's PATH,
# typically in ~/.bashrc or ~/.zshrc. Afterwards, source the config file
# (e.g. source ~/.zshrc) or start a new terminal session.
export PATH="/opt/rocm-5.5.0/bin:$PATH" # <--- change the version to the one installed
```

## Run

```bash
make
```

## Example output

```bash
Cleaning
rm -r build 2> /dev/null || true
rm -r code/*.cu 2> /dev/null || true
Creating directories
mkdir -p build
Hipifying the Cuda C++ code to HIP C++ code
hipify-perl ./code/main.cpp -o ./code/main.cpp.hip.cu
Building the program
hipcc -O3  ./code/main.cpp.hip.cu -o ./build/gpu_signal_processing.out
Running the executable
./build/gpu_signal_processing.out
        Starting the program
   Found 1 CUDA devices
         Device AMD Radeon RX 6900 XT                    = device 0
         compute capability           =         10.3
         totalGlobalMemory            =        17.16 GB
         l2CacheSize                  =     4194304 B
         regsPerBlock                 =       65536
         multiProcessorCount          =          40
         maxThreadsPerMultiprocessor  =        2048
         sharedMemPerBlock            =       65536 B
         warpSize                     =          32
         clockRate                    =     2660.00 MHz
         maxThreadsPerBlock           =        1024
         maxGridSize                  =    2147483647 x 2147483647 x 2147483647
         maxThreadsDim                =    1024 x 1024 x 1024
   Using CUDA device 0
====================================================================
blocksInGrid:	{1, 1, 1} blocks.
threadsInBlock:	1024 threads.
number of threads: 1024
        The program took 164043 microseconds
        The program took 164 milliseconds
        The program took 0.164043 seconds
        To execute the GPU kernel
The program has been built and runned successfully!
```
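The device listing in the output above corresponds to fields of the CUDA runtime's `cudaDeviceProp` structure. A minimal sketch of how such a listing could be produced is shown below. It is an assumption about the approach, not the repository's code: the formatting is simplified, only a subset of the fields is printed, and dividing by `1e9` to report gigabytes is a guess.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }
    std::printf("Found %d CUDA devices\n", deviceCount);

    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) {
            continue;  // Skip devices whose properties cannot be read.
        }
        std::printf("Device %s = device %d\n", prop.name, i);
        std::printf("compute capability          = %d.%d\n", prop.major, prop.minor);
        // Assumption: gigabytes are reported as bytes / 1e9.
        std::printf("totalGlobalMemory           = %.2f GB\n", prop.totalGlobalMem / 1e9);
        std::printf("l2CacheSize                 = %d B\n", prop.l2CacheSize);
        std::printf("regsPerBlock                = %d\n", prop.regsPerBlock);
        std::printf("multiProcessorCount         = %d\n", prop.multiProcessorCount);
        std::printf("maxThreadsPerMultiprocessor = %d\n", prop.maxThreadsPerMultiProcessor);
        std::printf("sharedMemPerBlock           = %zu B\n", prop.sharedMemPerBlock);
        std::printf("warpSize                    = %d\n", prop.warpSize);
        std::printf("maxThreadsPerBlock          = %d\n", prop.maxThreadsPerBlock);
        std::printf("maxGridSize                 = %d x %d x %d\n",
                    prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        std::printf("maxThreadsDim               = %d x %d x %d\n",
                    prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    }

    // Select the first device for subsequent kernel launches.
    if (cudaSetDevice(0) != cudaSuccess) {
        std::fprintf(stderr, "Could not select device 0\n");
        return 1;
    }
    std::printf("Using CUDA device 0\n");
    return 0;
}
```

When built through the `hipify-perl` and `hipcc` steps shown in the output, the `cuda*` calls are translated to their HIP equivalents, so the reported values describe the AMD GPU rather than an NVIDIA device.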
