An open API service indexing awesome lists of open source software.

https://github.com/marcramonmoreno/cuda-saxpy-kernel

CUDA parallelized SAXPY kernel defined by the BLAS (Basic Lineal Algebra Subroutine)
https://github.com/marcramonmoreno/cuda-saxpy-kernel

Last synced: about 2 months ago
JSON representation

CUDA parallelized SAXPY kernel defined by the BLAS (Basic Lineal Algebra Subroutine)

Awesome Lists containing this project

README

        

# CUDA SAXPY Kernel Implementation
Overview

This code provides a CUDA implementation of the SAXPY operation, a common computation in linear algebra. SAXPY stands for "Single-Precision A*X Plus Y", where A, X, and Y are vectors. The code performs this operation in parallel on a GPU using CUDA, NVIDIA's parallel computing platform and programming model.
Requirements

NVIDIA GPU with CUDA support
CUDA Toolkit installed on your system

Description

The code is divided into two main parts:

CUDA Kernel (saxpy_parallel): This function runs on the GPU and performs the SAXPY computation. Each thread computes a single element of the result vector.

Host Code (main function): This part runs on the CPU. It allocates memory on the GPU, copies data from the host to the GPU, invokes the CUDA kernel, and then copies the result back to the host.

Key Components

saxpy_parallel: CUDA kernel function for the SAXPY operation.
main: Host function to set up CUDA environment and invoke the kernel.
n: Size of the input vectors.
alpha: Scalar value in the SAXPY operation.
x, y: Input vectors.
d_x, d_y: Device (GPU) copies of x and y.

Usage

Compile the code using the CUDA compiler (nvcc).
Run the executable. It will perform the SAXPY operation and print the results.

Important Considerations

The code assumes a certain size for the vectors (n = 2000), but this can be adjusted as needed.
The number of threads per block is set to 256, but this value can be tuned for different GPUs.
Memory allocation and free operations are crucial to avoid memory leaks.
Error checks for CUDA operations (like cudaMalloc, cudaMemcpy) are not included but are recommended for robust code.

Conclusion

This CUDA SAXPY implementation demonstrates basic GPU programming concepts, including memory management, kernel invocation, and parallel computation. It serves as a starting point for more complex GPU-accelerated applications.
Before running the make command, ensure that the makefile has the correct paths and instructions for your system's architecture and the intended CUDA toolkit version.