https://github.com/marcramonmoreno/cuda-saxpy-kernel

CUDA parallelized SAXPY kernel defined by the BLAS (Basic Lineal Algebra Subroutine)
https://github.com/marcramonmoreno/cuda-saxpy-kernel

Last synced: about 2 months ago
JSON representation

CUDA parallelized SAXPY kernel defined by the BLAS (Basic Lineal Algebra Subroutine)

Host: GitHub
URL: https://github.com/marcramonmoreno/cuda-saxpy-kernel
Owner: MarcRamonMoreno
License: gpl-3.0
Created: 2023-11-07T11:47:37.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-11-16T06:43:50.000Z (over 1 year ago)
Last Synced: 2025-02-12T17:59:49.846Z (4 months ago)
Language: Cuda
Homepage:
Size: 616 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # CUDA SAXPY Kernel Implementation

Overview

This code provides a CUDA implementation of the SAXPY operation, a common computation in linear algebra. SAXPY stands for "Single-Precision A*X Plus Y", where A, X, and Y are vectors. The code performs this operation in parallel on a GPU using CUDA, NVIDIA's parallel computing platform and programming model.

Requirements

    NVIDIA GPU with CUDA support

    CUDA Toolkit installed on your system

Description

The code is divided into two main parts:

    CUDA Kernel (saxpy_parallel): This function runs on the GPU and performs the SAXPY computation. Each thread computes a single element of the result vector.

    Host Code (main function): This part runs on the CPU. It allocates memory on the GPU, copies data from the host to the GPU, invokes the CUDA kernel, and then copies the result back to the host.

Key Components

    saxpy_parallel: CUDA kernel function for the SAXPY operation.

    main: Host function to set up CUDA environment and invoke the kernel.

    n: Size of the input vectors.

    alpha: Scalar value in the SAXPY operation.

    x, y: Input vectors.

    d_x, d_y: Device (GPU) copies of x and y.

Usage

    Compile the code using the CUDA compiler (nvcc).

    Run the executable. It will perform the SAXPY operation and print the results.

Important Considerations

    The code assumes a certain size for the vectors (n = 2000), but this can be adjusted as needed.

    The number of threads per block is set to 256, but this value can be tuned for different GPUs.

    Memory allocation and free operations are crucial to avoid memory leaks.

    Error checks for CUDA operations (like cudaMalloc, cudaMemcpy) are not included but are recommended for robust code.

Conclusion

This CUDA SAXPY implementation demonstrates basic GPU programming concepts, including memory management, kernel invocation, and parallel computation. It serves as a starting point for more complex GPU-accelerated applications.

Before running the make command, ensure that the makefile has the correct paths and instructions for your system's architecture and the intended CUDA toolkit version.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/marcramonmoreno/cuda-saxpy-kernel

Awesome Lists containing this project

README