https://github.com/Lyn-liyuan/ndarray-cuda-matmul

A high-performance computing solution designed to accelerate matrix operations using Nvidia's CUDA technology with Rust's ndarray data structure.

Host: GitHub
URL: https://github.com/Lyn-liyuan/ndarray-cuda-matmul
Owner: Lyn-liyuan
License: mit
Created: 2024-04-28T19:57:27.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-05-18T08:16:25.000Z (over 1 year ago)
Last Synced: 2024-09-25T16:09:12.119Z (about 1 year ago)
Language: Rust
Size: 176 KB
Stars: 24
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

- awesome-rust-list: Lyn-liyuan/ndarray-cuda-matmul (GPU Programming)

README

# Ndarray CUDA Matrix Operations

Welcome to the Ndarray CUDA Matrix Operations library, a high-performance computing solution designed to accelerate matrix operations using Nvidia's CUDA technology with Rust's ndarray data structure. This library leverages the powerful cuBLAS library to perform efficient matrix multiplications on compatible Nvidia GPUs.

For the ideas and methods behind this library, see:

[Improving ndrray matrix computing performance with CUDA](https://medium.com/@yuanli13/improving-ndrray-matrix-computing-performance-with-cuda-cce63a749643)

## Features

- Seamless integration with Rust's ndarray crate.

- High-performance matrix operations utilizing CUDA.

- Support for one-dimensional and two-dimensional arrays.

- Automatic memory management between host and device.

- Simple and intuitive API mirroring that of ndarray.

## Prerequisites

To use this library, you will need:

- g++ 7 or above

- rustc 1.77.2

- CUDA 10 or above

## Usage

Before any matrix operation, initialize the cuBLAS context by calling `init_cublas()`. When you are done, call `destory_cublas()` (the function name is spelled this way in the API) to release its resources:

```Rust
extern crate ndarray_cuda_matmul;

use ndarray_cuda_matmul::{init_cublas, destory_cublas};

fn main() {
    // Initialize cublas context
    init_cublas();

    // Your matrix operations here

    // Clean up cublas context
    destory_cublas();
}
```
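Because the init and cleanup calls must be paired, an early return or panic between them would skip the cleanup. The following is a minimal sketch of a scope guard that ties the two together. The `Cleanup` type is hypothetical and not part of this library; it only uses the Rust standard library, and the usage shown in the comment assumes `init_cublas` and `destory_cublas` are plain functions taking no arguments and returning nothing, as the calls above suggest.

```rust
/// Hypothetical scope guard (not part of ndarray-cuda-matmul).
/// Runs `init` on construction and the stored cleanup closure when dropped,
/// which also happens on early return and during panic unwinding.
pub struct Cleanup<F: FnOnce()> {
    on_drop: Option<F>,
}

impl<F: FnOnce()> Cleanup<F> {
    /// Calls `init` immediately and remembers `cleanup` for later.
    pub fn new(init: impl FnOnce(), cleanup: F) -> Self {
        init();
        Cleanup { on_drop: Some(cleanup) }
    }
}

impl<F: FnOnce()> Drop for Cleanup<F> {
    fn drop(&mut self) {
        // `take` ensures the cleanup closure runs at most once.
        if let Some(cleanup) = self.on_drop.take() {
            cleanup();
        }
    }
}

// Assumed usage with this library (untested sketch):
//
//     let _ctx = Cleanup::new(init_cublas, destory_cublas);
//     // ... matrix operations ...
//     // `destory_cublas` runs automatically when `_ctx` goes out of scope.
```

Binding the guard to a named variable such as `_ctx` matters: binding it to a bare `_` would drop it immediately and run the cleanup before any matrix operation.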

To perform matrix multiplication, use the cuda_dot method provided by the trait CudaDot implemented for ndarray’s ArrayBase:

```Rust
use ndarray::Array;
use ndarray_cuda_matmul::CudaDot;

let a = Array::from_shape_vec((m, k), vec![...]).unwrap();
let b = Array::from_shape_vec((k, n), vec![...]).unwrap();
let result = a.cuda_dot(&b);
```

Here m, n, and k represent the dimensions of the matrices, and vec![...] should be replaced with your actual data.
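To sanity-check GPU results on small inputs, it can help to have a plain CPU reference for the same (m x k) by (k x n) product. The sketch below is not how the library computes anything (that work is delegated to cuBLAS); `matmul_reference` is a hypothetical helper that operates on row-major `f32` slices and uses only the standard library.

```rust
/// Hypothetical CPU reference: multiplies an (m x k) matrix `a` by a
/// (k x n) matrix `b`, both stored row-major, and returns the (m x n) result.
pub fn matmul_reference(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "a must hold m * k elements");
    assert_eq!(b.len(), k * n, "b must hold k * n elements");

    let mut c = vec![0.0_f32; m * n];
    for i in 0..m {
        for p in 0..k {
            // Hoist a[i][p] so the inner loop walks b and c contiguously.
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    c
}
```

For example, with `a = [[1, 2, 3], [4, 5, 6]]` and `b = [[1, 2], [3, 4], [5, 6]]` (so `m = 2`, `k = 3`, `n = 2`), the function returns `[22, 28, 49, 64]`, that is, the matrix `[[22, 28], [49, 64]]`.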

To chain several operations on the GPU, first copy each matrix into device memory with `to_device()`, then copy the final result back with `to_host()`:

```Rust
let a = array![[1.0_f32, 2.0_f32, 3.0_f32], [4.0_f32, 5.0_f32, 6.0_f32]];
let b = array![[1.0_f32, 2.0_f32], [3.0_f32, 4.0_f32], [5.0_f32, 6.0_f32]];
let c = array![[1.0_f32, 1.0_f32], [1.0_f32, 1.0_f32]];

init_cublas();
let out = a.to_device().dot(&b.to_device()).dot(&c.to_device()).to_host();
destory_cublas();
```

Matrix-scalar multiplication code example:

```Rust
init_cublas();
let out = a
    .to_device()
    .dot(&b.to_device())
    .dot(&c.to_device())
    .mul_scalar(2.0_f32)
    .to_host();
destory_cublas();
```

Matrix inversion code example:

```Rust
let a = array![
    [1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32],
    [2.0_f32, 3.0_f32, 1.0_f32, 2.0_f32],
    [1.0_f32, 1.0_f32, 1.0_f32, -1.0_f32],
    [1.0_f32, 0.0_f32, -2.0_f32, -6.0_f32],
];

init_cublas();
let out = a.to_device().inv().to_host();
destory_cublas();
```

The `run!` macro simplifies the code so that it reads like a mathematical expression. The following example uses it:

```Rust
fn least_squares_method() {
    let x = array![[1f32, 1f32], [1f32, 2f32], [1f32, 3f32], [1f32, 4f32]];
    let y = array![[6f32], [5f32], [7f32], [10f32]];

    let beta_hat = run!(x,y => {
        let x_t = x.t();
        x_t.dot(x).inv().dot(&x_t).dot(y)
    }).to_host();

    println!("{:?}", beta_hat);
}
```

The example implements the least squares method with code that closely mirrors the mathematical expression:

$$(X^TX)^{-1}X^Ty$$

versus

```Rust
x_t.dot(x).inv().dot(&x_t).dot(y)
```
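As a cross-check for the least squares example, the same normal equations can be solved on the CPU in closed form when the design matrix has exactly two columns. This is a hedged sketch using only the standard library: `least_squares_2col` is a hypothetical helper, it works in `f64` rather than the `f32` used above, and it inverts the 2x2 matrix $X^TX$ directly instead of calling the library.

```rust
/// Hypothetical CPU cross-check: solves (X^T X) beta = X^T y for a design
/// matrix with exactly two columns. Returns `None` if X^T X is singular.
pub fn least_squares_2col(x: &[[f64; 2]], y: &[f64]) -> Option<[f64; 2]> {
    assert_eq!(x.len(), y.len(), "x and y must have the same number of rows");

    // Accumulate the symmetric 2x2 matrix X^T X and the vector X^T y.
    let (mut s00, mut s01, mut s11) = (0.0_f64, 0.0_f64, 0.0_f64);
    let (mut t0, mut t1) = (0.0_f64, 0.0_f64);
    for (row, &yi) in x.iter().zip(y) {
        s00 += row[0] * row[0];
        s01 += row[0] * row[1];
        s11 += row[1] * row[1];
        t0 += row[0] * yi;
        t1 += row[1] * yi;
    }

    // Invert the 2x2 matrix via its determinant.
    let det = s00 * s11 - s01 * s01;
    if det.abs() < f64::EPSILON {
        return None;
    }
    Some([(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det])
}
```

For the data in the example (`x` rows `[1, 1], [1, 2], [1, 3], [1, 4]` and `y = [6, 5, 7, 10]`), $X^TX = [[4, 10], [10, 30]]$ and $X^Ty = [28, 77]$, so the estimate works out to approximately `[3.5, 1.4]`. The GPU result should agree up to `f32` rounding.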

## Safety and Error Handling

This library uses unsafe code to interface with CUDA functions. It includes error handling that checks the status of each CUDA and cuBLAS call, ensuring that any errors are handled gracefully and reported appropriately.
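The status-checking pattern described above can be sketched as follows. This is an illustration only, not the library's actual code: `CublasError` and `check_cublas` are hypothetical names, and the numeric values are assumed to match the `cublasStatus_t` codes in the cuBLAS headers (0 for success, 1 not initialized, 3 allocation failed, 7 invalid value, 13 execution failed).

```rust
/// Hypothetical error type for this sketch; the variant set is illustrative,
/// not exhaustive.
#[derive(Debug, PartialEq)]
pub enum CublasError {
    NotInitialized,
    AllocFailed,
    InvalidValue,
    ExecutionFailed,
    /// Any status code not named above.
    Other(i32),
}

/// Converts a raw status code returned by an FFI call into a `Result`,
/// so callers can propagate failures with `?` instead of ignoring them.
pub fn check_cublas(status: i32) -> Result<(), CublasError> {
    match status {
        0 => Ok(()),
        1 => Err(CublasError::NotInitialized),
        3 => Err(CublasError::AllocFailed),
        7 => Err(CublasError::InvalidValue),
        13 => Err(CublasError::ExecutionFailed),
        other => Err(CublasError::Other(other)),
    }
}
```

Wrapping every `unsafe` FFI call in a check like this keeps the unsafe surface small and turns silent failures into values the caller has to handle.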

## Performance

The performance test below compares `cuda_dot` against the `dot` method provided by ndarray-linalg:

```Rust
use std::time::Instant;

use ndarray::Array;
use ndarray_cuda_matmul::CudaDot;

// H_SIZE and V_SIZE are the matrix dimensions under test, defined elsewhere.

fn dot_with_ndarray() {
    let a = Array::from_elem((H_SIZE, H_SIZE), 1.0_f32);
    let b = Array::from_elem((H_SIZE, V_SIZE), 1.0_f32);
    let start = Instant::now();
    for _ in 0..100 {
        let _ = a.dot(&b);
    }
    println!("ndarray dot elapsed: {:.2?}", start.elapsed());
}

fn dot_with_cuda() {
    let a = Array::from_elem((H_SIZE, H_SIZE), 1.0_f32);
    let b = Array::from_elem((H_SIZE, V_SIZE), 1.0_f32);
    let start = Instant::now();
    for _ in 0..100 {
        let _ = a.cuda_dot(&b);
    }
    println!("matmul elapsed: {:.2?}", start.elapsed());
}
```

Results:

|Rows|Columns|Runs|ndarray-linalg|cuda_dot|
|----|----|----|----|----|
|64|64|100|2.27ms|9.89ms|
|128|80|100|11.37ms|10.66ms|
|768|128|100|438.01ms|57.86ms|
|2048|1000|100|22800ms|323.30ms|
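The timings above are easier to compare as speedup ratios (CPU time divided by GPU time). The short sketch below computes them from the table values; `speedup`, `RESULTS`, and `speedup_report` are hypothetical names introduced for illustration, and only the standard library is used.

```rust
/// Speedup of the GPU path over the CPU path.
/// Values above 1.0 mean `cuda_dot` was faster.
pub fn speedup(cpu_ms: f64, gpu_ms: f64) -> f64 {
    cpu_ms / gpu_ms
}

/// (rows, columns, ndarray-linalg ms, cuda_dot ms), copied from the table.
pub const RESULTS: [(u32, u32, f64, f64); 4] = [
    (64, 64, 2.27, 9.89),
    (128, 80, 11.37, 10.66),
    (768, 128, 438.01, 57.86),
    (2048, 1000, 22800.0, 323.30),
];

/// Formats one line per table row, e.g. "64x64: 0.23x".
pub fn speedup_report() -> Vec<String> {
    RESULTS
        .iter()
        .map(|&(rows, cols, cpu, gpu)| format!("{}x{}: {:.2}x", rows, cols, speedup(cpu, gpu)))
        .collect()
}
```

The ratios come out to roughly 0.23x, 1.07x, 7.57x, and 70.5x. For the smallest matrices the GPU path is slower, which is consistent with host-to-device transfer overhead dominating, while the advantage grows quickly with matrix size.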

## Contribution

Contributions to this library are welcome! Whether it's through reporting issues, proposing new features, improving documentation, or submitting pull requests, all forms of contribution are encouraged.

## License

This library is distributed under the MIT license.
