https://github.com/guoriyue/warp-from-device


Call [NVIDIA Warp](https://github.com/nvidia/warp) kernels from device

Install Warp

```
pip install numpy
git clone https://github.com/NVIDIA/warp.git
cd warp
python build_lib.py --cuda_path=/usr/local/cuda
pip install -e .
```

Run the Warp Python example to JIT-compile the kernel:
```
python3 float_arrays_add.py
```

Note the kernel cache path that Warp reports; that directory contains the generated CUDA code.
Warp uses a Python->C++/CUDA compilation model that generates kernel code from Python function definitions.
Copy the generated CUDA file into the current directory (this example includes it as `wp_float_arrays_add.cu`).
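
Finding the generated file by hand can be tedious when the cache holds several modules. The following is a minimal Python sketch, not part of this repository. It assumes the generated sources have a `.cu` extension somewhere under the cache path and contain the kernel's name. The helper names `find_generated_cuda` and `copy_newest` are hypothetical.

```python
import shutil
from pathlib import Path


def find_generated_cuda(cache_dir, kernel_name):
    """Return .cu files under cache_dir that mention kernel_name, newest first."""
    matches = []
    for path in Path(cache_dir).rglob("*.cu"):
        # Assumption: the generated source contains the kernel name verbatim.
        if kernel_name in path.read_text(errors="ignore"):
            matches.append(path)
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


def copy_newest(cache_dir, kernel_name, dest="wp_float_arrays_add.cu"):
    """Copy the most recently modified match to dest and return its path."""
    matches = find_generated_cuda(cache_dir, kernel_name)
    if not matches:
        raise FileNotFoundError(f"no .cu file mentioning {kernel_name!r} in {cache_dir}")
    shutil.copy(matches[0], dest)
    return Path(dest)


# Example call (substitute the kernel cache path reported on your machine):
# copy_newest("/path/to/kernel/cache", "add_float_arrays_cuda_kernel_forward")
```

With the file in place, compile the caller with relocatable device code and run it: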

```
nvcc call_wp_add_float.cu -rdc=true -lcudadevrt
./a.out
```

In call_wp_add_float.cu, we need to include:

```
#include "warp/warp/native/builtin.h"
#include "wp_float_arrays_add.cu"
```

To call the Warp kernel, we need to wrap our data in Warp's own types so that the arguments match the generated kernel's signature:
```
extern "C" __global__ void add_float_arrays_cuda_kernel_forward(
    wp::launch_bounds_t dim,
    wp::array_t<wp::float32> var_dest,
    wp::array_t<wp::float32> var_a,
    wp::array_t<wp::float32> var_b)
```

If we only have the precompiled PTX file, we need to declare `add_float_arrays_cuda_kernel_forward` as an external function in call_wp_add_float.cu, then do a dry run so that nvcc prints the compilation commands without executing them. We then run those commands by hand and, partway through, replace part of the intermediate PTX file with the PTX generated by our Warp code.

```
nvcc call_wp_add_float.cu -rdc=true -lcudadevrt -dryrun
```

Then run every command printed by the dry run that comes before the `ptxas` step:

```
ptxas -arch=sm_52 -m64 --compile-only "/tmp/tmpxft_0024aebf_00000000-6_call_wp_add_float.ptx" -o "/tmp/tmpxft_0024aebf_00000000-10_call_wp_add_float.sm_52.cubin"
```
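
Splitting the dry-run output around the `ptxas` step can be scripted. Below is a minimal Python sketch under two assumptions: each line nvcc prints in dry-run mode is prefixed with `#$ `, and the leading lines are `NAME=value` environment assignments rather than commands. The function name `split_dryrun` is hypothetical.

```python
import re

ASSIGNMENT = re.compile(r"^[A-Za-z_]\w*=")


def split_dryrun(dryrun_text, tool="ptxas"):
    """Split nvcc -dryrun output into (env, before, after).

    env    : NAME=value assignment lines
    before : commands that come before the first `tool` command
    after  : the first `tool` command and every command after it
    """
    env, commands = [], []
    for line in dryrun_text.splitlines():
        if not line.startswith("#$ "):
            continue  # ignore anything that is not a dry-run line
        body = line[3:]
        (env if ASSIGNMENT.match(body) else commands).append(body)
    for i, cmd in enumerate(commands):
        tokens = cmd.split()
        exe = tokens[0].strip('"').rsplit("/", 1)[-1] if tokens else ""
        if exe == tool:
            return env, commands[:i], commands[i:]
    raise ValueError(f"no {tool} command found in dry-run output")
```

One way to use it: capture the dry-run output to a file (redirect both stdout and stderr), write `env + before` to one shell script and `env + after` to another, run the first, edit the PTX as described next, then run the second. The assignments are repeated in both scripts because later commands may rely on the `PATH` they set.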

In the file '/tmp/tmpxft_0024aebf_00000000-6_call_wp_add_float.ptx', remove the declaration of the undefined function 'add_float_arrays_cuda_kernel_forward' and put the precompiled Warp PTX code in its place. Because PTX is in Static Single Assignment (SSA) form, it may be safer to rename the values in the pasted code so they cannot clash with existing ones, though in my tests it worked without renaming.
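
The manual edit can be approximated with a short script. This is only a sketch under stated assumptions, not the method used in this repository. It assumes the unresolved kernel appears in the nvcc-generated PTX as a single statement that starts with `.extern`, names the kernel, and ends at the first `;`. It also assumes the Warp PTX repeats the module header directives (`.version`, `.target`, `.address_size`), which should not appear a second time in the merged file. The function names are hypothetical, and no values are renamed.

```python
import re

HEADER_DIRECTIVES = (".version", ".target", ".address_size")


def strip_module_header(ptx):
    """Drop module-level header directives so they are not duplicated."""
    kept = [line for line in ptx.splitlines()
            if not line.lstrip().startswith(HEADER_DIRECTIVES)]
    return "\n".join(kept).strip() + "\n"


def splice_ptx(host_ptx, warp_ptx, kernel):
    """Replace the .extern declaration of `kernel` with the Warp PTX body."""
    # Assumed declaration shape: ".extern ... <kernel>( ... );"
    pattern = re.compile(
        r"^\.extern\b[^;]*?\b" + re.escape(kernel) + r"\b[^;]*;[ \t]*\n?",
        re.MULTILINE,
    )
    body = strip_module_header(warp_ptx)
    # A function is used as the replacement so backslashes in the PTX
    # are inserted literally rather than treated as regex escapes.
    spliced, count = pattern.subn(lambda m: body, host_ptx, count=1)
    if count == 0:
        raise ValueError(f"no .extern declaration of {kernel!r} found")
    return spliced
```

Inspect the result before passing it to `ptxas`, since the real declaration may not match the assumed shape. If the pattern is not found, the function raises an error instead of writing a half-edited file.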

Then run the `ptxas` command and every command after it to produce a.out.