https://github.com/guoriyue/warp-from-device
cuda-programming warp
- Host: GitHub
- URL: https://github.com/guoriyue/warp-from-device
- Owner: guoriyue
- Created: 2024-02-03T18:18:19.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-05T06:04:40.000Z (over 1 year ago)
- Last Synced: 2025-02-13T06:35:55.228Z (5 months ago)
- Topics: cuda-programming, warp
- Language: Cuda
- Homepage:
- Size: 1.71 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Call [NVIDIA Warp](https://github.com/nvidia/warp) kernels from device
Install Warp
```
pip install numpy
git clone https://github.com/NVIDIA/warp.git
cd warp
python build_lib.py --cuda_path=/usr/local/cuda
pip install -e .
```
Run the Warp Python example float_arrays_add.py to JIT-compile the kernel:
```
python3 float_arrays_add.py
```
Note the kernel cache path in the output; that directory will contain the generated CUDA code.
Warp uses a Python->C++/CUDA compilation model that generates kernel code from Python function definitions.
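To give a rough idea of what such generated code contains, here is a hand-written, simplified stand-in for an element-wise add kernel. It is not Warp's actual output: the real file uses Warp types such as `wp::array_t` and `wp::launch_bounds_t` from `builtin.h`, and the Python source in the comment is only an assumption about what the example script might look like. The names `add_float_arrays_sketch` and `run_add_sketch` are made up for this illustration.

```cuda
// Simplified stand-in for a generated kernel; NOT Warp's actual output.
// Assumed Python source (hypothetical, the repository's script may differ):
//   @wp.kernel
//   def add_float_arrays(dest: wp.array(dtype=float),
//                        a: wp.array(dtype=float),
//                        b: wp.array(dtype=float)):
//       tid = wp.tid()
//       dest[tid] = a[tid] + b[tid]
#include <cstdio>
#include <cuda_runtime.h>

// extern "C" keeps the symbol unmangled, so it appears under this exact name in PTX.
extern "C" __global__ void add_float_arrays_sketch(
    size_t n, float* dest, const float* a, const float* b)
{
    // Grid-stride loop: each thread handles i, i + stride, i + 2 * stride, ...
    const size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         i < n; i += stride)
    {
        dest[i] = a[i] + b[i];
    }
}

// Host driver: fills two arrays, launches the kernel, checks the sums.
// Returns 0 on success, 1 on any CUDA error or mismatch.
int run_add_sketch()
{
    const size_t n = 1000;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *dest = nullptr;

    bool ok = cudaMallocManaged(&a, bytes) == cudaSuccess
           && cudaMallocManaged(&b, bytes) == cudaSuccess
           && cudaMallocManaged(&dest, bytes) == cudaSuccess;
    if (ok) {
        for (size_t i = 0; i < n; ++i) { a[i] = 1.0f * i; b[i] = 2.0f * i; }
        add_float_arrays_sketch<<<4, 128>>>(n, dest, a, b);
        ok = cudaDeviceSynchronize() == cudaSuccess;
        for (size_t i = 0; ok && i < n; ++i) ok = (dest[i] == 3.0f * i);
    }
    cudaFree(a); cudaFree(b); cudaFree(dest);  // cudaFree(nullptr) is a no-op
    return ok ? 0 : 1;
}

int main()
{
    const int rc = run_add_sketch();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

The `extern "C"` linkage is the detail that matters later on: it is why the kernel can be referred to by its plain name when working with PTX.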
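Copy the generated CUDA file into the current directory (the include further down expects it as wp_float_arrays_add.cu), then compile the caller and run it:
```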
nvcc call_wp_add_float.cu -rdc=true -lcudadevrt
./a.out
```
In call_wp_add_float.cu, we need to include:
```
#include "warp/warp/native/builtin.h"
#include "wp_float_arrays_add.cu"
```
To call the Warp kernel function, we need to convert our data into the Warp types its signature expects:
```
extern "C" __global__ void add_float_arrays_cuda_kernel_forward(
    wp::launch_bounds_t dim,
    wp::array_t<wp::float32> var_dest,
    wp::array_t<wp::float32> var_a,
    wp::array_t<wp::float32> var_b)
```
If we only have the precompiled PTX file, we need to declare add_float_arrays_cuda_kernel_forward as an external function and do a dry run, which prints the compilation commands without executing them. Afterward, we modify the intermediate PTX file partway through the compilation, using the PTX generated by our Warp code.
```
nvcc call_wp_add_float.cu -rdc=true -lcudadevrt -dryrun
```
Then run every command in the dry-run output that precedes this one (the temporary file names will differ on your machine):
```
ptxas -arch=sm_52 -m64 --compile-only "/tmp/tmpxft_0024aebf_00000000-6_call_wp_add_float.ptx" -o "/tmp/tmpxft_0024aebf_00000000-10_call_wp_add_float.sm_52.cubin"
```
In the file '/tmp/tmpxft_0024aebf_00000000-6_call_wp_add_float.ptx', remove the declaration of the undefined function 'add_float_arrays_cuda_kernel_forward' and put the precompiled Warp PTX code in its place. Because PTX is in Static Single Assignment (SSA) form, we may want to rename the values to ensure correctness, though I have tested it without renaming, and it still works.
Then run the ptxas command and everything after to get a.out.
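The `-rdc=true -lcudadevrt` flags used throughout are what nvcc needs when one kernel launches another from device code (CUDA dynamic parallelism), which is presumably how the Warp kernel gets called here. The sketch below shows that pattern in isolation, under stated assumptions. `array_view`, `add_child`, `parent`, and `run_device_launch_demo` are hypothetical names. `array_view` only mimics the idea of wrapping a raw pointer and a length, as `wp::array_t` does; it is not Warp's type, and `add_child` merely plays the role of the generated Warp kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a Warp-style array descriptor (NOT wp::array_t).
struct array_view {
    float* data;
    int    size;
};

// Child kernel: plays the role of the generated Warp kernel.
extern "C" __global__ void add_child(array_view dest, array_view a, array_view b)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < dest.size) dest.data[i] = a.data[i] + b.data[i];
}

// Parent kernel: wraps raw pointers in descriptors, then launches the child
// from device code. A single thread issues the launch.
__global__ void parent(float* dest, float* a, float* b, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        const array_view vd{dest, n}, va{a, n}, vb{b, n};
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        add_child<<<blocks, threads>>>(vd, va, vb);
    }
}

// Host driver. Returns 0 on success, 1 on any CUDA error or mismatch.
int run_device_launch_demo()
{
    const int n = 1000;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *dest = nullptr;

    bool ok = cudaMallocManaged(&a, bytes) == cudaSuccess
           && cudaMallocManaged(&b, bytes) == cudaSuccess
           && cudaMallocManaged(&dest, bytes) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < n; ++i) { a[i] = 1.0f * i; b[i] = 2.0f * i; }
        parent<<<1, 1>>>(dest, a, b, n);
        // The parent grid only counts as finished once its child grids are done,
        // so this host-side sync also waits for add_child.
        ok = cudaDeviceSynchronize() == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) ok = (dest[i] == 3.0f * i);
    }
    cudaFree(a); cudaFree(b); cudaFree(dest);  // cudaFree(nullptr) is a no-op
    return ok ? 0 : 1;
}

int main()
{
    const int rc = run_device_launch_demo();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Saved as, say, sketch.cu (a made-up file name), this should build with the same flags as above: `nvcc sketch.cu -rdc=true -lcudadevrt`. Without `-rdc=true`, nvcc rejects the device-side kernel launch.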