# pybind-cuda-demo
An example of calling CUDA C++ interfaces from Python using pybind11.

## install

```bash
git clone https://github.com/Reuben-Sun/pybind-cuda-demo.git
cd pybind-cuda-demo
pip install -e . --no-build-isolation
```
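After installing, it can be useful to confirm that the compiled module is visible to the interpreter before running anything on the GPU. The snippet below is a minimal sketch, not part of the repository: `extension_available` is a hypothetical helper name, and it only checks that a module called `cuda_demo_ext` can be located, not that its kernels run correctly.

```python
import importlib.util


def extension_available(name="cuda_demo_ext"):
    """Return True if a top-level module called `name` is on the import path."""
    # find_spec locates the module without importing it, so no CUDA
    # initialization is triggered by this check.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if extension_available():
        print("cuda_demo_ext is importable")
    else:
        print("cuda_demo_ext not found; try re-running the install step")
```

Locating the module without importing it keeps the check cheap and avoids loading the extension's shared library just to see whether the build succeeded.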

## test

```python
import torch
import time
import cuda_demo_ext  # the compiled C++/CUDA extension


def run_benchmark():
    # Problem size: 50 million float32 elements (about 200 MB per tensor)
    size = 50 * 1000 * 1000
    print(f"Testing with vector size: {size}")

    # Prepare the input data
    a_cpu = torch.rand(size, dtype=torch.float32)
    b_cpu = torch.rand(size, dtype=torch.float32)
    scalar = 5.0

    a_gpu = a_cpu.cuda()
    b_gpu = b_cpu.cuda()

    # --- 1. CPU baseline (PyTorch on CPU: already optimized, but limited
    #        by memory bandwidth and compute throughput) ---
    start = time.perf_counter()
    res_cpu = a_cpu + b_cpu + scalar
    end = time.perf_counter()
    print(f"PyTorch CPU time: {(end - start) * 1000:.4f} ms")

    # --- 2. Custom CUDA kernel ---
    # Warm up the GPU so one-time initialization is not timed
    _ = cuda_demo_ext.vector_add(a_gpu, b_gpu, scalar)
    torch.cuda.synchronize()

    start = time.perf_counter()
    res_custom = cuda_demo_ext.vector_add(a_gpu, b_gpu, scalar)
    torch.cuda.synchronize()  # wait for the GPU to finish
    end = time.perf_counter()
    print(f"Custom CUDA Kernel: {(end - start) * 1000:.4f} ms")

    # --- Verify correctness ---
    # Copy the GPU result back to the CPU and compare
    if torch.allclose(res_cpu, res_custom.cpu(), atol=1e-5):
        print("✅ Results match!")
    else:
        print("❌ Results mismatch!")


if __name__ == "__main__":
    run_benchmark()
```
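A single timed run like the one above can be noisy. The following is a minimal sketch of a reusable timing helper, assuming you want the median over several runs. `time_ms` and its parameters are hypothetical names, not part of this repository. The optional `sync` argument is any zero-argument callable that blocks until queued asynchronous work has finished.

```python
import statistics
import time


def time_ms(fn, sync=None, warmup=3, repeats=10):
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # include asynchronous work in the measured interval
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)
```

With the tensors from the benchmark above, the custom kernel could be measured as `time_ms(lambda: cuda_demo_ext.vector_add(a_gpu, b_gpu, scalar), sync=torch.cuda.synchronize)`. The CPU baseline needs no `sync` argument.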