https://github.com/reuben-sun/pybind-cuda-demo
An example of calling a CUDA C++ interface from Python using pybind11
- Host: GitHub
- URL: https://github.com/reuben-sun/pybind-cuda-demo
- Owner: Reuben-Sun
- License: mit
- Created: 2025-12-22T05:12:47.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-12-22T05:17:56.000Z (6 months ago)
- Last Synced: 2025-12-23T16:53:17.810Z (6 months ago)
- Topics: cpp, cuda, pybind11, python, pytorch
- Language: Cuda
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# pybind-cuda-demo
Calling a CUDA C++ interface from Python using pybind11.
## install
```bash
git clone https://github.com/Reuben-Sun/pybind-cuda-demo.git
cd pybind-cuda-demo
pip install -e . --no-build-isolation
```
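Because `--no-build-isolation` makes pip build in the current environment instead of an isolated one, the build dependencies must already be installed there (presumably including `torch`, since the test script imports it). A minimal sketch for checking that the modules can be found after installation; the helper name `module_available` is hypothetical and not part of this repository:

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # `cuda_demo_ext` is the extension name used in the test script;
    # `torch` is assumed to be a prerequisite of the build.
    for name in ("torch", "cuda_demo_ext"):
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If `cuda_demo_ext` is reported as missing, the build step most likely failed or was run in a different environment.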
## test
```python
import time

import torch

import cuda_demo_ext  # the compiled C++/CUDA extension


def run_benchmark():
    # Problem size: 50 million float32 elements (about 200 MB per tensor)
    size = 50 * 1000 * 1000
    print(f"Testing with vector size: {size}")

    # Prepare the inputs on the CPU, then copy them to the GPU
    a_cpu = torch.rand(size, dtype=torch.float32)
    b_cpu = torch.rand(size, dtype=torch.float32)
    scalar = 5.0
    a_gpu = a_cpu.cuda()
    b_gpu = b_cpu.cuda()

    # --- 1. CPU baseline (PyTorch on CPU: already optimized, but limited by
    #        memory bandwidth and compute throughput) ---
    start = time.perf_counter()
    res_cpu = a_cpu + b_cpu + scalar
    end = time.perf_counter()
    print(f"PyTorch CPU time: {(end - start) * 1000:.4f} ms")

    # --- 2. Custom CUDA kernel ---
    # Warm up the GPU so one-time initialization is not included in the timing
    _ = cuda_demo_ext.vector_add(a_gpu, b_gpu, scalar)
    torch.cuda.synchronize()

    start = time.perf_counter()
    res_custom = cuda_demo_ext.vector_add(a_gpu, b_gpu, scalar)
    torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
    end = time.perf_counter()
    print(f"Custom CUDA Kernel: {(end - start) * 1000:.4f} ms")

    # --- Verify correctness ---
    # Copy the GPU result back to the CPU and compare
    if torch.allclose(res_cpu, res_custom.cpu(), atol=1e-5):
        print("✅ Results match!")
    else:
        print("❌ Results mismatch!")


if __name__ == "__main__":
    run_benchmark()
```
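The script above times a single call after one warm-up run, so the reported numbers can vary noticeably between runs. A minimal sketch of a helper that averages several runs is shown below; `time_ms` and its parameters are hypothetical names introduced for illustration, not part of this repository. The optional `sync` callback is there because CUDA kernel launches are asynchronous, so the clock should only be read after the device has finished.

```python
import time


def time_ms(fn, sync=None, warmup=1, repeats=5):
    """Return the mean wall-clock time of fn() in milliseconds.

    fn      -- zero-argument callable to measure
    sync    -- optional callable that blocks until pending work is done
               (for GPU code, something like torch.cuda.synchronize)
    warmup  -- number of untimed calls made before measuring
    repeats -- number of timed calls to average over
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work is finished before starting the clock

    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for all timed work before stopping the clock
    elapsed = time.perf_counter() - start

    return elapsed * 1000 / repeats
```

With the tensors from the test script, the GPU measurement could then be written as `time_ms(lambda: cuda_demo_ext.vector_add(a_gpu, b_gpu, scalar), sync=torch.cuda.synchronize)`, and the CPU baseline as `time_ms(lambda: a_cpu + b_cpu + scalar)`.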