https://github.com/ironjr/minimal-cuda-pytorch
Repository-level snippet for minimal implementation of a PyTorch CUDA extension.
https://github.com/ironjr/minimal-cuda-pytorch
cuda minimal pytorch
Last synced: about 2 months ago
JSON representation
Repository-level snippet for minimal implementation of a PyTorch CUDA extension.
- Host: GitHub
- URL: https://github.com/ironjr/minimal-cuda-pytorch
- Owner: ironjr
- License: mit
- Created: 2025-06-24T07:59:00.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-24T08:12:29.000Z (12 months ago)
- Last Synced: 2026-05-04T22:42:36.721Z (about 2 months ago)
- Topics: cuda, minimal, pytorch
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Minimal PyTorch CUDA Extension Example
This is a minimal implementation of a PyTorch model that uses a single CUDA function written in C++ CUDA.
### Usage
Just clone this repository and start building custom CUDA extensions for PyTorch.
---
## Generated Instructions
### Files
- `cuda_kernels.cu` - CUDA kernel implementation (element-wise addition)
- `bindings.cpp` - Python bindings using pybind11
- `setup.py` - Build configuration
- `model.py` - PyTorch model that uses the custom CUDA function
- `test.py` - Test script to verify the CUDA function works correctly
### Requirements
- PyTorch with CUDA support
- CUDA toolkit
- C++ compiler (gcc/g++ on Linux, MSVC on Windows)
### Build Instructions
1. Navigate to this directory:
```bash
cd minimal_cuda_example
```
2. Build and install the extension:
```bash
python setup.py install
```
Or for development (builds in-place):
```bash
python setup.py build_ext --inplace
```
### Usage
- Test the CUDA function
```bash
python test.py
```
- Run the model example
```bash
python model.py
```
### How it works
1. **CUDA Kernel** (`cuda_kernels.cu`): Implements a simple element-wise addition kernel that runs on GPU
2. **Python Bindings** (`bindings.cpp`): Uses pybind11 to expose the CUDA function to Python
3. **PyTorch Integration** (`model.py`): Wraps the CUDA function in a `torch.autograd.Function` for seamless integration with PyTorch's automatic differentiation
4. **Model Usage**: The custom CUDA function is used within a standard PyTorch model
### Extending this example
You can extend this example by:
- Adding more complex CUDA kernels
- Implementing custom backward passes
- Adding support for different data types
- Optimizing memory access patterns
- Adding error checking and validation
### Notes
- This example uses float32 tensors only
- The CUDA kernel is optimized for simplicity, not performance
- Error checking is minimal - production code should have more robust error handling