https://github.com/szagoruyko/cutorch-rtc
lua apply function for cutorch
- Host: GitHub
- URL: https://github.com/szagoruyko/cutorch-rtc
- Owner: szagoruyko
- License: BSD-2-Clause
- Created: 2015-03-20T13:01:07.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2017-01-05T17:23:25.000Z (over 8 years ago)
- Last Synced: 2025-01-01T18:35:06.859Z (5 months ago)
- Language: Lua
- Size: 48.8 KB
- Stars: 17
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# cutorch-rtc
This package brings CUDA 7 runtime compilation to Torch. Linux or OS X with a C++11 compiler is required.

Basic feature list:

* cutorch.launchPTX function
* apply kernels for cutorch (apply1, apply2, apply3)
Installation:
```
luarocks install https://raw.githubusercontent.com/szagoruyko/cutorch-rtc/master/cutorch-rtc-scm-1.rockspec
```
After requiring `cutorch-rtc` you get the `cutorch.launchPTX` function, which can run PTX code generated with NVRTC, and the `apply1`/`apply2`/`apply3` tensor methods:
```lua
require 'cutorch-rtc'
t = torch.randn(8):cuda()
t:apply1'x = x < 0 ? 0 : x'
```
That would be a simple ReLU implementation.

## Documentation
### cutorch.launchPTX
Runs compiled PTX.
```lua
function cutorch.launchPTX(ptx, kernel_name, arguments, gridDim, blockDim)
```
Arguments:
* ptx - compiled PTX as a Lua string
* kernel_name - name of the kernel to run from the given PTX
* arguments - Lua table with CudaTensors as inputs; subtables of the form {'int', n} provide scalar arguments
* gridDim - grid size table; must contain at least one value, the remaining dimensions are filled with ones
* blockDim - block size table; likewise must contain at least one value, the remaining dimensions are filled with ones

PTX can be generated at runtime with https://github.com/szagoruyko/nvrtc.torch.
Short example:
```lua
require 'cutorch-rtc'
require 'nvrtc'  -- from https://github.com/szagoruyko/nvrtc.torch

local kernel = [[
extern "C" __global__
void kernel(float *a, int n)
{
int tx = blockIdx.x*blockDim.x + threadIdx.x;
if(tx < n)
a[tx] *= 2.f;
}
]]

local ptx = nvrtc.compileReturnPTX(kernel)
local a = torch.randn(32):cuda()
local b = a:clone()  -- keep a copy of the original values

-- launch one block of 32 threads; {'int', a:numel()} passes n as a scalar argument
cutorch.launchPTX(ptx, 'kernel', {a, {'int', a:numel()}}, {1}, {32})
```

### apply1
Applies the provided operator elementwise to a tensor:
```lua
function CudaTensor.apply1(self, op)
```
op has to be a Lua string that assigns a value to the variable 'x'. CUDA built-in __device__ functions can be used; see the CUDA documentation for more information. Multi-line ops are supported; statements have to be separated with ';'.

Both contiguous and non-contiguous tensors are valid. The first call to any apply operation takes about 0.5s to compile; the code is then cached and subsequent calls are fast.
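For instance, a clamp written as a multi-line op with CUDA's built-in `fminf`/`fmaxf` (an illustrative sketch; the tensor shape and values are arbitrary, not from the package docs):

```lua
require 'cutorch-rtc'

-- two statements separated with ';', each using a CUDA __device__ built-in
local t = torch.randn(16):cuda()
t:apply1 'x = fminf(x, 1.f); x = fmaxf(x, -1.f)'  -- clamp to [-1, 1]
```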
### apply2

Applies the provided operator elementwise using two tensors:
```lua
function CudaTensor.apply2(self, a, op)
```
op has to use 'x' and 'y', which refer to the self and a tensors respectively. Values can be assigned to both tensors. See apply1 for compilation and contiguity properties.
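As a minimal sketch under these semantics (the names and the particular op are illustrative only):

```lua
require 'cutorch-rtc'

-- x is self, y is the second tensor: an axpy-style update x <- x + 2*y
local x = torch.zeros(8):cuda()
local y = torch.randn(8):cuda()
x:apply2(y, 'x = x + 2.f * y')
```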
### apply3

Applies the provided operator elementwise using three tensors:
```lua
function CudaTensor.apply3(self, a, b, op)
```
op has to use 'x', 'y' and 'z', which refer to the self, a and b tensors respectively. Values can be assigned to all three tensors. See apply1 for compilation and contiguity properties.
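And a minimal sketch of a fused elementwise product (illustrative names only, not from the package docs):

```lua
require 'cutorch-rtc'

-- x is self (the output), y and z are the two input tensors
local out = torch.CudaTensor(8)
local a = torch.randn(8):cuda()
local b = torch.randn(8):cuda()
out:apply3(a, b, 'x = y * z')  -- elementwise product written into out
```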