
# cutorch-rtc

Basic feature list:

* cutorch.launchPTX function
* runtime-compiled apply kernels for CudaTensor (apply1, apply2, apply3)

This package brings CUDA 7 runtime compilation (NVRTC) to Torch. Linux or OS X with a C++11 compiler is required.
Installation:
```
luarocks install https://raw.githubusercontent.com/szagoruyko/cutorch-rtc/master/cutorch-rtc-scm-1.rockspec
```
After requiring `cutorch-rtc` you get the `cutorch.launchPTX` function, which can run PTX code generated with NVRTC, and the `apply` family of tensor functions:
```lua
require 'cutorch-rtc'
t = torch.randn(8):cuda()
t:apply1'x = x < 0 ? 0 : x'
```
This is a simple ReLU implementation.

## Documentation

### cutorch.launchPTX
Runs compiled PTX.
```lua
function cutorch.launchPTX(ptx, kernel_name, arguments, gridDim, blockDim)
```
Arguments:
* ptx - a Lua string containing compiled PTX
* kernel_name - name of the kernel to run from the given PTX
* arguments - a Lua table with CudaTensors as inputs; scalar arguments are passed as subtables of the form {'int', n}
* gridDim - grid size table; has to have at least one value, missing dimensions are filled with ones
* blockDim - block size table; again has to have at least one value, missing dimensions are filled with ones

PTX can be generated at runtime with https://github.com/szagoruyko/nvrtc.torch

Short example:

```lua
require 'cutorch-rtc'
local nvrtc = require 'nvrtc' -- https://github.com/szagoruyko/nvrtc.torch

-- kernel that doubles the first n elements of a
local kernel = [[
extern "C" __global__
void kernel(float *a, int n)
{
  int tx = blockIdx.x*blockDim.x + threadIdx.x;
  if(tx < n)
    a[tx] *= 2.f;
}
]]

local ptx = nvrtc.compileReturnPTX(kernel)
local a = torch.randn(32):cuda()
local b = a:clone() -- keep a copy to compare against
cutorch.launchPTX(ptx, 'kernel', {a, {'int', a:numel()}}, {1}, {32})
-- a now equals 2*b
```

### apply1

Applies the provided operator to a tensor:
```lua
function CudaTensor.apply1(self, op)
```
op has to be a Lua string that assigns a value to the variable `x`. CUDA built-in `__device__` functions can be used; see the CUDA documentation for details. Multiline ops are supported; statements have to be separated with `;`.
Both contiguous and non-contiguous tensors are valid. The first call to any apply operation takes about 0.5s to compile the kernel; the compiled code is then cached, so subsequent calls are fast.
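
For illustration, a minimal sketch of a multiline op that clamps each element to [-1, 1] using the CUDA built-in `fminf`/`fmaxf` device functions:

```lua
require 'cutorch-rtc'

local t = torch.randn(8):cuda()
-- clamp every element to [-1, 1]; two statements separated with ';',
-- using the CUDA built-in device functions fminf and fmaxf
t:apply1'x = fminf(x, 1.f); x = fmaxf(x, -1.f)'
```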

### apply2

Applies the provided operator over two tensors:
```lua
function CudaTensor.apply2(self, a, op)
```
op has to use `x` and `y`, which refer to the `self` and `a` tensors. Values can be assigned to both tensors. See apply1 for compilation and caching behavior.
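
A minimal sketch, assuming the CUDA built-in `fmaxf`: an elementwise maximum written into `self`:

```lua
require 'cutorch-rtc'

local x = torch.randn(8):cuda()
local y = torch.randn(8):cuda()
-- overwrite x with the elementwise maximum of x and y
x:apply2(y, 'x = fmaxf(x, y)')
```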

### apply3

Applies the provided operator over three tensors:
```lua
function CudaTensor.apply3(self, a, b, op)
```
op has to use `x`, `y` and `z`, which refer to the `self`, `a` and `b` tensors. Values can be assigned to all three tensors. See apply1 for compilation and caching behavior.
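
A minimal sketch, assuming the CUDA built-in `fmaf`: a fused multiply-add accumulated into `self`:

```lua
require 'cutorch-rtc'

local x = torch.zeros(8):cuda()
local y = torch.randn(8):cuda()
local z = torch.randn(8):cuda()
-- fused multiply-add: x = y*z + x, computed in a single kernel
x:apply3(y, z, 'x = fmaf(y, z, x)')
```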