Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/christopher-hesse/tenet
Automatic differentiation prototype in Zig
- Host: GitHub
- URL: https://github.com/christopher-hesse/tenet
- Owner: christopher-hesse
- License: MIT
- Created: 2021-05-04T06:39:42.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-05-06T04:33:52.000Z (almost 4 years ago)
- Last Synced: 2024-04-18T16:01:45.283Z (10 months ago)
- Language: Zig
- Size: 58.6 KB
- Stars: 15
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-zig - tenet: Automatic differentiation prototype in Zig
README
# tenet
A [torch](https://github.com/pytorch/pytorch)-inspired automatic differentiation prototype for [Zig](https://ziglang.org/).
Imagine the [numpy](https://numpy.org/) NDArray, only you can also compute backward in time using inverted functions. Well, not quite, but you *can* calculate derivatives with respect to the inputs of your computation.
## Usage
The main struct is `Tensor`, an N-dimensional array of numbers, usually floating point numbers. Here's a short example showing how to do a `+` operation along with a backward pass:
```zig
const std = @import("std");
const tenet = @import("tenet.zig");
const alc = std.testing.allocator;
var a = try tenet.Tensor.allocWithValue(f32, alc, &[_]u64{2, 3, 4}, 1.0, tenet.tensor.REQUIRES_GRAD);
defer a.release();
var b = try tenet.Tensor.allocWithValue(f32, alc, &[_]u64{2, 3, 4}, 2.0, tenet.tensor.REQUIRES_GRAD);
defer b.release();
var out = try tenet.tensor.plusAlloc(alc, a, b);
defer out.release();
var grad_out = try tenet.Tensor.allocWithValue(f32, alc, &[_]u64{2, 3, 4}, 4.0, 0);
defer grad_out.release();
try tenet.tensor.backwardAlloc(alc, out, grad_out);
std.testing.expect(tenet.array.equal(a.grad.?, grad_out.data));
std.testing.expect(tenet.array.equal(b.grad.?, grad_out.data));
```
For a full example, look at the [MNIST example](src/main.zig).
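To spell out what that test asserts (using numpy here purely as an illustration; numpy is not part of tenet): for an elementwise `a + b`, the derivative of the output with respect to each input is 1, so the backward pass copies `grad_out` straight into both `a.grad` and `b.grad`.
```py
import numpy as np

# out = a + b elementwise, so d(out)/d(a) = d(out)/d(b) = 1,
# and the incoming gradient passes through to both inputs unchanged.
grad_out = np.full((2, 3, 4), 4.0, dtype=np.float32)
grad_a = grad_out * 1.0  # what a.grad should hold after the backward pass
grad_b = grad_out * 1.0  # what b.grad should hold after the backward pass
assert np.array_equal(grad_a, grad_out) and np.array_equal(grad_b, grad_out)
```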
## Automatic Differentiation
If you have a function `z = f(x, y)` and you want to know how to change `x` and `y` to minimize `z`, how do you find that out? One way would be to increase and decrease `x` and `y` individually to see how much `z` changes, then move them in whichever direction is better. That method is called ["finite differences"](https://en.wikipedia.org/wiki/Finite_difference#Relation_with_derivatives).
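As a quick sanity check of the idea (plain Python, not part of tenet), the central-difference estimate for `x ** 2` at `x = 3` lands very close to the exact derivative `2 * x = 6`:
```py
def g(x):
    return x ** 2

epsilon = 1e-6
# central difference: nudge x up and down and see how much g changes
estimate = (g(3.0 + epsilon) - g(3.0 - epsilon)) / (2 * epsilon)
print(estimate)  # approximately 6.0
```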
For a couple of input variables, this is fine, but it's not very efficient with a large number of input variables. Instead of doing that, you can find the derivatives by constructing a sort of backward version of the computation graph of your function. If the function `f` looked like this:
```py
def square(x):
    return x ** 2

def cube(x):
    return x ** 3

def multiply(x, y):
    return x * y

def f(x, y):
    a = square(x)
    b = cube(y)
    c = multiply(a, b)
    return c
```
You might have a backward function like this:
```py
def backward_multiply(x, y, grad_out):
    grad_in_x = y * grad_out
    grad_in_y = x * grad_out
    return grad_in_x, grad_in_y

def backward_square(x, grad_out):
    grad_in = 2 * x * grad_out
    return grad_in

def backward_cube(x, grad_out):
    grad_in = 3 * x ** 2 * grad_out
    return grad_in

def backward_f(x, y, grad_z):
    # we actually need the intermediate values to call the backward functions
    # so re-calculate them here (normally we would just store them when running f() the first time)
    a = square(x)
    b = cube(y)
    _c = multiply(a, b)

    grad_a, grad_b = backward_multiply(a, b, grad_z)
    grad_y = backward_cube(y, grad_b)
    grad_x = backward_square(x, grad_a)
    return grad_x, grad_y
```
Where the `backward_` functions are the derivatives of the original functions, using the chain rule to combine them together. Each `backward_` function takes the original inputs to the normal function, plus an extra `grad_out` parameter, then returns `grad_in_` for each of the original inputs. You end up with the same information about how the output changes as you would get from changing each input variable individually, only with fewer calculations:
```py
# run the function normally
x = 1.0
y = 2.0
z = f(x, y)
print(f"f(x,y): {z}")

# run the backward function
grad_z = 1.0 # the initial grad value is set to 1
grad_x, grad_y = backward_f(x, y, grad_z)
print(f"backward_f(x, y, grad_z): grad_x = {grad_x}, grad_y = {grad_y}")

# check the backward function using finite differences
# by making small changes to each input to find how the output changes
def finite_differences(x, y, f, epsilon=1e-6):
    grad_x = (f(x + epsilon, y) - f(x - epsilon, y)) / (2 * epsilon)
    grad_y = (f(x, y + epsilon) - f(x, y - epsilon)) / (2 * epsilon)
    return grad_x, grad_y

grad_x_fd, grad_y_fd = finite_differences(x, y, f)
print(f"finite differences approximation: grad_x = {grad_x_fd}, grad_y = {grad_y_fd}")
```
See [scripts/grad_example.py](scripts/grad_example.py) for the full script. In the case where the inputs and outputs are matrices instead of scalars, `grad_out` will have the shape of the output, and each `grad_in_` will have the shape of the corresponding input.
In automatic differentiation, you create `backward_f` automatically based on the operations done by `f`. Like in torch, no explicit graph is defined when using this prototype. Arrays in `tenet` track the series of operations used to create them, so when you do the backward pass, each `backward_` function is run for you, automatically.
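A toy scalar version of that record-and-replay idea in Python may make it concrete (this is an illustration only, not how tenet's `Tensor` is actually implemented): each operation appends a small closure to a tape, and the backward pass replays the tape in reverse.
```py
class Value:
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

tape = []  # closures recorded in the order the forward operations ran

def square(x):
    out = Value(x.data ** 2)
    def rule():
        x.grad += 2 * x.data * out.grad  # same math as backward_square above
    tape.append(rule)
    return out

def multiply(x, y):
    out = Value(x.data * y.data)
    def rule():
        x.grad += y.data * out.grad  # same math as backward_multiply above
        y.grad += x.data * out.grad
    tape.append(rule)
    return out

def backward(out):
    out.grad = 1.0
    for rule in reversed(tape):  # replay the recorded operations in reverse
        rule()

# z = x^2 * y at x = 3, y = 2
x, y = Value(3.0), Value(2.0)
z = multiply(square(x), y)
backward(z)
print(x.grad, y.grad)  # 12.0 (= 2xy) and 9.0 (= x^2)
```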
## Interesting Features
There's only one sort of interesting feature about this prototype. Zig does not support operator overloading, but it would still be nice to write out equations. Writing out the operations by hand is a bit of a pain:
```zig
// (x * y + z) ^ 2.0
var a = try multiplyAlloc(alc, x, y);
defer a.release();
var b = try addAlloc(alc, a, z);
defer b.release();
var two = try Tensor.allocWithValue(f32, alc, &[_]u64{}, 2, tensor.NO_FLAGS);
defer two.release();
var c = try powerAlloc(alc, b, two);
defer c.release();
```
The `expr` function does all the same stuff, but uses a string at compile time:
```zig
var c = try expr(alc, "(x .* y + z) .^ 2.0", .{.x=x, .y=y, .z=z});
defer c.release();
```
Actually it only parses the expression at compile time; it doesn't fully unroll all the operations. I suspect the only thing keeping it from fully unrolling is some Zig compiler bug.
Because operator overloading is not used, the `expr` syntax has far fewer limitations. For this prototype, it uses [MATLAB-style operators](https://www.mathworks.com/help/matlab/matlab_prog/matlab-operators-and-special-characters.html).
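For readers who know numpy better than MATLAB, the elementwise operators translate roughly like this (a numpy comparison for illustration only; numpy is not involved in tenet):
```py
import numpy as np

# MATLAB-style ".*" is elementwise multiply and ".^" is elementwise power,
# so "(x .* y + z) .^ 2.0" computes the same thing as:
x = np.ones((2, 3), dtype=np.float32)
y = np.full((2, 3), 2.0, dtype=np.float32)
z = np.full((2, 3), 3.0, dtype=np.float32)
c = (x * y + z) ** 2.0  # every element is (1 * 2 + 3) ** 2 = 25
```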
## Downsides
* Defining an explicit graph may be a better approach than this one; that is the approach taken by the [kann](https://github.com/attractivechaos/kann) library
* Deallocating memory immediately is kind of annoying when you don't use `expr`. If you use `defer`, it won't be deallocated until the end of the block
* Performance is mediocre, there has been no tuning for performance beyond an option to use [Intel's MKL library](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html#gs.zou9ms). The option is `-Duse-mkl` when using `zig build`.
* CPU only for now
* Only tested on Windows
* Probably contains serious bugs
* This is mostly a proof-of-concept, and will likely not be maintained as a generally useful library.