This draft implementation should now be considered complete. If you are interested in using named tensors, check out the core PyTorch implementation:

https://pytorch.org/docs/stable/named_tensor.html

Thanks to everyone who contributed to this version.

# Named Tensor for Torch

[![Build Status](https://travis-ci.org/harvardnlp/namedtensor.svg?branch=master)](https://travis-ci.org/harvardnlp/namedtensor) [![Coverage Status](https://coveralls.io/repos/github/harvardnlp/namedtensor/badge.svg?branch=master)](https://coveralls.io/github/harvardnlp/namedtensor?branch=master)

## Introduction

A proposal for a named tensor for Torch is described here:

http://nlp.seas.harvard.edu/NamedTensor

NamedTensor is a thin wrapper around the Torch tensor that makes three changes to the API:

1) **Naming**: Dimension access and reduction use a named `dim` argument instead of an index. Constructing and adding dimensions use a `name` argument. Axis-based indexing `[ ]` is replaced by named indexing.
2) **Broadcasting**: All functions broadcast based on set operations, not heuristic ordering rules; e.g. if `z = x + y`, then `z` has the union of the dimensions in `x` and `y`.
3) **Lifting**: Order-based functions can be lifted by providing name annotations through `.spec` methods. For instance, convolution requires the user to name the channel and kernel dims, e.g. `conv2d.spec("channel", ("x", "y"))`. This provides dynamic checks, better error messages, and consistent documentation.

## Setup

```
pip install git+https://github.com/harvardnlp/namedtensor
```

## Usage

```python
from namedtensor import ntorch
```

### Building tensors

All PyTorch builders take an extra keyword argument, `names`.

```python
x = ntorch.randn(10, 10, 20, names=("batch", "h", "w"))
x = ntorch.ones(10, 10, 20, names=("batch", "h", "w"))
```

### Standard functions

All functions that preserve dimensionality work in the same way.

```python
x = x.log()
x = x.float()
x = ntorch.exp(x)
```

### Named Indexing

Indexing and masking operations work by name rather than by absolute position.

```python
first_example = x[{"batch": 0}]             # select one example by name
three_examples = x[{"batch": slice(1, 4)}]  # named slicing
masked = x[x > 0.5]                         # boolean masking as usual
```

Advanced indexing by named tensors is also supported.

```python
select = ntorch.tensor([1, 4, 5], names=("rows",))
y = x[{"h": select}]
# y shape ("batch", "rows", "w")
```

### No view or unsqueeze

View, transpose, and friends are deprecated in favor of named
access and movement.

```python
x = x.stack(("w", "h"), "stackdim")  # merge "w" and "h" into one dim

# Roundtrip
x = x.split("stackdim", ("w", "h"), w=20)  # split back, giving the size of "w"
```

There is never a need for `unsqueeze`, since broadcasting is done by name overlap.
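
For example, adding two tensors that share only the `"batch"` name broadcasts over the union of their dimensions (a minimal sketch; the dimension names are illustrative):

```python
a = ntorch.randn(10, 10, names=("batch", "h"))
b = ntorch.randn(10, 20, names=("batch", "w"))
c = a + b  # c has the union of names: ("batch", "h", "w")
# plain torch would need a.unsqueeze(2) + b.unsqueeze(1) here
```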

Similar notation can be used for setting values.
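
A sketch of value assignment, assuming `__setitem__` mirrors the dict-style indexing shown above:

```python
x[{"batch": 0}] = 0.0  # overwrite one example by name
x[x > 0.5] = 0.5       # masked assignment, as in plain torch
```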

### All methods take named args

Any function with a `dim` argument can now be called with the dimension name.

```python
x = x.narrow("w", 0, 10)
x = x.softmax("w")
```

This is true of reduction functions as well, where the named
dimension is eliminated.

```python
x = x.mean("w")
x, argmax = x.max("w")
```

### Tensor contractions

Matrix operations also take dimension names as arguments, so
contractions that would otherwise require `einsum` can rely on persistent names.

```python
x = ntorch.randn(10, 10, 20, names=("batch", "h", "w"))
y = ntorch.randn(10, 20, 30, names=("batch", "w", "c"))
x.dot("w", y)  # contract over "w"; result has dims ("batch", "h", "c")
```
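
For comparison, the same contraction on ordinary unnamed tensors requires positional `einsum` subscripts (`x_plain` and `y_plain` here are plain `torch` tensors of the same shapes):

```python
import torch

x_plain = torch.randn(10, 10, 20)  # (batch, h, w)
y_plain = torch.randn(10, 20, 30)  # (batch, w, c)
z = torch.einsum("bhw,bwc->bhc", x_plain, y_plain)  # contract over w
```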

This also makes indexing much easier to read.

```python
x = ntorch.ones(10, 10, 20, names=("batch", "time", "vocab")).long()  # indices must be integral
y = ntorch.randn(20, 30, names=("vocab", "embsize"))
y.index_select("vocab", x)  # gather rows of y along "vocab" using x as indices
```

### Removed Functions

The following Torch functions are removed in favor of named equivalents:

* view, expand, squeeze, unsqueeze, transpose
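
Their roles are covered by the named API above. For instance, a `view`-style flatten becomes an explicit named `stack` (a sketch reusing the stack/split methods shown earlier):

```python
x = ntorch.randn(10, 10, 20, names=("batch", "h", "w"))
flat = x.stack(("h", "w"), "hw")  # dims ("batch", "hw"); replaces x.view(10, -1)
```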

## NN Modules

NN units no longer take ordered tensors. They now have a required additional method `spec`
that lets the user set the input and output dimensions of the object.

Examples

```python
# spec names the input-channel dim, the dim convolved over, and the output-channel dim
conv = ntorch.nn.Conv1d(5, 10, 2).spec("input", "time", "output")
n = ntorch.randn(20, 30, 5, names=("batch", "time", "input"))
out = conv(n)  # out has dims ("batch", "time", "output")
```

```python
drop = ntorch.nn.Dropout()
n = ntorch.randn(4, 20, names=("batch", "target"))
out = drop(n)
```

```python
loss = ntorch.nn.NLLLoss().spec("target")  # "target" names the class-score dim of the prediction
predict = ntorch.randn(20, 4, names=("target", "batch"))
target = ntorch.tensor([2, 2, 3, 4], names=("batch",))
out = loss(predict, target)
```

## Distributions

```python
# Univariate
mu = ntorch.randn(10, names=("batch",))
sigma = ntorch.randn(10, names=("batch",)).exp()  # scale must be positive
dist = ntorch.distributions.Normal(mu, sigma)
sample = dist.sample((30, 40), names=("sample1", "sample2"))

# Discrete
params = ntorch.randn(10, 20, 30, names=("batch1", "batch2", "logits"))
dist = ntorch.distributions.Categorical(logits=params, logit_dim="logits")
```

## Documentation

http://nlp.seas.harvard.edu/namedtensor/

## Author

* Alexander Rush ([email protected], @harvardnlp)

## Contributors

(NamedTensor is being collectively developed by Harvard CS 287)

* Yuntian Deng
* Justin Chiu
* Francisco Rivera
* Jiafeng Chen
* Celine Liang
* Miro Furtado
* Roshan Padaki
* Mirac Suzgun
* Belén Saldías
* Jason Ren
* Josh Feldman
* Jambay Kinley
* Ian Kivlichan
* Sanyuan Chen
* Simon Shen