Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lucidrains/pause-transformer
Yet another random morning idea to be quickly tried, with the architecture shared if it works: letting the transformer pause for any amount of time on any token
Last synced: 19 days ago
- Host: GitHub
- URL: https://github.com/lucidrains/pause-transformer
- Owner: lucidrains
- License: mit
- Created: 2023-10-18T16:14:12.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-22T16:41:27.000Z (about 1 year ago)
- Last Synced: 2024-05-02T01:14:22.661Z (7 months ago)
- Topics: adaptive-computation, artificial-intelligence, attention-mechanisms, deep-learning, transformers
- Language: Python
- Homepage:
- Size: 659 KB
- Stars: 42
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Pause Transformer (wip)
Yet another random morning idea to be quickly tried, with the architecture shared if it works: letting the transformer pause for any amount of time on any token.
Again, the idea relies on axial attention: one axis attends along the sequence length, as in the usual transformer, while the other attends along a `thinking` or `pause` dimension.
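
The repository's actual implementation likely differs, but here is a minimal sketch of that two-axis idea in plain PyTorch. `PauseAxialBlock`, `num_pause`, and the fixed per-token pause-slot count are all illustrative assumptions, not the repo's API: each token carries a column of learned pause slots, one attention pass runs along the pause axis per position, and another runs causally along the sequence per slot.

```python
# Hypothetical sketch of sequence/pause axial attention (not the repo's code).
import torch
from torch import nn

class PauseAxialBlock(nn.Module):
    def __init__(self, dim, heads=8, num_pause=4):
        super().__init__()
        self.num_pause = num_pause
        # learned pause ("thinking") embeddings appended to every token
        self.pause_tokens = nn.Parameter(torch.randn(num_pause, dim))
        self.pause_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.seq_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):  # x: (batch, seq, dim)
        b, n, d = x.shape
        p = self.num_pause + 1

        # give every token its own column of pause slots: (b, n, p, d)
        pauses = self.pause_tokens.expand(b, n, -1, -1)
        x = torch.cat((x.unsqueeze(2), pauses), dim=2)

        # axis 1: attend along the pause dimension, independently per position
        t = x.reshape(b * n, p, d)
        h = self.norm1(t)
        t = t + self.pause_attn(h, h, h, need_weights=False)[0]

        # axis 2: causal attention along the sequence, per pause slot
        s = t.reshape(b, n, p, d).transpose(1, 2).reshape(b * p, n, d)
        h = self.norm2(s)
        causal = torch.ones(n, n, dtype=torch.bool, device=x.device).triu(1)
        s = s + self.seq_attn(h, h, h, attn_mask=causal, need_weights=False)[0]

        # read predictions off the original token slot
        return s.reshape(b, p, n, d).transpose(1, 2)[:, :, 0]
```

Under these assumptions, `PauseAxialBlock(dim=512)(torch.randn(2, 128, 512))` returns a `(2, 128, 512)` tensor; the pause axis adds compute per token without lengthening the causal sequence axis.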
## Todo
- [x] allow for custom pause distributions across tokens
- [x] see if one can do a two pass, using the logit entropy as a way to decide how to shape the pause mask (see the sketch after this list)
- [ ] run experiments on enwik8; if nothing promising is seen, move onwards to something harder, say arithmetic
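
The second checked item gestures at a two-pass scheme: run the model once, measure per-token entropy of the output logits, and grant high-entropy (uncertain) tokens more pause slots on the second pass. Below is a hedged sketch of how such a mask might be shaped; `entropy_pause_mask` and the linear entropy-to-budget rule are assumptions for illustration, not the repo's code.

```python
# Hypothetical entropy-based pause budgeting (not the repo's code).
import torch

def entropy_pause_mask(logits, max_pause=4):
    # logits: (batch, seq, vocab) from a first forward pass
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp(min=1e-9).log()).sum(dim=-1)  # (b, n)
    # normalize by the maximum possible entropy, log(vocab)
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    frac = (entropy / max_entropy).clamp(0.0, 1.0)
    # uncertain tokens earn more pause slots
    num_pauses = (frac * max_pause).round().long()                # (b, n)
    slots = torch.arange(max_pause, device=logits.device)
    return slots < num_pauses.unsqueeze(-1)                       # (b, n, max_pause)
```

A second pass would then attend over only the enabled pause slots per token, so the computation spent "thinking" scales with how unsure the first pass was.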
## Citations
```bibtex
@inproceedings{Goyal2023ThinkBY,
title = {Think before you speak: Training Language Models With Pause Tokens},
author = {Sachin Goyal and Ziwei Ji and Ankit Singh Rawat and Aditya Krishna Menon and Sanjiv Kumar and Vaishnavh Nagarajan},
year = {2023},
url = {https://api.semanticscholar.org/CorpusID:263608983}
}
```