Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/HazyResearch/safari
Convolutions for Sequence Modeling
https://github.com/HazyResearch/safari
Last synced: 3 months ago
JSON representation
Convolutions for Sequence Modeling
- Host: GitHub
- URL: https://github.com/HazyResearch/safari
- Owner: HazyResearch
- License: apache-2.0
- Created: 2023-02-14T08:15:25.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-13T21:57:14.000Z (8 months ago)
- Last Synced: 2024-08-01T04:02:19.245Z (6 months ago)
- Language: Assembly
- Size: 7.45 MB
- Stars: 856
- Watchers: 35
- Forks: 70
- Open Issues: 22
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-state-space-models - GitHub
README
# Convolutions for Sequence Modeling
This repository provides implementations and experiments for the following papers, as well as simplified presentations of earlier work such as [S4](https://github.com/HazyResearch/state-spaces).
Please see these instructions for how to download weights and run our pretrained models:
* [H3](https://github.com/HazyResearch/H3/tree/main/examples) (125m-2.7B)
* [Hyena](https://github.com/HazyResearch/safari/blob/main/experiments.md#downstream-evaluations) (small, 150M)## Hyena
**Hyena Hierarchy: Towards Larger Convolutional Language models**
Michael Poli\*, Stefano Massaroli\*, Eric Nguyen\*, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré \
ICML 2023. **Oral**.\
[Paper](https://arxiv.org/abs/2302.10866)
![Hyena](assets/hyena.png "Hyena Hierarchy")## Long Convs
**Simple Hardware-Efficient Long Convolutions for Sequence Modeling**\
Daniel Y. Fu*, Elliot L. Epstein*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré\
ICML 2023. \
[Paper](https://arxiv.org/abs/2302.06646)
![LongConvs](assets/long_convs.png "Long Convolutions for Sequence Modeling")## Hungry Hungry Hippos (H3)
**Hungry Hungry Hippos: Towards Language Modeling with State Space Models**
Daniel Y. Fu\*, Tri Dao\*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
ICLR 2023. **Notable top-25% (spotlight).**
[Paper](https://arxiv.org/abs/2212.14052)
![H3](assets/h3.png "Hungry Hungry Hippos")### Roadmap
- ~~Include H3, LLM training, and synthetics in this repository~~
- ~~Move in fast convolution code~~
- ~~Add Hyena implementation and experiments~~
- pip package### Changelog
See [CHANGELOG.md](CHANGELOG.md)## Setup
### Requirements
This repository requires Python 3.8+ and Pytorch 1.10+.
Other packages are listed in [requirements.txt](./requirements.txt).## Getting Started
The easiest way to get started is to run the [`standalone_cifar.py`](./standalone_cifar.py) script.
This scripts trains a simple long convolution model on CIFAR-10:
```
python -m standalone_cifar
```See the [experiments](./experiments.md) page for more:
* LRA experiments from the Long Convs paper
* H3 experiments (language model, synthetics)
* H3 + Long Conv experiments
* Hyena language and vision experiments## Resources
We're happy to share independent reimplementations and explainer posts about methods presented in this repository.
#### Hyena:
* [irhum's JAX reimplementation and comparison with nanoGPT](https://github.com/irhum/hyena)
* [exps's reimplementation](https://github.com/expz/annotated-hyena) and [explainer post](https://medium.com/@jskowera/the-annotated-hyena-3e50e0aa372a)## Citation
If you use this codebase, or otherwise found our work valuable, you can cite us as follows:
```
@article{poli2023hyena,
title={Hyena Hierarchy: Towards Larger Convolutional Language Models},
author={Poli, Michael and Massaroli, Stefano and Nguyen, Eric and Fu, Daniel Y and Dao, Tri and Baccus, Stephen and Bengio, Yoshua and Ermon, Stefano and R{\'e}, Christopher},
journal={arXiv preprint arXiv:2302.10866},
year={2023}
}@article{fu2023simple,
title={Simple Hardware-Efficient Long Convolutions for Sequence Modeling},
author={Fu, Daniel Y. and Epstein, Elliot L. and Nguyen, Eric and Thomas, Armin W. and Zhang, Michael and Dao, Tri and Rudra, Atri and R{\'e}, Christopher},
journal={International Conference on Machine Learning},
year={2023}
}@inproceedings{fu2023hungry,
title={Hungry {H}ungry {H}ippos: Towards Language Modeling with State Space Models},
author={Fu, Daniel Y. and Dao, Tri and Saab, Khaled K. and Thomas, Armin W.
and Rudra, Atri and R{\'e}, Christopher},
booktitle={International Conference on Learning Representations},
year={2023}
}
```## Acknowledgements
This repo was forked from Albert Gu's [state spaces](https://github.com/HazyResearch/state-spaces) repo and borrows its structure.
It also contains code from the [FlashAttention](https://github.com/HazyResearch/flash-attention) training scripts.