https://github.com/oelin/hourglass-vq-vae
An Hourglass Transformer VQ-VAE architecture.
- Host: GitHub
- URL: https://github.com/oelin/hourglass-vq-vae
- Owner: oelin
- License: MIT
- Created: 2024-02-15T14:46:19.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-15T16:04:58.000Z (over 1 year ago)
- Last Synced: 2024-02-16T15:56:37.007Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 46.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Hourglass VQ-VAE
An Hourglass Transformer VQ-VAE architecture.
## Goal
As part of the LatentLM project, a first-stage model capable of compressing very long sequences is necessary. We achieve this by combining the [Hourglass Transformer](https://arxiv.org/abs/2110.13711) with [FSQ](https://arxiv.org/abs/2309.15505) and [Contrastive Weight Tying](https://arxiv.org/abs/2309.08351) to construct an attention-only VQ-VAE architecture.
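For context, FSQ replaces the VQ-VAE's learned codebook lookup with per-dimension rounding onto a fixed grid, which sidesteps codebook collapse and auxiliary commitment losses. Below is a minimal PyTorch sketch of such a quantizer, not code from this repository: the `levels` tuple is illustrative, and it assumes odd level counts (the FSQ paper also supports even counts via a half-step offset).

```python
import torch
import torch.nn as nn

class FSQ(nn.Module):
    """Minimal Finite Scalar Quantization sketch (after Mentzer et al., 2023).

    Each latent dimension is squashed with tanh onto a bounded interval and
    rounded to one of `levels[i]` integer values. A straight-through
    estimator passes gradients through the rounding, so no codebook,
    commitment loss, or EMA updates are needed.
    """

    def __init__(self, levels=(7, 5, 5, 5)):  # illustrative; odd counts only
        super().__init__()
        self.register_buffer("levels", torch.tensor(levels, dtype=torch.float32))

    def forward(self, z):
        # z: (..., len(levels)). Bound each dim to [-half, half], then round.
        half = (self.levels - 1) / 2           # e.g. 5 levels -> {-2, ..., 2}
        bounded = torch.tanh(z) * half
        rounded = torch.round(bounded)
        # Straight-through: rounded values forward, tanh gradients backward.
        return bounded + (rounded - bounded).detach()
```

The discrete code for a token is then just the tuple of rounded integers, giving an implicit codebook of `prod(levels)` entries (7 · 5 · 5 · 5 = 875 in this sketch).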
## TODO
- [x] Linear attention (see the sketch after this list).
- [ ] GQA.
- [ ] FlashAttention2 with sliding window to replace linear attention.
- [ ] Attention upsampling to replace linear upsampling.
- [ ] (Optional) experiment with adversarial losses (Hourglass VQ-GAN).
- [ ] Hyperparameter tuning.
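Since linear attention is already implemented (first item above), here is a hedged sketch of the non-causal kernelized form from Katharopoulos et al. (2020), using the `elu(x) + 1` feature map; the repository's actual variant and tensor layout may differ.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention in O(n * d^2) time and O(d^2) memory.

    q, k, v: (batch, heads, seq, dim). The elu(x) + 1 feature map keeps
    similarities positive, replacing the softmax of standard attention.
    """
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bhnd,bhne->bhde", k, v)                        # global K^T V summary
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)  # per-query normalizer
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

# Example: out = linear_attention(*[torch.randn(1, 8, 4096, 64)] * 3)
```

Because the `kv` summary is a single `dim × dim` matrix per head, cost grows linearly with sequence length rather than quadratically, which is what makes very long input sequences tractable before the hourglass bottleneck.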