https://github.com/lucidrains/transframer-pytorch
Implementation of Transframer, Deepmind's U-net + Transformer architecture for up to 30 seconds video generation, in Pytorch
artificial-intelligence attention-mechanisms deep-learning transformers unet video-generation
- Host: GitHub
- URL: https://github.com/lucidrains/transframer-pytorch
- Owner: lucidrains
- License: mit
- Created: 2022-08-17T15:20:47.000Z
- Default Branch: main
- Last Pushed: 2022-08-23T20:33:15.000Z
- Last Synced: 2025-03-30T14:51:10.319Z
- Topics: artificial-intelligence, attention-mechanisms, deep-learning, transformers, unet, video-generation
- Language: Python
- Homepage:
- Size: 159 KB
- Stars: 70
- Watchers: 4
- Forks: 6
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README

## Transframer - Pytorch (wip)
Implementation of Transframer, DeepMind's U-net + Transformer architecture for video generation of up to 30 seconds, in PyTorch
The gist of the paper is the use of a U-net as a multi-frame encoder, along with a regular transformer decoder that cross-attends to the encoded frames and predicts the remaining frames. The lead author builds upon his prior work, in which images are encoded as sparse discrete cosine transform (DCT) sequences.
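To make the "sparse DCT sequence" idea concrete, here is a minimal sketch in plain Python. It is not the paper's or this repository's encoding: the 8x8 block size, the uniform quantization step, the `(row, col, value)` triple layout, and all function names are assumptions chosen for illustration. It only shows why quantized DCT coefficients of a smooth block collapse to a short sequence.

```python
import math

BLOCK = 8  # JPEG-style block size; an assumption for this sketch


def dct_1d(vec):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(vec)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(vec))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols is indexed [horizontal freq][vertical freq]; transpose back
    return [list(r) for r in zip(*cols)]


def sparse_dct_sequence(block, step=16.0):
    """Quantize the DCT coefficients of one block and keep only the
    non-zero ones as (row, col, quantized_value) triples.

    `step` is a hypothetical uniform quantization step, not a value
    taken from the paper.
    """
    coeffs = dct_2d(block)
    seq = []
    for u, row in enumerate(coeffs):
        for v, c in enumerate(row):
            q = round(c / step)
            if q != 0:
                seq.append((u, v, q))
    return seq


# A perfectly flat block has energy only in the DC coefficient
flat = [[128.0] * BLOCK for _ in range(BLOCK)]
flat_seq = sparse_dct_sequence(flat)  # [(0, 0, 64)]

# A horizontal ramp only excites horizontal frequencies (row index 0)
ramp = [[16.0 * x for x in range(BLOCK)] for _ in range(BLOCK)]
ramp_seq = sparse_dct_sequence(ramp)
```

A 64-pixel flat block becomes a single triple, and the ramp needs at most eight. A generative model over such sequences therefore has far fewer tokens to predict for smooth image regions than a model over raw pixels would.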
I will deviate from the implementation in this paper by using a hierarchical autoregressive transformer, and a regular ResNet block in place of the NF-net block (that design choice is just DeepMind reusing their own code, as NF-net was developed at DeepMind by Brock et al).
Update: On further meditation, there is nothing new in this paper except for generative modeling on DCT representations.
## Appreciation
- This work would not be possible without the generous sponsorship from Stability AI, as well as my other sponsors
## Todo
- [ ] figure out if DCT coefficients can be extracted directly from images in JPEG format
## Citations
```bibtex
@article{Nash2022TransframerAF,
    title   = {Transframer: Arbitrary Frame Prediction with Generative Models},
    author  = {Charlie Nash and Jo{\~a}o Carreira and Jacob Walker and Iain Barr and Andrew Jaegle and Mateusz Malinowski and Peter W. Battaglia},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2203.09494}
}
```