
https://github.com/lucidrains/transframer-pytorch


## Transframer - Pytorch (wip)

Implementation of Transframer, DeepMind's U-Net + Transformer architecture for generating up to 30 seconds of video, in PyTorch

The gist of the paper is the use of a U-Net as a multi-frame encoder, together with a regular transformer decoder that cross-attends to the encoded frames and predicts the remaining frames. The first author builds on his prior work, in which images are encoded as sparse discrete cosine transform (DCT) sequences.
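To make the encoder/decoder split concrete, here is a minimal single-head cross-attention sketch in NumPy (not this repository's code): decoder tokens act as queries, and the flattened U-Net feature map of the conditioning frames supplies the keys and values. The shapes, the random weights and the name `cross_attention` are illustrative assumptions; a real decoder would also use causal self-attention, multiple heads and learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_dec, d_dec) decoder token embeddings
    context: (n_enc, d_enc) encoder features, e.g. a flattened U-Net feature map
    """
    q = queries @ w_q                         # (n_dec, d_head)
    k = context @ w_k                         # (n_enc, d_head)
    v = context @ w_v                         # (n_enc, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_dec, n_enc)
    weights = softmax(scores, axis=-1)        # each decoder token's distribution over encoder positions
    return weights @ v, weights

# Hypothetical shapes: a 64-channel, 8x8 feature map from the U-Net encoder.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((64, 8, 8))   # (channels, height, width)
context = feature_map.reshape(64, -1).T         # (64 spatial positions, 64 channels)
queries = rng.standard_normal((10, 32))         # 10 decoder tokens of width 32
w_q = rng.standard_normal((32, 16)) / np.sqrt(32)
w_k = rng.standard_normal((64, 16)) / np.sqrt(64)
w_v = rng.standard_normal((64, 16)) / np.sqrt(64)

out, weights = cross_attention(queries, context, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (10, 16) (10, 64)
```

Every decoder token can attend to every spatial position of the encoded frames, which is how the decoder conditions its predictions on the input frames.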

I will deviate from the paper's implementation by using a hierarchical autoregressive transformer, and a plain ResNet block in place of the NF-Net block (that design choice is just DeepMind reusing its own code, as NF-Nets were developed at DeepMind by Brock et al.).
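For reference, a "regular" residual block of the kind meant here can be sketched as below, in NumPy rather than PyTorch. The GroupNorm, SiLU, 3x3 convolution ordering, the group count, and the names `conv3x3`, `group_norm` and `resnet_block` are assumptions for illustration, not taken from this repository. Unlike NF-Net blocks, which are normalizer-free, this block relies on group normalization.

```python
import numpy as np

def conv3x3(x, w, b):
    """'Same'-padded 3x3 convolution (cross-correlation, as in PyTorch's conv2d).

    x: (c_in, h, w), w: (c_out, c_in, 3, 3), b: (c_out,)
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            # Accumulate the contribution of each kernel tap over all input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out + b[:, None, None]

def group_norm(x, groups, eps=1e-5):
    """Normalize each group of channels to zero mean, unit variance (no learned affine)."""
    c, h, w = x.shape
    g = x.reshape(groups, c // groups, h, w)
    mean = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(c, h, w)

def silu(x):
    return x / (1.0 + np.exp(-x))

def resnet_block(x, params, groups=4):
    """Two norm -> activation -> conv stages plus an identity skip connection."""
    h = conv3x3(silu(group_norm(x, groups)), params["w1"], params["b1"])
    h = conv3x3(silu(group_norm(h, groups)), params["w2"], params["b2"])
    return x + h

rng = np.random.default_rng(0)
channels = 8
x = rng.standard_normal((channels, 16, 16))
params = {
    "w1": rng.standard_normal((channels, channels, 3, 3)) * 0.1,
    "b1": np.zeros(channels),
    "w2": np.zeros((channels, channels, 3, 3)),  # zero-initialized last conv
    "b2": np.zeros(channels),
}
y = resnet_block(x, params)
print(y.shape, np.allclose(y, x))  # (8, 16, 16) True
```

Zero-initializing the last convolution makes the block start out as an identity map; this is a common trick rather than a requirement.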

Update: on further meditation, there is nothing new in this paper except for generative modeling on DCT representations.
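Since the DCT representation is the part of the paper singled out here, the sketch below shows the basic idea on a grayscale image: split it into 8x8 blocks, apply an orthonormal 2D DCT-II, quantize uniformly, and keep only the non-zero coefficients as `(block_row, block_col, coefficient_index, value)` tuples that an autoregressive model could be trained to predict. The uniform step size, the row-major coefficient ordering and the function names are assumptions; the paper's actual tokenization may differ (for example in color handling, quantization and token ordering).

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: coeffs = D @ block @ D.T and block = D.T @ coeffs @ D."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def to_sparse_dct(image, block=8, step=10.0):
    """Blockwise DCT + uniform quantization; returns only the non-zero coefficients."""
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "image size must be a multiple of the block size"
    d = dct_matrix(block)
    tokens = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            coeffs = d @ image[by:by + block, bx:bx + block] @ d.T
            q = np.round(coeffs / step).astype(int)
            for u, v in zip(*np.nonzero(q)):
                tokens.append((by // block, bx // block, int(u) * block + int(v), int(q[u, v])))
    return tokens

def from_sparse_dct(tokens, shape, block=8, step=10.0):
    """Scatter the tuples back into coefficient blocks, dequantize and invert the DCT."""
    h, w = shape
    d = dct_matrix(block)
    coeffs = np.zeros((h // block, w // block, block, block))
    for r, c, idx, val in tokens:
        coeffs[r, c, idx // block, idx % block] = val * step
    out = np.zeros(shape)
    for r in range(h // block):
        for c in range(w // block):
            out[r * block:(r + 1) * block, c * block:(c + 1) * block] = d.T @ coeffs[r, c] @ d
    return out

# A smooth 16x16 ramp: most quantized coefficients round to zero.
image = np.add.outer(np.arange(16), np.arange(16)) * 8.0
tokens = to_sparse_dct(image)
recon = from_sparse_dct(tokens, image.shape)
print(len(tokens), "non-zero tokens out of", image.size)
print("max abs error:", np.abs(recon - image).max())
```

The number of tuples roughly tracks image detail rather than pixel count, which is what keeps the sequence short enough for a transformer to model.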

## Appreciation

- This work would not be possible without the generous sponsorship from Stability AI, as well as my other sponsors

## Todo

- [ ] figure out if DCT coefficients can be extracted directly from JPEG images

## Citations

```bibtex
@article{Nash2022TransframerAF,
title = {Transframer: Arbitrary Frame Prediction with Generative Models},
author = {Charlie Nash and Jo{\~a}o Carreira and Jacob Walker and Iain Barr and Andrew Jaegle and Mateusz Malinowski and Peter W. Battaglia},
journal = {ArXiv},
year = {2022},
volume = {abs/2203.09494}
}
```