https://github.com/lucidrains/transframer-pytorch

Implementation of Transframer, Deepmind's U-net + Transformer architecture for up to 30 seconds video generation, in Pytorch

Host: GitHub
URL: https://github.com/lucidrains/transframer-pytorch
Owner: lucidrains
License: mit
Created: 2022-08-17T15:20:47.000Z
Default Branch: main
Last Pushed: 2022-08-23T20:33:15.000Z
Last Synced: 2025-03-30T14:51:10.319Z
Topics: artificial-intelligence, attention-mechanisms, deep-learning, transformers, unet, video-generation
Language: Python
Homepage:
Size: 159 KB
Stars: 70
Watchers: 4
Forks: 6
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

## Transframer - Pytorch (wip)

Implementation of Transframer, DeepMind's U-Net + Transformer architecture for video generation of up to 30 seconds, in PyTorch

The gist of the paper is the use of a U-Net as a multi-frame encoder, paired with a regular transformer decoder that cross-attends to the encoded frames and predicts the rest of the frames. The lead author builds on his prior work, in which images are encoded as sparse discrete cosine transform (DCT) sequences.
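To make the "sparse DCT sequence" idea concrete, here is a minimal NumPy sketch. It is not code from this repository or the paper. It assumes a grayscale image, 8x8 blocks, a single uniform quantization step, and raster (not zigzag) ordering of coefficients. The names `dct_matrix` and `image_to_sparse_dct` are hypothetical. Only the nonzero quantized coefficients are kept, each as a `(channel, position, value)` triple.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    x = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)       # rescale the DC row so the matrix is orthonormal
    return m

def image_to_sparse_dct(img: np.ndarray, block: int = 8, step: float = 16.0):
    """Encode a (H, W) grayscale image as a sparse list of DCT triples.

    Assumes H and W are divisible by `block`. Returns a list of
    (channel, position, value) triples, where `channel` indexes the
    coefficient inside a block (raster order, an assumption of this sketch),
    `position` indexes the block, and `value` is the quantized coefficient.
    """
    h, w = img.shape
    d = dct_matrix(block)
    # (H, W) -> (H/block, W/block, block, block): one small tile per block
    blocks = img.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T                  # 2D DCT applied to every block
    q = np.round(coeffs / step).astype(int)    # uniform quantization
    q = q.reshape(-1, block * block)           # (num_blocks, block * block)
    positions, channels = np.nonzero(q)        # keep only nonzero entries
    return [(int(c), int(p), int(q[p, c])) for p, c in zip(positions, channels)]

# A flat 16x16 image has only a DC coefficient in each of its four blocks.
sequence = image_to_sparse_dct(np.full((16, 16), 100.0))
```

On smooth images most quantized coefficients are zero, so the resulting sequence is much shorter than the pixel grid, which is what makes autoregressive modeling over it tractable.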

I will deviate from the implementation in this paper, using a hierarchical autoregressive transformer and a regular ResNet block in place of the NF-Net block (that design choice is just DeepMind reusing their own code, as NF-Net was developed at DeepMind by Brock et al.).
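For reference, a "regular ResNet block" in a U-Net style encoder could look roughly like the sketch below. It is an illustration under assumptions, not this repository's actual module: the class name `ResnetBlock`, the use of `GroupNorm` with `SiLU`, and the default of 8 groups are all choices made for the example.

```python
import torch
from torch import nn

class ResnetBlock(nn.Module):
    """Two conv -> norm -> activation stages with a residual connection.

    Hypothetical sketch: layer choices (GroupNorm, SiLU) are assumptions.
    `dim_out` must be divisible by `groups`.
    """

    def __init__(self, dim_in: int, dim_out: int, groups: int = 8):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, kernel_size=3, padding=1),
            nn.GroupNorm(groups, dim_out),
            nn.SiLU(),
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(dim_out, dim_out, kernel_size=3, padding=1),
            nn.GroupNorm(groups, dim_out),
            nn.SiLU(),
        )
        # 1x1 conv matches channel counts so the skip path can be added
        self.residual = (
            nn.Conv2d(dim_in, dim_out, kernel_size=1)
            if dim_in != dim_out
            else nn.Identity()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.block1(x)
        h = self.block2(h)
        return h + self.residual(x)

# Spatial size is preserved; only the channel count changes.
features = ResnetBlock(16, 32)(torch.randn(2, 16, 32, 32))
```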

Update: On further meditation, there is nothing new in this paper except for generative modeling on DCT representations.

## Appreciation

- This work would not be possible without the generous sponsorship from Stability AI, as well as my other sponsors

## Todo

- [ ] figure out if DCT coefficients can be directly extracted from images in JPEG format

## Citations

```bibtex
@article{Nash2022TransframerAF,
    title   = {Transframer: Arbitrary Frame Prediction with Generative Models},
    author  = {Charlie Nash and Jo{\~a}o Carreira and Jacob Walker and Iain Barr and Andrew Jaegle and Mateusz Malinowski and Peter W. Battaglia},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2203.09494}
}
```
