https://github.com/lucidrains/nwt-pytorch

Implementation of NWT, audio-to-video generation, in Pytorch
artificial-intelligence audio deep-learning video-generation

Last synced: over 1 year ago

Host: GitHub
URL: https://github.com/lucidrains/nwt-pytorch
Owner: lucidrains
License: mit
Created: 2021-06-09T02:19:23.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2022-03-17T00:54:18.000Z (over 4 years ago)
Last Synced: 2025-03-28T01:49:39.502Z (over 1 year ago)
Topics: artificial-intelligence, audio, deep-learning, video-generation
Language: Python
Homepage:
Size: 10.7 KB
Stars: 90
Watchers: 13
Forks: 8
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


## NWT - Pytorch (wip)

Implementation of NWT, audio-to-video generation, in Pytorch.

Generated samples

## Install

```bash
$ pip install nwt-pytorch
```

## Usage

The paper proposes a new discrete latent representation named `Memcodes`, which can be succinctly described as a type of multi-head hard-attention to learned memory (codebook) keys / values. The authors claim that it needs fewer codes and a smaller codebook dimension to achieve better reconstructions.

```python
import torch
from nwt_pytorch import Memcodes

codebook = Memcodes(
    dim = 512,            # dimension of incoming features (codebook dimension will be dim / heads)
    heads = 8,            # number of heads, which is equivalent to the number of codebooks
    num_codes = 1024,     # number of codes per codebook
    temperature = 1.      # gumbel softmax temperature
)

x = torch.randn(1, 1024, 512)

out, codebook_indices = codebook(x) # (1, 1024, 512), (1, 1024, 8)
# (batch, seq, dimension), (batch, seq, heads)

# reconstruct output from codebook indices (codebook indices are autoregressed out from an attention net in paper)
assert torch.allclose(codebook.get_codes_from_indices(codebook_indices), out)
```
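
For intuition, the idea of multi-head hard-attention to codebook keys / values can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the code in `nwt_pytorch`: the function names `hard_attention_lookup` and `codes_from_indices` are hypothetical, the keys and values are random tensors rather than learned parameters, and a plain `argmax` stands in for whatever the library does during training (the `temperature` argument above suggests a gumbel softmax there).

```python
import torch

def codes_from_indices(indices, values):
    # hypothetical helper: gather one value vector per head and concatenate them
    # indices: (batch, seq, heads), values: (heads, num_codes, dim_head)
    heads, _, dim_head = values.shape
    head_idx = torch.arange(heads, device = indices.device)
    picked = values[head_idx, indices]                    # (batch, seq, heads, dim_head)
    return picked.reshape(*indices.shape[:2], heads * dim_head)

def hard_attention_lookup(x, keys, values):
    # hypothetical sketch: each head attends to its own codebook and keeps only the best match
    # x: (batch, seq, dim), keys / values: (heads, num_codes, dim_head)
    batch, seq, dim = x.shape
    heads, _, dim_head = keys.shape
    assert dim == heads * dim_head, 'dim must equal heads * dim_head'

    queries = x.reshape(batch, seq, heads, dim_head)      # split features into heads
    logits = torch.einsum('bnhd,hcd->bnhc', queries, keys)
    indices = logits.argmax(dim = -1)                     # hard attention: one code per head
    return codes_from_indices(indices, values), indices

torch.manual_seed(0)

heads, num_codes, dim = 8, 1024, 512
keys = torch.randn(heads, num_codes, dim // heads)        # random stand-ins for learned keys
values = torch.randn(heads, num_codes, dim // heads)      # random stand-ins for learned values

x = torch.randn(1, 1024, dim)
out, indices = hard_attention_lookup(x, keys, values)     # (1, 1024, 512), (1, 1024, 8)
```

With these settings each position is described by 8 indices into codebooks of 1024 entries, so the output can be rebuilt from the indices alone, which is the property the `assert` in the usage example checks.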

## Citations

```bibtex
@misc{mama2021nwt,
    title   = {NWT: Towards natural audio-to-video generation with representation learning},
    author  = {Rayhane Mama and Marc S. Tyndel and Hashiam Kadhim and Cole Clifford and Ragavan Thurairatnam},
    year    = {2021},
    eprint  = {2106.04283},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD}
}
```
