A PyTorch implementation of Perceiver, Perceiver IO and Perceiver AR with PyTorch Lightning scripts for distributed training
- Host: GitHub
- URL: https://github.com/krasserm/perceiver-io
- Owner: krasserm
- License: apache-2.0
- Created: 2021-10-04T05:42:06.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-02T14:51:28.000Z (about 1 year ago)
- Last Synced: 2024-10-30T00:33:40.705Z (3 months ago)
- Topics: deep-learning, machine-learning, perceiver, perceiver-ar, perceiver-io, pytorch, pytorch-lightning
- Language: Python
- Size: 22.2 MB
- Stars: 434
- Watchers: 10
- Forks: 40
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
  - Citation: CITATION.cff
# Perceiver, Perceiver IO and Perceiver AR
This repository is a PyTorch implementation of Perceiver, Perceiver IO and Perceiver AR, with PyTorch Lightning
interfaces for model training and Hugging Face 🤗 interfaces for inference.
- Perceiver: General Perception with Iterative Attention (paper, video)
- Perceiver IO: A General Architecture for Structured Inputs & Outputs (paper, blog post)
- General-purpose, long-context autoregressive modeling with Perceiver AR (paper, blog post)
## Overview
At the core of the `perceiver-io` library are *backend models*: lightweight PyTorch implementations of Perceiver,
Perceiver IO and Perceiver AR. They can be wrapped into [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/)
modules for training (*Lightning interface*) and 🤗 modules for inference (*Hugging Face interface*). See
[library design](docs/library-design.md) for details.
The command line interface for training is implemented with [Lightning CLI](https://pytorch-lightning.readthedocs.io/en/stable/cli/lightning_cli.html).
Training datasets are 🤗 [datasets](https://huggingface.co/docs/datasets) wrapped into PyTorch Lightning data modules.
For NLP tasks, `perceiver-io` supports all 🤗 [fast tokenizers](https://huggingface.co/docs/transformers/fast_tokenizers)
and the 🤗 Perceiver UTF-8 bytes tokenizer.
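The UTF-8 bytes tokenizer can be loaded like any other 🤗 tokenizer. A minimal sketch, assuming the `deepmind/language-perceiver` checkpoint on the Hugging Face Hub (which ships the Perceiver byte-level tokenizer); it is not part of this repository:

```python
from transformers import AutoTokenizer

# Byte-level tokenizer: operates on raw UTF-8 bytes instead of subwords
tokenizer = AutoTokenizer.from_pretrained("deepmind/language-perceiver")

ids = tokenizer("Hello, Perceiver!").input_ids
print(ids)  # byte-level token ids (plus special tokens)
print(tokenizer.decode(ids, skip_special_tokens=True))  # round-trips to the input text
```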
## Documentation

- [Installation](#installation)
- [Getting started](#getting-started)
- [Library design](docs/library-design.md)
- [Pretrained models](docs/pretrained-models.md)
- [Training examples](docs/training-examples.md)
- [Inference examples](examples/inference.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/perceiver-io/blob/main/examples/inference.ipynb)
- [Model construction](docs/model-construction.md)
- [Building blocks](docs/building-blocks.md)

## Installation
### Via pip
```shell
pip install perceiver-io[text,vision,audio]
```
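A quick sanity check that the installation succeeded (a sketch; it assumes the `perceiver-io` distribution name used in the pip command above and the `perceiver` import package used throughout the examples below):

```python
from importlib.metadata import version

import perceiver  # top-level package installed by perceiver-io

print(version("perceiver-io"))  # prints the installed library version
```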
### From sources

Installation from sources requires a [Miniconda](https://docs.conda.io/en/latest/miniconda.html) and a
[Poetry](https://python-poetry.org/docs/#installation) (1.2.0 or higher) installation.

Create and activate the `perceiver-io` conda environment:
```shell
conda env create -f environment.yml
conda activate perceiver-io
```

Install main and test dependencies, including all extras:
```shell
# Without dependencies required for examples
poetry install --all-extras
```

If you want to run the [examples](examples) locally, additionally use `--with examples`:
```shell
poetry install --all-extras --with examples
```

### Docker image
```shell
docker pull ghcr.io/krasserm/perceiver-io:latest
```

See [Docker image](docs/docker-image.md) for details.
## Getting started
### Inference
#### Optical flow
Compute the optical flow between consecutive frames of an input video and write the rendered results to an output
video:

```python
from urllib.request import urlretrieve

from transformers import pipeline

from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register auto-classes and pipeline

urlretrieve(
    url="https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight.mp4",
    filename="sintel_clip_cave_dragon_fight.mp4",
)

# Create optical flow pipeline
optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# load consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
optical_flows = optical_flow_pipeline(frame_pairs, render=True, device="cuda:0")

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```

A side-by-side comparison of the input and output video is shown in the [repository README](https://github.com/krasserm/perceiver-io).
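If you want to inspect the pipeline output programmatically rather than render a video, a small sketch follows. It assumes the rendered flows are returned as per-pair RGB frame arrays, which is implied by writing them directly to a video above; this is an illustration, not part of the library's documented API:

```python
import numpy as np

# Reuses `optical_flows` from the snippet above
first_flow = np.asarray(optical_flows[0])
print(f"number of rendered flows: {len(optical_flows)}")               # one per frame pair
print(f"frame shape: {first_flow.shape}, dtype: {first_flow.dtype}")   # expected (H, W, 3)
```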
#### Symbolic audio generation
Create audio sequences by generating symbolic ([MIDI](https://en.wikipedia.org/wiki/MIDI)) audio data and converting the
generated audio symbols into WAV output using [fluidsynth](https://www.fluidsynth.org/). (_Note:_ fluidsynth must be
installed in order for the following example to work.)
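The example also expects a MIDI prompt file named `prompt.mid`. A minimal sketch for creating one with [pretty_midi](https://github.com/craffel/pretty-midi); the note content is arbitrary and purely illustrative:

```python
import pretty_midi

# Build a short, single-instrument MIDI prompt (an ascending C-major arpeggio)
prompt = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # program 0 = Acoustic Grand Piano

for i, pitch in enumerate([60, 64, 67, 72]):  # C4, E4, G4, C5
    start = i * 0.5
    piano.notes.append(pretty_midi.Note(velocity=100, pitch=pitch, start=start, end=start + 0.5))

prompt.instruments.append(piano)
prompt.write("prompt.mid")
```

With `prompt.mid` in place, generate audio as follows: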
```python
from transformers import pipeline
from pretty_midi import PrettyMIDI

from perceiver.model.audio import symbolic  # auto-class registration

repo_id = "krasserm/perceiver-ar-sam-giant-midi"

prompt = PrettyMIDI("prompt.mid")
audio_generator = pipeline("symbolic-audio-generation", model=repo_id)

output = audio_generator(prompt, max_new_tokens=64, num_latents=1, do_sample=True, top_p=0.95, temperature=1.0, render=True)

with open("generated_audio.wav", "wb") as f:
    f.write(output["generated_audio_wav"])
```

Examples of generated audio sequences are available on the 🤗 [hub](https://huggingface.co/krasserm/perceiver-ar-sam-giant-midi#audio-samples).
See [inference examples](https://colab.research.google.com/github/krasserm/perceiver-io/blob/main/examples/inference.ipynb)
for more examples.

### Training
Train a small Perceiver IO image classifier (907K parameters) on MNIST from the command line. The classifier
cross-attends to individual pixels of input images with [repeated cross-attention](docs/building-blocks.md).
See the [image classification](docs/training-examples.md#image-classification) training example for more details.

```shell
python -m perceiver.scripts.vision.image_classifier fit \
--model.num_latents=32 \
--model.num_latent_channels=128 \
--model.encoder.num_frequency_bands=32 \
--model.encoder.num_cross_attention_layers=2 \
--model.encoder.num_self_attention_blocks=3 \
--model.encoder.num_self_attention_layers_per_block=3 \
--model.encoder.first_self_attention_block_shared=false \
--model.encoder.dropout=0.1 \
--model.encoder.init_scale=0.1 \
--model.decoder.num_output_query_channels=128 \
--model.decoder.dropout=0.1 \
--model.decoder.init_scale=0.1 \
--data=MNISTDataModule \
--data.batch_size=64 \
--optimizer=AdamW \
--optimizer.lr=1e-3 \
--lr_scheduler.warmup_steps=500 \
--trainer.accelerator=gpu \
--trainer.devices=1 \
--trainer.max_epochs=30 \
--trainer.logger=TensorBoardLogger \
--trainer.logger.save_dir=logs \
--trainer.logger.name=logs
```

[Model construction](docs/model-construction.md) describes how to implement model-specific command line interfaces
with the Lightning CLI. Training checkpoints are written to the `logs/img_clf/version_0/checkpoints` directory. Assuming
a checkpoint with filename `epoch=025-val_loss=0.065.ckpt` exists, it can be converted to a `perceiver-io` 🤗 model with

```python
from perceiver.model.vision.image_classifier import convert_mnist_classifier_checkpoint

convert_mnist_classifier_checkpoint(
    save_dir="example/mnist-classifier",
    ckpt_url="logs/img_clf/version_0/checkpoints/epoch=025-val_loss=0.065.ckpt",
)
```

so that it can be used in a 🤗 image classification pipeline:
```python
from datasets import load_dataset
from transformers import pipeline

mnist_dataset = load_dataset("mnist", split="test")[:9]

images = mnist_dataset["image"]
labels = mnist_dataset["label"]

classifier = pipeline("image-classification", model="example/mnist-classifier")
predictions = [pred[0]["label"] for pred in classifier(images)]

print(f"Labels: {labels}")
print(f"Predictions: {predictions}")
```
```
Labels: [7, 2, 1, 0, 4, 1, 4, 9, 5]
Predictions: [7, 2, 1, 0, 4, 1, 4, 9, 5]
```

or loaded directly:
```python
import torch
from transformers import AutoModelForImageClassification, AutoImageProcessor

model = AutoModelForImageClassification.from_pretrained("example/mnist-classifier")
processor = AutoImageProcessor.from_pretrained("example/mnist-classifier")

inputs = processor(images, return_tensors="pt")

with torch.no_grad():
    # use perceiver-io Hugging Face model
    output_1 = model(**inputs).logits

with torch.no_grad():
    # or use perceiver-io backend model directly
    output_2 = model.backend_model(inputs.pixel_values)

print(f"Predictions: {output_1.argmax(dim=-1).numpy().tolist()}")
print(f"Predictions: {output_2.argmax(dim=-1).numpy().tolist()}")
```
```
Predictions: [7, 2, 1, 0, 4, 1, 4, 9, 5]
Predictions: [7, 2, 1, 0, 4, 1, 4, 9, 5]
```

See [training examples](docs/training-examples.md) for more examples.
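Incidentally, the 907K parameter count quoted in the training example above can be checked against the converted model. A quick sketch, reusing the `model` object loaded in the previous snippet; the exact number depends on the configuration used for training:

```python
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")  # expected to be roughly 0.9M for the configuration above
```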
## Articles
Articles referencing this repository:
- [Training compute-optimal Perceiver AR language models](https://krasserm.github.io/2023/01/23/scaling-perceiver-ar/)
- [A gentle introduction to Rotary Position Embedding](https://krasserm.github.io/2022/12/13/rotary-position-embedding/)

## Other implementations
- [Perceiver](https://paperswithcode.com/paper/perceiver-general-perception-with-iterative#code)
- [Perceiver IO](https://paperswithcode.com/paper/perceiver-io-a-general-architecture-for#code)
- [Perceiver AR](https://paperswithcode.com/paper/general-purpose-long-context-autoregressive#code)