## Perfusion - Pytorch

Implementation of Key-Locked Rank One Editing, from Nvidia AI. Project page

The selling point of this paper is its extremely small number of extra parameters per added concept, down to 100kb.

It seems they successfully applied the rank-1 editing technique from a memory-editing paper for LLMs, with a few improvements. They also identified that the keys determine the "where" of the new concept, while the values determine the "what", and propose local / global key locking to a superclass concept (while learning the values).

For researchers out there, if this paper checks out, the tools in this repository should work for any other text-to-`<insert modality>` network using cross attention conditioning. Just a thought.
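
For intuition on the rank-1 editing mentioned above, here is a minimal sketch of a ROME-style rank-1 edit (the memory-editing work cited at the bottom of this readme). This illustrates the general technique under simplifying assumptions, not the exact gated, key-locked update implemented in this repository:

```python
import torch

# W : weight of a linear projection, shape (out_dim, in_dim)
# C : covariance of the layer's inputs, shape (in_dim, in_dim)
# k : input "key" embedding for the new concept, shape (in_dim,)
# v : desired output "value" for that key, shape (out_dim,)

def rank1_edit(W, C, k, v):
    c_inv_k = torch.linalg.solve(C, k)  # C^-1 k, the direction that singles out the key
    residual = v - W @ k                # gap between desired and current output
    # rank-1 correction: W' k = v, with minimal disturbance to other inputs as measured by C
    return W + torch.outer(residual, c_inv_k) / (k @ c_inv_k)

W = torch.randn(320, 768, dtype = torch.float64)
C = torch.eye(768, dtype = torch.float64)  # identity only for illustration; in practice, precomputed from data
k = torch.randn(768, dtype = torch.float64)
v = torch.randn(320, dtype = torch.float64)

W_edited = rank1_edit(W, C, k, v)

assert torch.allclose(W_edited @ k, v)  # the new key now maps to the new value
```

Note that the edit is fully described by the two vectors `k` and `v` (768 + 320 floats per edited projection here), which is consistent with the tiny per-concept storage cost claimed above.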

## Appreciation

- StabilityAI for the generous sponsorship, as well as my other sponsors out there

- Yoad Tewel for the multiple code reviews and clarifying emails

- Brad Vidler for precomputing the covariance matrix for the CLIP used in Stable Diffusion 1.5!

- All the maintainers at OpenCLIP, for their SOTA open-sourced contrastive learning text-image models

## Install

```bash
$ pip install perfusion-pytorch
```

## Usage

```python
import torch
from torch import nn

from perfusion_pytorch import Rank1EditModule

to_keys = nn.Linear(768, 320, bias = False)
to_values = nn.Linear(768, 320, bias = False)

wrapped_to_keys = Rank1EditModule(
    to_keys,
    is_key_proj = True
)

wrapped_to_values = Rank1EditModule(
    to_values
)

text_enc = torch.randn(4, 77, 768) # regular input
text_enc_with_superclass = torch.randn(4, 77, 768) # init_input in algorithm 1, for key-locking
concept_indices = torch.randint(0, 77, (4,)) # index where the concept or superclass concept token is in the sequence
key_pad_mask = torch.ones(4, 77).bool() # key padding mask for cross attention (not used in this snippet)

keys = wrapped_to_keys(
    text_enc,
    concept_indices = concept_indices,
    text_enc_with_superclass = text_enc_with_superclass
)

values = wrapped_to_values(
    text_enc,
    concept_indices = concept_indices,
    text_enc_with_superclass = text_enc_with_superclass
)

# after much training ...

wrapped_to_keys.eval()
wrapped_to_values.eval()

keys = wrapped_to_keys(text_enc)
values = wrapped_to_values(text_enc)
```
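
After wrapping, only the parameters that `Rank1EditModule` marks as trainable need to be optimized. Below is a rough sketch of a training step, continuing from the example above; the loss is a stand-in for the real diffusion objective:

```python
from torch.optim import Adam

# gather just the trainable parameters from the wrapped modules
params = [
    p for module in (wrapped_to_keys, wrapped_to_values)
    for p in module.parameters() if p.requires_grad
]

optimizer = Adam(params, lr = 1e-4)

for _ in range(100):
    keys = wrapped_to_keys(
        text_enc,
        concept_indices = concept_indices,
        text_enc_with_superclass = text_enc_with_superclass
    )

    values = wrapped_to_values(
        text_enc,
        concept_indices = concept_indices,
        text_enc_with_superclass = text_enc_with_superclass
    )

    loss = (keys ** 2).mean() + (values ** 2).mean()  # stand-in; substitute the diffusion loss

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```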

The repository also contains an `EmbeddingWrapper` that makes it easy to train on a new concept (and for eventual inference with multiple concepts).

```python
import torch
from torch import nn

from perfusion_pytorch import EmbeddingWrapper

embed = nn.Embedding(49408, 512) # open clip embedding, somewhere in the module tree of stable diffusion

# wrap it, and will automatically create a new concept for learning, based on the superclass embed string

wrapped_embed = EmbeddingWrapper(
    embed,
    superclass_string = 'dog'
)

# now just pass in your prompts with the superclass id

embeds_with_new_concept, embeds_with_superclass, embed_mask, concept_indices = wrapped_embed([
    'a portrait of dog',
    'dog running through a green field',
    'a man walking his dog'
]) # (3, 77, 512), (3, 77, 512), (3, 77), (3,)

# now pass both embeds through clip text transformer
# the embed_mask needs to be passed to the cross attention as key padding mask
```
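
The four outputs above line up with the arguments `Rank1EditModule` expects in the first usage example. A rough sketch of the wiring follows, where `text_transformer` is a hypothetical stand-in for the CLIP text transformer inside stable diffusion (mapping the 512 dimensional embeddings to the 768 dimensional encodings the wrapped projections consume):

```python
# continuing from the two examples above

text_transformer = nn.Linear(512, 768)  # hypothetical stand-in for the CLIP text transformer

text_enc = text_transformer(embeds_with_new_concept)
text_enc_with_superclass = text_transformer(embeds_with_superclass)

keys = wrapped_to_keys(
    text_enc,
    concept_indices = concept_indices,
    text_enc_with_superclass = text_enc_with_superclass
)

values = wrapped_to_values(
    text_enc,
    concept_indices = concept_indices,
    text_enc_with_superclass = text_enc_with_superclass
)

# embed_mask is then passed to cross attention as the key padding mask
```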

If you can identify the `CLIP` instance within the stable diffusion instance, you can also pass it directly to the `OpenClipEmbedWrapper` to get everything you need on forward for the cross attention layers.

ex.

```python
from perfusion_pytorch import OpenClipEmbedWrapper

texts = [
    'a portrait of dog',
    'dog running through a green field',
    'a man walking his dog'
]

wrapped_clip_with_new_concept = OpenClipEmbedWrapper(
    stable_diffusion.path.to.clip,
    superclass_string = 'dog'
)

text_enc, superclass_enc, mask, indices = wrapped_clip_with_new_concept(texts)

# (3, 77, 512), (3, 77, 512), (3, 77), (3,)
```

## Todo

- [ ] wire up with SD 1.5, starting with xiao's dreambooth-sd
- [ ] show example in readme for inference with multiple concepts
- [ ] automatically infer where keys and values projection are if not specified for the `make_key_value_proj_rank1_edit_modules_` function

- [x] embedding wrapper should take care of substituting with super class token id and return embedding with super class
- [x] review multiple concepts - thanks to Yoad
- [x] offer a function that wires up the cross attention
- [x] handle multiple concepts in one prompt at inference - summation of the sigmoid term + outputs
- [x] accept multiple concept indices
- [x] offer a way to combine separately learned concepts from multiple `Rank1EditModule` into one for inference
- [x] offer function for merging `Rank1EditModule`s
- [x] add the zero-shot masking of concept proposed in paper
- [x] take care of the function that takes in the dataset and text encoder and precomputes the covariance matrix needed for the rank-1 update
- [x] instead of having the researcher worry about different learning rates, offer the fractional gradient trick from the other paper (to learn the concept embedding)

## Citations

```bibtex
@article{Tewel2023KeyLockedRO,
    title   = {Key-Locked Rank One Editing for Text-to-Image Personalization},
    author  = {Yoad Tewel and Rinon Gal and Gal Chechik and Yuval Atzmon},
    journal = {ACM SIGGRAPH 2023 Conference Proceedings},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:258436985}
}
```

```bibtex
@inproceedings{Meng2022LocatingAE,
    title     = {Locating and Editing Factual Associations in GPT},
    author    = {Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
    booktitle = {Neural Information Processing Systems},
    year      = {2022},
    url       = {https://api.semanticscholar.org/CorpusID:255825985}
}
```