https://github.com/lucidrains/perfusion-pytorch
Implementation of Key-Locked Rank One Editing, from Nvidia AI
artificial-intelligence deep-learning memory-editing text-to-image
- Host: GitHub
- URL: https://github.com/lucidrains/perfusion-pytorch
- Owner: lucidrains
- License: mit
- Created: 2023-08-06T15:20:56.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-07T17:16:18.000Z (over 1 year ago)
- Last Synced: 2024-05-02T01:14:22.659Z (8 months ago)
- Topics: artificial-intelligence, deep-learning, memory-editing, text-to-image
- Language: Python
- Homepage:
- Size: 3.14 MB
- Stars: 228
- Watchers: 53
- Forks: 7
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
## Perfusion - Pytorch
Implementation of Key-Locked Rank One Editing. Project page
The selling point of this paper is the extremely low number of extra parameters per added concept, down to 100 KB.
It seems they successfully applied the Rank-1 editing technique from a memory editing paper for LLMs, with a few improvements. They also identified that the keys determine the "where" of the new concept, while the values determine the "what", and proposed local / global key-locking to a superclass concept (while learning the values).
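As a rough illustration of the rank-1 edit idea, here is a minimal sketch of the closed-form update from the memory editing paper (ROME), written in plain PyTorch. It is not the code in this repository: the dimensions, the random covariance `C`, and the helper name `rank_one_edit` are all assumptions made for the example. The key `k` selects where the edit applies, and `v_target` is what the layer should output there.

```python
import torch

torch.manual_seed(0)

dim_in, dim_out = 8, 4

# weight of a bias-free linear projection, y = W @ x
W = torch.randn(dim_out, dim_in, dtype=torch.float64)

# hypothetical stand-in for the uncentered covariance of the inputs, C = E[x x^T]
samples = torch.randn(256, dim_in, dtype=torch.float64)
C = samples.T @ samples / samples.shape[0]

k = torch.randn(dim_in, dtype=torch.float64)           # the "where" (key)
v_target = torch.randn(dim_out, dtype=torch.float64)   # the "what" (value)

def rank_one_edit(W, C, k, v_target):
    """Return W' = W + (v_target - W k) (C^-1 k)^T / (k^T C^-1 k)."""
    c_inv_k = torch.linalg.solve(C, k)   # C^-1 k without forming the inverse
    residual = v_target - W @ k          # what the layer currently gets wrong at k
    return W + torch.outer(residual, c_inv_k) / (k @ c_inv_k)

W_edited = rank_one_edit(W, C, k, v_target)
```

After the edit, `W_edited @ k` matches `v_target`, while any input orthogonal to `C^-1 k` is mapped exactly as before.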
For researchers out there: if this paper checks out, the tools in this repository should work for any other text-to-`<insert modality>` network using cross attention conditioning. Just a thought.
## Appreciation
- StabilityAI for the generous sponsorship, as well as my other sponsors out there
- Yoad Tewel for the multiple code reviews and clarifying emails
- Brad Vidler for precomputing the covariance matrix for the CLIP used in Stable Diffusion 1.5!
- All the maintainers at OpenClip, for their SOTA open sourced contrastive learning text-image models
## Install
```bash
$ pip install perfusion-pytorch
```

## Usage
```python
import torch
from torch import nn

from perfusion_pytorch import Rank1EditModule

to_keys = nn.Linear(768, 320, bias = False)
to_values = nn.Linear(768, 320, bias = False)

wrapped_to_keys = Rank1EditModule(
    to_keys,
    is_key_proj = True
)

wrapped_to_values = Rank1EditModule(
    to_values
)

text_enc = torch.randn(4, 77, 768)                  # regular input
text_enc_with_superclass = torch.randn(4, 77, 768)  # init_input in algorithm 1, for key-locking
concept_indices = torch.randint(0, 77, (4,))        # index where the concept or superclass concept token is in the sequence
key_pad_mask = torch.ones(4, 77).bool()

keys = wrapped_to_keys(
    text_enc,
    concept_indices = concept_indices,
    text_enc_with_superclass = text_enc_with_superclass,
)

values = wrapped_to_values(
    text_enc,
    concept_indices = concept_indices,
    text_enc_with_superclass = text_enc_with_superclass,
)

# after much training ...

wrapped_to_keys.eval()
wrapped_to_values.eval()

keys = wrapped_to_keys(text_enc)
values = wrapped_to_values(text_enc)
```
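The example above builds `key_pad_mask` but does not show where it goes. As a minimal sketch, assuming a single attention head, a mask convention where `True` marks real tokens, and hypothetical `queries` coming from the image side, masked cross attention over such keys and values could look like the following. The random `keys` and `values` stand in for the outputs of the wrapped projections, and `masked_cross_attention` is a name made up for this example, not part of the library.

```python
import torch

torch.manual_seed(0)

batch, num_queries, seq_len, dim = 4, 16, 77, 320

queries = torch.randn(batch, num_queries, dim)  # hypothetical image-side queries
keys = torch.randn(batch, seq_len, dim)         # stand-in for the wrapped key projection output
values = torch.randn(batch, seq_len, dim)       # stand-in for the wrapped value projection output

# assumption: True = real token, False = padding
key_pad_mask = torch.ones(batch, seq_len).bool()
key_pad_mask[:, 10:] = False                    # pretend each prompt has 10 real tokens

def masked_cross_attention(queries, keys, values, key_pad_mask):
    scale = queries.shape[-1] ** -0.5
    sim = queries @ keys.transpose(-1, -2) * scale   # (batch, num_queries, seq_len)
    # padded key positions get the most negative value, so softmax assigns them zero weight
    sim = sim.masked_fill(~key_pad_mask[:, None, :], torch.finfo(sim.dtype).min)
    attn = sim.softmax(dim = -1)
    return attn @ values, attn

out, attn = masked_cross_attention(queries, keys, values, key_pad_mask)
```

Padded positions receive no attention weight, so the output only mixes values from real prompt tokens.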
The repository also contains an `EmbeddingWrapper` that makes it easy to train on a new concept (and for eventual inference with multiple concepts)
```python
import torch
from torch import nn

from perfusion_pytorch import EmbeddingWrapper

embed = nn.Embedding(49408, 512) # open clip embedding, somewhere in the module tree of stable diffusion

# wrap it, and will automatically create a new concept for learning, based on the superclass embed string

wrapped_embed = EmbeddingWrapper(
    embed,
    superclass_string = 'dog'
)

# now just pass in your prompts with the superclass id

embeds_with_new_concept, embeds_with_superclass, embed_mask, concept_indices = wrapped_embed([
    'a portrait of dog',
    'dog running through a green field',
    'a man walking his dog'
]) # (3, 77, 512), (3, 77, 512), (3, 77), (3,)

# now pass both embeds through clip text transformer
# the embed_mask needs to be passed to the cross attention as key padding mask
```

If you can identify the `CLIP` instance within the stable diffusion instance, you can also pass it directly to the `OpenClipEmbedWrapper` to gain everything you need on forward for the cross attention layers

ex.
```python
from perfusion_pytorch import OpenClipEmbedWrapper

texts = [
    'a portrait of dog',
    'dog running through a green field',
    'a man walking his dog'
]

wrapped_clip_with_new_concept = OpenClipEmbedWrapper(
    stable_diffusion.path.to.clip,
    superclass_string = 'dog'
)

text_enc, superclass_enc, mask, indices = wrapped_clip_with_new_concept(texts)
# (3, 77, 512), (3, 77, 512), (3, 77), (3,)
```

## Todo
- [ ] wire up with SD 1.5, starting with xiao's dreambooth-sd
- [ ] show example in readme for inference with multiple concepts
- [ ] automatically infer where keys and values projection are if not specified for the `make_key_value_proj_rank1_edit_modules_` function

- [x] embedding wrapper should take care of substituting with super class token id and return embedding with super class
- [x] review multiple concepts - thanks to Yoad
- [x] offer a function that wires up the cross attention
- [x] handle multiple concepts in one prompt at inference - summation of the sigmoid term + outputs
- [x] accept multiple concept indices
- [x] offer a way to combine separately learned concepts from multiple `Rank1EditModule` into one for inference
- [x] offer function for merging `Rank1EditModule`s
- [x] add the zero-shot masking of concept proposed in paper
- [x] take care of the function that takes in the dataset and text encoder and precomputes the covariance matrix needed for the rank-1 update
- [x] instead of having the researcher worry about different learning rates, offer the fractional gradient trick from other paper (to learn the concept embedding)

## Citations
```bibtex
@article{Tewel2023KeyLockedRO,
title = {Key-Locked Rank One Editing for Text-to-Image Personalization},
author = {Yoad Tewel and Rinon Gal and Gal Chechik and Yuval Atzmon},
journal = {ACM SIGGRAPH 2023 Conference Proceedings},
year = {2023},
url = {https://api.semanticscholar.org/CorpusID:258436985}
}
```

```bibtex
@inproceedings{Meng2022LocatingAE,
title = {Locating and Editing Factual Associations in GPT},
author = {Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://api.semanticscholar.org/CorpusID:255825985}
}
```