Linear probe found representations of scene attributes in a text-to-image diffusion model
- Host: GitHub
- URL: https://github.com/yc015/scene-representation-diffusion-model
- Owner: yc015
- License: MIT
- Created: 2023-07-29T06:07:08.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-11T00:17:46.000Z (over 1 year ago)
- Last Synced: 2025-09-05T13:53:09.351Z (5 months ago)
- Topics: explainability, image-editing, interpretability, scene, stable-diffusion
- Language: Jupyter Notebook
- Homepage: https://yc015.github.io/scene-representation-diffusion-model/
- Size: 73.7 MB
- Stars: 36
- Watchers: 6
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: citation.bib
README
# Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model
Linear probes found controllable representations of scene attributes in a text-to-image diffusion model
Project page of "Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model"
Paper arXiv link: [https://arxiv.org/abs/2306.05720](https://arxiv.org/abs/2306.05720)
[[NeurIPS link]](https://nips.cc/virtual/2023/74894) [[Poster link]](https://nips.cc/media/PosterPDFs/NeurIPS%202023/74894.png?t=1701540884.728899)
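As a rough illustration of the probing idea above, here is a minimal sketch (not the repo's code): it assumes activations from a single U-Net block have already been saved as tensors, uses made-up shapes, and treats the probe as a per-pixel linear classifier (a 1x1 convolution) trained to predict a binary foreground mask.

```python
import torch
import torch.nn as nn

# Hypothetical data: activations from one U-Net block (batch, channels, h, w)
# paired with binary foreground masks (batch, 1, h, w). Replace with activations
# actually extracted from the diffusion model's intermediate layers.
acts = torch.randn(16, 1280, 8, 8)
masks = (torch.rand(16, 1, 8, 8) > 0.5).float()

# A per-pixel linear probe is just a 1x1 convolution: one weight vector shared
# across all spatial locations.
probe = nn.Conv2d(1280, 1, kernel_size=1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(probe(acts), masks)
    loss.backward()
    opt.step()
```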
## How to generate a short video of a moving foreground object using a pretrained text-to-image generative model
See [application_of_intervention.ipynb](https://github.com/yc015/scene-representation-diffusion-model/blob/main/application_of_intervention.ipynb) for how to use our intervention technique to generate a short video of moving objects.
### Some examples:
The gifs are sampled from the original text-to-image diffusion model without fine-tuning. All frames are generated using the **same prompt, random seed (initial latent vectors), and model**. We edited the intermediate activations of the latent diffusion model while it generated the images so that its internal representation of the foreground matches our reference mask. See the [notebook](https://github.com/yc015/scene-representation-diffusion-model/blob/main/application_of_intervention.ipynb) for implementation details.

## Probe Weights
Unzip [probe_checkpoints.zip](https://github.com/yc015/scene-representation-diffusion-model/blob/main/probe_checkpoints.zip) to obtain all of the probe weights we trained. The weights in the unzipped folder are sufficient to run all experiments shown in the paper.
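Loading a probe could look roughly like this. The file name `foreground_probe.pth` and the probe's input size are hypothetical; check the unzipped folder and the notebooks for the actual file names and the layer each probe was trained on.

```python
import zipfile
import torch
import torch.nn as nn

# Unzip the released checkpoints next to the notebooks.
with zipfile.ZipFile("probe_checkpoints.zip") as zf:
    zf.extractall("probe_checkpoints")

# Hypothetical checkpoint name and probe shape, shown only to illustrate the
# loading pattern.
state = torch.load("probe_checkpoints/foreground_probe.pth", map_location="cpu")
probe = nn.Conv2d(1280, 1, kernel_size=1)
probe.load_state_dict(state)
probe.eval()
```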
## Citation
If you find the source code of this repo helpful, please cite:

```bibtex
@article{chen2023beyond,
  title={Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model},
  author={Chen, Yida and Vi{\'e}gas, Fernanda and Wattenberg, Martin},
  journal={arXiv preprint arXiv:2306.05720},
  year={2023}
}
```