https://github.com/laion-ai/deep-image-diffusion-prior
Inverts CLIP text embeds to image embeds and visualizes with deep-image-prior.
- Host: GitHub
- URL: https://github.com/laion-ai/deep-image-diffusion-prior
- Owner: LAION-AI
- License: MIT
- Created: 2022-07-03T02:29:13.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2022-07-03T02:45:06.000Z (almost 3 years ago)
- Last Synced: 2025-05-07T18:13:50.781Z (about 1 month ago)
- Language: Jupyter Notebook
- Size: 1.28 MB
- Stars: 35
- Watchers: 5
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Deep Image Diffusion Prior
by [@nousr](https://twitter.com/nousr_)
Invert CLIP text embeds to image embeds and visualize them with `Deep Image Prior`.
> An oil painting of mountains, in the style of Monet
## Quick start (docker required)
* Install [docker](https://docs.docker.com/get-docker/)
* Install [cog](https://github.com/replicate/cog/)

The following command will download all weights and run a prediction with your inputs inside a proper docker container.
```sh
cog predict r8.im/laion-ai/deep-image-diffusion-prior \
-i prompt=... \
-i offset_type=... \
-i num_scales=... \
-i input_noise_strength=... \
-i lr=... \
-i offset_lr_fac=... \
-i lr_decay=... \
-i param_noise_strength=... \
-i display_freq=... \
-i iterations=... \
-i num_samples_per_batch=... \
-i num_cutouts=... \
-i guidance_scale=... \
-i seed=...
```

Or you can use the [jupyter notebook](/deep_image_diffusion_prior.ipynb).
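For example, a hypothetical invocation might look like the following. The prompt and values are illustrative only; any `-i` inputs you leave out fall back to the defaults defined in the model's predictor, per cog's standard behavior.

```sh
cog predict r8.im/laion-ai/deep-image-diffusion-prior \
    -i prompt="an oil painting of mountains, in the style of Monet" \
    -i iterations=1000 \
    -i seed=42
```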
## Special Thanks
* [LAION](https://discord.gg/uPMftTmrvS) for support, resources, and community
* [@RiversHaveWings](https://twitter.com/RiversHaveWings) for making me aware of this technique
* [Stability AI](https://stability.ai/) for compute which makes these models possible
* [lucidrains](https://github.com/lucidrains) for spearheading the open-source replication of DALLE 2
## Intended use
See the world "through CLIP's eyes" by using the `diffusion prior`, as replicated by LAION, to invert CLIP ViT-L/14 text embeds into image embeds (as in unCLIP/DALLE2). Afterwards, a `deep-image-prior` process, as implemented by Katherine Crowson, is run to visualize the features in CLIP's weights corresponding to activations from your prompt.
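To make the two stages concrete, here is a minimal, heavily simplified sketch in PyTorch. It assumes OpenAI's `clip` package; the diffusion prior stage is stubbed out (sampling a real prior requires its trained weights), and the tiny conv net, learning rate, and loop count are illustrative choices, not the notebook's actual settings.

```python
import torch
import torch.nn as nn
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
perceptor = clip.load("ViT-L/14", device=device)[0].float().eval()
perceptor.requires_grad_(False)

tokens = clip.tokenize(["an oil painting of mountains"]).to(device)
with torch.no_grad():
    text_embed = perceptor.encode_text(tokens).float()

# Stage 1 (stand-in): a trained diffusion prior would sample an image embed
# conditioned on `text_embed`; lacking its weights here, we use the
# normalized text embed itself purely for illustration.
target = text_embed / text_embed.norm(dim=-1, keepdim=True)

# Stage 2: deep image prior. Optimize the weights of a small, randomly
# initialized conv net so that the image it renders from fixed noise has a
# CLIP image embedding close to the target embed.
net = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
).to(device)
noise = torch.randn(1, 32, 224, 224, device=device)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(500):
    image = net(noise)  # pixels in [0, 1]; CLIP's usual input normalization is omitted for brevity
    embed = perceptor.encode_image(image).float()
    embed = embed / embed.norm(dim=-1, keepdim=True)
    loss = (embed - target).pow(2).sum()  # squared distance on the unit sphere
    opt.zero_grad()
    loss.backward()
    opt.step()
```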
## Ethical considerations
Just to avoid any confusion: this research is a recreation of (one part of) OpenAI's DALLE2 paper. It is _not_ "DALLE2", the product/service from OpenAI that you may have seen on the web.
## Caveats and recommendations
These visualizations can be quite abstract compared to other text-to-image models. However, they often have a dream-like quality as a result, and many outputs are artistically _fantastic_ because of it; whether the visual matches your prompt as reliably is another matter.