# Deep Image Diffusion Prior

by [@nousr](https://twitter.com/nousr_)

Invert CLIP text embeds to image embeds and visualize them with `Deep Image Prior`.

> An oil painting of mountains, in the style of monet

## Quick start (Docker required)

* Install [Docker](https://docs.docker.com/get-docker/)
* Install [cog](https://github.com/replicate/cog/)

The following command will download all of the weights and run a prediction with your inputs inside a properly configured Docker container.

```sh
cog predict r8.im/laion-ai/deep-image-diffusion-prior \
-i prompt=... \
-i offset_type=... \
-i num_scales=... \
-i input_noise_strength=... \
-i lr=... \
-i offset_lr_fac=... \
-i lr_decay=... \
-i param_noise_strength=... \
-i display_freq=... \
-i iterations=... \
-i num_samples_per_batch=... \
-i num_cutouts=... \
-i guidance_scale=... \
-i seed=...
```
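
For example, an invocation might look like the one below. The parameter values are placeholders chosen purely for illustration (they are not settings recommended by the project), and any inputs you omit fall back to whatever defaults the model defines.

```sh
# Illustrative values only -- not project-recommended settings.
cog predict r8.im/laion-ai/deep-image-diffusion-prior \
-i prompt="An oil painting of mountains, in the style of monet" \
-i iterations=1000 \
-i guidance_scale=5 \
-i seed=0
```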

Alternatively, you can use the [Jupyter notebook](/deep_image_diffusion_prior.ipynb).

## Special Thanks

* [LAION](https://discord.gg/uPMftTmrvS) for support, resources, and community

* [@RiversHaveWings](https://twitter.com/RiversHaveWings) for making me aware of this technique

* [Stability AI](https://stability.ai/) for the compute that makes these models possible

* [lucidrains](https://github.com/lucidrains) for spearheading the open-source replication of DALLE 2

## Intended use

See the world "through CLIP's eyes" by using the `diffusion prior`, as replicated by LAION, to invert CLIP "ViT-L/14" text embeds into image embeds (as in unCLIP/DALLE2). Afterwards, a `deep-image-prior`-based process developed by Katherine Crowson is run to visualize the features in CLIP's weights that correspond to the activations produced by your prompt.
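
As a rough sketch of that two-stage pipeline, the snippet below encodes a prompt with CLIP "ViT-L/14" and leaves the prior sampling and deep-image-prior steps as commented, hypothetical calls: only the CLIP usage is real, and the remaining names and signatures are assumptions of mine rather than this project's API.

```python
import torch
import clip  # OpenAI's CLIP package: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: encode the prompt into a CLIP ViT-L/14 text embed.
clip_model, _ = clip.load("ViT-L/14", device=device)
tokens = clip.tokenize(["An oil painting of mountains, in the style of monet"]).to(device)
with torch.no_grad():
    text_embed = clip_model.encode_text(tokens)  # shape: (1, 768)

# Stage 2 (hypothetical call): the diffusion prior samples an image embed
# conditioned on the text embed.
# image_embed = diffusion_prior_sample(text_embed)

# Stage 3 (hypothetical call): a deep-image-prior network is optimized so that
# the CLIP image embedding of its output matches the sampled image embed,
# producing the final visualization.
# image = deep_image_prior_visualize(image_embed)
```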

## Ethical considerations

Just to avoid any confusion: this research is a recreation of (one part of) OpenAI's DALLE2 paper. It is _not_ "DALLE2", the product/service from OpenAI that you may have seen on the web.

## Caveats and recommendations

These visualizations can be quite abstract compared to the output of other text-to-image models, but they often have a dream-like quality because of it. Many outputs are artistically _fantastic_ as a result; whether the visuals match your prompt as closely is another matter.