https://github.com/zer0int/clip-gradient-ascent-embeddings

Use CLIP to create matching texts + embeddings for given images; useful for XAI, adversarial training

Host: GitHub
URL: https://github.com/zer0int/clip-gradient-ascent-embeddings
Owner: zer0int
Created: 2024-12-06T09:45:57.000Z (over 1 year ago)
Default Branch: CLIP-vision
Last Pushed: 2024-12-09T02:24:03.000Z (over 1 year ago)
Last Synced: 2025-04-30T05:04:21.722Z (about 1 year ago)
Topics: adversarial-attacks, adversarial-examples, clip, contrastive-language-image-pretraining, embeddings, gradient-ascent, text-embeddings, text-image, typographic-attack
Language: Python
Homepage:
Size: 5.64 MB
Stars: 6
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

### 🚀🆙 CLIP-gradient-ascent-embeddings

- ❗ Requires [OpenAI/CLIP](https://github.com/openai/CLIP)

- Generates matching text embeddings / a 'CLIP opinion' about images

- Uses gradient ascent to optimize text embeds for cosine similarity with image embeds

- Saves 'CLIP opinion' as .txt files [best tokens]

- Saves text-embeds.pt with [batch_size] number of embeds

- Can be used to create an adversarial text-image aligned dataset

- For XAI, adversarial training, etc; see 'attack' folder for example images

- Usage: Single image: `python gradient-ascent.py --use_image attack/024_attack.png`

- Usage: Batch process: `python gradient-ascent.py --img_folder attack`

- 🆕 Load custom model: `python gradient-ascent-unproj_flux1.py --model_name "path/to/myCLIP.safetensors"`
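
The optimization idea in the list above can be sketched without CLIP at all. The toy below is an illustration under loud assumptions, not the repository's code: the image embedding is replaced by a fixed random vector, and the embedding vectors are optimized directly instead of going through CLIP's text encoder. The 768-dim size, learning rate, and step count are arbitrary choices.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a frozen image embedding (assumption: 768-dim, unit length).
image_embed = F.normalize(torch.randn(1, 768), dim=-1)

# Learnable stand-ins for the text embeddings, one row per batch entry.
batch_size = 13
text_embeds = torch.randn(batch_size, 768, requires_grad=True)

optimizer = torch.optim.Adam([text_embeds], lr=0.05)

with torch.no_grad():
    initial_similarity = F.cosine_similarity(text_embeds, image_embed, dim=-1).mean().item()

best_similarity = -1.0
best_embeds = None

for step in range(200):
    optimizer.zero_grad()
    # Cosine similarity of every candidate with the image embedding, shape (batch_size,).
    similarities = F.cosine_similarity(text_embeds, image_embed, dim=-1)

    # Keep the best embeddings seen so far (mirrors the idea behind --use_best).
    current = similarities.mean().item()
    if current > best_similarity:
        best_similarity = current
        best_embeds = text_embeds.detach().clone()

    # Gradient ascent on similarity == gradient descent on its negative.
    loss = -similarities.mean()
    loss.backward()
    optimizer.step()

print(f"mean cosine similarity: {initial_similarity:.3f} -> {best_similarity:.3f}")
```

Saving `best_embeds` with `torch.save` would then give a tensor with `batch_size` rows, analogous in shape to the saved embeds described above.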

-----

## Changes 07/DEC/2024

- The `--model_name` argument now accepts a model name (default `ViT-L/14`) *OR* a path such as `"/path/to/model.pt"`.

- If the path ends in `.safetensors`, the script assumes 'ViT-L/14' (CLIP-L) and loads the state_dict. ✅

- ⚠️ The model must nevertheless be in the original "OpenAI/CLIP" format. HuggingFace-converted models will NOT work.

- My [HF: zer0int/CLIP-GmP-ViT-L-14](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/tree/main) `model.safetensors` will NOT work (it's for diffusers / HF).

- Instead, download the full model .safetensors [text encoder AND vision encoder]; direct links:

- My [GmP-BEST-smooth](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/resolve/main/ViT-L-14-BEST-smooth-GmP-HF-format.safetensors?download=true) and [GmP-Text-detail](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/resolve/main/ViT-L-14-TEXT-detail-improved-hiT-GmP-HF.safetensors?download=true) and 🆕 [SAE-GmP](https://huggingface.co/zer0int/CLIP-SAE-ViT-L-14/resolve/main/ViT-L-14-GmP-SAE-FULL-model.safetensors?download=true) will work with this code.
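
The `--model_name` rules above amount to a small dispatch on the argument value. The sketch below only illustrates that decision; `classify_model_arg` and `KNOWN_MODEL_NAMES` are hypothetical names introduced here, and the actual script may order or implement the checks differently.

```python
# Hypothetical subset of built-in model names, for illustration only.
KNOWN_MODEL_NAMES = {"ViT-B/32", "ViT-B/16", "ViT-L/14"}


def classify_model_arg(model_name: str) -> str:
    """Return how a --model_name value would be treated under the rules above."""
    if model_name in KNOWN_MODEL_NAMES:
        # A plain model name such as the default 'ViT-L/14'.
        return "builtin"
    if model_name.lower().endswith(".safetensors"):
        # Assumed to be ViT-L/14 weights in OpenAI/CLIP format; loaded as a state_dict.
        return "safetensors-state-dict"
    # Anything else is treated as a path to a full model file, e.g. a .pt checkpoint.
    return "checkpoint-path"


for value in ("ViT-L/14", "/path/to/model.pt", "path/to/myCLIP.safetensors"):
    print(f"{value!r} -> {classify_model_arg(value)}")
```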

----

- 🆕 Added `gradient-ascent-unproj_flux1.py`. Usage is the same; however, in addition to projected embeddings:

- Saves `pinv` and `inv` versions of the pre-projection embeddings.

- 👉 `Flux.1-dev` uses these embeddings (`pinv` seems best for Flux.1-dev).

- Recommended samplers: _HEUN_, Euler.
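
To make the `pinv` / `inv` distinction concrete, here is a minimal sketch of undoing a projection. It assumes the projected embedding is obtained by right-multiplying with a square projection matrix; the random `text_projection` below is a stand-in, not real CLIP weights, and the 768 width is an assumption for ViT-L/14.

```python
import torch

torch.manual_seed(0)

width = 768  # assumed pre-projection width and embedding size
# Random stand-in for the text projection matrix (float64 for numerical headroom).
text_projection = torch.randn(width, width, dtype=torch.float64) / width ** 0.5

# Pretend these are the pre-projection features for a batch of 13 embeddings.
pre_projection = torch.randn(13, width, dtype=torch.float64)
projected = pre_projection @ text_projection

# "pinv": Moore-Penrose pseudo-inverse, also defined for non-square or singular matrices.
recovered_pinv = projected @ torch.linalg.pinv(text_projection)

# "inv": exact inverse, only defined for square, non-singular matrices.
recovered_inv = projected @ torch.linalg.inv(text_projection)

print("pinv max error:", (recovered_pinv - pre_projection).abs().max().item())
print("inv  max error:", (recovered_inv - pre_projection).abs().max().item())
```

For a well-conditioned square matrix the two results agree closely; they can differ when the matrix is ill-conditioned, since the pseudo-inverse discards very small singular values.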

Example "worst portrait ever" generated by Flux.1-dev with pure CLIP guidance (no T5!). CLIP apparently tried to encode the facial expression of the cat 😂, on top of the usual CLIP text gibberish of 'cat' and 'shoe' mashed up:

![worst](https://github.com/user-attachments/assets/8523f4bc-32f5-42f2-9854-faa1db0f30f8)

-----

![gradient-ascent](https://github.com/user-attachments/assets/386645d8-5ed1-4799-9511-4ebe9746241c)

-----

Command-line arguments:

```
--batch_size, default=13, type=int, help="Reduce batch_size if you have OOM issues"
--model_name, default='ViT-L/14', help="CLIP model to use"
--tokens_to, default="texts", help="Save CLIP opinion texts path"
--embeds_to, default="embeds", help="Save CLIP embeddings path"
--use_best, default="True", help="If True, use best embeds (loss); if False, just saves last step (not recommended)"
--img_folder, default=None, help="Path to folder with images, for batch embeddings generation"
--use_image, default=None, help="Path to a single image"
```
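
For reference, the listing above maps onto `argparse` roughly as follows. This is a reconstruction from the listed names, defaults, and help strings only, not the script's actual source; `build_parser` is a name chosen here for illustration. Note that `--use_best` is listed with a string default, so the sketch compares it as text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented command-line interface (illustrative only)."""
    parser = argparse.ArgumentParser(description="CLIP gradient ascent (sketch)")
    parser.add_argument("--batch_size", default=13, type=int,
                        help="Reduce batch_size if you have OOM issues")
    parser.add_argument("--model_name", default="ViT-L/14",
                        help="CLIP model to use")
    parser.add_argument("--tokens_to", default="texts",
                        help="Save CLIP opinion texts path")
    parser.add_argument("--embeds_to", default="embeds",
                        help="Save CLIP embeddings path")
    parser.add_argument("--use_best", default="True",
                        help="If True, use best embeds (loss); if False, just saves last step")
    parser.add_argument("--img_folder", default=None,
                        help="Path to folder with images, for batch embeddings generation")
    parser.add_argument("--use_image", default=None,
                        help="Path to a single image")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--img_folder", "attack", "--batch_size", "4"])

# The flag arrives as a string, so convert it to a bool explicitly.
use_best = str(args.use_best).lower() == "true"

print(args.batch_size, args.model_name, args.img_folder, use_best)
```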

Further processing example code snippets:

```
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the saved text embeddings; map_location avoids errors when the
# file was saved on a GPU but is loaded on a CPU-only machine
text_embeddings = torch.load("path/to/embeds.pt", map_location=device)

# Loop over every embedding in the batch and do a thing
num_embeddings = text_embeddings.size(0)  # e.g. batch_size 13 -> idx 0 to 12
for selected_embedding_idx in range(num_embeddings):
    print(f"Processing embedding index: {selected_embedding_idx}")
    # do your thing here!

# Select a random embedding from the batch and do a thing
selected_embedding_idx = torch.randint(0, text_embeddings.size(0), (1,)).item()
# Slicing (instead of indexing) keeps the leading batch dimension of size 1
selected_embedding = text_embeddings[selected_embedding_idx:selected_embedding_idx + 1]

# Or just manually select one
selected_embedding_idx = 3
selected_embedding = text_embeddings[selected_embedding_idx:selected_embedding_idx + 1]
```
