https://github.com/mehdidc/dalle_clip_score
Simple script to compute CLIP-based scores given a DALL-e trained model.
- Host: GitHub
- URL: https://github.com/mehdidc/dalle_clip_score
- Owner: mehdidc
- Created: 2021-06-10T07:15:37.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-06-13T03:57:37.000Z (almost 4 years ago)
- Last Synced: 2024-05-14T00:16:34.726Z (about 1 year ago)
- Topics: clip, dall-e, generative-model, image-embedding, text-embedding, text-encoding, text-to-image
- Language: Python
- Homepage:
- Size: 25.4 KB
- Stars: 29
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# DALLE_clip_score
Simple script to compute CLIP scores based on a trained DALL-E model, using OpenAI's CLIP.
CLIP scores measure the compatibility between an image and a caption. The raw value is a cosine similarity, so it lies between -1 and 1. In CLIP, the value is scaled by 100 by default, giving a number between -100 and 100, where 100 means maximum compatibility between an image and its text. In practice it is rare for the score to be negative, but we clamp it to the range 0 to 100 anyway. Typical values are around 20-30.
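A minimal sketch of that computation with OpenAI's CLIP (not this repository's exact code; the model name, image path, and caption below are placeholders):

```python
import torch
import clip  # OpenAI's CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # model name chosen for illustration

image = preprocess(Image.open("generated.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a small bird with a red head"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Cosine similarity of the normalised embeddings, scaled by 100 and clamped to [0, 100].
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
score = (100.0 * image_features @ text_features.T).clamp(min=0.0)
print(score.item())
```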
## How to install?

1. Install CLIP from OpenAI's repository
2. Install the lucidrains DALL-E implementation (DALLE-pytorch)
3. `python setup.py install`

## How to use?
Here is an example:
`clip_score --dalle_path dalle.pt --image_text_folder CUB_200_2011 --taming --num_generate 1 --dump`
where:
- `dalle_path` is the path of the model trained with the lucidrains DALL-E implementation
- `image_text_folder` is the folder of the dataset, following the same image-text folder format used for training
- `taming`: specifies that taming transformers is used as the image encoder
- `num_generate`: number of images to generate per caption
- `dump`: save all the generated images in the folder `outputs` (by default) together with their respective metrics

Example output:
```
CLIP_score_real 30.1826171875
CLIP_score 26.7392578125
CLIP_score_top1 26.7392578125
CLIP_score_relative 0.8892822265625
CLIP_score_relative_top1 0.8892822265625
CLIP_atleast 0.7466491460800171
```

Note that all the metrics will also be saved to `clip_score.json` by default.
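As a small example of reading that file back (assuming `clip_score.json` is a flat JSON object mapping metric names to values, matching the example output above):

```python
import json

# Load the dumped metrics (the flat {metric_name: value} layout is an assumption).
with open("clip_score.json") as f:
    metrics = json.load(f)

for name, value in sorted(metrics.items()):
    print(f"{name}: {value}")
```

The metrics in the output are: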
- `CLIP_score_real`: average CLIP score for real images
- `CLIP_score`: average CLIP score for all generated images.
- `CLIP_score_top1`: for each caption, retain the generated image with best CLIP score, then compute the average CLIP score like in `CLIP_score`.
- `CLIP_score_relative`: for each generated image, the CLIP score of the generated image divided by the CLIP score of the corresponding real image, then averaged. In general it is between 0 and 1, although it can be greater than 1; greater than 1 means the CLIP score of the generated image is higher than that of the real image.
- `CLIP_score_relative_top1`: same as `CLIP_score_relative` but using the top CLIP score like in `CLIP_score_top1`.
- `CLIP_atleast`: for each caption, 1 if the CLIP score reaches at least `--clip_thresh` (by default **25**) and 0 otherwise, averaged over all captions, giving a number between 0 and 1.

For all scores, the higher, the better. The sketch below shows how these aggregates can be derived from per-caption scores.
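A minimal sketch of the aggregation (the array names, shapes, and exact aggregation order are assumptions for illustration, not this repository's actual code):

```python
import numpy as np

# Hypothetical per-caption scores, for illustration only:
#   scores[i, j]   : clamped CLIP score of the j-th generated image for caption i
#   real_scores[i] : clamped CLIP score of the real image for caption i
num_captions, num_generate = 100, 4
scores = np.random.uniform(0.0, 40.0, size=(num_captions, num_generate))
real_scores = np.random.uniform(0.0, 40.0, size=num_captions)
clip_thresh = 25.0  # default value of --clip_thresh

clip_score_real = real_scores.mean()
clip_score = scores.mean()
clip_score_top1 = scores.max(axis=1).mean()                    # best generated image per caption
clip_score_relative = (scores / real_scores[:, None]).mean()   # generated / real, then averaged
clip_score_relative_top1 = (scores.max(axis=1) / real_scores).mean()
clip_atleast = (scores.max(axis=1) >= clip_thresh).mean()      # fraction of captions reaching the threshold
```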