https://github.com/finnff/vlm-bias-decomposition

VLM bias mitigation via vector decomposition in CLIP
https://github.com/finnff/vlm-bias-decomposition
Last synced: 12 months ago
JSON representation
VLM bias mitigation via vector decomposition in CLIP
Host: GitHub
URL: https://github.com/finnff/vlm-bias-decomposition
Owner: finnff
License: mit
Created: 2025-05-28T21:34:09.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-23T22:50:30.000Z (about 1 year ago)
Last Synced: 2025-06-23T23:31:34.348Z (about 1 year ago)
Language: Python
Size: 3.89 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # vlm-bias-decomposition

VLM bias mitigation via vector decomposition in CLIP

## Requirements

```bash

conda init #(if you haven't done this before)

conda create -n clipenv python=3.8 --yes

conda activate clipenv

conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0

pip install ftfy regex tqdm pandas matplotlib requests gdown scipy scikit-learn seaborn adjustText

pip install git+https://github.com/openai/CLIP.git

```

## Usage

1. First run `python celeba_downloader_clip_test.py` to ensure you have the CelebA dataset downloaded and CLIP model loaded.

2. Then run `python distinct_prompt_evaluation.py` to perform the CLIP Attribute Detection and Correlation Analysis.

### Nvidia CUDA

Repository is setup for usage with a CUDA which requires a Nvidia GPU, but will work on CPU as well (But a lot slower!). 

Eg. running both scripts with 10000 and 100000 samples respectively on a 8 year old Nvidia GTX 1080 with 8GB VRAM took less than 10 minutes in total.

The same scripts on my laptop(i7 8650u) would take over 4 hours with CPU processing, which is ~25x slower.

So please decrease the `NUM_SAMPLES` at the top of each script to get a more reasonable run time if you don't have a Nvidia GPU.

### celeba_downloader_clip_test.py

![img](./celeba_demo.png)

running `./celeba_downloader_clip_test.py` gives us: 

```

Using device: cuda

Loading CLIP model...

CelebA Dataset + CLIP Analysis

==================================================

Loading CelebA dataset...

Files already downloaded and verified

✓ Successfully loaded CelebA dataset with 162770 images

==================================================

CELEBA DATASET EXPLORATION

==================================================

Total number of images: 162770

Number of attributes: 40

Analyzing attributes distribution...

Sampling attributes: 100%|████████████████████████████████████████████████████████████████████████████| 162770/162770 [01:40<00:00, 1626.66it/s]

 Attributes and how frequent they are:

          Attribute    Count  Percentage

           No_Beard 135779.0   83.41

              Young 126788.0   77.89

         Attractive  83603.0   51.36

Mouth_Slightly_Open  78486.0   48.21

            Smiling  78080.0   47.96

   Wearing_Lipstick  76437.0   46.96

    High_Cheekbones  73645.0   45.24

               Male  68261.0   41.93

       Heavy_Makeup  62555.0   38.43

          Wavy_Hair  51982.0   31.93

          Oval_Face  46101.0   28.32

        Pointy_Nose  44846.0   27.55

    Arched_Eyebrows  43278.0   26.58

           Big_Lips  39213.0   24.09

         Black_Hair  38906.0   23.90

           Big_Nose  38341.0   23.55

      Straight_Hair  33947.0   20.85

    Bags_Under_Eyes  33280.0   20.44

         Brown_Hair  33192.0   20.39

   Wearing_Earrings  30362.0   18.65

              Bangs  24685.0   15.16

         Blond_Hair  24267.0   14.90

     Bushy_Eyebrows  23386.0   14.36

   Wearing_Necklace  19764.0   12.14

        Narrow_Eyes  18869.0   11.59

   5_o_Clock_Shadow  18177.0   11.16

  Receding_Hairline  13040.0    8.01

    Wearing_Necktie  11890.0    7.30

        Rosy_Cheeks  10525.0    6.46

         Eyeglasses  10521.0    6.46

             Goatee  10337.0    6.35

             Chubby   9389.0    5.76

          Sideburns   9156.0    5.62

             Blurry   8362.0    5.13

        Wearing_Hat   8039.0    4.93

        Double_Chin   7571.0    4.65

          Pale_Skin   7005.0    4.30

          Gray_Hair   6896.0    4.23

           Mustache   6642.0    4.08

               Bald   3713.0    2.28

Visualizing 8 sample images...

Running CLIP analysis on 162770 images...

Processing batches: 100%|█████████████████████████████████████████████████████████████████████████████████████| 636/636 [06:26<00:00,  1.64it/s]

Analyzing CLIP results...

Average CLIP confidences by text prompt:

a photo of a woman with makeup: 0.213

a photo of a woman smiling: 0.209

a photo of a man with a receding hairline: 0.198

a photo of a woman with blonde hair: 0.173

a photo of an attractive man: 0.130

a photo of a man with glasses: 0.077

Correlation analysis:

Male images - average 'man' prompts confidence: 0.167

Female images - average 'woman' prompts confidence: 0.330

Images with glasses - 'glasses' prompts confidence: 0.593

Images without glasses - 'glasses' prompts confidence: 0.042

Images with receding hairline - 'receding hairline' confidence: 0.431

Images without receding hairline - 'receding hairline' confidence: 0.177

Attractive images - 'attractive' prompts confidence: 0.106

Non-attractive images - 'attractive' prompts confidence: 0.154

Smiling images - 'smiling' prompts confidence: 0.347

Non-smiling images - 'smiling' prompts confidence: 0.081

Heavy makeup images - 'makeup' prompts confidence: 0.380

No heavy makeup images - 'makeup' prompts confidence: 0.109

Blonde hair images - 'blonde hair' confidence: 0.714

Non-blonde hair images - 'blonde hair' confidence: 0.079

Analysis complete!

Check 'celeba_samples.png' for sample images visualization

python3 celeba_downloader_clip_test.py  1568.52s user 95.02s system 332% cpu 8:21.06 total

```

### distinct_prompt_evaluation.py

```

Using device: cuda

Loading CLIP model...

Loading CelebA dataset...

Files already downloaded and verified

✓ Successfully loaded CelebA dataset with 162770 images

Dataset loaded with 162770 images

Number of attributes: 40

Attributes: 5_o_Clock_Shadow, Arched_Eyebrows, Attractive, Bags_Under_Eyes, Bald, Bangs, Big_Lips, Big_Nose, Black_Hair, Blond_Hair...

============================================================================

IMPROVED CLIP ANALYSIS WITH BINARY COMPARISONS

============================================================================

Running improved CLIP analysis on 100000 images (batch size: 256, num_workers: 16, prefetch_factor: 2)...

Processing batches: 100%|█████████████████████████████████████████████████████████████████████████████████████| 391/391 [05:20<00:00,  1.22it/s]

Improved CLIP Analysis Results:

Gender Classification Accuracy:

Male accuracy: 41173/41991 = 98.1%

Female accuracy: 57749/58009 = 99.6%

Attribute Detection Performance:

Metrics:

- Avg Score (Has): How confident CLIP is when the attribute is present (higher is better)

- Avg Score (Has not): How confident CLIP is when attribute is absent (lower is better)

- Discrimination: Difference between the two (positive = CLIP can detect this attribute)

Attribute 
---------------------- 
male                 | 
blond_hair           | 
bald                 | 
eyeglasses           | 
smiling              | 
pale_skin            | 
bangs                | 
young                | 
wearing_hat          | 
black_hair           | 
gray_hair            | 
wavy_hair            | 
heavy_makeup         | 
bushy_eyebrows       | 
straight_hair        | 
chubby               | 
blurry               | 
brown_hair           | 
mustache             | 
wearing_lipstick     | 
wearing_necktie      | 
wearing_earrings     | 
arched_eyebrows      | 
double_chin          | 
attractive           | 
narrow_eyes          | 
bags_under_eyes      | 
mouth_slightly_open  | 
pointy_nose          | 
big_lips             | 
oval_face            | 
big_nose             | 
wearing_necklace     | 
high_cheekbones      | 
receding_hairline    | 
sideburns            | 
goatee               | 
rosy_cheeks          | 
no_beard             | 
5_o_clock_shadow     |

| Avg Score (Has) | Avg Score (Has not) | Discrimination ------------------------------------------------------ 0.945 |               0.033 |         +0.912 0.917 |               0.264 |         +0.653 0.658 |               0.063 |         +0.595 0.785 |               0.214 |         +0.571 0.604 |               0.159 |         +0.445 0.684 |               0.318 |         +0.366 0.617 |               0.268 |         +0.349 0.759 |               0.414 |         +0.345 0.778 |               0.474 |         +0.304 0.677 |               0.419 |         +0.257 0.489 |               0.278 |         +0.211 0.647 |               0.459 |         +0.187 0.585 |               0.400 |         +0.185 0.555 |               0.379 |         +0.176 0.915 |               0.763 |         +0.152 0.574 |               0.438 |         +0.136 0.367 |               0.233 |         +0.134 0.774 |               0.659 |         +0.115 0.353 |               0.253 |         +0.100 0.718 |               0.636 |         +0.082 0.688 |               0.607 |         +0.082 0.712 |               0.635 |         +0.077 0.742 |               0.671 |         +0.071 0.669 |               0.605 |         +0.064 0.345 |               0.286 |         +0.060 0.641 |               0.588 |         +0.053 0.573 |               0.554 |         +0.019 0.580 |               0.562 |         +0.018 0.291 |               0.283 |         +0.009 0.234 |               0.228 |         +0.006 0.561 |               0.557 |         +0.004 0.307 |               0.306 |         +0.002 0.645 |               0.653 |         -0.008 0.368 |               0.384 |         -0.015 0.553 |               0.573 |         -0.020 0.510 |               0.547 |         -0.037 0.453 |               0.521 |         -0.068 0.227 |               0.302 |         -0.075 0.742 |               0.838 |         -0.097 0.182 |               0.490 |         -0.309

Interesting Cases:

============================================================================

EMBEDDING SPACE CORRELATION ANALYSIS

============================================================================

Computing embedding similarities for 100000 images...

CLIP-CelebA Attribute Correlations:

Metrics:

- Correlation: How well CLIP's similarity scores align with CelebA labels (-1 to +1, higher (absolute) magnitude = better)

- Mean Sim (WITH): Average cosine similarity (dot product of normalized embeddings) between images WITH the attribute and its text description

- Mean Sim (WITHOUT): Average cosine similarity between images WITHOUT the attribute and its text description

- Higher correlation means CLIP understands this attribute; larger gap between WITH/WITHOUT is better

Attribute            |  Correlation | Mean Sim (WITH) |  Mean Sim (WITHOUT)

----------------------------------------------------------------------------

Smiling              |        0.640 |           0.241 |               0.220

Blond_Hair           |        0.554 |           0.252 |               0.222

Bangs                |        0.519 |           0.238 |               0.211

No_Beard             |       -0.514 |           0.210 |               0.230

Heavy_Makeup         |        0.501 |           0.227 |               0.211

Wearing_Hat          |        0.441 |           0.255 |               0.225

Black_Hair           |        0.406 |           0.233 |               0.216

Wavy_Hair            |        0.397 |           0.235 |               0.220

Goatee               |        0.393 |           0.245 |               0.218

Eyeglasses           |        0.392 |           0.232 |               0.206

Wearing_Earrings     |        0.389 |           0.246 |               0.232

Wearing_Lipstick     |        0.378 |           0.222 |               0.211

Wearing_Necktie      |        0.339 |           0.240 |               0.222

Brown_Hair           |        0.337 |           0.245 |               0.231

Arched_Eyebrows      |        0.295 |           0.250 |               0.240

Bald                 |        0.290 |           0.245 |               0.210

Mouth_Slightly_Open  |        0.265 |           0.235 |               0.228

Receding_Hairline    |        0.243 |           0.246 |               0.229

Mustache             |        0.232 |           0.230 |               0.213

Young                |        0.218 |           0.225 |               0.221

Blurry               |        0.214 |           0.229 |               0.217

Attractive           |        0.191 |           0.212 |               0.207

Bushy_Eyebrows       |        0.187 |           0.242 |               0.235

Pale_Skin            |        0.177 |           0.226 |               0.211

Gray_Hair            |        0.167 |           0.234 |               0.221

Straight_Hair        |        0.128 |           0.230 |               0.224

Oval_Face            |        0.111 |           0.242 |               0.240

Wearing_Necklace     |        0.105 |           0.233 |               0.229

5_o_Clock_Shadow     |        0.102 |           0.224 |               0.221

Sideburns            |        0.086 |           0.230 |               0.225

Big_Lips             |        0.078 |           0.222 |               0.220

Bags_Under_Eyes      |       -0.076 |           0.233 |               0.235

Pointy_Nose          |        0.069 |           0.230 |               0.228

Chubby               |        0.054 |           0.210 |               0.206

High_Cheekbones      |       -0.025 |           0.212 |               0.214

Big_Nose             |       -0.022 |           0.226 |               0.227

Double_Chin          |        0.021 |           0.214 |               0.212

Male                 |       -0.012 |           0.224 |               0.224

Narrow_Eyes          |        0.005 |           0.227 |               0.227

Rosy_Cheeks          |        0.005 |           0.220 |               0.220

Saving analysis results...

Analysis complete! Results saved to 'clip_celeba_improved_results.json'

python3 distinct_prompt_evaluation.py  1087.39s user 164.20s system 216% cpu 9:38.30 total

```
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/finnff/vlm-bias-decomposition

Awesome Lists containing this project

README