https://github.com/wjpoom/SPEC
[CVPR' 24] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
- Host: GitHub
- URL: https://github.com/wjpoom/SPEC
- Owner: wjpoom
- Created: 2023-11-27T07:55:15.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-13T17:07:29.000Z (8 months ago)
- Last Synced: 2024-04-14T09:41:28.893Z (8 months ago)
- Topics: clip, compositionality, computer-vision, fine-grained, multimodal, vision-language, vision-language-model
- Language: Jupyter Notebook
- Homepage: https://arxiv.org/abs/2312.00081
- Size: 11.4 MB
- Stars: 15
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Segment-Anything - [code
# Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
:bookmark_tabs:[`Paper`](https://arxiv.org/abs/2312.00081)
:file_folder:[`Data`](https://huggingface.co/datasets/wjpoom/SPEC)
:orange_book:[`Notebook`](https://github.com/wjpoom/SPEC/tree/main/notebooks)
:black_nib:[`BibTex`](#black_nib-citation)
:rocket:[`Preview`](https://wjpoom.github.io/preview/)
:scroll:[`Poster`](https://github.com/wjpoom/SPEC/blob/main/assets/poster-v2.pdf)

**Authors**: Wujian Peng, Sicheng Xie, Zuyao You, [Shiyi Lan](https://voidrank.github.io/), [Zuxuan Wu](https://zxwu.azurewebsites.net/)
## :fire: News
* `Apr. 14, 2024` We have released a [preview](https://wjpoom.github.io/preview/) of a more advanced version of the dataset; the full version is coming soon.
* `Apr. 13, 2024` We released the SPEC dataset and the evaluation code. Sorry for the delay :relaxed:.
* `Feb. 28, 2024` Our work has been accepted by [CVPR 2024](https://cvpr.thecvf.com/) :tada:.

## :rocket: A more advanced version is coming!
We are building a new version with a larger data scale, more object categories, higher-quality images and text, and more.
You can preview it at [this website](https://wjpoom.github.io/preview/); the full version is coming soon.

## :mag: SPEC Benchmark
To evaluate how well vision-language models understand fine-grained concepts, we propose a new benchmark, SPEC,
which consists of six distinct subsets covering the dimensions of **S**ize, **P**osition, **E**xistence, and **C**ount.
Each test case consists of an image candidate set, whose images differ only in certain visual concepts, and a text candidate set,
whose texts differ only in the corresponding language concepts.
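To make the test-case structure concrete, here is a minimal sketch of how a single test case could be scored. It assumes a model has already produced a similarity matrix `sim[i][j]` between candidate image `i` and candidate text `j`, and that image `i` is paired with text `i`. The helper names (`argmax`, `score_test_case`) and the two retrieval accuracies (image-to-text and text-to-image within the candidate sets) are illustrative assumptions, not the repository's evaluation code.

```python
from typing import List, Tuple


def argmax(values: List[float]) -> int:
    # Index of the largest value; max() keeps the first index on ties.
    return max(range(len(values)), key=lambda k: values[k])


def score_test_case(sim: List[List[float]]) -> Tuple[float, float]:
    """Return (image-to-text, text-to-image) accuracy for one test case.

    Hypothetical helper: sim[i][j] is the similarity between candidate
    image i and candidate text j, and image i is assumed to match text i.
    """
    n = len(sim)
    assert all(len(row) == n for row in sim), "expects an N x N matrix"
    # Image-to-text: each image should score its own text highest among all text candidates.
    i2t_hits = sum(argmax(sim[i]) == i for i in range(n))
    # Text-to-image: each text should score its own image highest among all image candidates.
    t2i_hits = sum(argmax([sim[i][j] for i in range(n)]) == j for j in range(n))
    return i2t_hits / n, t2i_hits / n
```

For example, with `sim = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.7], [0.2, 0.6, 0.65]]`, every image picks its own text (image-to-text accuracy 1.0), but the third text prefers the second image, so text-to-image accuracy is 2/3. Scoring both directions separately exposes this kind of asymmetry.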
## :wrench: Usage
### install
```shell
git clone https://github.com/wjpoom/SPEC.git
cd SPEC/
pip install -e .
```
### prepare data
* Run the following code in a Python shell, replacing `/path/to/save/data` with the directory where you want to store the data.
```python
import os
import zipfile

from huggingface_hub import hf_hub_download

data_root = '/path/to/save/data'
# Download data.zip from the wjpoom/SPEC dataset repo on the Hugging Face Hub
hf_hub_download(repo_id='wjpoom/SPEC', repo_type='dataset', filename='data.zip', local_dir=data_root)

# Extract the archive into data_root, then delete the zip to save disk space
with zipfile.ZipFile(os.path.join(data_root, 'data.zip'), 'r') as zip_ref:
    zip_ref.extractall(data_root)
os.remove(os.path.join(data_root, 'data.zip'))
```
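If you re-run the preparation step, wrapping the unzip-and-clean-up part in a function makes it safe to call more than once. This is a sketch using only the standard library; `extract_archive` is a hypothetical helper name, and it assumes `data.zip` has already been downloaded into `data_root` with `hf_hub_download` as above.

```python
import os
import zipfile


def extract_archive(data_root: str, archive_name: str = "data.zip") -> list:
    """Extract data_root/archive_name into data_root, then delete the archive.

    Hypothetical helper. If the archive is already gone (e.g. a second run),
    nothing is extracted. In both cases it returns the sorted top-level
    entries of data_root so you can check what is on disk.
    """
    zip_path = os.path.join(data_root, archive_name)
    if os.path.exists(zip_path):
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(data_root)
        os.remove(zip_path)  # free disk space once the contents are extracted
    return sorted(os.listdir(data_root))
```

Call it as `extract_archive(data_root)` right after the download; the returned listing is a quick sanity check that the extraction produced the expected folders.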
### explore the dataset
* We provide a notebook that lets you visually explore the test samples in the SPEC dataset.
* Run this notebook either [locally](https://github.com/wjpoom/SPEC/blob/main/notebooks/explore_spec_local.ipynb) or online using [Colab](https://colab.research.google.com/github/wjpoom/SPEC/blob/main/notebooks/explore_spec_colab.ipynb).

### reproduce the results
* In our paper, we evaluated four popular VLMs using our SPEC dataset, namely: CLIP, BLIP, FLAVA and CoCa.
* To reproduce the results with these VLMs, you can run [this script](https://github.com/wjpoom/SPEC/blob/main/spec/run_eval.sh).
* You can also reproduce them with this [local notebook](https://github.com/wjpoom/SPEC/blob/main/notebooks/evaluate_example_local.ipynb) or the online [Colab notebook](https://colab.research.google.com/github/wjpoom/SPEC/blob/main/notebooks/evaluate_example_colab.ipynb).

### evaluate custom VLMs
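The core step in evaluating a dual-encoder (CLIP-style) model is turning its image and text embeddings into an image-text similarity matrix for each test case. The sketch below only illustrates that idea; the interface SPEC actually expects is described in the official instructions. `CosineScorer`, `embed_images`, and `embed_texts` are hypothetical placeholders for your own model code, not part of the SPEC codebase.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class CosineScorer:
    """Hypothetical wrapper around a custom dual-encoder VLM."""

    def __init__(self,
                 embed_images: Callable[[list], List[Vector]],
                 embed_texts: Callable[[List[str]], List[Vector]]):
        # Each callable maps a batch of inputs to one embedding per input.
        self.embed_images = embed_images
        self.embed_texts = embed_texts

    def similarity_matrix(self, images: list, texts: List[str]) -> List[List[float]]:
        # Row i holds the similarity of image i to every candidate text.
        img_emb = self.embed_images(images)
        txt_emb = self.embed_texts(texts)
        return [[cosine(u, v) for v in txt_emb] for u in img_emb]
```

Cosine similarity is used here because CLIP-style models compare normalized embeddings; if your model scores image-text pairs directly (for example, a cross-encoder or an image-text matching head), replace `similarity_matrix` with a loop over pairs that calls that scoring function instead.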
* If you want to evaluate your custom model on SPEC, you can follow the instructions in [this document](https://github.com/wjpoom/SPEC/blob/main/docs/evaluate_custom_model.md).

## :memo: TODO
- [ ] Release the newly built version of the dataset
- [ ] Release the code of our data synthesis pipeline
- [x] Release the testing set of SPEC benchmark
- [x] Release the evaluation code of SPEC

## :clap: Acknowledgement
Part of this repository is built upon [ARO](https://github.com/mertyg/vision-language-models-are-bows); thanks to its authors for the well-organized codebase.

## Contact Us
Feel free to contact us if you have any questions or suggestions.

Email (Wujian Peng): [email protected]
## :black_nib: Citation
If you use our code or data in this repo or find our work helpful, please consider citing it:
```
@inproceedings{spec2024,
title={Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding},
author={Peng, Wujian and Xie, Sicheng and You, Zuyao and Lan, Shiyi and Wu, Zuxuan},
booktitle={CVPR},
year={2024}
}
```