https://github.com/cvi-szu/qa-clims
[ACM MM 2023] QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
semantic-segmentation weakly-supervised-learning weakly-supervised-segmentation
- Host: GitHub
- URL: https://github.com/cvi-szu/qa-clims
- Owner: CVI-SZU
- License: mit
- Created: 2024-01-18T10:44:07.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-06-14T12:48:24.000Z (about 2 years ago)
- Last Synced: 2024-06-14T14:09:54.314Z (about 2 years ago)
- Topics: semantic-segmentation, weakly-supervised-learning, weakly-supervised-segmentation
- Language: Python
- Homepage:
- Size: 9.21 MB
- Stars: 10
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# [MM'23] QA-CLIMS
This is the official PyTorch implementation of our paper:
> **QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation**
> [Songhe Deng](https://github.com/Tiiiktak), Wei Zhuo, [Jinheng Xie](https://github.com/Sierkinhane), [Linlin Shen](https://scholar.google.com/citations?user=AZ_y9HgAAAAJ)
> Computer Vision Institute, Shenzhen University
> ACM International Conference on Multimedia, 2023
> [[Paper]](https://dl.acm.org/doi/10.1145/3581783.3612148) [[arXiv]](https://arxiv.org/abs/2401.09883)

## Environment
- Python 3.7
- PyTorch 1.7.1
- torchvision 0.8.2
```shell
pip install -r requirements.txt
```
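
Before a long training run, it can help to confirm that the interpreter and libraries match the versions listed above. The following is a minimal sketch and not part of this repository: `version_matches` and `report_environment` are hypothetical helpers, and the comparison assumes that local build suffixes such as `+cu110` can be ignored.

```python
import sys
from importlib import import_module

# Versions listed in the Environment section of this README.
EXPECTED = {"python": "3.7", "torch": "1.7.1", "torchvision": "0.8.2"}


def version_matches(installed: str, expected: str) -> bool:
    """Return True if `installed` starts with the components of `expected`.

    A build suffix after "+" is ignored, so "1.7.1+cu110" matches "1.7.1",
    while "1.7.10" does not.
    """
    base = installed.split("+", 1)[0]
    exp_parts = expected.split(".")
    return base.split(".")[: len(exp_parts)] == exp_parts


def report_environment(expected=EXPECTED):
    """Map each name to (installed version or None, whether it matches)."""
    report = {}
    for name, want in expected.items():
        if name == "python":
            have = "{}.{}.{}".format(*sys.version_info[:3])
        else:
            try:
                have = str(import_module(name).__version__)
            except (ImportError, AttributeError):
                have = None  # package missing or exposes no version string
        report[name] = (have, have is not None and version_matches(have, want))
    return report


if __name__ == "__main__":
    for name, (have, ok) in report_environment().items():
        print(f"{name}: installed={have} expected={EXPECTED[name]} ok={ok}")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones this README names.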
## PASCAL VOC2012
You can find the following files [here](https://drive.google.com/drive/folders/1U79Lmp-ufajPCUG7jAVyk924f9YmQSsA?usp=drive_link).
| File | Filename |
|:---------------------------|:-------------------------------------------------------------------------------|
| FG & BG VQA results | `voc_vqa_fg_blip.npy`<br>`voc_vqa_bg_blip.npy` |
| FG & BG VQA text features | `voc_vqa_fg_blip_ViT-L-14_cache.npy`<br>`voc_vqa_bg_blip_ViT-L-14_cache.npy` |
| pre-trained baseline model | `res50_cam.pth` |
| QA-CLIMS model | `res50_qa_clims.pth` |
### 1. Prepare VQA result features
You can download the VQA text features `voc_vqa_fg_blip_ViT-L-14_cache.npy` and `voc_vqa_bg_blip_ViT-L-14_cache.npy` listed above
and put them in `vqa/`.
Alternatively, you can generate them yourself:
To generate the VQA results, please follow [third_party/README](third_party/README.md#BLIP).
After that, run the following command to generate the VQA text features:
```shell
python gen_text_feats_cache.py voc \
--vqa_fg_file vqa/voc_vqa_fg_blip.npy \
--vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
--vqa_bg_file vqa/voc_vqa_bg_blip.npy \
--vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
--clip ViT-L/14
```
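
The cache filenames in the command above appear to follow a pattern: the VQA result name, then the CLIP model name with `/` replaced by `-`, then `_cache`. The sketch below encodes that inferred pattern so the argument list can be assembled programmatically. `cache_path` and `build_args` are hypothetical helpers, not functions from this repository, and the pattern is an assumption read off the filenames rather than taken from `gen_text_feats_cache.py`. Only the `voc` dataset argument is confirmed by this README.

```python
from pathlib import PurePosixPath


def cache_path(vqa_file: str, clip_name: str) -> str:
    """Derive the text-feature cache path from a VQA result path.

    Assumed pattern: <stem>_<clip name with "/" replaced by "-">_cache.npy
    """
    p = PurePosixPath(vqa_file)
    tag = clip_name.replace("/", "-")
    return str(p.with_name(f"{p.stem}_{tag}_cache.npy"))


def build_args(dataset: str, clip_name: str = "ViT-L/14", vqa_dir: str = "vqa"):
    """Assemble the argument list for the command shown above."""
    fg = f"{vqa_dir}/{dataset}_vqa_fg_blip.npy"
    bg = f"{vqa_dir}/{dataset}_vqa_bg_blip.npy"
    return [
        "python", "gen_text_feats_cache.py", dataset,
        "--vqa_fg_file", fg,
        "--vqa_fg_cache_file", cache_path(fg, clip_name),
        "--vqa_bg_file", bg,
        "--vqa_bg_cache_file", cache_path(bg, clip_name),
        "--clip", clip_name,
    ]


if __name__ == "__main__":
    # Prints the same command as above on a single line.
    print(" ".join(build_args("voc")))
```

The resulting list can be passed to `subprocess.run` from the repository root if you prefer to drive this step from Python.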
### 2. Train QA-CLIMS and generate initial CAMs
Please download the pre-trained baseline model `res50_cam.pth` above and put it at `cam-baseline-voc12/res50_cam.pth`.
```shell
bash run_voc12_qa_clims.sh
```
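
Because the instructions above place the downloaded files at fixed locations, a quick pre-flight check can catch a misplaced file before a long run starts. This is a minimal sketch: `missing_files` is a hypothetical helper, and the list of required paths is taken from steps 1 and 2 of this README rather than from the training script itself.

```python
from pathlib import Path

# Paths named in steps 1 and 2, relative to the repository root.
VOC_REQUIRED = [
    "vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy",
    "vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy",
    "cam-baseline-voc12/res50_cam.pth",
]


def missing_files(root, required=VOC_REQUIRED):
    """Return the required paths that do not exist as files under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All listed files are in place.")
```

Run it from the repository root; an empty result means every file listed in this README was found.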
### 3. Train IRNet and generate pseudo semantic masks
```shell
bash run_voc12_sem_seg.sh
```
### 4. Train DeepLab using pseudo semantic masks
Please follow [deeplab-pytorch](https://github.com/kazuto1011/deeplab-pytorch) or [CLIMS](https://github.com/CVI-SZU/CLIMS/tree/master/segmentation/deeplabv2).
## MS COCO2014
You can find the following files [here](https://drive.google.com/drive/folders/1U79Lmp-ufajPCUG7jAVyk924f9YmQSsA?usp=drive_link).
| File | Filename |
|:---------------------------|:---------------------------------------------------------------------------------|
| FG & BG VQA results | `coco_vqa_fg_blip.npy`<br>`coco_vqa_bg_blip.npy` |
| FG & BG VQA text features | `coco_vqa_fg_blip_ViT-L-14_cache.npy`<br>`coco_vqa_bg_blip_ViT-L-14_cache.npy` |
| pre-trained baseline model | `res50_cam.pth` |
| QA-CLIMS model | `res50_qa_clims.pth` |
Please place the downloaded `coco_vqa_fg_blip_ViT-L-14_cache.npy` and `coco_vqa_bg_blip_ViT-L-14_cache.npy`
in `vqa/`, and `res50_cam.pth` in `cam-baseline-coco14/`.
Then run the following commands:
```shell
bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh
```
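
The two scripts above are run one after the other. If you drive them from Python, it may be useful to stop as soon as one step fails rather than start the next. The following is a minimal sketch under that assumption: `run_in_order` is a hypothetical helper, not part of this repository, and the `runner` parameter exists only so the logic can be exercised without launching the real scripts.

```python
import subprocess

# The two commands from this section, in the order given.
COCO_STEPS = [
    ["bash", "run_coco14_qa_clims.sh"],
    ["bash", "run_coco14_sem_seg.sh"],
]


def run_in_order(steps, runner=subprocess.run):
    """Run each command in turn and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    done = []
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"Stopped: {' '.join(cmd)} exited with {result.returncode}")
            break
        done.append(cmd)
    return done
```

Calling `run_in_order(COCO_STEPS)` from the repository root behaves like joining the two shell commands with `&&`.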
## Citation
If you find this code useful for your research, please consider citing our paper:
```
@inproceedings{deng2023qa-clims,
title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={5572--5583},
year={2023}
}
```
---
This repository is heavily based on [CLIMS](https://github.com/CVI-SZU/CLIMS) and [IRNet](https://github.com/jiwoon-ahn/irn). Thanks to the authors for their great work!