{"id":13563495,"url":"https://github.com/LeapLabTHU/Pseudo-Q","last_synced_at":"2025-04-03T20:31:11.820Z","repository":{"id":59357299,"uuid":"469620485","full_name":"LeapLabTHU/Pseudo-Q","owner":"LeapLabTHU","description":"[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding","archived":false,"fork":false,"pushed_at":"2024-07-13T14:06:56.000Z","size":24056,"stargazers_count":141,"open_issues_count":0,"forks_count":10,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-08-01T13:29:21.154Z","etag":null,"topics":["computer-vision","cvpr2022","deep-learning","multimodal-deep-learning","pytorch","vision-and-language","visual-grounding"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2203.08481","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LeapLabTHU.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-14T07:15:58.000Z","updated_at":"2024-07-18T11:27:55.000Z","dependencies_parsed_at":"2024-01-14T03:49:11.528Z","dependency_job_id":"54dca018-8fda-467a-ac9d-54ee1c33f228","html_url":"https://github.com/LeapLabTHU/Pseudo-Q","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeapLabTHU%2FPseudo-Q","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeapLabTHU%2FPseudo-Q/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeapLabTHU%2FPseudo-Q/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/LeapLabTHU%2FPseudo-Q/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LeapLabTHU","download_url":"https://codeload.github.com/LeapLabTHU/Pseudo-Q/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223030592,"owners_count":17076461,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cvpr2022","deep-learning","multimodal-deep-learning","pytorch","vision-and-language","visual-grounding"],"created_at":"2024-08-01T13:01:19.923Z","updated_at":"2024-11-04T16:30:51.418Z","avatar_url":"https://github.com/LeapLabTHU.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Pseudo-Q\n\u003cp align=\"center\"\u003e \u003cimg src='docs/framework.png' align=\"center\" height=\"250px\"\u003e \u003c/p\u003e\n\nThis repository is the official PyTorch implementation of the CVPR 2022 paper **Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding**. (Primary Contact: [Haojun Jiang](https://github.com/jianghaojun))\n\n\u003ch3 align=\"center\"\u003e\nLinks: \u003ca href=\"https://arxiv.org/abs/2203.08481\"\u003earXiv\u003c/a\u003e | \u003ca href=\"https://cloud.tsinghua.edu.cn/f/e5f6df930e5d4b21ae27/\"\u003ePoster\u003c/a\u003e | \u003ca href=\"https://cloud.tsinghua.edu.cn/f/d655d6e2a6b246b4bb4f/\"\u003eVideo\u003c/a\u003e\n\u003c/h3\u003e\n\n**Please leave a \u003cfont color='orange'\u003eSTAR ⭐\u003c/font\u003e if you like this project!**\n\n## News\n- Update on 2022/03/15: Release the training code.  
\n- Update on 2022/06/02: Provide the poster and presentation video.\n- Update on 2022/06/04: Release the pseudo-query generation code.\n- **Update on 2022/08/25: Provide the detection results for all datasets.**\n\n## Reference\n\nIf you find our project useful in your research, please consider citing:\n\n```\n@inproceedings{jiang2022pseudoq,\n  title={Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding},\n  author={Jiang, Haojun and Lin, Yuanze and Han, Dongchen and Song, Shiji and Huang, Gao},\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n  year={2022}\n}\n```\n\n## Contents\n\n1. [Introduction](#introduction)\n2. [Usage](#usage)\n3. [Results](#results)\n4. [Contacts](#contacts)\n5. [Acknowledgments](#acknowledgments)\n\n## Introduction\nWe present a novel method, named **Pseudo-Q**, to automatically generate pseudo language queries for supervised training. Our method leverages an off-the-shelf object detector to identify visual objects in unlabeled images, and then obtains language queries for these objects in an unsupervised fashion with a pseudo-query generation module. Extensive experimental results demonstrate that our method has two notable benefits: **(1)** it can significantly reduce human annotation costs, e.g., by **31%** on RefCOCO, without degrading the original model's performance under the fully supervised setting, and **(2)** without bells and whistles, it achieves performance superior or comparable to state-of-the-art weakly-supervised visual grounding methods on all five datasets we experimented on. For more details, please refer to our paper.\n\n## Usage\n\n### Dependencies\n- Python 3.9.10\n- PyTorch 1.9.0 + cu111 + cp39\n- [Pytorch-Bert 0.6.2](https://pypi.org/project/pytorch-pretrained-bert/)\n- Check [requirements.txt](requirements.txt) for other dependencies. 
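
The versions listed above can also be checked programmatically before training. The snippet below is a minimal sketch and is not part of this repository: the helper names `installed_version` and `check_environment` are hypothetical, it assumes Python 3.8+ (for `importlib.metadata`), and it only compares the Python major.minor version (3.9), which is looser than the exact 3.9.10 listed.

```python
import sys
from importlib import metadata

# Versions listed in the Dependencies section of this README.
# Only major.minor is checked for Python (the README lists 3.9.10).
EXPECTED = {
    'python': '3.9',
    'torch': '1.9.0',
    'pytorch-pretrained-bert': '0.6.2',
}


def installed_version(dist_name):
    # Return the installed version of a distribution, or None if it is not installed.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    # Map each dependency to (found_version, matches_expected).
    report = {}
    py = '{}.{}'.format(*sys.version_info[:2])
    report['python'] = (py, py == EXPECTED['python'])
    for dist in ('torch', 'pytorch-pretrained-bert'):
        found = installed_version(dist)
        # torch wheels may carry a local build tag such as +cu111;
        # compare only the public version before the '+'.
        public = found.split('+')[0] if found else None
        report[dist] = (found, public == EXPECTED[dist])
    return report


if __name__ == '__main__':
    for name, (found, ok) in check_environment().items():
        print(f'{name}: found {found}, matches README: {ok}')
```

Running the script prints one line per dependency; `False` in the last field means the installed version differs from the one listed in this README. Because the local build tag (e.g. `+cu111`) is stripped before comparison, the CUDA variant of the torch wheel is not checked.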
\n\n\n### Data Preparation\n1. Download the images from their original sources and place them in the `./data/image_data` folder:\n- RefCOCO and ReferItGame\n- [Flickr30K Entities](http://shannon.cs.illinois.edu/DenotationGraph/#:~:text=make%20face-,Downloads,-Please%20fill%20in)\n\nThe `./data/image_data` folder should then have the following structure:\n\n```text\n|-- image_data\n   |-- data\n      |-- flickr\n      |-- gref\n      |-- gref_umd\n      |-- referit\n      |-- unc\n      |-- unc+\n   |-- Flickr30k\n      |-- flickr30k-images\n   |-- other\n      |-- images\n      |-- refcoco\n      |-- refcoco+\n      |-- refcocog\n   |-- referit\n      |-- images\n      |-- mask\n      |-- splits\n```\n- ```./data/image_data/data/xxx/```: Taking the Flickr30K dataset as an example, `./data/image_data/data/flickr/` should contain the dataset's validation/test annotations (bbox-query pairs, downloadable from [Gdrive](https://drive.google.com/file/d/1fVwdDvXNbH8uuq_pHD_o5HI7yqeuz0yS/view?usp=sharing)) and our generated pseudo-annotations (pseudo-samples) for this dataset. Uncompress the provided pseudo-sample files and put them in the corresponding folder.\n- ```./data/image_data/Flickr30k/flickr30k-images/```: Image data for the Flickr30K dataset. Fill in the form at this [link](http://shannon.cs.illinois.edu/DenotationGraph/#:~:text=make%20face-,Downloads,-Please%20fill%20in) to download the images.\n- ```./data/image_data/other/images/```: Image data for RefCOCO/RefCOCO+/RefCOCOg. \n- ```./data/image_data/referit/images/```: Image data for ReferItGame.\n- Note that the download links for the RefCOCO/RefCOCO+/RefCOCOg/ReferItGame data have recently become unavailable. 
**You can leave your email address in [Issues#2](https://github.com/LeapLabTHU/Pseudo-Q/issues/2) and I will send you a download link.**\n- ```./data/image_data/other/refcoco/, ./data/image_data/other/refcoco+/, ./data/image_data/other/refcocog/, ./data/image_data/referit/mask/, ./data/image_data/referit/splits/```: I followed TransVG to prepare the data, but these folders are actually not used in training.\n\n2. The generated pseudo region-query pairs can be downloaded from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/0c7ba8c1c0db40cfbea8/?dl=1), or you can generate them by following these [instructions](./pseudo_sample_generation/README.md).\n```\nmkdir -p data\nmv pseudo_samples.tar.gz ./data/\ntar -zxvf ./data/pseudo_samples.tar.gz -C ./data/\n```\nNote that to train the model with pseudo-samples for a given dataset, you should put the uncompressed pseudo-sample files under the corresponding folder ```./data/image_data/data/xxx/```. For example, put ```flickr_train_pseudo.pth``` under ```./data/image_data/data/flickr/```.\n\nTo generate pseudo-samples, we adopt the pretrained detector and attribute classifier from [Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering](https://arxiv.org/abs/1707.07998). A PyTorch implementation of this paper is available at [bottom-up-attention](https://github.com/MILVLG/bottom-up-attention.pytorch).\n\n\n### Pretrained Checkpoints\n1. Download the DETR checkpoints from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/580d602748174298880d/?dl=1) and move them to the [checkpoints](./checkpoints) directory.\n\n```\nmkdir -p checkpoints\nmv detr_checkpoints.tar.gz ./checkpoints/\ntar -zxvf ./checkpoints/detr_checkpoints.tar.gz -C ./checkpoints/\n```\n\n2. Checkpoints trained on our pseudo-samples can be downloaded from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/ebcfb88241ed45ea8115/?dl=1). 
You can evaluate the checkpoints by following the instructions below.\n\n```\nmv pseudoq_checkpoints.tar.gz ./checkpoints/\ntar -zxvf ./checkpoints/pseudoq_checkpoints.tar.gz -C ./checkpoints/\n```\n\n### Training and Evaluation\n\n1.  Training on RefCOCO. \n    ```\n    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port 28888 --use_env train.py --num_workers 8 --epochs 10 --batch_size 32 --lr 0.00025 --lr_bert 0.000025 --lr_visu_cnn 0.000025 --lr_visu_tra 0.000025 --lr_scheduler cosine --aug_crop --aug_scale --aug_translate --backbone resnet50 --detr_model checkpoints/detr-r50-unc.pth --bert_enc_num 12 --detr_enc_num 6 --dataset unc --max_query_len 20 --data_root ./data/image_data --split_root ./data/pseudo_samples/ --prompt \"find the region that corresponds to the description {pseudo_query}\" --output_dir ./outputs/unc/;\n    ```\n    *Note that if you use a smaller batch size, you should also use a smaller learning rate. The default learning rate is set for a total batch size of 256 (8 GPUs x 32).* \n    Please refer to [scripts/train.sh](scripts/train.sh) for training commands on other datasets. \n\n2.  Evaluation on RefCOCO.\n    ```\n    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port 28888 --use_env eval.py --num_workers 4 --batch_size 128 --backbone resnet50 --bert_enc_num 12 --detr_enc_num 6 --dataset unc --max_query_len 20 --data_root ./data/image_data --split_root ./data/pseudo_samples/ --eval_model ./checkpoints/unc_best_checkpoint.pth --eval_set testA --prompt \"find the region that corresponds to the description {pseudo_query}\" --output_dir ./outputs/unc/testA/;\n    ```\n    Please refer to [scripts/eval.sh](scripts/eval.sh) for evaluation commands on other splits or datasets.\n\n## Results\n\n**1. Visualization of Pseudo-samples.**\n\n   \u003cp align=\"center\"\u003e \u003cimg src='docs/vis_pesudo_sample.png' align=\"center\" height=\"200px\"\u003e \u003c/p\u003e\n\n**2. 
Experiments on Reducing the Manual Labeling Cost on RefCOCO.**\n\n   \u003cp align=\"center\"\u003e \u003cimg src='docs/reducing_cost.png' align=\"center\" height=\"200px\"\u003e \u003c/p\u003e\n\n**3. Results on RefCOCO/RefCOCO+/RefCOCOg.**\n\n   \u003cp align=\"center\"\u003e \u003cimg src='docs/result1.png' align=\"center\" height=\"250px\"\u003e \u003c/p\u003e\n\n**4. Results on ReferItGame/Flickr30K Entities.**\n\n   \u003cp align=\"center\"\u003e \u003cimg src='docs/result2.png' align=\"center\" height=\"300px\"\u003e \u003c/p\u003e\n\n**Please refer to our paper for more details.**\n\n## Contacts\njhj20 at mails dot tsinghua dot edu dot cn\n\nAny discussions or concerns are welcome!\n\n## Acknowledgments\nThis codebase is built on [TransVG](https://github.com/djiajunustc/TransVG), [bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention) and [Faster-R-CNN-with-model-pretrained-on-Visual-Genome](https://github.com/shilrley6/Faster-R-CNN-with-model-pretrained-on-Visual-Genome). Please consider citing or starring these projects.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLeapLabTHU%2FPseudo-Q","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLeapLabTHU%2FPseudo-Q","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLeapLabTHU%2FPseudo-Q/lists"}