# RSVG: Exploring Data and Model for Visual Grounding on Remote Sensing Data
##### Authors: Yang Zhan, Zhitong Xiong, Yuan Yuan
This is the official dataset for the paper **"RSVG: Exploring Data and Model for Visual Grounding on Remote Sensing Data"**, [Paper](https://ieeexplore.ieee.org/document/10056343).

**School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University**

## Please share a STAR ⭐ if this project helps you

## 📢 News
Released the DIOR-RSVG dataset.

**[2022/10/22]**: Published the manuscript on arXiv.

## 💬 Introduction
This repository contains the Multi-Granularity Visual Language Fusion (MGVLF) network, the PyTorch source code for the paper "RSVG: Exploring Data and Model for Visual Grounding on Remote Sensing Data". It is built on top of [TransVG](https://github.com/djiajunustc/TransVG) in PyTorch. MGVLF is a transformer-based method for visual grounding on remote sensing data (RSVG), and it achieves state-of-the-art performance on the RSVG task on our constructed DIOR-RSVG dataset.

### 📦DIOR-RSVG Dataset



### 📦Statistics of the Visual Grounding Dataset
| **Dataset** | **Train** | **Val** | **Test** | **Overall** |
|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------:|:------------:|:------------:|:------------:|
| Flickr30k [[Paper]](https://arxiv.org/abs/1505.04870) [[Code]](https://github.com/BryanPlummer/pl-clc) [[Website]](http://web.engr.illinois.edu/~bplumme2/Flickr30kEntities/) | 29783 (94%) | 1000 (3%) | 1000 (3%) | 31783 |
| ReferItGame [[Paper]](http://www.aclweb.org/anthology/D14-1086) [[Website]](http://tamaraberg.com/referitgame/) | 54127 (45%) | 5842 (5%) | 60103 (50%) | 120072 |
| RefCOCO [[Paper]](https://arxiv.org/pdf/1608.00272.pdf)[[Code]](https://github.com/lichengunc/refer) | 120624 (85%) | 10834 (7%) | 5657 (3%) | 142210 |
| RefCOCO+ [[Paper]](https://arxiv.org/pdf/1608.00272.pdf)[[Code]](https://github.com/lichengunc/refer) | 120191 (85%) | 10758 (7%) | 5726 (4%) | 141564 |
| GuessWhat [[Paper]](https://arxiv.org/abs/1611.08481) [[Code]](https://github.com/GuessWhatGame/guesswhat/) [[Website]](https://guesswhat.ai/#) | 70% | 15% | 15% | 100% |
| Cops-Ref [[Paper]](http://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Cops-Ref_A_New_Dataset_and_Task_on_Compositional_Referring_Expression_CVPR_2020_paper.pdf) [[Code]](https://github.com/zfchenUnique/Cops-Ref) | 119603 (80.5%) | 16524 (11%) | 12586 (8.5%) | 148713 |
| KB-Ref [[Paper]](https://arxiv.org/pdf/2006.01629) [[Code]](https://github.com/wangpengnorman/KB-Ref_dataset) | 31284 (72%) | 4000 (10%) | 8000 (18%) | 43284 |
| Ref-Reasoning [[Paper]](http://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_Graph-Structured_Referring_Expression_Reasoning_in_the_Wild_CVPR_2020_paper.pdf) [[Code]](https://github.com/sibeiyang/sgmn) [[Website]](https://sibeiyang.github.io/dataset/ref-reasoning/) | 721164 (91%) | 36183 (4.6%) | 34609 (4.4%) | 791956 |
| RSVG [[Paper]](https://dl.acm.org/doi/abs/10.1145/3503161.3548316) [[Website]](https://sunyuxi.github.io/publication/GeoVG) | 5505 (70%) | 1201 (15%) | 1227 (15%) | 7933 |
| **DIOR-RSVG** [[Paper]](https://ieeexplore.ieee.org/document/10056343) [[Dataset]](https://drive.google.com/drive/folders/1hTqtYsC6B-m4ED2ewx5oKuYZV13EoJp_) | **26991 (70%)** | **3829 (10%)** | **7500 (20%)** | **38320** |

### 🚀Network Architecture



## 👁️Requirements and Installation
We recommend the following dependencies; a quick version check is sketched after the list.
- Python 3.6.13
- PyTorch 1.9.0
- NumPy 1.19.2
- CUDA 11.1
- OpenCV 4.5.5
- torchvision
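
As a quick sanity check after installing these dependencies, a minimal script such as the one below (our own sketch, not part of the repository) can confirm that the interpreter and library versions match the list above:

```python
# check_env.py -- minimal environment sanity check (not part of this repository)
import sys

import cv2
import numpy as np
import torch
import torchvision

print("Python     :", sys.version.split()[0])        # expected ~3.6.13
print("PyTorch    :", torch.__version__)             # expected 1.9.0
print("torchvision:", torchvision.__version__)
print("NumPy      :", np.__version__)                # expected 1.19.2
print("OpenCV     :", cv2.__version__)               # expected 4.5.5
print("CUDA available:", torch.cuda.is_available())  # should be True with CUDA 11.1
```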

## 🔍Download Dataset
Download our constructed RSVG dataset files. We built the first large-scale dataset for RSVG, termed DIOR-RSVG, which can be downloaded from our [Google Drive](https://drive.google.com/drive/folders/1hTqtYsC6B-m4ED2ewx5oKuYZV13EoJp_?usp=sharing). The download link is also available below:
```
https://drive.google.com/drive/folders/1hTqtYsC6B-m4ED2ewx5oKuYZV13EoJp_?usp=sharing
```
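
If you prefer a scripted download, a sketch along the following lines may work. It relies on the third-party `gdown` package (`pip install gdown`), which is our assumption and not a dependency listed above; note that Google Drive limits automated folder downloads, so the manual link remains the most reliable route for the image folder.

```python
# download_dior_rsvg.py -- optional helper, assumes `pip install gdown`
import gdown

URL = "https://drive.google.com/drive/folders/1hTqtYsC6B-m4ED2ewx5oKuYZV13EoJp_?usp=sharing"

# Google Drive caps scripted folder downloads, so this may stop early for
# large subfolders; fall back to a manual download from the link above if so.
gdown.download_folder(url=URL, output="DIOR_RSVG", quiet=False)
```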

We expect the directory and file structure to be the following:
```
./                      # current (project) directory
├── data_loader.py      # Load data
├── main.py             # Main code for training, validation, and test
├── README.md
└── DIOR_RSVG/          # DIOR-RSVG dataset
    ├── Annotations/    # Query expressions and bounding boxes
    │   ├── 00001.xml
    │   └── ..some xml files..
    ├── JPEGImages/     # Remote sensing images
    │   ├── 00001.jpg
    │   └── ..some jpg files..
    ├── train.txt       # IDs of the training set (26991)
    ├── val.txt         # IDs of the validation set (3829)
    └── test.txt        # IDs of the test set (7500)
```
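
With this layout in place, the splits can be inspected with a short script like the one below. This is illustrative only: the exact XML schema of the annotation files (tag names such as `object`, `description`, `bndbox`) is an assumption on our part, so check one file and `data_loader.py` for the authoritative parsing logic.

```python
# inspect_dataset.py -- illustrative only; the XML tag names are assumptions,
# see data_loader.py for the authoritative parsing logic.
import glob
import os
import xml.etree.ElementTree as ET

DATA_ROOT = "DIOR_RSVG"

# Count the IDs listed for each split (expected: 26991 / 3829 / 7500).
for split in ("train", "val", "test"):
    with open(os.path.join(DATA_ROOT, f"{split}.txt")) as f:
        ids = [line.strip() for line in f if line.strip()]
    print(f"{split}: {len(ids)} IDs")

# Peek at one annotation file, assuming a VOC-style layout.
xml_path = sorted(glob.glob(os.path.join(DATA_ROOT, "Annotations", "*.xml")))[0]
root = ET.parse(xml_path).getroot()
for obj in root.iter("object"):
    expression = obj.findtext("description", default="")
    box = obj.find("bndbox")
    if box is not None:
        bbox = [box.findtext(k) for k in ("xmin", "ymin", "xmax", "ymax")]
        print(xml_path, "->", expression, bbox)
```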

## 📜Reference
If you find this code useful, please cite the paper. You are welcome to :+1:_`Fork and Star`_:+1: this repository, and we will let you know when we update it.
```
@ARTICLE{10056343,
  author={Zhan, Yang and Xiong, Zhitong and Yuan, Yuan},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data},
  year={2023},
  volume={61},
  number={},
  pages={1-13},
  doi={10.1109/TGRS.2023.3250471}
}
```

## 🙏Acknowledgments
Our DIOR-RSVG is constructed based on the [DIOR](http://www.escience.cn/people/JunweiHan/DIOR.html) remote sensing image dataset, and we thank its authors for releasing the dataset. Part of our code is borrowed from [TransVG](https://github.com/djiajunustc/TransVG), and we thank its authors for releasing their code. I would like to thank Xiong Zhitong and Yuan Yuan for their help with the manuscript. I also thank the School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University for supporting this work.