Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-vision-language-models-for-earth-observation
A curated list of awesome vision and language resources for earth observation.
https://github.com/geoaigroup/awesome-vision-language-models-for-earth-observation
Vision-Language Remote Sensing Datasets
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link - text paired dataset
- Link - text pairs in total, covering more than 29K distinct semantic tags
- Link - 4292/15/8/2139) | Size: 2,864 videos and 14,320 captions, where each video is paired with five unique captions
- Link
- Link
- Training_Set - v1.0, but the extremely small instances (less than 10 pixels) are also annotated. Moreover, a new category, "container crane", is added. Use: object detection in aerial images
- Link
- Link
- Link
- Link
- Link
- link - annotated captions and 936 visual question-answer pairs with rich information and open-ended questions and answers. Can be used for Image Captioning and Visual Question Answering tasks
- Link
- Link
- Link
- Link
- Link
- Link
- Link - 2 and Open Street Map; Use: Remote Sensing Visual Question Answering
- Link
- Link - resolution (15 cm); Platforms: Sentinel-2, BigEarthNet and Open Street Map; Use: Remote Sensing Visual Question Answering
- Link
- Link - question-answer triplets; a small part of RSIVQA is annotated by humans, and the rest is generated automatically from existing scene classification and object detection datasets; Use: Remote Sensing Visual Question Answering
- Link - image pairs; Resolution: 224 x 224; Platforms: UAV-DJI Mavic Pro quadcopters, after Hurricane Harvey; Use: Remote Sensing Visual Question Answering
- link - 5B; mean height of 633.0 pixels (up to 9,999) and mean width of 843.7 pixels (up to 19,687); Platforms: based on LAION-5B
- Link
- Link
- Link
- Link
- images_Link - Captions/blob/main/dataset_nwpu.json) | [Paper Link](https://ieeexplore.ieee.org/document/9866055/) | Size: 31,500 images with 157,500 sentences; Number of Classes: 45; Resolution: 256 x 256 pixels; Platforms: based on NWPU-RESISC45 dataset; Use: Remote Sensing Image Captioning
- Link - Text Retrieval
- Link
- Link - 4292/10/6/964) | Size: 2,100 images; Number of Classes: 21; Resolution: 256 x 256; Platforms: extension of the UC Merced dataset; Use: Remote Sensing Image Retrieval (RSIR), Classification and Semantic Segmentation
- Link - query pairs and 17,402 RS images; Number of Classes: 20; Resolution: 800 x 800; Platforms: DIOR dataset; Use: Remote Sensing Visual Grounding
- link - Scale_LanguageAware_Visual_Grounding_on_Remote_Sensing_Data) | Size: 25,452 images and 48,952 expressions in English and Chinese; Number of Classes: 14; Resolution: 800 x 800
- link
- Link
- Link - m to 1.2-m; Platforms: Google Earth and Baidu Map; Use: Remote Sensing Object Detection
- Link
- aircraft
- link - http://gpcv.whu.edu.cn/data/building_dataset.html | [Paper Link](https://arxiv.org/pdf/2208.00657v1.pdf) | Size: more than 220,000 independent buildings; Number of Classes: 1; Resolution: 0.075 m spatial resolution, covering 450 km² in Christchurch, New Zealand; Platforms: QuickBird, WorldView series, IKONOS, ZY-3, and 6 neighboring satellite images covering 550 km² in East Asia at 2.7 m ground resolution; Use: Remote Sensing Building Detection and Change Detection
- DOTA-v2.0 - v1.5, it further adds the new categories of "airport" and "helipad". Use: object detection in aerial images
- Training_Set - scale_Dataset_for_Instance_Segmentation_in_Aerial_Images_CVPRW_2019_paper.pdf) | Size: 2,806 images with 655,451 object instances; Number of Classes: 15; Resolution: high resolution; Platforms: DOTA dataset; Use: semantic segmentation or object detection
- link
- link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
Foundation Models
Image Captioning
Visual Question Answering
- paper - wang.github.io/homepage/EarthVQA) | AAAI 2024
- paper
- paper - wang.github.io/homepage/EarthVQA) | AAAI 2024
- paper - berlin.de/rsim/lit4rsvqa) | IEEE IGARSS
- paper - data/MQVQA) | IEEE TGRS
- paper
- paper - D-Wang/RSAdapter)
- paper
- paper
- paper - easy2hard) | IEEE TGRS
- paper
- paper - berlin.de/rsim/multi-modal-fusion-transformer-for-vqa-in-rs) | SPIE Image and Signal Processing for Remote Sensing
- paper
- paper
- paper
- paper
- paper
Visual Grounding
Related Repositories & Libraries
Text-Image Retrieval
- paper - pytorch) | ICMR'23
- paper - sensing-image-retrieval)
- paper
- paper
- paper
- paper
- paper
- paper
- paper - berlin.de/rsim/chnr) | IEEE ICIP
- paper
- paper
- paper
- paper
- paper
- paper - berlin.de/rsim/duch) | IEEE ICASSP
- paper
- paper
- paper
- paper - MultimediaPlus/PIR-pytorch) | ACM MM 2023 (Oral)
- paper
- paper
- paper
- paper