awesome-vision-language-models-for-earth-observation

A curated list of awesome vision and language resources for earth observation.
https://github.com/geoaigroup/awesome-vision-language-models-for-earth-observation


Vision-Language Remote Sensing Datasets
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link - text paired dataset
- Link - text pairs in total, covering more than 29K distinct semantic tags
- Link - 4292/15/8/2139) | Size: 2,864 videos and 14,320 captions, where each video is paired with five unique captions
- Link
- Link
- Link
- Link
- Training_Set - v1.0, but extremely small instances (less than 10 pixels) are also annotated. Moreover, a new category, "container crane", is added. Use: object detection in aerial images
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- link - annotated captions and 936 visual question-answer pairs with rich information and open-ended questions and answers. Use: image captioning and visual question answering
- Link
- Link
- Link
- Link
- Link
- Link - 2 and OpenStreetMap; Use: Remote Sensing Visual Question Answering
- Link
- Link - resolution (15 cm); Platforms: Sentinel-2, BigEarthNet and OpenStreetMap; Use: Remote Sensing Visual Question Answering
- Link
- Link - question-answer triplets. A small part of RSIVQA is annotated by humans; the rest is generated automatically from existing scene classification and object detection datasets. Use: Remote Sensing Visual Question Answering
- Link - image pairs; Resolution: 224 x 224; Platforms: UAV-DJI Mavic Pro quadcopters, after Hurricane Harvey; Use: Remote Sensing Visual Question Answering
- link - 5B mean height of 633.0 pixels (up to 9,999) and mean width of 843.7 pixels (up to 19,687); Platforms: based on LAION-5B
- Link
- Link
- Link
- Link
- images_Link - Captions/blob/main/dataset_nwpu.json) | [Paper Link](https://ieeexplore.ieee.org/document/9866055/) | Size: 31,500 images with 157,500 sentences; Number of Classes: 45; Resolution: 256 x 256 pixels; Platforms: based on the NWPU-RESISC45 dataset; Use: Remote Sensing Image Captioning
- Link - Text Retrieval
- Link
- Link - 4292/10/6/964) | Size: 2,100 images; Number of Classes: 21; Resolution: 256 x 256; Platforms: extension of the UC Merced dataset; Use: Remote Sensing Image Retrieval (RSIR), Classification and Semantic Segmentation
- Link - query pairs and 17,402 RS images; Number of Classes: 20; Resolution: 800 x 800; Platforms: DIOR dataset; Use: Remote Sensing Visual Grounding
- link - Scale_LanguageAware_Visual_Grounding_on_Remote_Sensing_Data) | Size: 25,452 images and 48,952 expressions in English and Chinese; Number of Classes: 14; Resolution: 800 x 800
- link
- Link
- Link - m to 1.2-m; Platforms: Google Earth and Baidu Map; Use: Remote Sensing Object Detection
- Link
- aircraft
- link - http://gpcv.whu.edu.cn/data/building_dataset.html | [Paper Link](https://arxiv.org/pdf/2208.00657v1.pdf) | Size: more than 220,000 independent buildings; Number of Classes: 1; Resolution: 0.075 m spatial resolution, covering 450 km² in Christchurch, New Zealand; Platforms: QuickBird, WorldView series, IKONOS, ZY-3, and 6 neighboring satellite images covering 550 km² in East Asia at 2.7 m ground resolution; Use: remote sensing building detection and change detection
- DOTA-v2.0 - v1.5, it further adds the new categories "airport" and "helipad". Use: object detection in aerial images
- Training_Set - scale_Dataset_for_Instance_Segmentation_in_Aerial_Images_CVPRW_2019_paper.pdf) | Size: 2,806 images with 655,451 object instances; Number of Classes: 15; Resolution: high resolution; Platforms: DOTA dataset; Use: semantic segmentation or object detection
- link
- link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
- Link
Text-Image Retrieval
- paper
- paper - pytorch) | ICMR'23
- paper - sensing-image-retrieval)
- paper
- paper
- paper
- paper
- paper
- paper
- paper - berlin.de/rsim/chnr) | IEEE ICIP
- paper
- paper
- paper
- Paper
- paper
- paper - berlin.de/rsim/duch) | IEEE ICASSP
- paper
- paper
- paper
- paper - MultimediaPlus/PIR-pytorch) | ACM MM 2023 (Oral)
- paper
- paper
- paper
Foundation Models
- paper
- paper - Sensing-ChatGPT)
- paper - nwpu/SkyEyeGPT)
- paper - oryx/geochat)
- paper
- paper
- paper
- paper
Image Captioning
- paper
- paper
- paper
- paper
- paper
- paper
- paper
- paper
- paper
- paper
- paper
- paper
- paper - Captions) | IEEE TGRS
- paper
- paper
- paper
- paper
- paper
- paper - berlin.de/rsim/SD-RSIC) | IEEE TGRS
- paper
- paper
- paper
- paper - Based Systems
- paper
- paper
- paper
- paper
- paper - Yang-Liu/RSCaMa)
- paper
Visual Question Answering
- paper - wang.github.io/homepage/EarthVQA) | AAAI 2024
- paper
- paper - wang.github.io/homepage/EarthVQA) | AAAI 2024
- paper - berlin.de/rsim/lit4rsvqa) | IEEE IGARSS
- paper - data/MQVQA) | IEEE TGRS
- paper
- paper - D-Wang/RSAdapter)
- paper
- paper
- paper - easy2hard) | IEEE TGRS
- paper
- paper - berlin.de/rsim/multi-modal-fusion-transformer-for-vqa-in-rs) | SPIE Image and Signal Processing for Remote Sensing
- paper
- paper
- paper
- paper
- paper
Visual Grounding
- paper
- paper
- paper - RSVG)
- paper
- paper - nwpu/RSVG-pytorch) | IEEE TGRS
Related Repositories & Libraries