Awesome-Remote-Sensing-Vision-Language-Models
https://github.com/lzw-lzw/awesome-remote-sensing-vision-language-models
Table of Contents
- code
- Vision-Language Models in Remote Sensing: Current Progress and Future Trends
- The Potential of Visual ChatGPT For Remote Sensing
- Brain-inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey
- RSGPT: A Remote Sensing Vision Language Model and Benchmark
- Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis
- Towards Automatic Satellite Images Captions Generation Using Large Language Models
- GeoChat: Grounded Large Vision-Language Model for Remote Sensing - oryx/geochat)
-
Image Captioning Dataset
- RS-5M - ai-lab/RS5M)|[[HuggingFace]](https://huggingface.co/datasets/Zilun/RS5M/viewer/Zilun--RS5M/train?row=0)
- RSICD
- DIOR-Captions
- UCM-Captions
- LEVIR-CC - Yang-Liu/RSICC)|[Google Drive](https://drive.google.com/drive/folders/1cEv-BXISfWjw1RTzL39uBojH7atjLdCG) |
- SkyScript
-
Pretraining
-
Image Captioning
- Can a Machine Generate Humanlike Language Descriptions for a Remote Sensing Image?
- Natural language description of remote sensing images based on deep learning
- Description Generation for Remote Sensing Images Using Attribute Attention Mechanism
- VAA: Visual aligning attention model for remote sensing image captioning
- Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning
- A multi-level attention model for remote sensing image captions
- Remote sensing image captioning via variational autoencoder and reinforcement learning - Based Systems 2020
- Truncation cross entropy loss for remote sensing image captioning
- Word–Sentence Framework for Remote Sensing Image Captioning
- A novel SVM-based decoder for remote sensing image captioning
- High-resolution remote sensing image captioning based on structured attention - Resolution-Remote-Sensing-Image-Captioning-Based-on-Structured-Attention)
- Exploring transformer and multilabel classification for remote sensing image captioning
- NWPU-Captions dataset and MLCA-Net for remote sensing image captioning
- Transforming remote sensing images to textual descriptions
- Remote-sensing image captioning based on multilayer aggregated transformer
- Multi-source interactive stair attention for remote sensing image captioning
- Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
- Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
-
Text-based Image Generation
- Retro-Remote Sensing: Generating Images From Ancient Texts - STARS 2019
- Remote sensing image augmentation based on text description for waterside change detection
- Text-to-remote-sensing-image generation with structured generative adversarial networks
- Txt2img-MHN: Remote sensing image generation from text using modern hopfield network - MHN)
-
Image-text Retrieval
- TextRS: Deep bidirectional triplet network for matching text to remote sensing images
- Deep unsupervised embedding for remote sensing image retrieval using textual cues
- A deep semantic alignment network for the cross-modal image-text retrieval in remote sensing - STARS 2021
- A lightweight multi-scale crossmodal text-image retrieval method in remote sensing
- Remote sensing cross-modal text-image retrieval based on global and local information
- Multilanguage transformer for improved text to remote sensing image retrieval - STARS 2022
- Contrasting dual transformer architectures for multi-modal remote sensing image retrieval
- Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
- Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval
-
Visual Question Answering
- How to find a good image-text embedding for remote sensing visual question answering? - PKDD 2021
- Cross-Modal Visual Question Answering for Remote Sensing Data: The International Conference on Digital Image Computing: Techniques and Applications
- Self-Paced Curriculum Learning for Visual Question Answering on Remote Sensing Data
- From easy to hard: Learning language-guided curriculum for visual question answering on remote sensing data - easy2hard)
- Language transformers for remote sensing visual question answering
- Open-ended remote sensing visual question answering with transformers
- Bi-modal transformer-based approach for visual question answering in remote sensing imagery
- Prompt-RSVQA: Prompting visual context to a language model for remote sensing visual question answering
- A spatial hierarchical reasoning network for remote sensing visual question answering
- Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images
- LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing - berlin.de/rsim/lit4rsvqa)
- Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs - roberts1/charting-new-territories)
-
Scene Classification
- Zero-shot scene classification for high spatial resolution remote sensing images
- Fine-grained object recognition and zero-shot learning in remote sensing imagery
- Structural alignment based zero-shot classification for remote sensing scenes
- A distance-constrained semantic autoencoder for zero-shot remote sensing scene classification - STARS 2021
- Learning deep crossmodal embedding networks for zero-shot remote sensing image scene classification
- Generative adversarial networks for zero-shot remote sensing scene classification
- APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP
-
Object Detection
-
Semantic Segmentation
- Semi-supervised contrastive learning for few-shot segmentation of remote sensing images
- Few-shot segmentation of remote sensing images using deep metric learning
- Language-aware domain generalization network for cross-scene hyperspectral image classification - BIT/IEEE_TGRS_LDGnet)
- RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model
- RRSIS: Referring Remote Sensing Image Segmentation
- CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
-
Scene Classification Dataset
- AID - whu.github.io/AID/)|[[OneDrive]](https://1drv.ms/u/s!AthY3vMZmuxChNR0Co7QHpJ56M-SvQ) [[BaiduYun]](https://pan.baidu.com/s/1mifOBv6#list/path=%2F)
- UC Merced Land-Use (UCM)
- SATIN - roberts1/SATIN)
- NWPU-RESISC45 - nwpu.github.io/#Datasets)|[[OneDrive]](https://1drv.ms/u/s!AmgKYzARBl5ca3HNaHIlzp_IXjs) [[BaiduYun]](https://pan.baidu.com/s/1mifR6tU)
-
Object Detection Dataset
- NWPU VHR-10 - nwpu.github.io/#Datasets)|[[OneDrive]](https://1drv.ms/u/s!AmgKYzARBl5cczaUNysmiFRH4eE) [[BaiduYun]](https://pan.baidu.com/s/1hqwzXeG#list/path=%2F)
- DIOR - nwpu.github.io/#Datasets)|[[Google Drive]](https://drive.google.com/drive/folders/1UdlgHk49iu6WpcJ5467iT-UqNPpx__CC) [[BaiduYun]](https://pan.baidu.com/s/1iLKT0JQoKXEJTGNxt5lSMg#list/path=%2F)
- FAIR1M - |[[BaiduYun]](https://pan.baidu.com/share/init?surl=alWnbCbucLOQJJhi4WsZAw?pwd=u2xg)
-
Semantic Segmentation Dataset
- GID - ytong.github.io/project/GID.html)|[[BaiduYun code:GID5]](https://pan.baidu.com/s/1_DQluiDgJ4Z7dXSnciVx1A#list/path=%2F) [[OneDrive]](https://whueducn-my.sharepoint.com/personal/xinyi_tong_whu_edu_cn/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fxinyi%5Ftong%5Fwhu%5Fedu%5Fcn%2FDocuments%2FGID&ga=1)
- Home
-
Visual Question Answering Dataset
-
Visual Grounding
-
Visual Grounding Dataset
- DIOR-RSVG - nwpu/RSVG-pytorch)|[[Google Drive]](https://drive.google.com/drive/folders/1hTqtYsC6B-m4ED2ewx5oKuYZV13EoJp_?usp=sharing)
-
Text-based Image Retrieval Dataset
- RSITMD - LmQX32PYxr0Q?pwd=NIST) [[Google Drive]](https://drive.google.com/file/d/1NJY86TAAUd8BVs7hyteImv8I2_Lh95W6/view?usp=sharing)
Categories
- Image Captioning: 18
- Visual Question Answering: 12
- Image-text Retrieval: 9
- Table of Contents: 8
- Scene Classification: 7
- Image Captioning Dataset: 6
- Semantic Segmentation: 6
- Text-based Image Generation: 4
- Visual Question Answering Dataset: 4
- Scene Classification Dataset: 4
- Pretraining: 3
- Object Detection Dataset: 3
- Object Detection: 2
- Semantic Segmentation Dataset: 2
- Visual Grounding Dataset: 1
- Text-based Image Retrieval Dataset: 1
- Visual Grounding: 1