awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection, and Referring Expression Comprehension. Updated frequently; pull requests are welcome.
https://github.com/Charles-Xie/awesome-described-object-detection
Grounding Datasets
Methods with Potential for DOD
- Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations (IJCV 2017) - modal tasks (including REC) |
- Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension (CVPR 2020) | [Github](https://github.com/zfchenUnique/Cops-Ref) | eval only | A variant of REC
- [paper - Open-Vocabulary)
- Generation and Comprehension of Unambiguous Object Descriptions (CVPR 2016) | [Github](https://github.com/mjhucla/Google_Refexp_toolbox) | train & eval | images from COCO
- witnessai/Awesome-Open-Vocabulary-Object-Detection - Vocabulary Object Detection papers.
- daqingliu/awesome-rec
- qy-feng/awesome-visual-grounding
- MarkMoHR/Awesome-Referring-Image-Segmentation
- TheShadow29/awesome-grounding
- BradyFU/Awesome-Multimodal-Large-Language-Models
- Advancing Visual Grounding With Scene Knowledge: Benchmark and Method (CVPR 2023) | [Github](https://github.com/zhjohnchan/SK-VG) | train & eval | scene knowledge in natural language is required
- Ferret: Refer and Ground Anything Anywhere at Any Granularity (arxiv 2023) - and-refer | - | [Github](https://github.com/apple/ml-ferret) | eval only | - |
- Kosmos-2: Grounding Multimodal Large Language Models to the World (arXiv 2023) | [Github](https://github.com/microsoft/unilm/tree/master/kosmos-2#grit-large-scale-training-corpus-of-grounded-image-text-pairs) [Huggingface](https://huggingface.co/datasets/zzliang/GRIT) | train only | created based on image-text pairs from a subset of COYO-700M and LAION-2B; 20.5M
- ReferItGame: Referring to Objects in Photographs of Natural Scenes (EMNLP 2014) | [Github](https://github.com/lichengunc/refer) | train & eval | images from COCO
Open-Vocabulary Object Detection
Methods with Potential for DOD
- [paper - DQUO)
- [paper - Research/T-Rex)
- [paper - OVD)
- [paper - ai-lab/OmDet)
- [paper - CVC/YOLO-World)
- [paper - lab/SIC-CADS)
- [paper - OVOD)
- [paper - Lab/CoDet)
- [paper - det)
- [paper - ai-lab/OVDEval)
- [paper - research/scenic/tree/main/scenic/projects/owl_vit)
- [paper - research/opera/tree/main/configs/dk-detr)
- [paper - Research/OpenSeeD)
- [paper - ovod)
- [paper - research/google-research/tree/master/fvlm/rovit)
- [paper - research/google-research/tree/master/fvlm) [[website]](https://sites.google.com/view/f-vlm/home)
- [paper - freiburg/locov)
- [paper - centric-ovd)
- [paper - PLM)
- [paper - DETR)
- [paper - OVD)
- [paper - cnn)
- [paper - DETR)
- [paper - research/scenic/tree/main/scenic/projects/owl_vit)
- [paper - DINO)
- [paper - Det)
Referring Expression Comprehension/Visual Grounding
Methods with Potential for DOD
- [paper - ai-lab/GroundVLP)
- [paper - VG)
- [paper - Sys/ONE-PEACE)
- [paper - VG)
- [paper - research.xyz/) [[code]](https://github.com/wl-zhao/VPD)
- [paper - io-inference)
- [paper - science/polygon-transformer) [[demo]](https://huggingface.co/spaces/koajoel/PolyFormer)
- [paper - Research/DQ-DETR)
- [paper - zhuh/SeqTR)
- [paper - Q)
- [paper - Sys/OFA)
- [paper - vision/RefTR)
- [paper - lab/LBYLNet)
- [paper - Cogrounding_semantic_attention)
- [paper - WeaklyGrounding.pytorch)
- [paper - ur/ReSC)
- [paper - ur/onestage_grounding)
- [paper - DGT)
Detection Datasets
Methods with Potential for DOD
- The PASCAL Visual Object Classes (VOC) Challenge (IJCV 2010) | train & eval | -
- Objects365: A Large-Scale, High-Quality Dataset for Object Detection (ICCV 2019)
- LVIS: A Dataset for Large Vocabulary Instance Segmentation (CVPR 2019) - dataset/lvis-api) | train & eval | long-tail; federated annotation; also used for OVD |
- Microsoft COCO: Common Objects in Context (ECCV 2014)
- Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | [Github](https://github.com/ZhangYuanhan-AI/Bamboo) | detector pretraining | built upon public datasets; 69M image classification annotations and 32M object bounding boxes
- BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training (CVPR 2022 Workshop) | [Github](https://github.com/amazon-science/bigdetection) | detector pretraining | -
Described Object Detection
Methods with Potential for DOD
- [paper - VLLM/LLaMA2-Accessory)![Star](https://img.shields.io/github/stars/Alpha-VLLM/LLaMA2-Accessory.svg?style=social&label=Star)
- [paper - Xuan/Pink)![Star](https://img.shields.io/github/stars/SY-Xuan/Pink.svg?style=social&label=Star)
- [paper - Demo) [[code]](https://github.com/yuhangzang/ContextDET)![Star](https://img.shields.io/github/stars/yuhangzang/ContextDET.svg?style=social&label=Star)
- [paper - VL-Chat-Demo/summary) [[code]](https://github.com/QwenLM/Qwen-VL)![Star](https://img.shields.io/github/stars/QwenLM/Qwen-VL.svg?style=social&label=Star)
- [paper - Research/GroundingDINO)![Star](https://img.shields.io/github/stars/IDEA-Research/GroundingDINO.svg?style=social&label=Star) (REC, OD, etc.)
- [paper - IIAU/UNINEXT)![Star](https://img.shields.io/github/stars/MasterBin-IIAU/UNINEXT.svg?style=social&label=Star) (REC, OVD, etc.)
- [paper - research/google-research/tree/master/findit)![Star](https://img.shields.io/github/stars/google-research/google-research.svg?style=social&label=Star) (REC, OD, etc.)
Datasets for DOD and Similar Tasks
Methods with Potential for DOD
- OmniLabel: A Challenging Benchmark for Language-Based Object Detection (ICCV 2023)
- Described Object Detection: Liberating Object Detection with Flexible Expressions (NeurIPS 2023) | [Github](https://github.com/shikras/d-cube) | eval only | -
- How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection (AAAI 2024) | [Github](https://github.com/om-ai-lab/OVDEval) | eval only | -