Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-described-object-detection

A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection, and Referring Expression Comprehension. Updated frequently; pull requests are welcome.
https://github.com/Charles-Xie/awesome-described-object-detection

Last synced: 3 days ago

Grounding Datasets
- Methods with Potential for DOD
  - Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations (IJCV 2017) - modal tasks (including REC) |
  - Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension (CVPR 2020) - | [Github](https://github.com/zfchenUnique/Cops-Ref) | eval only | A variant of REC |
  - [paper - Open-Vocabulary)
  - Generation and Comprehension of Unambiguous Object Descriptions (CVPR 2016) - | [Github](https://github.com/mjhucla/Google_Refexp_toolbox) | train & eval | images from COCO |
  - witnessai/Awesome-Open-Vocabulary-Object-Detection - Vocabulary Object Detection papers.
  - daqingliu/awesome-rec
  - qy-feng/awesome-visual-grounding
  - MarkMoHR/Awesome-Referring-Image-Segmentation
  - TheShadow29/awesome-grounding
  - BradyFU/Awesome-Multimodal-Large-Language-Models
  - Advancing Visual Grounding With Scene Knowledge: Benchmark and Method (CVPR 2023) - | [Github](https://github.com/zhjohnchan/SK-VG) | train & eval | scene knowledge in natural language is required |
  - Ferret: Refer and Ground Anything Anywhere at Any Granularity (arxiv 2023) - and-refer | - | [Github](https://github.com/apple/ml-ferret) | eval only | - |
  - Kosmos-2: Grounding Multimodal Large Language Models to the World (arxiv 2023) - | [Github](https://github.com/microsoft/unilm/tree/master/kosmos-2#grit-large-scale-training-corpus-of-grounded-image-text-pairs) [Huggingface](https://huggingface.co/datasets/zzliang/GRIT) | train only | created based on image-text pairs from a subset of COYO-700M and LAION-2B; 20.5M |
  - ReferItGame: Referring to Objects in Photographs of Natural Scenes (EMNLP 2014) - | [Github](https://github.com/lichengunc/refer) | train & eval | images from COCO |
Open-Vocabulary Object Detection
- Methods with Potential for DOD
  - [paper - DQUO)
  - [paper - Research/T-Rex)
  - [paper - OVD)
  - [paper - ai-lab/OmDet)
  - [paper - CVC/YOLO-World)
  - [paper - lab/SIC-CADS)
  - [paper - OVOD)
  - [paper - Lab/CoDet)
  - [paper - det)
  - [paper - ai-lab/OVDEval)
  - [paper - research/scenic/tree/main/scenic/projects/owl_vit)
  - [paper - research/opera/tree/main/configs/dk-detr)
  - [paper - Research/OpenSeeD)
  - [paper - ovod)
  - [paper - research/google-research/tree/master/fvlm/rovit)
  - [paper - research/google-research/tree/master/fvlm) [[website]](https://sites.google.com/view/f-vlm/home)
  - [paper - freiburg/locov)
  - [paper - centric-ovd)
  - [paper - PLM)
  - [paper - DETR)
  - [paper - OVD)
  - [paper - cnn)
  - [paper - DETR)
  - [paper - research/scenic/tree/main/scenic/projects/owl_vit)
  - [paper - DINO)
  - [paper - Det)
Referring Expression Comprehension/Visual Grounding
- Methods with Potential for DOD
  - [paper - ai-lab/GroundVLP)
  - [paper - VG)
  - [paper - Sys/ONE-PEACE)
  - [paper - VG)
  - [paper - research.xyz/) [[code]](https://github.com/wl-zhao/VPD)
  - [paper - io-inference)
  - [paper - science/polygon-transformer) [[demo]](https://huggingface.co/spaces/koajoel/PolyFormer)
  - [paper - Research/DQ-DETR)
  - [paper - zhuh/SeqTR)
  - [paper - Q)
  - [paper - Sys/OFA)
  - [paper - vision/RefTR)
  - [paper - lab/LBYLNet)
  - [paper - Cogrounding_semantic_attention)
  - [paper - WeaklyGrounding.pytorch)
  - [paper - ur/ReSC)
  - [paper - ur/onestage_grounding)
  - [paper - DGT)
Detection Datasets
- Methods with Potential for DOD
  - The PASCAL Visual Object Classes (VOC) Challenge (IJCV 2010) - | train & eval | - |
  - Objects365: A Large-Scale, High-Quality Dataset for Object Detection (ICCV 2019) - |
  - LVIS: A Dataset for Large Vocabulary Instance Segmentation (CVPR 2019) - dataset/lvis-api) | train & eval | long-tail; federated annotation; also used for OVD |
  - Microsoft COCO: Common Objects in Context (ECCV 2014)
  - Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy - | [Github](https://github.com/ZhangYuanhan-AI/Bamboo) | detector pretraining | build upon public datasets; 69M image classification annotations and 32M object bounding boxes |
  - BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training (CVPR 2022 Workshop) - | [Github](https://github.com/amazon-science/bigdetection) | detector pretraining | - |
Described Object Detection
- - [paper
  - [paper - mmlab/mmdetection/tree/main/configs/mm_grounding_dino)![Star](https://img.shields.io/github/stars/open-mmlab/mmdetection.svg?style=social&label=Star)
  - [paper - Enhanced-Negs)
- Methods with Potential for DOD
  - [paper - VLLM/LLaMA2-Accessory)![Star](https://img.shields.io/github/stars/Alpha-VLLM/LLaMA2-Accessory.svg?style=social&label=Star)
  - [paper - Xuan/Pink)![Star](https://img.shields.io/github/stars/SY-Xuan/Pink.svg?style=social&label=Star)
  - [paper - Demo) [[code]](https://github.com/yuhangzang/ContextDET)![Star](https://img.shields.io/github/stars/yuhangzang/ContextDET.svg?style=social&label=Star)
  - [paper - VL-Chat-Demo/summary) [[code]](https://github.com/QwenLM/Qwen-VL)![Star](https://img.shields.io/github/stars/QwenLM/Qwen-VL.svg?style=social&label=Star)
  - [paper - Research/GroundingDINO)![Star](https://img.shields.io/github/stars/IDEA-Research/GroundingDINO.svg?style=social&label=Star) (REC, OD, etc.)
  - [paper - IIAU/UNINEXT)![Star](https://img.shields.io/github/stars/MasterBin-IIAU/UNINEXT.svg?style=social&label=Star) (REC, OVD, etc.)
  - [paper - research/google-research/tree/master/findit)![Star](https://img.shields.io/github/stars/google-research/google-research.svg?style=social&label=Star) (REC, OD, etc.)
Datasets for DOD and Similar Tasks
- Methods with Potential for DOD
  - OmniLabel: A Challenging Benchmark for Language-Based Object Detection (ICCV 2023) - |
  - Described Object Detection: Liberating Object Detection with Flexible Expressions (NeurIPS 2023) - | [Github](https://github.com/shikras/d-cube) | eval only | - |
  - How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection (AAAI 2024) - | [Github](https://github.com/om-ai-lab/OVDEval) | eval only | - |
