Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
https://github.com/jianzongwu/Awesome-Open-Vocabulary
Methods: A Survey
Open Vocabulary Object Detection
- Grounded Language-Image Pre-training
- Open-Vocabulary Object Detection Using Captions
- Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
- RegionCLIP: Region-based Language-Image Pretraining
- Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
- Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation
- GLIPv2: Unifying Localization and VL Understanding
- Localized Vision-Language Matching for Open-vocabulary Object Detection
- Open-Vocabulary DETR with Conditional Matching
- Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
- PromptDet: Towards Open-vocabulary Detection using Uncurated Images
- Detecting Twenty-thousand Classes using Image-level Supervision
- Exploiting unlabeled data with vision and language models for object detection
- Simple Open-Vocabulary Object Detection with Vision Transformers
- Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
- Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
- P3OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
- Learning Object-Language Alignments for Open-Vocabulary Object Detection
- F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
- Learning to Detect and Segment for Open Vocabulary Object Detection
- Aligning Bag of Regions for Open-Vocabulary Object Detection
- Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
- CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
- Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
- Multi-Modal Classifiers for Open-Vocabulary Object Detection
- GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning
- Enhancing the Role of Context in Region-Word Alignment for Object Detection
- Open-Vocabulary Object Detection using Pseudo Caption Labels
- Three ways to improve feature alignment for open vocabulary detection
- Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection
- MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
- Open-Vocabulary Object Detection via Scene Graph Discovery
- Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection
- EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
- What Makes Good Open-Vocabulary Detector: A Disassembling Perspective
- CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
- DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection
- Taming Self-Training for Open-Vocabulary Object Detection
- CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
- Simple Image-level Classification Improves Open-vocabulary Object Detection
- ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
- CLIM: Contrastive Language-Image Mosaic for Region Representation
- LP-OVOD: Open-Vocabulary Object Detection by Linear Probing
- The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
- YOLO-World: Real-Time Open-Vocabulary Object Detection
- LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
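A recurring recipe across many of the detectors above (e.g. ViLD, RegionCLIP, F-VLM) is to replace the detector's fixed classifier head with text embeddings of category names, so novel classes can be scored at inference time without retraining. The sketch below illustrates that matching step only; the embedding dimension, function names, and the random vectors standing in for CLIP region/text features are illustrative, not taken from any specific paper:

```python
import numpy as np

def l2_normalize(x):
    # L2-normalize along the last axis so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def open_vocab_classify(region_feats, text_feats, temperature=0.01):
    """Score each region proposal against each class-name embedding.

    region_feats: (R, D) embeddings of detected region proposals.
    text_feats:   (C, D) embeddings of class-name prompts (e.g. "a photo of a {c}").
    Returns (R, C) softmax probabilities over the open vocabulary.
    """
    logits = l2_normalize(region_feats) @ l2_normalize(text_feats).T / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
regions = rng.standard_normal((5, 512))  # stand-in for region embeddings
classes = rng.standard_normal((3, 512))  # stand-in for text embeddings
probs = open_vocab_classify(regions, classes)
print(probs.shape)  # (5, 3); each row sums to 1
```

Because the "classifier" is just a matrix of prompt embeddings, extending the vocabulary amounts to appending rows to `text_feats`.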
Open Vocabulary Segmentation
- Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation
- Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation
- FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
- Language-driven Semantic Segmentation
- GroupViT: Semantic Segmentation Emerges from Text Supervision
- ZegFormer: Decoupling Zero-Shot Semantic Segmentation
- Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
- Extract Free Dense Labels from CLIP
- A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
- Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
- Perceptual Grouping in Contrastive Vision-Language Models
- SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation
- Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
- Generalized Decoding for Pixel, Image, and Language
- Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
- Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
- Side Adapter Network for Open-Vocabulary Semantic Segmentation
- A Simple Framework for Open-Vocabulary Segmentation and Detection
- Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
- CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
- Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
- Segment Everything Everywhere All at Once
- MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation
- TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels
- DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
- Diffusion Models for Zero-Shot Open-Vocabulary Segmentation
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
- Guiding Text-to-Image Diffusion Model Towards Grounded Generation
- Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
- Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
- Open-Vocabulary Panoptic Segmentation with MaskCLIP
- Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
- Open-vocabulary Panoptic Segmentation with Embedding Modulation
- OpenSD: Unified Open-Vocabulary Segmentation and Detection
- SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
- Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models
- Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
- Open-Vocabulary Segmentation with Semantic-Assisted Calibration
- Self-Guided Open-Vocabulary Semantic Segmentation
- CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
- CLIP-DINOiser: Teaching CLIP a few DINO tricks
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation
- In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
- Hierarchical Open-vocabulary Universal Image Segmentation
- OMG-Seg: Is One Model Good Enough For All Segmentation?
- Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
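Dense variants of the same idea underlie many of the segmentation methods above (e.g. LSeg, MaskCLIP-style dense prediction): every pixel embedding is matched against class-name prompt embeddings, and the per-pixel argmax yields an open-vocabulary label map. A minimal sketch with random arrays standing in for a CLIP-like dense encoder's output (shapes and names are illustrative):

```python
import numpy as np

def dense_open_vocab_segment(pixel_feats, text_feats):
    """Assign each pixel the open-vocabulary class with highest cosine similarity.

    pixel_feats: (H, W, D) dense image embeddings from a CLIP-like encoder.
    text_feats:  (C, D) class-name prompt embeddings.
    Returns an (H, W) integer class map.
    """
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    sims = p @ t.T               # (H, W, C) cosine similarities
    return sims.argmax(axis=-1)  # (H, W) predicted class indices

rng = np.random.default_rng(0)
mask = dense_open_vocab_segment(rng.standard_normal((8, 8, 64)),
                                rng.standard_normal((4, 64)))
print(mask.shape)  # (8, 8), values in {0, 1, 2, 3}
```

Much of the literature in this section is about making `pixel_feats` actually localize well (mask proposals, cost aggregation, self-distillation), since raw CLIP features are trained for image-level alignment.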
Open Vocabulary 3D Scene Understanding
- LidarCLIP or: How I Learned to Talk to Point Clouds
- PointCLIP: Point Cloud Understanding by CLIP
- CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training
- PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
- Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
- Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning
- Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
- CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
- Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection
- FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection
- OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
- PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
- CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP
- OpenScene: 3D Scene Understanding with Open Vocabularies
- CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP
- OpenMask3D: Open-Vocabulary 3D Instance Segmentation
- OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
- Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance
- OpenSU3D: Open World 3D Scene Understanding using Foundation Models
- UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation
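Several of the 3D methods above (OpenScene is the clearest example) obtain open-vocabulary point features by lifting dense 2D vision-language features onto the point cloud: each 3D point is projected into the views where it is visible, and the 2D features it lands on are averaged. A simplified sketch assuming the point-to-pixel correspondences are already computed (the function name and array layout are illustrative):

```python
import numpy as np

def lift_features_to_points(pixel_feats, rc, valid):
    """Aggregate 2D features onto 3D points by averaging over views.

    pixel_feats: (V, H, W, D) per-view dense 2D features.
    rc:          (V, N, 2) integer (row, col) pixel coordinates of each point.
    valid:       (V, N) bool mask, True where the point is visible in that view.
    Returns (N, D) per-point features (zero where a point is never visible).
    """
    V, N = valid.shape
    D = pixel_feats.shape[-1]
    acc = np.zeros((N, D))
    cnt = np.zeros((N, 1))
    for v in range(V):
        vis = valid[v]
        r, c = rc[v, vis, 0], rc[v, vis, 1]
        acc[vis] += pixel_feats[v, r, c]  # gather the 2D features hit by visible points
        cnt[vis] += 1
    return acc / np.maximum(cnt, 1)

feats = np.arange(1 * 4 * 4 * 3, dtype=float).reshape(1, 4, 4, 3)
rc = np.array([[[1, 2], [0, 0]]])       # point 0 projects to pixel (1, 2)
valid = np.array([[True, False]])       # point 1 is occluded in this view
pt = lift_features_to_points(feats, rc, valid)
print(pt.shape)  # (2, 3)
```

Once points carry CLIP-space features, the same text-embedding matching used in 2D applies unchanged to 3D segmentation or detection.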
Open Vocabulary Video Understanding
- Towards Open-Vocabulary Video Instance Segmentation
- OpenVIS: Open-vocabulary Video Instance Segmentation
- DVIS++: Improved Decoupled Framework for Universal Video Segmentation
- ActionCLIP: A New Paradigm for Video Action Recognition
- Prompting Visual-Language Models for Efficient Video Understanding
- Frozen CLIP Models are Efficient Video Learners
- Expanding Language-Image Pretrained Models for General Video Recognition
- Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
- AIM: Adapting Image Models for Efficient Video Action Recognition
- Fine-tuned CLIP Models are Efficient Video Learners
- Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization
- Video Action Recognition with Attentive Semantic Units
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
- VicTR: Video-conditioned Text Representations for Activity Recognition
- Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
- OVTrack: Open-Vocabulary Multiple Object Tracking
- AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
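The simplest baseline behind many of the video recognition works above (ActionCLIP and its successors build on it) is zero-shot action recognition with a frozen image-text encoder: embed each frame, mean-pool over time, and match the pooled video embedding against text prompts describing actions. A minimal sketch, with identity matrices standing in for real embeddings (names and shapes are illustrative):

```python
import numpy as np

def zero_shot_action(frame_feats, action_feats):
    """Classify a clip by mean-pooling frame embeddings and matching action prompts.

    frame_feats:  (T, D) per-frame embeddings from an image-text encoder.
    action_feats: (A, D) embeddings of action descriptions ("a video of {action}").
    Returns the index of the best-matching action by cosine similarity.
    """
    video = frame_feats.mean(axis=0)                  # temporal mean pooling
    video = video / np.linalg.norm(video)
    a = action_feats / np.linalg.norm(action_feats, axis=-1, keepdims=True)
    return int((a @ video).argmax())

# Toy check: five identical frames whose embedding equals action 2's embedding.
frames = np.tile(np.eye(4)[2], (5, 1))
actions = np.eye(4)
print(zero_shot_action(frames, actions))  # 2
```

Most papers in this section improve on this baseline by replacing the mean pooling with learned temporal modeling or by adapting the prompts, while keeping the text-matching classifier so the action vocabulary stays open.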
Related Domains and Beyond
Class-agnostic Detection and Segmentation
- Learning Open-World Object Proposals without Learning to Classify
- Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
- Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
- Class-agnostic object detection with multi-modal transformer
- Open World Entity Segmentation
- Fine-Grained Entity Segmentation
- SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning
Open-World Object Detection
- Towards Open World Recognition
- OW-DETR: Open-world Detection Transformer
- UC-OWOD: Unknown-Classified Open World Object Detection
- Revisiting Open World Object Detection
- Rectifying Open-set Object Detection: A Taxonomy, Practical Applications, and Proper Evaluation
- Open World DETR: Transformer based Open World Object Detection
- PROB: Probabilistic Objectness for Open World Object Detection
- Open World Object Detection in the Era of Foundation Models
- Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection
- Towards Open World Object Detection
Open-Set Panoptic Segmentation