Open-Vocabulary Panoptic Segmentation
- Host: GitHub
- URL: https://github.com/chenxi52/CMPF
- Owner: chenxi52
- Created: 2024-03-17T06:38:53.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-06-15T04:19:11.000Z (4 months ago)
- Last Synced: 2025-06-15T05:22:11.377Z (4 months ago)
- Topics: clip, instance-segmentation, multi-modal-learning, open-vocabulary, open-vocabulary-segmentation, open-vocabulary-semantic-segmentation, panoptic-segmentation, segment-anything, segmentation, vision-and-language, zero-shot
- Language: Python
- Homepage:
- Size: 1.12 MB
- Stars: 24
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# CMPF: Harmonizing Cross-Model Prior Fusion for Open-Vocabulary Segmentation
## Abstract
>Open-vocabulary segmentation poses significant challenges, as it requires segmenting and recognizing objects across an open set of categories in unconstrained environments. Building on the success of powerful vision-language (ViL) foundation models, such as CLIP, recent efforts have sought to harness their zero-shot capabilities to recognize unseen categories. Despite notable performance improvements, these models still struggle to generate and recognize precise mask proposals for unseen categories and scenarios, which ultimately results in inferior segmentation performance. To address this challenge, we introduce the Cross-Model Prior Fusion (CMPF) framework, which fuses visual knowledge from a localization foundation model (e.g., SAM) and text knowledge from a ViL model (e.g., CLIP), leveraging their complementary knowledge priors to overcome inherent limitations in mask proposal generation. Taking the ViL model's visual encoder as the feature backbone, we propose a Query Injector and a Feature Injector that inject the visual localization features into the learnable queries and the CLIP features, respectively, within a transformer decoder. In addition, an OpenSeg Ensemble strategy is designed to further improve mask quality by incorporating SAM's universal segmentation masks during inference. To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models and focus optimization solely on a lightweight transformer decoder for mask proposal generation, the performance bottleneck. Extensive experiments demonstrate that CMPF advances the state of the art across various segmentation benchmarks when trained exclusively on COCO panoptic data and tested in a zero-shot manner.
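
The README does not spell out the injector designs, so the snippet below is only a minimal, illustrative sketch of the idea described above, not the repository's actual implementation. The module names (`QueryInjector`, `FeatureInjector`), tensor shapes, and the choice of cross-attention and residual projection are all assumptions made for illustration; see the source code for the real architecture.

```python
# Illustrative sketch (assumed design, not the repository's code): a frozen SAM
# encoder supplies localization features that are injected into (i) the learnable
# queries and (ii) the frozen CLIP features inside a lightweight trainable decoder.
import torch
import torch.nn as nn


class QueryInjector(nn.Module):
    """Inject SAM localization cues into the learnable queries via cross-attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries: torch.Tensor, sam_tokens: torch.Tensor) -> torch.Tensor:
        # queries: [B, N, D] learnable mask queries; sam_tokens: [B, HW, D] SAM features
        out, _ = self.attn(queries, sam_tokens, sam_tokens)
        return self.norm(queries + out)


class FeatureInjector(nn.Module):
    """Fuse SAM features into the CLIP feature map with a residual projection."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, clip_tokens: torch.Tensor, sam_tokens: torch.Tensor) -> torch.Tensor:
        # clip_tokens, sam_tokens: [B, HW, D], already projected to a common dimension
        return self.norm(clip_tokens + self.proj(sam_tokens))


# Random tensors stand in for frozen CLIP / SAM encoder outputs.
B, N, HW, D = 2, 100, 64 * 64, 256
queries = torch.randn(B, N, D)
clip_tokens = torch.randn(B, HW, D)
sam_tokens = torch.randn(B, HW, D)

queries = QueryInjector(D)(queries, sam_tokens)             # localization-aware queries
pixel_feats = FeatureInjector(D)(clip_tokens, sam_tokens)   # SAM-enriched CLIP features

# Mask proposals in the Mask2Former style: dot product between queries and
# per-pixel features, reshaped back to a spatial grid.
mask_logits = torch.einsum("bnd,bpd->bnp", queries, pixel_feats).view(B, N, 64, 64)
print(mask_logits.shape)  # torch.Size([2, 100, 64, 64])
```

Only modules like these two injectors and the surrounding transformer decoder would be trained; SAM and CLIP stay frozen, which is what keeps the training overhead low.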

## Dependencies and Installation
See [installation instructions](INSTALL.md).

## Getting Started
See [Preparing Datasets](datasets/README.md) and [Getting Started](GETTING_STARTED.md).
## Models
All models are trained exclusively on COCO panoptic data; every benchmark other than COCO is evaluated zero-shot.

| Model | ADE20K (A-150) PQ | A-150 mAP | A-150 mIoU | A-150 FWIoU | Cityscapes PQ | Cityscapes mAP | Cityscapes mIoU | Mapillary Vistas PQ | Mapillary Vistas mIoU | BDD 100K PQ | BDD 100K mIoU | A-847 mIoU | A-847 FWIoU | PC-459 mIoU | PC-459 FWIoU | PAS-21 mIoU | PAS-21 FWIoU | LVIS APr | COCO (training dataset) PQ | COCO mAP | COCO mIoU | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMPF (ResNet50x64) | 23.1 | 13.5 | 30.7 | 56.6 | 45.2 | 28.9 | 56.0 | 18.1 | 27.7 | 12.9 | 46.2 | 11.8 | 52.8 | 18.7 | 60.1 | 82.3 | 92.1 | 23.5 | 55.7 | 47.4 | 65.4 | checkpoint |
| CMPF (ConvNeXt-Large) | 25.9 | 16.5 | 34.4 | 59.9 | 45.8 | 28.4 | 56.8 | 18.5 | 27.3 | 19.3 | 52.3 | 14.8 | 51.4 | 19.7 | 60.2 | 82.5 | 92.1 | 25.6 | 56.2 | 47.3 | 65.5 | checkpoint |
## Note
This repository serves as the official implementation for both **CMPF** and **FrozenSeg**, which are essentially the same work presented under different names.
> [FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation](https://arxiv.org/abs/2409.03525)

## Acknowledgement
[Detectron2](https://github.com/facebookresearch/detectron2), [Mask2Former](https://github.com/facebookresearch/Mask2Former), [Segment Anything](https://github.com/facebookresearch/segment-anything), [OpenCLIP](https://github.com/mlfoundations/open_clip) and [FC-CLIP](https://github.com/bytedance/fc-clip/tree/main).