# ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
**ODISE**: **O**pen-vocabulary **DI**ffusion-based panoptic **SE**gmentation exploits pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation.
It leverages the frozen representations of both these models to perform panoptic segmentation of any category in the wild.

This repository is the official implementation of ODISE, introduced in the paper:
[**Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models**](https://arxiv.org/abs/2303.04803)
[*Jiarui Xu*](https://jerryxu.net),
[*Sifei Liu*\*](https://research.nvidia.com/person/sifei-liu),
[*Arash Vahdat*\*](http://latentspace.cc/),
[*Wonmin Byeon*](https://wonmin-byeon.github.io/),
[*Xiaolong Wang*](https://xiaolonw.github.io/),
[*Shalini De Mello*](https://research.nvidia.com/person/shalini-de-mello)
CVPR 2023 Highlight. (\*equal contribution)

For business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/).

## Visual Results

See the [project page](https://jerryxu.net/ODISE/) for visual results.
## Links
* [Jiarui Xu's Project Page](https://jerryxu.net/ODISE/) (with additional visual results)
* [HuggingFace 🤗 Demo](https://huggingface.co/spaces/xvjiarui/ODISE)
* [arXiv Page](https://arxiv.org/abs/2303.04803)

## Citation
If you find our work useful in your research, please cite:
```bibtex
@article{xu2023odise,
title={{Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},
author={Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},
journal={arXiv preprint arXiv:2303.04803},
year={2023}
}
```

## Environment Setup
Install dependencies by running:
```bash
conda create -n odise python=3.9
conda activate odise
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone [email protected]:NVlabs/ODISE.git
cd ODISE
pip install -e .
```

(Optional) Install [xformers](https://github.com/facebookresearch/xformers) for an efficient transformer implementation. One could either install the pre-built version:

```bash
pip install xformers==0.0.16
```

or build from the latest source:
```bash
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
```

## Model Zoo
We provide two pre-trained models for ODISE trained with label or caption
supervision on [COCO's](https://cocodataset.org/#home) entire training set.
ODISE's pre-trained models are subject to the [Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode) terms.
Each model contains 28.1M trainable parameters.
The download links for these models are provided in the table below.
When you run the `demo/demo.py` or inference script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder `$HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/`.
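As a minimal sketch (the cache path comes from the note above; no particular checkpoint filename is assumed), you can check whether a model has already been cached before launching inference:

```python
from pathlib import Path

# Cache directory ODISE uses for its released checkpoints (see note above).
cache_dir = (
    Path.home()
    / ".torch" / "iopath_cache" / "NVlabs" / "ODISE"
    / "releases" / "download" / "v1.0.0"
)

# List any files already downloaded into the cache (empty if nothing cached yet).
cached = sorted(p.name for p in cache_dir.iterdir()) if cache_dir.is_dir() else []
print(f"{len(cached)} cached file(s) in {cache_dir}")
```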
Evaluation datasets: ADE20K (A-150), COCO, ADE20K-Full (A-847), Pascal Context 59 (PC-59), Pascal Context 459 (PC-459), and Pascal VOC 21 (PAS-21).

| Model | A-150 PQ | A-150 mAP | A-150 mIoU | COCO PQ | COCO mAP | COCO mIoU | A-847 mIoU | PC-59 mIoU | PC-459 mIoU | PAS-21 mIoU | Download |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| ODISE (label) | 22.6 | 14.4 | 29.9 | 55.4 | 46.0 | 65.2 | 11.1 | 57.3 | 14.5 | 84.6 | checkpoint |
| ODISE (caption) | 23.4 | 13.9 | 28.7 | 45.6 | 38.4 | 52.4 | 11.0 | 55.3 | 13.8 | 82.7 | checkpoint |
## Get Started
See [Preparing Datasets for ODISE](datasets/README.md).

See [Getting Started with ODISE](GETTING_STARTED.md) for detailed instructions on training and inference with ODISE.
## Demo

* Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the [web demo](https://huggingface.co/spaces/xvjiarui/ODISE).
* Run the demo on [Google Colab](https://colab.research.google.com/github/NVlabs/ODISE/blob/master/demo/demo.ipynb).
**Important Note**: When you run the `demo/demo.py` script for the very first time, besides ODISE's pre-trained models, it will also automatically download the pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt), from their original sources, to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively.
The pre-trained models for Stable Diffusion and CLIP are subject to their original license terms from [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [CLIP](https://github.com/openai/CLIP), respectively.

* To run ODISE's demo from the command line:
```shell
python demo/demo.py --input demo/examples/coco.jpg --output demo/coco_pred.jpg --vocab "black pickup truck, pickup truck; blue sky, sky"
```
The output is saved in `demo/coco_pred.jpg`. For more detailed options for `demo/demo.py` see [Getting Started with ODISE](GETTING_STARTED.md).
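The `--vocab` argument above is a single string in which semicolons separate categories and commas separate alternative names (synonyms) for the same category. A small illustrative parser (our own helper, not part of ODISE's API) shows the intended structure:

```python
def parse_vocab(vocab: str) -> list[list[str]]:
    """Split an ODISE-style vocabulary string into categories.

    Semicolons separate categories; commas separate synonyms
    within a category. Whitespace around names is trimmed.
    """
    categories = []
    for field in vocab.split(";"):
        synonyms = [name.strip() for name in field.split(",") if name.strip()]
        if synonyms:
            categories.append(synonyms)
    return categories

print(parse_vocab("black pickup truck, pickup truck; blue sky, sky"))
# → [['black pickup truck', 'pickup truck'], ['blue sky', 'sky']]
```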
* To run the [Gradio](https://github.com/gradio-app/gradio) demo locally:
```shell
python demo/app.py
```

## Acknowledgement
Code is largely based on [Detectron2](https://github.com/facebookresearch/detectron2), [Stable Diffusion](https://github.com/CompVis/stable-diffusion), [Mask2Former](https://github.com/facebookresearch/Mask2Former), [OpenCLIP](https://github.com/mlfoundations/open_clip) and [GLIDE](https://github.com/openai/glide-text2im).
Thank you all for the great open-source projects!