{"id":13488564,"url":"https://github.com/NVlabs/ODISE","last_synced_at":"2025-03-28T01:36:23.863Z","repository":{"id":131758138,"uuid":"605221864","full_name":"NVlabs/ODISE","owner":"NVlabs","description":"Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]","archived":false,"fork":false,"pushed_at":"2024-07-06T07:52:49.000Z","size":17199,"stargazers_count":838,"open_issues_count":31,"forks_count":45,"subscribers_count":40,"default_branch":"main","last_synced_at":"2024-08-01T18:38:03.971Z","etag":null,"topics":["deep-learning","diffusion-models","instance-segmentation","open-vocabulary","open-vocabulary-segmentation","open-vocabulary-semantic-segmentation","open-world-classification","open-world-object-detection","panoptic-segmentation","pytorch","semantic-segmentation","text-image-retrieval","zero-shot-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2303.04803","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-22T17:50:53.000Z","updated_at":"2024-07-31T03:45:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"b3ee0d98-880e-4c11-9419-e390421b85bf","html_url":"https://github.com/NVlabs/ODISE","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FODISE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FODISE/tags","releases_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FODISE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FODISE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVlabs","download_url":"https://codeload.github.com/NVlabs/ODISE/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","diffusion-models","instance-segmentation","open-vocabulary","open-vocabulary-segmentation","open-vocabulary-semantic-segmentation","open-world-classification","open-world-object-detection","panoptic-segmentation","pytorch","semantic-segmentation","text-image-retrieval","zero-shot-learning"],"created_at":"2024-07-31T18:01:18.103Z","updated_at":"2025-03-28T01:36:23.856Z","avatar_url":"https://github.com/NVlabs.png","language":"Python","funding_links":[],"categories":["Segmentation Detection Tracking"],"sub_categories":[],"readme":"# ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models\n\n**ODISE**: **O**pen-vocabulary **DI**ffusion-based panoptic **SE**gmentation exploits pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation.\nIt leverages the frozen representation of both these models to perform panoptic segmentation of any category in the wild. 
\n\nThis repository is the official implementation of ODISE introduced in the paper:\n\n[**Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models**](https://arxiv.org/abs/2303.04803)\n[*Jiarui Xu*](https://jerryxu.net),\n[*Sifei Liu**](https://research.nvidia.com/person/sifei-liu),\n[*Arash Vahdat**](http://latentspace.cc/),\n[*Wonmin Byeon*](https://wonmin-byeon.github.io/),\n[*Xiaolong Wang*](https://xiaolonw.github.io/),\n[*Shalini De Mello*](https://research.nvidia.com/person/shalini-de-mello)\nCVPR 2023 Highlight. (*equal contribution)\n\nFor business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/).\n\n![teaser](figs/github_arch.gif)\n\n## Visual Results\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"figs/github_vis_coco_0.gif\" width=\"32%\"\u003e\n\u003cimg src=\"figs/github_vis_ade_0.gif\" width=\"32%\"\u003e\n\u003cimg src=\"figs/github_vis_ego4d_0.gif\" width=\"32%\"\u003e\n\u003c/div\u003e\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"figs/github_vis_coco_1.gif\" width=\"32%\"\u003e\n\u003cimg src=\"figs/github_vis_ade_1.gif\" width=\"32%\"\u003e\n\u003cimg src=\"figs/github_vis_ego4d_1.gif\" width=\"32%\"\u003e\n\u003c/div\u003e\n\n\n## Links\n* [Jiarui Xu's Project Page](https://jerryxu.net/ODISE/) (with additional visual results)\n* [HuggingFace 🤗 Demo](https://huggingface.co/spaces/xvjiarui/ODISE)\n* [arXiv Page](https://arxiv.org/abs/2303.04803)\n\n## Citation\n\nIf you find our work useful in your research, please cite:\n\n```BiBTeX\n@article{xu2023odise,\n  title={{Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},\n  author={Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},\n  journal={arXiv preprint arXiv:2303.04803},\n  year={2023}\n}\n```\n\n## Environment Setup\n\nInstall dependencies by running:\n\n```bash\nconda create -n odise python=3.9\nconda 
activate odise\nconda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia\nconda install -c \"nvidia/label/cuda-11.6.1\" libcusolver-dev\ngit clone git@github.com:NVlabs/ODISE.git \ncd ODISE\npip install -e .\n```\n\n(Optional) install [xformers](https://github.com/facebookresearch/xformers) for efficient transformer implementation:\nOne could either install the pre-built version\n\n```\npip install xformers==0.0.16\n```\n\nor build from latest source \n\n```bash\n# (Optional) Makes the build much faster\npip install ninja\n# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types\npip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers\n# (this can take dozens of minutes)\n```\n\n## Model Zoo\n\nWe provide two pre-trained models for ODISE trained with label or caption \nsupervision on [COCO's](https://cocodataset.org/#home) entire training set.\nODISE's pre-trained models are subject to the [Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode) terms.\nEach model contains 28.1M trainable parameters.\nThe download links for these models are provided in the table below.\nWhen you run the `demo/demo.py` or inference script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder `$HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/`.\n\n\u003ctable\u003e\n\u003cthead\u003e\n  \u003ctr\u003e\n    \u003cth align=\"center\"\u003e\u003c/th\u003e\n    \u003cth align=\"center\" style=\"text-align:center\" colspan=\"3\"\u003eADE20K(A-150)\u003c/th\u003e\n    \u003cth align=\"center\" style=\"text-align:center\" colspan=\"3\"\u003eCOCO\u003c/th\u003e\n    \u003cth align=\"center\" style=\"text-align:center\"\u003eADE20K-Full \u003cbr\u003e (A-847)\u003c/th\u003e\n    \u003cth align=\"center\" 
style=\"text-align:center\"\u003ePascal Context 59 \u003cbr\u003e (PC-59)\u003c/th\u003e\n    \u003cth align=\"center\" style=\"text-align:center\"\u003ePascal Context 459 \u003cbr\u003e (PC-459)\u003c/th\u003e\n    \u003cth align=\"center\" style=\"text-align:center\"\u003ePascal VOC 21 \u003cbr\u003e (PAS-21) \u003c/th\u003e\n    \u003cth align=\"center\" style=\"text-align:center\"\u003edownload \u003c/th\u003e\n  \u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003ePQ\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emAP\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emIoU\u003c/td\u003e\n    \u003ctd align=\"center\"\u003ePQ\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emAP\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emIoU\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emIoU\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emIoU\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emIoU\u003c/td\u003e\n    \u003ctd align=\"center\"\u003emIoU\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"configs/Panoptic/odise_label_coco_50e.py\"\u003e ODISE (label) \u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e22.6\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e14.4\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e29.9\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e55.4\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e46.0\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e65.2\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e11.1\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e57.3\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e14.5\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e84.6\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_label_coco_50e-b67d2efc.pth\"\u003e 
checkpoint \u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"configs/Panoptic/odise_caption_coco_50e.py\"\u003e ODISE (caption) \u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e23.4\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e13.9\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e28.7\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e45.6\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e38.4\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e52.4\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e11.0\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e55.3\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e13.8\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e82.7\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_caption_coco_50e-853cc971.pth\"\u003e checkpoint \u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n## Get Started\nSee [Preparing Datasets for ODISE](datasets/README.md).\n\nSee [Getting Started with ODISE](GETTING_STARTED.md) for detailed instructions on training and inference with ODISE.\n## Demo\n\n* Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). 
Try out the web demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/xvjiarui/ODISE)\n\n* Run the demo on Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVlabs/ODISE/blob/master/demo/demo.ipynb)\n\n\n**Important Note**: When you run the `demo/demo.py` script for the very first time, besides ODISE's pre-trained models, it will also automatically download the pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt), from their original sources, to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively.\nThe pre-trained models for Stable Diffusion and CLIP are subject to their original license terms from [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [CLIP](https://github.com/openai/CLIP), respectively.\n\n* To run ODISE's demo from the command line:\n\n    ```shell\n    python demo/demo.py --input demo/examples/coco.jpg --output demo/coco_pred.jpg --vocab \"black pickup truck, pickup truck; blue sky, sky\"\n    ```\n    The output is saved in `demo/coco_pred.jpg`. 
For more detailed options for `demo/demo.py` see [Getting Started with ODISE](GETTING_STARTED.md).\n    \n  \n* To run the [Gradio](https://github.com/gradio-app/gradio) demo locally:\n    ```shell\n    python demo/app.py\n    ```\n\n## Acknowledgement\n\nCode is largely based on [Detectron2](https://github.com/facebookresearch/detectron2), [Stable Diffusion](https://github.com/CompVis/stable-diffusion), [Mask2Former](https://github.com/facebookresearch/Mask2Former), [OpenCLIP](https://github.com/mlfoundations/open_clip) and [GLIDE](https://github.com/openai/glide-text2im).\n\nThank you, all, for the great open-source projects!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FODISE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNVlabs%2FODISE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FODISE/lists"}