{"id":13488605,"url":"https://github.com/segments-ai/latent-diffusion-segmentation","last_synced_at":"2026-04-08T17:03:30.031Z","repository":{"id":217351908,"uuid":"743570218","full_name":"segments-ai/latent-diffusion-segmentation","owner":"segments-ai","description":"A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting [ECCV 2024]","archived":false,"fork":false,"pushed_at":"2024-01-30T09:22:10.000Z","size":4331,"stargazers_count":76,"open_issues_count":7,"forks_count":5,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-28T01:46:30.711Z","etag":null,"topics":["diffusion-models","generative-ai","latent-diffusion","panoptic-segmentation","pytorch","segmentation","stable-diffusion"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2401.10227","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/segments-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-15T14:15:48.000Z","updated_at":"2025-03-19T22:02:21.000Z","dependencies_parsed_at":"2024-01-15T23:23:24.913Z","dependency_job_id":"f04383a3-db79-46de-bc01-a9d60b35d1b0","html_url":"https://github.com/segments-ai/latent-diffusion-segmentation","commit_stats":null,"previous_names":["segments-ai/latent-diffusion-segmentation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/segments-ai/latent-diffusion-segmentation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/segments-ai%2Flatent-diffusion-segmentation","tags_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/segments-ai%2Flatent-diffusion-segmentation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/segments-ai%2Flatent-diffusion-segmentation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/segments-ai%2Flatent-diffusion-segmentation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/segments-ai","download_url":"https://codeload.github.com/segments-ai/latent-diffusion-segmentation/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/segments-ai%2Flatent-diffusion-segmentation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31564915,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","generative-ai","latent-diffusion","panoptic-segmentation","pytorch","segmentation","stable-diffusion"],"created_at":"2024-07-31T18:01:18.749Z","updated_at":"2026-04-08T17:03:29.995Z","avatar_url":"https://github.com/segments-ai.png","language":"Python","funding_links":[],"categories":["Segmentation Detection Tracking"],"sub_categories":[],"readme":"# A Simple Latent 
Diffusion Approach for Panoptic Segmentation and Mask Inpainting\n\nThis repo contains the PyTorch implementation of LDMSeg: a simple latent diffusion approach for panoptic segmentation and mask inpainting. The provided code includes both training and evaluation.\n\n\u003e [**A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting**](https://arxiv.org/abs/2401.10227)\n\u003e\n\u003e [Wouter Van Gansbeke](https://wvangansbeke.github.io/) and [Bert De Brabandere](https://scholar.google.be/citations?user=KcMb_7EAAAAJ)\n\u003cp align=\"left\"\u003e\n    \u003cimg src=\"assets/teaser.jpg\" width=500 /\u003e\n\n## Contents\n1. [Introduction](#-introduction)\n0. [Installation](#-installation)\n    - [Automatic Installation](#automatic-installation)\n    - [Manual Installation](#manual-installation)\n0. [Training](#-training)\n    - [Step 1: Train Auto-Encoder](#step-1-train-an-auto-encoder-on-panoptic-segmentation-maps)\n    - [Step 2: Train LDM](#step-2-train-an-ldm-for-panoptic-segmentation-conditioned-on-rgb-images)\n0. [Pretrained Models](#-pretrained-models)\n0. [Citation](#-citation)\n0. [License](#license)\n0. [Acknowledgements](#acknoledgements)\n\n\n## 📋 Introduction\nThis paper presents a conditional latent diffusion approach to tackle the task of panoptic segmentation.\nThe aim is to remove the need for specialized architectures (e.g., region proposal networks or object queries), complex loss functions (e.g., Hungarian matching or losses based on bounding boxes), and additional post-processing methods (e.g., clustering, NMS, or object pasting).\nTo this end, we rely on Stable Diffusion, which is a task-agnostic framework. The presented approach consists of two steps: (1) project the panoptic segmentation masks to a latent space with a shallow auto-encoder; (2) train a diffusion model in latent space, conditioned on RGB images.\n\n__Key Contributions__: Our contributions are threefold:\n\n1. 
__Generative Framework__: We propose a fully\ngenerative approach based on Latent Diffusion Models\n(LDMs) for panoptic segmentation. Our approach builds\nupon Stable Diffusion to strive for simplicity and to\nease compute. We first study the class-agnostic setup to liberate\npanoptic segmentation from predefined classes.\n2. __General-Purpose Design__: Our approach circumvents\nspecialized architectures, complex loss functions, and object\ndetection modules, present in the majority of prevailing\nmethods. Here, the denoising objective omits the necessity\nfor object queries, region proposals, and Hungarian\nmatching. This simple and general approach paves the way\nfor future extensions to a wide range of dense prediction\ntasks, e.g., depth prediction, saliency estimation, etc.\n3. __Mask Inpainting__: We successfully apply our approach to\nscene-centric datasets and demonstrate its mask inpainting\ncapabilities for different sparsity levels. \nThe approach shows promising results for global mask inpainting.\n\n## 🛠 Installation\n\nThe code runs with recent PyTorch versions, e.g., 2.0. \nFurther, you can create a Python environment with [Anaconda](https://docs.anaconda.com/anaconda/install/):\n```\nconda create -n LDMSeg python=3.11\nconda activate LDMSeg\n```\n### Automatic Installation\nWe recommend following the automatic installation (see `tools/scripts/install_env.sh`). Run the following commands to install the project in editable mode. Note that all dependencies will be installed automatically. 
\nAs this might not always work (e.g., due to CUDA or gcc issues), please have a look at the manual installation steps.\n\n```shell\npython -m pip install -e .\npip install git+https://github.com/facebookresearch/detectron2.git\npip install git+https://github.com/cocodataset/panopticapi.git\n```\n\n### Manual Installation\nThe most important packages can be quickly installed with pip as:\n```shell\npip install torch torchvision einops                            # Main framework\npip install diffusers transformers xformers accelerate timm     # For using pretrained models\npip install scipy opencv-python                                 # For augmentations or loss\npip install pyyaml easydict hydra-core                          # For using config files\npip install termcolor wandb                                     # For printing and logging\n```\nSee `data/environment.yml` for a copy of my environment. We also rely on some dependencies from [detectron2](https://detectron2.readthedocs.io/en/latest/tutorials/install.html) and [panopticapi](https://github.com/cocodataset/panopticapi). Please follow their docs.\n\n## 🗃️ Dataset\nWe currently support the [COCO](https://cocodataset.org/#download) dataset. Please follow the docs for installing the images and their corresponding panoptic segmentation masks. Also, take a look at the `ldmseg/data/` directory for a few examples on the COCO dataset. 
As a side note, the adopted structure should be fairly standard:\n```\n.\n└── coco\n    ├── annotations\n    ├── panoptic_semseg_train2017\n    ├── panoptic_semseg_val2017\n    ├── panoptic_train2017 -\u003e annotations/panoptic_train2017\n    ├── panoptic_val2017 -\u003e annotations/panoptic_val2017\n    ├── test2017\n    ├── train2017\n    └── val2017\n```\n\nLast but not least, change the paths in `configs/env/root_paths.yml` to your dataset root and your desired output directory respectively.\n\n## ⏳ Training\nThe presented approach is two-pronged: First, we train an auto-encoder to represent segmentation maps in a lower-dimensional space (e.g., 64x64). Next, we start from pretrained Latent Diffusion Models (LDM), particularly Stable Diffusion, to train a model which can generate panoptic masks from RGB images.\nThe models can be trained by running the following commands. By default, we train on the COCO dataset with the base config file defined in `tools/configs/base/base.yaml`. Note that this file will be automatically loaded as we rely on the `hydra` package.\n\n### Step 1: Train an Auto-Encoder on Panoptic Segmentation Maps\n```shell\npython -W ignore tools/main_ae.py \\\n    datasets=coco \\\n    base.train_kwargs.fp16=True \\\n    base.optimizer_name=adamw \\\n    base.optimizer_kwargs.lr=1e-4 \\\n    base.optimizer_kwargs.weight_decay=0.05\n```\nMore details on passing arguments can be found in `tools/scripts/train_ae.sh`. 
For example, I ran this model for 50k iterations on a single GPU of 23 GB with a total batch size of 16.\n\n### Step 2: Train an LDM for Panoptic Segmentation Conditioned on RGB Images\n```shell\npython -W ignore tools/main_ldm.py \\\n    datasets=coco \\\n    base.train_kwargs.gradient_checkpointing=True \\\n    base.train_kwargs.fp16=True \\\n    base.train_kwargs.weight_dtype=float16 \\\n    base.optimizer_zero_redundancy=True \\\n    base.optimizer_name=adamw \\\n    base.optimizer_kwargs.lr=1e-4 \\\n    base.optimizer_kwargs.weight_decay=0.05 \\\n    base.scheduler_kwargs.weight='max_clamp_snr' \\\n    base.vae_model_kwargs.pretrained_path='$AE_MODEL'\n```\n`$AE_MODEL` denotes the path to the model obtained from the previous step.\nMore details on passing arguments can be found in `tools/scripts/train_diffusion.sh`. For example, I ran this model for 200k iterations on 8 GPUs of 16 GB with a total batch size of 256. \n\n## 📊 Pretrained Models\n\nWe're planning to release several trained models. 
The (class-agnostic) PQ metric is provided on the COCO validation set.\n\n| Model                      |\\#Params | Dataset         | Iters     | PQ     | SQ | RQ | Download link                                                                                           |\n|----------------------------|-----|------------|------------|--------|---|---|---------------------------------------------------------------------------------------------------------|\n| [AE](#training) |  ~2M   | COCO              | 66k          | -      | - | -  | [Download](https://drive.google.com/file/d/1wmOGB-Ue47DPGFiPxiBFxHv1h5g5Zooe/view?usp=sharing)  (23 MB)  |\n| [LDM](#training) | ~800M   | COCO | 200k     | 51.7   | 82.0 |  63.0 | [Download](https://drive.google.com/file/d/1EKuOm_DnSGa0Ff-EkIl6Q1wknZxm4ygB/view?usp=sharing)  (3.3 GB)  |\n\nNote: A less powerful AE (i.e., fewer downsampling or upsampling layers) can often benefit inpainting, as we don't perform additional finetuning.\n\nThe evaluation command should look like:\n```shell\npython -W ignore tools/main_ldm.py \\\n    datasets=coco \\\n    base.sampling_kwargs.num_inference_steps=50 \\\n    base.eval_only=True \\\n    base.load_path=$PRETRAINED_MODEL_PATH\n```\nYou can add parameters if necessary. Higher thresholds such as `--base.eval_kwargs.count_th 700` or `--base.eval_kwargs.mask_th 0.9` can further boost the numbers. 
\nHowever, we use standard values by thresholding at 0.5 and removing segments with an area smaller than 512 for the evaluation.\n\nTo evaluate a pretrained model from above, run `tools/scripts/eval.sh`.\n\n\nHere, we visualize the results:\n\n\u003cp align=\"left\"\u003e\n    \u003cimg src=\"assets/predictions.jpg\" width=1000 /\u003e\n\n## 🪧 Citation\nIf you find this repository useful for your research, please consider citing the following paper:\n\n```bibtex\n@article{vangansbeke2024ldmseg,\n  title={A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting},\n  author={Van Gansbeke, Wouter and De Brabandere, Bert},\n  journal={arXiv preprint arXiv:2401.10227},\n  year={2024}\n}\n```\nFor any enquiries, please contact the [main author](https://github.com/wvangansbeke).\n\n## License\n\nThis software is released under a Creative Commons license that allows personal and research use only. For a commercial license, please contact the authors. You can view a license summary [here](http://creativecommons.org/licenses/by-nc/4.0/).\n\n\n## Acknoledgements\n\nI'm thankful for all the public repositories (see also references in the code), and in particular for the [detectron2](https://github.com/facebookresearch/detectron2) and [diffusers](https://github.com/huggingface/diffusers) libraries.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsegments-ai%2Flatent-diffusion-segmentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsegments-ai%2Flatent-diffusion-segmentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsegments-ai%2Flatent-diffusion-segmentation/lists"}