{"id":18678232,"url":"https://github.com/wvangansbeke/unsupervised-semantic-segmentation","last_synced_at":"2025-04-06T00:09:35.617Z","repository":{"id":40628684,"uuid":"337978427","full_name":"wvangansbeke/Unsupervised-Semantic-Segmentation","owner":"wvangansbeke","description":"Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]","archived":false,"fork":false,"pushed_at":"2022-06-14T03:13:55.000Z","size":1874,"stargazers_count":410,"open_issues_count":2,"forks_count":54,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-03-29T23:09:24.813Z","etag":null,"topics":["clustering","contrastive-learning","iccv","iccv2021","moco","pascal-voc","representation-learning","self-supervised-learning","semantic-segmentation","unsupervised-learning","unsupervised-semantic-segmentation"],"latest_commit_sha":null,"homepage":"https://openaccess.thecvf.com/content/ICCV2021/html/Van_Gansbeke_Unsupervised_Semantic_Segmentation_by_Contrasting_Object_Mask_Proposals_ICCV_2021_paper.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wvangansbeke.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-11T09:09:25.000Z","updated_at":"2025-02-14T08:29:39.000Z","dependencies_parsed_at":"2022-08-01T00:08:09.240Z","dependency_job_id":null,"html_url":"https://github.com/wvangansbeke/Unsupervised-Semantic-Segmentation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wvangansbeke%2FUnsupervised-Semantic-Segmentation","tags_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/wvangansbeke%2FUnsupervised-Semantic-Segmentation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wvangansbeke%2FUnsupervised-Semantic-Segmentation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wvangansbeke%2FUnsupervised-Semantic-Segmentation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wvangansbeke","download_url":"https://codeload.github.com/wvangansbeke/Unsupervised-Semantic-Segmentation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247415967,"owners_count":20935387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","contrastive-learning","iccv","iccv2021","moco","pascal-voc","representation-learning","self-supervised-learning","semantic-segmentation","unsupervised-learning","unsupervised-semantic-segmentation"],"created_at":"2024-11-07T09:36:31.542Z","updated_at":"2025-04-06T00:09:35.591Z","avatar_url":"https://github.com/wvangansbeke.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals\n\nThis repo contains the Pytorch implementation of our paper:\n\u003e [**Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals**](https://arxiv.org/pdf/2102.06191.pdf)\n\u003e\n\u003e [Wouter Van Gansbeke](https://twitter.com/WGansbeke), [Simon Vandenhende](https://twitter.com/svandenh1), [Stamatios Georgoulis](https://twitter.com/stam_g), and [Luc 
Van Gool](https://ee.ethz.ch/the-department/faculty/professors/person-detail.OTAyMzM=.TGlzdC80MTEsMTA1ODA0MjU5.html).\n\n- __Accepted at ICCV 2021 ([Slides](https://wvangansbeke.github.io/pdfs/unsupervised_segmentation.pdf)).__\n- 🏆 __SOTA for unsupervised semantic segmentation.__ \n- __Check out [Papers With Code](https://paperswithcode.com/paper/unsupervised-semantic-segmentation-by) for the [Unsupervised Semantic Segmentation](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-on-pascal-1?p=unsupervised-semantic-segmentation-by) benchmark and more details.__\n- __Check out our follow-up work [MaskDistill](https://github.com/wvangansbeke/MaskDistill) with improvements up to +11\\% mIoU on PASCAL VOC__. \n\u003cp align=\"left\"\u003e\n    \u003cimg src=\"images/teaser.png\" width=\"600\"/\u003e\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unsupervised-semantic-segmentation-by/unsupervised-semantic-segmentation-on-pascal-1)](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-on-pascal-1?p=unsupervised-semantic-segmentation-by) \n\n## Contents\n1. [Introduction](#introduction)\n0. [Installation](#installation)\n0. [Training MaskContrast](#training-maskcontrast)\n    - [Setup](#setup)\n    - [Train](#pre-train-model)\n0. [Evaluation](#evaluation)\n    - [Linear Classifier](#linear-classifier-lc)\n    - [Clustering](#clustering-k-means)\n    - [Semantic Segment Retrieval](#semantic-segment-retrieval)\n0. [Model Zoo](#model-zoo)\n0. [Citation](#citation)\n\n## Introduction\nBeing able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. 
We make a first attempt to tackle the problem on datasets that have been traditionally utilized for the supervised case (e.g. PASCAL VOC). To achieve this, we introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings.\nAdditionally, we argue about the importance of having a prior that contains information about objects, or their parts, and discuss several possibilities to obtain such a prior in an unsupervised manner. In particular, we adopt a mid-level visual prior to group pixels together and contrast the obtained object mask proposals. For this reason, we name the method __MaskContrast__. \n\n## Installation\nThe Python code is compatible with Pytorch version 1.4 (version 1.5 should work as well).\nAssuming [Anaconda](https://docs.anaconda.com/anaconda/install/), the most important packages can be installed as:\n```shell\nconda install pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.0 -c pytorch\nconda install -c conda-forge opencv           # For image transformations\nconda install matplotlib scipy scikit-learn   # For evaluation\nconda install pyyaml easydict                 # For using config files\nconda install termcolor                       # For colored print statements\n```\nWe refer to the `requirements.txt` file for an overview of the packages in the environment we used to produce our results.\nThe code was run on 2 Tesla V100 GPUs. \n\n## Training MaskContrast\n\n### Setup\nThe PASCAL VOC [dataset](https://drive.google.com/file/d/1pxhY5vsLwXuz6UHZVUKhtb7EJdCg2kuH/view?usp=sharing) will be downloaded automatically when running the code for the first time. The dataset includes the precomputed supervised and unsupervised saliency masks, following the implementation from the paper. 
\n\nThe following files (in the `pretrain/` and `segmentation/` directories) need to be adapted in order to run the code on your own machine:\n- Change the file path for the datasets in `data/util/mypath.py`. The PASCAL VOC dataset will be saved to this path.\n- Specify the output directory in `configs/env.yml`. All results will be stored under this directory. \n\n### Pre-train model\nThe training procedure consists of two steps. First, pixels are grouped together based upon a mid-level visual prior (saliency is used). Then, a pre-training strategy is proposed to contrast the pixel-embeddings of the obtained object masks. The code for the pre-training can be found in the `pretrain/` directory and the configuration files are located in the `pretrain/configs/` directory. You can choose to run the model with the masks from the supervised or unsupervised saliency model.\nFor example, run the following command to perform the pre-training step on PASCAL VOC with the supervised saliency model:\n```shell\ncd pretrain\npython main.py --config_env configs/env.yml --config_exp configs/VOCSegmentation_supervised_saliency_model.yml\n```\n\n## Evaluation\n### Linear Classifier (LC)\nWe freeze the weights of the pre-trained model and train a 1 x 1 convolutional layer to predict the class assignments from the generated feature representations. Since the discriminative power of a linear classifier is low, the pixel embeddings need to be informative of the semantic class to solve the task in this way. 
To train the classifier run the following command:\n```shell\ncd segmentation\npython linear_finetune.py --config_env configs/env.yml --config_exp configs/linear_finetune/linear_finetune_VOCSegmentation_supervised_saliency.yml\n```\nNote, make sure that the `pretraining` variable in `linear_finetune_VOCSegmentation_supervised_saliency.yml` points to the location of your pre-trained model.\nYou should get the following results:\n```\nmIoU is 63.95\nIoU class background is 90.95\nIoU class aeroplane is 83.78\nIoU class bicycle is 30.66\nIoU class bird is 78.79\nIoU class boat is 64.57\nIoU class bottle is 67.31\nIoU class bus is 84.24\nIoU class car is 76.77\nIoU class cat is 79.10\nIoU class chair is 21.24\nIoU class cow is 66.45\nIoU class diningtable is 46.63\nIoU class dog is 73.25\nIoU class horse is 62.61\nIoU class motorbike is 69.66\nIoU class person is 72.30\nIoU class pottedplant is 40.15\nIoU class sheep is 74.70\nIoU class sofa is 30.43\nIoU class train is 74.67\nIoU class tvmonitor is 54.66\n```\nUnsurprisingly, the model has not learned a good representation for every class since some classes are hard to distinguish, e.g. `chair` or `sofa`.\n\nWe visualize a few examples after CRF post-processing below.\n\u003cp align=\"left\"\u003e\n    \u003cimg src=\"images/examples.jpg\" width=\"950\"/\u003e\n\n### Clustering (K-means)\nThe feature representations are clustered with K-means. If the pixel embeddings are disentangled according to the defined class labels, we can match the predicted clusters with the ground-truth classes using the Hungarian matching algorithm. \n\n\n```shell\ncd segmentation\npython kmeans.py --config_env configs/env.yml --config_exp configs/kmeans/kmeans_VOCSegmentation_supervised_saliency.yml\n```\nRemarks: Note that we perform the complete K-means fitting on the validation set to save memory and that the reported results were averaged over 5 different runs. 
\nYou should get the following results (21 clusters):\n```\nIoU class background is 88.17\nIoU class aeroplane is 77.41\nIoU class bicycle is 26.18\nIoU class bird is 68.27\nIoU class boat is 47.89\nIoU class bottle is 56.99\nIoU class bus is 80.63\nIoU class car is 66.80\nIoU class cat is 46.13\nIoU class chair is 0.73\nIoU class cow is 0.10\nIoU class diningtable is 0.57\nIoU class dog is 35.93\nIoU class horse is 48.68\nIoU class motorbike is 60.60\nIoU class person is 32.24\nIoU class pottedplant is 23.88\nIoU class sheep is 36.76\nIoU class sofa is 26.85\nIoU class train is 69.90\nIoU class tvmonitor is 27.56\n```\n\n### Semantic Segment Retrieval\nWe examine our representations on PASCAL through segment retrieval. First, we compute a feature vector for every object mask in the `val` set by averaging the pixel embeddings within the predicted mask. Next, we retrieve the nearest neighbors on the `train_aug` set for each object.\n\n```shell\ncd segmentation\npython retrieval.py --config_env configs/env.yml --config_exp configs/retrieval/retrieval_VOCSegmentation_unsupervised_saliency.yml\n```\n\n| Method                    | MIoU (7 classes) | MIoU (21 classes)|\n| ------------------------- | ---------------- | ---------------- |\n| MoCo v2                   | 48.0             | 39.0             |\n| MaskContrast* (unsup sal.)| 53.4             | 43.3             |\n| MaskContrast* (sup sal.)  
| 62.3             | 49.6             |\n\n_\\* Denotes MoCo init._\n\n\n## Model Zoo\nDownload the pretrained and linear finetuned models here.\n\n| Dataset            | Pixel Grouping Prior    | mIoU (LC)     | mIoU (K-means)   |Download link |\n|------------------  | ----------------------  |---------------|---------  |--------------|\n| PASCAL VOC         |  Supervised Saliency    |   -           |   44.2    |[Pretrained Model 🔗](https://drive.google.com/file/d/1UkzAZMBG1U8kTqO3yhO2nTtoRNtEvyRq/view?usp=sharing) | \n| PASCAL VOC         |  Supervised Saliency    |   63.9 (65.5*)  |   44.2    |[Linear Finetuned 🔗](https://drive.google.com/file/d/1C2iv8wFV8MNLYLKw2E0Do2aeO-eaWNw3/view?usp=sharing)  |\n| PASCAL VOC         |  Unsupervised Saliency   |   -           |  35.0     |[Pretrained Model 🔗](https://drive.google.com/file/d/1efL1vWVcrGAqeC6OLalX8pwec41c6NZj/view?usp=sharing) |\n| PASCAL VOC         |  Unsupervised Saliency   |   58.4 (59.5*) |  35.0     |[Linear Finetuned 🔗](https://drive.google.com/file/d/1y-HZTHHTyAceiFDLAraLXooGOdyQqY2Z/view?usp=sharing)  |\n\n_\\* Denotes CRF post-processing._\n\nTo evaluate and visualize the predictions of the finetuned model, run the following command:\n```shell\ncd segmentation\npython eval.py --config_env configs/env.yml --config_exp configs/linear_finetune/linear_finetune_VOCSegmentation_supervised_saliency.yml --state-dict $PATH_TO_MODEL\n```\nYou can optionally append the `--crf-postprocess` flag. 
\n\n\n## Citation\nThis code is based on the [SCAN](https://github.com/wvangansbeke/Unsupervised-Classification) and [MoCo](https://github.com/facebookresearch/moco) repositories.\nIf you find this repository useful for your research, please consider citing the following paper(s):\n\n```bibtex\n@inproceedings{vangansbeke2020unsupervised,\n  title={Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals},\n  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},\n  booktitle={International Conference on Computer Vision},\n  year={2021}\n}\n@inproceedings{vangansbeke2020scan,\n  title={Scan: Learning to classify images without labels},\n  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Proesmans, Marc and Van Gool, Luc},\n  booktitle={European Conference on Computer Vision},\n  year={2020}\n}\n@inproceedings{he2019moco,\n  title={Momentum Contrast for Unsupervised Visual Representation Learning},\n  author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},\n  booktitle = {Conference on Computer Vision and Pattern Recognition},\n  year={2019}\n}\n```\nFor any enquiries, please contact the main authors.\n\nFor an overview of self-supervised learning, have a look at the [overview repository](https://github.com/wvangansbeke/Self-Supervised-Learning-Overview).\n\n## License\n\nThis software is released under a Creative Commons license which allows for personal and research use only. For a commercial license, please contact the authors. 
You can view a license summary [here](http://creativecommons.org/licenses/by-nc/4.0/).\n\n## Acknowledgements\nThis work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwvangansbeke%2Funsupervised-semantic-segmentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwvangansbeke%2Funsupervised-semantic-segmentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwvangansbeke%2Funsupervised-semantic-segmentation/lists"}