{"id":13736540,"url":"https://github.com/mhamilton723/STEGO","last_synced_at":"2025-05-08T12:33:09.619Z","repository":{"id":37716236,"uuid":"467142925","full_name":"mhamilton723/STEGO","owner":"mhamilton723","description":"Unsupervised Semantic Segmentation by Distilling Feature Correspondences","archived":false,"fork":false,"pushed_at":"2023-03-24T21:10:02.000Z","size":9440,"stargazers_count":717,"open_issues_count":46,"forks_count":145,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-10-14T09:38:14.012Z","etag":null,"topics":["computer-vision","deep-learning","iclr2022","pytorch","semantic-segmentation","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mhamilton723.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-03-07T15:06:43.000Z","updated_at":"2024-10-13T16:57:01.000Z","dependencies_parsed_at":"2024-01-06T14:11:58.389Z","dependency_job_id":"1fccb052-cd70-4ca7-af46-cac8c8bf0cdb","html_url":"https://github.com/mhamilton723/STEGO","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhamilton723%2FSTEGO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhamilton723%2FSTEGO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhamilton723%2FSTEGO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhamilton723%2FSTEGO/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/mhamilton723","download_url":"https://codeload.github.com/mhamilton723/STEGO/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253069105,"owners_count":21848922,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","iclr2022","pytorch","semantic-segmentation","unsupervised-learning"],"created_at":"2024-08-03T03:01:23.706Z","updated_at":"2025-05-08T12:33:07.853Z","avatar_url":"https://github.com/mhamilton723.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Python"],"sub_categories":[],"readme":"# STEGO: Unsupervised Semantic Segmentation by Distilling Feature Correspondences\n### [Project Page](https://mhamilton.net/stego.html) | [Paper](https://arxiv.org/abs/2203.08414) | [Video](https://aka.ms/stego-video) | [ICLR 2022](https://iclr.cc/virtual/2022/poster/6068) \n\n\t\n[Mark Hamilton](https://mhamilton.net/),\n[Zhoutong Zhang](https://ztzhang.info/),\n[Bharath Hariharan](http://home.bharathh.info/),\n[Noah Snavely](https://www.cs.cornell.edu/~snavely/),\n[William T. 
Freeman](https://billf.mit.edu/about/bio)\n\nThis is the official implementation of the paper \"Unsupervised Semantic Segmentation by Distilling Feature Correspondences\".\n\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mhamilton723/STEGO/blob/master/src/STEGO_Colab_Demo.ipynb) \\\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unsupervised-semantic-segmentation-by-2/unsupervised-semantic-segmentation-on)](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-on?p=unsupervised-semantic-segmentation-by-2)\\\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unsupervised-semantic-segmentation-by-2/unsupervised-semantic-segmentation-on-coco-4)](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-on-coco-4?p=unsupervised-semantic-segmentation-by-2) \\\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unsupervised-semantic-segmentation-by-2/unsupervised-semantic-segmentation-on-potsdam-1)](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-on-potsdam-1?p=unsupervised-semantic-segmentation-by-2)\n\n\n[![Overview Video](https://marhamilresearch4.blob.core.windows.net/stego-public/graphics/STEGO%20Header%20video%20(2).jpg)](https://youtu.be/NPub4E4o8BA)\n\n## Contents\n\u003c!--ts--\u003e\n   * [Install](#install)\n   * [Evaluation](#evaluation)\n   * [Training](#training)\n      * [Bringing your own data](#bringing-your-own-data)\n   * [Understanding STEGO](#understanding-stego)\n      * [Unsupervised Semantic Segmentation](#unsupervised-semantic-segmentation)\n      * [Deep features connect objects across images](#deep-features-connect-objects-across-images)\n      * [The STEGO architecture](#the-stego-architecture)\n      * [Results](#results)\n   * [Citation](#citation)\n   * [Contact](#contact)\n\u003c!--te--\u003e\n\n## Install\n\n### Clone this 
repository:\n```shell script\ngit clone https://github.com/mhamilton723/STEGO.git\ncd STEGO\n```\n\n### Install Conda Environment\nPlease visit the [Anaconda install page](https://docs.anaconda.com/anaconda/install/index.html) if you do not already have conda installed.\n\n```shell script\nconda env create -f environment.yml\nconda activate stego\n```\n\n### Download Pre-Trained Models\n\n```shell script\ncd src\npython download_models.py\n```\n\n### Download Datasets\n\nFirst, change the `pytorch_data_dir` variable to your \nsystem's PyTorch data directory, where datasets are stored. \n\n```shell script\npython download_datasets.py\n```\n\nOnce downloaded, please navigate to your PyTorch data directory and unzip the resulting files:\n\n```shell script\ncd /YOUR/PYTORCH/DATA/DIR\nunzip cocostuff.zip\nunzip cityscapes.zip\nunzip potsdam.zip\nunzip potsdamraw.zip\n```\n\n\n## Evaluation\n\nTo evaluate our pretrained models, please run the following in `STEGO/src`:\n```shell script\npython eval_segmentation.py\n```\nYou can change the evaluation parameters and model by editing [`STEGO/src/configs/eval_config.yml`](src/configs/eval_config.yml).\n\n## Training\n\nTo train STEGO from scratch, please first generate the KNN indices for the datasets of interest. Note that this requires generating a cropped dataset first, and you may need to modify `crop_datasets.py` to specify the dataset that you are cropping:\n\n```shell script\npython crop_datasets.py\npython precompute_knns.py\n```\n\nThen you can run the following in `STEGO/src`:\n```shell script\npython train_segmentation.py\n```\nHyperparameters can be adjusted in [`STEGO/src/configs/train_config.yml`](src/configs/train_config.yml).\n\nTo monitor training with TensorBoard, run the following from the `STEGO` directory:\n\n```shell script\ntensorboard --logdir logs\n```\n\n### Bringing your own data\n\nTo train STEGO on your own dataset, please create a directory in your PyTorch data root with the following structure. 
Note: if you do not have labels, omit the `labels` directory from the structure:\n\n```\ndataset_name\n|── imgs\n|   ├── train\n|   |   |── unique_img_name_1.jpg\n|   |   └── unique_img_name_2.jpg\n|   └── val\n|       |── unique_img_name_3.jpg\n|       └── unique_img_name_4.jpg\n└── labels\n    ├── train\n    |   |── unique_img_name_1.png\n    |   └── unique_img_name_2.png\n    └── val\n        |── unique_img_name_3.png\n        └── unique_img_name_4.png\n```\n\nNext, in [`STEGO/src/configs/train_config.yml`](src/configs/train_config.yml), set the following parameters:\n\n```yaml\ndataset_name: \"directory\"\ndir_dataset_name: \"dataset_name\"\ndir_dataset_n_classes: 5 # This is the number of object types to find\n```\n\nIf you want to train with cropping to increase spatial resolution, run our [cropping utility](src/crop_datasets.py).\n\nFinally, uncomment the custom dataset code and run `python precompute_knns.py`\n from `STEGO/src` to generate the prerequisite KNN information for the custom dataset.\n \nYou can now train on your custom dataset using:\n```shell script\npython train_segmentation.py\n```\n\n## Understanding STEGO\n\n### Unsupervised semantic segmentation\nReal-world images can be cluttered with multiple objects, making classification feel arbitrary. Furthermore, objects in the real world don't always fit in bounding boxes. Semantic segmentation methods aim to avoid these challenges by assigning each pixel of an image its own class label. Conventional semantic segmentation methods are notoriously difficult to train due to their dependence on densely labeled images, which can take 100x longer to create than bounding boxes or class annotations. This makes it hard to gather sizable and diverse datasets, and impossible in domains where humans don't know the structure a priori. 
We sidestep these challenges by learning an ontology of objects with pixel-level semantic segmentation through only self-supervision.\n\n### Deep features connect objects across images\nSelf-supervised contrastive learning enables algorithms to learn intelligent representations for images without supervision. STEGO builds on this work by showing that representations from self-supervised visual transformers like Caron et al.’s DINO are already aware of the relationships between objects. By computing the cosine similarity between image features, we can see that similar semantic regions such as grass, motorcycles, and sky are “linked” together by feature similarity.\n\n![Feature connection GIF](https://mhamilton.net/images/Picture3.gif)\n\n\n### The STEGO architecture\nThe STEGO unsupervised segmentation system learns by distilling correspondences between images into a set of class labels using a contrastive loss. In particular, we aim to learn a segmentation that respects the induced correspondences between objects. To achieve this, we train a shallow segmentation network on top of the DINO ViT backbone with three contrastive terms that distill connections between an image and itself, similar images, and random other images, respectively. If two regions are strongly coupled by deep features, we encourage them to share the same class.\n\n![Architecture](results/figures/stego.svg)\n\n### Results\n\nWe evaluate the STEGO algorithm on the CocoStuff, Cityscapes, and Potsdam semantic segmentation datasets. Because these methods see no labels, we use a Hungarian matching algorithm to find the best mapping between clusters and dataset classes. We find that STEGO is capable of segmenting complex and cluttered scenes with much higher spatial resolution and sensitivity than the prior art, [PiCIE](https://sites.google.com/view/picie-cvpr2021/home). This not only yields a substantial qualitative improvement, but also more than doubles the mean intersection over union (mIoU). 
For results on Cityscapes and Potsdam, see [our paper](https://arxiv.org/abs/2203.08414).\n\n![Cocostuff results](results/figures/cocostuff27_results.jpg)\n\n\n## Citation\n\n```\n@inproceedings{hamilton2022unsupervised,\n\ttitle={Unsupervised Semantic Segmentation by Distilling Feature Correspondences},\n\tauthor={Mark Hamilton and Zhoutong Zhang and Bharath Hariharan and Noah Snavely and William T. Freeman},\n\tbooktitle={International Conference on Learning Representations},\n\tyear={2022},\n\turl={https://openreview.net/forum?id=SaKO6z6Hl0c}\n}\n```\n\n## Contact\n\nFor feedback, questions, or press inquiries, please contact [Mark Hamilton](mailto:markth@mit.edu).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhamilton723%2FSTEGO","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmhamilton723%2FSTEGO","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhamilton723%2FSTEGO/lists"}