{"id":20648670,"url":"https://github.com/csailvision/semantic-segmentation-pytorch","last_synced_at":"2025-05-14T03:10:39.122Z","repository":{"id":37334422,"uuid":"106061907","full_name":"CSAILVision/semantic-segmentation-pytorch","owner":"CSAILVision","description":"Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset","archived":false,"fork":false,"pushed_at":"2024-01-15T14:26:43.000Z","size":5159,"stargazers_count":5011,"open_issues_count":79,"forks_count":1104,"subscribers_count":124,"default_branch":"master","last_synced_at":"2025-04-13T20:17:47.713Z","etag":null,"topics":["ade20k","pytorch","scene-recognition","semantic-segmentation"],"latest_commit_sha":null,"homepage":"http://sceneparsing.csail.mit.edu/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CSAILVision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-10-07T00:52:17.000Z","updated_at":"2025-04-12T21:43:40.000Z","dependencies_parsed_at":"2024-01-17T10:29:28.364Z","dependency_job_id":null,"html_url":"https://github.com/CSAILVision/semantic-segmentation-pytorch","commit_stats":{"total_commits":237,"total_committers":13,"mean_commits":18.23076923076923,"dds":"0.22784810126582278","last_synced_commit":"8f27c9b97d2ca7c6e05333d5766d144bf7d8c31b"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSAILVision%2Fsemantic-segmentation-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSAILVision%2Fsemantic-segme
ntation-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSAILVision%2Fsemantic-segmentation-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSAILVision%2Fsemantic-segmentation-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CSAILVision","download_url":"https://codeload.github.com/CSAILVision/semantic-segmentation-pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254059513,"owners_count":22007769,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ade20k","pytorch","scene-recognition","semantic-segmentation"],"created_at":"2024-11-16T17:09:46.295Z","updated_at":"2025-05-14T03:10:34.107Z","avatar_url":"https://github.com/CSAILVision.png","language":"Python","readme":"# Semantic Segmentation on MIT ADE20K dataset in PyTorch\n\nThis is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).\n\nADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by MIT Computer Vision team. 
Follow the link below to find the repository for our dataset and implementations on Caffe and Torch7:\nhttps://github.com/CSAILVision/sceneparsing\n\nIf you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu You can upload your own photo and parse it!\n\n[You can also use this colab notebook playground here](https://colab.research.google.com/github/CSAILVision/semantic-segmentation-pytorch/blob/master/notebooks/DemoSegmenter.ipynb) to tinker with the code for segmenting an image.\n\nAll pretrained models can be found at:\nhttp://sceneparsing.csail.mit.edu/model/pytorch\n\n\u003cimg src=\"./teaser/ADE_val_00000278.png\" width=\"900\"/\u003e\n\u003cimg src=\"./teaser/ADE_val_00001519.png\" width=\"900\"/\u003e\n[From left to right: Test Image, Ground Truth, Predicted Result]\n\nColor encoding of semantic categories can be found here:\nhttps://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing\n\n## Updates\n- HRNet model is now supported.\n- We use configuration files to store most options that were previously in the argument parser. The definitions of options are detailed in ```config/defaults.py```.\n- We conform to PyTorch practice in data preprocessing (RGB [0, 1], subtract mean, divide by std).\n\n\n## Highlights\n\n### Synchronized Batch Normalization on PyTorch\nThis module computes the mean and standard deviation across all devices during training. We empirically find that a reasonably large batch size is important for segmentation. We thank [Jiayuan Mao](http://vccy.xyz/) for his kind contributions; please refer to [Synchronized-BatchNorm-PyTorch](https://github.com/vacancy/Synchronized-BatchNorm-PyTorch) for details.\n\nThe implementation is easy to use because:\n- It is pure Python, with no extra C++ extension libraries.\n- It is completely compatible with PyTorch's implementation. 
Specifically, it uses unbiased variance to update the moving average, and uses sqrt(max(var, eps)) instead of sqrt(var + eps).\n- It is efficient, only 20% to 30% slower than UnsyncBN.\n\n### Dynamic scales of input for training with multiple GPUs\nFor the task of semantic segmentation, it is good to keep the aspect ratio of images during training. So we re-implement the `DataParallel` module and make it support distributing data to multiple GPUs as Python dicts, so that each GPU can process images of different sizes. At the same time, the dataloader also operates differently. \n\n\u003csup\u003e*Now the batch size of a dataloader always equals the number of GPUs*, and each element is sent to one GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which contradicts our goal that each worker maintains its own file list. So we use a trick: although the master process still gives the dataloader an index for the `__getitem__` function, we ignore that request and send a random batch dict. Also, *the multiple workers forked by the dataloader all have the same seed*, so multiple workers would yield exactly the same data if we used the above-mentioned trick directly. Therefore, we add one line of code that sets the default seed for `numpy.random` before activating multiple workers in the dataloader.\u003c/sup\u003e\n\n### State-of-the-Art models\n- **PSPNet** is a scene parsing network that aggregates global representation with the Pyramid Pooling Module (PPM). It is the winning model of the ILSVRC'16 MIT Scene Parsing Challenge. Please refer to [https://arxiv.org/abs/1612.01105](https://arxiv.org/abs/1612.01105) for details.\n- **UPerNet** is a model based on Feature Pyramid Network (FPN) and Pyramid Pooling Module (PPM). It doesn't need dilated convolution, an operator that is time-and-memory consuming. 
*Without bells and whistles*, it is comparable to or even better than PSPNet, while requiring much shorter training time and less GPU memory. Please refer to [https://arxiv.org/abs/1807.10221](https://arxiv.org/abs/1807.10221) for details.\n- **HRNet** is a recently proposed model that retains high resolution representations throughout the model, without the traditional bottleneck design. It achieves state-of-the-art performance on a series of pixel labeling tasks. Please refer to [https://arxiv.org/abs/1904.04514](https://arxiv.org/abs/1904.04514) for details.\n\n\n## Supported models\nWe split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling. We have provided some pre-configured models in the ```config``` folder.\n\nEncoder:\n- MobileNetV2dilated\n- ResNet18/ResNet18dilated\n- ResNet50/ResNet50dilated\n- ResNet101/ResNet101dilated\n- HRNetV2 (W48)\n\nDecoder:\n- C1 (one convolution module)\n- C1_deepsup (C1 + deep supervision trick)\n- PPM (Pyramid Pooling Module, see [PSPNet](https://hszhao.github.io/projects/pspnet) paper for details.)\n- PPM_deepsup (PPM + deep supervision trick)\n- UPerNet (Pyramid Pooling + FPN head, see [UPerNet](https://arxiv.org/abs/1807.10221) for details.)\n\n## Performance:\nIMPORTANT: The base ResNet in our repository is a customized version (different from the one in torchvision). 
The base models will be automatically downloaded when needed.\n\n\u003ctable\u003e\u003ctbody\u003e\n    \u003cth valign=\"bottom\"\u003eArchitecture\u003c/th\u003e\n    \u003cth valign=\"bottom\"\u003eMultiScale Testing\u003c/th\u003e\n    \u003cth valign=\"bottom\"\u003eMean IoU\u003c/th\u003e\n    \u003cth valign=\"bottom\"\u003ePixel Accuracy(%)\u003c/th\u003e\n    \u003cth valign=\"bottom\"\u003eOverall Score\u003c/th\u003e\n    \u003cth valign=\"bottom\"\u003eInference Speed(fps)\u003c/th\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eMobileNetV2dilated + C1_deepsup\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e34.84\u003c/td\u003e\u003ctd\u003e75.75\u003c/td\u003e\u003ctd\u003e54.07\u003c/td\u003e\n        \u003ctd\u003e17.2\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e33.84\u003c/td\u003e\u003ctd\u003e76.80\u003c/td\u003e\u003ctd\u003e55.32\u003c/td\u003e\n        \u003ctd\u003e10.3\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eMobileNetV2dilated + PPM_deepsup\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e35.76\u003c/td\u003e\u003ctd\u003e77.77\u003c/td\u003e\u003ctd\u003e56.27\u003c/td\u003e\n        \u003ctd\u003e14.9\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e36.28\u003c/td\u003e\u003ctd\u003e78.26\u003c/td\u003e\u003ctd\u003e57.27\u003c/td\u003e\n        \u003ctd\u003e6.7\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eResNet18dilated + C1_deepsup\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e33.82\u003c/td\u003e\u003ctd\u003e76.05\u003c/td\u003e\u003ctd\u003e54.94\u003c/td\u003e\n        \u003ctd\u003e13.9\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        
\u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e35.34\u003c/td\u003e\u003ctd\u003e77.41\u003c/td\u003e\u003ctd\u003e56.38\u003c/td\u003e\n        \u003ctd\u003e5.8\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eResNet18dilated + PPM_deepsup\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e38.00\u003c/td\u003e\u003ctd\u003e78.64\u003c/td\u003e\u003ctd\u003e58.32\u003c/td\u003e\n        \u003ctd\u003e11.7\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e38.81\u003c/td\u003e\u003ctd\u003e79.29\u003c/td\u003e\u003ctd\u003e59.05\u003c/td\u003e\n        \u003ctd\u003e4.2\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eResNet50dilated + PPM_deepsup\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e41.26\u003c/td\u003e\u003ctd\u003e79.73\u003c/td\u003e\u003ctd\u003e60.50\u003c/td\u003e\n        \u003ctd\u003e8.3\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e42.14\u003c/td\u003e\u003ctd\u003e80.13\u003c/td\u003e\u003ctd\u003e61.14\u003c/td\u003e\n        \u003ctd\u003e2.6\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eResNet101dilated + PPM_deepsup\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e42.19\u003c/td\u003e\u003ctd\u003e80.59\u003c/td\u003e\u003ctd\u003e61.39\u003c/td\u003e\n        \u003ctd\u003e6.8\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e42.53\u003c/td\u003e\u003ctd\u003e80.91\u003c/td\u003e\u003ctd\u003e61.72\u003c/td\u003e\n        \u003ctd\u003e2.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eUperNet50\u003c/td\u003e\n        
\u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e40.44\u003c/td\u003e\u003ctd\u003e79.80\u003c/td\u003e\u003ctd\u003e60.12\u003c/td\u003e\n        \u003ctd\u003e8.4\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e41.55\u003c/td\u003e\u003ctd\u003e80.23\u003c/td\u003e\u003ctd\u003e60.89\u003c/td\u003e\n        \u003ctd\u003e2.9\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eUperNet101\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e42.00\u003c/td\u003e\u003ctd\u003e80.79\u003c/td\u003e\u003ctd\u003e61.40\u003c/td\u003e\n        \u003ctd\u003e7.8\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e42.66\u003c/td\u003e\u003ctd\u003e81.01\u003c/td\u003e\u003ctd\u003e61.84\u003c/td\u003e\n        \u003ctd\u003e2.3\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003eHRNetV2\u003c/td\u003e\n        \u003ctd\u003eNo\u003c/td\u003e\u003ctd\u003e42.03\u003c/td\u003e\u003ctd\u003e80.77\u003c/td\u003e\u003ctd\u003e61.40\u003c/td\u003e\n        \u003ctd\u003e5.8\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eYes\u003c/td\u003e\u003ctd\u003e43.20\u003c/td\u003e\u003ctd\u003e81.47\u003c/td\u003e\u003ctd\u003e62.34\u003c/td\u003e\n        \u003ctd\u003e1.9\u003c/td\u003e\n    \u003c/tr\u003e\n\n\u003c/tbody\u003e\u003c/table\u003e\n\nThe training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory); the inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.\n\n## Environment\nThe code is developed under the following configurations.\n- Hardware: \u003e=4 GPUs for training, \u003e=1 GPU for testing (set ```[--gpus GPUS]``` accordingly)\n- Software: Ubuntu 16.04.3 LTS, ***CUDA\u003e=8.0, Python\u003e=3.5, PyTorch\u003e=0.4.0***\n- Dependencies: numpy, scipy, opencv, 
yacs, tqdm\n\n## Quick start: Test on an image using our trained model\n1. Here is a simple demo to do inference on a single image:\n```bash\nchmod +x demo_test.sh\n./demo_test.sh\n```\nThis script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves the predicted segmentation (.png) to the working directory.\n\n2. To test on an image or a folder of images (```$PATH_IMG```), you can simply do the following:\n```bash\npython3 -u test.py --imgs $PATH_IMG --gpu $GPU --cfg $CFG\n```\n\n## Training\n1. Download the ADE20K scene parsing dataset:\n```bash\nchmod +x download_ADE20K.sh\n./download_ADE20K.sh\n```\n2. Train a model by selecting the GPUs (```$GPUS```) and configuration file (```$CFG```) to use. During training, checkpoints are saved in the folder ```ckpt``` by default.\n```bash\npython3 train.py --gpus $GPUS --cfg $CFG\n```\n- To choose which GPUs to use, you can pass either a range, ```--gpus 0-7```, or a list, ```--gpus 0,2,4,6```.\n\nFor example, you can start with our provided configurations:\n\n* Train MobileNetV2dilated + C1_deepsup\n```bash\npython3 train.py --gpus $GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml\n```\n\n* Train ResNet50dilated + PPM_deepsup\n```bash\npython3 train.py --gpus $GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml\n```\n\n* Train UPerNet101\n```bash\npython3 train.py --gpus $GPUS --cfg config/ade20k-resnet101-upernet.yaml\n```\n\n3. You can also override options on the command line, for example ```python3 train.py TRAIN.num_epoch 10 ```.\n\n\n## Evaluation\n1. Evaluate a trained model on the validation set. 
Add ```VAL.visualize True``` to the arguments to output visualizations as shown in the teaser.\n\nFor example:\n\n* Evaluate MobileNetV2dilated + C1_deepsup\n```bash\npython3 eval_multipro.py --gpus $GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml\n```\n\n* Evaluate ResNet50dilated + PPM_deepsup\n```bash\npython3 eval_multipro.py --gpus $GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml\n```\n\n* Evaluate UPerNet101\n```bash\npython3 eval_multipro.py --gpus $GPUS --cfg config/ade20k-resnet101-upernet.yaml\n```\n\n## Integration with other projects\nThis library can be installed via `pip` to easily integrate with another codebase:\n```bash\npip install git+https://github.com/CSAILVision/semantic-segmentation-pytorch.git@master\n```\n\nNow this library can easily be consumed programmatically. For example:\n```python\nfrom mit_semseg.config import cfg\nfrom mit_semseg.dataset import TestDataset\nfrom mit_semseg.models import ModelBuilder, SegmentationModule\n```\n\n## Reference\n\nIf you find the code or pre-trained models useful, please cite the following papers:\n\nSemantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal of Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)\n\n    @article{zhou2018semantic,\n      title={Semantic understanding of scenes through the ade20k dataset},\n      author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},\n      journal={International Journal of Computer Vision},\n      year={2018}\n    }\n\nScene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. 
(http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)\n\n    @inproceedings{zhou2017scene,\n        title={Scene Parsing through ADE20K Dataset},\n        author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},\n        booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n        year={2017}\n    }\n    \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcsailvision%2Fsemantic-segmentation-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcsailvision%2Fsemantic-segmentation-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcsailvision%2Fsemantic-segmentation-pytorch/lists"}