{"id":13738011,"url":"https://github.com/microsoft/SimMIM","last_synced_at":"2025-05-08T15:32:12.669Z","repository":{"id":41173599,"uuid":"429351432","full_name":"microsoft/SimMIM","owner":"microsoft","description":"This is an official implementation for \"SimMIM: A Simple Framework for Masked Image Modeling\".","archived":false,"fork":false,"pushed_at":"2022-09-29T15:17:40.000Z","size":282,"stargazers_count":975,"open_issues_count":29,"forks_count":95,"subscribers_count":22,"default_branch":"main","last_synced_at":"2025-05-07T23:47:46.998Z","etag":null,"topics":["image-classification","masked-image-modeling","self-supervised-learning","swin-transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2111.09886","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null}},"created_at":"2021-11-18T08:26:55.000Z","updated_at":"2025-05-07T05:19:40.000Z","dependencies_parsed_at":"2022-07-14T09:22:29.525Z","dependency_job_id":null,"html_url":"https://github.com/microsoft/SimMIM","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FSimMIM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FSimMIM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FSimMIM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FSimMIM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/micro
soft","download_url":"https://codeload.github.com/microsoft/SimMIM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253096308,"owners_count":21853573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-classification","masked-image-modeling","self-supervised-learning","swin-transformer"],"created_at":"2024-08-03T03:02:08.849Z","updated_at":"2025-05-08T15:32:12.101Z","avatar_url":"https://github.com/microsoft.png","language":"Python","readme":"# SimMIM\n\nBy [Zhenda Xie](https://zdaxie.github.io)\\*, [Zheng Zhang](https://stupidzz.github.io/)\\*, [Yue Cao](http://yue-cao.me)\\*, [Yutong Lin](https://github.com/impiga), [Jianmin Bao](https://jianminbao.github.io/), [Zhuliang Yao](https://github.com/Howal), [Qi Dai](https://www.microsoft.com/en-us/research/people/qid/) and [Han Hu](https://ancientmooner.github.io/)\\*.\n\nThis repo is the official implementation of [\"SimMIM: A Simple Framework for Masked Image Modeling\"](https://arxiv.org/abs/2111.09886).\n\n## Updates\n\n***09/29/2022***\n\nSimMIM was merged into the [Swin Transformer repo on GitHub](https://github.com/microsoft/Swin-Transformer).\n\n***03/02/2022***\n\nSimMIM was accepted to CVPR 2022. SimMIM was used in [\"Swin Transformer V2\"](https://github.com/microsoft/Swin-Transformer) to alleviate the data-hungry problem in large-scale vision model training.\n\n***12/09/2021***\n\nInitial commits:\n\n1. Pre-trained and fine-tuned models on ImageNet-1K (`Swin Base`, `Swin Large`, and `ViT Base`) are provided.\n2. 
Code for ImageNet-1K pre-training and fine-tuning is provided.\n\n## Introduction\n\n**SimMIM** was initially described in [arxiv](https://arxiv.org/abs/2111.09886) and serves as a\nsimple framework for masked image modeling. Through systematic study, we find that simple designs of each component yield very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a strong pretext task; 2) predicting raw pixels of RGB values by direct regression performs no worse than the patch classification approaches with complex designs; 3) the prediction head can be as light as a linear layer, with no worse performance than heavier ones.\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"figures/teaser.jpg\" height=\"250px\" /\u003e\n\u003c/div\u003e\n\n## Main Results on ImageNet\n\n### Swin Transformer\n\n**ImageNet-1K Pre-trained and Fine-tuned Models**\n\n| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| Swin-Base | 100 | 192x192 | 192x192 | 82.8 | [google](https://drive.google.com/file/d/1Wcbr66JL26FF30Kip9fZa_0lXrDAKP-d/view?usp=sharing)/[config](configs/swin_base__100ep/simmim_pretrain__swin_base__img192_window6__100ep.yaml) | [google](https://drive.google.com/file/d/1RsgHfjB4B1ZYblXEQVT-FPX3WSvBrxcs/view?usp=sharing)/[config](configs/swin_base__100ep/simmim_finetune__swin_base__img192_window6__100ep.yaml) |\n| Swin-Base | 100 | 192x192 | 224x224 | 83.5 | [google](https://drive.google.com/file/d/1Wcbr66JL26FF30Kip9fZa_0lXrDAKP-d/view?usp=sharing)/[config](configs/swin_base__100ep/simmim_pretrain__swin_base__img192_window6__100ep.yaml) | [google](https://drive.google.com/file/d/1mb43BkW56F5smwiX-g7QUUD7f1Rftq8u/view?usp=sharing)/[config](configs/swin_base__100ep/simmim_finetune__swin_base__img224_window7__100ep.yaml) |\n| 
Swin-Base | 800 | 192x192 | 224x224 | 84.0 | [google](https://drive.google.com/file/d/15zENvGjHlM71uKQ3d2FbljWPubtrPtjl/view?usp=sharing)/[config](configs/swin_base__800ep/simmim_pretrain__swin_base__img192_window6__800ep.yaml) | [google](https://drive.google.com/file/d/1xEKyfMTsdh6TfnYhk5vbw0Yz7a-viZ0w/view?usp=sharing)/[config](configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml) |\n| Swin-Large | 800 | 192x192 | 224x224 | 85.4 | [google](https://drive.google.com/file/d/1qDxrTl2YUDB0505_4QrU5LU2R1kKmcBP/view?usp=sharing)/[config](configs/swin_large__800ep/simmim_pretrain__swin_large__img192_window12__800ep.yaml) | [google](https://drive.google.com/file/d/1mf0ZpXttEvFsH87Www4oQ-t8Kwr0x485/view?usp=sharing)/[config](configs/swin_large__800ep/simmim_finetune__swin_large__img224_window14__800ep.yaml) |\n| SwinV2-Huge | 800 | 192x192 | 224x224 | 85.7 | / | / |\n| SwinV2-Huge | 800 | 192x192 | 512x512 | 87.1 | / | / |\n\n### Vision Transformer\n\n**ImageNet-1K Pre-trained and Fine-tuned Models**\n\n| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model | fine-tuned model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| ViT-Base | 800 | 224x224 | 224x224 | 83.8 | [google](https://drive.google.com/file/d/1dJn6GYkwMIcoP3zqOEyW1_iQfpBi8UOw/view?usp=sharing)/[config](configs/vit_base__800ep/simmim_pretrain__vit_base__img224__800ep.yaml) | [google](https://drive.google.com/file/d/1fKgDYd0tRgyHyTnyB1CleYxjo0Gn5tEB/view?usp=sharing)/[config](configs/vit_base__800ep/simmim_finetune__vit_base__img224__800ep.yaml) |\n\n## Citing SimMIM\n\n```\n@inproceedings{xie2021simmim,\n  title={SimMIM: A Simple Framework for Masked Image Modeling},\n  author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},\n  booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year={2022}\n}\n```\n\n## Getting 
Started\n\n### Installation\n\n- Install `CUDA 11.3` with `cuDNN 8` following the official installation guide of [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) and [cuDNN](https://developer.nvidia.com/rdp/cudnn-archive).\n\n- Set up a conda environment:\n```bash\n# Create environment\nconda create -n SimMIM python=3.8 -y\nconda activate SimMIM\n\n# Install requirements\nconda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y\n\n# Install apex\ngit clone https://github.com/NVIDIA/apex\ncd apex\npip install -v --disable-pip-version-check --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\ncd ..\n\n# Clone SimMIM\ngit clone https://github.com/microsoft/SimMIM\ncd SimMIM\n\n# Install other requirements\npip install -r requirements.txt\n```\n\n### Evaluating provided models\n\nTo evaluate a provided model on the ImageNet validation set, run:\n```bash\npython -m torch.distributed.launch --nproc_per_node \u003cnum-of-gpus-to-use\u003e main_finetune.py \\\n--eval --cfg \u003cconfig-file\u003e --resume \u003ccheckpoint\u003e --data-path \u003cimagenet-path\u003e\n```\n\nFor example, to evaluate the `Swin Base` model on a single GPU, run:\n```bash\npython -m torch.distributed.launch --nproc_per_node 1 main_finetune.py \\\n--eval --cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml --resume simmim_finetune__swin_base__img224_window7__800ep.pth --data-path \u003cimagenet-path\u003e\n```\n\n### Pre-training with SimMIM\nTo pre-train models with `SimMIM`, run:\n```bash\npython -m torch.distributed.launch --nproc_per_node \u003cnum-of-gpus-to-use\u003e main_simmim.py \\\n--cfg \u003cconfig-file\u003e --data-path \u003cimagenet-path\u003e/train [--batch-size \u003cbatch-size-per-gpu\u003e --output \u003coutput-directory\u003e --tag \u003cjob-tag\u003e]\n```\n\nFor example, to pre-train `Swin Base` for 800 epochs on one DGX-2 server, run:\n```bash\npython -m 
torch.distributed.launch --nproc_per_node 16 main_simmim.py \\\n--cfg configs/swin_base__800ep/simmim_pretrain__swin_base__img192_window6__800ep.yaml --batch-size 128 --data-path \u003cimagenet-path\u003e/train [--output \u003coutput-directory\u003e --tag \u003cjob-tag\u003e]\n```\n\n### Fine-tuning pre-trained models\nTo fine-tune models pre-trained by `SimMIM`, run:\n```bash\npython -m torch.distributed.launch --nproc_per_node \u003cnum-of-gpus-to-use\u003e main_finetune.py \\\n--cfg \u003cconfig-file\u003e --data-path \u003cimagenet-path\u003e --pretrained \u003cpretrained-ckpt\u003e [--batch-size \u003cbatch-size-per-gpu\u003e --output \u003coutput-directory\u003e --tag \u003cjob-tag\u003e]\n```\n\nFor example, to fine-tune `Swin Base` pre-trained by `SimMIM` on one DGX-2 server, run:\n```bash\npython -m torch.distributed.launch --nproc_per_node 16 main_finetune.py \\\n--cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml --batch-size 128 --data-path \u003cimagenet-path\u003e --pretrained \u003cpretrained-ckpt\u003e [--output \u003coutput-directory\u003e --tag \u003cjob-tag\u003e]\n```\n\n## Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. 
You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft \ntrademarks or logos is subject to and must follow \n[Microsoft's Trademark \u0026 Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos are subject to those third-party's policies.\n","funding_links":[],"categories":["Python","其他_机器视觉"],"sub_categories":["网络服务_其他"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FSimMIM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2FSimMIM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FSimMIM/lists"}