{"id":13486257,"url":"https://github.com/SwinTransformer/Swin-Transformer-Object-Detection","last_synced_at":"2025-03-27T20:32:46.240Z","repository":{"id":37393448,"uuid":"357172305","full_name":"SwinTransformer/Swin-Transformer-Object-Detection","owner":"SwinTransformer","description":"This is an official implementation for \"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows\" on Object Detection and Instance Segmentation.","archived":false,"fork":true,"pushed_at":"2023-04-09T13:51:08.000Z","size":20876,"stargazers_count":1832,"open_issues_count":142,"forks_count":380,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-01-25T08:32:16.007Z","etag":null,"topics":["cascade","mask-rcnn","mscoco","object-detection","reppoints","swin","swin-transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2103.14030","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"open-mmlab/mmdetection","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SwinTransformer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-12T11:46:05.000Z","updated_at":"2025-01-21T12:19:30.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/SwinTransformer/Swin-Transformer-Object-Detection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SwinTransformer%2FSwin-Transformer-Object-Detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SwinTransformer%2FSwin-Transformer-Object-Detection/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SwinTransformer%2FSwin-Transformer-Object-Detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SwinTransformer%2FSwin-Transformer-Object-Detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SwinTransformer","download_url":"https://codeload.github.com/SwinTransformer/Swin-Transformer-Object-Detection/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245920418,"owners_count":20694086,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cascade","mask-rcnn","mscoco","object-detection","reppoints","swin","swin-transformer"],"created_at":"2024-07-31T18:00:42.665Z","updated_at":"2025-03-27T20:32:44.289Z","avatar_url":"https://github.com/SwinTransformer.png","language":"Python","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"# Swin Transformer for Object Detection\n\nThis repo contains the supported code and configuration files to reproduce object detection results of [Swin Transformer](https://arxiv.org/pdf/2103.14030.pdf). 
It is based on [mmdetection](https://github.com/open-mmlab/mmdetection).\n\n## Updates\n\n***05/11/2021*** Models for [MoBY](https://github.com/SwinTransformer/Transformer-SSL) are released\n\n***04/12/2021*** Initial commits\n\n## Results and Models\n\n### Mask R-CNN\n\n| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |:---: |\n| Swin-T | ImageNet-1K | 1x | 43.7 | 39.8 | 48M | 267G | [config](configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/mask_rcnn_swin_tiny_patch4_window7_1x.log.json)/[baidu](https://pan.baidu.com/s/1bYZk7BIeFEozjRNUesxVWg) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/mask_rcnn_swin_tiny_patch4_window7_1x.pth)/[baidu](https://pan.baidu.com/s/19UOW0xl0qc-pXQ59aFKU5w) |\n| Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | [config](configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/mask_rcnn_swin_tiny_patch4_window7.log.json)/[baidu](https://pan.baidu.com/s/1Te-Ovk4yaavmE4jcIOPAaw) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/mask_rcnn_swin_tiny_patch4_window7.pth)/[baidu](https://pan.baidu.com/s/1YpauXYAFOohyMi3Vkb6DBg) |\n| Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | [config](configs/swin/mask_rcnn_swin_small_patch4_window7_mstrain_480-800_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/mask_rcnn_swin_small_patch4_window7.log.json)/[baidu](https://pan.baidu.com/s/1ymCK7378QS91yWlxHMf1yw) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/mask_rcnn_swin_small_patch4_window7.pth)/[baidu](https://pan.baidu.com/s/1V4w4aaV7HSjXNFTOSA6v6w) |\n\n### Cascade 
Mask R-CNN\n\n| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |:---: |\n| Swin-T | ImageNet-1K | 1x | 48.1 | 41.7 | 86M | 745G | [config](configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_1x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/cascade_mask_rcnn_swin_tiny_patch4_window7_1x.log.json)/[baidu](https://pan.baidu.com/s/1x4vnorYZfISr-d_VUSVQCA) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/cascade_mask_rcnn_swin_tiny_patch4_window7_1x.pth)/[baidu](https://pan.baidu.com/s/1vFwbN1iamrtwnQSxMIW4BA) |\n| Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | [config](configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/cascade_mask_rcnn_swin_tiny_patch4_window7.log.json)/[baidu](https://pan.baidu.com/s/1GW_ic617Ak_NpRayOqPSOA) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/cascade_mask_rcnn_swin_tiny_patch4_window7.pth)/[baidu](https://pan.baidu.com/s/1i-izBrODgQmMwTv6F6-x3A) |\n| Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | [config](configs/swin/cascade_mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/cascade_mask_rcnn_swin_small_patch4_window7.log.json)/[baidu](https://pan.baidu.com/s/17Vyufk85vyocxrBT1AbavQ) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/cascade_mask_rcnn_swin_small_patch4_window7.pth)/[baidu](https://pan.baidu.com/s/1Sv9-gP1Qpl6SGOF6DBhUbw) |\n| Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | 
[config](configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/cascade_mask_rcnn_swin_base_patch4_window7.log.json)/[baidu](https://pan.baidu.com/s/1UZAR39g-0kE_aGrINwfVHg) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.2/cascade_mask_rcnn_swin_base_patch4_window7.pth)/[baidu](https://pan.baidu.com/s/1tHoC9PMVnldQUAfcF6FT3A) |\n\n### RepPoints V2\n\n| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |:---: |\n| Swin-T | ImageNet-1K | 3x | 50.0 | - | 45M | 283G | [config](configs/swin/reppoitsv2_swin_tiny_patch4_window7_mstrain_480_960_giou_gfocal_bifpn_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.9/reppointsv2_swin_tiny_patch4_window7_3x.log.json) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.9/reppointsv2_swin_tiny_patch4_window7_3x.pth) |\n\n### Mask RepPoints V2\n\n| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |:---: |\n| Swin-T | ImageNet-1K | 3x | 50.4 | 43.8 | 47M | 292G | [config](configs/swin/mask_reppoitsv2_swin_tiny_patch4_window7_mstrain_480_960_giou_gfocal_bifpn_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.9/mask_reppointsv2_swin_tiny_patch4_window7_3x.log.json) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.9/mask_reppointsv2_swin_tiny_patch4_window7_3x.pth) |\n\n**Notes**: \n\n- **Pre-trained models can be downloaded from [Swin Transformer for ImageNet Classification](https://github.com/microsoft/Swin-Transformer)**.\n- Access code for `baidu` is `swin`.\n\n## Results of MoBY with Swin 
Transformer\n\n### Mask R-CNN\n\n| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |:---: |\n| Swin-T | ImageNet-1K | 1x | 43.6 | 39.6 | 48M | 267G | [config](configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_mask_rcnn_swin_tiny_patch4_window7_1x.log.json)/[baidu](https://pan.baidu.com/s/1P5gCIfLUQ64jbVMOom0H3w) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_mask_rcnn_swin_tiny_patch4_window7_1x.pth)/[baidu](https://pan.baidu.com/s/1xGRihuIrGVreFKn5eJ6oTg) |\n| Swin-T | ImageNet-1K | 3x | 46.0 | 41.7 | 48M | 267G | [config](configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_mask_rcnn_swin_tiny_patch4_window7_3x.log.json)/[baidu](https://pan.baidu.com/s/17WAhUmhAam1of3hXOu-wtA) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_mask_rcnn_swin_tiny_patch4_window7_3x.pth)/[baidu](https://pan.baidu.com/s/1MSj8cC1wlQU1QaXCdKrzeA) |\n\n### Cascade Mask R-CNN\n\n| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |:---: |\n| Swin-T | ImageNet-1K | 1x | 48.1 | 41.5 | 86M | 745G | [config](configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_1x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_1x.log.json)/[baidu](https://pan.baidu.com/s/1eOdq1rvi0QoXjc7COgiM7A) | 
[github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_1x.pth)/[baidu](https://pan.baidu.com/s/1-gbY-LExbf0FgYxWWs8OPg) |\n| Swin-T | ImageNet-1K | 3x | 50.2 | 43.5 | 86M | 745G | [config](configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.log.json)/[baidu](https://pan.baidu.com/s/1zEFXHYjEiXUCWF1U7HR5Zg) | [github](https://github.com/SwinTransformer/storage/releases/download/v1.0.3/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth)/[baidu](https://pan.baidu.com/s/1FMmW0GOpT4MKsKUrkJRgeg) |\n\n**Notes:**\n\n- The drop path rate needs to be tuned for best practice.\n- MoBY pre-trained models can be downloaded from [MoBY with Swin Transformer](https://github.com/SwinTransformer/Transformer-SSL).\n\n## Usage\n\n### Installation\n\nPlease refer to [get_started.md](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/get_started.md) for installation and dataset preparation.\n\n### Inference\n```\n# single-gpu testing\npython tools/test.py \u003cCONFIG_FILE\u003e \u003cDET_CHECKPOINT_FILE\u003e --eval bbox segm\n\n# multi-gpu testing\ntools/dist_test.sh \u003cCONFIG_FILE\u003e \u003cDET_CHECKPOINT_FILE\u003e \u003cGPU_NUM\u003e --eval bbox segm\n```\n\n### Training\n\nTo train a detector with pre-trained models, run:\n```\n# single-gpu training\npython tools/train.py \u003cCONFIG_FILE\u003e --cfg-options model.pretrained=\u003cPRETRAIN_MODEL\u003e [model.backbone.use_checkpoint=True] [other optional arguments]\n\n# multi-gpu training\ntools/dist_train.sh \u003cCONFIG_FILE\u003e \u003cGPU_NUM\u003e --cfg-options model.pretrained=\u003cPRETRAIN_MODEL\u003e [model.backbone.use_checkpoint=True] [other optional arguments] \n```\nFor example, to train a Cascade Mask R-CNN model with a `Swin-T` backbone and 8 gpus, 
run:\n```\ntools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=\u003cPRETRAIN_MODEL\u003e \n```\n\n**Note:** `use_checkpoint` enables gradient checkpointing, which saves GPU memory at the cost of extra computation. Please refer to [this page](https://pytorch.org/docs/stable/checkpoint.html) for more details.\n\n\n### Apex (optional):\nWe use apex for mixed precision training by default. To install apex, run:\n```\ngit clone https://github.com/NVIDIA/apex\ncd apex\npip install -v --disable-pip-version-check --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\n```\nIf you would like to disable apex, change the runner type to `EpochBasedRunner` and comment out the following code block in the [configuration files](configs/swin):\n```\n# do not use mmdet version fp16\nfp16 = None\noptimizer_config = dict(\n    type=\"DistOptimizerHook\",\n    update_interval=1,\n    grad_clip=None,\n    coalesce=True,\n    bucket_size_mb=-1,\n    use_fp16=True,\n)\n```\n\n## Citing Swin Transformer\n```\n@article{liu2021Swin,\n  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},\n  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},\n  journal={arXiv preprint arXiv:2103.14030},\n  year={2021}\n}\n```\n\n## Other Links\n\n\u003e **Image Classification**: See [Swin Transformer for Image Classification](https://github.com/microsoft/Swin-Transformer).\n\n\u003e **Semantic Segmentation**: See [Swin Transformer for Semantic Segmentation](https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation).\n\n\u003e **Self-Supervised Learning**: See [MoBY with Swin Transformer](https://github.com/SwinTransformer/Transformer-SSL).\n\n\u003e **Video Recognition**: See [Video Swin 
Transformer](https://github.com/SwinTransformer/Video-Swin-Transformer).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSwinTransformer%2FSwin-Transformer-Object-Detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSwinTransformer%2FSwin-Transformer-Object-Detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSwinTransformer%2FSwin-Transformer-Object-Detection/lists"}