{"id":20503125,"url":"https://github.com/idea-research/maskdino","last_synced_at":"2025-05-16T02:09:13.104Z","repository":{"id":37358081,"uuid":"500500682","full_name":"IDEA-Research/MaskDINO","owner":"IDEA-Research","description":"[CVPR 2023] Official implementation of the paper \"Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation\"","archived":false,"fork":false,"pushed_at":"2023-12-20T07:05:03.000Z","size":2347,"stargazers_count":1292,"open_issues_count":60,"forks_count":116,"subscribers_count":34,"default_branch":"main","last_synced_at":"2025-04-01T11:05:59.969Z","etag":null,"topics":["instance-segmentation","object-detection","panoptic-segmentation","semantic-segmentation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IDEA-Research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-06T16:02:01.000Z","updated_at":"2025-04-01T01:16:34.000Z","dependencies_parsed_at":"2024-11-15T19:31:25.770Z","dependency_job_id":"13518cd5-62e1-4822-821e-89d5e1794dff","html_url":"https://github.com/IDEA-Research/MaskDINO","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FMaskDINO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FMaskDINO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FMaskDINO/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FMaskDINO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IDEA-Research","download_url":"https://codeload.github.com/IDEA-Research/MaskDINO/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247847609,"owners_count":21006099,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["instance-segmentation","object-detection","panoptic-segmentation","semantic-segmentation"],"created_at":"2024-11-15T19:29:19.353Z","updated_at":"2025-04-08T13:05:46.094Z","avatar_url":"https://github.com/IDEA-Research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Mask DINO \u003cimg src=\"figures/dinosaur.png\" 
width=\"30\"\u003e\n========\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask-dino-towards-a-unified-transformer-based-1/panoptic-segmentation-on-coco-minival)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-minival?p=mask-dino-towards-a-unified-transformer-based-1)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask-dino-towards-a-unified-transformer-based-1/panoptic-segmentation-on-coco-test-dev)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-test-dev?p=mask-dino-towards-a-unified-transformer-based-1)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask-dino-towards-a-unified-transformer-based-1/instance-segmentation-on-coco-minival)](https://paperswithcode.com/sota/instance-segmentation-on-coco-minival?p=mask-dino-towards-a-unified-transformer-based-1)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask-dino-towards-a-unified-transformer-based-1/instance-segmentation-on-coco)](https://paperswithcode.com/sota/instance-segmentation-on-coco?p=mask-dino-towards-a-unified-transformer-based-1)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask-dino-towards-a-unified-transformer-based-1/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=mask-dino-towards-a-unified-transformer-based-1)\n\n\n[Feng Li*](https://fengli-ust.github.io/), [Hao Zhang*](https://haozhang534.github.io/), [Huaizhe Xu](https://scholar.google.com/citations?user=zgaTShsAAAAJ\u0026hl=en\u0026scioq=Huaizhe+Xu), [Shilong Liu](https://www.lsl.zone/), [Lei Zhang](https://scholar.google.com/citations?hl=zh-CN\u0026user=fIlGZToAAAAJ), [Lionel M. 
Ni](https://scholar.google.com/citations?hl=zh-CN\u0026user=OzMYwDIAAAAJ), and [Heung-Yeung Shum](https://scholar.google.com.hk/citations?user=9akH-n8AAAAJ\u0026hl=en).\n\nThis repository is the official implementation of [Mask DINO: Towards A Unified Transformer-based\nFramework for Object Detection and Segmentation](https://arxiv.org/abs/2206.02777) (DINO is pronounced `daɪnoʊ`, as in dinosaur). Our code is based on [detectron2](https://github.com/facebookresearch/detectron2). A [detrex](https://github.com/IDEA-Research/detrex) version is open-sourced simultaneously.\n\n:fire: We release a strong open-set object detection and segmentation model [OpenSeeD](https://arxiv.org/pdf/2303.08131.pdf) based on MaskDINO that achieves the best results on open-set object segmentation tasks. Code and checkpoints are available [here](https://github.com/IDEA-Research/OpenSeeD).\n\n\u003cdetails open\u003e\n\u003csummary\u003e \u003cfont size=8\u003e\u003cstrong\u003eNews\u003c/strong\u003e\u003c/font\u003e \u003c/summary\u003e\n\n[2023/7] We release [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM), a universal image segmentation model that can segment and recognize anything at any desired granularity. **Code** and **checkpoint** are available!\n\n[2023/2] Mask DINO has been accepted to CVPR 2023!\n\n[2022/9] We release a toolbox [**detrex**](https://github.com/IDEA-Research/detrex) that provides state-of-the-art Transformer-based detection algorithms. It includes DINO **with better performance**, and Mask DINO will also be released with a detrex implementation. You are welcome to use it! 
\u003c/br\u003e\n  - Currently supported: [DETR](https://arxiv.org/abs/2005.12872), [Deformable DETR](https://arxiv.org/abs/2010.04159), [Conditional DETR](https://arxiv.org/abs/2108.06152), [Group-DETR](https://arxiv.org/abs/2207.13085), [DAB-DETR](https://arxiv.org/abs/2201.12329), [DN-DETR](https://arxiv.org/abs/2203.01305), [DINO](https://arxiv.org/abs/2203.03605).\n\n[2022/7] Code for [DINO](https://arxiv.org/pdf/2203.03605.pdf) is available [here](https://github.com/IDEACVR/DINO)!\n\n\n[2022/3] We built a repo [Awesome Detection Transformer](https://github.com/IDEACVR/awesome-detection-transformer) to collect papers on transformers for detection and segmentation. Feel free to check it out!\n\u003c/details\u003e\n\n\n### Features \n\n* A **unified** architecture for object detection, panoptic, instance and semantic segmentation.\n* Achieves **task and data cooperation** between detection and segmentation.\n* **State-of-the-art** performance under the same setting.\n* Supports major detection and segmentation datasets: COCO, ADE20K, Cityscapes.\n\n\n### Code Updates\n\n* [2022/12/02] Our code and checkpoints are available! 
Mask DINO further achieves \u003cstrong\u003e51.7\u003c/strong\u003e and \u003cstrong\u003e59.0\u003c/strong\u003e box AP on COCO with ResNet-50 and Swin-L backbones, respectively, without extra detection data, **outperforming DINO** under the same setting!\n\n* [2022/6] We propose a unified detection and segmentation model [Mask DINO](https://arxiv.org/pdf/2206.02777.pdf) that achieves the best results on all three segmentation tasks (**54.7** AP on the [COCO instance leaderboard](https://paperswithcode.com/sota/instance-segmentation-on-coco), **59.5** PQ on the [COCO panoptic leaderboard](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-test-dev), and **60.8** mIoU on the [ADE20K semantic leaderboard](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k))!\n\n\u003cdetails open\u003e\n\u003csummary\u003e \u003cfont size=8\u003e\u003cstrong\u003eTodo list\u003c/strong\u003e\u003c/font\u003e \u003c/summary\u003e\n\n- [x] Release code and checkpoints\n  \n- [ ] Release model conversion checkpointer from DINO to MaskDINO\n \n- [ ] Release GPU cluster submit scripts based on submitit for multi-node training\n \n- [ ] Release EMA training for large models\n \n- [ ] Release more large models\n\u003c/details\u003e\n\n\n\n***\n\n## Installation\n\nSee [installation instructions](INSTALL.md).\n\n\n\n## Getting Started\nSee [Inference Demo with Pre-trained Model](demo/README.md).\n\nSee [Results](#results).\n\nSee [Preparing Datasets for MaskDINO](datasets/README.md).\n\nSee [Getting Started](#getting-started-1).\n\nSee [More Usage](#more-usage).\n\n![MaskDINO](figures/framework.jpg)\n\n***\n\n# Results\nIn this section, we present the clean models, which do not use extra detection data or tricks.\n### COCO Instance Segmentation and Object Detection\nWe follow DINO and use a hidden dimension of `2048` in the encoder feed-forward network by default. We also use the mask-enhanced\nbox initialization proposed in our paper for instance segmentation and detection. 
To better present our model, we also list the models trained with \nhidden dimension `1024` (`hid 1024`) and not using mask-enhance initialization (`no mask enhance`) in this table.\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"bottom\"\u003eName\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBackbone\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eEpochs\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eMask AP\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBox AP\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eParams\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eGFlops\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003edownload\u003c/th\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO (hid 1024) | \u003ca href=\"configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eR50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e46.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e51.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e47M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e226\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_300q_hid1024_3sd1_instance_maskenhanced_mask46.1ap_box51.5ap.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO | \u003ca href=\"configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eR50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e46.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e51.7\u003c/td\u003e\n\u003ctd align=\"center\"\u003e52M\u003c/td\u003e\n\u003ctd 
align=\"center\"\u003e286\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask46.3ap_box51.7ap.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO (no mask enhance) | \u003ca href=\"configs/coco/instance-segmentation/swin/maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eSwin-L (IN21k)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e52.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e58.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e223\u003c/td\u003e\n\u003ctd align=\"center\"\u003e1326\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_swinl_50ep_300q_hid2048_3sd1_instance_mask52.1ap_box58.3ap.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO | \u003ca href=\"configs/coco/instance-segmentation/swin/maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eSwin-L (IN21k)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e52.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e59.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e223\u003c/td\u003e\n\u003ctd align=\"center\"\u003e1326\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_swinl_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask52.3ap_box59.0ap.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO+O365 data+1.2 x larger 
image\u003c/td\u003e\n\u003ctd align=\"center\"\u003eSwin-L (IN21k)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e20\u003c/td\u003e\n\u003ctd align=\"center\"\u003e54.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e---\u003c/td\u003e\n\u003ctd align=\"center\"\u003e223\u003c/td\u003e\n\u003ctd align=\"center\"\u003e1326\u003c/td\u003e\n\u003ctd align=\"center\"\u003eTo Release\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\u003c/table\u003e\n\n### COCO Panoptic Segmentation\n\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"bottom\"\u003eName\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBackbone\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eepochs\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003ePQ\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eMask AP\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBox AP\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003emIoU\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003edownload\u003c/th\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO | \u003ca href=\"configs/coco/panoptic-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eR50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e53.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e48.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e44.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e60.6\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_300q_hid2048_3sd1_panoptic_pq53.0.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO | \u003ca href=\"configs/coco/panoptic-segmentation/swin/maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd 
align=\"center\"\u003eSwin-L (IN21k)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e58.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50.6\u003c/td\u003e\n\u003ctd align=\"center\"\u003e56.2\u003c/td\u003e\n\u003ctd align=\"center\"\u003e67.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_swinl_50ep_300q_hid2048_3sd1_panoptic_58.3pq.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO+O365 data+1.2 x larger image\u003c/td\u003e\n\u003ctd align=\"center\"\u003eSwin-L (IN21k)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e20\u003c/td\u003e\n\u003ctd align=\"center\"\u003e59.4\u003c/td\u003e\n\u003ctd align=\"center\"\u003e53.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e57.7\u003c/td\u003e\n\u003ctd align=\"center\"\u003e67.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003eTo Release\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/tbody\u003e\u003c/table\u003e\n\n### Semantic Segmentation\nWe use hidden dimension `1024` and 100 queries for semantic segmentation.\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"bottom\"\u003eName\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eDataset\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBackbone\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eiterations\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003emIoU\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003edownload\u003c/th\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO | \u003ca href=\"configs/ade20k/semantic-segmentation/maskdino_R50_bs16_160k_steplr.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eADE20K\u003c/td\u003e\n\u003ctd align=\"center\"\u003eR50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e160k\u003c/td\u003e\n\u003ctd 
align=\"center\"\u003e48.7\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_100q_celoss_hid1024_3s_semantic_ade20k_48.7miou.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\n \u003ctr\u003e\u003ctd align=\"left\"\u003eMaskDINO | \u003ca href=\"configs/cityscapes/semantic-segmentation/maskdino_R50_bs16_90k_steplr.yaml\"\u003econfig\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eCityscapes\u003c/td\u003e\n\u003ctd align=\"center\"\u003eR50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e90k\u003c/td\u003e\n\u003ctd align=\"center\"\u003e79.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/IDEA-Research/detrex-storage/releases/download/maskdino-v0.1.0/maskdino_r50_50ep_100q_celoss_hid1024_3s_semantic_cityscapes_79.8miou.pth\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\n\u003c/tbody\u003e\u003c/table\u003e\n\nYou can also find all these models [here](https://github.com/IDEA-Research/detrex-storage/releases/tag/maskdino-v0.1.0).\n\nAll models were trained with **4** NVIDIA A100 GPUs (ResNet-50 based models) or **8** NVIDIA A100 GPUs (Swin-L based models).\n\nWe will release more pretrained models in the future.\n# Getting Started\n\nIn the above tables, the \"Name\" column contains a link to the config file (referred to below as `config_path`), and the corresponding model checkpoint\ncan be downloaded from the `model` link.\n\nIf your dataset files are not under this repo, first run `export DETECTRON2_DATASETS=/path/to/your/data` or use a symbolic link (`ln -s`)\nto link the datasets into this repo before running the\nfollowing commands.\n#### Evaluate our pretrained models\n* You can download our pretrained models and evaluate them with the following command.\n  ```sh\n  python train_net.py --eval-only --num-gpus 8 --config-file config_path MODEL.WEIGHTS /path/to/checkpoint_file\n  ```\n  For example, to reproduce our instance 
segmentation result, you can copy the config path from the table, download the pretrained checkpoint into `/path/to/checkpoint_file`, and run \n  ```sh\n  python train_net.py --eval-only --num-gpus 8 --config-file configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s_dowsample1_2048.yaml MODEL.WEIGHTS /path/to/checkpoint_file\n  ```\n  which reproduces the reported result. \n#### Train MaskDINO to reproduce results\n* Running the above command without `--eval-only` trains the model. For Swin backbones, you need to specify the path of the pretrained backbone with `MODEL.WEIGHTS /path/to/pretrained_checkpoint`\n  ```sh\n  python train_net.py --num-gpus 8 --config-file config_path MODEL.WEIGHTS /path/to/checkpoint_file\n  ```\n* For ResNet-50 models, training on 8 GPUs requires around `15G` of memory on each GPU and `3` days of training for 50 epochs. \n* For Swin-L models, training on 8 GPUs requires `60G` of memory on each GPU. If your GPUs do not have enough \n  memory, you can also train on 16 GPUs with distributed training across two nodes.\n* We use a total batch size of 16 for all our models. If you train on 1 GPU, you need to choose the learning rate and batch size yourself\n  ```sh\n  python train_net.py --num-gpus 1 --config-file config_path SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE\n  ```\n\nYou can also refer to [Getting Started with Detectron2](https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md) for full usage.\n\n\n# More Usage\n\n### Mask-enhanced box initialization\n\nWe provide two ways to convert predicted masks to boxes for initializing the decoder boxes. Set the option as follows:\n* `MODEL.MaskDINO.INITIALIZE_BOX_TYPE: no`: do not use mask-enhanced box initialization\n* `MODEL.MaskDINO.INITIALIZE_BOX_TYPE: mask2box`: a fast conversion\n* `MODEL.MaskDINO.INITIALIZE_BOX_TYPE: bitmask`: the conversion provided by detectron2, slower but more accurate. 
\n\nThese two conversion methods do not affect the final performance much, so you can choose either one. \n\nIn addition, if you have already\ntrained a model for 50 epochs without mask-enhanced box initialization, you can plug in this method and simply \nfinetune the model for the last few epochs (i.e., load the model trained for 32K iterations and finetune it). This can\nalso achieve performance similar to training from scratch, while being more flexible.\n\n### Model components\nMaskDINO consists of three components: a backbone, a pixel decoder and a Transformer decoder.\nYou can easily replace each of these three components with your own implementation.\n\n* **backbone**: Define and register your backbone under `maskdino/modeling/backbone`. You can follow the Swin Transformer as an example.\n  \n* **pixel decoder**: The pixel decoder is actually the multi-scale encoder in DINO and Deformable DETR; we follow Mask2Former in calling\n  it a pixel decoder. It is defined in `maskdino/modeling/pixel_decoder`, where you can swap in your own multi-scale encoder. The returned values \n  include \n  1. `mask_features`: the per-pixel embeddings at 1/4 of the original image resolution, obtained by fusing the backbone 1/4 features with the 1/8 features encoded by the multi-scale encoder. These are used to produce binary masks.\n  2. `multi_scale_features`: the multi-scale inputs to the Transformer decoder.\n  For ResNet-50 models with 4 scales, we use resolutions 1/32, 1/16, and 1/8 (you can use arbitrary resolutions here) and follow DINO to additionally downsample\n     1/32 to get a 4th scale with 1/64 resolution. For 5-scale models with Swin-L, we additionally use 1/4 resolution features as in DINO.\n\n* **transformer decoder**: It mainly follows the DINO decoder to perform detection and segmentation. It is defined in `maskdino/modeling/transformer_decoder`.\n\n\n## LICENSE\nMask DINO is released under the Apache 2.0 license. Please see the [LICENSE](LICENSE) file for more information.\n\nCopyright (c) IDEA. 
All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n\n## \u003ca name=\"CitingMaskDINO\"\u003e\u003c/a\u003eCiting Mask DINO\n\nIf you find our work helpful for your research, please consider citing the following BibTeX entry.\n\n```BibTeX\n@misc{li2022mask,\n      title={Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation}, \n      author={Feng Li and Hao Zhang and Huaizhe Xu and Shilong Liu and Lei Zhang and Lionel M. Ni and Heung-Yeung Shum},\n      year={2022},\n      eprint={2206.02777},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\nIf you find the code useful, please also consider citing the following BibTeX entries.\n\n```BibTeX\n@misc{zhang2022dino,\n      title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection}, \n      author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. 
Ni and Heung-Yeung Shum},\n      year={2022},\n      eprint={2203.03605},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n\n@inproceedings{li2022dn,\n      title={{DN}-{DETR}: Accelerate {DETR} Training by Introducing Query Denoising},\n      author={Li, Feng and Zhang, Hao and Liu, Shilong and Guo, Jian and Ni, Lionel M and Zhang, Lei},\n      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n      pages={13619--13627},\n      year={2022}\n}\n\n@inproceedings{\n      liu2022dabdetr,\n      title={{DAB}-{DETR}: Dynamic Anchor Boxes are Better Queries for {DETR}},\n      author={Shilong Liu and Feng Li and Hao Zhang and Xiao Yang and Xianbiao Qi and Hang Su and Jun Zhu and Lei Zhang},\n      booktitle={International Conference on Learning Representations},\n      year={2022},\n      url={https://openreview.net/forum?id=oMI9PjOb9Jl}\n}\n```\n\n## Acknowledgement\n\nMany thanks to these excellent open-source projects:\n* [Mask2Former](https://github.com/facebookresearch/Mask2Former) \n* [DINO](https://github.com/IDEA-Research/DINO)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidea-research%2Fmaskdino","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidea-research%2Fmaskdino","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidea-research%2Fmaskdino/lists"}