{"id":13935672,"url":"https://github.com/dddzg/up-detr","last_synced_at":"2025-07-19T20:33:37.700Z","repository":{"id":38082683,"uuid":"344069267","full_name":"dddzg/up-detr","owner":"dddzg","description":"[TPAMI 2022 \u0026 CVPR2021 Oral] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers","archived":false,"fork":false,"pushed_at":"2023-07-19T12:51:27.000Z","size":3621,"stargazers_count":475,"open_issues_count":8,"forks_count":71,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-08-08T23:21:27.688Z","etag":null,"topics":["coco","cvpr","cvpr2021","detection","detr","self-supervised","tpami","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dddzg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-03T09:22:14.000Z","updated_at":"2024-08-05T06:46:13.000Z","dependencies_parsed_at":"2024-04-27T23:45:46.695Z","dependency_job_id":null,"html_url":"https://github.com/dddzg/up-detr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dddzg%2Fup-detr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dddzg%2Fup-detr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dddzg%2Fup-detr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dddzg%2Fup-detr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/dddzg","download_url":"https://codeload.github.com/dddzg/up-detr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226677228,"owners_count":17666019,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coco","cvpr","cvpr2021","detection","detr","self-supervised","tpami","transformers"],"created_at":"2024-08-07T23:01:58.876Z","updated_at":"2024-11-27T03:31:06.105Z","avatar_url":"https://github.com/dddzg.png","language":"Python","funding_links":[],"categories":["Python","对象检测、分割"],"sub_categories":["网络服务_其他"],"readme":"**UP-DETR**: Unsupervised Pre-training for Object Detection with Transformers\n========\nThis is the official PyTorch implementation and models for [UP-DETR paper](https://arxiv.org/abs/2011.09094) and the [extended version](https://ieeexplore.ieee.org/document/9926201):\n```\n@ARTICLE{9926201,\n  author={Dai, Zhigang and Cai, Bolun and Lin, Yugeng and Chen, Junying},\n  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, \n  title={Unsupervised Pre-Training for Detection Transformers}, \n  year={2022},\n  volume={},\n  number={},\n  pages={1-11},\n  doi={10.1109/TPAMI.2022.3216514}}\n\n@InProceedings{Dai_2021_CVPR,\n    author    = {Dai, Zhigang and Cai, Bolun and Lin, Yugeng and Chen, Junying},\n    title     = {UP-DETR: Unsupervised Pre-Training for Object Detection With Transformers},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2021},\n    pages     = 
{1601-1610}\n}\n```\nIn UP-DETR, we introduce a novel pretext named **random query patch detection** to pre-train transformers for object detection.\nUP-DETR inherits from DETR with the same ResNet-50 backbone, same Transformer encoder, decoder and same codebase.\nWith unsupervised pre-training CNN, the whole UP-DETR pre-training doesn't require any human annotations.\nUP-DETR achieves **43.1 AP**([even higher](https://github.com/dddzg/up-detr/issues/8)) on COCO with 300 epochs fine-tuning. The AP of open-source version is a little higher than paper report.\n\n![UP-DETR](.github/UP-DETR.png)\n\n# Model Zoo\nWe provide pre-training UP-DETR and fine-tuning UP-DETR models on COCO, and plan to include more in future.\nThe evaluation metric is same to [DETR](https://github.com/facebookresearch/detr).\n\n\nHere is the UP-DETR model pre-trained on **ImageNet** without labels. \nThe CNN weight is initialized from [SwAV](https://github.com/facebookresearch/swav), which is fixed during the transformer **pre-training**:\n\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003ename\u003c/th\u003e\n      \u003cth\u003ebackbone\u003c/th\u003e\n      \u003cth\u003eepochs\u003c/th\u003e\n      \u003cth\u003eurl\u003c/th\u003e\n      \u003cth\u003esize\u003c/th\u003e\n      \u003cth\u003emd5\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eUP-DETR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eR50 (SwAV)\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e60\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1JhL1uwNJCaxMrIUx7UzQ3CMCHqmZpCnn/view?usp=sharing\"\u003emodel\u003c/a\u003e\u0026nbsp;|\u0026nbsp;\u003ca href=\"https://drive.google.com/file/d/19BfOQzZmyOOrkdWPfpFd4HIEKaM8s5d6/view?usp=sharing\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003e164Mb\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ctt\u003e49f01f8b\u003c/tt\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nThe result of UP-DETR **fine-tuned** on **COCO**:\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth align=\"center\"\u003ename\u003c/th\u003e\n      \u003cth align=\"center\"\u003ebackbone (pre-train)\u003c/th\u003e\n      \u003cth align=\"center\"\u003eepochs\u003c/th\u003e\n      \u003cth align=\"center\"\u003ebox AP\u003c/th\u003e\n      \u003cth align=\"center\"\u003eAP\u003csub\u003eS\u003c/sub\u003e\u003c/th\u003e\n      \u003cth align=\"center\"\u003eAP\u003csub\u003eM\u003c/sub\u003e\u003c/th\u003e\n      \u003cth align=\"center\"\u003eAP\u003csub\u003eL\u003c/sub\u003e\u003c/th\u003e\n      \u003cth align=\"center\"\u003eurl\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eDETR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eR50 (Supervised)\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e500\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e42.0\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e20.5\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e45.8\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e61.1\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e - \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eDETR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eR50 (SwAV)\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e300\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e42.1\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e19.7\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e46.3\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e60.9\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e - \u003c/td\u003e\n    
\u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003eUP-DETR\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eR50 (SwAV)\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e300\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003cb\u003e43.1\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003cb\u003e21.6\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003cb\u003e46.8\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003cb\u003e62.4\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e \u003ca href=\"https://drive.google.com/file/d/1_YNtzKKaQbgFfd6m2ZUCO6LWpKqd7o7X/view?usp=sharing\"\u003emodel\u003c/a\u003e\u0026nbsp;|\u0026nbsp;\u003ca href=\"https://drive.google.com/file/d/1DQqveOZnMc2VaBhMzl9VilMxdeniiWXo/view?usp=sharing\"\u003elogs\u003c/a\u003e \u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nCOCO val5k evaluation results of UP-DETR can be found in this [gist](https://gist.github.com/dddzg/cd0957c5643f5656f6cdc979da4d6db1).\n\n\n\n# Usage - Object Detection\nUP-DETR has no extra compiled components, and its package dependencies are the same as DETR's. 
\nThe following commands install the dependencies via conda:\n```\ngit clone tbd\nconda install -c pytorch pytorch torchvision\nconda install cython scipy\npip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'\n```\n\nUP-DETR follows two steps: **pre-training** and **fine-tuning**.\nWe present the model pre-trained on ImageNet and then fine-tuned on COCO.\n \n## Unsupervised Pre-training\n### Data Preparation\nDownload and extract the ILSVRC2012 train dataset.\n\nWe expect the directory structure to be the following:\n```\npath/to/imagenet/\n  n06785654/  # category directory\n    n06785654_16140.JPEG # images\n  n04584207/  # category directory\n    n04584207_14322.JPEG # images\n```\nThe images can be organized in any order because the pre-training is unsupervised.  \n\n### Pre-training\nTo pre-train UP-DETR on a single node with 8 GPUs for 60 epochs, run:\n```\npython -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \\\n    --lr_drop 40 \\\n    --epochs 60 \\\n    --pre_norm \\\n    --num_patches 10 \\\n    --batch_size 32 \\\n    --feature_recon \\\n    --fre_cnn \\\n    --imagenet_path path/to/imagenet \\\n    --output_dir path/to/save_model\n```\nBecause the pre-training images are relatively small, we can set a large batch size.\n\nOne epoch takes about 2 hours, so 60 epochs of pre-training take about 5 days with 8 V100 GPUs.\n\nIn a further ablation experiment, we found that object query shuffle is not helpful, so we removed it from the open-source version. 
\n\n## Fine-tuning\n### Data Preparation\nDownload and extract the train and val splits of the [COCO 2017 dataset](https://cocodataset.org/#download).\n\nThe directory structure is expected as follows:\n```\npath/to/coco/\n  annotations/  # annotation json files\n  train2017/    # train images\n  val2017/      # val images\n```\n### Fine-tuning\n\nTo fine-tune UP-DETR with 8 GPUs for 300 epochs, run:\n\n```\npython -m torch.distributed.launch --nproc_per_node=8 --use_env detr_main.py \\\n    --lr_drop 200 \\\n    --epochs 300 \\\n    --lr_backbone 5e-5 \\\n    --pre_norm \\\n    --coco_path path/to/coco \\\n    --pretrain path/to/save_model/checkpoint.pth\n```\nThe fine-tuning cost is exactly the same as DETR's: about 28 minutes per epoch with 8 V100 GPUs, so 300 epochs of training take about 6 days.\n\nThe model can also be extended to panoptic segmentation; see [DETR](https://github.com/facebookresearch/detr/blob/master/README.md#usage---segmentation) for more details.\n\n### Evaluation\n```\npython detr_main.py \\\n    --batch_size 2 \\\n    --eval \\\n    --no_aux_loss \\\n    --pre_norm \\\n    --coco_path path/to/coco \\\n    --resume path/to/save_model/checkpoint.pth\n```\nCOCO val5k evaluation results of UP-DETR can be found in this [gist](https://gist.github.com/dddzg/cd0957c5643f5656f6cdc979da4d6db1).\n\n\n# Notebook\n\nWe provide a Colab notebook that reproduces the visualization results in the paper:\n\n* [Visualization Notebook](https://colab.research.google.com/github/dddzg/up-detr/blob/master/visualization.ipynb): This notebook shows how to perform query patch detection with the pre-trained model (without any fine-tuning on annotations).\n\n![vis](.github/vis.png)\n\n# License\nUP-DETR is released under the Apache 2.0 license. 
Please see the [LICENSE](LICENSE) file for more information.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdddzg%2Fup-detr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdddzg%2Fup-detr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdddzg%2Fup-detr/lists"}