https://github.com/dddzg/up-detr
[TPAMI 2022 & CVPR2021 Oral] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
coco cvpr cvpr2021 detection detr self-supervised tpami transformers
- Host: GitHub
- URL: https://github.com/dddzg/up-detr
- Owner: dddzg
- License: apache-2.0
- Created: 2021-03-03T09:22:14.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-07-19T12:51:27.000Z (over 1 year ago)
- Last Synced: 2024-08-08T23:21:27.688Z (7 months ago)
- Topics: coco, cvpr, cvpr2021, detection, detr, self-supervised, tpami, transformers
- Language: Python
- Homepage:
- Size: 3.45 MB
- Stars: 475
- Watchers: 13
- Forks: 71
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
**UP-DETR**: Unsupervised Pre-training for Object Detection with Transformers
========
This is the official PyTorch implementation and model release for the [UP-DETR paper](https://arxiv.org/abs/2011.09094) and its [extended version](https://ieeexplore.ieee.org/document/9926201):
```
@ARTICLE{9926201,
author={Dai, Zhigang and Cai, Bolun and Lin, Yugeng and Chen, Junying},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Unsupervised Pre-Training for Detection Transformers},
year={2022},
volume={},
number={},
pages={1-11},
doi={10.1109/TPAMI.2022.3216514}}
@InProceedings{Dai_2021_CVPR,
author = {Dai, Zhigang and Cai, Bolun and Lin, Yugeng and Chen, Junying},
title = {UP-DETR: Unsupervised Pre-Training for Object Detection With Transformers},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {1601-1610}
}
```
In UP-DETR, we introduce a novel pretext task named **random query patch detection** to pre-train transformers for object detection.
UP-DETR inherits from DETR and uses the same ResNet-50 backbone, the same Transformer encoder and decoder, and the same codebase.
Because the CNN backbone is itself pre-trained without supervision, the whole UP-DETR pre-training does not require any human annotations.
UP-DETR achieves **43.1 AP** ([even higher](https://github.com/dddzg/up-detr/issues/8)) on COCO with 300 epochs of fine-tuning. The AP of the open-source version is slightly higher than the one reported in the paper.
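As a rough illustration of the sampling step behind random query patch detection, the sketch below draws random patch boxes from an image and converts them to the normalized `(cx, cy, w, h)` box format that DETR-style models regress. It is a minimal sketch, not the repository's implementation: the helper name `sample_query_patches` and the `min_frac`/`max_frac` size bounds are assumptions made for this example, and the actual patch sampling in the codebase may differ.

```python
import random


def sample_query_patches(img_w, img_h, num_patches=10,
                         min_frac=0.1, max_frac=0.5, seed=None):
    """Sample random patches from an image of size (img_w, img_h).

    Returns a list of (crop_box, target) pairs, where crop_box is
    (x0, y0, x1, y1) in pixels and target is the normalized
    (cx, cy, w, h) box that a detector would be trained to predict.
    min_frac and max_frac are hypothetical bounds on the patch size.
    """
    rng = random.Random(seed)
    patches = []
    for _ in range(num_patches):
        # Patch size as a random fraction of the image size.
        w = rng.uniform(min_frac, max_frac) * img_w
        h = rng.uniform(min_frac, max_frac) * img_h
        # Top-left corner chosen so the patch stays inside the image.
        x0 = rng.uniform(0, img_w - w)
        y0 = rng.uniform(0, img_h - h)
        crop_box = (x0, y0, x0 + w, y0 + h)
        target = ((x0 + w / 2) / img_w, (y0 + h / 2) / img_h,
                  w / img_w, h / img_h)
        patches.append((crop_box, target))
    return patches


if __name__ == "__main__":
    for crop_box, target in sample_query_patches(640, 480, num_patches=3, seed=0):
        print(crop_box, target)
```

Each cropped patch would be fed to the decoder as a query, and its `target` box serves as the localization label, so no human annotation is involved.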
# Model Zoo
We provide the pre-trained UP-DETR model and the UP-DETR model fine-tuned on COCO, and plan to include more in the future.
The evaluation metric is the same as in [DETR](https://github.com/facebookresearch/detr).

Here is the UP-DETR model pre-trained on **ImageNet** without labels.
The CNN weights are initialized from [SwAV](https://github.com/facebookresearch/swav) and kept fixed during the transformer **pre-training**:

| name | backbone | epochs | url | size | md5 |
|---|---|---|---|---|---|
| UP-DETR | R50 (SwAV) | 60 | model, logs | 164 MB | 49f01f8b |

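A downloaded checkpoint can be checked against the `md5` column with a few lines of standard-library Python. This is a minimal sketch that assumes the 8-character value in the table is a prefix of the file's MD5 hex digest; the helper name `md5_prefix_matches` and the checkpoint path are hypothetical.

```python
import hashlib


def md5_prefix_matches(path, expected_prefix, chunk_size=1 << 20):
    """Return True if the MD5 hex digest of the file starts with expected_prefix."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large checkpoints do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(expected_prefix.lower())


# Example (the path is a placeholder for wherever the checkpoint was saved):
# md5_prefix_matches("path/to/checkpoint.pth", "49f01f8b")
```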
The result of UP-DETR **fine-tuned** on **COCO**:
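
| name | backbone (pre-train) | epochs | box AP | AP_S | AP_M | AP_L | url |
|---|---|---|---|---|---|---|---|
| DETR | R50 (Supervised) | 500 | 42.0 | 20.5 | 45.8 | 61.1 | - |
| DETR | R50 (SwAV) | 300 | 42.1 | 19.7 | 46.3 | 60.9 | - |
| UP-DETR | R50 (SwAV) | 300 | 43.1 | 21.6 | 46.8 | 62.4 | model, logs |
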
COCO val5k evaluation results of UP-DETR can be found in this [gist](https://gist.github.com/dddzg/cd0957c5643f5656f6cdc979da4d6db1).
# Usage - Object Detection
UP-DETR has no extra compiled components, and its package dependencies are the same as DETR's.
The dependencies can be installed via conda as follows:
```
git clone tbd
conda install -c pytorch pytorch torchvision
conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
```
UP-DETR follows two steps: **pre-training** and **fine-tuning**.
We present the model pre-trained on ImageNet and then fine-tuned on COCO.
## Unsupervised Pre-training
### Data Preparation
Download and extract the ILSVRC2012 train dataset. We expect the directory structure to be the following:
```
path/to/imagenet/
  n06785654/                # category directory
    n06785654_16140.JPEG    # image
  n04584207/                # category directory
    n04584207_14322.JPEG    # image
```
Images can be organized in any order because our pre-training is unsupervised.
### Pre-training
To pre-train UP-DETR on a single node with 8 GPUs for 60 epochs, run:
```
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--lr_drop 40 \
--epochs 60 \
--pre_norm \
--num_patches 10 \
--batch_size 32 \
--feature_recon \
--fre_cnn \
--imagenet_path path/to/imagenet \
--output_dir path/to/save_model
```
Because the pre-training images are relatively small, we can use a large batch size. One epoch takes about 2 hours, so 60 epochs of pre-training take about 5 days with 8 V100 GPUs.
In a further ablation experiment, we found that object query shuffle is not helpful, so we removed it from the open-source version.
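The time estimate and the effective batch size for the command above can be worked out as follows. This is a back-of-the-envelope sketch: it assumes that `--batch_size` is counted per process, as in DETR's training script, and the helper names are made up for this example.

```python
def training_days(epochs, hours_per_epoch):
    """Total wall-clock training time in days."""
    return epochs * hours_per_epoch / 24


def global_batch_size(num_processes, batch_size_per_process):
    """Images per optimization step across all processes (assumed per-process batch size)."""
    return num_processes * batch_size_per_process


print(training_days(60, 2))       # 5.0 days for 60 epochs at 2 hours each
print(global_batch_size(8, 32))   # 256 images per step under the stated assumption
```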
## Fine-tuning
### Data Preparation
Download and extract the [COCO 2017](https://cocodataset.org/#download) train and val datasets. The directory structure is expected as follows:
```
path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
```
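Before launching a multi-day fine-tuning run, it can be worth confirming that the COCO directory matches this layout. The following is a minimal sketch using only the standard library; `missing_coco_dirs` is a hypothetical helper and not part of the UP-DETR codebase.

```python
from pathlib import Path

# Sub-directories expected under the COCO root, as listed above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(coco_root):
    """Return the expected sub-directories that are absent under coco_root."""
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs("path/to/coco")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("COCO layout looks complete.")
```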
### Fine-tuning
To fine-tune UP-DETR with 8 GPUs for 300 epochs, run:
```
python -m torch.distributed.launch --nproc_per_node=8 --use_env detr_main.py \
--lr_drop 200 \
--epochs 300 \
--lr_backbone 5e-5 \
--pre_norm \
--coco_path path/to/coco \
--pretrain path/to/save_model/checkpoint.pth
```
The fine-tuning cost is exactly the same as DETR's: one epoch takes about 28 minutes with 8 V100 GPUs, so 300 epochs of training take about 6 days.

The model can also be extended to panoptic segmentation; see [DETR](https://github.com/facebookresearch/detr/blob/master/README.md#usage---segmentation) for more details.
### Evaluation
```
python detr_main.py \
--batch_size 2 \
--eval \
--no_aux_loss \
--pre_norm \
--coco_path path/to/coco \
--resume path/to/save_model/checkpoint.pth
```
COCO val5k evaluation results of UP-DETR can be found in this [gist](https://gist.github.com/dddzg/cd0957c5643f5656f6cdc979da4d6db1).
# Notebook
We provide a Colab notebook that reproduces the visualization results in the paper:
* [Visualization Notebook](https://colab.research.google.com/github/dddzg/up-detr/blob/master/visualization.ipynb): This notebook shows how to perform query patch detection with the pre-trained model (without any fine-tuning on annotations).

# License
UP-DETR is released under the Apache 2.0 license. Please see the [LICENSE](LICENSE) file for more information.