https://github.com/zhenyuw16/UniDetector
Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".
- Host: GitHub
- URL: https://github.com/zhenyuw16/UniDetector
- Owner: zhenyuw16
- License: apache-2.0
- Created: 2023-03-20T07:10:15.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-04-21T02:53:52.000Z (almost 2 years ago)
- Last Synced: 2024-10-28T05:13:00.951Z (6 months ago)
- Language: Python
- Homepage:
- Size: 10.1 MB
- Stars: 536
- Watchers: 14
- Forks: 24
- Open Issues: 28
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-yolo-object-detection - UniDetector
README
# UniDetector
> [**Detecting Everything in the Open World: Towards Universal Object Detection**](https://arxiv.org/abs/2303.11749),
> *CVPR 2023*

## Installation
Our code is based on [mmdetection v2.18.0](https://github.com/open-mmlab/mmdetection/tree/v2.18.0). See its [official installation guide](https://github.com/open-mmlab/mmdetection/blob/v2.18.0/docs/get_started.md) for environment setup.
[CLIP](https://github.com/openai/CLIP) is also required for running the code.
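For reference, one possible environment setup is sketched below. The exact package versions are illustrative; follow the official mmdetection v2.18.0 and CLIP instructions, and install this repository itself (rather than the stock `mmdet` package) if it ships a modified copy of mmdetection.

~~~
# illustrative setup -- choose PyTorch/mmcv builds matching your CUDA toolkit
conda create -n unidetector python=3.8 -y
conda activate unidetector
# 1. install a PyTorch build matching your CUDA version (see pytorch.org)
# 2. install mmcv-full compatible with mmdetection v2.18.0, e.g. for CUDA 11.1 / torch 1.9:
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
# 3. install this repository from its root (standard mmdetection-style editable install)
pip install -e .
# 4. install CLIP
pip install git+https://github.com/openai/CLIP.git
~~~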
## Preparation
Please first [prepare the datasets](docs/datasets.md).
Prepare the language CLIP embeddings. We have released the pre-computed embeddings in the [clip_embeddings](clip_embeddings/) folder; you can also run the following script to obtain the language embeddings yourself:
~~~
python scripts/dump_clip_features_manyprompt.py --ann path_to_annotation_for_datasets --clip_model RN50 --out_path path_to_language_embeddings
~~~

Prepare the pre-trained [RegionCLIP](https://github.com/microsoft/RegionCLIP) parameters. We have released the RegionCLIP parameters converted into mmdetection format: [google drive](https://drive.google.com/file/d/1icKGFMQRHZpKhjl-YwN-389w2jx6siR2/view?usp=sharing), [Baidu drive](https://pan.baidu.com/s/1vTJcXSpjuPx8nnBufePc7Q) (extraction code: bj48). The code for parameter conversion will be released soon.
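For intuition, the language-embedding step above roughly amounts to encoding each class name with CLIP's text encoder over many prompt templates and averaging the normalized features. A minimal illustrative sketch (this is not the released `dump_clip_features_manyprompt.py` script; the prompts, category list, and output path are placeholders):

~~~
import torch
import clip

# a few example templates; the actual script uses many more prompts
prompts = ["a photo of a {}.", "a photo of a small {}.", "there is a {} in the scene."]

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50", device=device)

def class_embedding(name):
    # encode every prompt filled with the class name, normalize, then average
    tokens = clip.tokenize([p.format(name) for p in prompts]).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)

# category names would normally be read from the annotation file passed via --ann
categories = ["person", "bicycle", "car"]
embeddings = torch.stack([class_embedding(c) for c in categories])
torch.save(embeddings, "language_embeddings.pt")  # output path is a placeholder
~~~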
## Single-dataset training
### End-to-end training
Run
~~~
bash tools/dist_train.sh configs/singledataset/clip_end2end_faster_rcnn_r50_c4_1x_coco.py 8 --cfg-options load_from=regionclip_pretrained-cc_rn50_mmdet.pth
~~~
to train a Faster R-CNN model on the single COCO dataset (val35k).

### Decoupled training
Train the region proposal stage (our CLN model) on the single COCO dataset (val35k):
~~~
bash tools/dist_train.sh configs/singledataset/clip_decouple_faster_rcnn_r50_c4_1x_coco_1ststage.py 8
~~~

Extract pre-computed region proposals:
~~~
bash tools/dist_test.sh configs/singledataset/clip_decouple_faster_rcnn_r50_c4_1x_coco_1ststage.py [path_for_trained_checkpoints] 8 --out rp_train.pkl
~~~
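The dumped proposal files are then consumed by the second-stage config. As a rough illustration, precomputed proposals are typically wired into an mmdetection v2.x dataset config along the following lines (field names follow general mmdetection conventions; see `configs/singledataset/clip_decouple_faster_rcnn_r50_c4_1x_coco_2ndstage.py` for the actual settings):

~~~
# illustrative fragment only -- not the repository's exact config
data = dict(
    train=dict(
        proposal_file='rp_train.pkl',        # proposals dumped by the first stage
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='LoadProposals', num_max_proposals=None),
            # ... remaining transforms ...
        ],
    ),
    val=dict(proposal_file='rp_val.pkl'),    # proposals extracted on the validation set
    test=dict(proposal_file='rp_val.pkl'),
)
~~~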
Modify the datasets in the config files to extract region proposals on the COCO validation set as well. The default proposal file names we use are `rp_train.pkl` and `rp_val.pkl`, which are specified in the config file of the second stage (as sketched above).

Train the RoI classification stage on the single COCO dataset (val35k):
~~~
bash tools/dist_train.sh configs/singledataset/clip_decouple_faster_rcnn_r50_c4_1x_coco_2ndstage.py 8 --cfg-options load_from=regionclip_pretrained-cc_rn50_mmdet.pth
~~~

## Open-world inference
### End-to-end inference
Run inference on the LVIS v0.5 dataset to evaluate the open-world performance of end-to-end models:
~~~
bash tools/dist_test.sh configs/inference/clip_end2end_faster_rcnn_r50_c4_1x_lvis_v0.5.py [path_for_trained_checkpoints] 8 --eval bbox
~~~

### Decoupled inference
Extract pre-computed region proposals:
~~~
bash tools/dist_test.sh configs/inference/clip_decouple_faster_rcnn_r50_c4_1x_lvis_v0.5_1ststage.py [path_for_trained_checkpoints] 8 --out rp_val_ow.pkl
~~~

Run inference with the pre-computed proposals and the RoI classification stage:
~~~
bash tools/dist_test.sh configs/inference/clip_decouple_faster_rcnn_r50_c4_1x_lvis_v0.5_2ndstage.py [path_for_trained_checkpoints] 8 --eval bbox
~~~

### Inference with probability calibration
For inference with probability calibration, first run inference to obtain the detection results used to estimate the prior probability:
~~~
bash tools/dist_test.sh configs/inference/clip_decouple_faster_rcnn_r50_c4_1x_lvis_v0.5_2ndstage.py [path_for_trained_checkpoints] 8 --out raw_lvis_results.pkl --eval bbox
~~~

`raw_lvis_results.pkl` is the detection result file used by default.
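Conceptually, the calibration uses these raw results to estimate a per-category prior and then down-weights the scores of categories the detector over-predicts. The sketch below only illustrates this general idea with an assumed exponent `gamma`; the exact formulation is given in the paper and in the `_withcalibration` config, not here:

~~~
import pickle
import numpy as np

gamma = 0.6  # hypothetical calibration strength, for illustration only

# mmdetection-style result file: a list over images, each a per-class list of (n, 5) arrays
with open('raw_lvis_results.pkl', 'rb') as f:
    raw = pickle.load(f)

num_classes = len(raw[0])
counts = np.zeros(num_classes)
for per_image in raw:
    for cls, dets in enumerate(per_image):
        counts[cls] += len(dets)
prior = counts / max(counts.sum(), 1.0)  # empirical per-category prior

def calibrate(score, cls):
    # down-weight frequently predicted categories, up-weight rare ones
    return score / (prior[cls] ** gamma + 1e-12)
~~~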
Then run inference with probability calibration:
~~~
bash tools/dist_test.sh configs/inference/clip_decouple_faster_rcnn_r50_c4_1x_lvis_v0.5_2ndstage_withcalibration.py [path_for_trained_checkpoints] 8 --eval bbox
~~~

## Multi-dataset training
The steps for multi-dataset training are generally the same as single-dataset training. Use the config files under `configs/multidataset/` for multi-dataset training. We release the config files for training with two datasets (Objects365 and COCO) and three datasets (OpenImages, Objects365 and COCO).
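The launch command mirrors the single-dataset case; for example (the config file name below is a placeholder for one of the released multi-dataset configs):

~~~
bash tools/dist_train.sh configs/multidataset/[chosen_multidataset_config].py 8 --cfg-options load_from=regionclip_pretrained-cc_rn50_mmdet.pth
~~~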
## MODEL ZOO
We will release other checkpoints soon.
|Training Data | end-to-end training | decoupled training (1st stage) | decoupled training (2nd stage) |
|-------------------------------|----------------------|-----------------|----------|
|COCO | [model](https://drive.google.com/file/d/1zKjKO_jSMQmIu5qNuQwyKohKDdCJ7tnG/view?usp=sharing) | [model](https://drive.google.com/file/d/1zAvoPx5btVug64Zz_9-VNtp6OzyYLt_d/view?usp=sharing) | [model](https://drive.google.com/file/d/1I__-S-FzvLM2ToxenSzESe4MAy3mATK7/view?usp=sharing) |
|COCO + Objects365 | | | |
|COCO + Objects365 + OpenImages | | | |