# Omni3D & Cube R-CNN
[![Support Ukraine](https://img.shields.io/badge/Support-Ukraine-FFD500?style=flat&labelColor=005BBB)](https://opensource.fb.com/support-ukraine)
**Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild**
[Garrick Brazil][gb], [Abhinav Kumar][ak], [Julian Straub][js], [Nikhila Ravi][nr], [Justin Johnson][jj], [Georgia Gkioxari][gg]
[[`Project Page`](https://garrickbrazil.com/omni3d)] [[`arXiv`](https://arxiv.org/abs/2207.10660)] [[`BibTeX`](#citing)]
Zero-shot (+ tracking) on Project Aria data
Predictions on COCO
## Table of Contents:
1. [Installation](#installation)
2. [Demo](#demo)
3. [Omni3D Data](#data)
4. [Cube R-CNN Training](#training)
5. [Cube R-CNN Inference](#inference)
6. [Results](#results)
7. [License](#license)
8. [Citing](#citing)

## Installation

- [Detectron2][d2]
- [PyTorch][pyt]
- [PyTorch3D][py3d]
- [COCO][coco]

``` bash
# setup new environment
conda create -n cubercnn python=3.8
source activate cubercnn

# main dependencies
conda install -c fvcore -c iopath -c conda-forge -c pytorch3d -c pytorch fvcore iopath pytorch3d pytorch=1.8 torchvision=0.9.1 cudatoolkit=10.1

# OpenCV, COCO, detectron2
pip install cython opencv-python
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

# other dependencies
conda install -c conda-forge scipy seaborn
```

For reference, we used `cuda/10.1` and `cudnn/v7.6.5.32` for our experiments. We expect that slight variations in versions are also compatible.
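As an optional sanity check (not part of the original setup instructions), you can verify that the core dependencies import and that CUDA is visible:

``` bash
# optional: confirm the main dependencies import and CUDA is available
python -c "import torch, torchvision, cv2, detectron2, pytorch3d; print(torch.__version__, torch.cuda.is_available())"
```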
## Demo

To run the Cube R-CNN demo on a folder of input images using our `DLA34` model trained on the full Omni3D dataset,
``` bash
# Download example COCO images
sh demo/download_demo_COCO_images.sh

# Run an example demo
python demo/demo.py \
--config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-folder "datasets/coco_examples" \
--threshold 0.25 --display \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo
```

See [`demo.py`](demo/demo.py) for more details. For example, if you know the camera intrinsics you may input them as arguments with the convention `--focal-length` and `--principal-point`. See our [`MODEL_ZOO.md`](MODEL_ZOO.md) for more model checkpoints.
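As a hypothetical illustration (the input folder, focal length, and principal point below are placeholder values, and the exact argument shapes should be checked in `demo/demo.py`), running on your own images with known intrinsics could look like:

``` bash
# hypothetical example: demo on your own images with known intrinsics
# (folder path, focal length, and principal point are placeholders)
python demo/demo.py \
--config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-folder "datasets/my_images" \
--focal-length 600 \
--principal-point 320 240 \
--threshold 0.25 --display \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo_custom
```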
## Omni3D Data
See [`DATA.md`](DATA.md) for instructions on how to download and set up images and annotations of our Omni3D benchmark for training and evaluating Cube R-CNN.

## Training Cube R-CNN on Omni3D
We provide config files for training Cube R-CNN on
* Omni3D: [`configs/Base_Omni3D.yaml`](configs/Base_Omni3D.yaml)
* Omni3D indoor: [`configs/Base_Omni3D_in.yaml`](configs/Base_Omni3D_in.yaml)
* Omni3D outdoor: [`configs/Base_Omni3D_out.yaml`](configs/Base_Omni3D_out.yaml)

We train on 48 GPUs using [submitit](https://github.com/facebookincubator/submitit), which wraps the following training command,
```bash
python tools/train_net.py \
--config-file configs/Base_Omni3D.yaml \
OUTPUT_DIR output/omni3d_example_run
```

Note that our provided configs specify hyperparameters tuned for 48 GPUs. You could train on 1 GPU (though with no guarantee of reaching the final performance) as follows,
``` bash
python tools/train_net.py \
--config-file configs/Base_Omni3D.yaml --num-gpus 1 \
SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.0025 \
SOLVER.MAX_ITER 5568000 SOLVER.STEPS "(3340800, 4454400)" \
SOLVER.WARMUP_ITERS 174000 TEST.EVAL_PERIOD 1392000 \
VIS_PERIOD 111360 OUTPUT_DIR output/omni3d_example_run
```

### Tips for Tuning Hyperparameters
Our Omni3D configs are designed for multi-node training.
We follow a simple scaling rule for adjusting to different system configurations. We find that 16GB GPUs (e.g. V100s) can hold 4 images per batch when training with a `DLA34` backbone. If $g$ is the number of GPUs, then the number of images per batch is $b = 4g$. Let's define $r$ to be the ratio between the recommended batch size $b_0$ and the actual batch size $b$, namely $r = b_0 / b$. The values for $b_0$ can be found in the configs. For instance, for the full Omni3D training $b_0 = 196$ as shown [here](https://github.com/facebookresearch/omni3d/blob/main/configs/Base_Omni3D.yaml#L4).
We scale the following hyperparameters as follows:

* `SOLVER.IMS_PER_BATCH` $=b$
* `SOLVER.BASE_LR` $/=r$
* `SOLVER.MAX_ITER` $*=r$
* `SOLVER.STEPS` $*=r$
* `SOLVER.WARMUP_ITERS` $*=r$
* `TEST.EVAL_PERIOD` $*=r$
* `VIS_PERIOD` $*=r$

We tune the number of GPUs $g$ such that `SOLVER.MAX_ITER` is in a range between about 90k and 120k iterations. We cannot guarantee that all GPU configurations perform the same. We expect noticeable performance differences at extreme ends of resources (e.g. when using 1 GPU). A minimal sketch of this scaling rule is shown below.
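For concreteness, the rule can be written as a small helper. This is an illustrative sketch only, not code from this repo; the recommended values must come from the chosen config (e.g. `configs/Base_Omni3D.yaml`), and the numbers in the example call are placeholders.

```python
# Illustrative sketch (not part of the repo): apply the scaling rule above.
# `recommended` should hold the values from the chosen config; the numbers
# in the example call below are placeholders, not the real config values.

def scale_hyperparams(num_gpus, recommended, imgs_per_gpu=4):
    b = imgs_per_gpu * num_gpus             # actual batch size
    r = recommended["IMS_PER_BATCH"] / b    # r = b0 / b
    return {
        "SOLVER.IMS_PER_BATCH": b,
        "SOLVER.BASE_LR": recommended["BASE_LR"] / r,
        "SOLVER.MAX_ITER": round(recommended["MAX_ITER"] * r),
        "SOLVER.STEPS": tuple(round(s * r) for s in recommended["STEPS"]),
        "SOLVER.WARMUP_ITERS": round(recommended["WARMUP_ITERS"] * r),
        "TEST.EVAL_PERIOD": round(recommended["EVAL_PERIOD"] * r),
        "VIS_PERIOD": round(recommended["VIS_PERIOD"] * r),
    }

# example with placeholder numbers (read the real ones from the config)
print(scale_hyperparams(num_gpus=8, recommended={
    "IMS_PER_BATCH": 196, "BASE_LR": 0.12, "MAX_ITER": 116_000,
    "STEPS": (69_600, 92_800), "WARMUP_ITERS": 3_625,
    "EVAL_PERIOD": 29_000, "VIS_PERIOD": 2_320,
}))
```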
## Cube R-CNN Inference

To evaluate trained models from Cube R-CNN's [`MODEL_ZOO.md`](MODEL_ZOO.md), run
``` bash
python tools/train_net.py \
--eval-only --config-file cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/evaluation
```

Our evaluation is similar to COCO evaluation and uses $IoU_{3D}$ (from [PyTorch3D](https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/ops/iou_box3d.py)) as a metric. We compute the aggregate 3D performance averaged across categories.
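For reference, $IoU_{3D}$ between two boxes given as 8x3 corner sets can be computed directly with PyTorch3D. The snippet below is a minimal sketch (assuming a recent PyTorch3D with `box3d_overlap`); corners must follow PyTorch3D's box corner ordering.

```python
# Minimal sketch: IoU3D between two boxes given as (N, 8, 3) corner tensors.
# Corners follow PyTorch3D's convention: first four corners are the bottom
# face, last four the top face, in consistent order.
import torch
from pytorch3d.ops import box3d_overlap

unit_cube = torch.tensor([[
    [0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.],
    [0., 0., 1.], [1., 0., 1.], [1., 1., 1.], [0., 1., 1.],
]])                                                    # (1, 8, 3)
shifted = unit_cube + torch.tensor([0.5, 0.0, 0.0])    # half-overlapping cube

intersection_vol, iou_3d = box3d_overlap(unit_cube, shifted)
print(iou_3d)  # ~0.333 (intersection 0.5 / union 1.5)
```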
To run the evaluation on your own models outside of the Cube R-CNN evaluation loop, we recommend using the `Omni3DEvaluationHelper` class from our [evaluation](https://github.com/facebookresearch/omni3d/blob/main/cubercnn/evaluation/omni3d_evaluation.py#L60-L88) similar to how it is utilized [here](https://github.com/facebookresearch/omni3d/blob/main/tools/train_net.py#L68-L114).
The evaluator relies on the detectron2 MetadataCatalog for keeping track of category names and contiguous IDs. Hence, it is important to set these variables appropriately.
``` python
# (list[str]) the category names in their contiguous order
MetadataCatalog.get('omni3d_model').thing_classes = ...

# (dict[int: int]) the mapping from Omni3D category IDs to the contiguous order
MetadataCatalog.get('omni3d_model').thing_dataset_id_to_contiguous_id = ...
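# For illustration only (hypothetical categories and IDs, not real Omni3D values):
#   MetadataCatalog.get('omni3d_model').thing_classes = ['chair', 'table', 'car']
#   MetadataCatalog.get('omni3d_model').thing_dataset_id_to_contiguous_id = {4: 0, 9: 1, 37: 2}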
```

In summary, the evaluator expects a list of image-level predictions in the format of:
```
{
    "image_id": the unique image identifier from Omni3D,
    "K": 3x3 intrinsics matrix for the image,
    "width": image width,
    "height": image height,
    "instances": [
        {
            "image_id": the unique image identifier from Omni3D,
            "category_id": the contiguous category prediction ID,
                which can be mapped from Omni3D's category IDs using
                MetadataCatalog.get('omni3d_model').thing_dataset_id_to_contiguous_id,
            "bbox": [float] 2D box as [x1, y1, x2, y2] used for IoU2D,
            "score": the confidence score for the object,
            "depth": the depth of the center of the object,
            "bbox3D": list[list[float]] 8x3 corner vertices used for IoU3D,
        }
        ...
    ]
}
```
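As a hedged illustration (all numbers below are dummy values), a single image-level prediction in this format could be assembled like so before being handed to the evaluator:

```python
# Illustrative only: one image-level prediction in the format described above.
prediction = {
    "image_id": 12345,                      # unique Omni3D image identifier (dummy)
    "K": [[600.0, 0.0, 320.0],
          [0.0, 600.0, 240.0],
          [0.0, 0.0, 1.0]],                 # 3x3 intrinsics (dummy)
    "width": 640,
    "height": 480,
    "instances": [
        {
            "image_id": 12345,
            "category_id": 0,               # contiguous ID (see MetadataCatalog mapping above)
            "bbox": [100.0, 150.0, 200.0, 300.0],   # 2D box [x1, y1, x2, y2]
            "score": 0.87,
            "depth": 4.2,                   # depth of the object center
            "bbox3D": [[0.0, 0.0, 4.0]] * 8,        # 8x3 corners (placeholder; use real box corners)
        }
    ],
}
```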
## Results

See [`RESULTS.md`](RESULTS.md) for detailed Cube R-CNN performance and comparison with other methods.

## License
Cube R-CNN is released under [CC-BY-NC 4.0](LICENSE.md).

## Citing

Please use the following BibTeX entry if you use Omni3D and/or Cube R-CNN in your research or refer to our results.
```BibTeX
@inproceedings{brazil2023omni3d,
author = {Garrick Brazil and Abhinav Kumar and Julian Straub and Nikhila Ravi and Justin Johnson and Georgia Gkioxari},
title = {{Omni3D}: A Large Benchmark and Model for {3D} Object Detection in the Wild},
booktitle = {CVPR},
address = {Vancouver, Canada},
month = {June},
year = {2023},
organization = {IEEE},
}
```

If you use the Omni3D benchmark, we kindly ask you to additionally cite all datasets. BibTeX entries are provided below.
Dataset BibTeX

```BibTeX
@inproceedings{Geiger2012CVPR,
author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
booktitle = {CVPR},
year = {2012}
}
```

```BibTeX
@inproceedings{caesar2020nuscenes,
title={nuscenes: A multimodal dataset for autonomous driving},
author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
booktitle={CVPR},
year={2020}
}
```

```BibTeX
@inproceedings{song2015sun,
title={Sun rgb-d: A rgb-d scene understanding benchmark suite},
author={Song, Shuran and Lichtenberg, Samuel P and Xiao, Jianxiong},
booktitle={CVPR},
year={2015}
}
```

```BibTeX
@inproceedings{dehghan2021arkitscenes,
title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
booktitle={NeurIPS Datasets and Benchmarks Track (Round 1)},
year={2021},
}
```

```BibTeX
@inproceedings{hypersim,
author = {Mike Roberts AND Jason Ramapuram AND Anurag Ranjan AND Atulit Kumar AND
Miguel Angel Bautista AND Nathan Paczan AND Russ Webb AND Joshua M. Susskind},
title = {{Hypersim}: {A} Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
booktitle = {ICCV},
year = {2021},
}
```

```BibTeX
@article{objectron2021,
title={Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations},
author={Ahmadyan, Adel and Zhang, Liangkai and Ablavatski, Artsiom and Wei, Jianing and Grundmann, Matthias},
journal={CVPR},
year={2021},
}
```

[gg]: https://github.com/gkioxari
[jj]: https://github.com/jcjohnson
[gb]: https://github.com/garrickbrazil
[ak]: https://github.com/abhi1kumar
[nr]: https://github.com/nikhilaravi
[js]: https://github.com/jstraub
[d2]: https://github.com/facebookresearch/detectron2
[py3d]: https://github.com/facebookresearch/pytorch3d
[pyt]: https://pytorch.org/
[coco]: https://cocodataset.org/