# Depth from Motion (DfM)

This repository is the official implementation of DfM and MV-FCOS3D++.

![pv-demo](https://user-images.githubusercontent.com/30491025/181146351-876d8800-7261-4725-aeb1-b42e416eed01.gif)

![3d-demo-318](https://user-images.githubusercontent.com/30491025/181148417-915d9dd0-4f04-49fb-8106-4217e9d27e2a.gif) ![3d-demo2-318](https://user-images.githubusercontent.com/30491025/181148429-1d51bb92-68e2-4ab6-ac67-224822444b1d.gif)

## Introduction

This is the official release of the papers `Monocular 3D Object Detection with Depth from Motion` and `MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones`.

The code is still undergoing a large refactoring.
We eventually plan to reorganize this repo into the core code of this project plus mmdet3d as a requirement.

Please stay tuned for the clean release of all the configs and models.

Note: We will also release the refactored code in the official [mmdet3d](https://github.com/open-mmlab/mmdetection3d) soon.

> **Monocular 3D Object Detection with Depth from Motion**,  
> Tai Wang, Jiangmiao Pang, Dahua Lin  
> In: Proc. European Conference on Computer Vision (ECCV), 2022  
> [[arXiv](https://arxiv.org/abs/2207.12988)][[Bibtex](https://github.com/Tai-Wang/Depth-from-Motion#citation)]
>
> **MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones**,  
> Tai Wang, Qing Lian, Chenming Zhu, Xinge Zhu, Wenwei Zhang  
> In: arXiv, 2022  
> [[arXiv](https://arxiv.org/abs/2207.12716)][[Bibtex](https://github.com/Tai-Wang/Depth-from-Motion#citation)]

## Results

### DfM

The results of DfM and the corresponding config are shown below.

We have released a preliminary model for reproducing the results on the KITTI validation set.

The complete model checkpoints and logs will be released soon.

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | Easy | Moderate | Hard | Download |
| :-------: | :-----: | :------: | :------------: | :----: | :------: | :--: | :-----: |
| [ResNet34](./configs/dfm/dfm_r34_1x8_kitti-3d-3class.py) | - | - | - | 29.1232 | 19.8970 | 17.3910<sup>1</sup> | [model](https://download.openmmlab.com/mim-example/dfm/dfm_r34_1x8_kitti-3d-3class/epoch_53.pth) \| [log](https://download.openmmlab.com/mim-example/dfm/dfm_r34_1x8_kitti-3d-3class/20220909_092821.log.json) |
| above @ BEV AP<br>(IoU 0.7) | - | - | - | 38.9137 | 27.2843 | 24.8381 | |
| above @ 3D AP<br>(IoU 0.5) | - | - | - | 67.4935 | 51.2602 | 47.4430 | |
| above @ BEV AP<br>(IoU 0.5) | - | - | - | 72.5696 | 55.4583 | 52.4735 | |

[1] The reproduced performance may fluctuate to some degree because of the limited number of training samples and the sensitivity of the metrics. In my experience across multiple runs, the average performance (Easy/Moderate/Hard) varies from 26/18/16 to 29/20/17, depending on corner cases (caused by the matrix inverse computation or other reasons). Please stay tuned for a more stable version. (Models and logs will be updated soon.)

### MV-FCOS3D++

The results of MV-FCOS3D++ (baseline version) and the corresponding configs are shown below.

We have released a preliminary config for reproducing the results on the Waymo validation set.

(To comply with the license agreement of the Waymo dataset, models trained on Waymo are not released.)

The complete model configs and logs will be released soon.

#### Pretrained FCOS3D++ (without customized finetuning)

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAPL | mAP | mAPH | Download |
| :-------: | :-----: | :------: | :------------: | :----: | :------: | :--: | :-----: |
| [ResNet101+DCN](./configs/pgd/pgd_r101_fpn_gn-head_dcn_3x16_2x_waymoD3-mv3d.py) | - | - | - | 20.41 | 28.6 | 27.01 | [log](https://download.openmmlab.com/mim-example/dfm/pgd_r101_fpn_gn-head_dcn_3x16_2x_waymoD3-mv3d/20220808_221519.log.json) |
| above @ Car | - | - | - | 41.05 | 55.74 | 54.83 | |
| above @ Pedestrian | - | - | - | 18.77 | 27.85 | 24.21 | |
| above @ Cyclist | - | - | - | 1.43 | 2.21 | 2.0 | |

#### MV-FCOS3D++ with Pretrained FCOS3D++

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAPL | mAP | mAPH | Download |
| :-------: | :-----: | :------: | :------------: | :----: | :------: | :--: | :-----: |
| [ResNet101+DCN](./configs/dfm/multiview-dfm_r101_dcn_2x16_waymoD5-3d-3class_camsync.py) | - | - | - | 33.8 | 46.65 | 44.25 | [log](https://download.openmmlab.com/mim-example/dfm/multiview-dfm_r101_dcn_2x16_waymoD5-3d-3class_camsync/20220807_153735.log.json) |
| above @ Car | - | - | - | 52.69 | 68.36 | 67.47 | |
| above @ Pedestrian | - | - | - | 26.82 | 38.47 | 34.1 | |
| above @ Cyclist | - | - | - | 21.9 | 33.11 | 31.16 | |
| [ResNet101+DCN<br>+10 sweeps](./configs/dfm/multiview-dfm_r101_dcn_2x16_waymoD5-3d-3class_camsync_10sweeps.py) | - | - | - | 35.14 | 47.98 | 45.49 | [log1](https://download.openmmlab.com/mim-example/dfm/multiview-dfm_r101_dcn_2x16_waymoD5-3d-3class_camsync_10sweeps/20220808_170010.log.json) \| [log2](https://download.openmmlab.com/mim-example/dfm/multiview-dfm_r101_dcn_2x16_waymoD5-3d-3class_camsync_10sweeps/20220809_093358.log.json) |
| above @ Car | - | - | - | 55.44 | 70.72 | 69.79 | |
| above @ Pedestrian | - | - | - | 27.6 | 39.5 | 35.1 | |
| above @ Cyclist | - | - | - | 22.39 | 33.72 | 31.59 | |
| [ResNet101+DCN<br>(slow infer)<sup>2</sup>](./configs/dfm/multiview-dfm_r101_dcn_2x16_waymoD5-3d-3class_camsync.py) | - | - | - | 37.9 | 52.15 | 48.84 | |
| above @ Car | - | - | - | 56.24 | 73.15 | 72.07 | |
| above @ Pedestrian | - | - | - | 34.6 | 49.01 | 42.25 | |
| above @ Cyclist | - | - | - | 22.84 | 34.29 | 32.18 | |

[2] "Slow infer" refers to changing the NMS settings to `nms_pre=4096` and `max_num=500`, which increases the number of predictions and therefore improves recall. It slows down inference but significantly improves the final performance under the Waymo metric. **The same trick can also be applied to the 10-sweep config and other models.**

## Installation

This project requires the following OpenMMLab packages:

- MMCV-full >= v1.6.0 (recommended for the latest iou3d computation)
- MMDetection >= v2.24.0
- MMSegmentation >= v0.20.0

All of the above versions are recommended except MMCV-full, whose minimum version is required.
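
To confirm that an existing environment meets these minimums, a small check such as the one below can help. It is a hypothetical helper, not part of this repository. It assumes that MMCV-full, MMDetection, and MMSegmentation are importable as `mmcv`, `mmdet`, and `mmseg` and that each exposes `__version__`. It compares only the numeric major.minor.patch components, so pre-release suffixes such as `rc1` are ignored.

```python
import importlib
import re

# Minimum versions listed above (module name -> minimum version).
# MMCV-full installs the `mmcv` module; MMSegmentation installs `mmseg`.
MINIMUM_VERSIONS = {
    "mmcv": "1.6.0",
    "mmdet": "2.24.0",
    "mmseg": "0.20.0",
}


def parse_version(version: str) -> tuple:
    """Turn '1.6.0' (or '1.6.0rc1') into (1, 6, 0), keeping the leading digits of each part."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '1.6' to (1, 6, 0)
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare integer tuples, not strings, so that 0.20.0 is correctly newer than 0.9.0."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment() -> dict:
    """Report 'ok', 'missing', or 'too old (...)' for each required package."""
    report = {}
    for module_name, minimum in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = getattr(module, "__version__", "0.0.0")
        if meets_minimum(installed, minimum):
            report[module_name] = "ok"
        else:
            report[module_name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Run it after installing the packages. Any entry reported as `missing` or `too old` points to the package to reinstall.
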
Lower versions of MMDetection and MMSegmentation may also work but have not been tested yet.

Example commands are shown below.

```bash
conda create --name dfm python=3.7 -y
conda activate dfm
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install mmcv-full==1.6.0
pip install mmdet==2.24.0
pip install mmsegmentation==0.20.0
git clone https://github.com/Tai-Wang/Depth-from-Motion.git
cd Depth-from-Motion
pip install -v -e .
```

## License

This project is released under the [Apache 2.0 license](LICENSE).

## Usage

### Data preparation

First, prepare the raw KITTI and Waymo data following [MMDetection3D](https://github.com/open-mmlab/mmdetection3d).

Then we prepare the data related to temporally consecutive frames.

For KITTI, we need to additionally download the pose and label files of the raw data [here](https://www.cse.msu.edu/computervision/Kinematic3D-raw_extra.zip) and the official mapping between the raw data and the 3D detection benchmark split [here](https://github.com/garrickbrazil/kinematic3d/tree/master/data/kitti_split1/devkit/mapping). Then we can run the data converter script:

```bash
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```

For Waymo, we need to additionally generate the ground-truth bin file for the camera-only setting, in which only boxes within the perception range of the cameras are considered.
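
As a simplified illustration of what "within the perception range of the cameras" can mean, the sketch below keeps a ground-truth box if its center lies in front of a camera and projects inside that camera's image under a pinhole model. It then takes the union over all cameras. This is an assumption-laden sketch, not the logic of `tools/create_waymo_gt_bin.py`. The function names are hypothetical, and only box centers are tested (the real criterion may use box corners or Waymo's camera-synced labels). Boxes are also assumed to be already transformed into an OpenCV-style camera frame (x right, y down, z forward).

```python
from typing import List, Sequence, Tuple

# Hypothetical types: a 3D point in an OpenCV-style camera frame (x right, y down, z forward)
Point3D = Tuple[float, float, float]
Intrinsics = Sequence[Sequence[float]]  # 3x3 camera intrinsic matrix K


def project_point(point: Point3D, K: Intrinsics) -> Tuple[float, float]:
    """Project a camera-frame 3D point to pixel coordinates (u, v) with a pinhole model."""
    x, y, z = point
    u = (K[0][0] * x + K[0][1] * y + K[0][2] * z) / z
    v = (K[1][0] * x + K[1][1] * y + K[1][2] * z) / z
    return u, v


def boxes_in_camera_view(centers: List[Point3D], K: Intrinsics, width: int, height: int,
                         min_depth: float = 0.1) -> List[int]:
    """Indices of boxes whose center is in front of the camera and projects inside the image."""
    keep = []
    for idx, center in enumerate(centers):
        if center[2] <= min_depth:
            continue  # behind (or too close to) the camera, so it cannot be seen
        u, v = project_point(center, K)
        if 0 <= u < width and 0 <= v < height:
            keep.append(idx)
    return keep


def boxes_covered_by_cameras(centers_per_camera: List[List[Point3D]],
                             cameras: List[Tuple[Intrinsics, int, int]]) -> List[int]:
    """Union over cameras: a box is kept if at least one camera sees its center.

    centers_per_camera[c][i] is the center of box i expressed in camera c's frame;
    cameras[c] is (K, image_width, image_height) for camera c.
    """
    covered = set()
    for centers, (K, width, height) in zip(centers_per_camera, cameras):
        covered.update(boxes_in_camera_view(centers, K, width, height))
    return sorted(covered)


if __name__ == "__main__":
    K = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 640.0], [0.0, 0.0, 1.0]]
    front = [(0.0, 0.0, 10.0), (0.0, 0.0, -5.0), (20.0, 0.0, 10.0), (-4.0, 1.0, 10.0)]
    print(boxes_in_camera_view(front, K, 1920, 1280))  # prints [0, 3]
```

Testing all eight box corners instead of only the center would also keep boxes that are partially visible at the image border.
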
We also recommend using the latest version of the Waymo dataset, which includes camera-synced annotations tailored to this setting.

```bash
python tools/create_waymo_gt_bin.py
```

Then follow the mmdet3d [tutorial for the Waymo dataset](https://mmdetection3d.readthedocs.io/en/latest/datasets/waymo_det.html) for the remaining pre-processing steps.

The final data structure looks like this:

```text
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── kitti
│   │   ├── ImageSets
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── prev_2
│   │   │   ├── velodyne
│   │   ├── training
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── prev_2
│   │   │   ├── label_2
│   │   │   ├── velodyne
│   │   ├── raw
│   │   │   ├── 2011_09_26_drive_0001_sync
│   │   │   ├── xxxx (other raw data files)
│   │   ├── devkit
│   │   │   ├── mapping
│   │   │   │   ├── train_mapping.txt
│   │   │   │   ├── train_rand.txt
│   ├── waymo
│   │   ├── waymo_format
│   │   │   ├── training
│   │   │   ├── validation
│   │   │   ├── testing
│   │   │   ├── gt.bin
│   │   │   ├── cam_gt.bin
│   │   ├── kitti_format
│   │   │   ├── ImageSets
│   │   │   ├── training
│   │   │   │   ├── calib
│   │   │   │   ├── image_0
│   │   │   │   ├── image_1
│   │   │   │   ├── image_2
│   │   │   │   ├── image_3
│   │   │   │   ├── image_4
│   │   │   │   ├── label_0
│   │   │   │   ├── label_1
│   │   │   │   ├── label_2
│   │   │   │   ├── label_3
│   │   │   │   ├── label_4
│   │   │   │   ├── label_all
│   │   │   │   ├── pose
│   │   │   │   ├── velodyne
│   │   │   ├── testing
│   │   │   │   ├── (the same as training)
│   │   │   ├── waymo_gt_database
│   │   │   ├── waymo_infos_trainval.pkl
│   │   │   ├── waymo_infos_train.pkl
│   │   │   ├── waymo_infos_val.pkl
│   │   │   ├── waymo_infos_test.pkl
│   │   │   ├── waymo_dbinfos_train.pkl
```

### Pretrained models

For the KITTI implementation of DfM, we keep the LIGA-Stereo setting, which uses a LiDAR-based teacher to provide better supervision during training. Please download the teacher checkpoint (already converted to the mmdet3d format) [here](https://download.openmmlab.com/mim-example/dfm/pretrained_models/mmdet3d-second-teacher.pth). It helps the network converge faster and brings a gain of about 1 AP. In the future, we may replace it with more direct supervision to simplify usage.

### Demo

To test DfM on image data, simply run:

```shell
python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
```

where `ANNOTATION_FILE` should provide the 3D-to-2D projection matrix (the camera intrinsic matrix). The visualization results, i.e., the image with its predicted 3D bounding boxes projected onto it, will be saved in `${OUT_DIR}/IMAGE_NAME`.

An example on KITTI data using the [DfM](https://github.com/Tai-Wang/Depth-from-Motion/blob/main/configs/dfm) model:

```shell
python demo/mono_det_demo.py demo/data/kitti/000008.png demo/data/kitti/kitti_000008_infos.pkl configs/dfm/dfm_r34_1x8_kitti-3d-3class.py checkpoints/dfm.pth
```

### Training and testing

For training and testing, you can follow the standard mmdet commands:

```bash
# train DfM on KITTI
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
```

For simple inference and evaluation, you can use the command below:

```bash
# evaluate DfM on KITTI and MV-FCOS3D++ on Waymo
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CKPT_PATH} --eval mAP
```

### FAQ

- How can I use the Waymo LET-AP metric to evaluate MV-FCOS3D++?

Follow the [instructions](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md#metrics-computation) for compiling the original Waymo detection metrics to compile this [file](https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/metrics/tools/compute_detection_let_metrics_main.cc). This produces the `compute_detection_let_metrics_main` binary for LET-AP evaluation. You can also refer to the [official tutorial](https://github.com/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial_camera_only.ipynb) on camera-only 3D detection for more details and Python example code.

## Acknowledgement

This codebase is based on [MMDet3D](https://github.com/open-mmlab/mmdetection3d) and benefits greatly from [LIGA-Stereo](https://github.com/xy-guo/LIGA-Stereo).

## Citation

```bibtex
@inproceedings{wang2022dfm,
  title={Monocular 3D Object Detection with Depth from Motion},
  author={Wang, Tai and Pang, Jiangmiao and Lin, Dahua},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
@article{wang2022mvfcos3d++,
  title={{MV-FCOS3D++: Multi-View} Camera-Only 4D Object Detection with Pretrained Monocular Backbones},
  author={Wang, Tai and Lian, Qing and Zhu, Chenming and Zhu, Xinge and Zhang, Wenwei},
  journal={arXiv preprint arXiv:2207.12716},
  year={2022}
}
```