{"id":13442644,"url":"https://github.com/Eaphan/UPIDet","last_synced_at":"2025-03-20T14:31:46.379Z","repository":{"id":152541964,"uuid":"591198202","full_name":"Eaphan/UPIDet","owner":"Eaphan","description":"Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS2023]","archived":false,"fork":false,"pushed_at":"2024-06-04T07:02:16.000Z","size":3057,"stargazers_count":55,"open_issues_count":2,"forks_count":7,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-10-28T05:59:30.539Z","etag":null,"topics":["3d-object-detection","cross-modal","multi-modal"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Eaphan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-20T06:37:29.000Z","updated_at":"2024-10-14T02:23:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"e0727e15-fb66-4fdb-803c-6b5934e14ce5","html_url":"https://github.com/Eaphan/UPIDet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eaphan%2FUPIDet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eaphan%2FUPIDet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eaphan%2FUPIDet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eaphan%2FUPIDet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Eaphan","download_url":"https:/
/codeload.github.com/Eaphan/UPIDet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244630259,"owners_count":20484344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","cross-modal","multi-modal"],"created_at":"2024-07-31T03:01:48.523Z","updated_at":"2025-03-20T14:31:45.757Z","avatar_url":"https://github.com/Eaphan.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003c!-- \u003cimg src=\"docs/open_mmlab.png\" align=\"right\" width=\"30%\"\u003e --\u003e\n\n# Unleash the Potential of Image Branch for Cross-modal 3D Object Detection\nThis is the official implementation of \"Unleash the Potential of Image Branch for Cross-modal 3D Object Detection\". This repository is based on [`[OpenPCDet]`](https://github.com/open-mmlab/OpenPCDet).\n\n\u003c!-- \u003cimg src=\"docs/pipeline.png\"\u003e --\u003e\n**Abstract**: To achieve reliable and precise scene understanding, autonomous vehicles typically incorporate multiple sensing modalities to capitalize on their complementary attributes. However, existing cross-modal 3D detectors do not fully utilize the image domain information to address the bottleneck issues of the LiDAR-based detectors. This paper presents a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects. First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation. 
This approach enables the learning of local spatial-aware features from the image modality to supplement sparse point clouds. Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch, utilizing a succinct and effective point-to-pixel module. Extensive experiments and ablation studies validate the effectiveness of our method. Notably, we achieved the top rank in the highly competitive cyclist class of the KITTI benchmark at the time of submission. \n\n\n\n## Overview\n- [Installation](#installation)\n- [Pretrained Models](#pretrained-models)\n- [Getting Started](#getting-started)\n- [License](#license)\n- [Acknowledgement](#acknowledgement)\n\u003c!-- - [Contribution](#contribution) --\u003e\n\u003c!-- - [Citation](#citation) --\u003e\n\n## Installation\n\nPlease refer to [INSTALL.md](docs/INSTALL.md) for installation instructions.\n\n## Pretrained Models\nHere we present the 3D detection performance at moderate difficulty on the *val* set of the KITTI dataset.\n\n* The pretrained model was trained with 4 NVIDIA 3090Ti GPUs and is available for download.\n* The training time is measured with 4 NVIDIA 3090Ti GPUs and PyTorch 1.8.\n* We cannot provide pretrained models for the Waymo Open Dataset due to the Waymo Dataset License Agreement, but you can achieve similar performance by training with the default configs.\n\n|                                             | training time | Car@R40 | Pedestrian@R40 | Cyclist@R40   | download |\n|---------------------------------------------|:----------:|:-------:|:-------:|:-------:|:---------:|\n| [UPIDet](tools/cfgs/kitti_models/upidet.yaml) |~12 hours| 86.10 | 68.67 | 76.70 | [model-287M](https://drive.google.com/file/d/1clUCPAixSAAad5aSH08zJr32-8o--P0u/view?usp=sharing) |\n\n## Getting Started\n\n### Prepare KITTI Dataset\n* Please download the official [KITTI 3D object 
detection](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) dataset and organize the downloaded files as follows (the road planes can be downloaded from [[road plane]](https://drive.google.com/file/d/1d5mq0RXRnvHPVeKx6Q612z0YRO1t2wAp/view?usp=sharing); they are optional and used for data augmentation during training):\n\u003c!-- * If you would like to train [CaDDN](../tools/cfgs/kitti_models/CaDDN.yaml), download the precomputed [depth maps](https://drive.google.com/file/d/1qFZux7KC_gJ0UHEg-qGJKqteE9Ivojin/view?usp=sharing) for the KITTI training set --\u003e\n\u003c!-- * NOTE: if you already have the data infos from `pcdet v0.1`, you can choose to use the old infos and set the DATABASE_WITH_FAKELIDAR option in tools/cfgs/dataset_configs/kitti_dataset.yaml as True. The second choice is that you can create the infos and gt database again and leave the config unchanged. --\u003e\n\n```\nOpenPCDet\n├── data\n│   ├── kitti\n│   │   │── ImageSets\n│   │   │── training\n│   │   │   ├──calib \u0026 velodyne \u0026 label_2 \u0026 image_2 \u0026 planes\n│   │   │── testing\n│   │   │   ├──calib \u0026 velodyne \u0026 image_2\n├── pcdet\n├── tools\n```\n\n* Generate the data infos by running the following command: \n```shell\npython -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml\n```\nFor the 2D auxiliary task of semantic segmentation, we use the instance segmentation annotations provided in the [KINS dataset](https://github.com/qqlu/Amodal-Instance-Segmentation-through-KINS-Dataset). We incorporate the instance segmentation masks into kitti_infos_train/val.pkl and kitti_dbinfos_train.pkl. 
Please download them from this [link](https://drive.google.com/drive/folders/1cyFt9MqHnKK620IKbRuTN6SiEvJP6r8d?usp=sharing) and replace the original files.\n\n### Prepare Waymo Open Dataset\n* Please download the official [Waymo Open Dataset](https://waymo.com/open/download/)(v1.2.0), \nincluding the training data `training_0000.tar~training_0031.tar` and the validation \ndata `validation_0000.tar~validation_0007.tar`.\n* Extract all the above `xxxx.tar` files to the directory of `data/waymo/raw_data` as follows (you should get 798 *train* tfrecords and 202 *val* tfrecords):  \n```\nGLENet\n├── data\n│   ├── waymo\n│   │   │── ImageSets\n│   │   │── kitti_format\n│   │   │   │── calib\n│   │   │   │── image_0\n│   │   │   │── image_1\n│   │   │   │── image_2\n│   │   │   │── image_3\n│   │   │   │── image_4\n│   │   │   │   │── segment-xxxxxxxx_with_camera_labels\n│   │   │   │   │   │── 0000.jpg  0001.jpg  0002.jpg ...\n│   │   │   │   │── ...\n│   │   │── raw_data\n│   │   │   │── segment-xxxxxxxx.tfrecord\n│   │   │   │── ...\n│   │   │── waymo_processed_data_v0_5_0\n│   │   │   │── segment-xxxxxxxx/\n│   │   │   │── ...\n│   │   │── waymo_processed_data_v0_5_0_gt_database_train_sampled_1/\n│   │   │── waymo_processed_data_v0_5_0_waymo_dbinfos_train_sampled_1.pkl\n│   │   │── waymo_processed_data_v0_5_0_gt_database_train_sampled_1_global.npy (optional)\n│   │   │── waymo_processed_data_v0_5_0_infos_train.pkl (optional)\n│   │   │── waymo_processed_data_v0_5_0_infos_val.pkl (optional)\n```\n* You should use mmdet3d to generate RGB images for the Waymo dataset. 
Then you can link the image files to the kitti_format directory using the modified script tools/map_mmdet_waymo_image.py.\n\n* Install the official `waymo-open-dataset` by running the following command: \n```shell script\npip3 install --upgrade pip\n# tf 2.5.0\npip3 install waymo-open-dataset-tf-2-5-0 --user\n```\n\n* Extract point cloud data from tfrecord and generate data infos by running the following command (it takes several hours, \nand you can check `data/waymo/waymo_processed_data_v0_5_0` to see how many records have been processed): \n```shell\npython -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos \\\n    --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml\n```\n\n### Training\n```\ncd tools;\npython train.py --cfg_file ./cfgs/kitti_models/upidet.yaml\n```\nMulti-GPU training, assuming you have 4 GPUs:\n\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_train.sh 4 --cfg_file ./cfgs/kitti_models/upidet.yaml\n```\n\n**Note**: For the Waymo dataset, first check out the `waymo_lidar` branch to train the single-modal detector, then check out the `waymo` branch to train the cross-modal detector based on the weights of the obtained single-modal detector.\n\n### Testing\n```\ncd tools/\n```\nSingle-GPU testing for all saved checkpoints:\n```\npython test.py --eval_all --cfg_file ./cfgs/kitti_models/upidet.yaml\n```\n\nMulti-GPU testing for all saved checkpoints, assuming you have 4 GPUs:\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_test.sh 4 --eval_all --cfg_file ./cfgs/kitti_models/upidet.yaml\n```\n\nMulti-GPU testing of a specific checkpoint, assuming you have 4 GPUs and checkpoint_epoch_80 is your best checkpoint:\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_test.sh 4  --cfg_file ./cfgs/kitti_models/upidet.yaml --ckpt ../output/upidet/default/ckpt/checkpoint_epoch_80.pth\n```\n\n\u003c!-- ## Pretrained Models --\u003e\n\n## License\n\n`UPIDet` is released under the [Apache 2.0 
license](LICENSE).\n\n## Acknowledgement\nWe sincerely appreciate the following open-source projects for providing valuable and high-quality code:\n- [`OpenPCDet`](https://github.com/open-mmlab/OpenPCDet)\n- [mmdetection3d](https://github.com/open-mmlab/mmdetection3d)\n- [FocalsConv](https://github.com/dvlab-research/FocalsConv)\n- [CamLiFlow](https://github.com/MCG-NJU/CamLiFlow)\n- [mmdetection](https://github.com/open-mmlab/mmdetection)\n- [PDV](https://github.com/TRAILab/PDV)\n\n## Citation\nIf you find this work useful in your research, please consider citing:\n```\n@inproceedings{zhang2024unleash,\n    title={Unleash the potential of image branch for cross-modal 3d object detection},\n    author={Zhang, Yifan and Zhang, Qijian and Hou, Junhui and Yuan, Yixuan and Xing, Guoliang},\n    booktitle={Advances in Neural Information Processing Systems},\n    volume={36},\n    pages={51562--51583},\n    year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEaphan%2FUPIDet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEaphan%2FUPIDet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEaphan%2FUPIDet/lists"}