{"id":29219689,"url":"https://github.com/dvlab-research/uvtr","last_synced_at":"2025-07-03T02:06:39.616Z","repository":{"id":43916131,"uuid":"498739962","full_name":"dvlab-research/UVTR","owner":"dvlab-research","description":"Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)","archived":false,"fork":false,"pushed_at":"2022-10-19T12:33:51.000Z","size":636,"stargazers_count":233,"open_issues_count":9,"forks_count":17,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-20T16:40:21.697Z","etag":null,"topics":["3d-detection","multi-modality","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dvlab-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-01T13:06:28.000Z","updated_at":"2025-03-13T13:33:19.000Z","dependencies_parsed_at":"2022-08-12T10:51:53.757Z","dependency_job_id":null,"html_url":"https://github.com/dvlab-research/UVTR","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dvlab-research/UVTR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FUVTR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FUVTR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FUVTR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FUVTR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dvlab-research","download_url":"https://codeload.github.com/dvlab-research/UVTR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FUVTR/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263245318,"owners_count":23436514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-detection","multi-modality","pytorch"],"created_at":"2025-07-03T02:06:38.867Z","updated_at":"2025-07-03T02:06:39.597Z","avatar_url":"https://github.com/dvlab-research.png","language":"Python","readme":"\n# UVTR\n[![arXiv](https://img.shields.io/badge/arXiv-Paper-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2206.00630)\n![visitors](https://visitor-badge.glitch.me/badge?page_id=dvlab-research/UVTR)\n\n**Unifying Voxel-based Representation with Transformer for 3D Object Detection**\n\nYanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia\n\n[[`arXiv`](https://arxiv.org/abs/2206.00630)] [[`BibTeX`](#CitingUVTR)]\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"projects/docs/uvtr.png\"/\u003e\n\u003c/div\u003e\u003cbr/\u003e\n\nThis project provides an implementation for the NeurIPS 2022 paper \"[Unifying Voxel-based Representation with Transformer for 3D Object Detection](https://arxiv.org/abs/2206.00630)\" based on [mmDetection3D](https://github.com/open-mmlab/mmdetection3d). UVTR aims to unify multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D detection.\n\n## Preparation\nThis project is based on [mmDetection3D](https://github.com/open-mmlab/mmdetection3d), which can be constructed as follows.\n* Install PyTorch [v1.7.1](https://pytorch.org/get-started/previous-versions/) and mmDetection3D [v0.17.3](https://github.com/open-mmlab/mmdetection3d/tree/v0.17.3) following [the instructions](https://github.com/open-mmlab/mmdetection3d/blob/v0.17.3/docs/getting_started.md).\n* Copy our project and related files to installed mmDetection3D:\n```bash\ncp -r projects mmdetection3d/\ncp -r extra_tools mmdetection3d/\n```\n* Prepare the nuScenes dataset following [the structure](https://github.com/open-mmlab/mmdetection3d/blob/v0.17.3/docs/data_preparation.md).\n* Generate the unified data info and sampling database for nuScenes dataset:\n```bash\npython3 extra_tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes_unified\n```\n\n## Training\nYou can train the model following [the instructions](https://github.com/open-mmlab/mmdetection3d/blob/v0.17.3/docs/datasets/nuscenes_det.md).\nYou can find the pretrained models [here](https://drive.google.com/drive/folders/1KvG7tBYhmFQCiF_pAZc3Aa3H_D__-Jqh?usp=sharing) if you want to train the model from scratch.\nFor example, to launch UVTR training on multi GPUs,\none should execute:\n```bash\ncd /path/to/mmdetection3d\nbash extra_tools/dist_train.sh ${CFG_FILE} ${NUM_GPUS}\n```\nor train with a single GPU:\n```bash\npython3 extra_tools/train.py ${CFG_FILE}\n```\n\n## Evaluation\nYou can evaluate the model following [the instructions](./docs/GETTING_STARTED.md).\nFor example, to launch UVTR evaluation with a pretrained checkpoint on multi GPUs,\none should execute:\n```bash\nbash extra_tools/dist_test.sh ${CFG_FILE} ${CKPT} ${NUM_GPUS} --eval=bbox\n```\nor evaluate with a single GPU:\n```bash\npython3 extra_tools/test.py ${CFG_FILE} ${CKPT} --eval=bbox\n```\n## nuScenes 3D Object Detection Results\nWe provide results on nuScenes *val* set with pretrained models.\n|                                             | NDS(%) | mAP(%) | mATE\u0026darr; | mASE\u0026darr; | mAOE\u0026darr; | mAVE\u0026darr; | mAAE\u0026darr; | download | \n|---------------------------------------------|:-------:|:-------:|:-------:|:---------:|:---------:|:---------:|:---------:|:---------:|\n| **Camera-based** |\n| [UVTR-C-R50-H5](projects/configs/uvtr/camera_based/camera/uvtr_c_r50_h5.py) | 40.1 | 31.3 | 0.810 | 0.281 | 0.486 | 0.793 | 0.187 | [GoogleDrive](https://drive.google.com/file/d/1gomNuo5--I5bdDiuiJxnhUbSw4GqE4VO/view?usp=sharing) |\n| [UVTR-C-R50-H11](projects/configs/uvtr/camera_based/camera/uvtr_c_r50_h11.py) | 41.8 | 33.3 | 0.795 | 0.276 | 0.452 | 0.761 | 0.196 | [GoogleDrive](https://drive.google.com/file/d/1ZCwzpsByd5ZulgHltGQCIzoOZmI8FC12/view?usp=sharing) |\n| [UVTR-C-R101](projects/configs/uvtr/camera_based/camera/uvtr_c_r101_h11.py) | 44.1 | 36.1 | 0.761 | 0.271 | 0.409 | 0.756 | 0.203 | [GoogleDrive](https://drive.google.com/file/d/1Mc3ZDGDPqc5uqZvrJswTn4TQdsEwtnAP/view?usp=sharing) |\n| [UVTR-CS-R50](projects/configs/uvtr/camera_based/camera_sweep/uvtr_cs5_r50_h11.py) | 47.2 | 36.2 | 0.756 | 0.276 | 0.399 | 0.467 | 0.189 | [GoogleDrive](https://drive.google.com/file/d/1BHsUzTuColqtHEIXczhgC7SsWi_0mA69/view?usp=sharing) |\n| [UVTR-CS-R101](projects/configs/uvtr/camera_based/camera_sweep/uvtr_cs4_r101_h11.py) | 48.3 | 37.9 | 0.739 | 0.267 | 0.350 | 0.510 | 0.200 | [GoogleDrive](https://drive.google.com/file/d/1JcNbnIBfp5us2CaEktr1-4t5jWOFLldA/view?usp=sharing) |\n| [UVTR-L2C-R101](projects/configs/uvtr/camera_based/knowledge_distill/uvtr_l2c_r101_h11.py) | 45.0 | 37.2 | 0.735 | 0.269 | 0.397 | 0.761 | 0.193 | [GoogleDrive](https://drive.google.com/file/d/1Knc9EHeOjXtAkRzRAN0jPUFSiqK2t1Ac/view?usp=sharing) |\n| [UVTR-L2CS3-R101](projects/configs/uvtr/camera_based/knowledge_distill/uvtr_l2cs3_r101_h11.py) | 48.8 | 39.2 | 0.720 | 0.268 | 0.354 | 0.534 | 0.206 | [GoogleDrive](https://drive.google.com/file/d/1Q5f-fESCKje9q98mj7v6pC9r_yYUw1-4/view?usp=sharing) |\n| **LiDAR-based** |\n| [UVTR-L-V0075](projects/configs/uvtr/lidar_based/uvtr_l_v0075_h5.py) | 67.6 | 60.8 | 0.335 | 0.257 | 0.303 | 0.206 | 0.183 | [GoogleDrive](https://drive.google.com/file/d/11wepYo4alFifpEEOtnmRJg6-plLE1QD8/view?usp=sharing) |\n| **Multi-modality** |\n| [UVTR-M-V0075-R101](projects/configs/uvtr/lidar_based/uvtr_l_v01_h5.py) | 70.2 | 65.4 | 0.333 | 0.258 | 0.270 | 0.216 | 0.176 | [GoogleDrive](https://drive.google.com/file/d/1dlxXIS4Cuv6ePxuxMRIaxpG_b1Pk8sqO/view?usp=sharing) |\n## Acknowledgement\nWe would like to thank the authors of [mmDetection3D](https://github.com/open-mmlab/mmdetection3d) and [DETR3D](https://github.com/WangYueFt/detr3d) for their open-source release.\n\n## License\n`UVTR` is released under the [Apache 2.0 license](LICENSE).\n\n## \u003ca name=\"CitingUVTR\"\u003e\u003c/a\u003eCiting UVTR\n\nConsider cite UVTR in your publications if it helps your research.\n\n```\n@inproceedings{li2022uvtr,\n  title={Unifying Voxel-based Representation with Transformer for 3D Object Detection},\n  author={Li, Yanwei and Chen, Yilun and Qi, Xiaojuan and Li, Zeming and Sun, Jian and Jia, Jiaya},\n  booktitle={Advances in Neural Information Processing Systems},\n  year={2022}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvlab-research%2Fuvtr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdvlab-research%2Fuvtr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvlab-research%2Fuvtr/lists"}