{"id":13442495,"url":"https://github.com/hailanyi/VirConv","last_synced_at":"2025-03-20T14:31:14.095Z","repository":{"id":119034144,"uuid":"609394309","full_name":"hailanyi/VirConv","owner":"hailanyi","description":"Virtual Sparse Convolution for Multimodal 3D Object Detection","archived":false,"fork":false,"pushed_at":"2024-03-13T12:56:11.000Z","size":589,"stargazers_count":275,"open_issues_count":22,"forks_count":39,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-28T05:59:17.404Z","etag":null,"topics":["3d-object-detection","kitti","multimodal","point-clouds"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2303.02314","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hailanyi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-04T03:29:29.000Z","updated_at":"2024-10-22T20:26:23.000Z","dependencies_parsed_at":"2024-10-28T04:00:07.908Z","dependency_job_id":"f3f83570-ee7c-4925-8ca4-616d21619bfe","html_url":"https://github.com/hailanyi/VirConv","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hailanyi%2FVirConv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hailanyi%2FVirConv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hailanyi%2FVirConv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hailanyi%2FVirConv/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/hailanyi","download_url":"https://codeload.github.com/hailanyi/VirConv/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244630112,"owners_count":20484316,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","kitti","multimodal","point-clouds"],"created_at":"2024-07-31T03:01:46.417Z","updated_at":"2025-03-20T14:31:13.655Z","avatar_url":"https://github.com/hailanyi.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\n# Virtual Sparse Convolution for Multimodal 3D Object Detection\nThis is the official code release of [VirConv](https://arxiv.org/abs/2303.02314) (Virtual Sparse Convolution for Multimodal 3D Object Detection). 
\nThis code is mainly based on [OpenPCDet](https://github.com/open-mmlab/OpenPCDet); some code is from [TED](https://github.com/hailanyi/TED), \n[CasA](https://github.com/hailanyi/CasA), [PENet](https://github.com/JUGGHM/PENet_ICRA2021) and [SFD](https://github.com/LittlePey/SFD).\n\n## Detection Framework\n* VirConv-L: A light-weight multimodal 3D detector based on Virtual Sparse Convolution.\n* VirConv-T: An improved multimodal 3D detector based on Virtual Sparse Convolution and a transformed refinement scheme.\n* VirConv-S: A semi-supervised VirConv-T based on pseudo labels and fine-tuning.\n\nThe detection frameworks are shown below.\n\n![](./tools/image/framework.png)\n\n## Model Zoo\nWe release three models: VirConv-L, VirConv-T and VirConv-S.\n\n* VirConv-L and VirConv-T are trained on the train split (3712 samples) of the KITTI dataset.\n\n* VirConv-S is trained on the train split (3712 samples) and the unlabeled odometry split (semi split, 10888 samples) of the KITTI dataset.\n\n* The results are the 3D AP(R40) of Car on the *val* set of the KITTI dataset.\n\n**Important notes:**\n* **The input voxel discard has been changed to [input point discard](https://github.com/hailanyi/VirConv/blob/master/pcdet/datasets/dataset.py) for faster voxelization.**\n* **The convergence of VirConv-T is somewhat unstable (AP~\\[89.5, 90.3\\]); if you cannot achieve a similar AP, please train multiple times. We recommend VirConv-S, which can achieve 90.5+ AP easily.**\n* **These models are not suitable for directly reporting results on the KITTI *test* set; please train the models on all or 80% of the training data and choose a good score threshold to achieve the desired performance.**\n\nTrain multiple times on 8xV100 and choose the best:\n\nEnvironment|              Detector                   | GPU (train)| Easy | Mod. 
| Hard  | download | \n|------|---------------------------------------------|:----------:|:-------:|:-------:|:-------:|:---------:|\n|Spconv1.2 | [VirConv-L](tools/cfgs/models/kitti/VirConv-L.yaml)|~7 GB | 93.08 |88.51 |86.69 | [google](https://drive.google.com/file/d/1UwH4ArmKCAPlFV6XjRmVrqClgrvc1M1q/view?usp=sharing) / [baidu(05u2)](https://pan.baidu.com/s/1Q-hvk-u6bA72EFhcc5IIwA) / 51M| \n|Spconv1.2 | [VirConv-T](tools/cfgs/models/kitti/VirConv-T.yaml)|~13 GB| 94.58 |89.87 |87.78 | [google](https://drive.google.com/file/d/1Y3Q0x0pDran0Bqqg1CulL0geYwIkDQvu/view?usp=sharing) / [baidu(or81)](https://pan.baidu.com/s/1CkMi5YYKjBfi4sgnx20fIw) / 55M|\n|Spconv1.2 | [VirConv-S](tools/cfgs/models/kitti/VirConv-S.yaml)|~13 GB| 95.67 |91.09 |89.09 | [google](https://drive.google.com/file/d/1_IUkMzGlPdZTiCyiBn1GaCMYKVbi9Oh2/view?usp=sharing) / [baidu(ak74)](https://pan.baidu.com/s/1PZURrn97OoFQyBGb0hJX3A) / 62M|\n\nTrain multiple times on 8xV100 and choose the best:\n\nEnvironment|              Detector                   |GPU (train) | Easy | Mod. 
| Hard  | download | \n|------|---------------------------------------------|:----------:|:-------:|:-------:|:-------:|:---------:|\n|Spconv2.1 | [VirConv-L](tools/cfgs/models/kitti/VirConv-L.yaml)|~7 GB | 93.18 |88.23 |85.48 | [google](https://drive.google.com/file/d/1MRRgMX8l5FFaFZb81YjqfcjCFgYYDMak/view?usp=sharing) / [baidu(k2dp)](https://pan.baidu.com/s/1fOSbDup5x2pootf3dtPb8Q) / 51M| \n|Spconv2.1 | [VirConv-T](tools/cfgs/models/kitti/VirConv-T.yaml)|~13 GB| 94.91 |90.36 |88.10 | [google](https://drive.google.com/file/d/123ndzJIwo01DvQIBzy_GnussuuXkhwji/view?usp=sharing) / [baidu(a4r4)](https://pan.baidu.com/s/1ueAUwj57DIEgF7NBKtgCmA) / 56M|\n|Spconv2.1 | [VirConv-S](tools/cfgs/models/kitti/VirConv-S.yaml)|~13 GB| 95.76 |90.91 |88.61 | [google](https://drive.google.com/file/d/1-ztIQdhAi2MnTI6pBKxUeqWZnwlDxDH3/view?usp=sharing) / [baidu(j3mi)](https://pan.baidu.com/s/1iJUjR7IehRBk1WSacvd2Yg) / 56M|\n\n\n## Getting Started\n```\nconda create -n spconv2 python=3.9\nconda activate spconv2\npip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html\npip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-5-0 nuscenes-devkit==1.0.5 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython prefetch-generator\n```\n### Dependency\nOur released implementation is tested on.\n+ Ubuntu 18.04\n+ Python 3.6.9 \n+ PyTorch 1.8.1\n+ Numba 0.53.1\n+ Spconv 1.2.1\n+ NVIDIA CUDA 11.1\n+ 8x Tesla V100 GPUs\n\n\nWe also tested on.\n+ Ubuntu 18.04\n+ Python 3.9.13 \n+ PyTorch 1.8.1\n+ Numba 0.53.1\n+ Spconv 2.1.22 # pip install spconv-cu111\n+ NVIDIA CUDA 11.1 \n+ 8x Tesla V100 GPUs\n\nWe also tested on.\n+ Ubuntu 18.04\n+ Python 3.9.13 \n+ PyTorch 1.8.1\n+ Numba 0.53.1\n+ Spconv 2.1.22 # pip install spconv-cu111\n+ NVIDIA CUDA 11.1 \n+ 2x 3090 GPUs\n\n\n### Prepare 
dataset\n\nYou must create an additional ```semi``` dataset and ```velodyne_depth``` dataset to run our multimodal and semi-supervised detectors.\n\n* You can download all the preprocessed data from\n[baidu (japc)](https://pan.baidu.com/s/1idoCSVndT2mImcGN4lFSNQ) \\[74GB\\],\nor partial data (not including ```semi``` due to disk space limits)\nfrom [google (13GB)](https://drive.google.com/file/d/1xki9v_zsQMM8vMVNo0ENi1Mh_GNMjHUg/view?usp=sharing).\n\n\n* Or you can generate the dataset yourself as follows:\n\nPlease download the official [KITTI 3D object detection](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) \ndataset and the [KITTI odometry dataset](https://www.cvlibs.net/datasets/kitti/eval_odometry.php), and organize the downloaded files as follows (the road planes can be downloaded \nfrom [[road plane]](https://drive.google.com/file/d/1d5mq0RXRnvHPVeKx6Q612z0YRO1t2wAp/view?usp=sharing); \nthey are optional and are used for data augmentation during training):\n\n```\nVirConv\n├── data\n│   ├── odometry\n│   │   │── 00\n│   │   │── 01\n│   │   │   │── image_2\n│   │   │   │── velodyne\n│   │   │   │── calib.txt\n│   │   │── ...\n│   │   │── 21\n│   ├── kitti\n│   │   │── ImageSets\n│   │   │── training\n│   │   │   ├──calib \u0026 velodyne \u0026 label_2 \u0026 image_2 \u0026 (optional: planes)\n│   │   │── testing\n│   │   │   ├──calib \u0026 velodyne \u0026 image_2\n├── pcdet\n├── tools\n```\n\n(1) Create the ```semi``` dataset from the odometry dataset.\n```\ncd tools\npython3 creat_semi_dataset.py ../data/odometry ../data/kitti/semi\n```\n(2) Download the pseudo labels generated by VirConv-T from [here](https://drive.google.com/file/d/1wyMgqUjhdXUEDiY8NYO_doFMfKtAn0X8/view?usp=sharing) (detections from the last 10 checkpoints are fused by WBF, and low-quality detections are filtered with a 0.9 score threshold) and put them into ```kitti/semi```.\n\n(3) Download the PENet depth completion model from [google 
(500M)](https://drive.google.com/file/d/1RDdKlKJcas-G5OA49x8OoqcUDiYYZgeM/view?usp=sharing) or [baidu (gp68)](https://pan.baidu.com/s/1tBVuqvBZ0ns79ARmNpgwWw), and put it into ```tools/PENet```.\n\n(4) Then run the following commands to generate RGB virtual points.\n\n```\ncd tools/PENet\npython3 main.py --detpath ../../data/kitti/training\npython3 main.py --detpath ../../data/kitti/testing\npython3 main.py --detpath ../../data/kitti/semi\n```\n(5) After that, run the following commands to create the dataset infos:\n```\npython3 -m pcdet.datasets.kitti.kitti_dataset_mm create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml\npython3 -m pcdet.datasets.kitti.kitti_datasetsemi create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml\n```\n\nIn either case, the data structure should be: \n```\nVirConv\n├── data\n│   ├── kitti\n│   │   │── ImageSets\n│   │   │── training\n│   │   │   ├──calib \u0026 velodyne \u0026 label_2 \u0026 image_2 \u0026 (optional: planes) \u0026 velodyne_depth\n│   │   │── testing\n│   │   │   ├──calib \u0026 velodyne \u0026 image_2 \u0026 velodyne_depth\n│   │   │── semi (optional)\n│   │   │   ├──calib \u0026 velodyne \u0026 label_2(pseudo label) \u0026 image_2 \u0026 velodyne_depth\n│   │   │── gt_database_mm\n│   │   │── gt_databasesemi\n│   │   │── kitti_dbinfos_trainsemi.pkl\n│   │   │── kitti_dbinfos_train_mm.pkl\n│   │   │── kitti_infos_test.pkl\n│   │   │── kitti_infos_train.pkl\n│   │   │── kitti_infos_trainsemi.pkl\n│   │   │── kitti_infos_trainval.pkl\n│   │   │── kitti_infos_val.pkl\n├── pcdet\n├── tools\n```\n\n### Setup\n\n```\ncd VirConv\npython setup.py develop\n```\n\n### Training.\n\n**For training the VirConv-L and VirConv-T:**\n\nSingle GPU train:\n```\ncd tools\npython3 train.py --cfg_file ${CONFIG_FILE}\n```\nFor example, to train the VirConv-L model:\n```\ncd tools\npython3 train.py --cfg_file cfgs/models/kitti/VirConv-L.yaml\n```\n\nMultiple GPU train: \n\nYou can modify the GPU number in dist_train.sh and 
run\n```\ncd tools\nsh dist_train.sh\n```\nThe log info is saved to log.txt.\nYou can run ```cat log.txt``` to view the training progress.\n\n**For training the VirConv-S:**\n\nYou should first train a VirConv-T:\n```\ncd tools\npython3 train.py --cfg_file cfgs/models/kitti/VirConv-T.yaml\n```\nThen train the VirConv-S:\n```\ncd tools\npython3 train.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --pretrained_model ../output/models/kitti/VirConv-T/default/ckpt/checkpoint_epoch_40.pth\n```\n\n### Evaluation.\n\n```\ncd tools\npython3 test.py --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}\n```\n\nFor example, to test the VirConv-S model:\n\n```\ncd tools\npython3 test.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --ckpt VirConv-S.pth\n```\n\nMultiple GPU test: modify the GPU number in dist_test.sh and run\n```\nsh dist_test.sh \n```\nThe log info is saved to log-test.txt.\nYou can run ```cat log-test.txt``` to view the test results.\n\n## License\n\nThis code is released under the [Apache 2.0 license](LICENSE).\n\n## Acknowledgement\n[TED](https://github.com/hailanyi/TED)\n\n[CasA](https://github.com/hailanyi/CasA)\n\n[OpenPCDet](https://github.com/open-mmlab/OpenPCDet)\n\n[PENet](https://github.com/JUGGHM/PENet_ICRA2021)\n\n[SFD](https://github.com/LittlePey/SFD)\n\n## Citation\n\n```\n@inproceedings{VirConv,\n    title={Virtual Sparse Convolution for Multimodal 3D Object Detection},\n    author={Wu, Hai and Wen, Chenglu and Shi, Shaoshuai and Wang, Cheng},\n    booktitle={CVPR},\n    year={2023}\n}\n```\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhailanyi%2FVirConv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhailanyi%2FVirConv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhailanyi%2FVirConv/lists"}