{"id":20305525,"url":"https://github.com/vitae-transformer/simdistill","last_synced_at":"2025-04-11T14:51:09.054Z","repository":{"id":171883612,"uuid":"621042985","full_name":"ViTAE-Transformer/SimDistill","owner":"ViTAE-Transformer","description":"The official repo for [AAAI 2024] \"SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection\"\"","archived":false,"fork":false,"pushed_at":"2024-05-16T03:39:34.000Z","size":9109,"stargazers_count":33,"open_issues_count":5,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-20T13:33:09.376Z","etag":null,"topics":["3d-object-detection","bird-view-image","deep-learning","distillation","simulation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ViTAE-Transformer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-29T22:00:10.000Z","updated_at":"2025-03-11T05:13:08.000Z","dependencies_parsed_at":"2024-05-16T02:16:10.120Z","dependency_job_id":"e0637a7d-05be-4174-b92f-43ab25fb3493","html_url":"https://github.com/ViTAE-Transformer/SimDistill","commit_stats":null,"previous_names":["vitae-transformer/bevsimdet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ViTAE-Transformer%2FSimDistill","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ViTAE-Transformer%2FSimDistill/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ViTAE-Transform
er%2FSimDistill/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ViTAE-Transformer%2FSimDistill/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ViTAE-Transformer","download_url":"https://codeload.github.com/ViTAE-Transformer/SimDistill/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248424566,"owners_count":21101196,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","bird-view-image","deep-learning","distillation","simulation"],"created_at":"2024-11-14T17:08:48.897Z","updated_at":"2025-04-11T14:51:09.035Z","avatar_url":"https://github.com/ViTAE-Transformer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eSimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://arxiv.org/abs/2303.16818\"\u003e\u003cimg  src=\"https://img.shields.io/badge/arXiv-Paper-\u003cCOLOR\u003e.svg\" \u003e\u003c/a\u003e\n\u003ch4 align=\"center\"\u003eThis is the official repository of the paper \u003ca href=\"https://arxiv.org/abs/2303.16818\"\u003eSimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection\u003c/a\u003e.\u003c/h4\u003e\n\u003ch5 align=\"center\"\u003e\u003cem\u003eHaimei Zhao, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang, and Dacheng Tao\u003c/em\u003e\u003c/h5\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#news\"\u003eNews\u003c/a\u003e |\n  
\u003ca href=\"#abstract\"\u003eAbstract\u003c/a\u003e |\n  \u003ca href=\"#method\"\u003eMethod\u003c/a\u003e |\n  \u003ca href=\"#results\"\u003eResults\u003c/a\u003e |\n  \u003ca href=\"#preparation\"\u003ePreparation\u003c/a\u003e |\n  \u003ca href=\"#code\"\u003eCode\u003c/a\u003e |\n  \u003ca href=\"#statement\"\u003eStatement\u003c/a\u003e\n\u003c/p\u003e\n\n## News\n- **(2023/3/29)** SimDistill is accepted by AAAI 2024!\n- **(2023/3/29)** BEVSimDet is released on [arXiv](https://arxiv.org/abs/2303.16818).\n\n\u003e Other applications of [ViTAE Transformer](https://github.com/ViTAE-Transformer/ViTAE-Transformer) include: [image classification](https://github.com/ViTAE-Transformer/ViTAE-Transformer/tree/main/Image-Classification) | [object detection](https://github.com/ViTAE-Transformer/ViTAE-Transformer/tree/main/Object-Detection) | [semantic segmentation](https://github.com/ViTAE-Transformer/ViTAE-Transformer/tree/main/Semantic-Segmentation) | [pose estimation](https://github.com/ViTAE-Transformer/ViTPose) | [remote sensing](https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing) | [image matting](https://github.com/ViTAE-Transformer/ViTAE-Transformer-Matting) | [scene text spotting](https://github.com/ViTAE-Transformer/ViTAE-Transformer-Scene-Text-Detection)\n\n\n## Abstract\n\nMulti-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging and may lead to inferior performance. Although distilling precise 3D geometry knowledge from LiDAR data could help tackle this challenge, the benefits of LiDAR information could be greatly hindered by the significant modality gap between different sensory modalities. To address this issue, we propose a **Si**mulated **m**ulti-modal **Distill**ation (**SimDistill**) method by carefully crafting the model architecture and distillation strategy. 
Specifically, we devise multi-modal architectures for both teacher and student models, including a LiDAR-camera fusion-based teacher and a simulated fusion-based student. Owing to the \"identical\" architecture design, the student can mimic the teacher to generate multi-modal features with merely multi-view images as input, where a geometry compensation module is introduced to bridge the modality gap. Furthermore, we propose a comprehensive multi-modal distillation scheme that supports intra-modal, cross-modal, and multi-modal fusion distillation simultaneously in the Bird's-eye-view space. Incorporating them together, our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment. Extensive experiments validate the effectiveness and superiority of SimDistill over state-of-the-art methods, achieving an improvement of 4.8% mAP and 4.1% NDS over the baseline detector.\n\n## Method\n\n![the framework figure](./docker/mainfigure.png \"framework\")\n\n## Results\n\n### Quantitative results on the nuScenes validation set\n![quantitative figure](./docker/quantitative-results.png \"quantitative-results\")\n### Qualitative results\n![qualitative figure](./docker/visualization.png \"visualization\")\n![qualitative figure](./docker/supplementary-lidar.png \"supplementary-lidar\")\n![qualitative figure](./docker/supplementary-prediction1.png \"supplementary-prediction1\")\n\n## Preparation\n\n### Prerequisites\n\nThe code is built with the following libraries:\n\n- Python \u003e= 3.8, \\\u003c3.9\n- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)\n- Pillow = 8.4.0 (see [here](https://github.com/mit-han-lab/bevfusion/issues/63))\n- [PyTorch](https://github.com/pytorch/pytorch) \u003e= 1.9, \\\u003c= 1.10.2\n- [tqdm](https://github.com/tqdm/tqdm)\n- [torchpack](https://github.com/mit-han-lab/torchpack)\n- [mmcv](https://github.com/open-mmlab/mmcv) = 1.4.0\n- 
[mmdetection](http://github.com/open-mmlab/mmdetection) = 2.20.0\n- [nuscenes-devkit](https://github.com/nutonomy/nuscenes-devkit)\n\nAfter installing these dependencies, please run this command to install the codebase:\n\n```bash\npython setup.py develop\n```\n### Data Preparation\n\n#### nuScenes\n\nPlease follow the instructions from [here](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md) to download and preprocess the nuScenes dataset. Please remember to download both the detection dataset and the map extension (for BEV map segmentation). After data preparation, you will be able to see the following directory structure (as indicated in mmdetection3d):\n\n```\nmmdetection3d\n├── mmdet3d\n├── tools\n├── configs\n├── data\n│   ├── nuscenes\n│   │   ├── maps\n│   │   ├── samples\n│   │   ├── sweeps\n│   │   ├── v1.0-test\n│   │   ├── v1.0-trainval\n│   │   ├── nuscenes_database\n│   │   ├── nuscenes_infos_train.pkl\n│   │   ├── nuscenes_infos_val.pkl\n│   │   ├── nuscenes_infos_test.pkl\n│   │   ├── nuscenes_dbinfos_train.pkl\n\n```\n\n## Code\n### Setup\n```bash\npython setup.py develop\n```\n### Training\n```bash\n# The loss terms can be changed in configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/convfuser.yaml\n# Other backbone networks (swinT, vitaev2, and bevformer) can be chosen under configs/nuscenes/det/centerhead/lssfpn/camera/256x704/\n\ntorchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/convfuser.yaml --data.samples_per_gpu 3 --max_epochs 20 --data.workers_per_gpu 6 --run-dir swinT-twobranchesloss --load_from ../bevfusion-main/pretrained/bevfusion-det.pth\n```\n### Evaluation\n```bash\ntorchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/convfuser.yaml --xxx.pth --eval bbox\n```\n\n## Statement\n@inproceedings{zhao2024simdistill,\ntitle={SimDistill: Simulated Multi-Modal 
Distillation for BEV 3D Object Detection},\nauthor={Zhao, Haimei and Zhang, Qiming and Zhao, Shanshan and Chen, Zhe and Zhang, Jing and Tao, Dacheng},\nbooktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\nvolume={38},\nnumber={7},\npages={7460--7468},\nyear={2024}\n}\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvitae-transformer%2Fsimdistill","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvitae-transformer%2Fsimdistill","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvitae-transformer%2Fsimdistill/lists"}