{"id":13443377,"url":"https://github.com/DerrickXuNu/v2x-vit","last_synced_at":"2025-03-20T16:31:06.115Z","repository":{"id":41896345,"uuid":"510451246","full_name":"DerrickXuNu/v2x-vit","owner":"DerrickXuNu","description":"[ECCV2022] Official Implementation of  paper \"V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer\"","archived":false,"fork":false,"pushed_at":"2024-09-06T01:45:31.000Z","size":270,"stargazers_count":275,"open_issues_count":5,"forks_count":31,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-10-28T06:58:05.902Z","etag":null,"topics":["3d-object-detection","autonomous-driving","collaborative-perception","computer-vision","deep-learning","machine-learning","multi-agent-system","pytorch","simulation","v2x","vehicle-to-everything","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DerrickXuNu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-07-04T17:40:18.000Z","updated_at":"2024-10-21T07:45:49.000Z","dependencies_parsed_at":"2024-01-18T14:42:55.376Z","dependency_job_id":"e2f6694d-27cb-4c6b-ad10-1ddd643b2fdd","html_url":"https://github.com/DerrickXuNu/v2x-vit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerrickXuNu%2Fv2x-vit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerrickXuNu%2Fv2x-vit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerrickXu
Nu%2Fv2x-vit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerrickXuNu%2Fv2x-vit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DerrickXuNu","download_url":"https://codeload.github.com/DerrickXuNu/v2x-vit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244649810,"owners_count":20487496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","autonomous-driving","collaborative-perception","computer-vision","deep-learning","machine-learning","multi-agent-system","pytorch","simulation","v2x","vehicle-to-everything","vision-transformer"],"created_at":"2024-07-31T03:01:59.876Z","updated_at":"2025-03-20T16:31:05.686Z","avatar_url":"https://github.com/DerrickXuNu.png","language":"Python","funding_links":[],"categories":["Python","Vehicle-to-Everything Field Datasets","Anti-UAV Datasets"],"sub_categories":[],"readme":"[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/v2x-vit-vehicle-to-everything-cooperative/3d-object-detection-on-v2xset)](https://paperswithcode.com/sota/3d-object-detection-on-v2xset?p=v2x-vit-vehicle-to-everything-cooperative)\n\n# [V2X-ViT](https://arxiv.org/abs/2203.10638): Vehicle-to-Everything Cooperative Perception with Vision Transformer (ECCV 
2022)\n\n[![paper](https://img.shields.io/badge/arXiv-Paper-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2203.10638)\n[![supplement](https://img.shields.io/badge/Supplementary-Material-red)]()\n[![video](https://img.shields.io/badge/Video-Presentation-F9D371)]()\n\n\nThis is the official implementation of ECCV2022 paper \"V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer\".\n[Runsheng Xu](https://derrickxunu.github.io/), [Hao Xiang](https://xhwind.github.io/), [Zhengzhong Tu](https://github.com/vztu), [Xin Xia](https://scholar.google.com/citations?user=vCYqMTIAAAAJ\u0026hl=en), [Ming-Hsuan Yang](https://scholar.google.com/citations?user=p9-ohHsAAAAJ\u0026hl=en), [Jiaqi Ma](https://mobility-lab.seas.ucla.edu/)\n\nUCLA, UT-Austin, Google Research, UC-Merced\n\n**Important Notice**: [OpenCOOD](https://github.com/DerrickXuNu/OpenCOOD) supports V2X-ViT and V2XSet now! We will **no longer** update this repo, and all the new features (e.g. multi gpu implementation) will only be updated in OpenCOOD.\n\n![teaser](images/v2xvit.png)\n\n## Installation\n```bash\n# Clone repo\ngit clone https://github.com/DerrickXuNu/v2x-vit\n\ncd v2x-vit\n\n# Setup conda environment\nconda create -y --name v2xvit python=3.7\n\nconda activate v2xvit\n# pytorch \u003e= 1.8.1, newest version can work well\nconda install -y pytorch torchvision cudatoolkit=11.3 -c pytorch\n# spconv 2.0 install, choose the correct cuda version for you\npip install spconv-cu113\n\n# Install dependencies\npip install -r requirements.txt\n# Install bbx nms calculation cuda version\npython v2xvit/utils/setup.py build_ext --inplace\n\n# install v2xvit into the environment\npython setup.py develop\n```\n\n## Data\n### Download\nThe data can be found from [this url](https://ucla.app.box.com/v/UCLA-MobilityLab-V2XVIT).  
Since the data for train/validate/test\nis very large, we split each data set into small chunks, which can be found in the directories ending with `_chunks`, such as `train_chunks`. After downloading, please run the following commands on each set to merge the chunks together:\n\n```\ncat train.zip.part* \u003e train.zip\nunzip train.zip\n```\nIf you have a fast internet connection, you can also download the whole zip file directly, e.g. `train.zip`.\n### Structure\nAfter the download is finished, please arrange the files as follows:\n\n```sh\nv2x-vit # root of your v2xvit\n├── v2xset # the downloaded v2xset data\n│   ├── train\n│   ├── validate\n│   ├── test\n├── v2xvit # the core codebase\n\n```\n### Details\nOur data label format is very similar to the one in [OPV2V](https://github.com/DerrickXuNu/OpenCOOD). For more details, please refer to the [data tutorial](docs/data_intro.md).\n\n### Noise Simulation\nOne important feature of V2XSet is the ability to add different kinds of communication noise. This is done as a post-processing step through our flexible coding framework. To configure the noise, please\nrefer to the [config yaml tutorial](docs/config_tutorial.md).\n\n## Getting Started\n### Data sequence visualization\nTo quickly visualize the LiDAR stream in the V2XSet dataset, first modify the `validate_dir`\nin your `v2xvit/hypes_yaml/visualization.yaml` to the V2XSet data path on your local machine, e.g. `v2xset/validate`,\nand then run the following command:\n```bash\ncd ~/v2x-vit\npython v2xvit/visualization/vis_data_sequence.py [--color_mode ${COLOR_RENDERING_MODE}]\n```\nArguments Explanation:\n- `color_mode`: str type, indicating the LiDAR color rendering mode. 
You can choose from 'constant', 'intensity' or 'z-value'.\n\n### Test with pretrained model\nTo test the pretrained model of V2X-ViT, first download the model file from [google url](https://drive.google.com/drive/folders/1h2UOPP2tNRkV_s6cbKcSfMvTgb8_ZFj9?usp=sharing) and\nthen put it under `v2x-vit/logs/v2x-vit`. Change the `validate_path` in `v2x-vit/logs/v2x-vit/config.yaml` to `'v2xset/test'`.\n\nTo test under the perfect setting, change both `async` and `loc_err` to false in `v2x-vit/logs/v2x-vit/config.yaml`.\n\nTo test under the noisy setting used in our paper, change the `wild_setting` as follows:\n```\nwild_setting:\n  async: true\n  async_mode: 'sim'\n  async_overhead: 100\n  backbone_delay: 10\n  data_size: 1.06\n  loc_err: true\n  ryp_std: 0.2\n  seed: 25\n  transmission_speed: 27\n  xyz_std: 0.2\n```\nFinally, run the following command to perform the test:\n```bash\npython v2xvit/tools/inference.py --model_dir ${CHECKPOINT_FOLDER} --fusion_method ${FUSION_STRATEGY} [--show_vis] [--show_sequence]\n```\nArguments Explanation:\n- `model_dir`: the path to your saved model.\n- `fusion_method`: the fusion strategy; 'early', 'late', and 'intermediate' are currently supported.\n- `show_vis`: whether to visualize the detection overlay with the point cloud.\n- `show_sequence`: the detection results will be visualized in a video stream. It can NOT be set together with `show_vis`.\n\n### Train your model\nV2X-ViT uses a yaml file to configure all the parameters for training. To train your own model\nfrom scratch or from a saved checkpoint, run the following command:\n```bash\npython v2xvit/tools/train.py --hypes_yaml ${CONFIG_FILE} [--model_dir  ${CHECKPOINT_FOLDER} --half]\n```\nArguments Explanation:\n- `hypes_yaml`: the path of the training configuration file, e.g. `v2xvit/hypes_yaml/point_pillar_v2xvit.yaml`, meaning you want to train\n- `model_dir` (optional): the path of the checkpoints. This is used to fine-tune the trained models. 
When the `model_dir` is\ngiven, the trainer will discard the `hypes_yaml` and load the `config.yaml` in the checkpoint folder.\n- `half` (optional): if specified, mixed-precision training will be used to reduce memory usage.\n\n\u003cstrong\u003eImportant Notes for Training:\u003c/strong\u003e\n1. When you train from scratch, please first set `async` and `loc_err` to false to train on the perfect setting. Also, set `compression` to 0 at the beginning.\n2. After the model has converged on the perfect setting, set `compression` to 32 (please change the config yaml in your trained model directory) and continue training on the perfect setting for another 1-2 epochs.\n3. Next, set `async` to true, `async_mode` to 'real', `async_overhead` to 200 or 300, `loc_err` to true, `xyz_std` to 0.2, `ryp_std` to 0.2, and then continue training your model on this noisy setting. Please note that you are free to change these noise settings during training to obtain better performance.\n4. Finally, use the model fine-tuned on the noisy setting as the test model for both the perfect and noisy settings.\n\n## Citation\nIf you are using our V2X-ViT model or V2XSet dataset for your research, please cite the following paper:\n```bibtex\n@inproceedings{xu2022v2xvit,\n  author = {Runsheng Xu and Hao Xiang and Zhengzhong Tu and Xin Xia and Ming-Hsuan Yang and Jiaqi Ma},\n  title = {V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer},\n  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},\n  year = {2022}}\n```\n\n## Acknowledgement\nV2X-ViT is built upon [OpenCOOD](https://github.com/DerrickXuNu/OpenCOOD), which is the first Open Cooperative Detection framework for autonomous driving.\n\nV2XSet is collected using [OpenCDA](https://github.com/ucla-mobility/OpenCDA), which is the first open co-simulation-based research/engineering framework integrated with prototype cooperative driving automation pipelines as well as regular automated driving components (e.g., perception, 
localization, planning, control).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDerrickXuNu%2Fv2x-vit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDerrickXuNu%2Fv2x-vit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDerrickXuNu%2Fv2x-vit/lists"}