{"id":20349993,"url":"https://github.com/cure-lab/magicdrive","last_synced_at":"2025-05-15T07:07:30.075Z","repository":{"id":199686291,"uuid":"703433265","full_name":"cure-lab/MagicDrive","owner":"cure-lab","description":"[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”","archived":false,"fork":false,"pushed_at":"2024-12-09T09:58:54.000Z","size":24662,"stargazers_count":937,"open_issues_count":23,"forks_count":48,"subscribers_count":14,"default_branch":"main","last_synced_at":"2025-04-14T13:06:45.632Z","etag":null,"topics":["autonomous-vehicles","deep-learning","diffusion-models","image-generation","pytorch","video-generation"],"latest_commit_sha":null,"homepage":"https://gaoruiyuan.com/magicdrive/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cure-lab.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-11T08:33:23.000Z","updated_at":"2025-04-14T11:38:51.000Z","dependencies_parsed_at":"2023-10-11T16:59:56.836Z","dependency_job_id":"a91826d2-e594-41c1-b640-0c8fe94b7454","html_url":"https://github.com/cure-lab/MagicDrive","commit_stats":{"total_commits":36,"total_committers":1,"mean_commits":36.0,"dds":0.0,"last_synced_commit":"1e03e21e4d0ef16c22faf8575e765482e3e5fe59"},"previous_names":["cure-lab/magicdrive"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cure-lab%2FMagicDrive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cure-lab%2FMagicDrive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cure-lab%2FMagicDrive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cure-lab%2FMagicDrive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cure-lab","download_url":"https://codeload.github.com/cure-lab/MagicDrive/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254292043,"owners_count":22046426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autonomous-vehicles","deep-learning","diffusion-models","image-generation","pytorch","video-generation"],"created_at":"2024-11-14T22:28:29.285Z","updated_at":"2025-05-15T07:07:25.037Z","avatar_url":"https://github.com/cure-lab.png","language":"Python","readme":"# MagicDrive\n\n✨ Check out our new work [MagicDrive3D](https://github.com/flymin/MagicDrive3D) on **3D scene generation**!\n\n✨ If you want **video generation**, please find the code at the [`video branch`](https://github.com/cure-lab/MagicDrive/tree/video).\n\n[![arXiv](https://img.shields.io/badge/ArXiv-2310.02601-b31b1b.svg?style=plastic)](https://arxiv.org/abs/2310.02601) [![web](https://img.shields.io/badge/Web-MagicDrive-blue.svg?style=plastic)](https://gaoruiyuan.com/magicdrive/) [![license](https://img.shields.io/github/license/cure-lab/MagicDrive?style=plastic)](https://github.com/cure-lab/MagicDrive/blob/main/LICENSE) [![star](https://img.shields.io/github/stars/cure-lab/MagicDrive)](https://github.com/cure-lab/MagicDrive)\n\nVideos generated by MagicDrive (click the image to see the video).\n\n[![2_7_gen](./assets/2_7_gen_frame0.png)](https://gaoruiyuan.com/magicdrive/static/videos/2_7_gen.mp4)\n\n[![3_7_gen](./assets/3_7_gen_frame0.png)](https://gaoruiyuan.com/magicdrive/static/videos/3_7_gen.mp4)\n\nThis repository contains the implementation of the paper \n\n\u003e MagicDrive: Street View Generation with Diverse 3D Geometry Control \u003cbr\u003e\n\u003e [Ruiyuan Gao](https://gaoruiyuan.com/)\u003csup\u003e1\\*\u003c/sup\u003e, [Kai Chen](https://kaichen1998.github.io/)\u003csup\u003e2\\*\u003c/sup\u003e, [Enze Xie](https://xieenze.github.io/)\u003csup\u003e3^\u003c/sup\u003e, [Lanqing Hong](https://scholar.google.com.sg/citations?user=2p7x6OUAAAAJ\u0026hl=en)\u003csup\u003e3\u003c/sup\u003e, [Zhenguo Li](https://scholar.google.com/citations?user=XboZC1AAAAAJ\u0026hl=en)\u003csup\u003e3\u003c/sup\u003e, [Dit-Yan Yeung](https://sites.google.com/view/dyyeung)\u003csup\u003e2\u003c/sup\u003e, [Qiang Xu](https://cure-lab.github.io/)\u003csup\u003e1^\u003c/sup\u003e\u003cbr\u003e\n\u003e \u003csup\u003e1\u003c/sup\u003eCUHK \u003csup\u003e2\u003c/sup\u003eHKUST \u003csup\u003e3\u003c/sup\u003eHuawei Noah's Ark Lab \u003cbr\u003e\n\u003e \u003csup\u003e\\*\u003c/sup\u003eEqual Contribution \u003csup\u003e^\u003c/sup\u003eCorresponding Authors\n\n## Abstract\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eTL; DR\u003c/b\u003e MagicDrive generates high-quality street-view images \u0026 videos with diverse 3D geometry control and multiview consistency, which can be used as a data engine in various perception tasks.\u003c/summary\u003e\n\nRecent advancements in diffusion models have significantly enhanced the data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework, offering diverse 3D geometry controls including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view image \u0026 video synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.\n\n\u003c/details\u003e\n\n## News\n\n- [2024/12/09] We release 60-frame video generation model on [huggingface](https://huggingface.co/flymin/MagicDrive-t-60f-224x400-80k), please use the code in `video` branch to run.\n- [2024/12/09] We release two higher-resolution image generation models ([424x800 model](https://huggingface.co/flymin/MagicDrive-424x800-450ep) for visualization and [272x736 model](https://huggingface.co/flymin/MagicDrive-272x736-400ep) for BEVFusion) with their training configs.\n- [2024/06/07] MagicDrive can generate **60-frame** videos! We release the config: [rawbox_mv2.0t_0.4.3_60.yaml](https://github.com/cure-lab/MagicDrive/blob/video/configs/exp/rawbox_mv2.0t_0.4.3_60.yaml). Check out our demos on the [project page](https://gaoruiyuan.com/magicdrive/#long-video).\n- [2024/06/07] We release **model checkpoint** for **16-frame** video generation. [Check it out](https://github.com/cure-lab/MagicDrive/tree/video?tab=readme-ov-file#magicdrive-t-checkpoints)!\n- [2024/06/01] We hold the [W-CODA](https://coda-dataset.github.io/w-coda2024/index.html) workshop @ECCV2024. Challenge [track 2](https://coda-dataset.github.io/w-coda2024/track2/) will use MagicDrive as the baseline. We will release more resources in the near future. Stay tuned!\n\n## Method\n\nIn MagicDrive, we employ two strategies (cross-attention and additive encoder branch) to inject text prompts, camera poses, object boxes, and road maps as conditions for generation. We also propose a cross-view attention module for multiview consistency.\n\n![image-20231011165634648](./assets/overview.png)\n\n## TODO\n\n- [x] [config](configs/exp/224x400.yaml) and [model checkpoint](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155157018_link_cuhk_edu_hk/ERiu-lbAvq5IkODTscFXYPUBpVYVDbwjHchDExBlPfeQ0w?e=8YaDM0) for base resolution (224x400)\n- [x] demo for base resolution (224x400)\n- [x] GUI for interactive bbox editing\n- [x] train and test code release\n- [x] FID test code\n- [x] config and checkpoint for high resolution\n\n## Getting Started\n\n### Environment Setup\n\nClone this repo with submodules\n\n```bash\ngit clone --recursive https://github.com/cure-lab/MagicDrive.git\n```\n\nThe code is tested with `Pytorch==1.10.2` and `cuda 10.2` on V100 servers. To setup the python environment, follow:\n\n```bash\n# option1: to run GUI only\npip install -r requirements/gui.txt\n# 😍 our GUI does not need mm-series packages.\n# continue to install diffusers from `third_party`.\n\n# option2: to run the full testing demo (and also test your env before training)\ncd ${ROOT}\npip install -r requirements/dev.txt\n# continue to install `third_party`s as following.\n```\n\nWe opt to install the source code for the following packages, with `cd ${FOLDER}; pip -vvv install .`\n\n```bash\n# install third-party\nthird_party/\n├── bevfusion -\u003e based on db75150\n├── diffusers -\u003e based on v0.17.1 (afcca39)\n└── xformers  -\u003e based on v0.0.19 (8bf59c9), optional\n```\n\nsee [note about our xformers](doc/xformers.md). If you have issues with the environment setup, please check [FAQ](doc/FAQ.md) first.\n\nSetup default configuration for `accelerate` with\n```bash\naccelerate config\n```\n\nOur default log directory is `${ROOT}/magicdrive-log`. Please be prepared.\n\n### Pretrained Weights\n\nOur training is based on [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5). We assume you put them at `${ROOT}/pretrained/` as follows:\n\n```bash\n{ROOT}/pretrained/stable-diffusion-v1-5/\n├── text_encoder\n├── tokenizer\n├── unet\n├── vae\n└── ...\n```\n\n## Street-view Generation with MagicDrive\n\nDownload our model checkpoint for MagicDrive from \n\n- 224x400 model: [onedrive](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155157018_link_cuhk_edu_hk/ERiu-lbAvq5IkODTscFXYPUBpVYVDbwjHchDExBlPfeQ0w?e=8YaDM0)\n- 272x736 model: [huggingface](https://huggingface.co/flymin/MagicDrive-272x736-400ep)\n- 424x800 model: [huggingface](https://huggingface.co/flymin/MagicDrive-424x800-450ep)\n\nand put them in `${ROOT}/pretrained/`\n\n**Run our demo**\n\n👍 We recommend users run our interactive GUI first, because we have minimized the dependencies for the GUI demo.\n```bash\ncd ${ROOT}\npython demo/interactive_gui.py\n# a gradio-based gui, use your web browser\n```\n\nAs suggested by [#37](https://github.com/cure-lab/MagicDrive/issues/37), prompt is configurable through GUI!\n\n![gui](assets/gui.jpg)\n\nRun our demo for camera view generation.\n\n```bash\ncd ${ROOT}\npython demo/run.py resume_from_checkpoint=magicdrive-log/SDv1.5mv-rawbox_2023-09-07_18-39_224x400\n```\nThe generated images will be located in `magicdrive-log/test`. More information can be find in [demo doc](demo/readme.md).\n\n## Train MagicDrive\n\n### Prepare Data\nWe prepare the nuScenes dataset similar to [bevfusion's instructions](https://github.com/mit-han-lab/bevfusion#data-preparation). Specifically,\n\n1. Download the nuScenes dataset from the [website](https://www.nuscenes.org/nuscenes) and put them in `./data/`. You should have these files:\n    ```bash\n    data/nuscenes\n    ├── maps\n    ├── mini\n    ├── samples\n    ├── sweeps\n    ├── v1.0-mini\n    └── v1.0-trainval\n    ```\n\n\u003e [!TIP]\n\u003e You can download the `.pkl` files from [OneDrive](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155157018_link_cuhk_edu_hk/EYF9ZkMHwVZKjrU5CUUPbfYBhC1iZMMnhE2uI2q5iCuv9w?e=QgEmcH). They should be enough for training and testing.\n\n2. Generate mmdet3d annotation files by:\n\n    ```bash\n    python tools/create_data.py nuscenes --root-path ./data/nuscenes \\\n      --out-dir ./data/nuscenes_mmdet3d_2 --extra-tag nuscenes\n    ```\n    You should have these files:\n    ```bash\n    data/nuscenes_mmdet3d_2\n    ├── nuscenes_dbinfos_train.pkl (-\u003e ${bevfusion-version}/nuscenes_dbinfos_train.pkl)\n    ├── nuscenes_gt_database (-\u003e ${bevfusion-version}/nuscenes_gt_database)\n    ├── nuscenes_infos_train.pkl\n    └── nuscenes_infos_val.pkl\n    ```\n    Note: As shown above, some files can be soft-linked with the original version from bevfusion. If some of the files is located in `data/nuscenes`, you can move them to `data/nuscenes_mmdet3d_2` manually.\n\n3. (Optional) To accelerate data loading, we prepared cache files in h5 format for BEV maps. They can be generated through `tools/prepare_map_aux.py` with different configs in `configs/dataset`. For example:\n    ```bash\n    python tools/prepare_map_aux.py +process=train\n    python tools/prepare_map_aux.py +process=val\n    ```\n    You will have files like `./val_tmp.h5` and `./train_tmp.h5`. You have to rename the cache files correctly after generating them. Our default is\n    ```bash\n    data/nuscenes_map_aux\n    ├── train_26x200x200_map_aux_full.h5 (42G)\n    └── val_26x200x200_map_aux_full.h5 (9G)\n    ```\n\n### Train the model\n\nLaunch training with (with 8xV100):\n```bash\naccelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 tools/train.py \\\n  +exp=224x400 runner=8gpus\n```\nDuring training, you can check tensorboard for the log and intermediate results.\n\nBesides, we provide debug config to test your environment and data loading process (with 2xV100):\n```bash\naccelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 2 tools/train.py \\\n  +exp=224x400 runner=debug runner.validation_before_run=true\n```\n\n### Test the model\nAfter training, you can test your model for driving view generation through:\n```bash\npython tools/test.py resume_from_checkpoint=${YOUR MODEL}\n# take our the 224x400 model checkpoint as an example\npython tools/test.py resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400\n```\nPlease find the results in `./magicdrive-log/test/`.\n\n**To test FID**\n\nFirst, you should generate the full validation set with\n```bash\npython perception/data_prepare/val_set_gen.py \\\n  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \\\n  task_id=224x400 fid.img_gen_dir=./tmp/224x400 +fid=data_gen +exp=224x400\n  # for map=zero as the null condition for CFG, add `runner.pipeline_param.use_zero_map_as_unconditional=true`\n```\nFor this script, **multi-process / multi-node** is also available by `accelerate`. Just launch it with commands similar to that of training.\n\nThen, test the FID score with\n```bash\n# we assume your torch cache dir is at \"../pretrained/torch_cache/\". If you want\n# to use the default place, please comment the second last line in \"tools/fid_score.py\".\npython tools/fid_score.py cfg \\\n  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \\\n  fid.rootb=tmp/224x400\n```\n\nAlternatively, we provide the pre-generated samples for validation set [here](https://mycuhk-my.sharepoint.com/:f:/g/personal/1155157018_link_cuhk_edu_hk/EjWsTYfC01BAl0F2NLP_bX4BqHjY-oV1VaTx4RgMzbiXWQ?e=fPfEy3).\nYou can put them in `./tmp` and launch the test through\n```bash\npython tools/fid_score.py cfg \\\n  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \\\n  fid.rootb=tmp/224x400/samples  # FID=14.46065995481922\n  # or `fid.rootb=tmp/224x400map0/samples`, FID=16.195992872931697\n```\n\n## Quantitative Results\n\u003cdetails\u003e\n\u003csummary\u003eCompare MagicDrive with other methods for generation quality:\u003c/summary\u003e\n\n![main_results](./assets/main_results.png)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTraining support with images generated from MagicDrive:\u003c/summary\u003e\n\n![trainability](./assets/trainability.png)\n\n\u003c/details\u003e\n\nMore results can be found in the main paper.\n\n## Qualitative Results\n\nMore results can be found in the main paper.\n\n![editings](./assets/editings.png)\n\n## Cite Us\n\n```bibtex\n@inproceedings{gao2023magicdrive,\n  title={{MagicDrive}: Street View Generation with Diverse 3D Geometry Control},\n  author={Gao, Ruiyuan and Chen, Kai and Xie, Enze and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan and Xu, Qiang},\n  booktitle = {International Conference on Learning Representations},\n  year={2024}\n}\n```\n\n## Credit\n\nWe adopt the following open-sourced projects:\n\n- [bevfusion](https://github.com/mit-han-lab/bevfusion): dataloader to handle 3d bounding boxes and BEV map\n- [diffusers](https://github.com/huggingface/diffusers): framework to train stable diffusion\n- [xformers](https://github.com/facebookresearch/xformers): accelerator for attention mechanism\n- Thanks [@pixeli99](https://github.com/pixeli99) for training the [60-frame video generation](https://gaoruiyuan.com/magicdrive/#long-video).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcure-lab%2Fmagicdrive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcure-lab%2Fmagicdrive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcure-lab%2Fmagicdrive/lists"}