{"id":24785486,"url":"https://github.com/YvanYin/DrivingWorld","last_synced_at":"2025-10-12T09:30:56.545Z","repository":{"id":269775854,"uuid":"908133244","full_name":"YvanYin/DrivingWorld","owner":"YvanYin","description":"Code for \"DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT\"","archived":false,"fork":false,"pushed_at":"2025-01-15T04:19:03.000Z","size":1016,"stargazers_count":127,"open_issues_count":1,"forks_count":10,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-01-15T06:14:25.340Z","etag":null,"topics":["autonomous-driving","driving-world-model","generative-model","gpt","video-generation","video-gpt","world-models"],"latest_commit_sha":null,"homepage":"https://huxiaotaostasy.github.io/DrivingWorld/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YvanYin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-25T08:16:58.000Z","updated_at":"2025-01-15T04:19:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"fc2b87b5-0ca2-444d-9754-9a81a2a232a7","html_url":"https://github.com/YvanYin/DrivingWorld","commit_stats":null,"previous_names":["yvanyin/drivingworld"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YvanYin%2FDrivingWorld","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YvanYin%2FDrivingWorld/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YvanYin%2FDrivingWorld/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YvanYin%2FDrivingWorld/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YvanYin","download_url":"https://codeload.github.com/YvanYin/DrivingWorld/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236192311,"owners_count":19110001,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autonomous-driving","driving-world-model","generative-model","gpt","video-generation","video-gpt","world-models"],"created_at":"2025-01-29T14:01:49.056Z","updated_at":"2025-10-12T09:30:51.001Z","avatar_url":"https://github.com/YvanYin.png","language":"Python","funding_links":[],"categories":["Python","多模态大模型"],"sub_categories":["资源传输下载"],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003eDrivingWorld: Constructing World Model for Autonomous Driving via Video GPT\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://arxiv.org/abs/2412.19505\"\u003e\u003cimg src=\"https://img.shields.io/badge/ArXiv-2412.19505-%23840707.svg\" alt=\"ArXiv\"\u003e\u003c/a\u003e\n\u003ca href=\"https://youtu.be/5QJRAxnjX0k\"\u003e\u003cimg src=\"https://img.shields.io/badge/Youtube Demo-Video-%26840707.svg\" alt=\"VideoDemo\"\u003e\u003c/a\u003e\n\u003ca href=\"https://huxiaotaostasy.github.io/DrivingWorld/index.html\"\u003e\u003cimg src=\"https://img.shields.io/badge/Webpage-DrivingWorld-%237CB4F7.svg\" alt=\"Webpage\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n[Xiaotao Hu](https://huxiaotaostasy.github.io/)\u003csup\u003e1,2*\u003c/sup\u003e, [Wei 
Yin](https://yvanyin.net/)\u003csup\u003e2*§\u003c/sup\u003e, [Mingkai Jia](https://scholar.google.com/citations?user=fcpTdvcAAAAJ\u0026hl=zh-CN)\u003csup\u003e1,2\u003c/sup\u003e, [Junyuan Deng](https://scholar.google.com/citations?user=KTCPC5IAAAAJ\u0026hl=en)\u003csup\u003e1,2\u003c/sup\u003e, [Xiaoyang Guo](https://xy-guo.github.io/)\u003csup\u003e2\u003c/sup\u003e\u003cbr\u003e\n[Qian Zhang](https://scholar.google.com.hk/citations?hl=zh-CN\u0026user=pCY-bikAAAAJ)\u003csup\u003e2\u003c/sup\u003e, [Xiaoxiao Long](https://www.xxlong.site/)\u003csup\u003e1†\u003c/sup\u003e, [Ping Tan](https://scholar.google.com/citations?user=XhyKVFMAAAAJ\u0026hl=en)\u003csup\u003e1\u003c/sup\u003e\u003cbr\u003e\n\n[HKUST](https://hkust.edu.hk/)\u003csup\u003e1\u003c/sup\u003e, [Horizon Robotics](https://en.horizon.auto/)\u003csup\u003e2\u003c/sup\u003e\u003cbr\u003e\n\u003csup\u003e*\u003c/sup\u003e Equal Contribution, \u003csup\u003e†\u003c/sup\u003e Corresponding Author, \u003csup\u003e§\u003c/sup\u003e Project Leader\n\u003cbr\u003e\u003cbr\u003e\u003cimage src=\"./images/pipeline.png\"/\u003e\n\u003c/div\u003e\n\nWe present **DrivingWorld** (World Model for Autonomous Driving), a model that enables efficient autoregressive generation of video and ego states. **DrivingWorld** formulates future state prediction (ego state and visual observations) as next-state autoregressive prediction. 
**DrivingWorld** can predict videos longer than 40 seconds and achieves high-fidelity controllable generation.\n\n## 🚀News\n\n- ```[Dec 2024]``` Released [paper](https://arxiv.org/abs/2412.19505), inference code, and quick start guide.\n\n## 🔨 TODO LIST\n\n- [ ] Hugging Face demos\n- [x] Complete evaluation code\n- [x] Video preprocessing code\n- [ ] Training code\n\n\n## ✨Highlights\n\n- 🔥 **Novel Approach**: GPT-style video and ego state generation.\n- 🔥 **State-of-the-art Performance**: Long-duration driving-scene video results.\n- 🔥 **Controllable Generation**: High-fidelity controllable generation with ego poses.\n\n## 🗄️Demos\n- 🔥 Controllable generation with provided ego poses.\n\u003ca id=\"demo\"\u003e\u003c/a\u003e\n\n\u003cimage src=\"./images/teaser.png\"/\u003e\n\n![gif](https://raw.githubusercontent.com/huxiaotaostasy/huxiaotaostasy.github.io/main/DrivingWorld/videos/video_github.gif)\n\n## 🙊 Model Zoo\n| Model | Link |\n|---|---|\n|Video VQVAE| [link](https://huggingface.co/huxiaotaostasy/DrivingWorld/blob/main/vqvae.pt) |\n|World Model| [link](https://huggingface.co/huxiaotaostasy/DrivingWorld/blob/main/world_model.pth) |\n\n\n## 🔑 Quick Start\n\u003ca id=\"quick start\"\u003e\u003c/a\u003e\n\n\n### Installation\n\n```bash\ngit clone https://github.com/YvanYin/DrivingWorld.git\ncd DrivingWorld\npip3 install -r requirements.txt\n```\n* Download the pretrained models from [Hugging Face](https://huggingface.co/huxiaotaostasy/DrivingWorld/tree/main), and move the pretrained parameters to `DrivingWorld/pretrained_models/*`\n\n### Data Preparation\n\nFor data preparation details, please refer to [video_data_preprocess.md](./video_data_preprocess//video_data_preprocess.md).\n\n\n### Change Road Demo\n\nScript for the default setting (conditioned on 15 frames, run on the demo videos, using top-k sampling):\n```bash\npython3 tools/test_change_road_demo.py \\\n--config \"configs/drivingworld_v1/gen_videovq_conf_demo.py\" \\\n--exp_name 
\"demo_dest_change_road\" \\\n--load_path \"./pretrained_models/world_model.pth\" \\\n--save_video_path \"./outputs/change_road\"\n```\n\n### Long-term Demo\n\nScript for the default setting (conditioned on 15 frames, run on the demo videos, using top-k sampling):\n```bash\npython3 tools/test_long_term_demo.py \\\n--config \"configs/drivingworld_v1/gen_videovq_conf_demo.py\" \\\n--exp_name \"demo_test_long_term\" \\\n--load_path \"./pretrained_models/world_model.pth\" \\\n--save_video_path \"./outputs/long_term\"\n```\n\n### Personalized Generation\n\nFor any of these generation modes, you can change the conditioning yaws and poses in the code to get different outputs, and you can modify the sampling parameters in the config files to suit your needs.\n\n## 📌 Citation\n\nIf the paper and code from `DrivingWorld` help your research, we kindly ask you to cite our paper ❤️. Additionally, if you appreciate our work and find this repository useful, giving it a star ⭐️ would be a wonderful way to support us. Thank you very much.\n\n```bibtex\n@article{hu2024drivingworld,\n  title={DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT},\n  author={Hu, Xiaotao and Yin, Wei and Jia, Mingkai and Deng, Junyuan and Guo, Xiaoyang and Zhang, Qian and Long, Xiaoxiao and Tan, Ping},\n  journal={arXiv preprint arXiv:2412.19505},\n  year={2024}\n}\n```\n\n## Reference\nWe thank [VQGAN](https://github.com/CompVis/taming-transformers), [LlamaGen](https://github.com/FoundationVision/LlamaGen), and [Llama 3.1](https://github.com/meta-llama/llama3) for their codebases.\n\n\n\n## License\n\nThis repository is under the MIT License. 
For license questions, please contact Wei Yin (yvanwy@outlook.com) and Xiaotao Hu (xiaotao.hu@connect.ust.hk).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYvanYin%2FDrivingWorld","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYvanYin%2FDrivingWorld","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYvanYin%2FDrivingWorld/lists"}