{"id":42709199,"url":"https://github.com/baaivision/ursa","last_synced_at":"2026-01-29T15:06:00.232Z","repository":{"id":321366178,"uuid":"1077927679","full_name":"baaivision/URSA","owner":"baaivision","description":"🐻 Uniform Discrete Diffusion with Metric Path for Video Generation","archived":false,"fork":false,"pushed_at":"2026-01-15T06:26:52.000Z","size":10109,"stargazers_count":89,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-15T12:46:13.060Z","etag":null,"topics":["diffusion-forcing","discrete-diffusion","image-generation","video-generation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/baaivision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-17T00:31:47.000Z","updated_at":"2026-01-15T06:26:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"0557f651-87ca-4629-9123-278388ca6e6a","html_url":"https://github.com/baaivision/URSA","commit_stats":null,"previous_names":["baaivision/ursa"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/baaivision/URSA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2FURSA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2FURSA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2FURSA/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2FURSA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/baaivision","download_url":"https://codeload.github.com/baaivision/URSA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baaivision%2FURSA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28880017,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-29T10:31:27.438Z","status":"ssl_error","status_checked_at":"2026-01-29T10:31:01.017Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-forcing","discrete-diffusion","image-generation","video-generation"],"created_at":"2026-01-29T15:05:26.154Z","updated_at":"2026-01-29T15:06:00.222Z","avatar_url":"https://github.com/baaivision.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"assets/logo.png\" width=\"30%\" alt=\"logo\"/\u003e\n\n\u003ch1\u003e🐻 URSA: Uniform Discrete Diffusion with Metric Path\u003cbr\u003efor Video Generation\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://arxiv.org/abs/2510.24717\"\u003e\u003cimg src=\"https://img.shields.io/badge/ArXiv-2510.24717-%23840707.svg\" 
alt=\"ArXiv\"\u003e\u003c/a\u003e\n\u003ca href=\"https://huggingface.co/collections/BAAI/ursa\"\u003e\u003cimg src=\"https://img.shields.io/badge/🤗 Weights-BAAI/URSA-rgb(166,109,59).svg\" alt=\"\"\u003e\u003c/a\u003e\n\u003ca href=\"https://huggingface.co/spaces/BAAI/nova-d48w1024-osp480\"\u003e\u003cimg src=\"https://img.shields.io/badge/🤗 Demo-TI2V-%26840707.svg\" alt=\"TI2VDemo\"\u003e\u003c/a\u003e\n\u003ca href=\"http://bitterdhg.github.io/URSA_page\"\u003e\u003cimg src=\"https://img.shields.io/badge/Project-URSA-%237CB4F7.svg\" alt=\"Project\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\n[Haoge Deng](https://scholar.google.com/citations?user=S2sbvjgAAAAJ\u0026hl)\u003csup\u003e1,4*\u003c/sup\u003e, [Ting Pan](https://scholar.google.com/citations?\u0026user=qQv6YbsAAAAJ)\u003csup\u003e2,4*\u003c/sup\u003e, [Fan Zhang](https://scholar.google.com/citations?user=VsJ39HMAAAAJ)\u003csup\u003e4*\u003c/sup\u003e, [Yang Liu](https://scholar.google.com/citations?user=9JcQ2hwAAAAJ\u0026hl)\u003csup\u003e3,4*\u003c/sup\u003e, [Zhuoyan Luo](https://scholar.google.com/citations?user=mKQhEsIAAAAJ\u0026hl)\u003csup\u003e4\u003c/sup\u003e, [Yufeng Cui](https://scholar.google.com/citations?user=5Ydha2EAAAAJ\u0026hl)\u003csup\u003e4\u003c/sup\u003e, [Wenxuan Wang](https://scholar.google.com/citations?user=75OyC-oAAAAJ\u0026hl)\u003csup\u003e4\u003c/sup\u003e\u003cbr\u003e\n[Chunhua Shen](https://scholar.google.com/citations?user=Ljk2BvIAAAAJ\u0026hl)\u003csup\u003e3\u003c/sup\u003e, [Shiguang Shan](https://scholar.google.com/citations?user=Vkzd7MIAAAAJ\u0026hl)\u003csup\u003e2\u003c/sup\u003e, [Zhaoxiang Zhang](https://scholar.google.com/citations?user=qxWfV6cAAAAJ\u0026hl)\u003csup\u003e1†\u003c/sup\u003e, [Xinlong Wang](https://scholar.google.com/citations?user=DPz0DjYAAAAJ\u0026hl)\u003csup\u003e4†\u003c/sup\u003e\u003cbr\u003e\n\n[CASIA](http://english.ia.cas.cn)\u003csup\u003e1\u003c/sup\u003e, 
[CASICT](http://english.ict.cas.cn)\u003csup\u003e2\u003c/sup\u003e, [ZJU](https://www.zju.edu.cn/english)\u003csup\u003e3\u003c/sup\u003e, [BAAI](https://www.baai.ac.cn/en)\u003csup\u003e4\u003c/sup\u003e\u003cbr\u003e\n\u003csup\u003e*\u003c/sup\u003e Equal Contribution, \u003csup\u003e†\u003c/sup\u003e Corresponding Author\n\u003cbr\u003e\u003cbr\u003e\u003cimage src=\"assets/model_preview.gif\"/\u003e\n\u003cbr\u003e\u003cbr\u003e\u003cimage src=\"assets/model_overview.png\"/\u003e\n\u003c/div\u003e\n\nWe present **URSA** (**U**niform disc**R**ete diffu**S**ion with metric p**A**th), a simple yet powerful framework that bridges the gap with continuous approaches. **URSA** formulates the video generation task as an iterative global refinement of discrete spatiotemporal tokens and scales efficiently to long video generation, requiring fewer inference steps. **URSA** enables multi-task video generation with an asynchronous timestep scheduling strategy in one unified model.\n\n## 🚀 News\n- ```[Jan 2026]``` Released [Training Guide](./docs/training.md).\n- ```[Oct 2025]``` 🎉 URSA is part of [Emu3.5](https://github.com/baaivision/Emu3.5) as DiDA (Discrete Diffusion Adaptation)!\n- ```[Oct 2025]``` Released \u003ca href=\"https://huggingface.co/spaces/BAAI/nova-d48w1024-osp480\"\u003e\u003cb\u003eTI2V\u003c/b\u003e\u003c/a\u003e 🤗 Demo.\n- ```[Oct 2025]``` Released [Paper](https://arxiv.org/abs/2510.24717) \u0026 [Project Page](http://bitterdhg.github.io/URSA_page) \u0026 [Evaluation Guide](./docs/evaluation.md).\n\n## ✨ Highlights\n\n- 🥇 **Novel Approach**: Uniform Discrete Diffusion with Metric Path.\n- 🥈 **SOTA Performance**: High efficiency with state-of-the-art T2I/T2V/I2V results.\n- 🥉 **Unified Modeling**: Multi-task capabilities in a single unified model.\n\n## 🗄️ Models\n\n### 🖼️ Text to Image\n\n| Model | Resolution | Data | Weight | GenEval | DPGBench |\n|:-----:|:----------:|:----:|:------:|:-------:|:--------:|\n| URSA-0.6B-IBQ1024 | 1024x1024 | 30M | [🤗 
HF](https://huggingface.co/BAAI/URSA-0.6B-IBQ1024) \\| [🤖 ModelScope](https://www.modelscope.cn/models/BAAI/URSA-0.6B-IBQ1024) | 0.79 | 85.6 |\n| URSA-1.7B-IBQ1024 | 1024x1024 | 30M | [🤗 HF](https://huggingface.co/BAAI/URSA-1.7B-IBQ1024) \\| [🤖 ModelScope](https://www.modelscope.cn/models/BAAI/URSA-1.7B-IBQ1024) | 0.80 | 86.0 |\n\n### 🎬 Text to Video\n\n| Model | Resolution | Data | Weight | VBench-T2V | VBench-I2V |\n|:-----:|:----------:|:----:|:------:|:----------:|:----------:|\n| URSA-0.6B-FSQ320 | 49x512x320 | 24M | [🤗 HF](https://huggingface.co/BAAI/URSA-0.6B-FSQ320) \\| [🤖 ModelScope](https://www.modelscope.cn/models/BAAI/URSA-0.6B-FSQ320) | 81.4 | 86.0 |\n| URSA-1.7B-FSQ320 | 49x512x320 | 24M | [🤗 HF](https://huggingface.co/BAAI/URSA-1.7B-FSQ320) \\| [🤖 ModelScope](https://www.modelscope.cn/models/BAAI/URSA-1.7B-FSQ320) | 82.4 | 86.2 |\n\n## 📖 Table of Contents\n- [🔧 Installation](#installation)\n- [🔥 Quick Start](#quick-start)\n  - [🖼️ Image Generation](#quickstart-image-generation)\n  - [🎬 Video Generation](#quickstart-video-generation)\n- [💻 Gradio Demo](#gradio-demo)\n- [💯 Evaluation](./docs/evaluation.md)\n- [🤖 Training](./docs/training.md)\n\n## 🔧 Installation\n\u003ca id=\"installation\"\u003e\u003c/a\u003e\n\nClone this repository to local disk and install:\n```bash\n# Quote the version specifier so the shell does not treat \u003e as an output redirect\npip install diffusers \"transformers\u003e=4.57.1\" accelerate imageio imageio-ffmpeg omegaconf wandb\ngit clone https://github.com/baaivision/URSA.git\ncd URSA \u0026\u0026 pip install .\n```\n\n## 🔥 Quick Start\n\u003ca id=\"quick-start\"\u003e\u003c/a\u003e\n\n### 🖼️ Image Generation\n\u003ca id=\"quickstart-image-generation\"\u003e\u003c/a\u003e\n\n```python\nimport torch\nfrom diffnext.pipelines import URSAPipeline\n\nmodel_id, height, width = \"BAAI/URSA-1.7B-IBQ1024\", 1024, 1024\nmodel_args = {\"torch_dtype\": torch.float16, \"trust_remote_code\": True}\npipe = URSAPipeline.from_pretrained(model_id, **model_args)\npipe = pipe.to(torch.device(\"cuda\"))\n\nprompt = \"The bear, calm 
and still, gazes upward as if lost in contemplation of the cosmos.\"\nnegative_prompt = \"worst quality, low quality, inconsistent motion, static, still, blurry, jittery, distorted, ugly\"\n\nimage = pipe(**locals()).frames[0]\nimage.save(\"ursa.jpg\")\n```\n\n### 🎬 Video Generation\n\u003ca id=\"quickstart-video-generation\"\u003e\u003c/a\u003e\n\n```python\nimport os, torch, numpy\nfrom diffnext.pipelines import URSAPipeline\nfrom diffnext.utils import export_to_video\nos.environ[\"PYTORCH_CUDA_ALLOC_CONF\"] = \"expandable_segments:True\"\n\nmodel_id, height, width = \"BAAI/URSA-1.7B-FSQ320\", 320, 512\nmodel_args = {\"torch_dtype\": torch.float16, \"trust_remote_code\": True}\npipe = URSAPipeline.from_pretrained(model_id, **model_args)\npipe = pipe.to(torch.device(\"cuda\"))\n\ntext_prompt = \"a lone grizzly bear walks through a misty forest at dawn, sunlight catching its fur.\"\nnegative_prompt = \"worst quality, low quality, inconsistent motion, static, still, blurry, jittery, distorted, ugly\"\n\n# Text-to-Image\nprompt = text_prompt\nnum_frames, num_inference_steps = 1, 25\nimage = pipe(**locals()).frames[0]\nimage.save(\"ursa.jpg\")\n\n# Image-to-Video\nprompt = f\"motion=9.0, {text_prompt}\"\nnum_frames, num_inference_steps = 49, 50\nvideo = pipe(**locals()).frames[0]\nexport_to_video(video, \"ursa_1+48f.mp4\", fps=12)\n\n# Text-to-Video\nimage, video = None, None\nprompt = f\"motion=9.0, {text_prompt}\"\nnum_frames, num_inference_steps = 49, 50\nvideo = pipe(**locals()).frames[0]\nexport_to_video(video, \"ursa_49f.mp4\", fps=12)\n\n# Video-to-Video\nprompt = f\"motion=5.0, {text_prompt}\"\nnum_frames, num_inference_steps = 49, 50\nnum_cond_frames, cond_noise_scale = 13, 0.1\nfor i in range(12):\n    video, start_video = video[-num_cond_frames:], video\n    video = pipe(**locals()).frames[0]\n    video = numpy.concatenate([start_video, video[num_cond_frames:]])\n    export_to_video(video, \"ursa_{}f.mp4\".format(video.shape[0]), fps=12)\n```\n\n## 💻 Gradio 
Demo\n\u003ca id=\"gradio-demo\"\u003e\u003c/a\u003e\n\n```bash\n# Text-to-Image (T2I)\npython scripts/app_ursa_t2i.py --model \"BAAI/URSA-1.7B-IBQ1024\" --device 0\n\n# Text-to-Image-to-Video (TI2V)\npython scripts/app_ursa_ti2v.py --model \"BAAI/URSA-1.7B-FSQ320\" --device 0\n```\n\n## 📋 Todo List\n- [X] [Model Zoo](#model-zoo)\n- [X] [Quick Start](#quick-start)\n- [X] [Gradio Demo](#gradio-demo)\n- [X] [Evaluation Guide](./docs/evaluation.md)\n- [X] [Training Guide](./docs/training.md)\n- [ ] 4B Model\n\n## 📖 Citation\nIf you find this repository useful, please consider giving it a star ⭐ and a citation 🦖:\n```\n@article{deng2025ursa,\n  title={Uniform Discrete Diffusion with Metric Path for Video Generation},\n  author={Deng, Haoge and Pan, Ting and Zhang, Fan and Liu, Yang and Luo, Zhuoyan and Cui, Yufeng and Shen, Chunhua and Shan, Shiguang and Zhang, Zhaoxiang and Wang, Xinlong},\n  journal={arXiv preprint arXiv:2510.24717},\n  year={2025}\n}\n```\n```\n@article{deng2024nova,\n  title={Autoregressive Video Generation without Vector Quantization},\n  author={Deng, Haoge and Pan, Ting and Diao, Haiwen and Luo, Zhengxiong and Cui, Yufeng and Lu, Huchuan and Shan, Shiguang and Qi, Yonggang and Wang, Xinlong},\n  journal={arXiv preprint arXiv:2412.14169},\n  year={2024}\n}\n```\n\n## 🤗 Acknowledgement\n\nWe thank the following repositories:\n- [NOVA](https://github.com/baaivision/NOVA). ✨NOVA is the predecessor of 🐻URSA.\n- [FlowMatching](https://github.com/facebookresearch/flow_matching). This codebase systematically provides CFM and DFM implementations.\n- [FUDOKI](https://github.com/fudoki-hku/FUDOKI). This codebase provides a naive multimodal DFM implementation.\n- [CodeWithGPU](https://github.com/seetacloud/codewithgpu). 
The CodeWithGPU library is the core of our data loading pipeline.\n\n## License\nCode and models are licensed under the [Apache License 2.0](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaaivision%2Fursa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbaaivision%2Fursa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbaaivision%2Fursa/lists"}