{"id":44134446,"url":"https://github.com/thu-ml/Causal-Forcing","last_synced_at":"2026-03-05T20:01:09.325Z","repository":{"id":335823271,"uuid":"1147151321","full_name":"thu-ml/Causal-Forcing","owner":"thu-ml","description":"Official codebase for \"Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation\"","archived":false,"fork":false,"pushed_at":"2026-03-01T07:37:06.000Z","size":14896,"stargazers_count":398,"open_issues_count":8,"forks_count":18,"subscribers_count":7,"default_branch":"main","last_synced_at":"2026-03-01T10:51:50.341Z","etag":null,"topics":["auto-regressive-diffusion-model","autoregressive-models","consistency-models","diffusion","diffusion-models","distillation","few-step-generation","generative-ai","text-to-video","text-to-video-generation","video-diffusion-model","video-generation","wan-video","wan2","wan21","world-model","world-models"],"latest_commit_sha":null,"homepage":"https://thu-ml.github.io/CausalForcing.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thu-ml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-01T09:30:13.000Z","updated_at":"2026-03-01T07:37:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/thu-ml/Causal-Forcing","commit_stats":null,"previous_names":["thu-ml/causal-forcing"],"tags_count":0,"template":false,"template_full_name":null,"purl":"
pkg:github/thu-ml/Causal-Forcing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FCausal-Forcing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FCausal-Forcing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FCausal-Forcing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FCausal-Forcing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thu-ml","download_url":"https://codeload.github.com/thu-ml/Causal-Forcing/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FCausal-Forcing/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30147986,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T16:58:46.102Z","status":"ssl_error","status_checked_at":"2026-03-05T16:58:45.706Z","response_time":93,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-regressive-diffusion-model","autoregressive-models","consistency-models","diffusion","diffusion-models","distillation","few-step-generation","generative-ai","text-to-video","text-to-video-generation","video-diffusion-model","video-generation","wan-video","wan2","wan21","world-model","world-models"],"created_at":"2026-02-08T23:00:22.926Z","updated_at":"2026-03-05T20:01:09.318Z","avatar_url":"https://github.com/thu-ml.png","language":"Python","funding_links":[],"categories":["Spatial Control"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Causal Forcing\n### Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation\n\n\u003cp align=\"center\"\u003e\n  \u003cp align=\"center\"\u003e\n    \u003cdiv\u003e\n    \u003ca href=\"https://zhuhz22.github.io/\" target=\"_blank\"\u003eHongzhou Zhu*\u003c/a\u003e\u003csup\u003e\u003c/sup\u003e,\n    \u003ca href=\"https://gracezhao1997.github.io/\" target=\"_blank\"\u003eMin Zhao*\u003c/a\u003e\u003csup\u003e\u003c/sup\u003e , \n    \u003ca href=\"https://guandehe.github.io/\" target=\"_blank\"\u003eGuande He\u003c/a\u003e\u003csup\u003e\u003c/sup\u003e, \n    \u003ca href=\"https://scholar.google.com/citations?user=dxN1_X0AAAAJ\u0026hl=en\" target=\"_blank\"\u003eHang Su\u003c/a\u003e\u003csup\u003e\u003c/sup\u003e,\n    \u003ca href=\"https://zhenxuan00.github.io/\" target=\"_blank\"\u003eChongxuan Li\u003c/a\u003e\u003csup\u003e\u003c/sup\u003e ,\n    \u003ca 
href=\"https://ml.cs.tsinghua.edu.cn/~jun/index.shtml\" target=\"_blank\"\u003eJun Zhu\u003c/a\u003e\u003csup\u003e\u003c/sup\u003e\n\u003c/div\u003e\n\u003cdiv\u003e\n    \u003csup\u003e\u003c/sup\u003eTsinghua University \u0026 Shengshu \u0026 UT Austin\n\u003c/div\u003e\n\n\n\u003c/div\u003e\n  \u003c/p\u003e\n  \u003ch3 align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2602.02214\"\u003ePaper\u003c/a\u003e | \u003ca href=\"https://thu-ml.github.io/CausalForcing.github.io\"\u003eWebsite\u003c/a\u003e | \u003ca href=\"https://huggingface.co/zhuhz22/Causal-Forcing/tree/main\"\u003eModels\u003c/a\u003e | \u003ca href=\"assets/wechat.jpg\"\u003eWeChat\u003c/a\u003e \u003c/h3\u003e\n\u003c/p\u003e\n\n\n\n-----\n\n\nCausal Forcing significantly outperforms Self Forcing in **both visual quality and motion dynamics**, while keeping **the same training budget and inference efficiency**—enabling real-time, streaming video generation on a single RTX 4090.\n\n\n-----\n\n\n\nhttps://github.com/user-attachments/assets/310f0cfa-e1bb-496d-8941-87f77b3271c0\n\n\n## 🔥 News\n- **2026.2.28** : Add [FAQ section](#faq--blog) regarding hot topics, specifically which is the better Initialization between AR diffusion and causal ODE distillation.\n- **2026.2.11** : We now support **I2V** generation! 
Feel free to try it [here](#new-i2v)!\n- **2026.2.9** : [Infinity-RoPE](https://github.com/yesiltepe-hidir/infinity-rope) adopts Causal Forcing as one of the base models!\n- **2026.2.8** : [Deep Forcing](https://cvlab-kaist.github.io/DeepForcing/) adopts Causal Forcing as one of the base models!\n- **2026.2.7** : Causal Forcing now supports [Rolling Forcing](https://github.com/TencentARC/RollingForcing), enabling minute-level long video generation!\n- **2026.2.5** : Release causal consistency distillation (Preview) as substitute for ODE distillation, **free of generating ODE paired data**!\n- **2026.2.2** : The [paper](https://arxiv.org/abs/2602.02214), [project page](https://thu-ml.github.io/CausalForcing.github.io/), and code are released.\n\n\n## Quick Start\n\n\u003e The inference environment is identical to Self Forcing, so you can migrate directly using our configs and model.\n\n**NOTE**: Similar to CausVid/Self Forcing, Causal Forcing does not natively support videos longer than 81 frames. As a base training method, it is orthogonal to techniques like Longlive/Rolling Forcing. To use Causal Forcing as a long video baseline, see [this extension](#minute-level-long-video-generation). 
**Directly using the 5-second trained Causal Forcing model as a baseline for long video generation is extremely unfair**.\n\n\n### Installation\n```bash\nconda create -n causal_forcing python=3.10 -y\nconda activate causal_forcing\npip install -r requirements.txt\npip install git+https://github.com/openai/CLIP.git\npip install flash-attn --no-build-isolation\npython setup.py develop\n```\n### Download Checkpoints\n```bash\nhf download Wan-AI/Wan2.1-T2V-1.3B  --local-dir wan_models/Wan2.1-T2V-1.3B\nhf download Wan-AI/Wan2.1-T2V-14B  --local-dir wan_models/Wan2.1-T2V-14B\nhf download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints\nhf download zhuhz22/Causal-Forcing framewise/causal_forcing.pt --local-dir checkpoints\n```\n\n### CLI Inference\n\u003e We open-source both the frame-wise and chunk-wise models; the former is a setting that Self Forcing has chosen not to release.\n\n#### T2V\nFrame-wise model (**higher dynamic degree and more expressive, recommended**):\n```bash\npython inference.py \\\n  --config_path configs/causal_forcing_dmd_framewise.yaml \\\n  --output_folder output/framewise \\\n  --checkpoint_path  checkpoints/framewise/causal_forcing.pt \\\n  --data_path prompts/demos.txt \\\n  --use_ema\n    # Note: this frame-wise config is not in Self Forcing; if you use its framework, migrate this config too.\n```\n\nChunk-wise model (**more stable**):\n```bash\npython inference.py \\\n  --config_path configs/causal_forcing_dmd_chunkwise.yaml \\\n  --output_folder output/chunkwise \\\n  --checkpoint_path checkpoints/chunkwise/causal_forcing.pt \\\n  --data_path prompts/demos.txt\n```\n\n#### 🔥NEW: I2V\n\u003e Our frame-wise setting natively supports I2V. You simply need to set the initial latent frame to your conditional image. 
\n\n```bash\npython inference.py \\\n  --config_path configs/causal_forcing_dmd_framewise.yaml \\\n  --output_folder output/framewise \\\n  --checkpoint_path  checkpoints/framewise/causal_forcing.pt \\\n  --data_path prompts/i2v \\\n  --i2v \\\n  --use_ema\n```\n\n\n### Minute-level Long Video Generation\nBuilding on [Rolling Forcing](https://github.com/TencentARC/RollingForcing), we implement minute-level long video generation. See [here](./long_video) for details.\n\n[Infinity-RoPE](https://github.com/yesiltepe-hidir/infinity-rope) and [Deep Forcing](https://cvlab-kaist.github.io/DeepForcing/) also adopt Causal Forcing as one of their base models, enabling interactive (prompt-switchable) long video generation at the minute scale. You can also try them out in their repos.\n\n## Training\n\u003cimg width=\"4944\" height=\"2154\" alt=\"overview\" src=\"https://github.com/user-attachments/assets/df96fae3-cecc-4915-9a14-d1a5f326074e\" /\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e Stage 1: Autoregressive Diffusion Training (Can skip by using our pretrained checkpoints. 
Click to expand.)\u003c/summary\u003e\n\nFirst download the dataset (we provide a 6K toy dataset here):\n```bash\nhf download zhuhz22/Causal-Forcing-data  --local-dir dataset\npython utils/merge_and_get_clean.py\n```\n\u003e If the download gets stuck, Ctrl^C and then resume it.\n\n\n\u003e For training on your own dataset, refer to [this issue](https://github.com/thu-ml/Causal-Forcing/issues/8).\n\n\nThen train the AR-diffusion model:\n- Framewise:\n  ```bash\n    torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/ar_diffusion_tf_framewise.yaml \\\n    --logdir logs/ar_diffusion_framewise\n  ```\n\n- Chunkwise:\n  ```bash\n    torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/ar_diffusion_tf_chunkwise.yaml \\\n    --logdir logs/ar_diffusion_chunkwise\n  ```\n\n\u003e We recommend training no less than 2K steps, and more steps (e.g., 5~10K) will lead to better performance.\n\nInference to test training results:\n```bash\npython inference.py \\\n  --config_path configs/ar_diffusion_tf_{framewise OR chunkwise}.yaml \\\n  --output_folder output/{framewise OR chunkwise}_ar_diffusion \\\n  --checkpoint_path  checkpoints/{framewise OR chunkwise}/ar_diffusion.pt \\\n  --data_path prompts/demos.txt\n```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e Stage 2: Causal ODE Initialization (Can skip by using our pretrained checkpoints. 
Click to expand.)\u003c/summary\u003e\n\nIf you have skipped Stage 1, you need to download the pretrained models:\n```bash\nhf download zhuhz22/Causal-Forcing framewise/ar_diffusion.pt --local-dir checkpoints\nhf download zhuhz22/Causal-Forcing chunkwise/ar_diffusion.pt --local-dir checkpoints\n```\n\nIn this stage, first generate ODE paired data:\n```bash\n# for the frame-wise model\ntorchrun --nproc_per_node=8 \\\n  get_causal_ode_data_framewise.py \\\n  --generator_ckpt checkpoints/framewise/ar_diffusion.pt \\\n  --rawdata_path dataset/clean_data \\\n  --output_folder dataset/ODE6KCausal_framewise_latents\n\npython utils/create_lmdb_iterative.py \\\n  --data_path dataset/ODE6KCausal_framewise_latents \\\n  --lmdb_path dataset/ODE6KCausal_framewise\n\n# for the chunk-wise model\ntorchrun --nproc_per_node=8 \\\n  get_causal_ode_data_chunkwise.py \\\n  --generator_ckpt checkpoints/chunkwise/ar_diffusion.pt \\\n  --rawdata_path dataset/clean_data \\\n  --output_folder dataset/ODE6KCausal_chunkwise_latents\n\npython utils/create_lmdb_iterative.py \\\n  --data_path dataset/ODE6KCausal_chunkwise_latents \\\n  --lmdb_path dataset/ODE6KCausal_chunkwise\n```\n\nOr you can also directly download our prepared dataset (~300G):\n```bash\nhf download zhuhz22/Causal-Forcing-data  --local-dir dataset\npython utils/merge_lmdb.py\n```\n\u003e If the download gets stuck, Ctrl^C and then resume it.\n\n\nAnd then train ODE initialization models:\n- Frame-wise:\n  ```bash\n  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/causal_ode_framewise.yaml \\\n    --logdir logs/causal_ode_framewise\n  ```\n- Chunk-wise:\n  ```bash\n  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/causal_ode_chunkwise.yaml \\\n    --logdir logs/causal_ode_chunkwise\n  
```\n\n\u003e We recommend training no less than 1K steps, and more steps (e.g., 5~10K) will lead to better performance.\n\nInference to test training results:\n\nThe same as [here](#cli-inference).\n\u003c/details\u003e\n\n\n\n\u003cdetails\u003e\n\u003csummary\u003e 🔥 NEW: Substitute for Stage 2, without creating ODE paired data: Causal CD (Click to expand.)\u003c/summary\u003e     \n\u003cbr\u003e\nSince creating ODE-paired data is very time-consuming, we also provide an alternative here that achieves the same effect as ODE distillation while requiring only ground-truth data.\n\n**Note:** The current CD is still in an early stage, with many suboptimal implementations in both the algorithm and (especially) infra efficiency. We’ll continue iterating and improving it.\n\n- Frame-wise:\n  ```bash\n  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/causal_cd_framewise.yaml \\\n    --logdir logs/causal_cd_framewise\n  ```\n- Chunk-wise:\n  ```bash\n  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/causal_cd_chunkwise.yaml \\\n    --logdir logs/causal_cd_chunkwise\n  ```\n\n\u003e We recommend training no less than 1K steps, and more steps (e.g., 3~5K) will lead to better performance.\n\nInference to test training results:\n\nThe same as [here](#cli-inference).\n\u003c/details\u003e\n\n\n\n### Stage 3: DMD\n\n\u003e This stage is compatible with Self Forcing training, so you can migrate seamlessly by using our configs and checkpoints.\n\n\u003e Set your wandb configs before training.\n\nFirst download the dataset:\n```bash\nhf download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts\n```\nIf you have skipped Stage 2, you need to download the pretrained checkpoints:\n```bash\nhf download zhuhz22/Causal-Forcing 
framewise/causal_ode.pt --local-dir checkpoints\nhf download zhuhz22/Causal-Forcing chunkwise/causal_ode.pt --local-dir checkpoints\n```\n\nAnd then train DMD models:\n\n- Frame-wise model:\n  ```bash\n  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/causal_forcing_dmd_framewise.yaml \\\n    --logdir logs/causal_forcing_dmd_framewise\n  ```\n  \u003e We recommend training for 500 steps. More than 1K steps will reduce the dynamic degree.\n\n\n- Chunk-wise model:\n  ```bash\n  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \\\n    --rdzv_backend=c10d \\\n    --rdzv_endpoint $MASTER_ADDR \\\n    train.py \\\n    --config_path configs/causal_forcing_dmd_chunkwise.yaml \\\n    --logdir logs/causal_forcing_dmd_chunkwise\n  ```\n  \u003e We recommend training for 100~200 steps. More than 1K steps will reduce the dynamic degree.\n\nThe resulting models are the final models used to generate videos.\n\n## FAQ \u0026 Blog \n\n**1. Why use a bidirectional teacher in the DMD stage?**\n- Q: In the DMD stage, do you still use a bidirectional teacher? Why not an AR teacher?\n- A: Yes. DMD only requires the student to match the teacher’s final distribution, not the generation trajectory, so a bidirectional teacher is fine. Also, bidirectional diffusion models are typically stronger than AR diffusion models, so they make a better teacher.\n\n- Q: Then why must the ODE (or Consistency Distillation) stage use an AR teacher?\n- A: Because ODE/CD requires the student and teacher to follow the same trajectory, so their structures must match; an AR student cannot be trajectory-aligned with a bidirectional teacher.\n\n**2. 🔥🔥 ODE initialization or multi-step AR diffusion initialization?**\n- Q: Which is better as initialization: a “proper” ODE initialization or directly using multi-step AR diffusion?\n- A: We compared this in Appendix C2. 
Overall, proper ODE init is better: multi-step AR diffusion init + DMD occasionally yields grid-like or waxy/greasy results. A key reason is that DMD is inherently few-step, so the right comparison is in the few-step regime, where a few-step diffusion teacher is much weaker than an ODE-distilled teacher. Without ODE distillation, DMD must both close the step gap and handle an added conditioning gap from self-rollout: early few-step errors corrupt the history and get amplified across frames (large exposure bias), which increases the burden on DMD. It can still converge, but typically with worse quality than ODE initialization. Also, with ODE init, DMD can be trained for very few steps (e.g., ~100), reducing the risk of dynamics degradation from long DMD training.\n\n**3. Can frame-level non-injectivity appear in the actual training dataset?**\n- Q: Regarding the “one-to-many” analysis in the ODE stage: since a single frame’s latent has very high dimensionality, isn’t the probability of two frames being exactly identical extremely small?\n- A: Yes, but the key point here is not whether the dataset literally contains identical samples; it’s whether there exists a well-defined function in the mathematical sense. Our vision modalities live in a continuous space; even in 1D, getting two samples to be exactly identical is extremely unlikely. However, the theoretical existence of exact collisions is enough to break the function property and make the mapping ill-defined.\n\nFor more details, see [here](https://zhuanlan.zhihu.com/p/2002114039493461457). 
(currently in Chinese)\n\n## Acknowledgements\nThis codebase is built on top of the open-source implementations of [CausVid](https://github.com/tianweiy/CausVid), [Self Forcing](https://github.com/guandeh17/Self-Forcing), [Rolling Forcing](https://github.com/TencentARC/RollingForcing), and the [Wan2.1](https://github.com/Wan-Video/Wan2.1) repo.\n\n## References\nIf you find the method useful, please cite:\n```\n@article{zhu2026causal,\n  title={Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation},\n  author={Zhu, Hongzhou and Zhao, Min and He, Guande and Su, Hang and Li, Chongxuan and Zhu, Jun},\n  journal={arXiv preprint arXiv:2602.02214},\n  year={2026}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-ml%2FCausal-Forcing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthu-ml%2FCausal-Forcing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-ml%2FCausal-Forcing/lists"}