# Awesome-Physics-Cognition-based-Video-Generation

🚀🚀🚀 This is a repository for organizing papers, code, and other resources related to physics cognition-based video generation. This repo is being actively updated, so please stay tuned!
For more detailed information, please refer to our survey paper: [Exploring the Evolution of Physics Cognition in Video Generation: A Survey](https://arxiv.org/abs/2503.21765).

This paper list covers **T2V, V2V, and dynamic 3D and 4D generation based on physics**, and includes some **world models and world simulators**.

If you find this repository useful, please consider giving us a star 🌟 and citing our [survey](https://arxiv.org/abs/2503.21765).

## ⚡ Contributing

We welcome feedback, suggestions, and contributions that help improve this survey and repository and make them valuable resources for the entire community.
We will actively maintain this repository by incorporating new research as it emerges. Please let us know if you have suggestions about our taxonomy, notice any missing papers, or find a preprint that has since been accepted at a venue.

If you want to add your work or model to this list, please do not hesitate to open a pull request.

Markdown format:
```markdown
* Paper Name. [[Paper]](link) [[Code]](link) [[Website]](link) [**Name of Conference or Journal + Year**]
```
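For example, an entry for PhysGen (already in the list below) would look like this:

```markdown
* PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation. [[Paper]](https://arxiv.org/abs/2409.18964) [[Code]](https://github.com/stevenlsw/physgen) [[Website]](https://stevenlsw.github.io/physgen/) [**ECCV 2024**]
```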

## 📖 Table of Contents

- [⚡ Contributing](#-contributing)
- [📖 Table of Contents](#-table-of-contents)
- [Surveys](#surveys)
- [Basic Schematic Perception for Generation](#basic-schematic-perception-for-generation)
- [Passive Cognition of Physical Knowledge for Generation](#passive-cognition-of-physical-knowledge-for-generation)
- [Active Cognition for World Simulation](#active-cognition-for-world-simulation)
- [Benchmarks and Metrics](#benchmarks-and-metrics)
- [♥️ Star History](#️-star-history)
- [📑 Citation](#-citation)

### Surveys
| Title | arXiv | Github | WebSite | Pub. & Date |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :--------------------------: |
|[Digital Gene: Learning about the Physical World through Analytic Concepts](https://arxiv.org/abs/2504.04170)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.04170)|-|-|Apr., 2025 |
|[Simulating the Real World: A Unified Survey of Multimodal Generative Models](https://arxiv.org/abs/2503.04641)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.04641)|[![Star](https://img.shields.io/github/stars/ALEEEHU/World-Simulator.svg?style=social&label=Star)](https://github.com/ALEEEHU/World-Simulator)|-|Mar., 2025|
|[Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC](https://arxiv.org/abs/2502.07007)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2502.07007)|-|-|Feb., 2025|
|[Generative Physical AI in Vision: A Survey](https://arxiv.org/abs/2501.10928)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.10928)|[![Star](https://img.shields.io/github/stars/BestJunYu/Awesome-Physics-aware-Generation.svg?style=social&label=Star)](https://github.com/BestJunYu/Awesome-Physics-aware-Generation)|-|Jan., 2025|
|[Physics-Informed Computer Vision: A Review and Perspectives](https://dl.acm.org/doi/full/10.1145/3689037)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.18035)|-|-|ACM Computing Surveys, 2024|

### Basic Schematic Perception for Generation

| Title | arXiv | Github | WebSite | Pub. & Date |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :--------------------------: |
|[Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation](https://arxiv.org/abs/2503.24379)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.24379)|[![Star](https://img.shields.io/github/stars/ChocoWu/Any2Caption.svg?style=social&label=Star)](https://github.com/ChocoWu/Any2Caption)|[![Website](https://img.shields.io/badge/Website-9cf)](https://sqwu.top/Any2Cap/)|Mar., 2025|
| [Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach](https://arxiv.org/abs/2502.03639) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2502.03639) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://snap-research.github.io/PointVidGen/) | Feb., 2025 |
| [SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation](https://arxiv.org/abs/2411.04989) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.04989) | [![Star](https://img.shields.io/github/stars/Kmcode1/SG-I2V.svg?style=social&label=Star)](https://github.com/Kmcode1/SG-I2V) | [![Website](https://img.shields.io/badge/Website-9cf)](https://kmcode1.github.io/Projects/SG-I2V/) | ICLR, 2025 |
| [TrackGo: A Flexible and Efficient Method for Controllable Video Generation](https://arxiv.org/abs/2408.11475) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.11475) | - | - | AAAI, 2025 |
| [3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation](https://arxiv.org/abs/2412.07759) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.07759) | [![Star](https://img.shields.io/github/stars/KwaiVGI/3DTrajMaster.svg?style=social&label=Star)](https://github.com/KwaiVGI/3DTrajMaster) | [![Website](https://img.shields.io/badge/Website-9cf)](https://fuxiao0719.github.io/projects/3dtrajmaster/) | ICLR, 2025 |
|[Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis](https://arxiv.org/abs/2412.02168)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.02168)|[![Star](https://img.shields.io/github/stars/pandayuanyu/generative-photography.svg?style=social&label=Star)](https://github.com/pandayuanyu/generative-photography)|[![Website](https://img.shields.io/badge/Website-9cf)](https://generative-photography.github.io/project/)| CVPR, 2025|
| [Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset](https://arxiv.org/abs/2503.14485) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.14485) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://www.eyelinestudios.com/research/luxpostfacto.html) | CVPR, 2025 |
| [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://arxiv.org/abs/2411.17440) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.17440) | [![Star](https://img.shields.io/github/stars/PKU-YuanGroup/ConsisID.svg?style=social&label=Star)](https://github.com/PKU-YuanGroup/ConsisID) | [![Website](https://img.shields.io/badge/Website-9cf)](https://pku-yuangroup.github.io/ConsisID/) | CVPR, 2025 |
| [Motion Modes: What Could Happen Next?](https://arxiv.org/abs/2412.00148) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.00148) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://motionmodes.github.io/) | CVPR, 2025 |
| [Spectral Motion Alignment for Video Motion Transfer using Diffusion Models](https://arxiv.org/abs/2403.15249) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.15249) | [![Star](https://img.shields.io/github/stars/geonyeong-park/Spectral-Motion-Alignment.svg?style=social&label=Star)](https://github.com/geonyeong-park/Spectral-Motion-Alignment) | [![Website](https://img.shields.io/badge/Website-9cf)](https://geonyeong-park.github.io/spectral-motion-alignment/) | AAAI, 2025 |
| [Video Creation by Demonstration](https://arxiv.org/abs/2412.09551) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.09551) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://delta-diffusion.github.io/) | Dec., 2024 |
| [InterDyn: Controllable Interactive Dynamics with Video Diffusion Models](https://arxiv.org/abs/2412.11785) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.11785) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://interdyn.is.tue.mpg.de/) | Dec., 2024 |
| [Motion Prompting: Controlling Video Generation with Motion Trajectories](https://arxiv.org/abs/2412.02700) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.02700) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://motion-prompting.github.io/) | Dec., 2024 |
| [LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis](https://arxiv.org/abs/2412.15214) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.15214) | [![Star](https://img.shields.io/github/stars/qiuyu96/LeviTor.svg?style=social&label=Star)](https://github.com/qiuyu96/LeviTor) | [![Website](https://img.shields.io/badge/Website-9cf)](https://ppetrichor.github.io/levitor.github.io/) | Dec., 2024 |
| [GenLit: Reformulating Single-Image Relighting as Video Generation](https://arxiv.org/abs/2412.11224) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.11224) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://genlit-probingi2v.github.io/) | Dec., 2024 |
| [Motion Dreamer: Realizing Physically Coherent Video Generation through Scene-Aware Motion Reasoning](https://arxiv.org/abs/2412.00547) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.00547) | [![Star](https://img.shields.io/github/stars/EnVision-Research/MotionDreamer.svg?style=social&label=Star)](https://github.com/EnVision-Research/MotionDreamer) | [![Website](https://img.shields.io/badge/Website-9cf)](https://envision-research.github.io/MotionDreamer/) | Nov., 2024 |
| [AnimateAnything: Consistent and Controllable Animation for Video Generation](https://arxiv.org/abs/2411.10836) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.10836) | [![Star](https://img.shields.io/github/stars/yu-shaonian/AnimateAnything.svg?style=social&label=Star)](https://github.com/yu-shaonian/AnimateAnything) | [![Website](https://img.shields.io/badge/Website-9cf)](https://yu-shaonian.github.io/Animate_Anything/) | Nov., 2024 |
| [InTraGen: Trajectory-controlled Video Generation for Object Interactions](https://arxiv.org/abs/2411.16804) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.16804) | [![Star](https://img.shields.io/github/stars/insait-institute/InTraGen.svg?style=social&label=Star)](https://github.com/insait-institute/InTraGen) | - | Nov., 2024 |
| [DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control](https://arxiv.org/abs/2410.13830) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.13830) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://dreamvideo2.github.io/) | Oct., 2024 |
| [LumiSculpt: A Consistency Lighting Control Network for Video Generation](https://arxiv.org/abs/2410.22979) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.22979) | - | - | Oct., 2024 |
| [Tora: Trajectory-oriented Diffusion Transformer for Video Generation](https://arxiv.org/abs/2407.21705) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.21705) | [![Star](https://img.shields.io/github/stars/alibaba/Tora.svg?style=social&label=Star)](https://github.com/alibaba/Tora) | [![Website](https://img.shields.io/badge/Website-9cf)](https://ali-videoai.github.io/tora_video/) | Jul., 2024 |
| [UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation](https://arxiv.org/abs/2406.01188) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.01188) | [![Star](https://img.shields.io/github/stars/ali-vilab/UniAnimate.svg?style=social&label=Star)](https://github.com/ali-vilab/UniAnimate) | [![Website](https://img.shields.io/badge/Website-9cf)](https://unianimate.github.io/) | Jun., 2024 |
| [Image Conductor: Precision Control for Interactive Video Synthesis](https://arxiv.org/abs/2406.15339) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.15339) | [![Star](https://img.shields.io/github/stars/liyaowei-stu/ImageConductor.svg?style=social&label=Star)](https://github.com/liyaowei-stu/ImageConductor) | [![Website](https://img.shields.io/badge/Website-9cf)](https://liyaowei-stu.github.io/project/ImageConductor/) | Jun., 2024 |
| [Motion Inversion for Video Customization](https://arxiv.org/abs/2403.20193) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.20193) | [![Star](https://img.shields.io/github/stars/EnVision-Research/MotionInversion.svg?style=social&label=Star)](https://github.com/EnVision-Research/MotionInversion) | [![Website](https://img.shields.io/badge/Website-9cf)](https://wileewang.github.io/MotionInversion/) | Mar., 2024 |
| [VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/html/Jeong_VMC_Video_Motion_Customization_using_Temporal_Attention_Adaption_for_Text-to-Video_CVPR_2024_paper.html) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.00845) | [![Star](https://img.shields.io/github/stars/HyeonHo99/Video-Motion-Customization.svg?style=social&label=Star)](https://github.com/HyeonHo99/Video-Motion-Customization) | [![Website](https://img.shields.io/badge/Website-9cf)](https://video-motion-customization.github.io/) | CVPR, 2024 |
| [MotionDirector: Motion Customization of Text-to-Video Diffusion Models](https://link.springer.com/chapter/10.1007/978-3-031-72992-8_16) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.08465) | [![Star](https://img.shields.io/github/stars/showlab/MotionDirector.svg?style=social&label=Star)](https://github.com/showlab/MotionDirector) | [![Website](https://img.shields.io/badge/Website-9cf)](https://showlab.github.io/MotionDirector/) | ECCV, 2024, Oral |
| [FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis](https://openaccess.thecvf.com/content/CVPR2024/html/Liang_FlowVid_Taming_Imperfect_Optical_Flows_for_Consistent_Video-to-Video_Synthesis_CVPR_2024_paper.html) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.17681) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://jeff-liangf.github.io/projects/flowvid/) | CVPR, 2024, Highlight |
| [Generative Image Dynamics](https://openaccess.thecvf.com/content/CVPR2024/html/Li_Generative_Image_Dynamics_CVPR_2024_paper.html) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2309.07906) | [![Star](https://img.shields.io/github/stars/fltwr/generative-image-dynamics.svg?style=social&label=Star)](https://github.com/fltwr/generative-image-dynamics) | [![Website](https://img.shields.io/badge/Website-9cf)](https://generative-dynamics.github.io/) | CVPR, 2024, Best Paper Award |
| [MotionCtrl: A Unified and Flexible Motion Controller for Video Generation](https://dl.acm.org/doi/10.1145/3641519.3657518) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.03641) | [![Star](https://img.shields.io/github/stars/TencentARC/MotionCtrl.svg?style=social&label=Star)](https://github.com/TencentARC/MotionCtrl) | [![Website](https://img.shields.io/badge/Website-9cf)](https://wzhouxiff.github.io/projects/MotionCtrl/) | SIGGRAPH, 2024 |
| [MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model](https://link.springer.com/chapter/10.1007/978-3-031-72655-2_7) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.20222) | [![Star](https://img.shields.io/github/stars/MyNiuuu/MOFA-Video.svg?style=social&label=Star)](https://github.com/MyNiuuu/MOFA-Video) | [![Website](https://img.shields.io/badge/Website-9cf)](https://myniuuu.github.io/MOFA_Video/) | ECCV, 2024 |
| [Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling](https://dl.acm.org/doi/10.1145/3641519.3657497) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.15977) | [![Star](https://img.shields.io/github/stars/G-U-N/Motion-I2V.svg?style=social&label=Star)](https://github.com/G-U-N/Motion-I2V) | [![Website](https://img.shields.io/badge/Website-9cf)](https://xiaoyushi97.github.io/Motion-I2V/) | SIGGRAPH, 2024 |
| [DragAnything: Motion Control for Anything using Entity Representation](https://link.springer.com/chapter/10.1007/978-3-031-72670-5_19) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.07420) | [![Star](https://img.shields.io/github/stars/showlab/DragAnything.svg?style=social&label=Star)](https://github.com/showlab/DragAnything) | [![Website](https://img.shields.io/badge/Website-9cf)](https://weijiawu.github.io/draganything_page/) | ECCV, 2024 |
| [FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing](https://openreview.net/forum?id=JgqftqZQZ7) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.05922) | [![Star](https://img.shields.io/github/stars/yrcong/flatten.svg?style=social&label=Star)](https://github.com/yrcong/flatten) | [![Website](https://img.shields.io/badge/Website-9cf)](https://flatten-video-editing.github.io/) | ICLR, 2024 |
| [Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation](https://arxiv.org/abs/2311.17117) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.17117) | [![Star](https://img.shields.io/github/stars/HumanAIGC/AnimateAnyone.svg?style=social&label=Star)](https://github.com/HumanAIGC/AnimateAnyone) | [![Website](https://img.shields.io/badge/Website-9cf)](https://humanaigc.github.io/animate-anyone/) | CVPR, 2024 |
| [Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion](https://dl.acm.org/doi/10.1145/3641519.3657481) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.03162) | [![Star](https://img.shields.io/github/stars/ysy31415/direct_a_video.svg?style=social&label=Star)](https://github.com/ysy31415/direct_a_video) | [![Website](https://img.shields.io/badge/Website-9cf)](https://direct-a-video.github.io/) | SIGGRAPH, 2024 |
| [Compositional 3D-aware Video Generation with LLM Director](https://openreview.net/forum?id=oqdy2EFrja) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.00558) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://www.microsoft.com/en-us/research/project/compositional-3d-aware-video-generation/) | NeurIPS, 2024 |
| [TC4D: Trajectory-Conditioned Text-to-4D Generation](https://link.springer.com/chapter/10.1007/978-3-031-72952-2_4) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.17920) | [![Star](https://img.shields.io/github/stars/sherwinbahmani/tc4d.svg?style=social&label=Star)](https://github.com/sherwinbahmani/tc4d) | [![Website](https://img.shields.io/badge/Website-9cf)](https://sherwinbahmani.github.io/tc4d/) | ECCV, 2024 |
| [DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory](https://arxiv.org/abs/2308.08089) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2308.08089) | - |[![Website](https://img.shields.io/badge/Website-9cf)](https://www.microsoft.com/en-us/research/project/dragnuwa/)| Aug., 2023 |
| [Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning](https://arxiv.org/abs/2305.13840) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.13840) | [![Star](https://img.shields.io/github/stars/Weifeng-Chen/control-a-video.svg?style=social&label=Star)](https://github.com/Weifeng-Chen/control-a-video) | [![Website](https://img.shields.io/badge/Website-9cf)](https://controlavideo.github.io/) | May., 2023 |
| [VideoComposer: Compositional Video Synthesis with Motion Controllability](https://arxiv.org/abs/2306.02018) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2306.02018) | [![Star](https://img.shields.io/github/stars/ali-vilab/videocomposer.svg?style=social&label=Star)](https://github.com/ali-vilab/videocomposer) | [![Website](https://img.shields.io/badge/Website-9cf)](https://videocomposer.github.io/) | NeurIPS, 2023 |
| [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2302.05543) | [![Star](https://img.shields.io/github/stars/lllyasviel/ControlNet.svg?style=social&label=Star)](https://github.com/lllyasviel/ControlNet) | - | ICCV, 2023, Best Paper Award |

### Passive Cognition of Physical Knowledge for Generation
| Title | arXiv | Github | WebSite | Pub. & Date |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: | :--------------------------: |
|[Towards Physically Plausible Video Generation via VLM Planning](https://arxiv.org/abs/2503.23368)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.23368)|-|-|Mar., 2025|
|[Articulated Kinematics Distillation from Video Diffusion Models](https://arxiv.org/abs/2504.01204)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.01204)|-|[![Website](https://img.shields.io/badge/Website-9cf)](https://research.nvidia.com/labs/dir/akd/)|Apr., 2025; CVPR, 2025|
|[RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting](https://arxiv.org/abs/2503.21442)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.21442)|-|[![Website](https://img.shields.io/badge/Website-9cf)](https://pku-vcl-geometry.github.io/RainyGS/)|Mar., 2025; CVPR, 2025|
|[PhysGen3D: Crafting a Miniature Interactive World from a Single Image](https://arxiv.org/abs/2503.20746)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.20746)|[![Star](https://img.shields.io/github/stars/by-luckk/PhysGen3D.svg?style=social&label=Star)](https://github.com/by-luckk/PhysGen3D)|[![Website](https://img.shields.io/badge/Website-9cf)](https://by-luckk.github.io/PhysGen3D/)|Mar., 2025; CVPR, 2025|
|[AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports](https://arxiv.org/abs/2503.20654)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.20654)|-|[![Website](https://img.shields.io/badge/Website-9cf)](https://accidentsim.github.io/)|Mar., 2025|
|[Synthetic Video Enhances Physical Fidelity in Video Synthesis](https://www.arxiv.org/abs/2503.20822)| [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2503.20822)|-|[![Website](https://img.shields.io/badge/Website-9cf)](https://kevinz8866.github.io/simulation/)|Mar., 2025|
|[PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos](https://arxiv.org/abs/2503.17973)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.17973)|[![Star](https://img.shields.io/github/stars/Jianghanxiao/PhysTwin.svg?style=social&label=Star)](https://github.com/Jianghanxiao/PhysTwin)| [![Website](https://img.shields.io/badge/Website-9cf)](https://jianghanxiao.github.io/phystwin-web/)|Mar., 2025|
| [PhysAnimator: Physics-Guided Generative Cartoon Animation](https://arxiv.org/abs/2501.16550) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.16550) | - | - | Jan., 2025 |
| [AutoVFX: Physically Realistic Video Editing from Natural Language Instructions](https://arxiv.org/abs/2411.02394) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.02394) | [![Star](https://img.shields.io/github/stars/haoyuhsu/autovfx.svg?style=social&label=Star)](https://github.com/haoyuhsu/autovfx) | [![Website](https://img.shields.io/badge/Website-9cf)](https://haoyuhsu.github.io/autovfx-website/) | 3DV, 2025 |
| [Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering](https://arxiv.org/abs/2401.15318) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.15318) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://gaussiansplashing.github.io/) | CVPR, 2025 |
| [Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation](https://arxiv.org/abs/2411.14423) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.14423) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://zhuomanliu.github.io/PhysFlow/) | CVPR, 2025 |
| [FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video](https://arxiv.org/abs/2503.04720) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.04720) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://yuegao.me/FluidNexus/) | CVPR, 2025 |
| [OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation](https://openreview.net/forum?id=9HZtP6I5lv) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.18982) | [![Star](https://img.shields.io/github/stars/wgsxm/OmniPhysGS.svg?style=social&label=Star)](https://github.com/wgsxm/OmniPhysGS) | [![Website](https://img.shields.io/badge/Website-9cf)](https://wgsxm.github.io/projects/omniphysgs/) | ICLR, 2025 |
| [DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors](https://arxiv.org/abs/2406.01476) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.01476) | [![Star](https://img.shields.io/github/stars/tyhuang0428/DreamPhysics.svg?style=social&label=Star)](https://github.com/tyhuang0428/DreamPhysics) | - | AAAI, 2025 |
| [GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator](https://arxiv.org/abs/2412.17804) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.17804) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://www.mmlab-ntu.com/project/gausim/index.html) | Dec., 2024 |
| [GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs](https://arxiv.org/abs/2412.11258) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.11258) | [![Star](https://img.shields.io/github/stars/xxlbigbrother/Gaussian-Property.svg?style=social&label=Star)](https://github.com/xxlbigbrother/Gaussian-Property) | [![Website](https://img.shields.io/badge/Website-9cf)](https://gaussian-property.github.io/) | Dec., 2024 |
| [Phys4DGen: A Physics-Driven Framework for Controllable and Efficient 4D Content Generation from a Single Image](https://arxiv.org/abs/2411.16800) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.16800) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://jiajinglin.github.io/Phys4DGen/) | Nov., 2024 |
| [LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models](https://arxiv.org/abs/2411.08027) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.08027) | - | - | Nov., 2024 |
| [Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting](https://arxiv.org/abs/2411.12789) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.12789) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://sim-gs.github.io/) | Nov., 2024 |
| [Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints](https://arxiv.org/abs/2411.19381) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.19381) | - | - | Nov., 2024 |
| [PhysMotion: Physics-Grounded Dynamics From a Single Image](https://arxiv.org/abs/2411.17189) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.17189) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://supertan0204.github.io/physmotion_website/) | Nov., 2024 |
| [Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis](https://arxiv.org/abs/2410.07155) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.07155) | [![Star](https://img.shields.io/github/stars/YangLing0818/Trans4D.svg?style=social&label=Star)](https://github.com/YangLing0818/Trans4D) | - | Oct., 2024 |
| [Phy124: Fast Physics-Driven 4D Content Generation from a Single Image](https://arxiv.org/abs/2409.07179) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.07179) | - | - | Sep., 2024 |
| [Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation](https://arxiv.org/abs/2408.10453) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.10453) | - | - | Aug., 2024 |
| [Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion](https://arxiv.org/abs/2406.04338) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.04338) | [![Star](https://img.shields.io/github/stars/liuff19/Physics3D.svg?style=social&label=Star)](https://github.com/liuff19/Physics3D) | [![Website](https://img.shields.io/badge/Website-9cf)](https://liuff19.github.io/Physics3D/) | Jun., 2024 |
| [Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation](https://arxiv.org/abs/2405.16849) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.16849) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://sync4dphys.github.io/) | May, 2024 |
| [ElastoGen: 4D Generative Elastodynamics](https://arxiv.org/abs/2405.15056) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.15056) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://anunrulybunny.github.io/elastogen/) | May, 2024 |
| [MotionCraft: Physics-based Zero-Shot Video Generation](https://arxiv.org/abs/2405.13557) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.13557) | [![Star](https://img.shields.io/github/stars/mezzelfo/MotionCraft.svg?style=social&label=Star)](https://github.com/mezzelfo/MotionCraft) | [![Website](https://img.shields.io/badge/Website-9cf)](https://mezzelfo.github.io/MotionCraft/) | NeurIPS, 2024 |
| [PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation](https://arxiv.org/abs/2409.18964) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.18964) | [![Star](https://img.shields.io/github/stars/stevenlsw/physgen.svg?style=social&label=Star)](https://github.com/stevenlsw/physgen) | [![Website](https://img.shields.io/badge/Website-9cf)](https://stevenlsw.github.io/physgen/) | ECCV, 2024 |
| [PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation](https://link.springer.com/chapter/10.1007/978-3-031-72627-9_22) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.13026) | [![Star](https://img.shields.io/github/stars/a1600012888/PhysDreamer.svg?style=social&label=Star)](https://github.com/a1600012888/PhysDreamer) | [![Website](https://img.shields.io/badge/Website-9cf)](https://physdreamer.github.io/) | ECCV, 2024 Oral |
| [Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video](https://arxiv.org/abs/2404.09833) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.09833) | [![Star](https://img.shields.io/github/stars/video2game/video2game.svg?style=social&label=Star)](https://github.com/video2game/video2game) | [![Website](https://img.shields.io/badge/Website-9cf)](https://video2game.github.io/) | CVPR, 2024 |
| [PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF](https://openaccess.thecvf.com/content/CVPR2024/html/Feng_PIE-NeRF_Physics-based_Interactive_Elastodynamics_with_NeRF_CVPR_2024_paper.html) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.13099) | [![Star](https://img.shields.io/github/stars/FYTalon/pienerf.svg?style=social&label=Star)](https://github.com/FYTalon/pienerf) | [![Website](https://img.shields.io/badge/Website-9cf)](https://fytalon.github.io/pienerf/) | CVPR, 2024 |
| [VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality](https://dl.acm.org/doi/10.1145/3641519.3657448) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.16663) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://yingjiang96.github.io/VR-GS/) | SIGGRAPH, 2024 |
| [PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics](https://arxiv.org/abs/2311.12198) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.12198) | [![Star](https://img.shields.io/github/stars/XPandora/PhysGaussian.svg?style=social&label=Star)](https://github.com/XPandora/PhysGaussian) | [![Website](https://img.shields.io/badge/Website-9cf)](https://xpandora.github.io/PhysGaussian/) | CVPR, 2024 |
| [Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing](https://link.springer.com/chapter/10.1007/978-3-031-72940-9_21?fromPaywallRec=true) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.01223) | [![Star](https://img.shields.io/github/stars/vuer-ai/feature-splatting-inria.svg?style=social&label=Star)](https://github.com/vuer-ai/feature-splatting-inria) | [![Website](https://img.shields.io/badge/Website-9cf)](https://feature-splatting.github.io/) | ECCV, 2024 |
| [Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics](https://arxiv.org/abs/2410.08257) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.08257) | [![Star](https://img.shields.io/github/stars/XJay18/NeuMA.svg?style=social&label=Star)](https://github.com/XJay18/NeuMA) | [![Website](https://img.shields.io/badge/Website-9cf)](https://xjay18.github.io/projects/neuma.html) | NeurIPS, 2024 |
| [Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis](https://ieeexplore.ieee.org/abstract/document/10550869) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2308.09713) | [![Star](https://img.shields.io/github/stars/JonathonLuiten/Dynamic3DGaussians.svg?style=social&label=Star)](https://github.com/JonathonLuiten/Dynamic3DGaussians) | [![Website](https://img.shields.io/badge/Website-9cf)](https://dynamic3dgaussians.github.io/) | 3DV, 2024 |
| [LLM-grounded Video Diffusion Models](https://openreview.net/forum?id=exKHibougU) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2309.17444) | [![Star](https://img.shields.io/github/stars/TonyLianLong/LLM-groundedVideoDiffusion.svg?style=social&label=Star)](https://github.com/TonyLianLong/LLM-groundedVideoDiffusion) | [![Website](https://img.shields.io/badge/Website-9cf)](https://llm-grounded-video-diffusion.github.io/) | ICLR, 2024 |
| [Compositional 3D-aware Video Generation with LLM Director](https://openreview.net/forum?id=oqdy2EFrja&referrer=%5Bthe%20profile%20of%20Zhibo%20Chen%5D(%2Fprofile%3Fid%3D~Zhibo_Chen1)) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.00558) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://www.microsoft.com/en-us/research/project/compositional-3d-aware-video-generation/) | NeurIPS, 2024 |
| [GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning](https://openaccess.thecvf.com/content/CVPR2024W/PBDL/html/Lv_GPT4Motion_Scripting_Physical_Motions_in_Text-to-Video_Generation_via_Blender-Oriented_GPT_CVPRW_2024_paper.html) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.12631) | [![Star](https://img.shields.io/github/stars/jiaxilv/GPT4Motion.svg?style=social&label=Star)](https://github.com/jiaxilv/GPT4Motion) | [![Website](https://img.shields.io/badge/Website-9cf)](https://gpt4motion.github.io/) | CVPR, 2024, Workshop |
| [DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation](https://arxiv.org/abs/2312.00583) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.00583) | [![Star](https://img.shields.io/github/stars/momentum-robotics-lab/deformgs.svg?style=social&label=Star)](https://github.com/momentum-robotics-lab/deformgs) | [![Website](https://img.shields.io/badge/Website-9cf)](https://deformgs.github.io/) | WAFR, 2024 |
| [Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics](https://proceedings.mlr.press/v202/ma23a.html) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2304.14369) | [![Star](https://img.shields.io/github/stars/PingchuanMa/NCLaw.svg?style=social&label=Star)](https://github.com/PingchuanMa/NCLaw) | [![Website](https://img.shields.io/badge/Website-9cf)](https://sites.google.com/view/nclaw) | ICML, 2023 |
| [PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification](https://openreview.net/forum?id=tVkrbkz42vc) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2303.05512) | [![Star](https://img.shields.io/github/stars/xuan-li/PAC-NeRF.svg?style=social&label=Star)](https://github.com/xuan-li/PAC-NeRF) | [![Website](https://img.shields.io/badge/Website-9cf)](https://sites.google.com/view/PAC-NeRF) | ICLR, 2023, Spotlight |

### Active Cognition for World Simulation
| Title | arXiv | Github | WebSite | Pub. & Date |
| :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---------------------------------: |
|[Aether: Geometric-Aware Unified World Modeling](https://arxiv.org/abs/2503.18945)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.18945)| [![Star](https://img.shields.io/github/stars/OpenRobotLab/Aether.svg?style=social&label=Star)](https://github.com/OpenRobotLab/Aether)|[![Website](https://img.shields.io/badge/Website-9cf)](https://aether-world.github.io/)|Mar., 2025|
|[AdaWorld: Learning Adaptable World Models with Latent Actions](https://arxiv.org/abs/2503.18938)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.18938)|[![Star](https://img.shields.io/github/stars/Little-Podi/AdaWorld.svg?style=social&label=Star)](https://github.com/Little-Podi/AdaWorld)| [![Website](https://img.shields.io/badge/Website-9cf)](https://adaptable-world-model.github.io/)|Mar., 2025|
| [PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop](https://arxiv.org/abs/2503.09595) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.09595) | [![Star](https://img.shields.io/github/stars/vision-x-nyu/pisa-experiments.svg?style=social&label=Star)](https://github.com/vision-x-nyu/pisa-experiments) | [![Website](https://img.shields.io/badge/Website-9cf)](https://vision-x-nyu.github.io/pisa-experiments.github.io/) | Mar., 2025 |
| [WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation](https://arxiv.org/abs/2503.08153) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.08153) | [![Star](https://img.shields.io/github/stars/360CVGroup/WISA.svg?style=social&label=Star)](https://github.com/360CVGroup/WISA) | [![Website](https://img.shields.io/badge/Website-9cf)](https://360cvgroup.github.io/WISA/) | Mar., 2025 |
| [IPO: Iterative Preference Optimization for Text-to-Video Generation](https://arxiv.org/abs/2502.02088) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2502.02088) | [![Star](https://img.shields.io/github/stars/SAIS-FUXI/IPO.svg?style=social&label=Star)](https://github.com/SAIS-FUXI/IPO) | - | Feb., 2025 |
| [Do Generative Video Models Learn Physical Principles from Watching Videos?](https://arxiv.org/abs/2501.09038) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.09038) | [![Star](https://img.shields.io/github/stars/google-deepmind/physics-IQ-benchmark.svg?style=social&label=Star)](https://github.com/google-deepmind/physics-IQ-benchmark) | [![Website](https://img.shields.io/badge/Website-9cf)](https://physics-iq.github.io/) | Jan., 2025 |
| [Cosmos World Foundation Model Platform for Physical AI](https://arxiv.org/abs/2501.03575) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.03575) | [![Star](https://img.shields.io/github/stars/nvidia-cosmos/cosmos-predict1.svg?style=social&label=Star)](https://github.com/nvidia-cosmos/cosmos-predict1) | [![Website](https://img.shields.io/badge/Website-9cf)](https://www.nvidia.com/en-us/ai/cosmos/) | Jan., 2025 |
| [Improving Video Generation with Human Feedback](https://arxiv.org/abs/2501.13918) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.13918) | [![Star](https://img.shields.io/github/stars/KwaiVGI/VideoAlign.svg?style=social&label=Star)](https://github.com/KwaiVGI/VideoAlign) | [![Website](https://img.shields.io/badge/Website-9cf)](https://gongyeliu.github.io/videoalign/) | Jan., 2025 |
| [PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation](https://arxiv.org/abs/2412.00596) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.00596) | [![Star](https://img.shields.io/github/stars/pittisl/PhyT2V.svg?style=social&label=Star)](https://github.com/pittisl/PhyT2V) | - | CVPR, 2025 |
| [Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination](https://arxiv.org/abs/2412.14957) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.14957) | [![Star](https://img.shields.io/github/stars/leobarcellona/drema_code.svg?style=social&label=Star)](https://github.com/leobarcellona/drema_code) | [![Website](https://img.shields.io/badge/Website-9cf)](https://dreamtomanipulate.github.io/) | ICLR, 2025 |
| [MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators](https://arxiv.org/abs/2404.05014) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.05014) | [![Star](https://img.shields.io/github/stars/PKU-YuanGroup/MagicTime.svg?style=social&label=Star)](https://github.com/PKU-YuanGroup/MagicTime) | [![Website](https://img.shields.io/badge/Website-9cf)](https://pku-yuangroup.github.io/MagicTime/) | TPAMI, 2025 |
| [ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation](https://arxiv.org/abs/2403.08321) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.08321) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://guanxinglu.github.io/ManiGaussian/) | ECCV, 2024 |
| [Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback](https://arxiv.org/abs/2412.02617) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.02617) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://sites.google.com/view/aif-dynamic-t2v/) | Dec., 2024 |
| [Physical Informed Driving World Model](https://arxiv.org/abs/2412.08410) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.08410) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html) | Dec., 2024 |
| [How Far is Video Generation from World Model: A Physical Law Perspective](https://arxiv.org/abs/2411.02385) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.02385) | [![Star](https://img.shields.io/github/stars/phyworld/phyworld.svg?style=social&label=Star)](https://github.com/phyworld/phyworld) | [![Website](https://img.shields.io/badge/Website-9cf)](https://phyworld.github.io/) | Nov., 2024 |
| [Video Generation Models as World Simulators](https://openai.com/index/video-generation-models-as-world-simulators/) | - | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://openai.com/index/video-generation-models-as-world-simulators/) | Feb., 2024 |
| [VideoAgent: Self-Improving Video Generation](https://arxiv.org/abs/2410.10076) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.10076) | [![Star](https://img.shields.io/github/stars/Video-as-Agent/VideoAgent.svg?style=social&label=Star)](https://github.com/Video-as-Agent/VideoAgent) | [![Website](https://img.shields.io/badge/Website-9cf)](https://video-as-agent.github.io/) | Oct., 2024 |
| [Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-Tuning](https://arxiv.org/abs/2410.05582) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.05582) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://mczhi.github.io/GenDrive/) | Oct., 2024 |
| [DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation](https://arxiv.org/abs/2410.13571) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.13571) | [![Star](https://img.shields.io/github/stars/GigaAI-research/DriveDreamer4D.svg?style=social&label=Star)](https://github.com/GigaAI-research/DriveDreamer4D) | [![Website](https://img.shields.io/badge/Website-9cf)](https://drivedreamer4d.github.io/) | Oct., 2024 |
| [Open-Sora: Democratizing Efficient Video Production for All](https://arxiv.org/abs/2412.20404) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.20404) | [![Star](https://img.shields.io/github/stars/hpcaitech/Open-Sora.svg?style=social&label=Star)](https://github.com/hpcaitech/Open-Sora) | [![Website](https://img.shields.io/badge/Website-9cf)](https://hpcaitech.github.io/Open-Sora/) | Dec., 2024 |
| [Imagen 3](https://arxiv.org/abs/2408.07009) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.07009) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://deepmind.google/technologies/imagen-3/) | Aug., 2024 |
| [Genie 2: A Large-Scale Foundation World Model](https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/) | - | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/) | Dec., 2024 |
| [Genie: Generative Interactive Environments](https://arxiv.org/abs/2402.15391) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.15391) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://sites.google.com/view/genie-2024/?pli=1) | ICML, 2024, Best Paper |
| [WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens](https://arxiv.org/abs/2401.09985) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.09985) | [![Star](https://img.shields.io/github/stars/JeffWang987/WorldDreamer.svg?style=social&label=Star)](https://github.com/JeffWang987/WorldDreamer) | [![Website](https://img.shields.io/badge/Website-9cf)](https://world-dreamer.github.io/) | Jan., 2024 |
| [Learning Interactive Real-World Simulators](https://arxiv.org/abs/2310.06114) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.06114) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://universal-simulator.github.io/unisim/) | ICLR, 2024, Outstanding Paper Award |
| [Physically Embodied Gaussian Splatting: A Visually Learnt and Physically Grounded 3D Representation for Robotics](https://openreview.net/forum?id=AEq0onGrN2&noteId=AEq0onGrN2) | - | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://embodied-gaussians.github.io/) | CoRL, 2024 |
| [GAIA-1: A Generative World Model for Autonomous Driving](https://arxiv.org/abs/2309.17080) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2309.17080) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://wayve.ai/thinking/introducing-gaia1/) | Sep., 2023 |

### Benchmarks and Metrics

| Title | arXiv | Github | Website | Pub. & Date |
| :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---------: |
|[Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments](https://arxiv.org/abs/2504.02918)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.02918)|-|-|Apr., 2025|
|[WorldScore: A Unified Evaluation Benchmark for World Generation](https://arxiv.org/abs/2504.00983)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2504.00983)| [![Star](https://img.shields.io/github/stars/haoyi-duan/WorldScore.svg?style=social&label=Star)](https://github.com/haoyi-duan/WorldScore)| [![Website](https://img.shields.io/badge/Website-9cf)](https://haoyi-duan.github.io/WorldScore/)|Apr., 2025|
|[HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation](https://arxiv.org/abs/2503.23715)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.23715)|-| [![Website](https://img.shields.io/badge/Website-9cf)](https://liuqi-creat.github.io/HOIGen.github.io/)|Mar., 2025; CVPR, 2025|
|[Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI](https://arxiv.org/abs/2503.21668)| [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.21668)|-|-|Mar., 2025|
| [PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop](https://arxiv.org/abs/2503.09595) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.09595) | [![Star](https://img.shields.io/github/stars/vision-x-nyu/pisa-experiments.svg?style=social&label=Star)](https://github.com/vision-x-nyu/pisa-experiments) | - | Mar., 2025 |
|[Impossible Videos](https://arxiv.org/abs/2503.14378)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.14378)|[![Star](https://img.shields.io/github/stars/showlab/Impossible-Videos.svg?style=social&label=Star)](https://github.com/showlab/Impossible-Videos)|[![Website](https://img.shields.io/badge/Website-9cf)](https://showlab.github.io/Impossible-Videos/)|Mar., 2025 |
| [WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation](https://arxiv.org/abs/2503.08153) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.08153) | [![Star](https://img.shields.io/github/stars/360CVGroup/WISA.svg?style=social&label=Star)](https://github.com/360CVGroup/WISA) | [![Website](https://img.shields.io/badge/Website-9cf)](https://360cvgroup.github.io/WISA/) | Mar., 2025 |
|[VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation](https://arxiv.org/abs/2503.06800)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.06800)|[![Star](https://img.shields.io/github/stars/Hritikbansal/videophy.svg?style=social&label=Star)](https://github.com/Hritikbansal/videophy)|[![Website](https://img.shields.io/badge/Website-9cf)](https://videophy2.github.io/)|Mar., 2025 |
| [A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-Guided Frame Prediction](https://www.arxiv.org/abs/2502.05503) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://www.arxiv.org/abs/2502.05503) | [![Star](https://img.shields.io/github/stars/Jeckinchen/PhyCoBench.svg?style=social&label=Star)](https://github.com/Jeckinchen/PhyCoBench) | - | Feb., 2025 |
| [Do Generative Video Models Learn Physical Principles from Watching Videos?](https://arxiv.org/abs/2501.09038) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2501.09038) | [![Star](https://img.shields.io/github/stars/google-deepmind/physics-IQ-benchmark.svg?style=social&label=Star)](https://github.com/google-deepmind/physics-IQ-benchmark) | [![Website](https://img.shields.io/badge/Website-9cf)](https://physics-iq.github.io/) | Jan., 2025 |
|[ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation](https://arxiv.org/abs/2406.18522)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.18522)|[![Star](https://img.shields.io/github/stars/PKU-YuanGroup/ChronoMagic-Bench.svg?style=social&label=Star)](https://github.com/PKU-YuanGroup/ChronoMagic-Bench)|[![Website](https://img.shields.io/badge/Website-9cf)](https://pku-yuangroup.github.io/ChronoMagic-Bench/)|NeurIPS, 2024, Spotlight |
| [LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models](https://arxiv.org/abs/2411.08027) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.08027) | - | - | Nov., 2024 |
|[What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality](https://arxiv.org/abs/2411.13609)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2411.13609)|-|-| Nov., 2024 |
| [Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation](https://arxiv.org/abs/2410.05363) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.05363) | [![Star](https://img.shields.io/github/stars/OpenGVLab/PhyGenBench.svg?style=social&label=Star)](https://github.com/OpenGVLab/PhyGenBench) | [![Website](https://img.shields.io/badge/Website-9cf)](https://phygenbench123.github.io/) | Oct., 2024 |
|[WorldSimBench: Towards Video Generation Models as World Simulators](https://arxiv.org/abs/2410.18072)|[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.18072)|-|[![Website](https://img.shields.io/badge/Website-9cf)](https://iranqin.github.io/WorldSimBench.github.io/)|Oct., 2024 |
| [PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models](https://arxiv.org/abs/2406.11802) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.11802) | - | - | Jun., 2024 |
| [VideoPhy: Evaluating Physical Commonsense for Video Generation](https://arxiv.org/abs/2406.03520) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.03520) | [![Star](https://img.shields.io/github/stars/Hritikbansal/videophy.svg?style=social&label=Star)](https://github.com/Hritikbansal/videophy) | [![Website](https://img.shields.io/badge/Website-9cf)](https://videophy.github.io/) | Jun., 2024 |
| [VideoCon: Robust Video-Language Alignment via Contrast Captions](https://arxiv.org/abs/2311.10111) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.10111) | [![Star](https://img.shields.io/github/stars/Hritikbansal/videocon.svg?style=social&label=Star)](https://github.com/Hritikbansal/videocon) | [![Website](https://img.shields.io/badge/Website-9cf)](https://video-con.github.io/) | CVPR, 2024 |
| [Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties](https://arxiv.org/abs/2306.15668) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2306.15668) | - | [![Website](https://img.shields.io/badge/Website-9cf)](https://dingmyu.github.io/physion_v2/) | NeurIPS, 2023 |
| [CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions](https://arxiv.org/abs/2012.04293) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2012.04293) | [![Star](https://img.shields.io/github/stars/hucvl/craft.svg?style=social&label=Star)](https://github.com/hucvl/craft) | [![Website](https://img.shields.io/badge/Website-9cf)](https://sites.google.com/view/craft-benchmark) | ACL, 2022 |
| [Physion: Evaluating Physical Prediction from Vision in Humans and Machines](https://arxiv.org/abs/2106.08261) | [![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2106.08261) | [![Star](https://img.shields.io/github/stars/cogtoolslab/physics-benchmarking-neurips2021.svg?style=social&label=Star)](https://github.com/cogtoolslab/physics-benchmarking-neurips2021) | [![Website](https://img.shields.io/badge/Website-9cf)](https://physion-benchmark.github.io/) | NeurIPS, 2021 |

## โ™ฅ๏ธ Star History

## 📑 Citation

Please consider citing 📑 our paper if this repository is helpful to your work. Thanks sincerely!
```bibtex
@misc{lin2025exploringevolutionphysicscognition,
title={Exploring the Evolution of Physics Cognition in Video Generation: A Survey},
author={Minghui Lin and Xiang Wang and Yishan Wang and Shu Wang and Fengqi Dai and Pengxiang Ding and Cunxiang Wang and Zhengrong Zuo and Nong Sang and Siteng Huang and Donglin Wang},
year={2025},
eprint={2503.21765},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.21765},
}
```