# Awesome-Physics-Cognition-based-Video-Generation

A comprehensive collection of papers on physical cognition in video generation, with links to code and project websites.

https://github.com/minnie-lin/Awesome-Physics-Cognition-based-Video-Generation
## Table of Contents

- [Surveys](#surveys)
- [Basic Schematic Perception for Generation](#basic-schematic-perception-for-generation)
- [Passive Cognition of Physical Knowledge for Generation](#passive-cognition-of-physical-knowledge-for-generation)
- [Benchmarks and Metrics](#benchmarks-and-metrics)
- [Active Cognition for World Simulation](#active-cognition-for-world-simulation)
## Surveys

| Paper | Code | Website | Venue / Date |
|---|---|---|---|
| [A Survey of Interactive Generative Video](https://arxiv.org/abs/2504.21853) | - | - | Apr., 2025 |
| [Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC](https://arxiv.org/abs/2502.07007) | - | - | Feb., 2025 |
| [Generative Physical AI in Vision: A Survey](https://arxiv.org/abs/2501.10928) | [Code](https://github.com/BestJunYu/Awesome-Physics-aware-Generation) | - | Jan., 2025 |
| [Digital Gene: Learning about the Physical World through Analytic Concepts](https://arxiv.org/abs/2504.04170) | - | - | Apr., 2025 |
| [Simulating the Real World: A Unified Survey of Multimodal Generative Models](https://arxiv.org/abs/2503.04641) | [Code](https://github.com/ALEEEHU/World-Simulator) | - | Mar., 2025 |
| [Physics-Informed Computer Vision: A Review and Perspectives](https://arxiv.org/abs/2305.18035) | - | - | ACM Computing Surveys, 2024 |
## Basic Schematic Perception for Generation

| Paper | Code | Website | Venue / Date |
|---|---|---|---|
| [ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction](https://arxiv.org/abs/2504.21855) | - | [Project](https://revision-video.github.io/) | Apr., 2025 |
| [Motion Prompting: Controlling Video Generation with Motion Trajectories](https://arxiv.org/abs/2412.02700) | - | [Project](https://motion-prompting.github.io/) | CVPR, 2025 (Oral) |
| [MotionDirector: Motion Customization of Text-to-Video Diffusion Models](https://arxiv.org/abs/2310.08465) | [Code](https://github.com/showlab/MotionDirector) | [Project](https://showlab.github.io/MotionDirector/) | ECCV, 2024 (Oral) |
| [MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model](https://arxiv.org/abs/2405.20222) | [Code](https://github.com/MyNiuuu/MOFA-Video) | [Project](https://myniuuu.github.io/MOFA_Video/) | ECCV, 2024 |
| [DragAnything: Motion Control for Anything using Entity Representation](https://arxiv.org/abs/2403.07420) | [Code](https://github.com/showlab/DragAnything) | [Project](https://weijiawu.github.io/draganything_page/) | ECCV, 2024 |
| [TC4D: Trajectory-Conditioned Text-to-4D Generation](https://arxiv.org/abs/2403.17920) | [Code](https://github.com/sherwinbahmani/tc4d) | [Project](https://sherwinbahmani.github.io/tc4d/) | ECCV, 2024 |
| [Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation](https://arxiv.org/abs/2503.24379) | [Code](https://github.com/ChocoWu/Any2Caption) | [Project](https://sqwu.top/Any2Cap/) | Mar., 2025 |
| [Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach](https://arxiv.org/abs/2502.03639) | - | [Project](https://snap-research.github.io/PointVidGen/) | Feb., 2025 |
| [SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation](https://arxiv.org/abs/2411.04989) | [Code](https://github.com/Kmcode1/SG-I2V) | [Project](https://kmcode1.github.io/Projects/SG-I2V/) | ICLR, 2025 |
| [TrackGo: A Flexible and Efficient Method for Controllable Video Generation](https://arxiv.org/abs/2408.11475) | - | - | AAAI, 2025 |
| [3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation](https://arxiv.org/abs/2412.07759) | [Code](https://github.com/KwaiVGI/3DTrajMaster) | [Project](https://fuxiao0719.github.io/projects/3dtrajmaster/) | ICLR, 2025 |
| [Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis](https://arxiv.org/abs/2412.02168) | [Code](https://github.com/pandayuanyu/generative-photography) | [Project](https://generative-photography.github.io/project/) | CVPR, 2025 |
| [Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset](https://arxiv.org/abs/2503.14485) | - | [Project](https://www.eyelinestudios.com/research/luxpostfacto.html) | CVPR, 2025 |
| [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://arxiv.org/abs/2411.17440) | [Code](https://github.com/PKU-YuanGroup/ConsisID) | [Project](https://pku-yuangroup.github.io/ConsisID/) | CVPR, 2025 |
| [Motion Modes: What Could Happen Next?](https://arxiv.org/abs/2412.00148) | - | [Project](https://motionmodes.github.io/) | CVPR, 2025 |
| [Spectral Motion Alignment for Video Motion Transfer using Diffusion Models](https://arxiv.org/abs/2403.15249) | [Code](https://github.com/geonyeong-park/Spectral-Motion-Alignment) | [Project](https://geonyeong-park.github.io/spectral-motion-alignment/) | AAAI, 2025 |
| [Video Creation by Demonstration](https://arxiv.org/abs/2412.09551) | - | [Project](https://delta-diffusion.github.io/) | Dec., 2024 |
| [InterDyn: Controllable Interactive Dynamics with Video Diffusion Models](https://arxiv.org/abs/2412.11785) | - | [Project](https://interdyn.is.tue.mpg.de/) | Dec., 2024 |
| [LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis](https://arxiv.org/abs/2412.15214) | [Code](https://github.com/qiuyu96/LeviTor) | [Project](https://ppetrichor.github.io/levitor.github.io/) | Dec., 2024 |
| [GenLit: Reformulating Single-Image Relighting as Video Generation](https://arxiv.org/abs/2412.11224) | - | [Project](https://genlit-probingi2v.github.io/) | Dec., 2024 |
| [Motion Dreamer: Realizing Physically Coherent Video Generation through Scene-Aware Motion Reasoning](https://arxiv.org/abs/2412.00547) | [Code](https://github.com/EnVision-Research/MotionDreamer) | [Project](https://envision-research.github.io/MotionDreamer/) | Nov., 2024 |
| [AnimateAnything: Consistent and Controllable Animation for Video Generation](https://arxiv.org/abs/2411.10836) | [Code](https://github.com/yu-shaonian/AnimateAnything) | [Project](https://yu-shaonian.github.io/Animate_Anything/) | Nov., 2024 |
| [InTraGen: Trajectory-controlled Video Generation for Object Interactions](https://arxiv.org/abs/2411.16804) | [Code](https://github.com/insait-institute/InTraGen) | - | Nov., 2024 |
| [DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control](https://arxiv.org/abs/2410.13830) | - | [Project](https://dreamvideo2.github.io/) | Oct., 2024 |
| [LumiSculpt: A Consistency Lighting Control Network for Video Generation](https://arxiv.org/abs/2410.22979) | - | - | Oct., 2024 |
| [Tora: Trajectory-oriented Diffusion Transformer for Video Generation](https://arxiv.org/abs/2407.21705) | [Code](https://github.com/alibaba/Tora) | [Project](https://ali-videoai.github.io/tora_video/) | Jul., 2024 |
| [UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation](https://arxiv.org/abs/2406.01188) | [Code](https://github.com/ali-vilab/UniAnimate) | [Project](https://unianimate.github.io/) | Jun., 2024 |
| [Image Conductor: Precision Control for Interactive Video Synthesis](https://arxiv.org/abs/2406.15339) | [Code](https://github.com/liyaowei-stu/ImageConductor) | [Project](https://liyaowei-stu.github.io/project/ImageConductor/) | Jun., 2024 |
| [Motion Inversion for Video Customization](https://arxiv.org/abs/2403.20193) | [Code](https://github.com/EnVision-Research/MotionInversion) | [Project](https://wileewang.github.io/MotionInversion/) | Mar., 2024 |
| [VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models](https://arxiv.org/abs/2312.00845) | [Code](https://github.com/HyeonHo99/Video-Motion-Customization) | [Project](https://video-motion-customization.github.io/) | CVPR, 2024 |
| [FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis](https://arxiv.org/abs/2312.17681) | - | [Project](https://jeff-liangf.github.io/projects/flowvid/) | CVPR, 2024 (Highlight) |
| [Generative Image Dynamics](https://arxiv.org/abs/2309.07906) | [Code](https://github.com/fltwr/generative-image-dynamics) | [Project](https://generative-dynamics.github.io/) | CVPR, 2024 (Best Paper Award) |
| [MotionCtrl: A Unified and Flexible Motion Controller for Video Generation](https://arxiv.org/abs/2312.03641) | [Code](https://github.com/TencentARC/MotionCtrl) | [Project](https://wzhouxiff.github.io/projects/MotionCtrl/) | SIGGRAPH, 2024 |
| [Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling](https://arxiv.org/abs/2401.15977) | [Code](https://github.com/G-U-N/Motion-I2V) | [Project](https://xiaoyushi97.github.io/Motion-I2V/) | SIGGRAPH, 2024 |
| [FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing](https://arxiv.org/abs/2310.05922) | [Code](https://github.com/yrcong/flatten) | [Project](https://flatten-video-editing.github.io/) | ICLR, 2024 |
| [Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation](https://arxiv.org/abs/2311.17117) | [Code](https://github.com/HumanAIGC/AnimateAnyone) | [Project](https://humanaigc.github.io/animate-anyone/) | CVPR, 2024 |
| [Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion](https://arxiv.org/abs/2402.03162) | [Code](https://github.com/ysy31415/direct_a_video) | [Project](https://direct-a-video.github.io/) | SIGGRAPH, 2024 |
| [Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning](https://arxiv.org/abs/2305.13840) | [Code](https://github.com/Weifeng-Chen/control-a-video) | [Project](https://controlavideo.github.io/) | May, 2023 |
| [VideoComposer: Compositional Video Synthesis with Motion Controllability](https://arxiv.org/abs/2306.02018) | [Code](https://github.com/ali-vilab/videocomposer) | [Project](https://videocomposer.github.io/) | NeurIPS, 2023 |
| [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) | [Code](https://github.com/lllyasviel/ControlNet) | - | ICCV, 2023 (Best Paper Award) |
## Passive Cognition of Physical Knowledge for Generation

| Paper | Code | Website | Venue / Date |
|---|---|---|---|
| VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior | [Code](https://github.com/Madaoer/VLIPP) | [Project](https://madaoer.github.io/projects/physically_plausible_video_generation/) | Mar., 2025 |
| [PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation](https://arxiv.org/abs/2404.13026) | [Code](https://github.com/a1600012888/PhysDreamer) | [Project](https://physdreamer.github.io/) | ECCV, 2024 (Oral) |
| [Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing](https://arxiv.org/abs/2404.01223) | [Code](https://github.com/vuer-ai/feature-splatting-inria) | [Project](https://feature-splatting.github.io/) | ECCV, 2024 |
| [Articulated Kinematics Distillation from Video Diffusion Models](https://arxiv.org/abs/2504.01204) | - | [Project](https://research.nvidia.com/labs/dir/akd/) | CVPR, 2025 |
| [RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting](https://arxiv.org/abs/2503.21442) | - | [Project](https://pku-vcl-geometry.github.io/RainyGS/) | CVPR, 2025 |
| [PhysGen3D: Crafting a Miniature Interactive World from a Single Image](https://arxiv.org/abs/2503.20746) | [Code](https://github.com/by-luckk/PhysGen3D) | [Project](https://by-luckk.github.io/PhysGen3D/) | CVPR, 2025 |
| [AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports](https://arxiv.org/abs/2503.20654) | - | [Project](https://accidentsim.github.io/) | Mar., 2025 |
| [Synthetic Video Enhances Physical Fidelity in Video Synthesis](https://arxiv.org/abs/2503.20822) | - | [Project](https://kevinz8866.github.io/simulation/) | Mar., 2025 |
| [PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos](https://arxiv.org/abs/2503.17973) | [Code](https://github.com/Jianghanxiao/PhysTwin) | [Project](https://jianghanxiao.github.io/phystwin-web/) | Mar., 2025 |
| [PhysAnimator: Physics-Guided Generative Cartoon Animation](https://arxiv.org/abs/2501.16550) | - | - | Jan., 2025 |
| [OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation](https://arxiv.org/abs/2501.18982) | - | - | ICLR, 2025 |
| [Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation](https://arxiv.org/abs/2411.14423) | - | [Project](https://zhuomanliu.github.io/PhysFlow/) | CVPR, 2025 |
| [AutoVFX: Physically Realistic Video Editing from Natural Language Instructions](https://arxiv.org/abs/2411.02394) | [Code](https://github.com/haoyuhsu/autovfx) | [Project](https://haoyuhsu.github.io/autovfx-website/) | 3DV, 2025 |
| [Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering](https://arxiv.org/abs/2401.15318) | - | [Project](https://gaussiansplashing.github.io/) | CVPR, 2025 |
| [FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video](https://arxiv.org/abs/2503.04720) | - | [Project](https://yuegao.me/FluidNexus/) | CVPR, 2025 |
| [DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors](https://arxiv.org/abs/2406.01476) | [Code](https://github.com/tyhuang0428/DreamPhysics) | - | AAAI, 2025 |
| [GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator](https://arxiv.org/abs/2412.17804) | - | [Project](https://www.mmlab-ntu.com/project/gausim/index.html) | Dec., 2024 |
| [GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs](https://arxiv.org/abs/2412.11258) | [Code](https://github.com/xxlbigbrother/Gaussian-Property) | [Project](https://gaussian-property.github.io/) | Dec., 2024 |
| [Phys4DGen: A Physics-Driven Framework for Controllable and Efficient 4D Content Generation from a Single Image](https://arxiv.org/abs/2411.16800) | - | [Project](https://jiajinglin.github.io/Phys4DGen/) | Nov., 2024 |
| [Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting](https://arxiv.org/abs/2411.12789) | - | [Project](https://sim-gs.github.io/) | Nov., 2024 |
| [Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints](https://arxiv.org/abs/2411.19381) | - | - | Nov., 2024 |
| [PhysMotion: Physics-Grounded Dynamics From a Single Image](https://arxiv.org/abs/2411.17189) | - | [Project](https://supertan0204.github.io/physmotion_website/) | Nov., 2024 |
| [Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis](https://arxiv.org/abs/2410.07155) | [Code](https://github.com/YangLing0818/Trans4D) | - | Oct., 2024 |
| [Phy124: Fast Physics-Driven 4D Content Generation from a Single Image](https://arxiv.org/abs/2409.07179) | - | - | Sep., 2024 |
| [Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation](https://arxiv.org/abs/2408.10453) | - | - | Aug., 2024 |
| [Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion](https://arxiv.org/abs/2406.04338) | [Code](https://github.com/liuff19/Physics3D) | [Project](https://liuff19.github.io/Physics3D/) | Jun., 2024 |
| [Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation](https://arxiv.org/abs/2405.16849) | - | [Project](https://sync4dphys.github.io/) | May, 2024 |
| [ElastoGen: 4D Generative Elastodynamics](https://arxiv.org/abs/2405.15056) | - | [Project](https://anunrulybunny.github.io/elastogen/) | May, 2024 |
| [MotionCraft: Physics-based Zero-Shot Video Generation](https://arxiv.org/abs/2405.13557) | [Code](https://github.com/mezzelfo/MotionCraft) | [Project](https://mezzelfo.github.io/MotionCraft/) | NeurIPS, 2024 |
| [PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation](https://arxiv.org/abs/2409.18964) | [Code](https://github.com/stevenlsw/physgen) | [Project](https://stevenlsw.github.io/physgen/) | ECCV, 2024 |
| [Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video](https://arxiv.org/abs/2404.09833) | [Code](https://github.com/video2game/video2game) | [Project](https://video2game.github.io/) | CVPR, 2024 |
| [PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF](https://arxiv.org/abs/2311.13099) | [Code](https://github.com/FYTalon/pienerf) | [Project](https://fytalon.github.io/pienerf/) | CVPR, 2024 |
| [VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality](https://arxiv.org/abs/2401.16663) | - | [Project](https://yingjiang96.github.io/VR-GS/) | SIGGRAPH, 2024 |
| [PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics](https://arxiv.org/abs/2311.12198) | [Code](https://github.com/XPandora/PhysGaussian) | [Project](https://xpandora.github.io/PhysGaussian/) | CVPR, 2024 |
| [Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics](https://arxiv.org/abs/2410.08257) | [Code](https://github.com/XJay18/NeuMA) | [Project](https://xjay18.github.io/projects/neuma.html) | NeurIPS, 2024 |
| [Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis](https://arxiv.org/abs/2308.09713) | [Code](https://github.com/JonathonLuiten/Dynamic3DGaussians) | [Project](https://dynamic3dgaussians.github.io/) | 3DV, 2024 |
| [LLM-grounded Video Diffusion Models](https://arxiv.org/abs/2309.17444) | [Code](https://github.com/TonyLianLong/LLM-groundedVideoDiffusion) | [Project](https://llm-grounded-video-diffusion.github.io/) | ICLR, 2024 |
| [Compositional 3D-aware Video Generation with LLM Director](https://arxiv.org/abs/2409.00558) | - | [Project](https://www.microsoft.com/en-us/research/project/compositional-3d-aware-video-generation/) | NeurIPS, 2024 |
| [GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning](https://arxiv.org/abs/2311.12631) | [Code](https://github.com/jiaxilv/GPT4Motion) | [Project](https://gpt4motion.github.io/) | CVPR Workshop, 2024 |
| [DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation](https://arxiv.org/abs/2312.00583) | [Code](https://github.com/momentum-robotics-lab/deformgs) | [Project](https://deformgs.github.io/) | WAFR, 2024 |
| [Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics](https://arxiv.org/abs/2304.14369) | [Code](https://github.com/PingchuanMa/NCLaw) | [Project](https://sites.google.com/view/nclaw) | ICML, 2023 |
| [PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification](https://arxiv.org/abs/2303.05512) | [Code](https://github.com/xuan-li/PAC-NeRF) | [Project](https://sites.google.com/view/PAC-NeRF) | ICLR, 2023 (Spotlight) |
| [C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation](https://arxiv.org/abs/2502.19868) | [Code](https://github.com/WesLee88524/C-Drag-Official-Repo) | - | Feb., 2025 |
| [Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning](https://arxiv.org/abs/2504.15932) | - | - | Apr., 2025 |
| [Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge](https://arxiv.org/abs/2411.11343) | [Code](https://github.com/caoql98/TVML) | [Project](https://qinglongcao.xyz/TVML-Diffusion.github.io/) | Nov., 2024 |
## Benchmarks and Metrics

| Paper | Code | Website | Venue / Date |
|---|---|---|---|
| [T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation](https://arxiv.org/abs/2505.00337) | - | - | May, 2025 |
| [Direct Motion Models for Assessing Generated Videos](https://arxiv.org/abs/2505.00209) | [Code](https://github.com/google-deepmind/tapnet) | [Project](https://trajan-paper.github.io/) | Apr., 2025 |
| [Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments](https://arxiv.org/abs/2504.02918) | - | - | Apr., 2025 |
| [WorldScore: A Unified Evaluation Benchmark for World Generation](https://arxiv.org/abs/2504.00983) | [Code](https://github.com/haoyi-duan/WorldScore) | [Project](https://haoyi-duan.github.io/WorldScore/) | Apr., 2025 |
| [HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation](https://arxiv.org/abs/2503.23715) | - | [Project](https://liuqi-creat.github.io/HOIGen.github.io/) | CVPR, 2025 |
| [Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI](https://arxiv.org/abs/2503.21668) | - | - | Mar., 2025 |
| [Impossible Videos](https://arxiv.org/abs/2503.14378) | [Code](https://github.com/showlab/Impossible-Videos) | [Project](https://showlab.github.io/Impossible-Videos/) | Mar., 2025 |
| [VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation](https://arxiv.org/abs/2503.06800) | [Code](https://github.com/Hritikbansal/videophy) | [Project](https://videophy2.github.io/) | Mar., 2025 |
| [A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-Guided Frame Prediction](https://arxiv.org/abs/2502.05503) | [Code](https://github.com/Jeckinchen/PhyCoBench) | - | Feb., 2025 |
| [ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation](https://arxiv.org/abs/2406.18522) | [Code](https://github.com/PKU-YuanGroup/ChronoMagic-Bench) | [Project](https://pku-yuangroup.github.io/ChronoMagic-Bench/) | NeurIPS, 2024 (Spotlight) |
| [What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality](https://arxiv.org/abs/2411.13609) | - | - | Nov., 2024 |
| [Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation](https://arxiv.org/abs/2410.05363) | [Code](https://github.com/OpenGVLab/PhyGenBench) | [Project](https://phygenbench123.github.io/) | Oct., 2024 |
| [WorldSimBench: Towards Video Generation Models as World Simulators](https://arxiv.org/abs/2410.18072) | - | [Project](https://iranqin.github.io/WorldSimBench.github.io/) | Oct., 2024 |
| [PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models](https://arxiv.org/abs/2406.11802) | - | - | Jun., 2024 |
| [VideoPhy: Evaluating Physical Commonsense for Video Generation](https://arxiv.org/abs/2406.03520) | [Code](https://github.com/Hritikbansal/videophy) | [Project](https://videophy.github.io/) | Jun., 2024 |
| [VideoCon: Robust Video-Language Alignment via Contrast Captions](https://arxiv.org/abs/2311.10111) | [Code](https://github.com/Hritikbansal/videocon) | [Project](https://video-con.github.io/) | CVPR, 2024 |
| [Physion++: Evaluating Physical Scene Understanding That Requires Online Inference of Different Physical Properties](https://arxiv.org/abs/2306.15668) | - | [Project](https://dingmyu.github.io/physion_v2/) | NeurIPS, 2023 |
| [CRAFT: A Benchmark for Causal Reasoning About Forces and Interactions](https://arxiv.org/abs/2012.04293) | [Code](https://github.com/hucvl/craft) | [Project](https://sites.google.com/view/craft-benchmark) | ACL, 2022 |
| [Physion: Evaluating Physical Prediction from Vision in Humans and Machines](https://arxiv.org/abs/2106.08261) | [Code](https://github.com/cogtoolslab/physics-benchmarking-neurips2021) | [Project](https://physion-benchmark.github.io/) | NeurIPS, 2021 |
| [PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop](https://arxiv.org/abs/2503.09595) | [Code](https://github.com/vision-x-nyu/pisa-experiments) | - | Mar., 2025 |
| [Do Generative Video Models Learn Physical Principles from Watching Videos?](https://arxiv.org/abs/2501.09038) | [Code](https://github.com/google-deepmind/physics-IQ-benchmark) | [Project](https://physics-iq.github.io/) | Jan., 2025 |
| [LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models](https://arxiv.org/abs/2411.08027) | - | - | Nov., 2024 |
| [WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation](https://arxiv.org/abs/2503.08153) | [Code](https://github.com/360CVGroup/WISA) | [Project](https://360cvgroup.github.io/WISA/) | Mar., 2025 |
## Active Cognition for World Simulation

| Paper | Code | Website | Venue / Date |
|---|---|---|---|
| [Cosmos World Foundation Model Platform for Physical AI](https://arxiv.org/abs/2501.03575) | [Code](https://github.com/nvidia-cosmos/cosmos-predict1) | [Project](https://www.nvidia.com/en-us/ai/cosmos/) | Jan., 2025 |
| [Aether: Geometric-Aware Unified World Modeling](https://arxiv.org/abs/2503.18945) | [Code](https://github.com/OpenRobotLab/Aether) | [Project](https://aether-world.github.io/) | Mar., 2025 |
| [AdaWorld: Learning Adaptable World Models with Latent Actions](https://arxiv.org/abs/2503.18938) | [Code](https://github.com/Little-Podi/AdaWorld) | [Project](https://adaptable-world-model.github.io/) | Mar., 2025 |
| [IPO: Iterative Preference Optimization for Text-to-Video Generation](https://arxiv.org/abs/2502.02088) | [Code](https://github.com/SAIS-FUXI/IPO) | - | Feb., 2025 |
| [Improving Video Generation with Human Feedback](https://arxiv.org/abs/2501.13918) | [Code](https://github.com/KwaiVGI/VideoAlign) | [Project](https://gongyeliu.github.io/videoalign/) | Jan., 2025 |
| [PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation](https://arxiv.org/abs/2412.00596) | [Code](https://github.com/pittisl/PhyT2V) | - | CVPR, 2025 |
| [Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination](https://arxiv.org/abs/2412.14957) | [Code](https://github.com/leobarcellona/drema_code) | [Project](https://dreamtomanipulate.github.io/) | ICLR, 2025 |
| [MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators](https://arxiv.org/abs/2404.05014) | [Code](https://github.com/PKU-YuanGroup/MagicTime) | [Project](https://pku-yuangroup.github.io/MagicTime/) | TPAMI, 2025 |
| [ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation](https://arxiv.org/abs/2403.08321) | - | [Project](https://guanxinglu.github.io/ManiGaussian/) | ECCV, 2024 |
| [Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback](https://arxiv.org/abs/2412.02617) | - | [Project](https://sites.google.com/view/aif-dynamic-t2v/) | Dec., 2024 |
| [Physical Informed Driving World Model](https://arxiv.org/abs/2412.08410) | - | [Project](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html) | Dec., 2024 |
| [How Far is Video Generation from World Model: A Physical Law Perspective](https://arxiv.org/abs/2411.02385) | [Code](https://github.com/phyworld/phyworld) | [Project](https://phyworld.github.io/) | Nov., 2024 |
| [VideoAgent: Self-Improving Video Generation](https://arxiv.org/abs/2410.10076) | [Code](https://github.com/Video-as-Agent/VideoAgent) | [Project](https://video-as-agent.github.io/) | Oct., 2024 |
| [Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-Tuning](https://arxiv.org/abs/2410.05582) | - | [Project](https://mczhi.github.io/GenDrive/) | Oct., 2024 |
| [DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation](https://arxiv.org/abs/2410.13571) | [Code](https://github.com/GigaAI-research/DriveDreamer4D) | [Project](https://drivedreamer4d.github.io/) | Oct., 2024 |
| [Open-Sora: Democratizing Efficient Video Production for All](https://arxiv.org/abs/2412.20404) | [Code](https://github.com/hpcaitech/Open-Sora) | [Project](https://hpcaitech.github.io/Open-Sora/) | Dec., 2024 |
| [Imagen 3](https://arxiv.org/abs/2408.07009) | - | [Project](https://deepmind.google/technologies/imagen-3/) | Aug., 2024 |
| [Genie: Generative Interactive Environments](https://arxiv.org/abs/2402.15391) | - | [Project](https://sites.google.com/view/genie-2024/?pli=1) | Feb., 2024 |
| [WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens](https://arxiv.org/abs/2401.09985) | [Code](https://github.com/JeffWang987/WorldDreamer) | [Project](https://world-dreamer.github.io/) | Jan., 2024 |
| [Learning Interactive Real-World Simulators](https://arxiv.org/abs/2310.06114) | - | [Project](https://universal-simulator.github.io/unisim/) | ICLR, 2024 (Outstanding Paper Award) |
| Physically Embodied Gaussian Splatting: A Visually Learnt and Physically Grounded 3D Representation for Robotics | - | [Project](https://embodied-gaussians.github.io/) | CoRL, 2024 |
| [GAIA-1: A Generative World Model for Autonomous Driving](https://arxiv.org/abs/2309.17080) | - | [Project](https://wayve.ai/thinking/introducing-gaia1/) | Sep., 2023 |
| [Science-T2I: Addressing Scientific Illusions in Image Synthesis](https://arxiv.org/abs/2504.13129) | [Code](https://github.com/Jialuo-Li/Science-T2I) | [Project](https://jialuo-li.github.io/Science-T2I-Web/) | CVPR, 2025 |
| [MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World](https://arxiv.org/abs/2504.15397) | - | [Project](https://mirror-verse.github.io/) | CVPR, 2025 |