Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
https://github.com/showlab/Awesome-Video-Diffusion
-
Video Generation
- FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
- Sketch Video Synthesis
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
- Decouple Content and Motion for Conditional Image-to-Video Generation
- FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
- Fine-Grained Open Domain Image Animation with Motion Guidance
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
- MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
- MoVideo: Motion-Aware Video Generation with Diffusion Models
- Make Pixels Dance: High-Dynamic Video Generation
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
- Optimal Noise pursuit for Augmenting Text-to-Video Generation
- VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
- FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
- LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
- Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
- Hierarchical Masked 3D Diffusion Model for Video Outpainting
- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation
- MagicAvatar: Multimodal Avatar Generation and Animation
- Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models
- SimDA: Simple Diffusion Adapter for Efficient Video Generation
- ModelScope Text-to-Video Technical Report
- Dual-Stream Diffusion Net for Text-to-Video Generation
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World
- VideoComposer: Compositional Video Synthesis with Motion Controllability
- Probabilistic Adaptation of Text-to-Video Models
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity
- Any-to-Any Generation via Composable Diffusion
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
- VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
- LaMD: Latent Motion Diffusion for Video Generation
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
- Text2Performer: Text-Driven Human Video Generation
- Generative Disco: Text-to-Video Generation for Music Visualization
- Latent-Shift: Latent Diffusion with Temporal Shift
- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
- Seer: Language Instructed Video Prediction with Latent Diffusion Models
- Text2Video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models
- Decomposed Diffusion Models for High-Quality Video Generation
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
- Video Probabilistic Diffusion Models in Projected Latent Space
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
- Structure and Content-Guided Video Synthesis With Diffusion Models
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
- MAGVIT: Masked Generative Video Transformer
- VIDM: Video Implicit Diffusion Models
- Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths
- Imagen Video: High Definition Video Generation With Diffusion Models
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- Diffusion Models for Video Prediction and Infilling
- SinFusion: Training Diffusion Models on a Single Image or Video
- MagicVideo: Efficient Video Generation With Latent Diffusion Models
- Video Diffusion Models
- Diffusion Probabilistic Modeling for Video Generation
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
- VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
- FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
- I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
- VideoPoet: A Large Language Model for Zero-Shot Video Generation
- InstructVideo: Instructing Video Diffusion Models with Human Feedback
- VideoLCM: Video Latent Consistency Model
- FreeInit: Bridging Initialization Gap in Video Diffusion Models
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
- DreaMoving: A Human Video Generation Framework based on Diffusion Models
- Photorealistic Video Generation with Diffusion Models
- MotionCrafter: One-Shot Motion Customization of Diffusion Models
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
- AVID: Any-Length Video Inpainting with Diffusion Model
- MTVG: Multi-text Video Generation with Text-to-Video Models
- Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
- GenDeF: Learning Generative Deformation Field for Video Generation
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance
- LivePhoto: Real Image Animation with Text-guided Motion Control
- VideoBooth: Diffusion-based Video Generation with Image Prompts
- Fine-grained Controllable Video Generation via Object Appearance and Context
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
- ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models
- VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model
- Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
- MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
- FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
- Intention-driven Ego-to-Exo Video Generation
- VideoAgent: Self-Improving Video Generation
- PEEKABOO: Interactive Video Generation via Masked-Diffusion
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
- UniVG: Towards UNIfied-modal Video Generation
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation
- Efficient Video Prediction via Sparsely Conditioned Flow Matching
- Lumiere: A Space-Time Diffusion Model for Video Generation
- ActAnywhere: Subject-Aware Video Background Generation
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
- Boximator: Generating Rich and Controllable Motions for Video Synthesis
- CogVideoX: Text-to-Video Generation
- VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
- Real-Time Video Generation with Pyramid Attention Broadcast
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
- Magic-Me: Identity-Specific Video Customized Diffusion
- 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks
- Latte: Latent Diffusion Transformer for Video Generation
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
- FIFO-Diffusion: Generating Infinite Videos from Text without Training
- Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models
- Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment
- TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
- Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
- Progressive Autoregressive Video Diffusion Models
- T2V-Turbo-v2: Enhancing Video Generation Model Post-Training Through Data, Reward, and Conditional Guidance Design
- T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
- Video-Infinity: Distributed Long Video Generation
- MotionBooth: Motion-Aware Customized Text-to-Video Generation
- Text-Animator: Controllable Visual Text Video Generation
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
- ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning
- Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
- Video Diffusion Alignment via Reward Gradient
- VEnhancer: Generative Space-Time Enhancement for Video Generation
-
Controllable Video Generation
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
- ControlVideo: Training-free Controllable Text-to-Video Generation
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis
- Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
- TrailBlazer: Trajectory Control for Diffusion-Based Video Generation
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
- SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
- Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
- Expressive Whole-Body 3D Gaussian Avatar
- Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation
- Animate Your Motion: Turning Still Images into Dynamic Videos
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches
- ControlNeXt: Powerful and Efficient Control for Image and Video Generation
- TrackGo: A Flexible and Efficient Method for Controllable Video Generation
- EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation
- CameraCtrl: Enabling Camera Control for Video Diffusion Models
- LumiSculpt: A Consistency Lighting Control Network for Video Generation
- Framer: Interactive Frame Interpolation
- CamI2V: Camera-Controlled Image-to-Video Diffusion Model
- MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
- Training-free Camera Control for Video Generation
- Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
- LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
- Image Conductor: Precision Control for Interactive Video Synthesis
- Still-Moving: Customized Video Generation without Customized Video Data
-
Open-source Toolboxes and Foundation Models
- I2VGen-XL (image-to-video / video-to-video)
- ModelScope Image-to-Video: https://modelscope.cn/models/damo/Image-to-Video/summary
- ModelScope Video-to-Video: https://modelscope.cn/models/damo/Video-to-Video/summary
- ModelScope (Text-to-video synthesis)
- Diffusers (Text-to-video synthesis; a minimal usage sketch follows this list)
- Open-Sora-Plan
- Open-Sora
- Stable Video Diffusion
- Show-1
- text-to-video-synthesis-colab
- zeroscope_v2
- Mochi 1
- Pyramidal Flow Matching for Efficient Video Generative Modeling
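For orientation, here is a minimal text-to-video sketch using the Diffusers toolbox listed above. It assumes the `damo-vilab/text-to-video-ms-1.7b` ModelScope checkpoint, a CUDA GPU, and a recent `diffusers` release; treat it as a starting point, not a canonical recipe.

```python
# Minimal text-to-video sketch with Hugging Face Diffusers.
# Assumptions: damo-vilab/text-to-video-ms-1.7b weights, a CUDA GPU,
# and a diffusers version whose pipeline output exposes .frames[0].
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
)
pipe.to("cuda")

result = pipe(
    "a panda playing guitar on a beach",  # text prompt
    num_inference_steps=25,               # denoising steps
    num_frames=16,                        # clip length in frames
)
video_path = export_to_video(result.frames[0])  # writes an .mp4, returns its path
print(video_path)
```

Several of the other toolboxes in this section (ModelScope, zeroscope_v2, Stable Video Diffusion, Mochi 1) also ship Diffusers pipelines with the same `from_pretrained` / call pattern, differing mainly in their conditioning inputs.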
-
Evaluation Benchmarks and Metrics
- T2VScore: Towards A Better Metric for Text-to-Video Generation
- VBench: Comprehensive Benchmark Suite for Video Generative Models
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
- Frechet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos (see the background note after this list)
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
- FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation
- StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
- ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
- VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
- Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
- Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
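Background for the Fréchet-style entries above (this is standard FID/FVD machinery, not a result from any single paper here): these metrics fit one Gaussian to feature embeddings of real videos and another to generated videos, then report the 2-Wasserstein (Fréchet) distance between the two fits,

```latex
% Fréchet distance between Gaussian fits of real (r) and generated (g)
% video features, as used by FID/FVD-style metrics.
d^2\bigl(\mathcal{N}(\mu_r,\Sigma_r),\ \mathcal{N}(\mu_g,\Sigma_g)\bigr)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,\bigl(\Sigma_r \Sigma_g\bigr)^{1/2}\right)
```

The listed metrics differ mainly in the feature extractor: classic FVD embeds clips with an I3D action-recognition network, while Frechet Video Motion Distance swaps in explicit motion features to probe temporal consistency.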
-
New Video Generation Benchmark and Metrics
-
Video Editing
- Object-Centric Diffusion for Efficient Video Editing
- VASE: Object-Centric Shape and Appearance Manipulation of Real Videos
- FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
- Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
- RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing
- VidToMe: Video Token Merging for Zero-Shot Video Editing
- A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
- Neutral Editing Framework for Diffusion-based Video Editing
- DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing
- RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
- SAVE: Protagonist Diversification with Structure Agnostic Video Editing
- MagicStick: Controllable Video Editing via Control Handle Transformations
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
- DragVideo: Interactive Drag-style Video Editing
- Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
- Motion-Conditioned Image Animation for Video Editing
- Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
- Cut-and-Paste: Subject-Driven Video Editing with Attention Control
- LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation
- Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
- DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
- Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
- MotionEditor: Editing Video Motion via Content-Aware Diffusion
- CCEdit: Creative and Controllable Video Editing via Diffusion Models
- MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
- MagicEdit: High-Fidelity and Temporally Coherent Video Editing
- StableVideo: Text-driven Consistency-aware Diffusion Video Editing
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
- TokenFlow: Consistent Diffusion Features for Consistent Video Editing
- INVE: Interactive Neural Video Editing
- VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
- ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
- Soundini: Sound-Guided Diffusion for Natural Video Editing
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
- Pix2Video: Video Editing Using Image Diffusion
- Video-P2P: Video Editing with Cross-attention Control
- Dreamix: Video Diffusion Models Are General Video Editors
- Shape-Aware Text-Driven Layered Video Editing
- Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
- Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
- EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
- DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing
- MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
- FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models
- Edit-A-Video: Single Video Editing with Object-Aware Consistency
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
- Video Editing via Factorized Diffusion Distillation
- FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing
- CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
- Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
- Looking Backward: Streaming Video-to-Video Translation with Feature Banks
- Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
- ReVideo: Remake a Video with Motion and Content Control
- ViViD: Video Virtual Try-on using Diffusion Models
- Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
- GenVideo: One-shot target-image and shape aware video editing using T2I diffusion models
- AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks
-
Long-form Video Generation and Completion
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
- Flexible Diffusion Modeling of Long Videos
- Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
- Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
-
Human or Subject Motion
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
- Human Motion Diffusion as a Generative Prior
- Can We Use Diffusion Probabilistic Models for 3D Motion Prediction?
- Single Motion Diffusion
- HumanMAC: Masked Motion Completion for Human Motion Prediction
- DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model
- Modiff: Action-Conditioned 3D Motion Generation With Denoising Diffusion Probabilistic Models
- PhysDiff: Physics-Guided Human Motion Diffusion Model
- BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
- Unifying Human Motion Synthesis and Style Transfer With Denoising Diffusion Probabilistic Models
- Executing Your Commands via Motion Diffusion in Latent Space
- Pretrained Diffusion Models for Unified Human Motion Synthesis
- Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model
- Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction
- Human Motion Diffusion Model
- FLAME: Free-form Language-based Motion Synthesis & Editing
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
- Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion
- EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
- DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
- Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
- A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
-
Talking Head Generation
- Listen, Denoise, Action! Audio-Driven Motion Synthesis With Diffusion Models
- DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
- HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
- GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
- Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization
- PersonaTalk: Bring Attention to Your Persona in Visual Dubbing
- Talking With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and Audio for Conversational Motion Analysis and Synthesis
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
- TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation
- Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
- MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
-
3D / NeRF
- RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
- Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
- DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion
- SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
- Shape of Motion: 4D Reconstruction from a Single Video
- WonderWorld: Interactive 3D Scene Generation from a Single Image
- WonderJourney: Going from Anywhere to Everywhere
- L3DG: Latent 3D Gaussian Diffusion
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
- GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
- MultiDiff: Consistent Novel View Synthesis from a Single Image
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model
- DiffRF: Rendering-guided 3D Radiance Field Diffusion
- Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
- YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals
-
Video Enhancement and Restoration
-
Video Understanding
- Exploring Diffusion Models for Unsupervised Video Anomaly Detection
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
- DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
- Refined Semantic Enhancement Towards Frequency Diffusion for Video Captioning
- A Generalist Framework for Panoptic Segmentation of Images and Videos
- Diffusion Action Segmentation
- VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
-
Healthcare and Biology
- Annealed Score-Based Diffusion Model for MR Motion Artifact Reduction
- Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis
- Neural Cell Video Synthesis via Optical-Flow Diffusion
- MedSora: Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation
-
Long Video / Film Generation
- MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
- AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
- AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
- TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
- AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
- DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
- DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework
- Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation
- DreamCinema: Cinematic Transfer with Free Camera and 3D Character
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama
- Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
- CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion
- ARLON: Boosting Diffusion Transformers With Autoregressive Models for Long Video Generation
- Unbounded: A Generative Infinite Game of Character Life Simulation
- Loong: Generating Minute-level Long Videos with Autoregressive Language Models
-
Video Generation with Physical Prior / 3D
- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
- Compositional 3D-aware Video Generation with LLM Director
- StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
- How Far is Video Generation from World Model: A Physical Law Perspective
- PhyGenBench: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
- IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
-
Human Feedback for Video Generation
- VIDEOSCORE: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
-
Policy Learning with Video Generation
- Any-point Trajectory Modeling for Policy Learning
- GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy
- Dreamitate: Real-World Visuomotor Policy Learning via Video Generation
- This&That: Language-Gesture Controlled Video Generation for Robot Planning
-
World Model
-
Motion Customization
- MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
- Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
- Motion Inversion for Video Customization
- Video Diffusion Models are Training-free Motion Interpreter and Controller
- Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
- Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion
- Tora: Trajectory-oriented Diffusion Transformer for Video Generation
- Customizing Motion in Text-to-Video Diffusion Models
- DragAnything: Motion Control for Anything using Entity Representation
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation
- FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
-
Audio Synthesis for Video
- Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
- Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
- Speech To Speech: an effort toward an open-source and modular GPT-4o
- VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
- STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
- Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
- Network Bending of Diffusion Models for Audio-Visual Generation
- Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
- Video-to-Audio Generation with Hidden Alignment
- Read, Watch and Scream! Sound Generation from Text and Video
-
AI Safety for Video Generation
-
Try On with Video Generation
-
Efficiency for Video Generation
-
Video Generation with Physical Prior/3D
-
Sub Categories
- Video Generation (276)
- Video Editing (99)
- Controllable Video Generation (44)
- Human or Subject Motion (42)
- 3D / NeRF (33)
- Motion Customization (28)
- Open-source Toolboxes and Foundation Models (28)
- Long Video / Film Generation (25)
- Evaluation Benchmarks and Metrics (23)
- Talking Head Generation (19)
- Audio Synthesis for Video (19)
- Video Generation with Physical Prior / 3D (16)
- Video Understanding (15)
- Policy Learning with Video Generation (8)
- Long-form Video Generation and Completion (8)
- Healthcare and Biology (6)
- Video Enhancement and Restoration (5)
- World Model (4)
- Human Feedback for Video Generation (3)
- New Video Generation Benchmark and Metrics (3)
- Efficiency for Video Generation (3)
- AI Safety for Video Generation (1)
- Try On with Video Generation (1)
- Video Generation with Physical Prior/3D (1)