https://github.com/foruck/awesome-human-motion

An aggregation of human motion understanding research.

List: awesome-human-motion

character-control dance-generation human-motion human-motion-analysis human-motion-generation human-motion-synthesis humanoid-control motion-control motion-editing motion-generation motion-synthesis text-to-motion


# Awesome Human Motion

An aggregation of human motion understanding research; feel free to contribute.

- [Reviews & Surveys](#review)
- [Motion Generation](#motion-generation)
- [Motion Editing](#motion-editing)
- [Motion Stylization](#motion-stylization)
- [Human-Object Interaction](#hoi)
- [Human-Scene Interaction](#hsi)
- [Human-Human Interaction](#hhi)
- [Datasets](#datasets)
- [Humanoid](#humanoid)
- [Bio-stuff](#bio)
- [Human Reconstruction](#motion-reconstruction)
- [Human-Object/Scene/Human Interaction Reconstruction](#hoi/hsi-reconstruction)
- [Motion Controlled Image/Video Generation](#motion-video/image-generation)
- [Human Pose Estimation/Recognition](#pose-estimation)
- [Human Motion Understanding](#motion-understanding)

---


## Reviews & Surveys


## Motion Generation, Text/Speech/Music-Driven



    2026




    • (ArXiv 2026) FrankenMotion: Part-level Human Motion Generation and Composition, Li et al.


    • (ArXiv 2026) CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos, Zhao et al.


    • (WACV 2026) SegMo: Segment-aligned Text to 3D Human Motion Generation, Dang et al.


    • (AAAI 2026) ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment, Weng et al.


    • (AAAI 2026) FineXtrol: Controllable Motion Generation via Fine-Grained Text, Shen et al.



    2025




    • (NeurIPS 2025) HMVLM: Human Motion-Vision-Language Model via MoE LoRA, Hu et al.


    • (NeurIPS 2025) TransPhase: Deep Compositional Phase Diffusion for Long Motion Sequence Generation, Au et al.


    • (SIGGRAPH Asia 2025) TCM: Learning Human Motion with Temporally Conditional Mamba, Nguyen et al.


    • (TMLR 2025) MoReact: Generating Reactive Motion from Textual Descriptions, Xu et al.


    • (ICCV 2025) Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation, Fan et al.


    • (ICCV 2025) UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation, Patel et al.


    • (ICCV 2025) FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing, Wu et al.


    • (ICCV 2025) PUMPS: Skeleton-Agnostic Point-based Universal Motion Pre-Training for Synthesis in Human Motion Tasks, Mo et al.


    • (ICCV 2025) GENMO: A GENeralist Model for Human MOtion, Li et al.


    • (ICCV 2025) InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation, Zhuo et al.


    • (ICCV 2025) Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data, Fan et al.


    • (ICCV 2025) Morph: A Motion-free Physics Optimization Framework for Human Motion Generation, Li et al.


    • (ICCV 2025) DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, Cho et al.


    • (ICCV 2025) SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis, Zhang et al.


    • (ICCV 2025) KinMo: Kinematic-aware Human Motion Understanding and Generation, Zhang et al.


    • (ICCV 2025) GestureLSM: Latent Shortcut-based Co-Speech Gesture Generation with Spatial-Temporal Modeling, Liu et al.


    • (ICCV 2025) Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation, Pi et al.


    • (ICCV 2025) MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm, Guo et al.


    • (ICCV 2025) SFControl: Motion Synthesis with Sparse and Flexible Keyjoint Control, Hwang et al.


    • (ICCV 2025) Less Is More: Improving Motion Diffusion Models with Sparse Keyframes, Bae et al.


    • (ICCV 2025) ControlMM: Controllable Masked Motion Generation, Pinyoanuntapong et al.


    • (ICCV 2025) PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning, Zhang et al.


    • (ICCV 2025) HERO: Human Reaction Generation from Videos, Yu et al.


    • (ICCV 2025) MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space, Xiao et al.


    • (ICCV 2025) GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation, Shi et al.


    • (ACM MM 2025) ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion, Wang et al.


    • (ICML 2025) Being-M0: Scaling Motion Generation Models with Million-Level Human Motions, Wang et al.


    • (TOG 2025) Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation, Zhong et al.


    • (SIGGRAPH 2025) MECo: Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models, Chen et al.


    • (SIGGRAPH 2025) Chang et al.: Large-Scale Multi-Character Interaction Synthesis, Chang et al.


    • (SIGGRAPH 2025) AnyTop: Character Animation Diffusion with Any Topology, Gat et al.


    • (CVPR 2025) DSDFM: Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis, Hua et al.


    • (CVPR 2025) EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation, Meng et al.


    • (CVPR 2025) UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing, Li et al.


    • (CVPR 2025) From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models, Barquero et al.


    • (CVPR 2025) Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions, Liao et al.


    • (CVPR 2025) MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities, Wu et al.


    • (CVPR 2025) SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing, Hong et al.


    • (CVPR 2025) PersonalBooth: Personalized Text-to-Motion Generation, Kim et al.


    • (CVPR 2025) MARDM: Rethinking Diffusion for Text-Driven Human Motion Generation, Meng et al.


    • (CVPR 2025) StickMotion: Generating 3D Human Motions by Drawing a Stickman, Wang et al.


    • (CVPR 2025) LLaMo: Human Motion Instruction Tuning, Li et al.


    • (CVPR 2025) HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation, Cheng et al.


    • (CVPR 2025) AtoM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward, Han et al.


    • (CVPR 2025) EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space, Zhang et al.


    • (CVPR 2025) The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion, Chen et al.


    • (CVPR 2025) ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model, Lu et al.


    • (CVPR 2025) Move in 2D: 2D-Conditioned Human Motion Generation, Huang et al.


    • (CVPR 2025) SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters, Jiang et al.


    • (CVPR 2025) MVLift: Lifting Motion to the 3D World via 2D Diffusion, Li et al.


    • (CVPR 2025 Workshop) MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation, Maldonado et al.


    • (CVPR 2025 Workshop) Dyadic Mamba: Long-term Dyadic Human Motion Synthesis, Tanke et al.


    • (ACM SenSys 2025) SHADE-AD: An LLM-Based Framework for Synthesizing Activity Data of Alzheimer’s Patients, Fu et al.


    • (ICRA 2025) MotionGlot: A Multi-Embodied Motion Generation Model, Harithas et al.


    • (ICLR 2025) CLoSD: Closing the Loop between Simulation and Diffusion for Multi-Task Character Control, Tevet et al.


    • (ICLR 2025) PedGen: Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels, Liu et al.


    • (ICLR 2025) HGM³: Hierarchical Generative Masked Motion Modeling with Hard Token Mining, Jeong et al.


    • (ICLR 2025) LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning, Li et al.


    • (ICLR 2025) MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer, Wang et al.


    • (ICLR 2025) Lyu et al: Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization, Lyu et al.


    • (ICLR 2025) DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control, Zhao et al.


    • (ICLR 2025) Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs, Wu et al.


    • (TMM 2025) MCG-IMM: A Plug-and-Play Multi-Criteria Guidance for Diverse In-Betweening Human Motion Generation, Yu et al.


    • (IJCV 2025) Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation, Wang et al.


    • (TCSVT 2025) Zeng et al: Progressive Human Motion Generation Based on Text and Few Motion Frames, Zeng et al.


    • (ArXiv 2025) HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation, Wen et al.


    • (ArXiv 2025) DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models, Zhang et al.


    • (ArXiv 2025) Jeong et al: Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing, Jeong et al.


    • (ArXiv 2025) FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation, Yang et al.


    • (ArXiv 2025) OmniMoGen: Unifying Human Motion Generation via Learning from Interleaved Text-Motion Instructions, Bu et al.


    • (ArXiv 2025) MoLingo: Motion–Language Alignment for Text-to-Human Motion Generation, He et al.


    • (ArXiv 2025) FunPhase: A Periodic Functional Autoencoder for Motion Generation via Phase Manifolds, Pegoraro et al.


    • (ArXiv 2025) IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation, Li et al.


    • (ArXiv 2025) Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillations, Cazzola et al.


    • (ArXiv 2025) COMET: Controllable Long-term Motion Generation with Extended Joint Targets, Li et al.


    • (ArXiv 2025) Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model, Jin et al.


    • (ArXiv 2025) UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework, Pang et al.


    • (ArXiv 2025) Free3D: 3D Human Motion Emerges from Single-View 2D Supervision, Liu et al.


    • (ArXiv 2025) Pressure2Motion: Hierarchical Motion Synthesis from Ground Pressure with Text Guidance, Li et al.


    • (ArXiv 2025) Mem-MLP: Real-Time 3D Human Motion Generation from Sparse Inputs, Mutlu et al.


    • (ArXiv 2025) The Quest for Generalizable Motion Generation: Data, Model, and Evaluation, Lin et al.


    • (ArXiv 2025) MoSa: Motion Generation with Scalable Autoregressive Modeling, Liu et al.


    • (ArXiv 2025) OmniMotion-X: Versatile Multimodal Whole-Body Motion Generation, Xu et al.


    • (ArXiv 2025) OmniMotion: Multimodal Motion Generation with Continuous Masked Autoregression, Li et al.


    • (ArXiv 2025) No MoCap Needed: Post-Training Motion Diffusion Models with Reinforcement Learning using Only Textual Prompts, Girolamo et al.


    • (ArXiv 2025) Pulp Motion: Framing-aware multimodal camera and human motion generation, Courant et al.


    • (ArXiv 2025) MonSTeR: a Unified Model for Motion, Scene, Text Retrieval, Collorone et al.


    • (ArXiv 2025) MoGIC: Boosting Motion Generation via Intention Understanding and Visual Context, Shi et al.


    • (ArXiv 2025) Gupta et al: Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow, Gupta et al.


    • (ArXiv 2025) LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation, Kim et al.


    • (ArXiv 2025) LUMA: Low-Dimension Unified Motion Alignment with Dual-Path Anchoring for Text-to-Motion Diffusion Model, Jia et al.


    • (ArXiv 2025) SimDiff: Simulator-constrained Diffusion Model for Physically Plausible Motion Generation, Watanabe et al.


    • (ArXiv 2025) SmooGPT: Stylized Motion Generation using Large Language Models, Zhong et al.


    • (ArXiv 2025) Embracing Aleatoric Uncertainty: Generating Diverse 3D Human Motion, Qin et al.


    • (ArXiv 2025) MotionFLUX: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment, Gao et al.


    • (ArXiv 2025) VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models, Xu et al.


    • (ArXiv 2025) MSQ: Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation, Wang et al.


    • (ArXiv 2025) X-MoGen: Unified Motion Generation across Humans and Animals, Wang et al.


    • (ArXiv 2025) ReMoMask: Retrieval-Augmented Masked Motion Generation, Li et al.


    • (ArXiv 2025) OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation, Gan et al.


    • (ArXiv 2025) SpeakerVid-5M: A Large-Scale High-Quality Dataset for audio-visual Dyadic Interactive Human Generation, Zhang et al.


    • (ArXiv 2025) EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation, Meng et al.


    • (ArXiv 2025) MOSPA: Human Motion Generation Driven by Spatial Audio, Xu et al.


    • (ArXiv 2025) SnapMoGen: Human Motion Generation from Expressive Texts, Wang et al.


    • (ArXiv 2025) MOST: Motion Diffusion Model for Rare Text via Temporal Clip Banzhaf Interaction, Wang et al.


    • (ArXiv 2025) Grounded Gestures: Language, Motion and Space, Deichler et al.


    • (ArXiv 2025) MotionGPT3: Human Motion as a Second Modality, Zhu et al.


    • (ArXiv 2025) HumanAttr: Generating Attribute-Aware Human Motions from Textual Prompt, Wang et al.


    • (ArXiv 2025) PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis, Jin et al.


    • (ArXiv 2025) Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation, Ouyang et al.


    • (ArXiv 2025) ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model, Chen et al.


    • (ArXiv 2025) MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation, Huang et al.


    • (ArXiv 2025) IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model, Zhao et al.


    • (ArXiv 2025) Li et al: How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control, Li et al.


    • (ArXiv 2025) UniMoGen: Universal Motion Generation, Khani et al.


    • (ArXiv 2025) Wang et al: Semantics-Aware Human Motion Generation from Audio Instructions, Wang et al.


    • (ArXiv 2025) ACMDM: Absolute Coordinates Make Motion Generation Easy, Meng et al.


    • (ArXiv 2025) PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation, Zhu et al.


    • (ArXiv 2025) Intentional Gesture: Deliver Your Intentions with Gestures for Speech, Liu et al.


    • (ArXiv 2025) MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis, Yang et al.


    • (ArXiv 2025) M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis, Yin et al.


    • (ArXiv 2025) ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation, Lin et al.


    • (ArXiv 2025) PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning, Xi et al.


    • (ArXiv 2025) DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability, Shah et al.


    • (ArXiv 2025) ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer, Xie et al.


    • (ArXiv 2025) HMU: Human Motion Unlearning, Matteis et al.


    • (ArXiv 2025) ACMo: Attribute Controllable Motion Generation, Wei et al.


    • (ArXiv 2025) BioMoDiffuse: Physics-Guided Biomechanical Diffusion for Controllable and Authentic Human Motion Synthesis, Kang et al.


    • (ArXiv 2025) ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis, Zhou et al.


    • (ArXiv 2025) Motion Anything: Any to Motion Generation, Zhang et al.


    • (ArXiv 2025) GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music, Liu et al.


    • (ArXiv 2025) CASIM: Composite Aware Semantic Injection for Text to Motion Generation, Chang et al.


    • (ArXiv 2025) MotionPCM: Real-Time Motion Synthesis with Phased Consistency Model, Jiang et al.


    • (ArXiv 2025) Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss, Chen et al.


    • (ArXiv 2025) FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation, Tashakori et al.


    • (ArXiv 2025) HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation, Zhan et al.


    • (ArXiv 2025) PackDiT: Joint Human Motion and Text Generation via Mutual Prompting, Jiang et al.


    • (3DV 2025) Unimotion: Unifying 3D Human Motion Synthesis and Understanding, Li et al.


    • (3DV 2025) HoloGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-speech Gestures, Cheng et al.


    • (AAAI 2025) RemoGPT: Part-Level Retrieval-Augmented Motion-Language Models, Yu et al.


    • (AAAI 2025) UniMuMo: Unified Text, Music and Motion Generation, Yang et al.


    • (AAAI 2025) EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning, Chen et al.


    • (AAAI 2025) ALERT-Motion: Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion, Miao et al.


    • (AAAI 2025) MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls, Bian et al.


    • (AAAI 2025) Light-T2M: A Lightweight and Fast Model for Text-to-Motion Generation, Zeng et al.


    • (WACV 2025 Workshop) LS-GAN: Human Motion Synthesis with Latent-space GANs, Amballa et al.


    • (WACV 2025) ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model, Han et al.


    • (WACV 2025) MoRAG: Multi-Fusion Retrieval Augmented Generation for Human Motion, Shashank et al.


    • (WACV 2025) Mandelli et al: Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models, Mandelli et al.



    2024




    • (ArXiv 2024) MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model, Wang et al.


    • (ArXiv 2024) InterDance: Reactive 3D Dance Generation with Realistic Duet Interactions, Li et al.


    • (ArXiv 2024) Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation, Fu et al.


    • (ArXiv 2024) CoMA: Compositional Human Motion Generation with Multi-modal Agents, Sun et al.


    • (ArXiv 2024) SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization, Tan et al.


    • (ArXiv 2024) RMD: A Simple Baseline for More General Human Motion Generation via Training-free Retrieval-Augmented Motion Diffuse, Liao et al.


    • (ArXiv 2024) BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis, Hong et al.


    • (ArXiv 2024) MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks, Wu et al.


    • (ArXiv 2024) FTMoMamba: Motion Generation with Frequency and Text State Space Models, Li et al.


    • (ArXiv 2024) KMM: Key Frame Mask Mamba for Extended Motion Generation, Zhang et al.


    • (ArXiv 2024) MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding, Wang et al.


    • (ArXiv 2024) Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns, Li et al.


    • (ArXiv 2024) MotionCLR: Motion Generation and Training-Free Editing via Understanding Attention Mechanisms, Chen et al.


    • (ArXiv 2024) LEAD: Latent Realignment for Human Motion Diffusion, Andreou et al.


    • (ArXiv 2024) Leite et al.: Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing, Leite et al.


    • (ArXiv 2024) MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning, Liu et al.


    • (ArXiv 2024) MotionLLM: Understanding Human Behaviors from Human Motions and Videos, Chen et al.


    • (ArXiv 2024) T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data, Liu et al.


    • (ArXiv 2024) BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation, Hosseyni et al.


    • (ArXiv 2024) synNsync: Synergy and Synchrony in Couple Dances, Maluleke et al.


    • (EMNLP 2024) Dong et al: Word-Conditioned 3D American Sign Language Motion Generation, Dong et al.


    • (NeurIPS D&B 2024) Kim et al: Text to Blind Motion, Kim et al.


    • (NeurIPS 2024) UniMTS: Unified Pre-training for Motion Time Series, Zhang et al.


    • (NeurIPS 2024) Christopher et al.: Constrained Synthesis with Projected Diffusion Models, Christopher et al.


    • (NeurIPS 2024) MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence, You et al.


    • (NeurIPS 2024) MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling, Yuan et al.


    • (NeurIPS 2024) M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation, Luo et al.


    • (NeurIPS Workshop 2024) Bikov et al: Fitness Aware Human Motion Generation with Fine-Tuning, Bikov et al.


    • (NeurIPS Workshop 2024) DGFM: Full Body Dance Generation Driven by Music Foundation Models, Liu et al.


    • (ICPR 2024) FG-MDM: Towards Zero-Shot Human Motion Generation via ChatGPT-Refined Descriptions, Shi et al.


    • (ACM MM 2024) SynTalker: Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation, Chen et al.


    • (ACM MM 2024) L3EM: Towards Emotion-enriched Text-to-Motion Generation via LLM-guided Limb-level Emotion Manipulating, Yu et al.


    • (ACM MM 2024) StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework, Huang et al.


    • (ACM MM 2024) SATO: Stable Text-to-Motion Framework, Chen et al.


    • (ICANN 2024) PIDM: Personality-Aware Interaction Diffusion Model for Gesture Generation, Shibasaki et al.


    • (HFES 2024) Macwan et al: High-Fidelity Worker Motion Simulation With Generative AI, Macwan et al.


    • (ECCV 2024) Jin et al: Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation, Jin et al.


    • (ECCV 2024) Motion Mamba: Efficient and Long Sequence Motion Generation, Zhong et al.


    • (ECCV 2024) EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation, Zhou et al.


    • (ECCV 2024) CoMo: Controllable Motion Generation through Language Guided Pose Code Editing, Huang et al.


    • (ECCV 2024) CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion, Sun et al.


    • (ECCV 2024) Shan et al: Towards Open Domain Text-Driven Synthesis of Multi-Person Motions, Shan et al.


    • (ECCV 2024) ParCo: Part-Coordinating Text-to-Motion Synthesis, Zou et al.


    • (ECCV 2024) Sampieri et al: Length-Aware Motion Synthesis via Latent Diffusion, Sampieri et al.


    • (ECCV 2024) ChroAccRet: Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models, Fujiwara et al.


    • (ECCV 2024) MHC: Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs, Liu et al.


    • (ECCV 2024) ProMotion: Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation, Liu et al.


    • (ECCV 2024) FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models, Zhang et al.


    • (ECCV 2024) Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions, Qian et al.


    • (ECCV 2024) FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis, Fan et al.


    • (ECCV 2024) Kinematic Phrases: Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases, Liu et al.


    • (ECCV 2024) MotionChain: Conversational Motion Controllers via Multimodal Prompts, Jiang et al.


    • (ECCV 2024) SMooDi: Stylized Motion Diffusion Model, Zhong et al.


    • (ECCV 2024) BAMM: Bidirectional Autoregressive Motion Model, Pinyoanuntapong et al.


    • (ECCV 2024) MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model, Dai et al.


    • (ECCV 2024) Ren et al: Realistic Human Motion Generation with Cross-Diffusion Models, Ren et al.


    • (ECCV 2024) M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models, Chi et al.


    • (ECCV 2024) LMM: Large Motion Model for Unified Multi-Modal Motion Generation, Zhang et al.


    • (ECCV 2024) TesMo: Generating Human Interaction Motions in Scenes with Text Control, Yi et al.


    • (ECCV 2024) TLControl: Trajectory and Language Control for Human Motion Synthesis, Wan et al.


    • (ICME 2024) ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance, Cheng et al.


    • (ICME Workshop 2024) Chen et al: Anatomically-Informed Vector Quantization Variational Auto-Encoder for Text-to-Motion Generation, Chen et al.


    • (ICML 2024) HumanTOMATO: Text-aligned Whole-body Motion Generation, Lu et al.


    • (ICML 2024) GPHLVM: Bringing Motion Taxonomies to Continuous Domains via GPLVM on Hyperbolic Manifolds, Jaquier et al.


    • (SIGGRAPH 2024) DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models, Sun et al.


    • (SIGGRAPH 2024) CondMDI: Flexible Motion In-betweening with Diffusion Models, Cohan et al.


    • (SIGGRAPH 2024) CAMDM: Taming Diffusion Probabilistic Models for Character Control, Chen et al.


    • (SIGGRAPH 2024) LGTM: Local-to-Global Text-Driven Human Motion Diffusion Models, Sun et al.


    • (SIGGRAPH 2024) TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis, Zhang et al.


    • (SIGGRAPH 2024) A-MDM: Interactive Character Control with Auto-Regressive Motion Diffusion Models, Shi et al.


    • (SIGGRAPH 2024) Starke et al: Categorical Codebook Matching for Embodied Character Controllers, Starke et al.


    • (SIGGRAPH 2024) SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation, Juravsky et al.


    • (CVPR 2024) ProgMoGen: Programmable Motion Generation for Open-set Motion Control Tasks, Liu et al.


    • (CVPR 2024) PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios, Wang et al.


    • (CVPR 2024) AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion, Chhatre et al.


    • (CVPR 2024) Liu et al: Towards Variable and Coordinated Holistic Co-Speech Motion Generation, Liu et al.


    • (CVPR 2024) MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion, Kapon et al.


    • (CVPR 2024) WANDR: Intention-guided Human Motion Generation, Diomataris et al.


    • (CVPR 2024) MoMask: Generative Masked Modeling of 3D Human Motions, Guo et al.


    • (CVPR 2024) ChatPose: Chatting about 3D Human Pose, Feng et al.


    • (CVPR 2024) AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond, Zhou et al.


    • (CVPR 2024) MMM: Generative Masked Motion Model, Pinyoanuntapong et al.


    • (CVPR 2024) AAMDM: Accelerated Auto-regressive Motion Diffusion Model, Li et al.


    • (CVPR 2024) OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers, Liang et al.


    • (CVPR 2024) FlowMDM: Seamless Human Motion Composition with Blended Positional Encodings, Barquero et al.


    • (CVPR 2024) Digital Life Project: Autonomous 3D Characters with Social Intelligence, Cai et al.


    • (CVPR 2024) EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling, Liu et al.


    • (CVPR Workshop 2024) STMC: Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation, Petrovich et al.


    • (CVPR Workshop 2024) InstructMotion: Exploring Text-to-Motion Generation with Human Preference, Sheng et al.


    • (ICLR 2024) SinMDM: Single Motion Diffusion, Raab et al.


    • (ICLR 2024) NeRM: Learning Neural Representations for High-Framerate Human Motion Synthesis, Wei et al.


    • (ICLR 2024) PriorMDM: Human Motion Diffusion as a Generative Prior, Shafir et al.


    • (ICLR 2024) OmniControl: Control Any Joint at Any Time for Human Motion Generation, Xie et al.


    • (ICLR 2024) Adiya et al.: Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation, Adiya et al.


    • (ICLR 2024) Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment, Li et al.


    • (AAAI 2024) HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback, Han et al.


    • (AAAI 2024) AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion, Jing et al.


    • (AAAI 2024) MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation, Hoang et al.


    • (AAAI 2024) B2A-HDM: Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model, Xie et al.


    • (AAAI 2024) Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis, Fan et al.


    • (AAAI 2024) MotionGPT: Finetuned LLMs are General-Purpose Motion Generators, Zhang et al.


    • (AAAI 2024) Dong et al: Enhanced Fine-grained Motion Diffusion for Text-driven Human Motion Synthesis, Dong et al.


    • (AAAI 2024) UNIMASKM: A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis, Mascaro et al.


    • (TPAMI 2024) GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation, Gao et al.


    • (WACV 2024) Xie et al.: Sign Language Production with Latent Motion Transformer, Xie et al.



    2023




    • (NeurIPS 2023) GraphMotion: Act As You Wish: Fine-grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs, Jin et al.


    • (NeurIPS 2023) MotionGPT: Human Motion as Foreign Language, Jiang et al.


    • (NeurIPS 2023) FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing, Zhang et al.


    • (NeurIPS 2023) InsActor: Instruction-driven Physics-based Characters, Ren et al.


    • (ICCV 2023) AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism, Zhong et al.


    • (ICCV 2023) TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis, Petrovich et al.


    • (ICCV 2023) MAA: Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation, Azadi et al.


    • (ICCV 2023) PhysDiff: Physics-Guided Human Motion Diffusion Model, Yuan et al.


    • (ICCV 2023) ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model, Zhang et al.


    • (ICCV 2023) BelFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction, Barquero et al.


    • (ICCV 2023) GMD: Guided Motion Diffusion for Controllable Human Motion Synthesis, Karunratanakul et al.


    • (ICCV 2023) HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations, Aliakbarian et al.


    • (ICCV 2023) SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation, Athanasiou et al.


    • (ICCV 2023) Kong et al.: Priority-Centric Human Motion Generation in Discrete Latent Space, Kong et al.


    • (ICCV 2023) Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model, Wang et al.


    • (ICCV 2023) EMS: Breaking The Limits of Text-conditioned 3D Motion Synthesis with Elaborative Descriptions, Qian et al.


    • (SIGGRAPH 2023) GenMM: Example-based Motion Synthesis via Generative Motion Matching, Li et al.


    • (SIGGRAPH 2023) GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents, Ao et al.


    • (SIGGRAPH 2023) BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer, Pang et al.


    • (SIGGRAPH 2023) Alexanderson et al.: Listen, denoise, action! Audio-driven motion synthesis with diffusion models, Alexanderson et al.


    • (CVPR 2023) AGroL: Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model, Du et al.


    • (CVPR 2023) TALKSHOW: Generating Holistic 3D Human Motion from Speech, Yi et al.


    • (CVPR 2023) T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations, Zhang et al.


    • (CVPR 2023) UDE: A Unified Driving Engine for Human Motion Generation, Zhou et al.


    • (CVPR 2023) OOHMG: Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training, Lin et al.


    • (CVPR 2023) EDGE: Editable Dance Generation From Music, Tseng et al.


    • (CVPR 2023) MLD: Executing your Commands via Motion Diffusion in Latent Space, Chen et al.


    • (CVPR 2023) MoDi: Unconditional Motion Synthesis from Diverse Data, Raab et al.


    • (CVPR 2023) MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis, Dabral et al.


    • (CVPR 2023) Mo et al.: Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation, Mo et al.


    • (ICLR 2023) HMDM: Human Motion Diffusion Model, Tevet et al.


    • (TPAMI 2023) MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model, Zhang et al.


    • (TPAMI 2023) Bailando++: 3D Dance GPT with Choreographic Memory, Li et al.


    • (ArXiv 2023) UDE-2: A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis, Zhou et al.


    • (ArXiv 2023) Motion Script: Natural Language Descriptions for Expressive 3D Human Motions, Yazdian et al.



    2022 and earlier




    • (NeurIPS 2022) NeMF: Neural Motion Fields for Kinematic Animation, He et al.


    • (SIGGRAPH Asia 2022) PADL: Language-Directed Physics-Based Character Control, Juravsky et al.


    • (SIGGRAPH Asia 2022) Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings, Ao et al.


    • (3DV 2022) TEACH: Temporal Action Compositions for 3D Humans, Athanasiou et al.


    • (ECCV 2022) Implicit Motion: Implicit Neural Representations for Variable Length Human Motion Generation, Cervantes et al.


    • (ECCV 2022) Zhong et al.: Learning Uncoupled-Modulation CVAE for 3D Action-Conditioned Human Motion Synthesis, Zhong et al.


    • (ECCV 2022) MotionCLIP: Exposing Human Motion Generation to CLIP Space, Tevet et al.


    • (ECCV 2022) PoseGPT: Quantizing human motion for large scale generative modeling, Lucas et al.


    • (ECCV 2022) TEMOS: Generating diverse human motions from textual descriptions, Petrovich et al.


    • (ECCV 2022) TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts, Guo et al.


    • (SIGGRAPH 2022) AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars, Hong et al.


    • (SIGGRAPH 2022) DeepPhase: Periodic autoencoders for learning motion phase manifolds, Starke et al.


    • (CVPR 2022) Guo et al.: Generating Diverse and Natural 3D Human Motions from Text, Guo et al.


    • (CVPR 2022) Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory, Li et al.


    • (ICCV 2021) ACTOR: Action-Conditioned 3D Human Motion Synthesis with Transformer VAE, Petrovich et al.


    • (ICCV 2021) AIST++: AI Choreographer: Music Conditioned 3D Dance Generation with AIST++, Li et al.


    • (SIGGRAPH 2021) Starke et al.: Neural animation layering for synthesizing martial arts movements, Starke et al.


    • (CVPR 2021) MOJO: We are More than Our Joints: Predicting how 3D Bodies Move, Zhang et al.


    • (ECCV 2020) DLow: Diversifying Latent Flows for Diverse Human Motion Prediction, Yuan et al.


    • (SIGGRAPH 2020) Starke et al.: Local motion phases for learning multi-contact character movements, Starke et al.



## Motion Editing



  • (IVA 2025) TF-JAX-IK: Real-Time Inverse Kinematics for Generating Multi-Constrained Movements of Virtual Human Characters, Voss et al.


  • (ICCV 2025) PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning, Zhang et al.


  • (CVPR 2025) SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing, Hong et al.


  • (CVPR 2025) MixerMDM: Learnable Composition of Human Motion Diffusion Models, Ruiz-Ponce et al.


  • (CVPR 2025) AnyMoLe: Any Character Motion In-Betweening Leveraging Video Diffusion Models, Yun et al.


  • (CVPR 2025) SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction, Li et al.


  • (CVPR 2025) MotionReFit: Dynamic Motion Blending for Versatile Motion Editing, Jiang et al.


  • (ArXiv 2025) StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data, Mu et al.


  • (ArXiv 2025) Dai et al.: Towards Synthesized and Editable Motion In-Betweening Through Part-Wise Phase Representation, Dai et al.


  • (SIGGRAPH Asia 2024) MotionFix: Text-Driven 3D Human Motion Editing, Athanasiou et al.


  • (NeurIPS 2024) CigTime: Corrective Instruction Generation Through Inverse Motion Editing, Fang et al.


  • (SIGGRAPH 2024) Iterative Motion Editing: Iterative Motion Editing with Natural Language, Goel et al.


  • (CVPR 2024) DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors, Karunratanakul et al.


## Motion Stylization



  • (ICCV 2025) StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion, Guo et al.


  • (CVPR 2025) Visual Persona: Foundation Model for Full-Body Human Customization, Nam et al.


  • (ArXiv 2025) ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation, Chen et al.


  • (ArXiv 2025) MotionPersona: Characteristics-aware Locomotion Control, Shi et al.


  • (ArXiv 2025) AStF: Motion Style Transfer via Adaptive Statistics Fusor, Chen et al.


  • (ArXiv 2025) Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion, Sawdayee et al.


  • (ArXiv 2024) MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow, Li et al.


  • (TSMC 2024) D-LORD: D-LORD for Motion Stylization, Gupta et al.


  • (ECCV 2024) HUMOS: Human Motion Model Conditioned on Body Shape, Tripathi et al.


  • (SIGGRAPH 2024) SMEAR: Stylized Motion Exaggeration with ARt-direction, Basset et al.


  • (SIGGRAPH 2024) Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior, Wu et al.


  • (CVPR 2024) MCM-LDM: Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model, Song et al.


  • (CVPR 2024) MoST: Motion Style Transformer between Diverse Action Contents, Kim et al.


  • (ICLR 2024) GenMoStyle: Generative Human Motion Stylization in Latent Space, Guo et al.


## Human-Object Interaction



    2025




    • (NeurIPS 2025) HHOI: Learning to Generate Human-Human-Object Interactions from Textual Descriptions, Na et al.


    • (ACM MM 2025) PA-HOI: A Physics-Aware Human and Object Interaction Dataset, Wang et al.


    • (ACM MM 2025) OnlineHOI: Towards Online Human-Object Interaction Generation and Perception, Ji et al.


    • (ICCV 2025) Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions, Xu et al.


    • (ICCV 2025) TriDi: Trilateral Diffusion of 3D Humans, Objects and Interactions, Petrov et al.


    • (ICCV 2025) SMGDiff: Soccer Motion Generation using diffusion probabilistic models, Yang et al.


    • (ICCV 2025) SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis, He et al.


    • (ICCV 2025) Wu et al.: Human-Object Interaction from Human-Level Instructions, Wu et al.


    • (ICCV 2025) HUMOTO: A 4D Dataset of Mocap Human Object Interactions, Lu et al.


    • (SIGGRAPH 2025) PhysicsFC: Learning User-Controlled Skills for a Physics-Based Football Player Controller, Kim et al.


    • (SIGGRAPH 2025) SkillMimic-v2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations, Yu et al.


    • (Bioengineering 2025) MeLLO: The Utah Manipulation and Locomotion of Large Objects (MeLLO) Data Library, Luttmer et al.


    • (CVPR 2025) ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation, Zeng et al.


    • (CVPR 2025) HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models, Huang et al.


    • (CVPR 2025) Hui et al.: An Image-like Diffusion Method for Human-Object Interaction Detection, Hui et al.


    • (CVPR 2025) PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation, Hu et al.


    • (CVPR 2025) InteractVLM: 3D Interaction Reasoning from 2D Foundational Models, Dwivedi et al.


    • (CVPR 2025) PICO: Reconstructing 3D People In Contact with Objects, Cseke et al.


    • (CVPR 2025) EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild, Liu et al.


    • (CVPR 2025) FIction: 4D Future Interaction Prediction from Video, Ashutosh et al.


    • (CVPR 2025) ROG: Guiding Human-Object Interactions with Rich Geometry and Relations, Xue et al.


    • (CVPR 2025) SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance, Cong et al.


    • (CVPR 2025) Phys-Reach-Grasp: Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References, Li et al.


    • (CVPR 2025) ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions, Kim et al.


    • (CVPR 2025) InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions, Xu et al.


    • (CVPR 2025) CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement, Zhang et al.


    • (CVPR 2025) InteractAnything: Zero-shot Human Object-Interaction Synthesis via LLM Feedback and Object Affordance Parsing, Zhang et al.


    • (CVPR 2025) SkillMimic: Learning Reusable Basketball Skills from Demonstrations, Wang et al.


    • (CVPR 2025) MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data, Wang et al.


    • (AAAI 2025) ARDHOI: Auto-Regressive Diffusion for Generating 3D Human-Object Interactions, Geng et al.


    • (AAAI 2025) DiffGrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model, Zhang et al.


    • (3DV 2025) Paschalidis et al.: 3D Whole-body Grasp Synthesis with Directional Controllability, Paschalidis et al.


    • (3DV 2025) InterTrack: Tracking Human Object Interaction without Object Templates, Xie et al.


    • (3DV 2025) FORCE: Dataset and Method for Intuitive Physics Guided Human-object Interaction, Zhang et al.


    • (TPAMI 2025) MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing, Hou et al.


    • (TPAMI 2025) EigenActor: Variant Body-Object Interaction Generation Evolved from Invariant Action Basis Reasoning, Guo et al.


    • (ArXiv 2025) InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects, Cai et al.


    • (ArXiv 2025) InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos, Zhang et al.


    • (ArXiv 2025) ECHO: Ego-Centric modeling of Human-Object interactions, Petrov et al.


    • (ArXiv 2025) CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion, Lin et al.


    • (ArXiv 2025) HOI-Dyn: Learning Interaction Dynamics for Human-Object Motion Diffusion, Wu et al.


    • (ArXiv 2025) HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization, Ron et al.


    • (ArXiv 2025) GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects, Li et al.


    • (ArXiv 2025) HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance, Li et al.


    • (ArXiv 2025) HOSIG: Full-Body Human-Object-Scene Interaction Generation, Yao et al.


    • (ArXiv 2025) CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects, Pi et al.


    • (ArXiv 2025) MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation, Tessler et al.


    • (ArXiv 2025) UniHM: Universal Human Motion Generation with Object Interactions in Indoor Scenes, Geng et al.


    • (ArXiv 2025) EJIM: Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation, Huang et al.


    • (ArXiv 2025) ZeroHOI: Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors, Lou et al.


    • (ArXiv 2025) RMD-HOI: Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics, Deng et al.


    • (ArXiv 2025) Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction, Jiang et al.



    2024




    • (ArXiv 2024) CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions, Lu et al.


    • (ArXiv 2024) OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains, Zhang et al.


    • (ArXiv 2024) COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models, Daiya et al.


    • (NeurIPS 2024) HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid, Xu et al.


    • (NeurIPS 2024) OmniGrasp: Grasping Diverse Objects with Simulated Humanoids, Luo et al.


    • (NeurIPS 2024) EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views, Yang et al.


    • (NeurIPS 2024) CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics, Gao et al.


    • (NeurIPS 2024) InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction, Xu et al.


    • (NeurIPS 2024) PiMForce: Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation, Seo et al.


    • (ECCV 2024) InterFusion: Text-Driven Generation of 3D Human-Object Interaction, Dai et al.


    • (ECCV 2024) CHOIS: Controllable Human-Object Interaction Synthesis, Li et al.


    • (ECCV 2024) F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions, Yang et al.


    • (ECCV 2024) HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects, Lv et al.


    • (SIGGRAPH 2024) PhysicsPingPong: Strategy and Skill Learning for Physics-based Table Tennis Animation, Wang et al.


    • (CVPR 2024) NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis, Kulkarni et al.


    • (CVPR 2024) HOI Animator: Generating Text-Prompt Human-Object Animations using Novel Perceptive Diffusion Models, Son et al.


    • (CVPR 2024) CG-HOI: Contact-Guided 3D Human-Object Interaction Generation, Diller et al.


    • (IJCV 2024) InterCap: Joint Markerless 3D Tracking of Humans and Objects in Interaction, Huang et al.


    • (3DV 2024) Phys-Fullbody-Grasp: Physically Plausible Full-Body Hand-Object Interaction Synthesis, Braun et al.


    • (3DV 2024) GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency, Taheri et al.


    • (AAAI 2024) FAVOR: Full-Body AR-driven Virtual Object Rearrangement Guided by Instruction Text, Li et al.



    2023 and earlier




    • (SIGGRAPH Asia 2023) OMOMO: Object Motion Guided Human Motion Synthesis, Li et al.


    • (ICCV 2023) CHAIRS: Full-Body Articulated Human-Object Interaction, Jiang et al.


    • (ICCV 2023) HGHOI: Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models, Pi et al.


    • (ICCV 2023) InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion, Xu et al.


    • (CVPR 2023) Object Pop Up: Can we infer 3D objects and their poses from human interactions alone? Petrov et al.


    • (CVPR 2023) ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation, Fan et al.


    • (ECCV 2022) TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement, Zhou et al.


    • (ECCV 2022) COUCH: Towards Controllable Human-Chair Interactions, Zhang et al.


    • (ECCV 2022) SAGA: Stochastic Whole-Body Grasping with Contact, Wu et al.


    • (CVPR 2022) GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping, Taheri et al.


    • (CVPR 2022) BEHAVE: Dataset and Method for Tracking Human Object Interactions, Bhatnagar et al.


    • (ECCV 2020) GRAB: A Dataset of Whole-Body Human Grasping of Objects, Taheri et al.



## Human-Scene Interaction



    2025




    • (ICCV 2025) Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model, Cao et al.


    • (ICCV 2025) SceneMI: Motion In-Betweening for Modeling Human-Scene Interactions, Hwang et al.


    • (ICCV 2025) SIMS: Simulating Human-Scene Interactions with Real World Script Planning, Wang et al.


    • (ICCV 2025) Lim et al.: Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene, Lim et al.


    • (ICME 2025) TSTMotion: Training-free Scene-aware Text-to-motion Generation, Guo et al.


    • (CVPR 2025) HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction, Wang et al.


    • (CVPR 2025) Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes, Yu et al.


    • (CVPR 2025) Yi et al.: Estimating Body and Hand Motion in an Ego-sensed World, Yi et al.


    • (CVPR 2025) EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling, Xia et al.


    • (CVPR 2025) TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization, Pan et al.


    • (ICLR 2025) Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes, Chen et al.


    • (3DV 2025) Paschalidis et al.: 3D Whole-body Grasp Synthesis with Directional Controllability, Paschalidis et al.


    • (WACV 2025) GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts, Milacski et al.


    • (ArXiv 2025) Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach, Hatano et al.


    • (ArXiv 2025) Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts, Liu et al.


    • (ArXiv 2025) SSOMotion: Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy, Cho et al.


    • (ArXiv 2025) SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion, Cho et al.


    • (ArXiv 2025) FantasyHSI: Video-Generation-Centric 4D Human Synthesis In Any Scene through A Graph-based Multi-Agent Framework, Mu et al.


    • (ArXiv 2025) Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions, Li et al.


    • (ArXiv 2025) GenHSI: Controllable Generation of Human-Scene Interaction Videos, Li et al.


    • (ArXiv 2025) RMD-HOI: Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics, Deng et al.


    • (ArXiv 2025) HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding, Zhao et al.


    • (ArXiv 2025) Jointly Understand Your Command and Intention: Reciprocal Co-Evolution between Scene-Aware 3D Human Motion Synthesis and Analysis, Gao et al.



    2024




    • (ArXiv 2024) ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation, Li et al.


    • (ArXiv 2024) Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking, Liu et al.


    • (ArXiv 2024) SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control, Zhang et al.


    • (ArXiv 2024) Diffusion Implicit Policy: Diffusion Implicit Policy for Unpaired Scene-aware Motion synthesis, Gong et al.


    • (ArXiv 2024) LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment, Cong et al.


    • (SIGGRAPH Asia 2024) LINGO: Autonomous Character-Scene Interaction Synthesis from Text Instruction, Jiang et al.


    • (NeurIPS 2024) DiMoP3D: Harmonizing Stochasticity and Determinism: Scene-responsive Diverse Human Motion Prediction, Lou et al.


    • (ECCV 2024) MOB: Revisit Human-Scene Interaction via Space Occupancy, Liu et al.


    • (ECCV 2024) TesMo: Generating Human Interaction Motions in Scenes with Text Control, Yi et al.


    • (ECCV 2024 Workshop) SAST: Massively Multi-Person 3D Human Motion Forecasting with Scene Context, Mueller et al.


    • (Eurographics 2024) Kang et al.: Learning Climbing Controllers for Physics-Based Characters, Kang et al.


    • (CVPR 2024) Afford-Motion: Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance, Wang et al.


    • (CVPR 2024) GenZI: Zero-Shot 3D Human-Scene Interaction Generation, Li et al.


    • (CVPR 2024) Cen et al.: Generating Human Motion in 3D Scenes from Text Descriptions, Cen et al.


    • (CVPR 2024) TRUMANS: Scaling Up Dynamic Human-Scene Interaction Modeling, Jiang et al.


    • (ICLR 2024) UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts, Xiao et al.


    • (3DV 2024) Purposer: Putting Human Motion Generation in Context, Ugrinovic et al.


    • (3DV 2024) InterScene: Synthesizing Physically Plausible Human Motions in 3D Scenes, Pan et al.


    • (3DV 2024) Mir et al.: Generating Continual Human Motion in Diverse 3D Scenes, Mir et al.



    2023 and earlier




    • (ICCV 2023) DIMOS: Synthesizing Diverse Human Motions in 3D Indoor Scenes, Zhao et al.


    • (ICCV 2023) LAMA: Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments, Lee et al.


    • (ICCV 2023) Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning, Xuan et al.


    • (CVPR 2023) CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-Scene Interactions, Yan et al.


    • (CVPR 2023) Scene-Ego: Scene-aware Egocentric 3D Human Pose Estimation, Wang et al.


    • (CVPR 2023) SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments, Dai et al.


    • (CVPR 2023) CIRCLE: Capture in Rich Contextual Environments, Araujo et al.


    • (CVPR 2023) SceneDiffuser: Diffusion-based Generation, Optimization, and Planning in 3D Scenes, Huang et al.


    • (CVPR 2023) MIME: Human-Aware 3D Scene Generation, Yi et al.


    • (SIGGRAPH 2023) PMP: Learning to Physically Interact with Environments using Part-wise Motion Priors, Bae et al.


    • (SIGGRAPH 2023) QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors, Lee et al.


    • (SIGGRAPH 2023) Hassan et al.: Synthesizing Physical Character-Scene Interactions, Hassan et al.


    • (NeurIPS 2022) Mao et al.: Contact-Aware Human Motion Forecasting, Mao et al.


    • (NeurIPS 2022) HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes, Wang et al.


    • (NeurIPS 2022) EmbodiedPose: Embodied Scene-aware Human Pose Estimation, Luo et al.


    • (ECCV 2022) GIMO: Gaze-Informed Human Motion Prediction in Context, Zheng et al.


    • (ECCV 2022) COINS: Compositional Human-Scene Interaction Synthesis with Semantic Control, Zhao et al.


    • (CVPR 2022) Wang et al.: Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis, Wang et al.


    • (CVPR 2022) GAMMA: The Wanderings of Odysseus in 3D Scenes, Zhang et al.


    • (ICCV 2021) SAMP: Stochastic Scene-Aware Motion Prediction, Hassan et al.


    • (ICCV 2021) LEMO: Learning Motion Priors for 4D Human Body Capture in 3D Scenes, Zhang et al.


    • (3DV 2020) PLACE: Proximity Learning of Articulation and Contact in 3D Environments, Zhang et al.


    • (SIGGRAPH 2020) Starke et al.: Local motion phases for learning multi-contact character movements, Starke et al.


    • (CVPR 2020) PSI: Generating 3D People in Scenes without People, Zhang et al.


    • (SIGGRAPH Asia 2019) NSM: Neural State Machine for Character-Scene Interactions, Starke et al.


    • (ICCV 2019) PROX: Resolving 3D Human Pose Ambiguities with 3D Scene Constraints, Hassan et al.



## Human-Human Interaction



  • (ICCV 2025) Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation, Liu et al.


  • (ICCV 2025) Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions, Xu et al.


  • (ICCV 2025) Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis, Ji et al.


  • (ICCV 2025) PINO: Person-Interaction Noise Optimization for Long-Duration and Customizable Motion Generation of Arbitrary-Sized Groups, Ota et al.


  • (SIGGRAPH 2025) Xu et al.: Multi-Person Interaction Generation from Two-Person Motion Priors, Xu et al.


  • (SIGGRAPH 2025) DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling, Ghosh et al.


  • (CVPR 2025) TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation, Wang et al.


  • (ICLR 2025) Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation, Tan et al.


  • (ICLR 2025) Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation, Cen et al.


  • (ICLR 2025) InterMask: 3D Human Interaction Generation via Collaborative Masked Modelling, Javed et al.


  • (3DV 2025) Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting, Liu et al.


  • (AAAI 2026) InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE, Wang et al.


  • (ArXiv 2025) Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models, Ruiz-Ponce et al.


  • (ArXiv 2025) Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation, Wu et al.


  • (ArXiv 2025) InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios, Ho et al.


  • (ArXiv 2025) E-React: Towards Emotionally Controlled Synthesis of Human Reactions, Zhu et al.


  • (ArXiv 2025) Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset, Agrawal et al.


  • (ArXiv 2025) MAMMA: Markerless & Automatic Multi-Person Motion Action Capture, Cuevas-Velasquez et al.


  • (ArXiv 2025) PhysInter: Integrating Physical Mapping for High-Fidelity Human Interaction Generation, Yao et al.


  • (ArXiv 2025) InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba, Wu et al.


  • (ArXiv 2025) MARRS: Masked Autoregressive Unit-based Reaction Synthesis, Wang et al.


  • (ArXiv 2025) SocialGen: Modeling Multi-Human Social Interaction with Language Models, Yu et al.


  • (ArXiv 2025) ARFlow: Human Action-Reaction Flow Matching with Physical Guidance, Jiang et al.


  • (ArXiv 2025) Fan et al.: 3D Human Interaction Generation: A Survey, Fan et al.


  • (ArXiv 2025) Invisible Strings: Revealing Latent Dancer-to-Dancer Interactions with Graph Neural Networks, Zerkowski et al.


  • (ArXiv 2025)