[CVPR 2025] PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
https://github.com/robingg1/PoseTraj
- Host: GitHub
- URL: https://github.com/robingg1/PoseTraj
- Owner: robingg1
- Created: 2025-03-02T16:58:08.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-03-18T09:44:19.000Z (3 months ago)
- Last Synced: 2025-03-18T10:36:45.432Z (3 months ago)
- Language: Python
- Size: 13.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project:
- awesome-diffusion-categorized
README
# PoseTraj
### [CVPR 2025] PoseTraj: Pose-Aware Trajectory Control in Video Diffusion

Official implementation of the paper ["PoseTraj: Pose-Aware Trajectory Control in Video Diffusion"](https://arxiv.org/abs/2503.16068).
## **Updates**
- [ ] Support Gradio demo / more checkpoints.
- [ ] Release checkpoint on VIPSeg.
- [x] Release training and inference code.
- [x] Release dataset and rendering process.
- [x] Repo initialization.

---
## Abstract
Recent advancements in trajectory-guided video generation have achieved notable progress.
However, existing models still face challenges in generating object motions with potentially changing 6D poses under wide-range rotations, due to limited 3D understanding.
To address this problem, we introduce PoseTraj, a pose-aware video dragging model for generating 3D-aligned motion from 2D trajectories.
Our method adopts a novel two-stage pose-aware pretraining framework, improving 3D understanding across diverse trajectories.
Specifically, we propose a large-scale synthetic dataset, PoseTraj-10k, containing 10k videos of objects following rotational trajectories, and enhance the model's perception of object pose changes by incorporating 3D bounding boxes as intermediate supervision signals.
Following this, we fine-tune the trajectory-controlling module on real-world videos, applying an additional camera-disentanglement module to further refine motion accuracy.
Experiments on various benchmark datasets demonstrate that our method not only excels in 3D pose-aligned dragging for rotational trajectories but also outperforms existing baselines in trajectory accuracy and video quality.

---
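As context for the sections below, drag-based models of this kind condition generation on a 2D trajectory supplied per frame. One common encoding, shown here purely as an illustration (it is not necessarily the encoding PoseTraj uses; the function name and parameters are assumptions), renders the trajectory as a sequence of Gaussian heatmaps:

```python
# Illustrative only: one common way to turn a 2D drag trajectory into a
# per-frame conditioning signal (one Gaussian heatmap per frame).
# PoseTraj's actual encoding may differ; see the training code for the real format.
import numpy as np

def trajectory_to_heatmaps(points, height, width, sigma=5.0):
    """points: (T, 2) array of (x, y) positions, one per frame."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmaps = np.zeros((len(points), height, width), dtype=np.float32)
    for t, (px, py) in enumerate(points):
        heatmaps[t] = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    return heatmaps

# Example: a 14-frame horizontal drag across a 256x256 frame.
traj = np.stack([np.linspace(60, 200, 14), np.full(14, 128)], axis=1)
cond = trajectory_to_heatmaps(traj, 256, 256)  # shape (14, 256, 256)
```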
## Pose-Aware Dragging for Rotational Motions
(Demo grid: each example shows the input image, the drag trajectory, and the generated video.)
## **1. Environment Setup**
### **Step 1: Create and Activate the Environment**
```bash
conda create -n PoseTraj python=3.8
conda activate PoseTraj
pip install -r requirements.txt
```
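Before moving on, it can help to confirm that PyTorch sees your GPU. The check below is a minimal sketch and assumes `requirements.txt` installs PyTorch:

```python
# Quick environment sanity check (assumes torch was installed via requirements.txt).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```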
### **Step 2: Download Model Weights**

Download the SVD model weights from the [Hugging Face hub](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid).
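If you prefer a scripted download, the weights can be fetched with `huggingface_hub` (a minimal sketch, assuming `huggingface_hub` is installed; the `local_dir` path is a placeholder, not a path this repo prescribes):

```python
# Sketch: fetch the SVD weights programmatically.
# Assumptions: huggingface_hub is installed; the target directory is a placeholder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid",
    local_dir="checkpoints/stable-video-diffusion-img2vid",
)
```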
## **2. Dataset Preparation**

You can either use our pre-processed dataset or create your own.

### **Option 1: Download Prebuilt Dataset**
[split1](https://drive.google.com/file/d/17bF1lKoAfCBWDbIMJ2jXRuDmGhj6d0vC/view?usp=drive_link), [split2](https://drive.google.com/file/d/14nsLOFXVB1YUPVjR5hrNNR70k4na9FNo/view?usp=drive_link)
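To script the download instead of using the browser, both splits can be fetched with `gdown` (a sketch, assuming `gdown` is installed separately; the output filenames are placeholders):

```python
# Sketch: download both dataset splits from Google Drive.
# Assumptions: gdown is installed (pip install gdown); filenames are placeholders.
import gdown

splits = {
    "posetraj10k_split1.zip": "17bF1lKoAfCBWDbIMJ2jXRuDmGhj6d0vC",
    "posetraj10k_split2.zip": "14nsLOFXVB1YUPVjR5hrNNR70k4na9FNo",
}
for filename, file_id in splits.items():
    gdown.download(f"https://drive.google.com/uc?id={file_id}", filename, quiet=False)
```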
### **Option 2: Construct Your Own Dataset**
Refer to the detailed steps in `data_render/` to generate your own dataset.

## **3. Running Inference**
To perform inference, simply run:
```bash
python scripts/run_inference_vipseg_json_repro.py
```
A Gradio demo will be supported soon!

## **4. Training Instructions**
### Two-Stage Pretraining:
```bash
sh scripts/start_10k_pretrain.sh
```
### Open-domain Finetuning:
```bash
# with camera-disentanglement
sh scripts/start_ft_cam.sh

# without camera-disentanglement
sh scripts/start_ft.sh
```