[CVPR 2025] PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
https://github.com/robingg1/PoseTraj
- Host: GitHub
- URL: https://github.com/robingg1/PoseTraj
- Owner: robingg1
- Created: 2025-03-02T16:58:08.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-03-18T09:44:19.000Z (3 months ago)
- Last Synced: 2025-03-18T10:36:45.432Z (3 months ago)
- Language: Python
- Size: 13.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project:
- awesome-diffusion-categorized
README
# PoseTraj
### [CVPR 2025] PoseTraj: Pose-Aware Trajectory Control in Video Diffusion

Official implementation of the paper ["PoseTraj: Pose-Aware Trajectory Control in Video Diffusion"](https://arxiv.org/abs/2503.16068).
## **Updates**
- [ ] Support Gradio demo / more checkpoints.
- [ ] Release checkpoint on VIPSeg.
- [x] Release training and inference code.
- [x] Release dataset and rendering process.
- [x] Repo initialization.

---
## Abstract
Recent advancements in trajectory-guided video generation have achieved notable progress.
However, existing models still face challenges in generating object motions with potentially changing 6D poses under wide-range rotations, due to limited 3D understanding.
To address this problem, we introduce PoseTraj, a pose-aware video dragging model for generating 3D-aligned motion from 2D trajectories.
Our method adopts a novel two-stage pose-aware pretraining framework, improving 3D understanding across diverse trajectories.
Specifically, we propose a large-scale synthetic dataset, PoseTraj-10k, containing 10k videos of objects following rotational trajectories, and enhance the model's perception of object pose changes by incorporating 3D bounding boxes as intermediate supervision signals.
Following this, we fine-tune the trajectory-controlling module on real-world videos, applying an additional camera-disentanglement module to further refine motion accuracy.
Experiments on various benchmark datasets demonstrate that our method not only excels in 3D pose-aligned dragging for rotational trajectories but also outperforms existing baselines in trajectory accuracy and video quality.

---
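As context for the sections below, drag-based models of this kind condition generation on a 2D trajectory supplied per frame. One common encoding, shown here purely as an illustration (it is not necessarily the encoding PoseTraj uses; the function name and parameters are assumptions), renders the trajectory as a sequence of Gaussian heatmaps:

```python
# Illustrative only: one common way to turn a 2D drag trajectory into a
# per-frame conditioning signal (one Gaussian heatmap per frame).
# PoseTraj's actual encoding may differ; see the training code for the real format.
import numpy as np

def trajectory_to_heatmaps(points, height, width, sigma=5.0):
    """points: (T, 2) array of (x, y) positions, one per frame."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmaps = np.zeros((len(points), height, width), dtype=np.float32)
    for t, (px, py) in enumerate(points):
        heatmaps[t] = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    return heatmaps

# Example: a 14-frame horizontal drag across a 256x256 frame.
traj = np.stack([np.linspace(60, 200, 14), np.full(14, 128)], axis=1)
cond = trajectory_to_heatmaps(traj, 256, 256)  # shape (14, 256, 256)
```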
## Pose-Aware Dragging for Rotational Motions
(Demo grid: each example shows the input image, the drag trajectory, and the generated video.)
## **1. Environment Setup**
### **Step 1: Create and Activate the Environment**
```bash
conda create -n PoseTraj python=3.8
conda activate PoseTraj
pip install -r requirements.txt
```
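Before moving on, it can help to confirm that PyTorch sees your GPU. The check below is a minimal sketch and assumes `requirements.txt` installs PyTorch:

```python
# Quick environment sanity check (assumes torch was installed via requirements.txt).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```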
### **Step 2: Download Model Weights**

Download the SVD model weights from the [Hugging Face hub](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid).
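If you prefer a scripted download, the weights can be fetched with `huggingface_hub` (a minimal sketch, assuming `huggingface_hub` is installed; the `local_dir` path is a placeholder, not a path this repo prescribes):

```python
# Sketch: fetch the SVD weights programmatically.
# Assumptions: huggingface_hub is installed; the target directory is a placeholder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid",
    local_dir="checkpoints/stable-video-diffusion-img2vid",
)
```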
## **2. Dataset Preparation**

You can either use our pre-processed dataset or create your own.

### **Option 1: Download Prebuilt Dataset**
[split1](https://drive.google.com/file/d/17bF1lKoAfCBWDbIMJ2jXRuDmGhj6d0vC/view?usp=drive_link), [split2](https://drive.google.com/file/d/14nsLOFXVB1YUPVjR5hrNNR70k4na9FNo/view?usp=drive_link)
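To script the download instead of using the browser, both splits can be fetched with `gdown` (a sketch, assuming `gdown` is installed separately; the output filenames are placeholders):

```python
# Sketch: download both dataset splits from Google Drive.
# Assumptions: gdown is installed (pip install gdown); filenames are placeholders.
import gdown

splits = {
    "posetraj10k_split1.zip": "17bF1lKoAfCBWDbIMJ2jXRuDmGhj6d0vC",
    "posetraj10k_split2.zip": "14nsLOFXVB1YUPVjR5hrNNR70k4na9FNo",
}
for filename, file_id in splits.items():
    gdown.download(f"https://drive.google.com/uc?id={file_id}", filename, quiet=False)
```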
### **Option 2: Construct Your Own Dataset**
Refer to the detailed steps in `data_render/` to generate your own dataset.

## **3. Running Inference**
To perform inference, simply run:
```bash
python scripts/run_inference_vipseg_json_repro.py
```
A Gradio demo will be supported soon!

## **4. Training Instructions**
### Two-Stage Pretraining:
```bash
sh scripts/start_10k_pretrain.sh
```
### Open-domain Finetuning:
```bash
# with camera-disentanglement
sh scripts/start_ft_cam.sh

# without camera-disentanglement
sh scripts/start_ft.sh
```