GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
- Host: GitHub
- URL: https://github.com/tencentarc/geometrycrafter
- Owner: TencentARC
- License: other
- Created: 2025-03-31T03:38:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-28T03:58:58.000Z (12 months ago)
- Last Synced: 2025-04-28T04:30:26.813Z (12 months ago)
- Topics: depth-estimation, video-to-4d
- Language: Python
- Homepage: https://geometrycrafter.github.io/
- Size: 15.8 MB
- Stars: 252
- Watchers: 12
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
## ___***GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors***___
_**[Tian-Xing Xu<sup>1</sup>](https://scholar.google.com/citations?user=zHp0rMIAAAAJ&hl=zh-CN),
[Xiangjun Gao<sup>3</sup>](https://scholar.google.com/citations?user=qgdesEcAAAAJ&hl=en),
[Wenbo Hu<sup>2 †</sup>](https://wbhu.github.io),
[Xiaoyu Li<sup>2</sup>](https://xiaoyu258.github.io),
[Song-Hai Zhang<sup>1 †</sup>](https://scholar.google.com/citations?user=AWtV-EQAAAAJ&hl=en),
[Ying Shan<sup>2</sup>](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)**_
<sup>1</sup>Tsinghua University
<sup>2</sup>ARC Lab, Tencent PCG
<sup>3</sup>HKUST
## 🔆 Notice
**GeometryCrafter is still under active development!**
We recommend using English in issues, so that developers from around the world can discuss, share experience, and answer questions together. For further implementation details, please contact `xutx21@mails.tsinghua.edu.cn`. For business licensing and other related inquiries, don't hesitate to contact `wbhu@tencent.com`.
If you find GeometryCrafter useful, **please help ⭐ this repo** — stars matter a lot to open-source projects. Thanks!
## 📝 Introduction
We present GeometryCrafter, a novel approach that estimates temporally consistent, high-quality point maps from open-world videos, facilitating downstream applications such as 3D/4D reconstruction and depth-based video editing or generation.
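For intuition, a point map assigns each pixel a 3D coordinate in the camera frame, so a per-pixel depth map is simply its z channel. A minimal NumPy sketch with a synthetic point map (the shapes here are illustrative assumptions, not the repo's exact output format):

```python
import numpy as np

# Synthetic point map: H x W x 3 array of camera-space (x, y, z) per pixel.
H, W = 4, 6
point_map = np.random.rand(H, W, 3).astype(np.float32)
point_map[..., 2] += 1.0  # keep z positive, as for points in front of the camera

# Depth is the z component of each 3D point.
depth = point_map[..., 2]
assert depth.shape == (H, W) and (depth > 0).all()
```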
Release Notes:
- `[28/04/2025]` 🤗🤗🤗 We release our implemented SfM method for in-the-wild videos, based on [SAM2](https://github.com/facebookresearch/sam2), [glue-factory](https://github.com/cvg/glue-factory) and [SpaTracker](https://github.com/henry123-boy/SpaTracker).
- `[14/04/2025]` 🚀🚀🚀 We provide a `low_memory_usage` option in pipeline for saving GPU memory usage, thanks to [calledit](https://github.com/calledit)'s helpful suggestion.
- `[01/04/2025]` 🔥🔥🔥**GeometryCrafter** is released now, have fun!
## 🚀 Quick Start
### Installation
1. Clone this repo:
```bash
git clone --recursive https://github.com/TencentARC/GeometryCrafter
```
2. Install dependencies (please refer to [requirements.txt](requirements.txt)):
```bash
pip install -r requirements.txt
```
### Inference
Run the inference code on our provided demo videos at 1.27 FPS; 110 frames at 1024x576 resolution require a GPU with ~40 GB of memory:
```bash
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 576 --width 1024
# The input video is resized to the target resolution for processing; height and width must be divisible by 64.
# The output point maps are restored to the original resolution before saving.
# Use --downsample_ratio to downsample the input video, or reduce --decode_chunk_size to lower memory usage.
```
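Since the processing resolution must be divisible by 64, a small helper (hypothetical, not part of the repo) can snap an arbitrary video resolution to the nearest valid one before passing `--height`/`--width`:

```python
def nearest_multiple_of_64(x: int) -> int:
    """Round x to the nearest positive multiple of 64."""
    return max(64, round(x / 64) * 64)

# e.g. a 1080x1920 video snaps to 1088x1920
print(nearest_multiple_of_64(1080), nearest_multiple_of_64(1920))  # → 1088 1920
```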
Run the inference code with our deterministic variant at 1.50 FPS:
```bash
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 576 --width 1024 \
--model_type determ
```
Run low-resolution processing at 2.49 FPS, which requires a GPU with ~22 GB of memory:
```bash
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 384 --width 640
```
Run low-resolution processing at 1.76 FPS with <20 GB of memory usage, following the advice of [calledit](https://github.com/calledit) in [Pull Request 1](https://github.com/TencentARC/GeometryCrafter/pull/1):
```bash
python run.py \
--video_path examples/video1.mp4 \
--save_folder workspace/examples_output \
--height 384 --width 640 \
--low_memory_usage True \
--decode_chunk_size 6
```
### Visualization
Visualize the predicted point maps with `Viser`:
```bash
python visualize/vis_point_maps.py \
--video_path examples/video1.mp4 \
--data_path workspace/examples_output/video1.npz
```
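The saved `.npz` file can also be inspected directly with NumPy before writing your own tooling. Since the exact key names depend on the repo's output format, it is safer to enumerate them than to assume; the snippet below writes a tiny stand-in file so it is self-contained — in practice, point it at `workspace/examples_output/video1.npz`:

```python
import numpy as np

# Stand-in file so the snippet runs on its own; replace with the real output path.
np.savez("demo.npz", point_map=np.zeros((2, 4, 6, 3), dtype=np.float32))

data = np.load("demo.npz")
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```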
## 🤖 Gradio Demo
- Online demo: [**GeometryCrafter**](https://huggingface.co/spaces/TencentARC/GeometryCrafter)
- Local demo:
```bash
gradio app.py
```
## 📊 Dataset Evaluation
Please check the `evaluation` folder.
- To create the datasets we use in the paper, run `evaluation/preprocess/gen_{dataset_name}.py`.
- Change `DATA_DIR` and `OUTPUT_DIR` first according to your working environment.
- You will then get the preprocessed datasets, containing the extracted RGB videos and point-map npz files. We also provide a catalog of these files.
- Script to run inference on all datasets:
```bash
bash evaluation/run_batch.sh
```
(Remember to replace the `data_root_dir` and `save_root_dir` with your path.)
- Script to evaluate all datasets (scale-invariant point-map estimation):
```bash
bash evaluation/eval.sh
```
(Remember to replace the `pred_data_root_dir` and `gt_data_root_dir` with your path.)
- Script to evaluate all datasets (affine-invariant depth estimation):
```bash
bash evaluation/eval_depth.sh
```
(Remember to replace the `pred_data_root_dir` and `gt_data_root_dir` with your path.)
- We also provide the comparison results of MoGe and the deterministic variant of our method. You can evaluate these methods under the same protocol by uncommenting the corresponding lines in `evaluation/run.sh`, `evaluation/eval.sh`, `evaluation/run_batch.sh`, and `evaluation/eval_depth.sh`.
## 📷 Camera Pose Estimation for In-the-wild Videos
Leveraging the temporally consistent point maps output by GeometryCrafter, we implement a camera pose estimation method designed for in-the-wild videos. We hope that our work will serve as a launchpad for 4D reconstruction. Our implementation can be summarized as follows:
- Segment the dynamic objects from the video with [SAM2](https://github.com/facebookresearch/sam2). We refer to a huggingface demo [here](https://huggingface.co/spaces/fffiloni/SAM2-Video-Predictor), thanks to [fffiloni](https://huggingface.co/fffiloni)'s great work.
- Find a set of feature points in the static background with SIFT and SuperPoint, as implemented by [glue-factory](https://github.com/cvg/glue-factory).
- Track these points with [SpaTracker](https://github.com/henry123-boy/SpaTracker), which takes the monocular video and metric depth maps as input.
- Use gradient descent to solve the point-set rigid transformation problem (3-DoF rotation and 3-DoF translation), based on the tracking results. More details can be found in our paper.
```bash
# We provide an example here
VIDEO_PATH=examples/video7.mp4
POINT_MAP_PATH=workspace/examples_output/video7.npz
MASK_PATH=examples/video7_mask.mp4
TRACK_DIR=workspace/trackers/video7
SFM_DIR=workspace/sfm/video7
# Download the checkpoints of SpaTracker and Superpoint and put them in the following path
# - pretrained_models/spaT_final.pth
# - pretrained_models/superpoint_v6_from_tf.pth
# Here are the URLs:
# - SpaTracker: https://drive.google.com/drive/folders/1UtzUJLPhJdUg2XvemXXz1oe6KUQKVjsZ?usp=sharing
# - SuperPoint: https://github.com/rpautrat/SuperPoint/raw/master/weights/superpoint_v6_from_tf.pth
python sfm/run_track.py \
--video_path ${VIDEO_PATH} \
--point_map_path ${POINT_MAP_PATH} \
--mask_path ${MASK_PATH} \
--out_dir ${TRACK_DIR} \
--vis_dir ${TRACK_DIR} \
--use_ori_res \
--spatracker_checkpoint pretrained_models/spaT_final.pth \
--superpoint_checkpoint pretrained_models/superpoint_v6_from_tf.pth
python sfm/run.py \
--num_iterations 2000 \
--video_path ${VIDEO_PATH} \
--point_map_path ${POINT_MAP_PATH} \
--mask_path ${MASK_PATH} \
--track_dir ${TRACK_DIR} \
--out_dir ${SFM_DIR} \
--use_ori_res
# You'll find the processed dataset used for 4D reconstruction in ${SFM_DIR}
# Visualize per-frame point maps in the world coordinates
python sfm/vis_points.py \
--sfm_dir ${SFM_DIR}
```
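The last step above aligns two 3D point sets by a rigid transform. The repo solves this by gradient descent over a 3-DoF rotation and 3-DoF translation (see the paper for details); for intuition, the same least-squares problem, without any robustness tricks, also has a classic closed-form solution via SVD (Kabsch/Procrustes), sketched here in NumPy as a simplified alternative:

```python
import numpy as np

def rigid_align(src: np.ndarray, dst: np.ndarray):
    """Find rotation R (3x3) and translation t (3,) minimizing sum ||R @ src_i + t - dst_i||^2."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)       # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Sanity check: recover a known rotation about z plus a translation.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
src = np.random.rand(50, 3)
dst = src @ R_true.T + t_true                 # dst_i = R_true @ src_i + t_true
R, t = rigid_align(src, dst)
assert np.allclose(R, R_true, atol=1e-6) and np.allclose(t, t_true, atol=1e-6)
```

Gradient descent is preferred in the repo's setting because it can incorporate per-point weights and robust losses from the tracking confidences, which the closed-form solution cannot.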
⚠️ Camera pose estimation is **NOT** the primary objective nor the core contribution of GeometryCrafter. This simplified application merely demonstrates the potential of GeometryCrafter for 4D reconstruction. If you find it useful, **please help ⭐ this repo**.
⚠️ According to our experiments, it is less robust in certain cases; camera pose estimation for dynamic videos remains a challenging open problem.
## 🤝 Contributing
- Welcome to open issues and pull requests.
- Welcome to optimize the inference speed and memory usage, e.g., through model quantization, distillation, or other acceleration techniques.
## ❤️ Acknowledgement
We have used codes from other great research work, including [DepthCrafter](https://github.com/Tencent/DepthCrafter), [MoGe](https://github.com/microsoft/moge), [SAM2](https://github.com/facebookresearch/sam2), [glue-factory](https://github.com/cvg/glue-factory) and [SpaTracker](https://github.com/henry123-boy/SpaTracker). We sincerely thank the authors for their awesome work!
## 📜 Citation
If you find this work helpful, please consider citing:
```bibtex
@article{xu2025geometrycrafter,
title={GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors},
author={Xu, Tian-Xing and Gao, Xiangjun and Hu, Wenbo and Li, Xiaoyu and Zhang, Song-Hai and Shan, Ying},
journal={arXiv preprint arXiv:2504.01016},
year={2025}
}
```