{"id":29009881,"url":"https://github.com/tencentarc/geometrycrafter","last_synced_at":"2025-06-25T15:33:35.863Z","repository":{"id":285559124,"uuid":"957742919","full_name":"TencentARC/GeometryCrafter","owner":"TencentARC","description":"GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors","archived":false,"fork":false,"pushed_at":"2025-04-28T03:58:58.000Z","size":16596,"stargazers_count":252,"open_issues_count":2,"forks_count":8,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-04-28T04:30:26.813Z","etag":null,"topics":["depth-estimation","video-to-4d"],"latest_commit_sha":null,"homepage":"https://geometrycrafter.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-31T03:38:56.000Z","updated_at":"2025-04-28T03:59:01.000Z","dependencies_parsed_at":"2025-04-28T04:37:22.540Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/GeometryCrafter","commit_stats":null,"previous_names":["tencentarc/geometrycrafter"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TencentARC/GeometryCrafter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FGeometryCrafter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FGeometryCrafter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FGeometryCrafter/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FGeometryCrafter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/GeometryCrafter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FGeometryCrafter/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261901405,"owners_count":23227593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["depth-estimation","video-to-4d"],"created_at":"2025-06-25T15:33:30.074Z","updated_at":"2025-06-25T15:33:35.851Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","readme":"## ___***GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors***___\n\u003cdiv align=\"center\"\u003e\n\u003cimg src='assets/logo.png' style=\"height:100px\"\u003e\u003c/img\u003e\n\u003cbr\u003e\n\n_**[Tian-Xing Xu\u003csup\u003e1\u003c/sup\u003e](https://scholar.google.com/citations?user=zHp0rMIAAAAJ\u0026hl=zh-CN), \n[Xiangjun Gao\u003csup\u003e3\u003c/sup\u003e](https://scholar.google.com/citations?user=qgdesEcAAAAJ\u0026hl=en), \n[Wenbo Hu\u003csup\u003e2 \u0026dagger;\u003c/sup\u003e](https://wbhu.github.io), \n[Xiaoyu Li\u003csup\u003e2\u003c/sup\u003e](https://xiaoyu258.github.io), \n[Song-Hai Zhang\u003csup\u003e1 \u0026dagger;\u003c/sup\u003e](https://scholar.google.com/citations?user=AWtV-EQAAAAJ\u0026hl=en), \n[Ying 
Shan\u003csup\u003e2\u003c/sup\u003e](https://scholar.google.com/citations?user=4oXBp9UAAAAJ\u0026hl=en)**_\n\u003cbr\u003e\n\u003csup\u003e1\u003c/sup\u003eTsinghua University\n\u003csup\u003e2\u003c/sup\u003eARC Lab, Tencent PCG\n\u003csup\u003e3\u003c/sup\u003eHKUST\n\n![Version](https://img.shields.io/badge/version-1.0.0-blue) \u0026nbsp;\n \u003ca href='https://arxiv.org/abs/2504.01016'\u003e\u003cimg src='https://img.shields.io/badge/arXiv-2504.01016-b31b1b.svg'\u003e\u003c/a\u003e \u0026nbsp;\n \u003ca href='https://geometrycrafter.github.io'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e \u0026nbsp;\n \u003ca href='https://huggingface.co/spaces/TencentARC/GeometryCrafter'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'\u003e\u003c/a\u003e \u0026nbsp;\n\n\u003c/div\u003e\n\n## 🔆 Notice\n\n**GeometryCrafter is still under active development!**\n\nWe recommend that everyone use English to communicate on issues, as this helps developers from around the world discuss, share experiences, and answer questions together. For further implementation details, please contact `xutx21@mails.tsinghua.edu.cn`. For business licensing and other related inquiries, don't hesitate to contact `wbhu@tencent.com`.\n\nIf you find GeometryCrafter useful, **please help ⭐ this repo**, which is important to Open-Source projects. 
Thanks!\n\n## 📝 Introduction\n\nWe present GeometryCrafter, a novel approach that estimates temporally consistent, high-quality point maps from open-world videos, facilitating downstream applications such as 3D/4D reconstruction and depth-based video editing or generation.\n\nRelease Notes:\n- `[28/04/2025]` 🤗🤗🤗 We release our implemented SfM method for in-the-wild videos, based on [SAM2](https://github.com/facebookresearch/sam2), [glue-factory](https://github.com/cvg/glue-factory) and [SpaTracker](https://github.com/henry123-boy/SpaTracker).\n- `[14/04/2025]` 🚀🚀🚀 We provide a `low_memory_usage` option in the pipeline to reduce GPU memory usage, thanks to [calledit](https://github.com/calledit)'s helpful suggestion.\n- `[01/04/2025]` 🔥🔥🔥 **GeometryCrafter** is released now, have fun!\n\n## 🚀 Quick Start\n\n### Installation\n1. Clone this repo:\n```bash\ngit clone --recursive https://github.com/TencentARC/GeometryCrafter\n```\n2. Install dependencies (please refer to [requirements.txt](requirements.txt)):\n```bash\npip install -r requirements.txt\n```\n\n### Inference\n\nRun the inference code on our provided demo videos at 1.27 FPS, which requires a GPU with ~40GB memory for 110 frames at 1024x576 resolution:\n\n```bash\npython run.py \\\n  --video_path examples/video1.mp4 \\\n  --save_folder workspace/examples_output \\\n  --height 576 --width 1024\n  # resize the input video to the target resolution for processing, which should be divisible by 64\n  # the output point maps will be restored to the original resolution before saving\n  # you can use --downsample_ratio to downsample the input video or reduce --decode_chunk_size to lower memory usage\n```\n\nRun the inference code with our deterministic variant at 1.50 FPS:\n\n```bash\npython run.py \\\n  --video_path examples/video1.mp4 \\\n  --save_folder workspace/examples_output \\\n  --height 576 --width 1024 \\\n  --model_type determ\n```\n\nRun low-resolution processing at 2.49 FPS, which requires a GPU with ~22GB 
memory:\n\n```bash\npython run.py \\\n  --video_path examples/video1.mp4 \\\n  --save_folder workspace/examples_output \\\n  --height 384 --width 640\n```\n\nRun low-resolution processing at 1.76 FPS with \u003c20GB memory usage, following the advice of [calledit](https://github.com/calledit) in [Pull Request 1](https://github.com/TencentARC/GeometryCrafter/pull/1):\n\n```bash\npython run.py \\\n  --video_path examples/video1.mp4 \\\n  --save_folder workspace/examples_output \\\n  --height 384 --width 640 \\\n  --low_memory_usage True \\\n  --decode_chunk_size 6\n```\n\n### Visualization\n\nVisualize the predicted point maps with `Viser`:\n\n```bash\npython visualize/vis_point_maps.py \\\n  --video_path examples/video1.mp4 \\\n  --data_path workspace/examples_output/video1.npz\n```\n\n## 🤖 Gradio Demo\n\n- Online demo: [**GeometryCrafter**](https://huggingface.co/spaces/TencentARC/GeometryCrafter)\n- Local demo:\n  ```bash\n  gradio app.py\n  ```\n\n## 📊 Dataset Evaluation\n\nPlease check the `evaluation` folder.\n- To create the datasets we use in the paper, run `evaluation/preprocess/gen_{dataset_name}.py`.\n- You need to change `DATA_DIR` and `OUTPUT_DIR` first according to your working environment.\n- Then you will get the preprocessed datasets containing the extracted RGB videos and point map npz files. 
We also provide the catalog of these files.\n- Script to run inference on all datasets:\n  ```bash\n  bash evaluation/run_batch.sh\n  ```\n  (Remember to replace `data_root_dir` and `save_root_dir` with your paths.)\n- Script to evaluate all datasets (scale-invariant point map estimation):\n  ```bash\n  bash evaluation/eval.sh\n  ```\n  (Remember to replace `pred_data_root_dir` and `gt_data_root_dir` with your paths.)\n- Script to evaluate all datasets (affine-invariant depth estimation):\n  ```bash\n  bash evaluation/eval_depth.sh\n  ```\n  (Remember to replace `pred_data_root_dir` and `gt_data_root_dir` with your paths.)\n- We also provide the comparison results of MoGe and the deterministic variant of our method. You can evaluate these methods under the same protocol by uncommenting the corresponding lines in `evaluation/run.sh`, `evaluation/eval.sh`, `evaluation/run_batch.sh` and `evaluation/eval_depth.sh`.\n\n## 📷 Camera Pose Estimation for In-the-wild Videos\n\nLeveraging the temporally consistent point maps output by GeometryCrafter, we implement a camera pose estimation method designed for in-the-wild videos. We hope that our work will serve as a launchpad for 4D reconstruction. Our implementation can be summarized as follows:\n- Segment the dynamic objects from the video with [SAM2](https://github.com/facebookresearch/sam2). We refer to a Hugging Face demo [here](https://huggingface.co/spaces/fffiloni/SAM2-Video-Predictor), thanks to [fffiloni](https://huggingface.co/fffiloni)'s great work.\n- Find a set of feature points in the static background with SIFT and SuperPoint as implemented in [glue-factory](https://github.com/cvg/glue-factory).\n- Track these points with [SpaTracker](https://github.com/henry123-boy/SpaTracker), which takes the monocular video and metric depth maps as input.\n- Use gradient descent to solve the point-set rigid transformation problem (3-DoF rotation and 3-DoF translation), based on the tracking results. 
More details can be found in our paper.\n\n```bash\n# We provide an example here\nVIDEO_PATH=examples/video7.mp4\nPOINT_MAP_PATH=workspace/examples_output/video7.npz\nMASK_PATH=examples/video7_mask.mp4\nTRACK_DIR=workspace/trackers/video7\nSFM_DIR=workspace/sfm/video7\n\n# Download the checkpoints of SpaTracker and SuperPoint and put them in the following paths\n# - pretrained_models/spaT_final.pth\n# - pretrained_models/superpoint_v6_from_tf.pth\n\n# Here are the URLs\n# - SpaTracker: https://drive.google.com/drive/folders/1UtzUJLPhJdUg2XvemXXz1oe6KUQKVjsZ?usp=sharing\n# - SuperPoint: https://github.com/rpautrat/SuperPoint/raw/master/weights/superpoint_v6_from_tf.pth\n\npython sfm/run_track.py \\\n    --video_path ${VIDEO_PATH} \\\n    --point_map_path ${POINT_MAP_PATH} \\\n    --mask_path ${MASK_PATH} \\\n    --out_dir ${TRACK_DIR} \\\n    --vis_dir ${TRACK_DIR} \\\n    --use_ori_res \\\n    --spatracker_checkpoint pretrained_models/spaT_final.pth \\\n    --superpoint_checkpoint pretrained_models/superpoint_v6_from_tf.pth\n\npython sfm/run.py \\\n    --num_iterations 2000 \\\n    --video_path ${VIDEO_PATH} \\\n    --point_map_path ${POINT_MAP_PATH} \\\n    --mask_path ${MASK_PATH} \\\n    --track_dir ${TRACK_DIR} \\\n    --out_dir ${SFM_DIR} \\\n    --use_ori_res\n\n# You'll find the processed dataset used for 4D reconstruction in ${SFM_DIR}\n# Visualize per-frame point maps in world coordinates\n\npython sfm/vis_points.py \\\n    --sfm_dir ${SFM_DIR}\n\n```\n\n⚠️ Camera pose estimation is **NOT** the primary objective or core contribution of GeometryCrafter. This simplified application merely demonstrates the potential for 4D reconstruction using GeometryCrafter. If you find it useful, **please help ⭐ this repo**.\n\n⚠️ According to our experiments, the method is less robust in certain cases. 
Camera pose estimation for dynamic videos remains a challenging problem for researchers.\n\n## 🤝 Contributing\n\n- Issues and pull requests are welcome.\n- Contributions that optimize inference speed and memory usage are also welcome, e.g., through model quantization, distillation, or other acceleration techniques.\n\n## ❤️ Acknowledgement\n\nWe have used code from other great research works, including [DepthCrafter](https://github.com/Tencent/DepthCrafter), [MoGe](https://github.com/microsoft/moge), [SAM2](https://github.com/facebookresearch/sam2), [glue-factory](https://github.com/cvg/glue-factory) and [SpaTracker](https://github.com/henry123-boy/SpaTracker). We sincerely thank the authors for their awesome work!\n\n## 📜 Citation\n\nIf you find this work helpful, please consider citing:\n\n```BibTeX\n@article{xu2025geometrycrafter,\n  title={GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors},\n  author={Xu, Tian-Xing and Gao, Xiangjun and Hu, Wenbo and Li, Xiaoyu and Zhang, Song-Hai and Shan, Ying},\n  journal={arXiv preprint arXiv:2504.01016},\n  year={2025}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fgeometrycrafter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Fgeometrycrafter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fgeometrycrafter/lists"}