{"id":26159676,"url":"https://github.com/DepthAnything/PromptDA","last_synced_at":"2025-03-11T11:33:34.126Z","repository":{"id":268606775,"uuid":"904578990","full_name":"DepthAnything/PromptDA","owner":"DepthAnything","description":"[CVPR 2025] Prompt Depth Anything","archived":false,"fork":false,"pushed_at":"2025-03-04T15:13:26.000Z","size":33019,"stargazers_count":586,"open_issues_count":11,"forks_count":32,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-03-04T16:26:28.051Z","etag":null,"topics":["3d-reconstruction","4d-reconstruction","depth-estimation","robotics-grasping"],"latest_commit_sha":null,"homepage":"https://promptda.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DepthAnything.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-17T06:56:47.000Z","updated_at":"2025-03-04T15:13:30.000Z","dependencies_parsed_at":"2025-02-17T16:39:18.219Z","dependency_job_id":null,"html_url":"https://github.com/DepthAnything/PromptDA","commit_stats":null,"previous_names":["depthanything/promptda"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FPromptDA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FPromptDA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FPromptDA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FPromptDA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DepthAnything","download_url":"https://codeload.github.com/DepthAnything/PromptDA/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243025969,"owners_count":20223902,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-reconstruction","4d-reconstruction","depth-estimation","robotics-grasping"],"created_at":"2025-03-11T11:33:28.463Z","updated_at":"2025-03-11T11:33:34.086Z","avatar_url":"https://github.com/DepthAnything.png","language":"Python","readme":"# Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation\n### [Project Page](https://promptda.github.io/) | [Paper](https://promptda.github.io/assets/main_paper_with_supp.pdf) | [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/PromptDA) | [Interactive Results](https://promptda.github.io/interactive.html) | [Data](https://promptda.github.io/)\n\n\u003e Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation  \n\u003e [Haotong Lin](https://haotongl.github.io/),\n[Sida Peng](https://pengsida.net/),\n[Jingxiao Chen](https://scholar.google.com/citations?user=-zs1V28AAAAJ),\n[Songyou Peng](https://pengsongyou.github.io/),\n[Jiaming Sun](https://jiamingsun.me/),\n[Minghuan Liu](https://minghuanliu.com/),\n[Hujun Bao](http://www.cad.zju.edu.cn/home/bao/),\n[Jiashi Feng](https://scholar.google.com/citations?user=Q8iay0gAAAAJ),\n[Xiaowei Zhou](https://www.xzhou.me/),\n[Bingyi Kang](https://bingykang.github.io/)  \n\u003e CVPR 2025\n\n![teaser](assets/teaser.gif)\n\n\n## 🛠️ Installation\n\n\u003cdetails\u003e \u003csummary\u003e Setting up the environment \u003c/summary\u003e\n\n```bash\ngit clone https://github.com/DepthAnything/PromptDA.git\ncd PromptDA\npip install -r requirements.txt\npip install -e .\nsudo apt install ffmpeg  # for video generation\n```\n\u003c/details\u003e\n\u003cdetails\u003e \u003csummary\u003e Pre-trained Models \u003c/summary\u003e\n\n| Model | Params | Checkpoint |\n|:-|-:|:-:|\n| Prompt-Depth-Anything-Large | 340M | [Download](https://huggingface.co/depth-anything/prompt-depth-anything-vitl/resolve/main/model.ckpt) |\n| Prompt-Depth-Anything-Small | 25.1M | [Download](https://huggingface.co/depth-anything/prompt-depth-anything-vits/resolve/main/model.ckpt) |\n| Prompt-Depth-Anything-Small-Transparent | 25.1M | [Download](https://huggingface.co/depth-anything/prompt-depth-anything-vits-transparent/resolve/main/model.ckpt) |\n\nOnly Prompt-Depth-Anything-Large is used to benchmark in our paper. Prompt-Depth-Anything-Small-Transparent is further fine-tuned 10K steps with [hammer dataset](https://github.com/Junggy/HAMMER-dataset) with our iPhone lidar simulation method to improve the performance on transparent objects.\n\n\u003c/details\u003e\n\n\n## 🚀 Usage\n\u003cdetails\u003e \u003csummary\u003e Example usage \u003c/summary\u003e\n\n```python\nfrom promptda.promptda import PromptDA\nfrom promptda.utils.io_wrapper import load_image, load_depth, save_depth\n\nDEVICE = 'cuda'\nimage_path = \"assets/example_images/image.jpg\"\nprompt_depth_path = \"assets/example_images/arkit_depth.png\"\nimage = load_image(image_path).to(DEVICE)\nprompt_depth = load_depth(prompt_depth_path).to(DEVICE) # 192x256, ARKit LiDAR depth in meters\n\nmodel = PromptDA.from_pretrained(\"depth-anything/prompt-depth-anything-vitl\").to(DEVICE).eval()\ndepth = model.predict(image, prompt_depth) # HxW, depth in meters\n\nsave_depth(depth, prompt_depth=prompt_depth, image=image)\n```\n\u003c/details\u003e\n\n\n## 📸 Running on your own capture\n\nYou can use [Stray Scanner App](https://apps.apple.com/us/app/stray-scanner/id1557051662) to capture your own data, which requires iPhone 12 Pro or later Pro models, iPad 2020 Pro or later Pro models. We setup a [Hugging Face Space](https://huggingface.co/spaces/depth-anything/PromptDA) for you to quickly test our model. If you want to obtain video results, please follow the following steps.\n\n\u003cdetails\u003e \u003csummary\u003e Testing steps \u003c/summary\u003e\n\n1. Capture a scene with the Stray Scanner App. (The charging port is preferred to face downward or to the right.)\n2. Use the iPhone Files App to compress it into a zip file and transfer it to your computer. Here is an [example screen recording](https://haotongl.github.io/promptda/assets/ScreenRecording_12-16-2024.mp4).\n3. Run the following commands to infer our model and generate the video results.\n```bash\nexport PATH_TO_ZIP_FILE=data/8b98276b0a.zip # Replace with your own zip file path\nexport PATH_TO_SAVE_FOLDER=data/8b98276b0a_results # Replace with your own save folder path\npython3 -m promptda.scripts.infer_stray_scan --input_path ${PATH_TO_ZIP_FILE} --output_path ${PATH_TO_SAVE_FOLDER}\npython3 -m promptda.scripts.generate_video process_stray_scan --input_path ${PATH_TO_ZIP_FILE} --result_path ${PATH_TO_SAVE_FOLDER}\nffmpeg -framerate 60 -i ${PATH_TO_SAVE_FOLDER}/%06d_smooth.jpg  -c:v libx264 -pix_fmt yuv420p ${PATH_TO_SAVE_FOLDER}.mp4\n```\n\u003c/details\u003e\n\n\n## 👏 Acknowledgements\nWe thank the generous support from Prof. [Weinan Zhang](https://wnzhang.net/) for robot experiments, including the space, objects and the Unitree H1 robot. We also thank [Zhengbang Zhu](https://scholar.google.com/citations?user=ozatRA0AAAAJ), Jiahang Cao, Xinyao Li, Wentao Dong for their help in setting up the robot platform and collecting robot data.\n\n## 📚 Citation\nIf you find this code useful for your research, please use the following BibTeX entry\n```\n@inproceedings{lin2024promptda,\n  title={Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation},\n  author={Lin, Haotong and Peng, Sida and Chen, Jingxiao and Peng, Songyou and Sun, Jiaming and Liu, Minghuan and Bao, Hujun and Feng, Jiashi and Zhou, Xiaowei and Kang, Bingyi},\n  journal={arXiv},\n  year={2024}\n}\n```\n","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDepthAnything%2FPromptDA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDepthAnything%2FPromptDA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDepthAnything%2FPromptDA/lists"}