{"id":15033477,"url":"https://github.com/depthanything/depth-anything-v2","last_synced_at":"2025-05-12T20:52:28.965Z","repository":{"id":245151621,"uuid":"814827405","full_name":"DepthAnything/Depth-Anything-V2","owner":"DepthAnything","description":"[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation","archived":false,"fork":false,"pushed_at":"2025-01-22T07:53:27.000Z","size":46233,"stargazers_count":5308,"open_issues_count":202,"forks_count":479,"subscribers_count":45,"default_branch":"main","last_synced_at":"2025-04-23T18:41:05.169Z","etag":null,"topics":["monocular-depth-estimation"],"latest_commit_sha":null,"homepage":"https://depth-anything-v2.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DepthAnything.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-13T19:39:39.000Z","updated_at":"2025-04-23T11:35:47.000Z","dependencies_parsed_at":"2025-02-20T01:01:33.298Z","dependency_job_id":"98ab6a4b-3d22-49c2-8d1b-6317b134fe6a","html_url":"https://github.com/DepthAnything/Depth-Anything-V2","commit_stats":{"total_commits":17,"total_committers":3,"mean_commits":5.666666666666667,"dds":"0.17647058823529416","last_synced_commit":"28ad5a0797dfb8ac76d1e3dcddbe2160cbcc6c8d"},"previous_names":["depthanything/depth-anything-v2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FDepth-Anything-V2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FDepth-Anything-V2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FDepth-Anything-V2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DepthAnything%2FDepth-Anything-V2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DepthAnything","download_url":"https://codeload.github.com/DepthAnything/Depth-Anything-V2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253821609,"owners_count":21969738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["monocular-depth-estimation"],"created_at":"2024-09-24T20:21:23.503Z","updated_at":"2025-05-12T20:52:28.931Z","avatar_url":"https://github.com/DepthAnything.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eDepth Anything V2\u003c/h1\u003e\n\n[**Lihe Yang**](https://liheyoung.github.io/)\u003csup\u003e1\u003c/sup\u003e · [**Bingyi Kang**](https://bingykang.github.io/)\u003csup\u003e2\u0026dagger;\u003c/sup\u003e · [**Zilong Huang**](http://speedinghzl.github.io/)\u003csup\u003e2\u003c/sup\u003e\n\u003cbr\u003e\n[**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)\u003csup\u003e2\u003c/sup\u003e · [**Hengshuang Zhao**](https://hszhao.github.io/)\u003csup\u003e1*\u003c/sup\u003e\n\n\u003csup\u003e1\u003c/sup\u003eHKU\u0026emsp;\u0026emsp;\u0026emsp;\u003csup\u003e2\u003c/sup\u003eTikTok\n\u003cbr\u003e\n\u0026dagger;project lead\u0026emsp;*corresponding author\n\n\u003ca href=\"https://arxiv.org/abs/2406.09414\"\u003e\u003cimg src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'\u003e\u003c/a\u003e\n\u003ca href='https://depth-anything-v2.github.io'\u003e\u003cimg src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/datasets/depth-anything/DA-2K'\u003e\u003cimg src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'\u003e\u003c/a\u003e\n\u003c/div\u003e\n\nThis work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.\n\n![teaser](assets/teaser.png)\n\n\n## News\n- **2025-01-22:** [Video Depth Anything](https://videodepthanything.github.io) has been released. It generates consistent depth maps for super-long videos (e.g., over 5 minutes).\n- **2024-12-22:** [Prompt Depth Anything](https://promptda.github.io/) has been released. It supports 4K resolution metric depth estimation when low-res LiDAR is used to prompt the DA models.\n- **2024-07-06:** Depth Anything V2 is supported in [Transformers](https://github.com/huggingface/transformers/). See the [instructions](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for convenient usage.\n- **2024-06-25:** Depth Anything is integrated into [Apple Core ML Models](https://developer.apple.com/machine-learning/models/). See the instructions ([V1](https://huggingface.co/apple/coreml-depth-anything-small), [V2](https://huggingface.co/apple/coreml-depth-anything-v2-small)) for usage.\n- **2024-06-22:** We release [smaller metric depth models](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth#pre-trained-models) based on Depth-Anything-V2-Small and Base.\n- **2024-06-20:** Our repository and project page are flagged by GitHub and removed from the public for 6 days. Sorry for the inconvenience.\n- **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.\n\n\n## Pre-trained Models\n\nWe provide **four models** of varying scales for robust relative depth estimation:\n\n| Model | Params | Checkpoint |\n|:-|-:|:-:|\n| Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |\n| Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |\n| Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |\n| Depth-Anything-V2-Giant | 1.3B | Coming soon |\n\n\n## Usage\n\n### Prepraration\n\n```bash\ngit clone https://github.com/DepthAnything/Depth-Anything-V2\ncd Depth-Anything-V2\npip install -r requirements.txt\n```\n\nDownload the checkpoints listed [here](#pre-trained-models) and put them under the `checkpoints` directory.\n\n### Use our models\n```python\nimport cv2\nimport torch\n\nfrom depth_anything_v2.dpt import DepthAnythingV2\n\nDEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'\n\nmodel_configs = {\n    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},\n    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},\n    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},\n    'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}\n}\n\nencoder = 'vitl' # or 'vits', 'vitb', 'vitg'\n\nmodel = DepthAnythingV2(**model_configs[encoder])\nmodel.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))\nmodel = model.to(DEVICE).eval()\n\nraw_img = cv2.imread('your/image/path')\ndepth = model.infer_image(raw_img) # HxW raw depth map in numpy\n```\n\nIf you do not want to clone this repository, you can also load our models through [Transformers](https://github.com/huggingface/transformers/). Below is a simple code snippet. Please refer to the [official page](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for more details.\n\n- Note 1: Make sure you can connect to Hugging Face and have installed the latest Transformers.\n- Note 2: Due to the [upsampling difference](https://github.com/huggingface/transformers/pull/31522#issuecomment-2184123463) between OpenCV (we used) and Pillow (HF used), predictions may differ slightly. So you are more recommended to use our models through the way introduced above.\n```python\nfrom transformers import pipeline\nfrom PIL import Image\n\npipe = pipeline(task=\"depth-estimation\", model=\"depth-anything/Depth-Anything-V2-Small-hf\")\nimage = Image.open('your/image/path')\ndepth = pipe(image)[\"depth\"]\n```\n\n### Running script on *images*\n\n```bash\npython run.py \\\n  --encoder \u003cvits | vitb | vitl | vitg\u003e \\\n  --img-path \u003cpath\u003e --outdir \u003coutdir\u003e \\\n  [--input-size \u003csize\u003e] [--pred-only] [--grayscale]\n```\nOptions:\n- `--img-path`: You can either 1) point it to an image directory storing all interested images, 2) point it to a single image, or 3) point it to a text file storing all image paths.\n- `--input-size` (optional): By default, we use input size `518` for model inference. ***You can increase the size for even more fine-grained results.***\n- `--pred-only` (optional): Only save the predicted depth map, without raw image.\n- `--grayscale` (optional): Save the grayscale depth map, without applying color palette.\n\nFor example:\n```bash\npython run.py --encoder vitl --img-path assets/examples --outdir depth_vis\n```\n\n### Running script on *videos*\n\n```bash\npython run_video.py \\\n  --encoder \u003cvits | vitb | vitl | vitg\u003e \\\n  --video-path assets/examples_video --outdir video_depth_vis \\\n  [--input-size \u003csize\u003e] [--pred-only] [--grayscale]\n```\n\n***Our larger model has better temporal consistency on videos.***\n\n### Gradio demo\n\nTo use our gradio demo locally:\n\n```bash\npython app.py\n```\n\nYou can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).\n\n***Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)).*** In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.\n\n\n## Fine-tuned to Metric Depth Estimation\n\nPlease refer to [metric depth estimation](./metric_depth).\n\n\n## DA-2K Evaluation Benchmark\n\nPlease refer to [DA-2K benchmark](./DA-2K.md).\n\n\n## Community Support\n\n**We sincerely appreciate all the community support for our Depth Anything series. Thank you a lot!**\n\n- Apple Core ML:\n    - https://developer.apple.com/machine-learning/models\n    - https://huggingface.co/apple/coreml-depth-anything-v2-small\n    - https://huggingface.co/apple/coreml-depth-anything-small\n- Transformers:\n    - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2\n    - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything\n- TensorRT:\n    - https://github.com/spacewalk01/depth-anything-tensorrt\n    - https://github.com/zhujiajian98/Depth-Anythingv2-TensorRT-python\n- ONNX: https://github.com/fabio-sim/Depth-Anything-ONNX\n- ComfyUI: https://github.com/kijai/ComfyUI-DepthAnythingV2\n- Transformers.js (real-time depth in web): https://huggingface.co/spaces/Xenova/webgpu-realtime-depth-estimation\n- Android:\n  - https://github.com/shubham0204/Depth-Anything-Android\n  - https://github.com/FeiGeChuanShu/ncnn-android-depth_anything\n\n\n## Acknowledgement\n\nWe are sincerely grateful to the awesome Hugging Face team ([@Pedro Cuenca](https://huggingface.co/pcuenq), [@Niels Rogge](https://huggingface.co/nielsr), [@Merve Noyan](https://huggingface.co/merve), [@Amy Roberts](https://huggingface.co/amyeroberts), et al.) for their huge efforts in supporting our models in Transformers and Apple Core ML.\n\nWe also thank the [DINOv2](https://github.com/facebookresearch/dinov2) team for contributing such impressive models to our community.\n\n\n## LICENSE\n\nDepth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.\n\n\n## Citation\n\nIf you find this project useful, please consider citing:\n\n```bibtex\n@article{depth_anything_v2,\n  title={Depth Anything V2},\n  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},\n  journal={arXiv:2406.09414},\n  year={2024}\n}\n\n@inproceedings{depth_anything_v1,\n  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, \n  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},\n  booktitle={CVPR},\n  year={2024}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdepthanything%2Fdepth-anything-v2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdepthanything%2Fdepth-anything-v2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdepthanything%2Fdepth-anything-v2/lists"}