{"id":13706230,"url":"https://github.com/lpiccinelli-eth/unidepth","last_synced_at":"2025-05-05T20:30:48.993Z","repository":{"id":229675813,"uuid":"777348319","full_name":"lpiccinelli-eth/UniDepth","owner":"lpiccinelli-eth","description":"Universal Monocular Metric Depth Estimation","archived":false,"fork":false,"pushed_at":"2024-10-15T17:03:31.000Z","size":23771,"stargazers_count":599,"open_issues_count":43,"forks_count":47,"subscribers_count":15,"default_branch":"main","last_synced_at":"2024-10-16T23:01:02.590Z","etag":null,"topics":["3d-reconstruction","computer-vision","depth-estimation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lpiccinelli-eth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-25T17:22:22.000Z","updated_at":"2024-10-16T17:17:19.000Z","dependencies_parsed_at":"2024-06-23T06:16:13.090Z","dependency_job_id":"536b0803-71d5-4026-82ed-169e5ed96cf2","html_url":"https://github.com/lpiccinelli-eth/UniDepth","commit_stats":null,"previous_names":["lpiccinelli-eth/unidepth"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lpiccinelli-eth%2FUniDepth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lpiccinelli-eth%2FUniDepth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lpiccinelli-eth%2FUniDepth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lpiccinelli-eth%2FUniDepth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lpiccinelli-eth","download_url":"https://codeload.github.com/lpiccinelli-eth/UniDepth/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224465736,"owners_count":17315866,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-reconstruction","computer-vision","depth-estimation"],"created_at":"2024-08-02T22:00:53.358Z","updated_at":"2024-11-13T14:30:42.568Z","avatar_url":"https://github.com/lpiccinelli-eth.png","language":"Python","readme":"[![arXiv](https://img.shields.io/badge/arXiv-2403.18913-blue?logo=arxiv\u0026color=%23B31B1B)](https://arxiv.org/abs/2403.18913)\n[![ProjectPage](https://img.shields.io/badge/Project_Page-UniDepth-blue)](https://lpiccinelli-eth.github.io/pub/unidepth/)\n\u003c!-- [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Cooming%20Soon-yellow)](https://huggingface.co/spaces/lpiccinelli/UniDepth) --\u003e\n\n[![KITTI 

[![KITTI Benchmark](https://img.shields.io/badge/KITTI%20Benchmark-1st%20(at%20submission%20time)-orange)](https://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_prediction)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unidepth-universal-monocular-metric-depth/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=unidepth-universal-monocular-metric-depth)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unidepth-universal-monocular-metric-depth/monocular-depth-estimation-on-kitti-eigen)](https://paperswithcode.com/sota/monocular-depth-estimation-on-kitti-eigen?p=unidepth-universal-monocular-metric-depth)


# UniDepth: Universal Monocular Metric Depth Estimation

![](assets/docs/unidepth-banner.png)

> [**UniDepth: Universal Monocular Metric Depth Estimation**](https://arxiv.org/abs/2403.18913),
> Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu,
> CVPR 2024,
> *Paper at [arXiv 2403.18913](https://arxiv.org/pdf/2403.18913.pdf)*


## News and ToDo

- [ ] Release UniDepth on PyPI.
- [ ] Release HuggingFace/Gradio demo.
- [ ] Solve image-corner artifacts (retraining in progress...).
- [x] `12.06.2024`: Release smaller V2 models.
- [x] `01.05.2024`: Release UniDepthV2.
- [x] `02.04.2024`: Release UniDepth as a Python package.
- [x] `01.04.2024`: Inference code and V1 models are released.
- [x] `26.02.2024`: UniDepth is accepted at CVPR 2024! (Highlight :star:)


## Zero-Shot Visualization

### YouTube (The Office - Parkour)
<p align="center">
  <img src="assets/docs/theoffice.gif" alt="animated" />
</p>

### NuScenes (stitched cameras)
<p align="center">
  <img src="assets/docs/nuscenes_surround.gif" alt="animated" />
</p>


## Installation

The requirements below are not strict, but other configurations have not been tested:
- Linux
- Python 3.10+
- CUDA 11.8

Install the environment needed to run UniDepth with:
```shell
export VENV_DIR=<YOUR-VENVS-DIR>
export NAME=Unidepth

python -m venv $VENV_DIR/$NAME
source $VENV_DIR/$NAME/bin/activate

# Install UniDepth and dependencies
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu118

# Install Pillow-SIMD (optional)
pip uninstall pillow
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
```

If you use conda, replace the venv commands as follows:
```shell
python -m venv $VENV_DIR/$NAME      ->  conda create -n $NAME python=3.11
source $VENV_DIR/$NAME/bin/activate ->  conda activate $NAME
```

*Note*: Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).

*Note*: xFormers may raise the runtime error `Triton Error [CUDA]: device kernel image is invalid`.
This is caused by a mismatch between the system-wide CUDA and the CUDA shipped with torch, and it may considerably slow down inference.
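
As a quick sanity check for the CUDA notes above, you can print the CUDA version that torch was built against and compare it with the toolkit reported by `nvcc --version` or `nvidia-smi` (this is a generic PyTorch check, not part of the UniDepth codebase):

```python
import torch

# CUDA version PyTorch was compiled against (should match your system toolkit)
print("torch build CUDA:", torch.version.cuda)
# Whether a GPU is visible at runtime
print("CUDA available:  ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:          ", torch.cuda.get_device_name(0))
```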
Run UniDepth on the given assets to test your installation (you can use this script as a guideline for further usage):
```shell
python ./scripts/demo.py
```
If everything runs correctly, `demo.py` should print: `ARel: 5.13%`.

If you encounter a `Segmentation Fault` after running the demo, you may need to uninstall torch via pip (`pip uninstall torch`) and install the torch version listed in [requirements](requirements.txt) with `conda`.

## Get Started

After installing the dependencies, you can load the pre-trained models easily from [Hugging Face](https://huggingface.co/models?other=UniDepth) as follows:

```python
from unidepth.models import UniDepthV1

model = UniDepthV1.from_pretrained("lpiccinelli/unidepth-v1-vitl14")  # or "lpiccinelli/unidepth-v1-cnvnxtl" for the ConvNext backbone
```

Then you can generate the metric depth estimate and the intrinsics prediction directly from an RGB image only, as follows:

```python
import numpy as np
import torch
from PIL import Image

# Move to CUDA, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Load the RGB image; normalization is taken care of by the model
rgb = torch.from_numpy(np.array(Image.open(image_path))).permute(2, 0, 1)  # C, H, W

predictions = model.infer(rgb)

# Metric depth estimation
depth = predictions["depth"]

# Point cloud in camera coordinates
xyz = predictions["points"]

# Intrinsics prediction
intrinsics = predictions["intrinsics"]
```

You can also use ground-truth intrinsics as input to the model:
```python
intrinsics_path = "assets/demo/intrinsics.npy"

# Load the intrinsics if available
intrinsics = torch.from_numpy(np.load(intrinsics_path))  # 3 x 3

predictions = model.infer(rgb, intrinsics)
```

To use the forward method for your custom training, you should:
1) Take care of the dataloading (see the sketch after the snippet below):
   a) ImageNet normalization
   b) Long-edge-based resizing (and padding) to the input shape provided in `image_shape` under configs
   c) `BxCxHxW` format
   d) If any intrinsics are given, adapt them according to your resizing
2) Format the input data structure as:
```python
data = {"image": rgb, "K": intrinsics}
predictions = model(data, {})
```
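
The following is a minimal sketch of the dataloading steps above, assuming a target `image_shape` of `(H, W)` taken from the configs; the helper name `preprocess_for_unidepth` and the exact padding policy are illustrative, not part of the package:

```python
import torch
import torch.nn.functional as F

# ImageNet statistics used for normalization (step 1a)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess_for_unidepth(rgb, intrinsics, image_shape):
    """rgb: uint8 tensor (C, H, W); intrinsics: (3, 3); image_shape: (H_t, W_t)."""
    _, h, w = rgb.shape
    target_h, target_w = image_shape

    # 1a) ImageNet normalization
    x = rgb.float() / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD

    # 1b) Aspect-preserving resize so the image fits the target shape, then pad
    scale = min(target_h / h, target_w / w)
    new_h, new_w = round(h * scale), round(w * scale)
    x = F.interpolate(x[None], size=(new_h, new_w), mode="bilinear", align_corners=False)
    x = F.pad(x, (0, target_w - new_w, 0, target_h - new_h))  # pad right/bottom (illustrative choice)

    # 1d) Adapt intrinsics to the resizing (focal lengths and principal point scale with the image)
    K = intrinsics.clone().float()
    K[0, 0] *= scale  # fx
    K[1, 1] *= scale  # fy
    K[0, 2] *= scale  # cx
    K[1, 2] *= scale  # cy

    # 1c) BxCxHxW (and Bx3x3) format
    return {"image": x, "K": K[None]}
```

You would then call the model as `predictions = model(preprocess_for_unidepth(rgb, intrinsics, image_shape), {})`, matching step 2 above.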

## Model Zoo

The available models are the following:

| Model | Backbone | Name |
| :-: | :-: | :-: |
| UniDepthV1 | ConvNext-L | [unidepth-v1-cnvnxtl](https://huggingface.co/lpiccinelli/unidepth-v1-cnvnxtl) |
| UniDepthV1 | ViT-L | [unidepth-v1-vitl14](https://huggingface.co/lpiccinelli/unidepth-v1-vitl14) |
| UniDepthV2 | ViT-S | [unidepth-v2-vits14](https://huggingface.co/lpiccinelli/unidepth-v2-vits14) |
| UniDepthV2 | ViT-B | unidepth-v2-vitb14 (Coming Soon) |
| UniDepthV2 | ViT-L | [unidepth-v2-vitl14](https://huggingface.co/lpiccinelli/unidepth-v2-vitl14) |

Please visit [Hugging Face](https://huggingface.co/lpiccinelli) or click on the links above to access the model repositories with weights.
You can load UniDepth as follows, with the `name` variable matching the table above:

```python
from unidepth.models import UniDepthV1, UniDepthV2

model_v1 = UniDepthV1.from_pretrained(f"lpiccinelli/{name}")
model_v2 = UniDepthV2.from_pretrained(f"lpiccinelli/{name}")
```

In addition, we provide loading from TorchHub:

```python
version = "v2"
backbone = "vitl14"

model = torch.hub.load("lpiccinelli-eth/UniDepth", "UniDepth", version=version, backbone=backbone, pretrained=True, trust_repo=True, force_reload=True)
```

You can look at the `UniDepth` function in [hubconf.py](hubconf.py) to see how to instantiate the model from a local file: provide a local `path` at line 34.


## UniDepthV2

Visit the [UniDepthV2 ReadMe](assets/docs/V2_README.md) for a more detailed changelog.
To summarize, the main differences are:
- Input shape and aspect-ratio flexibility.
- Confidence output.
- Revised decoder design.
- Faster inference.
- ONNX support.


## Results

### Metric Depth Estimation
The reported performance is for the UniDepthV1 model; the metric is d1 (higher is better) under zero-shot evaluation. Images shared between SUN-RGBD and NYUv2 are removed from the SUN-RGBD validation set for evaluation.
*: non-zero-shot on NYUv2 and KITTI.

| Model | NYUv2 | SUN-RGBD | ETH3D | Diode (In) | IBims-1 | KITTI | NuScenes | DDAD |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| BTS* | 88.5 | 76.1 | 26.8 | 19.2 | 53.1 | 96.2 | 33.7 | 43.0 |
| AdaBins* | 90.1 | 77.7 | 24.3 | 17.4 | 55.0 | 96.3 | 33.3 | 37.7 |
| NeWCRF* | 92.1 | 75.3 | 35.7 | 20.1 | 53.6 | 97.5 | 44.2 | 45.6 |
| iDisc* | 93.8 | 83.7 | 35.6 | 23.8 | 48.9 | 97.5 | 39.4 | 28.4 |
| ZoeDepth* | 95.2 | 86.7 | 35.0 | 36.9 | 58.0 | 96.5 | 28.3 | 27.2 |
| Metric3D | 92.6 | 15.4 | 45.6 | 39.2 | 79.7 | 97.5 | 72.3 | - |
| UniDepth_ConvNext | 97.2 | 94.8 | 49.8 | 60.2 | 79.7 | 97.2 | 83.3 | 83.2 |
| UniDepth_ViT | 98.4 | 96.6 | 32.6 | 77.1 | 23.9 | 98.6 | 86.2 | 86.4 |
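
For reference, the d1 metric used in the table is the fraction of valid pixels whose depth ratio max(pred/gt, gt/pred) falls below 1.25. A generic sketch of its computation (not code from this repository, and the validity threshold is an assumption):

```python
import torch

def delta1(pred, gt, min_depth=1e-3):
    """d1: fraction of valid pixels with max(pred/gt, gt/pred) < 1.25, in percent."""
    valid = gt > min_depth
    ratio = torch.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return (ratio < 1.25).float().mean().item() * 100
```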

## Contributions

If you find any bug in the code, please report it to Luigi Piccinelli (lpiccinelli@ethz.ch).


## Citation

If you find our work useful in your research, please consider citing our publication:
```bibtex
@inproceedings{piccinelli2024unidepth,
    title     = {{U}ni{D}epth: Universal Monocular Metric Depth Estimation},
    author    = {Piccinelli, Luigi and Yang, Yung-Hsu and Sakaridis, Christos and Segu, Mattia and Li, Siyuan and Van Gool, Luc and Yu, Fisher},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2024}
}
```


## License

This software is released under the Creative Commons BY-NC 4.0 license. You can view a license summary [here](LICENSE).


## Acknowledgement

We would like to express our gratitude to [@niels](https://huggingface.co/nielsr) for helping integrate UniDepth into Hugging Face.

This work is funded by Toyota Motor Europe via the research project [TRACE-Zurich](https://trace.ethz.ch) (Toyota Research on Automated Cars Europe).