{"id":18600796,"url":"https://github.com/autonomousvision/unimatch","last_synced_at":"2025-05-14T19:07:35.637Z","repository":{"id":63159385,"uuid":"561617511","full_name":"autonomousvision/unimatch","owner":"autonomousvision","description":"[TPAMI'23] Unifying Flow, Stereo and Depth Estimation","archived":false,"fork":false,"pushed_at":"2025-01-04T22:09:34.000Z","size":22474,"stargazers_count":1233,"open_issues_count":9,"forks_count":124,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-04-13T21:33:53.484Z","etag":null,"topics":["correspondence","cross-attention","depth","matching","optical-flow","stereo","transformer","unified-model"],"latest_commit_sha":null,"homepage":"https://haofeixu.github.io/unimatch/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/autonomousvision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-04T04:47:31.000Z","updated_at":"2025-04-13T08:28:50.000Z","dependencies_parsed_at":"2025-01-12T04:32:49.904Z","dependency_job_id":"5c7738a7-88bc-4f58-8154-73240fd934b4","html_url":"https://github.com/autonomousvision/unimatch","commit_stats":{"total_commits":33,"total_committers":5,"mean_commits":6.6,"dds":0.1515151515151515,"last_synced_commit":"95ffabe53adea0bc33a13de302d827d55c600edd"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Funimatch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Funimatch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Funimatch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Funimatch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/autonomousvision","download_url":"https://codeload.github.com/autonomousvision/unimatch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254209859,"owners_count":22032897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["correspondence","cross-attention","depth","matching","optical-flow","stereo","transformer","unified-model"],"created_at":"2024-11-07T02:05:37.623Z","updated_at":"2025-05-14T19:07:30.270Z","avatar_url":"https://github.com/autonomousvision.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003eUnifying Flow, Stereo and Depth Estimation\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://haofeixu.github.io/\"\u003eHaofei Xu\u003c/a\u003e\n    ·\n    \u003ca href=\"https://scholar.google.com/citations?user=9jH5v74AAAAJ\"\u003eJing Zhang\u003c/a\u003e\n    ·\n    \u003ca href=\"https://jianfei-cai.github.io/\"\u003eJianfei Cai\u003c/a\u003e\n    ·\n    \u003ca href=\"https://scholar.google.com/citations?user=VxAuxMwAAAAJ\"\u003eHamid Rezatofighi\u003c/a\u003e\n    ·\n    \u003ca href=\"https://www.yf.io/\"\u003eFisher Yu\u003c/a\u003e\n    ·\n    \u003ca href=\"https://scholar.google.com/citations?user=RwlJNLcAAAAJ\"\u003eDacheng Tao\u003c/a\u003e\n    ·\n    \u003ca href=\"http://www.cvlibs.net/\"\u003eAndreas Geiger\u003c/a\u003e\n  \u003c/p\u003e\n  \u003ch3 align=\"center\"\u003eTPAMI 2023\u003c/h3\u003e\n  \u003ch3 align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2211.05783\"\u003ePaper\u003c/a\u003e | \u003ca href=\"https://haofeixu.github.io/slides/20221228_synced_unimatch.pdf\"\u003eSlides\u003c/a\u003e | \u003ca href=\"https://haofeixu.github.io/unimatch/\"\u003eProject Page\u003c/a\u003e | \u003ca href=\"https://colab.research.google.com/drive/1r5m-xVy3Kw60U-m5VB-aQ98oqqg_6cab?usp=sharing\"\u003eColab\u003c/a\u003e | \u003ca href=\"https://huggingface.co/spaces/haofeixu/unimatch\"\u003eDemo\u003c/a\u003e \u003c/h3\u003e\n  \u003cdiv align=\"center\"\u003e\u003c/div\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003e\n    \u003cimg src=\"https://haofeixu.github.io/unimatch/resources/teaser.png\" alt=\"Logo\" width=\"70%\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003cp align=\"center\"\u003e\nA unified model for three motion and 3D perception tasks.\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003e\n    \u003cimg src=\"https://haofeixu.github.io/unimatch/resources/sota_compare.png\" alt=\"Logo\" width=\"100%\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\nWe achieve the \u003cstrong\u003e1st\u003c/strong\u003e places on Sintel (clean), Middlebury (rms metric) and Argoverse benchmarks.\n\u003c/p\u003e\n\nThis project is developed based on our previous works: \n\n- [GMFlow: Learning Optical Flow via Global Matching, CVPR 2022, Oral](https://github.com/haofeixu/gmflow)\n\n- [High-Resolution Optical Flow from 1D Attention and Correlation, ICCV 2021, Oral](https://github.com/haofeixu/flow1d)\n\n- [AANet: Adaptive Aggregation Network for Efficient Stereo Matching, CVPR 2020](https://github.com/haofeixu/aanet)\n\n\n## Updates\n\n- 2025-01-04: Check out [DepthSplat](https://haofeixu.github.io/depthsplat/) for a modern multi-view depth model, which leverages monocular depth ([Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2)) to significantly improve the robustness of UniMatch.\n\n- 2025-01-04: The UniMatch depth model served as the foundational backbone of [MVSplat (ECCV 2024, Oral)](https://donydchen.github.io/mvsplat/) for sparse-view feed-forward 3DGS reconstruction.\n\n## Installation\n\nOur code is developed based on pytorch 1.9.0, CUDA 10.2 and python 3.8. Higher version pytorch should also work well.\n\nWe recommend using [conda](https://www.anaconda.com/distribution/) for installation:\n\n```\nconda env create -f conda_environment.yml\nconda activate unimatch\n```\n\nAlternatively, we also support installing with pip:\n\n```\nbash pip_install.sh\n```\n\n\nTo use the [depth models from DepthSplat](https://github.com/cvg/depthsplat/blob/main/MODEL_ZOO.md), you need to create a new conda environment with higher version dependencies:\n\n```\nconda create -y -n depthsplat-depth python=3.10\nconda activate depthsplat-depth\npip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124\npip install tensorboard==2.9.1 einops opencv-python\u003e=4.8.1.78 matplotlib\n```\n\n\n## Model Zoo\n\nA large number of pretrained models with different speed-accuracy trade-offs for flow, stereo and depth are available at [MODEL_ZOO.md](MODEL_ZOO.md).\n\nCheck out [DepthSplat's Model Zoo](https://github.com/cvg/depthsplat/blob/main/MODEL_ZOO.md) for better depth models.\n\nWe assume the downloaded weights are located under the `pretrained` directory.\n\nOtherwise, you may need to change the corresponding paths in the scripts.\n\n\n\n## Demo\n\nGiven an image pair or a video sequence, our code supports generating prediction results of optical flow, disparity and depth.\n\nPlease refer to [scripts/gmflow_demo.sh](scripts/gmflow_demo.sh), [scripts/gmstereo_demo.sh](scripts/gmstereo_demo.sh), [scripts/gmdepth_demo.sh](scripts/gmdepth_demo.sh) and [scripts/depthsplat_depth_demo.sh](scripts/depthsplat_depth_demo.sh) for example usages.\n\n\n\n\nhttps://user-images.githubusercontent.com/19343475/199893756-998cb67e-37d7-4323-ab6e-82fd3cbcd529.mp4\n\n\n\n## Datasets\n\nThe datasets used to train and evaluate our models for all three tasks are given in [DATASETS.md](DATASETS.md)\n\n\n\n## Evaluation\n\nThe evaluation scripts used to reproduce the numbers in our paper are given in [scripts/gmflow_evaluate.sh](scripts/gmflow_evaluate.sh), [scripts/gmstereo_evaluate.sh](scripts/gmstereo_evaluate.sh) and [scripts/gmdepth_evaluate.sh](scripts/gmdepth_evaluate.sh).\n\nFor submission to KITTI, Sintel, Middlebury and ETH3D online test sets, you can run [scripts/gmflow_submission.sh](scripts/gmflow_submission.sh) and [scripts/gmstereo_submission.sh](scripts/gmstereo_submission.sh) to generate the prediction results. The results can be submitted directly.\n\n\n\n## Training\n\nAll training scripts for different model variants on different datasets can be found in [scripts/*_train.sh](scripts).\n\nWe support using tensorboard to monitor and visualize the training process. You can first start a tensorboard session with\n\n```\ntensorboard --logdir checkpoints\n```\n\nand then access [http://localhost:6006](http://localhost:6006/) in your browser.\n\n\n\n## Citation\n\n```\n@article{xu2023unifying,\n  title={Unifying Flow, Stereo and Depth Estimation},\n  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Yu, Fisher and Tao, Dacheng and Geiger, Andreas},\n  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n  year={2023}\n}\n```\n\nThis work is a substantial extension of our previous conference paper [GMFlow (CVPR 2022, Oral)](https://arxiv.org/abs/2111.13680), please consider citing GMFlow as well if you found this work useful in your research.\n\n```\n@inproceedings{xu2022gmflow,\n  title={GMFlow: Learning Optical Flow via Global Matching},\n  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Tao, Dacheng},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={8121-8130},\n  year={2022}\n}\n```\n\nPlease consider citing [DepthSplat](https://arxiv.org/abs/2410.13862) if DepthSplat's depth model is used in your research.\n\n```\n@article{xu2024depthsplat,\n      title   = {DepthSplat: Connecting Gaussian Splatting and Depth},\n      author  = {Xu, Haofei and Peng, Songyou and Wang, Fangjinhua and Blum, Hermann and Barath, Daniel and Geiger, Andreas and Pollefeys, Marc},\n      journal = {arXiv preprint arXiv:2410.13862},\n      year    = {2024}\n    }\n```\n\n\n## Acknowledgements\n\nThis project would not have been possible without relying on some awesome repos: [RAFT](https://github.com/princeton-vl/RAFT), [LoFTR](https://github.com/zju3dv/LoFTR), [DETR](https://github.com/facebookresearch/detr), [Swin](https://github.com/microsoft/Swin-Transformer), [mmdetection](https://github.com/open-mmlab/mmdetection) and [Detectron2](https://github.com/facebookresearch/detectron2/blob/main/projects/TridentNet/tridentnet/trident_conv.py). We thank the original authors for their excellent work.\n\n\n\n\n\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautonomousvision%2Funimatch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fautonomousvision%2Funimatch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautonomousvision%2Funimatch/lists"}