# Unifying Flow, Stereo and Depth Estimation

Haofei Xu · Jing Zhang · Jianfei Cai · Hamid Rezatofighi · Fisher Yu · Dacheng Tao · Andreas Geiger

TPAMI 2023

Paper | Slides | Project Page | Colab | Demo




A unified model for three motion and 3D perception tasks.




We achieve 1st place on the Sintel (clean), Middlebury (rms metric) and Argoverse benchmarks.

This project builds on our previous works:

- [GMFlow: Learning Optical Flow via Global Matching, CVPR 2022, Oral](https://github.com/haofeixu/gmflow)

- [High-Resolution Optical Flow from 1D Attention and Correlation, ICCV 2021, Oral](https://github.com/haofeixu/flow1d)

- [AANet: Adaptive Aggregation Network for Efficient Stereo Matching, CVPR 2020](https://github.com/haofeixu/aanet)

## Updates

- 2025-01-04: Check out [DepthSplat](https://haofeixu.github.io/depthsplat/) for a modern multi-view depth model, which leverages monocular depth ([Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2)) to significantly improve the robustness of UniMatch.

- 2025-01-04: The UniMatch depth model served as the foundational backbone of [MVSplat (ECCV 2024, Oral)](https://donydchen.github.io/mvsplat/) for sparse-view feed-forward 3DGS reconstruction.

## Installation

Our code is developed with PyTorch 1.9.0, CUDA 10.2 and Python 3.8. Higher PyTorch versions should also work well.

We recommend using [conda](https://www.anaconda.com/distribution/) for installation:

```
conda env create -f conda_environment.yml
conda activate unimatch
```

Alternatively, you can install with pip:

```
bash pip_install.sh
```

To use the [depth models from DepthSplat](https://github.com/cvg/depthsplat/blob/main/MODEL_ZOO.md), you need to create a new conda environment with newer dependencies:

```
conda create -y -n depthsplat-depth python=3.10
conda activate depthsplat-depth
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
pip install tensorboard==2.9.1 einops "opencv-python>=4.8.1.78" matplotlib
```
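
To confirm the new environment works, a quick check like the following can help (a minimal sketch; the expected version matches the pip command above, and GPU availability depends on your driver):

```python
# Quick sanity check for the depthsplat-depth environment created above.
import torch

print(torch.__version__)           # expect 2.4.0 per the pip command above
print(torch.cuda.is_available())   # True only if the CUDA 12.4 wheels match your driver
```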

## Model Zoo

A large number of pretrained models with different speed-accuracy trade-offs for flow, stereo and depth are available at [MODEL_ZOO.md](MODEL_ZOO.md).

Check out [DepthSplat's Model Zoo](https://github.com/cvg/depthsplat/blob/main/MODEL_ZOO.md) for better depth models.

We assume the downloaded weights are located under the `pretrained` directory; otherwise, you may need to change the corresponding paths in the scripts.
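
As a quick check that a download succeeded, a checkpoint can be inspected with plain PyTorch (a minimal sketch; the filename below is a placeholder, and the exact keys depend on the checkpoint):

```python
# Inspect a downloaded checkpoint (placeholder filename; use a real file from MODEL_ZOO.md).
import torch

ckpt = torch.load('pretrained/your_checkpoint.pth', map_location='cpu')
print(type(ckpt))
print(list(ckpt.keys()) if isinstance(ckpt, dict) else 'raw state dict')
```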

## Demo

Given an image pair or a video sequence, our code can generate optical flow, disparity and depth predictions.

Please refer to [scripts/gmflow_demo.sh](scripts/gmflow_demo.sh), [scripts/gmstereo_demo.sh](scripts/gmstereo_demo.sh), [scripts/gmdepth_demo.sh](scripts/gmdepth_demo.sh) and [scripts/depthsplat_depth_demo.sh](scripts/depthsplat_depth_demo.sh) for example usages.
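
The demo scripts save color visualizations of the predictions. For reference, the sketch below shows one common way to map a raw optical flow array to a color image; it is an illustration using OpenCV's HSV color wheel, not the repository's own visualization code.

```python
# Illustrative flow visualization (not the repository's own code):
# hue encodes flow direction, brightness encodes flow magnitude.
import cv2
import numpy as np

def flow_to_color(flow: np.ndarray) -> np.ndarray:
    """Convert a flow field of shape (H, W, 2) into a BGR uint8 image."""
    fx = np.ascontiguousarray(flow[..., 0], dtype=np.float32)
    fy = np.ascontiguousarray(flow[..., 1], dtype=np.float32)
    magnitude, angle = cv2.cartToPolar(fx, fy, angleInDegrees=True)
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = (angle / 2).astype(np.uint8)  # OpenCV hue range is [0, 180)
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```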

https://user-images.githubusercontent.com/19343475/199893756-998cb67e-37d7-4323-ab6e-82fd3cbcd529.mp4

## Datasets

The datasets used to train and evaluate our models for all three tasks are given in [DATASETS.md](DATASETS.md).

## Evaluation

The evaluation scripts used to reproduce the numbers in our paper are given in [scripts/gmflow_evaluate.sh](scripts/gmflow_evaluate.sh), [scripts/gmstereo_evaluate.sh](scripts/gmstereo_evaluate.sh) and [scripts/gmdepth_evaluate.sh](scripts/gmdepth_evaluate.sh).
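
For reference, the main optical flow metric reported in the paper is the average end-point error (EPE). A minimal sketch of how it is typically computed (illustrative only; the evaluation scripts above handle this per benchmark, including additional metrics such as KITTI F1):

```python
# End-point error (EPE): mean L2 distance between predicted and ground-truth flow vectors.
import torch

def end_point_error(flow_pred, flow_gt, valid=None):
    """flow_pred, flow_gt: (B, 2, H, W) tensors; valid: optional (B, H, W) mask."""
    epe = torch.norm(flow_pred - flow_gt, p=2, dim=1)  # per-pixel L2 distance, shape (B, H, W)
    if valid is not None:
        epe = epe[valid > 0.5]
    return epe.mean()
```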

To submit to the KITTI, Sintel, Middlebury and ETH3D online test sets, you can run [scripts/gmflow_submission.sh](scripts/gmflow_submission.sh) and [scripts/gmstereo_submission.sh](scripts/gmstereo_submission.sh) to generate the prediction results, which can then be submitted directly.

## Training

All training scripts for different model variants on different datasets can be found in [scripts/*_train.sh](scripts).

We support using tensorboard to monitor and visualize the training process. You can first start a tensorboard session with

```
tensorboard --logdir checkpoints
```

and then access [http://localhost:6006](http://localhost:6006/) in your browser.
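
For context, this is how scalars end up in tensorboard: anything written to the `checkpoints` log directory is picked up by the command above. A minimal logging sketch (not the repository's training loop; the run name is hypothetical):

```python
# Minimal logging sketch: scalars written here show up under `tensorboard --logdir checkpoints`.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='checkpoints/example_run')  # hypothetical run name
for step in range(100):
    loss = 1.0 / (step + 1)                                # placeholder value
    writer.add_scalar('train/loss', loss, global_step=step)
writer.close()
```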

## Citation

```
@article{xu2023unifying,
  title={Unifying Flow, Stereo and Depth Estimation},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Yu, Fisher and Tao, Dacheng and Geiger, Andreas},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}
```

This work is a substantial extension of our previous conference paper [GMFlow (CVPR 2022, Oral)](https://arxiv.org/abs/2111.13680). Please consider citing GMFlow as well if you find this work useful in your research.

```
@inproceedings{xu2022gmflow,
  title={GMFlow: Learning Optical Flow via Global Matching},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Tao, Dacheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8121-8130},
  year={2022}
}
```

Please consider citing [DepthSplat](https://arxiv.org/abs/2410.13862) if you use its depth model in your research.

```
@article{xu2024depthsplat,
  title={DepthSplat: Connecting Gaussian Splatting and Depth},
  author={Xu, Haofei and Peng, Songyou and Wang, Fangjinhua and Blum, Hermann and Barath, Daniel and Geiger, Andreas and Pollefeys, Marc},
  journal={arXiv preprint arXiv:2410.13862},
  year={2024}
}
```

## Acknowledgements

This project would not have been possible without relying on some awesome repos: [RAFT](https://github.com/princeton-vl/RAFT), [LoFTR](https://github.com/zju3dv/LoFTR), [DETR](https://github.com/facebookresearch/detr), [Swin](https://github.com/microsoft/Swin-Transformer), [mmdetection](https://github.com/open-mmlab/mmdetection) and [Detectron2](https://github.com/facebookresearch/detectron2/blob/main/projects/TridentNet/tridentnet/trident_conv.py). We thank the original authors for their excellent work.