https://github.com/cocowy1/adstereo

[TIP 2025] ADStereo: Efficient Stereo Matching with Adaptive Downsampling and Disparity Alignment

Host: GitHub
URL: https://github.com/cocowy1/adstereo
Owner: cocowy1
Created: 2024-10-02T11:34:34.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-02-21T06:45:08.000Z (over 1 year ago)
Last Synced: 2025-03-31T02:33:50.097Z (over 1 year ago)
Topics: depth-estimation, efficient, stereo
Language: Python
Homepage:
Size: 21.9 MB
Stars: 8
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md


# ADStereo: Efficient Stereo Matching with Adaptive Downsampling and Disparity Alignment [TIP 2025](https://ieeexplore.ieee.org/document/10890914)
This paper presents two sampling strategies: the Adaptive Downsampling Module (ADM) and the Disparity Alignment Module (DAM), to prioritize real-time inference while ensuring accuracy. The ADM leverages local features to learn adaptive weights, enabling more effective downsampling while preserving crucial structure information. On the other hand, the DAM employs a learnable interpolation strategy to predict transformation offsets of pixels, thereby mitigating the spatial misalignment issue.
Building upon these modules, we introduce **ADStereo**, a real-time yet accurate network that achieves highly competitive performance on multiple public benchmarks.
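
To make the adaptive downsampling idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the `scores` argument is a hypothetical stand-in for the weights that ADM would learn from local features, and the function name is made up for illustration. Each window's scores are turned into softmax weights, and the output is the weighted average of the window.

```python
import math

def adaptive_downsample(feature, scores, stride=2):
    """Downsample a 2D grid by a weighted average over stride x stride windows.

    `scores` is a hypothetical stand-in for the per-pixel weights that a
    learned module would predict from local features.
    """
    height, width = len(feature), len(feature[0])
    output = []
    for top in range(0, height - stride + 1, stride):
        row = []
        for left in range(0, width - stride + 1, stride):
            cells = [(top + dy, left + dx)
                     for dy in range(stride) for dx in range(stride)]
            # Softmax over the window so the weights are positive and sum to 1.
            peak = max(scores[y][x] for y, x in cells)
            exps = [math.exp(scores[y][x] - peak) for y, x in cells]
            total = sum(exps)
            row.append(sum(e / total * feature[y][x]
                           for e, (y, x) in zip(exps, cells)))
        output.append(row)
    return output

feature = [[1.0, 2.0, 0.0, 0.0],
           [3.0, 4.0, 0.0, 8.0]]
uniform = [[0.0] * 4 for _ in range(2)]
# Equal scores reduce to plain average pooling.
print(adaptive_downsample(feature, uniform))
```

With equal scores the result matches average pooling. Raising the score of a single pixel pulls the output toward that pixel's value, which is how content-dependent weights can preserve thin structures that uniform pooling would blur.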

# Demo on KITTI raw data
The pretrained KITTI model is loaded from the `./fined/KITTI/` folder.

https://github.com/user-attachments/assets/326230a6-871d-47ca-abf2-8a5ac4d959b7

Run `demo_video.py` to perform stereo matching on the raw KITTI sequence.
The video above shows an example result on our system with an RTX a5000ada GPU on Ubuntu 20.04.

# Adaptive Downsampling Module \& Disparity Alignment Module


# Overview

# Newly Added
We introduce a more lightweight model called **ADStereo_fast**, which offers highly competitive performance at a faster speed. It is also included in this repo.

# Quantitative Results
![image](https://github.com/user-attachments/assets/1526061f-c624-47af-8009-08ec8c14be26)

# Model Zoo

All pretrained models are available on [Google Drive: ADStereo](https://drive.google.com/drive/folders/1jdx4-gU8WuytiolZbGDLI-NSUHlQWuH4) and [Google Drive: ADStereo_fast](https://drive.google.com/drive/folders/1WcGgA7OS1lf5JJ3ajbXw-hMtz8cXrQ7k?dmr=1&ec=wgc-drive-globalnav-goto).

We assume the downloaded weights are located under the `./trained` directory.

Otherwise, you may need to change the corresponding paths in the scripts.

# Environment
```
Python 3.9
PyTorch 2.4.0
```
# Create a virtual environment and activate it.
```
conda create -n ADStereo python=3.9
conda activate ADStereo
```

# Dependencies
```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -c nvidia
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install tqdm
pip install chardet
pip install imageio
pip install thop
pip install timm==0.5.4
```
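
After installing, a quick check that every package is importable can save a failed training run later. The sketch below uses only the standard library. The mapping from package name to import name is written out by hand, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Package name (as installed above) -> module name used in `import`.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "opencv-python": "cv2",
    "scikit-image": "skimage",
    "tensorboard": "tensorboard",
    "matplotlib": "matplotlib",
    "tqdm": "tqdm",
    "chardet": "chardet",
    "imageio": "imageio",
    "thop": "thop",
    "timm": "timm",
}

def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose module cannot be found."""
    return [package for package, module in required.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else f"Missing: {missing}")
```

Run it inside the activated `ADStereo` environment. It only checks that the modules can be located, not that their versions match the ones listed above.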

# 1. Prepare training data
To evaluate/train ADStereo, you will need to download the required datasets.

[SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)

[KITTI](https://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)

[Middlebury](https://vision.middlebury.edu/stereo/submit3/)

[ETH3D](https://www.eth3d.net/datasets#low-res-two-view-test-data)

By default `datasets.py` will search for the datasets in these locations.

```bash
DATA
├── KITTI
│   ├── kitti_2012
│   │   ├── training
│   │   └── testing
│   └── kitti_2015
│       ├── training
│       └── testing
├── SceneFlow
│   ├── Driving
│   │   ├── disparity
│   │   └── frames_finalpass
│   ├── FlyingThings3D
│   │   ├── disparity
│   │   └── frames_finalpass
│   └── Monkaa
│       ├── disparity
│       └── frames_finalpass
├── Middlebury
│   ├── trainingH
│   └── trainingH_GT
└── ETH3D
    ├── two_view_training
    └── two_view_training_gt
```
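
Before training, it can help to confirm that the folders are where the scripts expect them. The sketch below is a hypothetical helper, not part of the repo. It assumes the layout above nests as `DATA/<dataset>/<subfolder>` and reports any directory that is missing.

```python
from pathlib import Path

# Sub-directories read off the tree above, relative to the dataset root.
EXPECTED_DIRS = [
    "KITTI/kitti_2012/training",
    "KITTI/kitti_2012/testing",
    "KITTI/kitti_2015/training",
    "KITTI/kitti_2015/testing",
    "SceneFlow/Driving/disparity",
    "SceneFlow/Driving/frames_finalpass",
    "SceneFlow/FlyingThings3D/disparity",
    "SceneFlow/FlyingThings3D/frames_finalpass",
    "SceneFlow/Monkaa/disparity",
    "SceneFlow/Monkaa/frames_finalpass",
    "Middlebury/trainingH",
    "Middlebury/trainingH_GT",
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
]

def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]

if __name__ == "__main__":
    for rel in missing_dirs("DATA"):
        print(f"missing: {rel}")
```

If you only plan to train on a subset of the datasets, pass a shorter list as `expected`.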

# 2. Train on SceneFlow
Run `main.py` to train on the SceneFlow dataset. Please update the datapath in `main.py` to point to your training data.

# 3. Finetune \& Inference
Run `finetune.py` to finetune on real-world datasets such as KITTI 2012, KITTI 2015, and ETH3D. Please update the datapath in `finetune.py` to point to your training data.

# 4. Evaluate FLOPs
Run `counts_op.py` to measure the FLOPs consumption of the model.
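
As a rough illustration of what such a count measures, the sketch below computes the multiply-accumulate operations (MACs) of a single 2D convolution by hand. It does not reproduce `counts_op.py`; the layer sizes are made-up examples rather than ADStereo's, and bias, dilation, and grouped convolutions are ignored.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv2d_macs(in_ch, out_ch, kernel, height, width, stride=1, padding=0):
    """MACs of a dense 2D convolution: one MAC per weight per output pixel."""
    out_h = conv2d_output_size(height, kernel, stride, padding)
    out_w = conv2d_output_size(width, kernel, stride, padding)
    return in_ch * kernel * kernel * out_ch * out_h * out_w

# Hypothetical first layer on a KITTI-sized input (illustrative numbers only).
macs = conv2d_macs(in_ch=3, out_ch=32, kernel=3,
                   height=384, width=1248, stride=2, padding=1)
print(f"{macs / 1e9:.3f} GMACs, about {2 * macs / 1e9:.3f} GFLOPs")
```

Profiling tools differ in whether they report MACs or FLOPs (one MAC is commonly counted as two FLOPs), so check the convention before comparing numbers across papers.
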
# 5. Results

To generate predictions on the KITTI test set, run `evaluate_kitti.py`.
The script also prints the inference time.
The resulting predictions can be submitted directly to the KITTI online evaluation server for benchmarking.
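
The KITTI stereo benchmark stores disparity maps as 16-bit PNG files in which each value is the disparity multiplied by 256, with 0 marking an invalid pixel. Whether `evaluate_kitti.py` already writes this format is not stated here, so the sketch below only illustrates the conversion on plain Python lists. Both function names are hypothetical.

```python
def encode_kitti_disparity(disparity):
    """Convert float disparities (in pixels) to KITTI's uint16 values."""
    encoded = []
    for row in disparity:
        encoded.append([
            # Scale by 256, round, and clamp to the uint16 range.
            # Non-positive disparities map to 0, which KITTI treats as invalid.
            min(max(int(round(d * 256.0)), 0), 65535) for d in row
        ])
    return encoded

def decode_kitti_disparity(encoded):
    """Convert KITTI uint16 values back to float disparities in pixels."""
    return [[value / 256.0 for value in row] for row in encoded]

sample = [[0.0, 1.5, 300.0]]
stored = encode_kitti_disparity(sample)
print(stored)
print(decode_kitti_disparity(stored))
```

The clamp means disparities of 256 pixels or more saturate at 65535, so values that large cannot be represented exactly in this format.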

## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex

@article{wang2025ad,
author={Wang, Yun and Li, Kunhong and Wang, Longguang and Hu, Junjie and Wu, Dapeng Oliver and Guo, Yulan},
title={ADStereo: Efficient Stereo Matching with Adaptive Downsampling and Disparity Alignment},
journal={IEEE Transactions on Image Processing},
year={2025},
publisher={IEEE}
}

```

# Acknowledgements

This project is based on [GwcNet](https://github.com/xy-guo/GwcNet), [IGEV-Stereo](https://github.com/gangweiX/IGEV), and [CoEx](https://github.com/antabangun/coex). We thank the original authors for their excellent work.
