https://github.com/nianticlabs/wavelet-monodepth

[CVPR 2021] Monocular depth estimation using wavelets for efficiency

computer-vision cvpr2021 depth-estimation kitti-dataset nyu-depth-v2 wavelets

Host: GitHub
URL: https://github.com/nianticlabs/wavelet-monodepth
Owner: nianticlabs
License: other
Created: 2021-05-05T19:34:49.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2021-12-31T01:45:25.000Z (over 4 years ago)
Last Synced: 2025-03-27T01:09:39.644Z (over 1 year ago)
Topics: computer-vision, cvpr2021, depth-estimation, kitti-dataset, nyu-depth-v2, wavelets
Language: Jupyter Notebook
Homepage:
Size: 9.06 MB
Stars: 230
Watchers: 13
Forks: 34
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Single Image Depth Prediction with Wavelet Decomposition

**[Michaël Ramamonjisoa](https://michaelramamonjisoa.github.io), [Michael Firman](http://www.michaelfirman.co.uk), [Jamie Watson](https://scholar.google.com/citations?view_op=list_works&hl=en&user=5pC7fw8AAAAJ), [Vincent Lepetit](http://imagine.enpc.fr/~lepetitv/) and [Daniyar Turmukhambetov](http://dantkz.github.io/)**

***CVPR 2021***

[[Link to paper]](http://arxiv.org/abs/2106.02022)



  

  



**We introduce *WaveletMonoDepth*, which improves the efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.**



  

  

  



## 🧑‍🏫 Methodology 

**WaveletMonoDepth** was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build our code upon a baseline codebase. Both baselines share a common encoder-decoder architecture, and we modify their decoders to predict wavelet coefficients.

Wavelet predictions are sparse and can therefore be computed only at relevant locations, which saves a lot of unnecessary computation.



  



The network is first trained with **dense** convolutions in the decoder until convergence, and the dense convolutions are then replaced with **sparse** ones. This is because the network first needs to learn to predict sparse wavelet coefficients before we can use sparse convolutions.
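
To illustrate why wavelet coefficients of a depth map tend to be sparse, here is a minimal pure-Python sketch of one level of a 2D Haar decomposition. The choice of the Haar wavelet, the toy depth map, and the function name `haar2d_level` are assumptions made for illustration; this is not the repository's code, which relies on Pytorch Wavelets.

```python
def haar2d_level(img):
    """One level of an orthonormal 2D Haar transform on a list-of-lists image.

    Returns (low, d_rows, d_cols, d_diag), each half the height and width of img.
    """
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    low, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_rows, r_cols, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_low.append((a + b + c + d) / 2)   # low-pass (coarse depth)
            r_rows.append((a + b - c - d) / 2)  # top row vs bottom row
            r_cols.append((a - b + c - d) / 2)  # left column vs right column
            r_diag.append((a - b - c + d) / 2)  # diagonal detail
        low.append(r_low)
        d_rows.append(r_rows)
        d_cols.append(r_cols)
        d_diag.append(r_diag)
    return low, d_rows, d_cols, d_diag


# Hypothetical toy depth map: a flat surface at depth 2 with a depth edge at the last column.
depth = [[2, 2, 2, 5] for _ in range(4)]
low, d_rows, d_cols, d_diag = haar2d_level(depth)

# Count detail coefficients that are non-zero: only those touching the depth edge.
details = [v for band in (d_rows, d_cols, d_diag) for row in band for v in row]
nonzero = sum(1 for v in details if v != 0)
print(f"{nonzero} of {len(details)} detail coefficients are non-zero")
```

In this toy case the detail coefficients are non-zero only in the 2 x 2 blocks that contain the depth edge, which is the property that sparse convolutions can exploit.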

## 🗂 Environment Requirements 🗂 ##

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to set up a new environment:

```
conda env create -f environment.yml
conda activate wavelet-mdp
```

Our work uses [Pytorch Wavelets](https://github.com/fbcotter/pytorch_wavelets), a great package from Fergal Cotter which implements the Inverse Discrete Wavelet Transform (IDWT) used in our work, and a lot more!

To install Pytorch Wavelets, simply run:

```
git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .
```
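
As a rough picture of what an IDWT step does, the sketch below reconstructs a higher-resolution map from a coarse map plus three detail bands, assuming an orthonormal Haar wavelet. It is written in pure Python for illustration only: the function name `inverse_haar2d_level` and the band names are hypothetical, and this is not the Pytorch Wavelets API.

```python
def inverse_haar2d_level(low, d_rows, d_cols, d_diag):
    """Invert one level of an orthonormal 2D Haar transform.

    Each input is an h x w list of lists; the output is 2h x 2w.
    """
    h, w = len(low), len(low[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            l, r, c, d = low[i][j], d_rows[i][j], d_cols[i][j], d_diag[i][j]
            out[2 * i][2 * j] = (l + r + c + d) / 2          # top-left
            out[2 * i][2 * j + 1] = (l + r - c - d) / 2      # top-right
            out[2 * i + 1][2 * j] = (l - r + c - d) / 2      # bottom-left
            out[2 * i + 1][2 * j + 1] = (l - r - c + d) / 2  # bottom-right
    return out


# Hypothetical inputs: a 1 x 2 coarse map and one non-zero detail coefficient.
coarse = [[4.0, 7.0]]
zeros = [[0.0, 0.0]]
cols_detail = [[0.0, -3.0]]
full = inverse_haar2d_level(coarse, zeros, cols_detail, zeros)
print(full)
```

Wherever all three detail coefficients are zero, the reconstruction is simply the coarse value divided by 2 and copied to the four output pixels, so no detail prediction is needed at those locations.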

## 🚗🚦 KITTI 🌳🛣  

[Depth Hints](https://github.com/nianticlabs/depth-hints) was used as a baseline for KITTI.

***Depth Hints*** builds upon [monodepth2](https://github.com/nianticlabs/monodepth2). If you have questions about running the code, please see the issues in their repositories first.

### ⚙ Setup, Training and Evaluation 

Please see the [KITTI](./KITTI/README.md) directory of this repository for details on how to train and evaluate our method. 

### 📊 Results and 📦 Trained models

Please find below the scores and associated trained models, using **dense** convolutions to predict wavelet coefficients.

| Model name | Training modality | Resolution | abs_rel | RMSE | δ<1.25 |
| ---------- | ---------- | ---------- | ----- | ------ | ----- |
| [`Ours Resnet18`](https://drive.google.com/file/d/1uDJoikUBiDZZOLDMDNsn_eAXByL5g9mi/view?usp=sharing) | Stereo + DepthHints | 640 x 192 | 0.106 | 4.693 | 0.876 |
| [`Ours Resnet50`](https://drive.google.com/file/d/1UykLnyAlWjqVYWQ5wGK2I1uonYvdJ-2F/view?usp=sharing) | Stereo + DepthHints | 640 x 192 | 0.105 | 4.625 | 0.879 |
| [`Ours Resnet18`](https://drive.google.com/file/d/1wyXNOgaboQI1s2EwJIuE2APWPVLJwKWM/view?usp=sharing) | Stereo + DepthHints | 1024 x 320 | 0.102 | 4.452 | 0.890 |
| [`Ours Resnet50`](https://drive.google.com/file/d/1fVkPEv71b-3RBr_n52WPcj3wd-UKVrkF/view?usp=sharing) | Stereo + DepthHints | 1024 x 320 | 0.097 | 4.387 | 0.891 |
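
For readers unfamiliar with the column names, the sketch below computes `abs_rel`, `RMSE` and `δ<1.25` under the definitions commonly used in the monocular depth literature. It is an assumption that the evaluation scripts in this repository use exactly these formulas; they may also apply cropping, depth capping or scaling that is not shown here. The function name `depth_metrics` and the sample values are hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """Return (abs_rel, rmse, delta_1) for flat lists of positive depths."""
    assert len(pred) == len(gt) and len(gt) > 0
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Mean absolute error relative to the ground-truth depth (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in depth units (lower is better).
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose ratio to ground truth is within 1.25 (higher is better).
    delta_1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return abs_rel, rmse, delta_1


# Hypothetical predictions and ground truth, in metres.
pred = [1.0, 2.0, 6.0, 8.0]
gt = [1.0, 2.0, 4.0, 8.0]
abs_rel, rmse, delta_1 = depth_metrics(pred, gt)
print(f"abs_rel={abs_rel:.3f} rmse={rmse:.3f} delta<1.25={delta_1:.3f}")
```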

### 🎚 Playing with sparsity

The most interesting part, however, is that we can exploit the sparsity of the predicted wavelet coefficients to trade off performance against efficiency, at a minimal cost in performance. We do so by tuning the threshold:

- low threshold values lead to high performance but a high number of computations,

- high threshold values lead to highly efficient computation, as convolutions are computed at only a few pixel locations, with minimal impact on performance.
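
The effect of the threshold can be sketched as follows: coefficients whose magnitude exceeds the threshold mark the pixels where further computation is kept. This is a simplified illustration, not the repository's implementation; the function names, the coefficient values and the rule "magnitude strictly greater than the threshold" are assumptions.

```python
def sparsity_mask(coeffs, threshold):
    """Mark coefficients whose magnitude exceeds the threshold."""
    return [[abs(c) > threshold for c in row] for row in coeffs]


def active_fraction(mask):
    """Fraction of locations where computation would be kept."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Hypothetical wavelet coefficients: mostly near zero, large along one depth edge.
coeffs = [
    [0.00, 0.02, 0.90, 0.00],
    [0.01, 0.00, -0.70, 0.05],
    [0.00, 0.00, 0.30, 0.00],
    [0.00, 0.04, 0.00, 0.00],
]

fractions = {t: active_fraction(sparsity_mask(coeffs, t)) for t in (0.01, 0.1, 0.5)}
for t, frac in fractions.items():
    print(f"threshold={t}: {frac:.1%} of locations kept")
```

Raising the threshold shrinks the set of active locations, which is what reduces the number of convolutions evaluated in the decoder.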



  



Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.



  



Our wavelet-based method allows us to greatly reduce the number of computations in the decoder at a minimal expense in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores vs FLOPs.
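
As a back-of-the-envelope illustration of where the savings come from, the sketch below counts multiply-accumulate operations for a single convolution layer evaluated at every pixel versus at 10% of the pixels. The layer sizes and the function name `conv_macs` are hypothetical, and the estimate ignores the encoder, any overhead of gathering sparse locations, and how the notebooks in this repository actually count FLOPs.

```python
def conv_macs(height, width, c_in, c_out, kernel=3, active_fraction=1.0):
    """Multiply-accumulates for a kernel x kernel convolution evaluated
    at a fraction of the output pixels."""
    active_pixels = round(height * width * active_fraction)
    return active_pixels * c_in * c_out * kernel * kernel


# Hypothetical decoder layer: 320 x 96 feature map, 64 input and 32 output channels.
dense = conv_macs(96, 320, 64, 32)
sparse = conv_macs(96, 320, 64, 32, active_fraction=0.10)
print(f"dense:  {dense:,} MACs")
print(f"sparse: {sparse:,} MACs ({sparse / dense:.0%} of dense)")
```

Under these assumptions the cost of the layer scales linearly with the fraction of active pixels.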



  



## 🪑🛁 NYUv2 🛋🚪

[Dense Depth](https://github.com/ialhashim/DenseDepth) was used as a baseline for NYUv2.

Note that we used the experimental PyTorch implementation of DenseDepth and that, compared to the original paper, we made a few modifications:

- we supervise depth directly instead of supervising disparity

- we do not use SSIM

- we use DenseNet161 as encoder instead of DenseNet169

### ⚙ Setup, Training and Evaluation 

Please see the [NYUv2](./NYUv2/README.md) directory of this repository for details on how to train and evaluate our method.

### 📊 Results and 📦 Trained models

Please find below the scores and associated trained models, using **dense** convolutions to predict wavelet coefficients.

| Model name | Encoder | Resolution | abs_rel | RMSE | δ<1.25 | ε_acc |
| ---------- | ---------- | ---------- | ---------- | ----- | ----- | ----- |
| [`Baseline`](https://drive.google.com/file/d/1WmGBXBwbR8jh8H_F7TK2LuUNJ1T_wfcQ/view?usp=sharing) | DenseNet | 640 x 480 | 0.1277 | 0.5479 | 0.8430 | 1.7170 |
| [`Ours`](https://drive.google.com/file/d/1LubjqXEzAd2SI6Zwse6VFvHoobTr4P8Z/view?usp=sharing) | DenseNet | 640 x 480 | 0.1258 | 0.5515 | 0.8451 | 1.8070 |
| [`Baseline`](https://drive.google.com/file/d/18BU-4u_9NWm67NCLk1On5IA0lJor6DHY/view?usp=sharing) | MobileNetv2 | 640 x 480 | 0.1772 | 0.6638 | 0.7419 | 1.8911 |
| [`Ours`](https://drive.google.com/file/d/1-dcOO0T_YlFATwZBTg5ejg5evtR319Zi/view?usp=sharing) | MobileNetv2 | 640 x 480 | 0.1727 | 0.6776 | 0.7380 | 1.9732 |

### 🎚 Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at a minimal cost in performance.



  



Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.



  



## 🎮 Try it yourself!

Try our Jupyter notebooks to visualize results at different levels of sparsity and to compute the resulting computational savings in FLOPs. Notebooks can be found at `<DATASET>/sparsity_test_notebook.ipynb`, where `<DATASET>` is either `KITTI` or `NYUv2`.

## ✏️ 📄 Citation

If you find our work useful or interesting, please consider citing [our paper](http://arxiv.org/abs/2106.02022/):

```
@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Michael Firman and
               Jamie Watson and
               Vincent Lepetit and
               Daniyar Turmukhambetov},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2021}
}
```

## 👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending.

All rights reserved.

Please see the [license file](LICENSE) for terms.
