https://github.com/isl-org/vi-depth
Code for Monocular Visual-Inertial Depth Estimation (ICRA 2023)
- Host: GitHub
- URL: https://github.com/isl-org/vi-depth
- Owner: isl-org
- License: MIT
- Created: 2023-03-06T16:35:57.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-26T12:33:14.000Z (over 1 year ago)
- Last Synced: 2025-04-13T08:26:11.435Z (9 months ago)
- Language: Python
- Size: 2.33 MB
- Stars: 176
- Watchers: 12
- Forks: 15
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Monocular Visual-Inertial Depth Estimation
This repository contains code and models for our paper:
> [Monocular Visual-Inertial Depth Estimation](https://arxiv.org/abs/2303.12134)
> Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun
For a quick overview of the work, you can watch the [short talk](https://youtu.be/Ja4Nic3YYCg) and [teaser](https://youtu.be/IMwiKwSpshQ) on YouTube.
## Introduction

We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry (VIO) to produce dense depth estimates with metric scale. Our approach consists of three stages: (1) input processing, where RGB and IMU data feed into monocular depth estimation alongside visual-inertial odometry; (2) global scale and shift alignment, where monocular depth estimates are fitted to the sparse depth from VIO in a least-squares manner; and (3) learning-based dense scale alignment, where the globally aligned depth is locally realigned using a dense scale map regressed by the ScaleMapLearner (SML). The images at the bottom of the diagram above illustrate a VOID sample being processed through our pipeline; from left to right: the input RGB, ground-truth depth, sparse depth from VIO, globally aligned depth, the scale map scaffolding, the dense scale map regressed by SML, and the final depth output.
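To make stage (2) concrete, here is a minimal sketch of a least-squares global scale-and-shift fit between a monocular inverse-depth prediction and sparse metric depth from VIO. It is illustrative only and assumes alignment in inverse-depth space with zeros marking missing sparse measurements; the repository's own implementation may differ in these details.

```python
import numpy as np

def global_scale_shift_alignment(pred_inv_depth, sparse_depth):
    """Fit a global scale and shift so the monocular prediction best matches
    the sparse metric depth from VIO in the least-squares sense (stage 2)."""
    mask = sparse_depth > 0                      # VIO provides depth only at sparse points
    target = 1.0 / sparse_depth[mask]            # assumed: alignment in inverse-depth space
    x = pred_inv_depth[mask]

    # Solve min_{s, t} || s * x + t - target ||^2 via linear least squares.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, target, rcond=None)

    aligned_inv_depth = scale * pred_inv_depth + shift
    return 1.0 / np.clip(aligned_inv_depth, 1e-6, None)   # back to metric depth
```

The globally aligned depth produced this way is then refined per pixel by the SML in stage (3).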

## Setup
1) Set up dependencies:
```shell
conda env create -f environment.yaml
conda activate vi-depth
```
2) Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the `weights` folder (a scripted download example follows the table).
| Depth Predictor | SML on VOID 150 | SML on VOID 500 | SML on VOID 1500 |
| :--- | :----: | :----: | :----: |
| DPT-BEiT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.1500.ckpt) |
| DPT-SwinV2-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.1500.ckpt) |
| DPT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.1500.ckpt) |
| DPT-Hybrid | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.ckpt)* | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.1500.ckpt) |
| DPT-SwinV2-Tiny | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.1500.ckpt) |
| DPT-LeViT | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.1500.ckpt) |
| MiDaS-small | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.1500.ckpt) |
*Also available with pretraining on TartanAir: [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.pretrained.ckpt)
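If you prefer to script the download, the sketch below fetches one checkpoint from the table above into the `weights` folder; the URL is copied from the DPT-BEiT-Large / VOID 150 entry, and any other entry works the same way (plain `wget` or a browser download is equally fine).

```python
import os
import urllib.request

# URL copied from the table above (DPT-BEiT-Large, SML trained on VOID 150);
# swap in a different release URL for other predictors or sample counts.
url = ("https://github.com/isl-org/VI-Depth/releases/download/v1/"
       "sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt")

os.makedirs("weights", exist_ok=True)
urllib.request.urlretrieve(url, os.path.join("weights", os.path.basename(url)))
```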
## Inference
1) Place inputs into the `input` folder. An input image and corresponding sparse metric depth map are expected:
```bash
input
├── image                 # RGB image
│   ├── .png
│   └── ...
└── sparse_depth          # sparse metric depth map
    ├── .png              # as 16-bit PNG
    └── ...
```
The `load_sparse_depth` function in `run.py` may need to be modified depending on the format in which sparse depth is stored. By default, the depth storage method [used in the VOID dataset](https://github.com/alexklwong/void-dataset/blob/master/src/data_utils.py) is assumed.
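For reference, a VOID-style loader typically looks like the sketch below; it assumes depth is stored as a 16-bit PNG with metric depth recovered by dividing by 256 and zeros marking unmeasured pixels. Check the VOID `data_utils.py` linked above for the exact convention before relying on this.

```python
import numpy as np
from PIL import Image

def load_sparse_depth(path):
    # Assumed VOID-style convention: 16-bit PNG, metric depth in meters
    # recovered as pixel_value / 256.0, with zeros at unmeasured pixels.
    depth = np.array(Image.open(path), dtype=np.float32) / 256.0
    depth[depth <= 0] = 0.0
    return depth
```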
2) Run the `run.py` script as follows:
```bash
DEPTH_PREDICTOR="dpt_beit_large_512"
NSAMPLES=150
SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
python run.py -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH --save-output
```
3) The `--save-output` flag enables saving outputs to the `output` folder. By default, the following outputs will be saved per sample:
```bash
output
├── ga_depth              # metric depth map after global alignment
│   ├── .pfm              # as PFM
│   ├── .png              # as 16-bit PNG
│   └── ...
└── sml_depth             # metric depth map output by SML
    ├── .pfm              # as PFM
    ├── .png              # as 16-bit PNG
    └── ...
```
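Assuming the saved 16-bit PNG outputs follow the same scaling convention as the VOID-style inputs (metric depth = pixel value / 256), a saved result can be inspected with a few lines; the filename below is hypothetical and should be replaced with an actual output name.

```python
import numpy as np
from PIL import Image

# Hypothetical output path; actual names follow the input image names.
depth = np.array(Image.open("output/sml_depth/example.png"), dtype=np.float32) / 256.0
valid = depth > 0
print(f"depth range: {depth[valid].min():.2f}-{depth[valid].max():.2f} m")
```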
## Evaluation
Models provided in this repo were trained on the VOID dataset.
1) Download the VOID dataset following [the instructions in the VOID dataset repo](https://github.com/alexklwong/void-dataset#downloading-void).
2) To evaluate on VOID test sets, run the `evaluate.py` script as follows:
```bash
DATASET_PATH="/path/to/void_release/"
DEPTH_PREDICTOR="dpt_beit_large_512"
NSAMPLES=150
SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
python evaluate.py -ds $DATASET_PATH -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH
```
Results for the example shown above:
```
Averaging metrics for globally-aligned depth over 800 samples
Averaging metrics for SML-aligned depth over 800 samples
+---------+----------+----------+
| metric | GA Only | GA+SML |
+---------+----------+----------+
| RMSE | 191.36 | 142.85 |
| MAE | 115.84 | 76.95 |
| AbsRel | 0.069 | 0.046 |
| iRMSE | 72.70 | 57.13 |
| iMAE | 49.32 | 34.25 |
| iAbsRel | 0.071 | 0.048 |
+---------+----------+----------+
```
To evaluate on VOID test sets at different densities (void_150, void_500, void_1500), change the `NSAMPLES` argument above accordingly.
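For reference, the metrics above are standard depth-completion error measures. The sketch below shows how they are typically computed; the valid-depth range and units are assumptions (the common VOID protocol reports depth errors in millimeters and inverse-depth errors in 1/km), and `evaluate.py` remains the authoritative implementation.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=0.2, max_depth=5.0):
    """Depth-completion metrics as in the table above. The valid-depth range
    and units (mm for depth errors, 1/km for inverse-depth errors) are
    assumptions following the common VOID evaluation protocol."""
    mask = (gt > min_depth) & (gt < max_depth)
    pred, gt = pred[mask], gt[mask]

    err_mm = 1000.0 * (pred - gt)                   # meters -> millimeters
    inv_err_km = 1000.0 * (1.0 / pred - 1.0 / gt)   # 1/m -> 1/km

    return {
        "RMSE":    np.sqrt(np.mean(err_mm ** 2)),
        "MAE":     np.mean(np.abs(err_mm)),
        "AbsRel":  np.mean(np.abs(pred - gt) / gt),
        "iRMSE":   np.sqrt(np.mean(inv_err_km ** 2)),
        "iMAE":    np.mean(np.abs(inv_err_km)),
        "iAbsRel": np.mean(np.abs(1.0 / pred - 1.0 / gt) * gt),
    }
```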
## Citation
If you reference our work, please consider citing the following:
```bib
@inproceedings{wofk2023videpth,
    author    = {{Wofk, Diana and Ranftl, Ren\'{e} and M{\"u}ller, Matthias and Koltun, Vladlen}},
    title     = {{Monocular Visual-Inertial Depth Estimation}},
    booktitle = {{IEEE International Conference on Robotics and Automation (ICRA)}},
    year      = {{2023}}
}
```
## Acknowledgements
Our work builds on and uses code from [MiDaS](https://github.com/isl-org/MiDaS), [timm](https://github.com/rwightman/pytorch-image-models), and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/). We'd like to thank the authors for making these libraries and frameworks available.