https://github.com/facebookresearch/distdepth
Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)
- Host: GitHub
- URL: https://github.com/facebookresearch/distdepth
- Owner: facebookresearch
- License: other
- Created: 2022-04-05T22:04:37.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-21T04:39:13.000Z (8 months ago)
- Last Synced: 2024-03-21T08:39:14.423Z (8 months ago)
- Language: Python
- Size: 42.3 MB
- Stars: 205
- Watchers: 9
- Forks: 18
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Toward Practical Monocular Indoor Depth Estimation

Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su
[arXiv] [CVF open access] [project site: data, supplementary]
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/toward-practical-self-supervised-monocular/monocular-depth-estimation-on-nyu-depth-v2-4)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2-4?p=toward-practical-self-supervised-monocular)

## Updates
[Mar 2024]: Fix instability bugs when converting scale and shift.

[Jan 2024]: Add online least-squares alignment between the expert's and the student's scale (see the sketch after this list) and fix bugs in the per-sample edge map calculation.
[August 2023]: Add a snippet for simple AR effects using a z-buffer. See the last section.
**[June 2023]: Revise the instructions for the training code and for training on your own dataset.**
[June 2023]: Fix bugs in sample training code.
[June 2023]: Fix bugs in visualization and saving.
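The Jan 2024 update refers to a least-squares alignment of the expert's and student's scale and shift. As a point of reference only (a minimal sketch, not the repository's implementation), such an alignment can be written as:

```python
import torch

def align_scale_shift(student_depth: torch.Tensor,
                      expert_depth: torch.Tensor,
                      mask: torch.Tensor):
    """Hypothetical helper: solve min_{s,t} || s * student + t - expert ||^2
    over valid pixels and return the scale s and shift t.

    Illustrative only; the repository's own alignment lives in its training
    code and may differ in masking and detail.
    """
    d = student_depth[mask].flatten()
    e = expert_depth[mask].flatten()
    A = torch.stack([d, torch.ones_like(d)], dim=1)       # (N, 2) design matrix
    sol = torch.linalg.lstsq(A, e.unsqueeze(1)).solution  # (2, 1) least-squares solution
    s, t = sol[0, 0], sol[1, 0]
    return s, t

# Usage: aligned_student = s * student_depth + t   (now comparable to expert_depth)
```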
## Introduction

As this project includes data contributions, please refer to the project page for data download instructions, including SimSIN, UniSIN, and VA, as well as UniSIN leaderboard participation.
Advantage
Results
### DistDepth
Our DistDepth is a highly robust monocular depth estimation approach for generic indoor scenes.
* Trained on stereo sequences without their ground-truth depth
* Structured and metric-accurate
* Runs at an interactive rate on a laptop GPU
* Sim-to-real: trained on simulation and transferable to real scenes

## Single Image Inference Demo

We tested on Ubuntu 20.04 LTS with a laptop NVIDIA 2080 GPU.
Install packages
1. Use conda
```
conda create --name distdepth python=3.8
conda activate distdepth
```

2. Install pre-requisite common packages. Go to https://pytorch.org/get-started/locally/ and install a PyTorch build that is compatible with your machine. We have updated the code to be compatible with PyTorch 2.0+ (tested on torch 2.2.1, torchvision 0.17, cudatoolkit 12). The current code requires torchvision > 0.13.
3. Install other dependencies: opencv-python, matplotlib, imageio, Pillow, augly, tensorboardX
```
pip install opencv-python matplotlib imageio Pillow augly tensorboardX
```
Download pretrained models
4. Download the pretrained model [here] (ResNet152, 246 MB; for illustration, it performs reasonably well on in-the-wild indoor scenes).
5. Unzip the model under the root directory; a `ckpts` folder containing the pretrained models is then created.
6. Run
```
python demo.py
```
7. Results will be stored under `results/`
Note that during inference we apply a 1.312 scale to models trained on SimSIN, since SimSIN was created with a stereo baseline of 13.12 cm while training assumes a stereo scale of 10 cm. (See [Issue 27](https://github.com/facebookresearch/DistDepth/issues/27#issue-1989386374))
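As a minimal illustration of this correction (the helper name is hypothetical; `depth` is assumed to be the depth map predicted by a SimSIN-trained model):

```python
# Minimal illustration (hypothetical helper): apply the metric correction to depth
# predicted by a SimSIN-trained model. 13.12 cm actual stereo baseline vs. the
# 10 cm stereo scale assumed during training gives the 1.312 factor.
SIMSIN_SCALE = 13.12 / 10.0  # = 1.312

def rescale_simsin_depth(depth):
    """Rescale a predicted depth map (numpy array or torch tensor)."""
    return depth * SIMSIN_SCALE
```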
## Pointcloud Generation

Some sample data are provided in `data/sample_pc`.
```
python visualize_pc.py
```
This generates a pointcloud in `.ply` format from the image and depth map inputs for `data/sample_pc/0000.jpg`. The `.ply` file is saved under the `data/sample_pc` folder. Use MeshLab to visualize the pointcloud.
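If you prefer to roll your own back-projection instead of `visualize_pc.py`, here is a minimal sketch using Open3D (not among the listed dependencies), assuming pinhole intrinsics `fx, fy, cx, cy` and a metric depth map of the same resolution as the image; it is not the repository's script:

```python
import numpy as np
import open3d as o3d
from PIL import Image

def depth_to_pointcloud(rgb_path, depth, fx, fy, cx, cy, out_path="cloud.ply"):
    """Back-project a metric depth map into a colored point cloud and save it as .ply."""
    rgb = np.asarray(Image.open(rgb_path).convert("RGB"), dtype=np.float64) / 255.0
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    x = (u - cx) * depth / fx                        # pinhole back-projection
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = points[:, 2] > 0                         # drop zero/invalid depth pixels
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points[valid])
    pcd.colors = o3d.utility.Vector3dVector(colors[valid])
    o3d.io.write_point_cloud(out_path, pcd)
    return pcd
```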
## Data

Download SimSIN [here]. For UniSIN and VA, please download at the [project site].
To generate stereo data with depth using Habitat, we provide a snippet here. Install Habitat first.
```
python visualize_pc.py
```
## Training with PoseNet and DepthNet

For a simple taste of training, download a smaller Replica set [here] and put it under `./SimSIN-simple`.
The folder structure should be
.
├── SimSIN-simple
├── replica
├── replica_train.txt

Download weights:
```shell
mkdir weights
wget -O weights/dpt_hybrid_nyu-2ce69ec7.pt https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid_nyu-2ce69ec7.pt
```

Training commands
The command below trains the networks using stereo and the current frame (PoseNet is not used):
```shell
python execute.py --exe train --model_name distdepth-distilled --frame_ids 0 --log_dir='./tmp' --data_path SimSIN-simple --dataset SimSIN --batch_size 15 --width 256 --height 256 --max_depth 10.0 --num_epochs 10 --scheduler_step_size 8 --learning_rate 0.0001 --use_stereo --thre 0.95 --num_layers 152 --log_frequency 25
```

The command below trains the networks using the current frame and one past/future frame:
```shell
python execute.py --exe train --model_name distdepth-distilled --frame_ids 0 -1 1 --log_dir='./tmp' --data_path SimSIN-simple --dataset SimSIN --batch_size 15 --width 256 --height 256 --max_depth 10.0 --num_epochs 10 --scheduler_step_size 8 --learning_rate 0.0001 --thre 0.95 --num_layers 152 --log_frequency 25
```

The command below trains the networks using the current frame, one past/future frame, and stereo:
```shell
python execute.py --exe train --model_name distdepth-distilled --frame_ids 0 -1 1 --log_dir='./tmp' --data_path SimSIN-simple --dataset SimSIN --batch_size 15 --width 256 --height 256 --max_depth 10.0 --num_epochs 10 --scheduler_step_size 8 --learning_rate 0.0001 --thre 0.95 --num_layers 152 --log_frequency 25 --use_stereo
```

Training requires about 21 GB of GPU memory (stereo + 0, -1, 1 temporal frames) and takes about 30 minutes on an RTX 3090 GPU.
Changing the expert network: see `execute_func.py` L59 and switch to a different version of `DPTDepthModel`. The default now uses DPT finetuned on NYUv2 (the NYUv2 model is better suited to indoor scenes, while the MiDaS model is general-purpose).
If you would like to use more frames, you'll need to leave more buffer frames in the data list file. See the notes below for details.
**Notes for training on your own dataset:**
1. Create your dataloader. You can find the SimSIN sample dataloader (handling both temporal and stereo data) under `dataset/`, then add your dataloader in `execute_func.py` L111.
2. In `execute_func.py` L130-141, add your data list file. See the Replica sample data for the format of each line.
3. Use the commands above to train on your data. Note that your data must include stereo pairs if you specify `--use_stereo`. If you specify frame_ids -1 and 1, you'll need to leave one buffer frame at the start and end to avoid reading from None. For example, the Replica sample data contain time steps 0-49, but only 1-48 appear in the data list file (see the sketch after this list).
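As a hedged illustration of the buffer-frame rule only (the scene path and per-line format below are hypothetical placeholders; the real format should follow the Replica sample data list):

```python
# Hypothetical sketch: write a data list that drops one buffer frame at each end,
# so frame_ids -1 and 1 never index outside the sequence.
# The actual per-line format should follow the Replica sample data list file.
num_frames = 50            # e.g., Replica sample: time steps 0-49
scene = "replica/office0"  # hypothetical scene path

with open("replica_train.txt", "w") as f:
    for t in range(1, num_frames - 1):   # keep 1-48, skip 0 and 49
        f.write(f"{scene} {t}\n")        # placeholder line format
```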
## Evaluation

SimSIN-trained models, evaluated on VA:
| Name | Arch | Expert | MAE | AbsRel | RMSE | acc@ 1.25 | acc@ 1.25^2 | acc@ 1.25^3 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DistDepth | ResNet152 | DPT Large | 0.252 | 0.175 | 0.371 | 75.1 | 93.9 | 98.4 | [model](https://drive.google.com/file/d/1X_VMg1LYLmm8xCloLRjqHtIslyXfOrn5/view?usp=sharing) |
| DistDepth | ResNet152 | DPT Legacy | 0.270 | 0.186 | 0.386 | 73.2 | 93.2 | 97.9 | [model](https://drive.google.com/file/d/1rTBSglo_h-Ke5HMe4xvHhCjpeBDRl6vx/view?usp=sharing) |
| DistDepth-Multi| ResNet101 | DPT Legacy | 0.243 | 0.169 | 0.362 | 77.1 | 93.7 | 97.9 | [model](https://drive.google.com/file/d/1Sg_dXAyKI2VfKzHiAu9i8WqT9I7Y9k0D/view?usp=sharing) |

Download VA (8 GB) first and extract it under the root folder.
.
├── VA
├── camera_0
├── 00000000.png
......
├── camera_1
├── 00000000.png
......
├── gt_depth_rectify
├── cam0_frame0000.depth.pfm
......
├── VA_left_all.txt

Run:
```
bash eval.sh
```
The performances will be saved under the root folder.
To visualize the predicted depth maps in a minibatch (adjust batch_size for different numbers):
```shell
python execute.py --exe eval_save --log_dir='./tmp' --data_path VA --dataset VA --batch_size 10 --load_weights_folder --models_to_load encoder depth --width 256 --height 256 --max_depth 10 --frame_ids 0 --num_layers 152
```

If a message about missing `weights/dpt_hybrid_nyu-2ce69ec7.pt` pops up, download the model from [DPT](https://github.com/isl-org/DPT) and put it under `weights/`.
To visualize the predicted depth maps for all testing data on the list:
```shell
python execute.py --exe eval_save_all --log_dir='./tmp' --data_path VA --dataset VA --batch_size 1 --load_weights_folder --models_to_load encoder depth --width 256 --height 256 --max_depth 10 --frame_ids 0 --num_layers 152
```

Only `batch_size = 1` is valid under this mode.
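For reference, the tables in this section report MAE, AbsRel, RMSE, and the threshold accuracies acc@1.25^k. A minimal sketch of how these standard depth metrics are commonly computed (not the repository's exact evaluation code, which also handles masking and depth capping) is:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular-depth metrics over valid ground-truth pixels (illustrative only)."""
    mask = gt > eps
    pred, gt = np.maximum(pred[mask], eps), gt[mask]
    mae = np.mean(np.abs(pred - gt))
    absrel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    acc = {f"acc@1.25^{k}": np.mean(ratio < 1.25 ** k) * 100 for k in (1, 2, 3)}
    return {"MAE": mae, "AbsRel": absrel, "RMSE": rmse, **acc}
```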
Evaluation on NYUv2
Prepare NYUv2 data.
.
├── NYUv2
├── img_val
├── 00001.png
......
├── depth_val
├── 00001.npy
......
......
├── NYUv2.txt

| Name | Arch | Expert | MAE | AbsRel | RMSE | acc@ 1.25 | acc@ 1.25^2 | acc@ 1.25^3 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DistDepth-finetuned | ResNet152 | DPT on NYUv2 | 0.308 | 0.113 | 0.444 | 87.3 | 97.3 | 99.3 | [model](https://drive.google.com/file/d/1kLJBuMOf0xSpYq7DtxnPpBTxMwW0ylGm/view?usp=sharing) |
| DistDepth-SimSIN | ResNet152 | DPT | 0.411 | 0.163 | 0.563 | 78.0 | 93.6 | 98.1 | [model](https://drive.google.com/file/d/1Hf_WPaBGMpPBFymCwmN8Xh1blXXZU1cd/view?usp=sharing) |

Change `train_filenames` (dummy) and `val_filenames` in `execute_func.py` to NYUv2. Then,
```shell
python execute.py --exe eval_measure --log_dir='./tmp' --data_path NYUv2 --dataset NYUv2 --batch_size 1 --load_weights_folder --models_to_load encoder depth --width 256 --height 256 --max_depth 12 --frame_ids 0 --num_layers 152
```

## Depth-aware AR effects

To reproduce object dragging with a depth map, we provide some data under `AR_effects/` and a snippet `AR_simple.py`.
```shell
python AR_simple.py
```

It will generate inserted images along a preset trajectory. Use ffmpeg or another video tool to compile the images into a video.
Virtual object insertion:
Dragging objects along a trajectory:
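The z-buffer test behind these effects (mentioned in the August 2023 update) can be sketched as follows; this is an illustrative compositing rule with hypothetical inputs, not the contents of `AR_simple.py`:

```python
import numpy as np

def composite_with_zbuffer(scene_rgb, scene_depth, obj_rgba, obj_depth):
    """Insert a rendered virtual object into a real image using a z-buffer test.

    Illustrative only: the object is drawn where it is closer to the camera than
    the estimated scene depth; otherwise the real scene occludes it.
    scene_rgb: (H, W, 3) uint8, scene_depth: (H, W) metric depth,
    obj_rgba: (H, W, 4) pre-rendered object with alpha, obj_depth: (H, W).
    """
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0
    visible = (obj_depth < scene_depth)[..., None] & (alpha > 0)
    out = scene_rgb.astype(np.float32)
    obj_rgb = obj_rgba[..., :3].astype(np.float32)
    out = np.where(visible, alpha * obj_rgb + (1.0 - alpha) * out, out)
    return out.astype(np.uint8)
```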
## Citation

@inproceedings{wu2022toward,
  title={Toward Practical Monocular Indoor Depth Estimation},
  author={Wu, Cho-Ying and Wang, Jialiang and Hall, Michael and Neumann, Ulrich and Su, Shuochen},
  booktitle={CVPR},
  year={2022}
}

## License
DistDepth is CC-BY-NC licensed, as found in the LICENSE file.