# DeepV2D
This repository contains the source code for our paper:

[DeepV2D: Video to Depth with Differentiable Structure from Motion](https://arxiv.org/abs/1812.04605)

Zachary Teed and Jia Deng

International Conference on Learning Representations (ICLR) 2020

## Requirements
Our code was tested with TensorFlow 1.12.0 and Python 3. To use the code, you first need to install the following Python packages.

First, create a clean virtualenv:

```Shell
virtualenv --no-site-packages -p python3 deepv2d_env
source deepv2d_env/bin/activate
```

Then install the required packages:

```Shell
pip install tensorflow-gpu==1.12.0
pip install h5py
pip install easydict
pip install scipy
pip install opencv-python
pip install pyyaml
pip install toposort
pip install vtk
```
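
If you prefer, the same packages can be installed with a single command (equivalent to the individual installs above):

```Shell
# One-line equivalent of the individual installs above
pip install tensorflow-gpu==1.12.0 h5py easydict scipy opencv-python pyyaml toposort vtk
```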

You can optionally compile our CUDA backprojection operator by running:

```Shell
cd deepv2d/special_ops && ./make.sh && cd ../..
```

This will reduce peak GPU memory usage. You may need to change CUDALIB in make.sh to point to where CUDA is installed on your system.
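
For example, if CUDA lives under /usr/local/cuda-10.0 (a placeholder path; substitute your own installation), the variable in make.sh would be set along these lines:

```Shell
# Placeholder path -- point CUDALIB at your CUDA library directory
CUDALIB=/usr/local/cuda-10.0/lib64
```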

## Demos

### Video to Depth (V2D)

Try it out on one of the provided test sequences. First, download our pretrained models:

```Shell
./data/download_models.sh
```
or download them from [google drive](https://drive.google.com/file/d/1pEUwOw_aOCIz53dGXOnU3KXArzR41YAV/view?usp=sharing).

The demo code will output a depth map and display a point cloud for visualization. Once the depth map has appeared, press any key to open the point cloud visualization.

[NYUv2](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html):
```Shell
python demos/demo_v2d.py --model=models/nyu.ckpt --sequence=data/demos/nyu_0
```

[ScanNet](http://www.scan-net.org/):
```Shell
python demos/demo_v2d.py --model=models/scannet.ckpt --sequence=data/demos/scannet_0
```

[KITTI](http://www.cvlibs.net/datasets/kitti/):
```Shell
python demos/demo_v2d.py --model=models/kitti.ckpt --sequence=data/demos/kitti_0
```

You can also run motion estimation in `global` mode, which updates all the poses jointly as a single optimization problem:

```Shell
python demos/demo_v2d.py --model=models/nyu.ckpt --sequence=data/demos/nyu_0 --mode=global
```

### Uncalibrated Video to Depth (V2D-Uncalibrated)

If you do not know the camera intrinsics, you can run DeepV2D in uncalibrated mode. In this setting, the motion module estimates the focal length during inference.

```Shell
python demos/demo_uncalibrated.py --video=data/demos/golf.mov
```

### SLAM / VO

DeepV2D can also be used for tracking and mapping on longer videos. First, download some test sequences:

```Shell
./data/download_slam_sequences.sh
```

Try it out on [NYU-Depth](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html), [ScanNet](http://www.scan-net.org/), [TUM-RGBD](https://vision.in.tum.de/data/datasets/rgbd-dataset), or [KITTI](http://www.cvlibs.net/datasets/kitti/). Using more keyframes (set with the `--n_keyframes` flag) reduces drift but results in slower tracking.

```Shell
python demos/demo_slam.py --dataset=kitti --n_keyframes=2
```

```Shell
python demos/demo_slam.py --dataset=scannet --n_keyframes=3
```

The `--cinematic` flag forces the visualization to follow the camera:
```Shell
python demos/demo_slam.py --dataset=nyu --n_keyframes=3 --cinematic
```

The `--clear_points` flag plots only the point cloud of the current depth map:
```Shell
python demos/demo_slam.py --dataset=tum --n_keyframes=3 --clear_points
```

## Evaluation

You can evaluate the trained models on any of the following datasets.

#### [NYUv2](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html):
```Shell
./data/download_nyu_data.sh
python evaluation/eval_nyu.py --model=models/nyu.ckpt
```

#### [KITTI](http://www.cvlibs.net/datasets/kitti/):
First download the dataset using this [script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip) provided on the official website. Then run the evaluation script, where KITTI_PATH is the directory the dataset was downloaded to:

```Shell
./data/download_kitti_data.sh
python evaluation/eval_kitti.py --model=models/kitti.ckpt --dataset_dir=KITTI_PATH
```

#### [ScanNet](http://www.scan-net.org/):
First download the [ScanNet](https://github.com/ScanNet/ScanNet) dataset.

Then run the evaluation script, where SCANNET_PATH is the directory you downloaded ScanNet to:

```Shell
python evaluation/eval_scannet.py --model=models/scannet.ckpt --dataset_dir=SCANNET_PATH
```

## Training

You can train a model on one of the following datasets.

#### [NYUv2](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html):
First download the training tfrecords file [here](https://drive.google.com/file/d/1-kfW55tpwxFVfv9AL76IFXWuNMBE3b7Y/view?usp=sharing) (143 GB) containing the NYU data. Once the data has been downloaded, train the model by running the command below (training takes about 1 week on an Nvidia 1080Ti GPU).

Camera poses for NYU were estimated with [ORB-SLAM2](https://github.com/raulmur/ORB_SLAM2) from Kinect measurements. You can download the estimated poses from [google drive](https://drive.google.com/file/d/13L8ZrFQM1Jlghicvi6dACosK2hu2T45g/view?usp=sharing).

```Shell
python training/train_nyu.py --cfg=cfgs/nyu.yaml --name=nyu_model --tfrecords=nyu_train.tfrecords
```

Note: training creates a temporary directory which is used to store intermediate depth predictions. You can specify the location of the temporary directory with the `--tmp` flag, and use multiple GPUs with the `--num_gpus` flag. If you train with multiple GPUs, you can reduce the number of training iterations in cfgs/nyu.yaml.
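
For illustration, a run combining these flags might look like the following (the `--tmp` path is a placeholder):

```Shell
# Train on 2 GPUs, keeping intermediate depth predictions in a scratch directory (placeholder path)
python training/train_nyu.py --cfg=cfgs/nyu.yaml --name=nyu_model \
    --tfrecords=nyu_train.tfrecords --tmp=/scratch/deepv2d_tmp --num_gpus=2
```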

#### [KITTI](http://www.cvlibs.net/datasets/kitti/):
First download the dataset using this [script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip) provided on the official website. Once the dataset has been downloaded, write the training sequences to a tfrecords file:

```Shell
python training/write_tfrecords.py --dataset=kitti --dataset_dir=KITTI_DIR --records_file=kitti_train.tfrecords
```

You can now train the model (training takes about 1 week on an Nvidia 1080Ti GPU). As with NYU, training creates a temporary directory used to store intermediate depth predictions; you can specify its location with the `--tmp` flag and use multiple GPUs with the `--num_gpus` flag.

```Shell
python training/train_kitti.py --cfg=cfgs/kitti.yaml --name=kitti_model --tfrecords=kitti_train.tfrecords
```

#### [ScanNet](http://www.scan-net.org/):
```Shell
python training/train_scannet.py --cfg=cfgs/scannet.yaml --name=scannet_model --dataset_dir="path to scannet"
```