# KFNet
This is a Tensorflow implementation of our CVPR 2020 Oral paper - ["KFNet: Learning Temporal Camera Relocalization using Kalman Filtering"](https://arxiv.org/abs/2003.10629) by Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang, Long Quan.

This paper addresses the temporal camera relocalization of time-series image data by folding the scene coordinate regression problem into the principled Kalman filter framework.
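
For intuition only, here is a minimal numpy sketch of the per-pixel fusion idea with diagonal covariances. It is not the KFNet implementation; the function name, shapes, and noise values are assumptions made for the example.
```python
import numpy as np

def kalman_fuse(x_prior, var_prior, z_meas, var_meas):
    """Fuse a predicted (prior) scene coordinate map with a measured one.

    x_prior, z_meas     : (H, W, 3) prior / measured scene coordinates
    var_prior, var_meas : (H, W, 1) per-pixel process / measurement noise variances
    """
    gain = var_prior / (var_prior + var_meas)      # Kalman gain in [0, 1]
    x_post = x_prior + gain * (z_meas - x_prior)   # innovation-weighted update
    var_post = (1.0 - gain) * var_prior            # posterior variance shrinks
    return x_post, var_post

# Toy 60x80 maps: a noisy prior fused with a more confident measurement.
x_prior = np.zeros((60, 80, 3))
z_meas = np.ones((60, 80, 3))
x_post, var_post = kalman_fuse(x_prior, np.full((60, 80, 1), 0.04),
                               z_meas, np.full((60, 80, 1), 0.01))
```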

If you find this project useful, please cite:
```
@inproceedings{zhou2020kfnet,
  title={KFNet: Learning Temporal Camera Relocalization using Kalman Filtering},
  author={Zhou, Lei and Luo, Zixin and Shen, Tianwei and Zhang, Jiahui and Zhen, Mingmin and Yao, Yao and Fang, Tian and Quan, Long},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}
```
## Contents

- [About](#about)
- [File format](#file-format)
- [Environment](#environment)
- [Testing](#testing)
- [Training](#training)
- [Credit](#credit)

## About

### Network architecture


*(Figure: KFNet network architecture)*

### Sample results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) and [12scenes](http://graphics.stanford.edu/projects/reloc/)

KFNet predicts the mapping points (scene coordinates) and camera poses jointly and temporally, within the coordinate system defined by a known scene.

|| DSAC++ | KFNet |
|:--:|:--:|:--:|
|7scenes-fire | ![Alt Text](doc/fire_DSAC++_pip.gif) | ![Alt Text](doc/fire_KFNet_pip.gif) |
|12scenes-office2-5a| ![Alt Text](doc/office2_5a_DSAC++_pip.gif) | ![Alt Text](doc/office2_5a_KFNet_pip.gif)|
|Description | Blue - ground truth poses | Red - estimated poses |

### Intermediate uncertainty predictions

Below we visualize the measurement and process noise.

|Data | Measurement noise | Process noise |
|:--:|:--:|:--:|
|7scenes-fire | ![Alt Text](doc/fire_mea_uncertainty.gif) | ![Alt Text](doc/fire-process_uncertainty.gif) |
|12scenes-office2-5a| ![Alt Text](doc/office2_5a_uncertainty.gif) | ![Alt Text](doc/office2_5a_process_uncertainty.gif)|
|Description | Brighter colors indicate smaller noise. | The color bar measures the inverse of the covariances (in centimeters). |

### Intermediate optical flow results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/), [12scenes](http://graphics.stanford.edu/projects/reloc/), [Cambridge](http://mi.eng.cam.ac.uk/projects/relocalisation/) and [DeepLoc](http://deeploc.cs.uni-freiburg.de/)

As an essential component of KFNet, its process system (i.e., OFlowNet) delineates pixel transitions across frames through optical flow reasoning, **yet without recourse to ground truth optical flow labelling**. We visualize the predicted optical flow fields below, suppressing predictions with overly large uncertainties.

|Data | Description | Optical flow |
|:--:|:--:|:--:|
|7scenes-fire | Indoor; hand-held; small shaky motions | |
|12scenes-office2-5a | Indoor; hand-held; larger movements | |
|Cambridge-KingsCollege | Outdoor; hand-held; large random motions | |
|DeepLoc | Outdoor; vehicle-mounted; forward motions | |

**Remark:** For DeepLoc, since OFlowNet is trained on only one scene of DeepLoc, the flow predictions appear somewhat messy due to the lack of training data. Training with a larger amount and variety of data would improve the results.

## Usage

### File format

* **Input:** The input folder of a project should contain the files below.
  * `image_list.txt`, listing the sequential full image paths, one per line. Please go to the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset to download the source images.
  * `label_list.txt`, listing the full label paths, one per line, corresponding to the images. The label files are generated by numpy's `tofile()` function. They have 4 channels: 3 for scene coordinates and 1 for a binary pixel mask, which is 1 if the labeled scene coordinates of a pixel are valid and 0 otherwise. Their resolution is 8 times lower than that of the images. For example, for the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset, the images have a resolution of 480x640, while the label maps have a resolution of 60x80. (A reading sketch follows the download links below.)
  * `transform.txt`, recording the 4x4 Euclidean transformation matrix that transforms the scene point cloud to have zero mean and decorrelated axes.
* You can download the prepared input label map files of 7scenes from the Google Drive links below.

|[chess(13G)](https://drive.google.com/open?id=15LCNv8cZkg1tINggssB--MWDGxE3LoYq) |[fire(9G)](https://drive.google.com/open?id=1EaVPg_-6gp_7PWvsiHk05QHU425t5dql) |[heads(4G)](https://drive.google.com/open?id=1aYJPdekYuofNcqdsLNdphzCVVX93zT1w) |[office(22G)](https://drive.google.com/open?id=16hMHwI8dnWEmt0HoevfQxNsnyO7ND6Nb) |[pumpkin(13G)](https://drive.google.com/open?id=1elobB_maZ5tW1v_K3Anl9BGGlnkCKI8e) |[redkitchen(27G)](https://drive.google.com/open?id=1j5UG23me1Z8Sz9PBCeTNeZsw3mSeUTtS) |[stairs(7G)](https://drive.google.com/open?id=1Hv9bOsf68xNyaOJqpnOKHKcv9YYXroLj) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
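
As a reading aid, a possible way to load one of these label maps with numpy is sketched below; the file name is a placeholder, and the float32 dtype and 60x80 resolution are assumptions based on the 7scenes description above. Adjust them to the files you actually use.
```python
import numpy as np

# Assumed layout: a (60, 80, 4) float32 array written with numpy's tofile(),
# i.e. 1/8 of the 480x640 image resolution of 7scenes.
label = np.fromfile('labels/frame-000000.label', dtype=np.float32)  # placeholder path
label = label.reshape(60, 80, 4)

scene_coords = label[..., :3]       # per-pixel scene coordinates
mask = label[..., 3] > 0.5          # 1 = valid label, 0 = invalid
valid_points = scene_coords[mask]   # (N, 3) valid scene points
```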

* **Output:** The testing program (introduced below) outputs, for each input image, a 3-d scene coordinate map (in meters) and a 1-d confidence map packed into a 4-channel numpy matrix. You can then run the provided PnP program (in ```PnP.zip```) or your own algorithm to compute the camera poses from them. (A small loading sketch follows this list.)
  * The confidences are the inverses of the predicted Gaussian variances / uncertainties. Thus, the larger the confidence, the smaller the variance.
  * You can visualize a scene coordinate map as a point cloud via [Open3d](http://www.open3d.org/docs/release/getting_started.html) by running ```python vis/vis_scene_coordinate_map.py <scene_coordinate_map_file>```.
  * Or you can visualize a streaming scene coordinate map list by running ```python vis/vis_scene_coordinate_map_list.py <scene_coordinate_map_list_file>```.
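
For reference, the sketch below shows one way such an output map could be consumed, assuming it is stored as a 60x80x4 ```.npy``` array (the file name, shape, and threshold are placeholders). Because the confidences are inverse variances, a per-pixel standard deviation is simply ```1/sqrt(confidence)```.
```python
import numpy as np
import open3d as o3d

pred = np.load('output/frame-000000.npy')   # placeholder path, assumed shape (60, 80, 4)
coords = pred[..., :3]                      # scene coordinates in meters
confidence = pred[..., 3]                   # inverse of the predicted variance
std = 1.0 / np.sqrt(np.maximum(confidence, 1e-12))   # per-pixel std in meters

# Keep only reasonably confident points and show them as a point cloud.
keep = std < 0.10                           # 10 cm threshold, chosen arbitrarily
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(coords[keep].astype(np.float64))
o3d.visualization.draw_geometries([pcd])
```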

### Environment

* The code has been tested with
  * python 2.7,
  * tensorflow-gpu 1.10~1.13 (inclusive),
  * the corresponding versions of CUDA and cuDNN required by tensorflow-gpu (see this [link](https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible) for compatible version combinations),
  * other python packages including numpy, matplotlib and open3d.

* To directly install tensorflow and other python packages, run
```
sudo pip install -r requirements.txt
```

* If you are familiar with Conda, you can create the environment for KFNet by running
```
conda env create -f environment.yml
conda activate KFNet
```

### Testing

* Download

You can download the trained models of [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) from the [Google drive link (3G)](https://drive.google.com/open?id=13KZGz_akJw8iTQW90pgbuw2JAQzV7cG8).

* Test SCoordNet
```
git checkout SCoordNet
python SCoordNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder> --scene <scene>
# <scene> = chess/fire/heads/office/pumpkin/redkitchen/stairs, i.e., one of the scene names of the 7scenes dataset
```

* Test OFlowNet
```
git checkout OFlowNet
python OFlowNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder>
```
The testing program of OFlowNet saves the 2-d optical flows and 1-d uncertainties of consecutive image pairs as .npy files of dimension 60x80x3. You can visualize the flow results by running the scripts ```vis/vis_optical_flow.py``` and ```vis/vis_optical_flow_list.py```.
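
A minimal sketch of reading one of these files is shown below, assuming the first two channels hold the flow and the third the uncertainty (the file name and threshold are placeholders); the provided ```vis``` scripts remain the reference tools.
```python
import numpy as np

flow_pred = np.load('flow/000000_000001.npy')   # placeholder name, assumed shape (60, 80, 3)
flow = flow_pred[..., :2]                        # 2-d optical flow
uncertainty = flow_pred[..., 2]                  # per-pixel flow uncertainty

# Suppress predictions with too large uncertainties before visualization
# (the 80th-percentile cut-off is arbitrary).
reliable = uncertainty < np.percentile(uncertainty, 80)
flow_vis = np.where(reliable[..., None], flow, 0.0)
```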

* Test KFNet
```
git checkout master
python KFNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder> --scene <scene>
```

* Run PnP to compute camera poses

```
unzip PnP.zip && cd PnP
python main.py --gt <ground_truth_pose_file> --thread_num <32>
```
Please note that you need to install git-lfs before cloning in order to get ```PnP.zip```, since the zip file is stored via LFS.
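
If you would rather use your own solver than ```PnP.zip```, the sketch below outlines a common alternative with OpenCV's RANSAC PnP. It assumes known camera intrinsics and that each cell of the 60x80 map corresponds to the center of an 8x8 image patch; the intrinsics, thresholds, and function name are assumptions, and the provided PnP program remains the reference implementation.
```python
import numpy as np
import cv2

def pose_from_scene_coordinates(coords, confidence, K, min_conf=1.0):
    """Estimate a camera pose from a predicted scene coordinate map.

    coords     : (60, 80, 3) scene coordinates in meters
    confidence : (60, 80) inverse-variance confidences
    K          : (3, 3) camera intrinsic matrix
    """
    h, w = confidence.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Assumed: each map cell covers an 8x8 patch of the 480x640 image.
    pix = np.stack([xs * 8 + 4, ys * 8 + 4], axis=-1).astype(np.float64)

    keep = confidence > min_conf
    obj_pts = coords[keep].astype(np.float64)   # (N, 3) scene points
    img_pts = pix[keep]                         # (N, 2) pixel observations

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, None,
        iterationsCount=1000, reprojectionError=8.0)
    R, _ = cv2.Rodrigues(rvec)                  # world-to-camera rotation
    return ok, R, tvec, inliers

# Approximate 7scenes (Kinect) intrinsics, for illustration only.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
```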

### Training

The training procedure has 3 stages.

1. **Train SCoordNet** for each scene independently.
```
git checkout SCoordNet
python SCoordNet/train.py --input_folder <input_folder> --model_folder <model_folder> --scene <scene>
```

2. **Train OFlowNet** using all the image sequences, not limited to any specific scene — for example, by concatenating all the ```image_list.txt``` and ```label_list.txt``` files of 7scenes for training (see the sketch after the command below).
```
git checkout OFlowNet
python OFlowNet/train.py --input_folder <input_folder> --model_folder <model_folder>
```
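
For instance, a small helper like the one below could build the combined lists; the scene names and folder layout are placeholders to adapt to your setup.
```python
# Concatenate per-scene list files into combined lists for OFlowNet training.
scenes = ['chess', 'fire', 'heads', 'office', 'pumpkin', 'redkitchen', 'stairs']
for name in ('image_list.txt', 'label_list.txt'):
    with open('7scenes_all/' + name, 'w') as out:                 # placeholder output folder
        for scene in scenes:
            with open('7scenes/{}/{}'.format(scene, name)) as f:  # placeholder layout
                out.write(f.read())
```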

3. **Train KFNet** for each scene from the pre-trained SCoordNet and OFlowNet models to jointly finetune their parameters.
```
git checkout master
python KFNet/train.py --input_folder <input_folder> --model_folder <model_folder> --scoordnet <scoordnet_model_folder> --oflownet <oflownet_model_folder> --scene <scene>
```

## Credit

This implementation was developed by [Lei Zhou](https://zlthinker.github.io/). Feel free to contact Lei for any enquiry.