# KFNet
This is a Tensorflow implementation of our CVPR 2020 Oral paper - ["KFNet: Learning Temporal Camera Relocalization using Kalman Filtering"](https://arxiv.org/abs/2003.10629) by Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang, Long Quan.

This paper addresses the temporal camera relocalization of time-series image data by folding the scene coordinate regression problem into the principled Kalman filter framework.
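
For intuition only, here is a minimal numpy sketch of the per-pixel fusion idea with diagonal covariances. It is not the KFNet implementation; the function name, shapes, and noise values are assumptions made for the example.
```python
import numpy as np

def kalman_fuse(x_prior, var_prior, z_meas, var_meas):
    """Fuse a predicted (prior) scene coordinate map with a measured one.

    x_prior, z_meas     : (H, W, 3) prior / measured scene coordinates
    var_prior, var_meas : (H, W, 1) per-pixel process / measurement noise variances
    """
    gain = var_prior / (var_prior + var_meas)      # Kalman gain in [0, 1]
    x_post = x_prior + gain * (z_meas - x_prior)   # innovation-weighted update
    var_post = (1.0 - gain) * var_prior            # posterior variance shrinks
    return x_post, var_post

# Toy 60x80 maps: a noisy prior fused with a more confident measurement.
x_prior = np.zeros((60, 80, 3))
z_meas = np.ones((60, 80, 3))
x_post, var_post = kalman_fuse(x_prior, np.full((60, 80, 1), 0.04),
                               z_meas, np.full((60, 80, 1), 0.01))
```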

If you find this project useful, please cite:
```
@inproceedings{zhou2020kfnet,
  title={KFNet: Learning Temporal Camera Relocalization using Kalman Filtering},
  author={Zhou, Lei and Luo, Zixin and Shen, Tianwei and Zhang, Jiahui and Zhen, Mingmin and Yao, Yao and Fang, Tian and Quan, Long},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}
```
## Contents

- [About](#about)
- [File format](#file-format)
- [Environment](#environment)
- [Testing](#testing)
- [Training](#training)
- [Credit](#credit)

## About

### Network architecture


*(Figure: KFNet network architecture)*

### Sample results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) and [12scenes](http://graphics.stanford.edu/projects/reloc/)

KFNet predicts the mapping points (scene coordinates) and camera poses jointly and temporally, within the coordinate system defined by a known scene.

|| DSAC++ | KFNet |
|:--:|:--:|:--:|
|7scenes-fire | ![Alt Text](doc/fire_DSAC++_pip.gif) | ![Alt Text](doc/fire_KFNet_pip.gif) |
|12scenes-office2-5a| ![Alt Text](doc/office2_5a_DSAC++_pip.gif) | ![Alt Text](doc/office2_5a_KFNet_pip.gif)|
|Description | Blue - ground truth poses | Red - estimated poses |

### Intermediate uncertainty predictions

Below we visualize the measurement and process noise.

|Data | Measurement noise | Process noise |
|:--:|:--:|:--:|
|7scenes-fire | ![Alt Text](doc/fire_mea_uncertainty.gif) | ![Alt Text](doc/fire-process_uncertainty.gif) |
|12scenes-office2-5a| ![Alt Text](doc/office2_5a_uncertainty.gif) | ![Alt Text](doc/office2_5a_process_uncertainty.gif)|
|Description | Brighter colors indicate smaller noise. | The color bar measures the inverse of the covariances (in centimeters). |

### Intermediate optical flow results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/), [12scenes](http://graphics.stanford.edu/projects/reloc/), [Cambridge](http://mi.eng.cam.ac.uk/projects/relocalisation/) and [DeepLoc](http://deeploc.cs.uni-freiburg.de/)

As an essential component of KFNet, its process system (i.e., OFlowNet) delineates pixel transitions across frames through optical flow reasoning, **yet without recourse to ground truth optical flow labelling**. We visualize the predicted optical flow fields below, suppressing predictions with overly large uncertainties.

|Data | Description | Optical flow |
|:--:|:--:|:--:|
|7scenes-fire | Indoor; hand-held; small shaky motions | |
|12scenes-office2-5a | Indoor; hand-held; larger movements | |
|Cambridge-KingsCollege | Outdoor; hand-held; large random motions | |
|DeepLoc | Outdoor; vehicle-mounted; forward motions | |

**Remark:** For DeepLoc, since OFlowNet is trained on only one scene of DeepLoc, the flow predictions appear somewhat messy due to the lack of training data. Training with a larger amount and variety of data would improve the results.

## Usage

### File format

* **Input:** The input folder of a project should contain the files below.
  * `image_list.txt`, listing the sequential full image paths, one per line. Please go to the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset to download the source images.
  * `label_list.txt`, listing the full label paths, one per line, corresponding to the images. The label files are generated by numpy's `tofile()` function. They have 4 channels: 3 for scene coordinates and 1 for a binary pixel mask, which is 1 if the labeled scene coordinates of a pixel are valid and 0 otherwise. Their resolution is 8 times lower than that of the images. For example, for the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset, the images have a resolution of 480x640, while the label maps have a resolution of 60x80. (A reading sketch follows the download links below.)
  * `transform.txt`, recording the 4x4 Euclidean transformation matrix that transforms the scene point cloud to have zero mean and decorrelated axes.
* You can download the prepared input label map files of 7scenes from the Google Drive links below.

|[chess(13G)](https://drive.google.com/open?id=15LCNv8cZkg1tINggssB--MWDGxE3LoYq) |[fire(9G)](https://drive.google.com/open?id=1EaVPg_-6gp_7PWvsiHk05QHU425t5dql) |[heads(4G)](https://drive.google.com/open?id=1aYJPdekYuofNcqdsLNdphzCVVX93zT1w) |[office(22G)](https://drive.google.com/open?id=16hMHwI8dnWEmt0HoevfQxNsnyO7ND6Nb) |[pumpkin(13G)](https://drive.google.com/open?id=1elobB_maZ5tW1v_K3Anl9BGGlnkCKI8e) |[redkitchen(27G)](https://drive.google.com/open?id=1j5UG23me1Z8Sz9PBCeTNeZsw3mSeUTtS) |[stairs(7G)](https://drive.google.com/open?id=1Hv9bOsf68xNyaOJqpnOKHKcv9YYXroLj) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
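
As a reading aid, a possible way to load one of these label maps with numpy is sketched below; the file name is a placeholder, and the float32 dtype and 60x80 resolution are assumptions based on the 7scenes description above. Adjust them to the files you actually use.
```python
import numpy as np

# Assumed layout: a (60, 80, 4) float32 array written with numpy's tofile(),
# i.e. 1/8 of the 480x640 image resolution of 7scenes.
label = np.fromfile('labels/frame-000000.label', dtype=np.float32)  # placeholder path
label = label.reshape(60, 80, 4)

scene_coords = label[..., :3]       # per-pixel scene coordinates
mask = label[..., 3] > 0.5          # 1 = valid label, 0 = invalid
valid_points = scene_coords[mask]   # (N, 3) valid scene points
```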

* **Output:** The testing program (introduced below) outputs, for each input image, a 3-d scene coordinate map (in meters) and a 1-d confidence map packed into a 4-channel numpy matrix. You can then run the provided PnP program (in ```PnP.zip```) or your own algorithm to compute the camera poses from them. (A small loading sketch follows this list.)
  * The confidences are the inverses of the predicted Gaussian variances / uncertainties. Thus, the larger the confidence, the smaller the variance.
  * You can visualize a scene coordinate map as a point cloud via [Open3d](http://www.open3d.org/docs/release/getting_started.html) by running ```python vis/vis_scene_coordinate_map.py <scene_coordinate_map_file>```.
  * Or you can visualize a streaming scene coordinate map list by running ```python vis/vis_scene_coordinate_map_list.py <scene_coordinate_map_list_file>```.
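
For reference, the sketch below shows one way such an output map could be consumed, assuming it is stored as a 60x80x4 ```.npy``` array (the file name, shape, and threshold are placeholders). Because the confidences are inverse variances, a per-pixel standard deviation is simply ```1/sqrt(confidence)```.
```python
import numpy as np
import open3d as o3d

pred = np.load('output/frame-000000.npy')   # placeholder path, assumed shape (60, 80, 4)
coords = pred[..., :3]                      # scene coordinates in meters
confidence = pred[..., 3]                   # inverse of the predicted variance
std = 1.0 / np.sqrt(np.maximum(confidence, 1e-12))   # per-pixel std in meters

# Keep only reasonably confident points and show them as a point cloud.
keep = std < 0.10                           # 10 cm threshold, chosen arbitrarily
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(coords[keep].astype(np.float64))
o3d.visualization.draw_geometries([pcd])
```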

### Environment

* The code has been tested with
  * python 2.7,
  * tensorflow-gpu 1.10~1.13 (inclusive),
  * the corresponding versions of CUDA and cuDNN required by tensorflow-gpu (see this [link](https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible) for compatible version combinations),
  * other python packages including numpy, matplotlib and open3d.

* To directly install tensorflow and other python packages, run
```
sudo pip install -r requirements.txt
```

* If you are familiar with Conda, you can create the environment for KFNet by running
```
conda env create -f environment.yml
conda activate KFNet
```

### Testing

* Download

You can download the trained models of [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) from the [Google drive link (3G)](https://drive.google.com/open?id=13KZGz_akJw8iTQW90pgbuw2JAQzV7cG8).

* Test SCoordNet
```
git checkout SCoordNet
python SCoordNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder> --scene <scene>
# <scene> = chess/fire/heads/office/pumpkin/redkitchen/stairs, i.e., one of the scene names of the 7scenes dataset
```

* Test OFlowNet
```
git checkout OFlowNet
python OFlowNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder>
```
The testing program of OFlowNet saves the 2-d optical flows and 1-d uncertainties of consecutive image pairs as .npy files of dimension 60x80x3. You can visualize the flow results by running the scripts ```vis/vis_optical_flow.py``` and ```vis/vis_optical_flow_list.py```.
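
A minimal sketch of reading one of these files is shown below, assuming the first two channels hold the flow and the third the uncertainty (the file name and threshold are placeholders); the provided ```vis``` scripts remain the reference tools.
```python
import numpy as np

flow_pred = np.load('flow/000000_000001.npy')   # placeholder name, assumed shape (60, 80, 3)
flow = flow_pred[..., :2]                        # 2-d optical flow
uncertainty = flow_pred[..., 2]                  # per-pixel flow uncertainty

# Suppress predictions with too large uncertainties before visualization
# (the 80th-percentile cut-off is arbitrary).
reliable = uncertainty < np.percentile(uncertainty, 80)
flow_vis = np.where(reliable[..., None], flow, 0.0)
```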

* Test KFNet
```
git checkout master
python KFNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder> --scene <scene>
```

* Run PnP to compute camera poses

```
unzip PnP.zip && cd PnP
python main.py --gt <ground_truth_pose_file> --thread_num <32>
```
Please note that you need to install git-lfs before cloning in order to get ```PnP.zip```, since the zip file is stored via LFS.
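
If you would rather use your own solver than ```PnP.zip```, the sketch below outlines a common alternative with OpenCV's RANSAC PnP. It assumes known camera intrinsics and that each cell of the 60x80 map corresponds to the center of an 8x8 image patch; the intrinsics, thresholds, and function name are assumptions, and the provided PnP program remains the reference implementation.
```python
import numpy as np
import cv2

def pose_from_scene_coordinates(coords, confidence, K, min_conf=1.0):
    """Estimate a camera pose from a predicted scene coordinate map.

    coords     : (60, 80, 3) scene coordinates in meters
    confidence : (60, 80) inverse-variance confidences
    K          : (3, 3) camera intrinsic matrix
    """
    h, w = confidence.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Assumed: each map cell covers an 8x8 patch of the 480x640 image.
    pix = np.stack([xs * 8 + 4, ys * 8 + 4], axis=-1).astype(np.float64)

    keep = confidence > min_conf
    obj_pts = coords[keep].astype(np.float64)   # (N, 3) scene points
    img_pts = pix[keep]                         # (N, 2) pixel observations

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, None,
        iterationsCount=1000, reprojectionError=8.0)
    R, _ = cv2.Rodrigues(rvec)                  # world-to-camera rotation
    return ok, R, tvec, inliers

# Approximate 7scenes (Kinect) intrinsics, for illustration only.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
```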

### Training

The training procedure has 3 stages.

1. **Train SCoordNet** for each scene independently.
```
git checkout SCoordNet
python SCoordNet/train.py --input_folder <input_folder> --model_folder <model_folder> --scene <scene>
```

2. **Train OFlowNet** using all the image sequences, not limited to any specific scene — for example, by concatenating all the ```image_list.txt``` and ```label_list.txt``` files of 7scenes for training (see the sketch after the command below).
```
git checkout OFlowNet
python OFlowNet/train.py --input_folder <input_folder> --model_folder <model_folder>
```
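
For instance, a small helper like the one below could build the combined lists; the scene names and folder layout are placeholders to adapt to your setup.
```python
# Concatenate per-scene list files into combined lists for OFlowNet training.
scenes = ['chess', 'fire', 'heads', 'office', 'pumpkin', 'redkitchen', 'stairs']
for name in ('image_list.txt', 'label_list.txt'):
    with open('7scenes_all/' + name, 'w') as out:                 # placeholder output folder
        for scene in scenes:
            with open('7scenes/{}/{}'.format(scene, name)) as f:  # placeholder layout
                out.write(f.read())
```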

3. **Train KFNet** for each scene from the pre-trained SCoordNet and OFlowNet models to jointly finetune their parameters.
```
git checkout master
python KFNet/train.py --input_folder <input_folder> --model_folder <model_folder> --scoordnet <scoordnet_model_folder> --oflownet <oflownet_model_folder> --scene <scene>
```

## Credit

This implementation was developed by [Lei Zhou](https://zlthinker.github.io/). Feel free to contact Lei for any enquiry.