# KFNet
This is a Tensorflow implementation of our CVPR 2020 Oral paper - ["KFNet: Learning Temporal Camera Relocalization using Kalman Filtering"](https://arxiv.org/abs/2003.10629) by Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang and Long Quan. The paper addresses the temporal camera relocalization of time-series image data by folding the scene coordinate regression problem into the principled Kalman filter framework.
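At its core, the filter fuses a process prediction of the scene coordinate map, propagated from the previous frame, with a single-image measurement, weighting each by its predicted uncertainty. Below is a minimal per-pixel sketch of that fusion step, assuming independent per-pixel variances; the function and variable names are illustrative, not the repository's API.
```
import numpy as np

def kalman_fuse(prior_mean, prior_var, meas_mean, meas_var):
    """Fuse a process prediction (prior) with a measurement, per pixel.

    All arrays share one shape, e.g. (H, W, 3) scene coordinates with
    matching per-pixel variances, assumed independent across pixels.
    """
    gain = prior_var / (prior_var + meas_var)        # Kalman gain
    post_mean = prior_mean + gain * (meas_mean - prior_mean)
    post_var = (1.0 - gain) * prior_var              # reduced uncertainty
    return post_mean, post_var
```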
If you find this project useful, please cite:
```
@inproceedings{zhou2020kfnet,
title={KFNet: Learning Temporal Camera Relocalization using Kalman Filtering},
author={Zhou, Lei and Luo, Zixin and Shen, Tianwei and Zhang, Jiahui and Zhen, Mingmin and Yao, Yao and Fang, Tian and Quan, Long},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2020}
}
```
## Contents
- [About](#about)
- [File format](#file-format)
- [Environment](#environment)
- [Testing](#testing)
- [Training](#training)
- [Credit](#credit)

## About
### Network architecture
### Sample results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) and [12scenes](http://graphics.stanford.edu/projects/reloc/)
KFNet simultaneously predicts the mapping points and camera poses in a temporal fashion within the coordinate system defined by a known scene.
|| DSAC++ | KFNet |
|:--:|:--:|:--:|
|7scenes-fire | ![Alt Text](doc/fire_DSAC++_pip.gif) | ![Alt Text](doc/fire_KFNet_pip.gif) |
|12scenes-office2-5a| ![Alt Text](doc/office2_5a_DSAC++_pip.gif) | ![Alt Text](doc/office2_5a_KFNet_pip.gif)|
|Description | Blue - ground truth poses | Red - estimated poses |

### Intermediate uncertainty predictions
Below we visualize the measurement and process noise.
|Data | Measurement noise | Process noise |
|:--:|:--:|:--:|
|7scenes-fire | ![Alt Text](doc/fire_mea_uncertainty.gif) | ![Alt Text](doc/fire-process_uncertainty.gif) |
|12scenes-office2-5a| ![Alt Text](doc/office2_5a_uncertainty.gif) | ![Alt Text](doc/office2_5a_process_uncertainty.gif)|
|Description | Brighter colors indicate smaller noise. | The color bar measures the inverse of the covariances (in centimeters). |

### Intermediate optical flow results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/), [12scenes](http://graphics.stanford.edu/projects/reloc/), [Cambridge](http://mi.eng.cam.ac.uk/projects/relocalisation/) and [DeepLoc](http://deeploc.cs.uni-freiburg.de/)
As an essential component of KFNet, its process system (i.e., OFlowNet) delineates pixel transitions across frames through optical flow reasoning, **yet without recourse to ground truth optical flow labels**. We visualize the predicted optical flow fields below, suppressing predictions with overly large uncertainties; a propagation sketch follows the table.
|Data | Description | Optical flow |
|:--:|:--:|:--:|
|7scenes-fire | Indoor; hand-held; small shaky motions | |
|12scenes-office2-5a | Indoor; hand-held; larger movements | |
|Cambridge-KingsCollege | Outdoor; hand-held; large random motions | |
|DeepLoc | Outdoor; vehicle-mounted; forward motions | |

**Remark:** For DeepLoc, since OFlowNet was trained on only one scene included in DeepLoc, the flow predictions appear somewhat messy due to the lack of training data. Training with a larger amount and variety of data would improve the results.
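To make the role of the flow concrete, here is a rough sketch of how a predicted flow field could propagate the previous scene coordinate map to the current frame, which is the job of the process step; the nearest-neighbor gather and the flow sign convention are simplifying assumptions of this illustration.
```
import numpy as np

def propagate(prev_coords, flow):
    """Warp the previous (H, W, 3) scene coordinate map to the current
    frame with a per-pixel (H, W, 2) flow, via nearest-neighbor lookup."""
    H, W, _ = prev_coords.shape
    v, u = np.mgrid[0:H, 0:W]
    # Assumed convention: the flow maps each current pixel back to its
    # source location in the previous frame.
    su = np.clip(np.rint(u - flow[..., 0]), 0, W - 1).astype(int)
    sv = np.clip(np.rint(v - flow[..., 1]), 0, H - 1).astype(int)
    return prev_coords[sv, su]
```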
## Usage
### File format
* **Input:** The input folder of a project should contain the files below.
  * `image_list.txt`, listing the full image paths one per line, in temporal order. Please go to the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset to download the source images.
  * `label_list.txt`, listing the full label paths one per line, corresponding to the images. The label files are generated by numpy's `tofile()` function (see the loading sketch after this list). Each label map has 4 channels: 3 for scene coordinates and 1 for a binary pixel mask, which is 1 if the pixel's label scene coordinates are valid and 0 otherwise. The label maps have 8 times lower resolution than the images; for example, for the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset, the images have a resolution of 480x640, while the label maps have a resolution of 60x80.
  * `transform.txt`, recording the 4x4 Euclidean transformation matrix that transforms the scene point cloud to have zero mean and de-correlated coordinates.
  * You can download the prepared input label map files of 7scenes from the Google drive links below.

|[chess(13G)](https://drive.google.com/open?id=15LCNv8cZkg1tINggssB--MWDGxE3LoYq) |[fire(9G)](https://drive.google.com/open?id=1EaVPg_-6gp_7PWvsiHk05QHU425t5dql) |[heads(4G)](https://drive.google.com/open?id=1aYJPdekYuofNcqdsLNdphzCVVX93zT1w) |[office(22G)](https://drive.google.com/open?id=16hMHwI8dnWEmt0HoevfQxNsnyO7ND6Nb) |[pumpkin(13G)](https://drive.google.com/open?id=1elobB_maZ5tW1v_K3Anl9BGGlnkCKI8e) |[redkitchen(27G)](https://drive.google.com/open?id=1j5UG23me1Z8Sz9PBCeTNeZsw3mSeUTtS) |[stairs(7G)](https://drive.google.com/open?id=1Hv9bOsf68xNyaOJqpnOKHKcv9YYXroLj) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|

* **Output:** The testing program (introduced below) outputs, for each input image, a 3-d scene coordinate map (in meters) and a 1-d confidence map packed into a 4-channel numpy matrix. You can then run the provided PnP program (in `PnP.zip`) or your own algorithm to compute camera poses from them.
  * The confidences are the inverses of the predicted Gaussian variances/uncertainties; the larger the confidence, the smaller the variance.
  * You can visualize a scene coordinate map as a point cloud via [Open3d](http://www.open3d.org/docs/release/getting_started.html) by running `python vis/vis_scene_coordinate_map.py <scene_coordinate_map_file>`.
  * Or you can visualize a streaming list of scene coordinate maps by running `python vis/vis_scene_coordinate_map_list.py <scene_coordinate_map_list>`.
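For reference, here is a minimal sketch of loading and viewing one such map outside the provided scripts; the file path is hypothetical and `float32` is an assumed dtype, so check it against your own dumps.
```
import numpy as np
import open3d as o3d

H, W = 60, 80  # 7scenes label maps are 1/8 of the 480x640 image resolution

# tofile() writes raw bytes without a shape header, so reshape manually:
# 3 scene-coordinate channels plus 1 binary validity mask.
label = np.fromfile('labels/frame-000000.bin', dtype=np.float32).reshape(H, W, 4)
coords, valid = label[..., :3], label[..., 3] > 0

# Testing outputs are laid out analogously, but with a confidence channel
# instead of a mask, where variance = 1 / confidence.
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(coords[valid].astype(np.float64))
o3d.visualization.draw_geometries([pcd])
```

### Environment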
* The code has been tested with
  * python 2.7,
  * tensorflow-gpu 1.10~1.13 (inclusive),
  * the corresponding versions of CUDA and cuDNN needed to enable tensorflow-gpu (see this [link](https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible) for a reference on version combinations),
  * other python packages including numpy, matplotlib and open3d.
* To install tensorflow and the other python packages directly, run
```
sudo pip install -r requirements.txt
```
* If you are familiar with Conda, you can instead create the environment for KFNet by running
```
conda env create -f environment.yml
conda activate KFNet
```

### Testing
* Download: You can download the trained models of [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) from the [Google drive link (3G)](https://drive.google.com/open?id=13KZGz_akJw8iTQW90pgbuw2JAQzV7cG8).
* Test SCoordNet
```
git checkout SCoordNet
python SCoordNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder> --scene <scene>
# <scene> = chess/fire/heads/office/pumpkin/redkitchen/stairs, i.e., one of the scene names of the 7scenes dataset
```
* Test OFlowNet
```
git checkout OFlowNet
python OFlowNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder>
```
The testing program of OFlowNet saves the 2-d optical flows and 1-d uncertainties of consecutive image pairs as npy files of dimension 60x80x3. You can visualize the flow results with the scripts `vis/vis_optical_flow.py` and `vis/vis_optical_flow_list.py`; a minimal loading sketch follows.
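The file name below is hypothetical, and the channel order (two flow channels first, then the uncertainty) is an assumption inferred from the description above.
```
import numpy as np

flow = np.load('flows/pair_000000.npy')      # shape (60, 80, 3)
uv, uncertainty = flow[..., :2], flow[..., 2]

# Suppress predictions with too-large uncertainties before plotting,
# as in the figures above; the 90th percentile is an arbitrary cutoff.
reliable = uncertainty < np.percentile(uncertainty, 90)
uv_masked = np.where(reliable[..., None], uv, np.nan)
```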
* Test KFNet
```
git checkout master
python KFNet/eval.py --input_folder <input_folder> --output_folder <output_folder> --model_folder <model_folder> --scene <scene>
```
* Run PnP to compute camera poses. Note that you need to install git-lfs before cloning in order to get `PnP.zip`, since the zip file is stored via LFS.
```
unzip PnP.zip && cd PnP
python main.py --gt <ground_truth_pose_file> --thread_num <32>
```
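`PnP.zip` contains the authors' program; purely as an illustration of the underlying idea, below is a rough OpenCV-based sketch of recovering one pose from an output map. The file name is hypothetical, the intrinsics are the nominal 7scenes values (verify for your data), and the confidence cutoff is arbitrary.
```
import numpy as np
import cv2

smap = np.fromfile('output/frame-000000.bin', dtype=np.float32).reshape(60, 80, 4)
coords, conf = smap[..., :3], smap[..., 3]

K = np.array([[585., 0., 320.],   # assumed 7scenes intrinsics
              [0., 585., 240.],
              [0., 0., 1.]])

# Map each 60x80 cell to the center of its 8x8 patch in the 480x640 image.
v, u = np.mgrid[0:60, 0:80]
pix = np.stack([u * 8 + 4, v * 8 + 4], axis=-1).astype(np.float64)

keep = conf > np.percentile(conf, 50)  # keep the more confident half
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    coords[keep].astype(np.float64), pix[keep], K, None)
# rvec/tvec give the world-to-camera transform; invert it for the camera pose.
```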
### Training
The training procedure has 3 stages.
1. **Train SCoordNet** for each scene independently.
```
git checkout SCoordNet
python SCoordNet/train.py --input_folder <input_folder> --model_folder <model_folder> --scene <scene>
```
2. **Train OFlowNet** using image sequences that are not limited to any specific scene, for example by concatenating all the `image_list.txt` and `label_list.txt` files of 7scenes for training.
```
git checkout OFlowNet
python OFlowNet/train.py --input_folder <input_folder> --model_folder <model_folder>
```
3. **Train KFNet** for each scene, starting from the pre-trained SCoordNet and OFlowNet models, to jointly finetune their parameters.
```
git checkout master
python KFNet/train.py --input_folder <input_folder> --model_folder <model_folder> --scoordnet <scoordnet_model_folder> --oflownet <oflownet_model_folder> --scene <scene>
```

## Credit
This implementation was developed by [Lei Zhou](https://zlthinker.github.io/). Feel free to contact Lei with any enquiries.