{"id":13528799,"url":"https://github.com/zlthinker/KFNet","last_synced_at":"2025-04-01T14:33:12.265Z","repository":{"id":94737283,"uuid":"231860236","full_name":"zlthinker/KFNet","owner":"zlthinker","description":"KFNet: Learning Temporal Camera Relocalization using Kalman Filtering (CVPR 2020 Oral)","archived":false,"fork":false,"pushed_at":"2020-06-25T06:06:59.000Z","size":69525,"stargazers_count":217,"open_issues_count":3,"forks_count":28,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-02T15:36:26.416Z","etag":null,"topics":["7scenes","kalman-filtering","localization","optical-flows","tensorflow","uncertainties"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zlthinker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-01-05T03:06:33.000Z","updated_at":"2024-10-31T01:30:19.000Z","dependencies_parsed_at":"2023-03-23T07:40:39.188Z","dependency_job_id":null,"html_url":"https://github.com/zlthinker/KFNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zlthinker%2FKFNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zlthinker%2FKFNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zlthinker%2FKFNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zlthinker%2FKFNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zlthinker","download_url":"https:
//codeload.github.com/zlthinker/KFNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246655280,"owners_count":20812612,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["7scenes","kalman-filtering","localization","optical-flows","tensorflow","uncertainties"],"created_at":"2024-08-01T07:00:24.947Z","updated_at":"2025-04-01T14:33:07.245Z","avatar_url":"https://github.com/zlthinker.png","language":"Python","funding_links":[],"categories":["5. Learning based SLAM"],"sub_categories":["5.2 Others"],"readme":"# KFNet\nThis is a Tensorflow implementation of our CVPR 2020 Oral paper - [\"KFNet: Learning Temporal Camera Relocalization using Kalman Filtering\"](https://arxiv.org/abs/2003.10629) by Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang, Long Quan.\n\nThis paper addresses the temporal camera relocalization of time-series image data by folding the scene coordinate regression problem into the principled Kalman filter framework.\n\nIf you find this project useful, please cite:\n```\n@inproceedings{zhou2020kfnet,\n  title={KFNet: Learning Temporal Camera Relocalization using Kalman Filtering},\n  author={Zhou, Lei and Luo, Zixin and Shen, Tianwei and Zhang, Jiahui and Zhen, Mingmin and Yao, Yao and Fang, Tian and Quan, Long},\n  booktitle={Computer Vision and Pattern Recognition (CVPR)},\n  year={2020}\n}\n```\n## Contents \n\n- [About](#about)\n- [File format](#file-format)\n- [Environment](#environment)\n- [Testing](#testing)\n- [Training](#training)\n- [Credit](#credit)\n\n\n## 
About\n\n### Network architecture\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=doc/architecture.jpg alt=\"drawing\" width=\"700\"/\u003e\n\u003c/p\u003e\n\n\n### Sample results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) and [12scenes](http://graphics.stanford.edu/projects/reloc/)\n\nKFNet simultaneously predicts the mapping points and camera poses in a temporal fashion within the coordinate system defined by a known scene.\n\n|| DSAC++ | KFNet |\n|:--:|:--:|:--:|\n|7scenes-fire       | ![Alt Text](doc/fire_DSAC++_pip.gif)       | ![Alt Text](doc/fire_KFNet_pip.gif)      |\n|12scenes-office2-5a| ![Alt Text](doc/office2_5a_DSAC++_pip.gif) | ![Alt Text](doc/office2_5a_KFNet_pip.gif)|\n|Description | Blue - ground truth poses   | Red - estimated poses |\n\n### Intermediate uncertainty predictions\n\nBelow we visualize the measurement and process noise.\n\n|Data | Measurement noise | Process noise |\n|:--:|:--:|:--:|\n|7scenes-fire       | ![Alt Text](doc/fire_mea_uncertainty.gif)       | ![Alt Text](doc/fire-process_uncertainty.gif)      |\n|12scenes-office2-5a| ![Alt Text](doc/office2_5a_uncertainty.gif) | ![Alt Text](doc/office2_5a_process_uncertainty.gif)|\n|Description | The brighter color means smaller noise.   | The figure bar measures the inverse of the covariances (in centimeters) |\n\n### Intermediate optical flow results on [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/), [12scenes](http://graphics.stanford.edu/projects/reloc/), [Cambridge](http://mi.eng.cam.ac.uk/projects/relocalisation/) and [DeepLoc](http://deeploc.cs.uni-freiburg.de/)\n\nAs an essential component of KFNet, the process system of KFNet (i.e., OFlowNet) delineates pixel transitions across frames through optical flow reasoning **yet without recourse to ground truth optical flow labelling**. 
We visualize the predicted optical flow fields below while suppressing the predictions with too large uncertainties.\n\n|Data | Description | Optical flow |\n|:--:|:--:|:--:|\n|7scenes-fire | Indoor; hand-held; small shaky motions | \u003cimg src=\"doc/fire_flow.gif\" width=\"375\"\u003e | \n|12scenes-office2-5a | Indoor; hand-held; larger movements | \u003cimg src=\"doc/office2_5a_flow.gif\" width=\"375\"\u003e |\n|Cambridge-KingsCollege | Outdoor; hand-held; large random motions | \u003cimg src=\"doc/KingsCollege_flow.gif\" width=\"375\"\u003e |\n|DeepLoc | Outdoor; vehicle-mounted; forward motions | \u003cimg src=\"doc/DeepLoc_flow.gif\" width=\"375\"\u003e |\n\n**Remark** For DeepLoc, since OFlowNet is trained only on one scene included in DeepLoc, the flow predictions appear somewhat messy due to the lack of training data. Training with a larger amount and variety of data would improve the results. \n\n\n## Usage\n\n### File format\n\n* **Input:** The input folder of a project should contain the files below.\n\t* `image_list.txt` comprising the sequential full image paths in lines. Please go to the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset to download the source images.\n\t* `label_list.txt` comprising the full label paths in lines corresponding to the images. The label files are generated by the `tofile()` function of numpy matrices. They have a channel number of 4, with 3 for scene coordinates and 1 for binary masks of pixels. The mask for one pixel is 1 if its label scene coordinates are valid and 0 otherwise. Their resolutions are 8 times lower than the images. 
For example, for the [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) dataset, the images have a resolution of 480x640, while the label maps have a resolution of 60x80.\n\t* `transform.txt` recording the 4x4 Euclidean transformation matrix which decorrelates the scene point cloud to give zero mean and correlations.\n\t* You can download the prepared input label map files of 7scenes from the Google drive links below.\n\n\t|[chess(13G)](https://drive.google.com/open?id=15LCNv8cZkg1tINggssB--MWDGxE3LoYq) |[fire(9G)](https://drive.google.com/open?id=1EaVPg_-6gp_7PWvsiHk05QHU425t5dql) |[heads(4G)](https://drive.google.com/open?id=1aYJPdekYuofNcqdsLNdphzCVVX93zT1w) |[office(22G)](https://drive.google.com/open?id=16hMHwI8dnWEmt0HoevfQxNsnyO7ND6Nb) |[pumpkin(13G)](https://drive.google.com/open?id=1elobB_maZ5tW1v_K3Anl9BGGlnkCKI8e) |[redkitchen(27G)](https://drive.google.com/open?id=1j5UG23me1Z8Sz9PBCeTNeZsw3mSeUTtS) |[stairs(7G)](https://drive.google.com/open?id=1Hv9bOsf68xNyaOJqpnOKHKcv9YYXroLj) |\n\t|:-:|:-:|:-:|:-:|:-:|:-:|:-:|\n\n* **Output:** The testing program (to be introduced below) outputs a 3-d scene coordinate map (in meters) and a 1-d confidence map into a 4-channel numpy matrix for each input image. And then you can run the provided PnP program (in ```PnP.zip```) or your own algorithms to compute the camera poses from them.\n\t* The confidences are the inverse of predicted Gaussian variances / uncertainties. Thus, the larger the confidences, the smaller the variances are. 
\n\t* You can visualize a scene coordinate map as a point cloud via [Open3d](http://www.open3d.org/docs/release/getting_started.html) by running ```python vis/vis_scene_coordinate_map.py \u003cpath_to_npy_file\u003e```.\n\t* Or you can visualize a streaming scene coordinate map list by running ```python vis/vis_scene_coordinate_map_list.py \u003cpath_to_npy_list\u003e```.\n\n\n### Environment\n\n* The code is tested with \n\t* python 2.7,\n\t* tensorflow-gpu 1.10~1.13 (inclusive),\n\t* corresponding versions of CUDA and CUDNN to enable tensorflow-gpu (see [link](https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible) for reference of the version combinations), \n\t* other python packages including numpy, matplotlib and open3d.\n\n* To directly install tensorflow and other python packages, run\n```\nsudo pip install -r requirements.txt\n``` \n\n* If you are familiar with Conda, you can create the environment for KFNet by running \n```\nconda env create -f environment.yml\nconda activate KFNet\n```\n\n### Testing\n\n* Download\n\nYou can download the trained models of [7scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) from the [Google drive link (3G)](https://drive.google.com/open?id=13KZGz_akJw8iTQW90pgbuw2JAQzV7cG8).\n\n* Test SCoordNet\n```\ngit checkout SCoordNet\npython SCoordNet/eval.py --input_folder \u003cinput_folder\u003e --output_folder \u003coutput_folder\u003e --model_folder \u003cmodel_folder\u003e --scene \u003cscene\u003e\n# \u003cscene\u003e = chess/fire/heads/office/pumpkin/redkitchen/stairs, i.e., one of the scene names of 7scenes dataset\n```\n\n* Test OFlowNet\n```\ngit checkout OFlowNet\npython OFlowNet/eval.py --input_folder \u003cinput_folder\u003e --output_folder \u003coutput_folder\u003e --model_folder \u003cmodel_folder\u003e\n```\nThe testing program of OFlowNet will save the 2-d optical flows and 1-d uncertainties of consecutive image pairs as npy files 
of the dimension 60x80x3. You can visualize the flow results by running scripts ```vis/vis_optical_flow.py``` and ```vis/vis_optical_flow_list.py```.\n\n* Test KFNet\n```\ngit checkout master\npython KFNet/eval.py --input_folder \u003cinput_folder\u003e --output_folder \u003coutput_folder\u003e --model_folder \u003cmodel_folder\u003e --scene \u003cscene\u003e\n```\n\n* Run PnP to compute camera poses\n\n```\nunzip PnP.zip \u0026\u0026 cd PnP\npython main.py \u003cpath_to_output_file_list\u003e \u003coutput_folder\u003e --gt \u003cpath_to_ground_truth_pose_list\u003e --thread_num \u003c32\u003e\n# Please note that you need to install git-lfs before cloning to get PnP.zip, since the zip file is stored via LFS.\n```\n\n### Training\n\nThe training procedure has 3 stages. \n\n1. **Train SCoordNet** for each scene independently.\n```\ngit checkout SCoordNet\npython SCoordNet/train.py --input_folder \u003cinput_folder\u003e --model_folder \u003cscoordnet_model_folder\u003e --scene \u003cscene\u003e\n```\n\n2. **Train OFlowNet** using all the image sequences that are not limited to any specific scenes, for example, concatenating all the ```image_list.txt``` and ```label_list.txt``` of 7scenes for training.\n```\ngit checkout OFlowNet\npython OFlowNet/train.py --input_folder \u003cinput_folder\u003e --model_folder \u003coflownet_model_folder\u003e\n```\n\n3. **Train KFNet** for each scene from the pre-trained SCoordNet and OFlowNet models to jointly finetune their parameters.\n```\ngit checkout master\npython KFNet/train.py --input_folder \u003cinput_folder\u003e --model_folder \u003cmodel_folder\u003e --scoordnet \u003cscoordnet_model_folder\u003e --oflownet \u003coflownet_model_folder\u003e --scene \u003cscene\u003e\n```\n\n\n\n## Credit\n\nThis implementation was developed by [Lei Zhou](https://zlthinker.github.io/). 
Feel free to contact Lei for any enquiry.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzlthinker%2FKFNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzlthinker%2FKFNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzlthinker%2FKFNet/lists"}