# [CVPR 2025] Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
Siyan Dong* · Shuzhe Wang* · Shaohui Liu · Lulu Cai · Qingnan Fan · Juho Kannala · Yanchao Yang
[Paper](https://arxiv.org/abs/2412.08376) | Video | Poster | Demo (Online)
Reloc3r is a simple yet effective camera pose estimation framework that combines a pre-trained two-view relative camera pose regression network with a multi-view motion averaging module.
Trained on approximately 8 million posed image pairs, Reloc3r achieves surprisingly good performance and generalization ability, producing high-quality camera pose estimates in real time.
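To make the second stage concrete, here is an illustrative sketch of the motion-averaging idea with a hypothetical helper (not Reloc3r's actual API): each database view with a known absolute rotation, combined with a regressed query-to-database rotation, yields one candidate absolute rotation for the query, and the candidates are averaged.

```python
# Illustrative sketch of the motion-averaging idea (hypothetical code,
# not Reloc3r's API). Composition order depends on the pose convention.
import numpy as np
from scipy.spatial.transform import Rotation

def average_query_rotation(db_rotations, rel_rotations):
    """db_rotations[i]: 3x3 absolute rotation of database view i.
    rel_rotations[i]: 3x3 regressed rotation from the query to view i."""
    candidates = [R_db @ R_rel for R_db, R_rel in zip(db_rotations, rel_rotations)]
    # Chordal L2 mean over all candidate rotations.
    return Rotation.from_matrix(np.stack(candidates)).mean().as_matrix()
```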
## Table of Contents

- [TODO List](#todo-list)
- [Installation](#installation)
- [Usage](#usage)
- [Evaluation on Relative Camera Pose Estimation](#evaluation-on-relative-camera-pose-estimation)
- [Evaluation on Visual Localization](#evaluation-on-visual-localization)
- [Training](#training)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)

## TODO List
- [x] Release pre-trained weights and inference code.
- [x] Release evaluation code for ScanNet1500 and MegaDepth1500 datasets.
- [x] Release evaluation code for 7Scenes and Cambridge datasets.
- [x] Release sample code for self-captured images and videos.
- [x] Release training code and data.
- [x] Gradio demo for relative camera pose regression, and its visualization.
- [ ] Evaluation code for other datasets.
- [ ] Accelerated version for visual localization.

## Installation
1. Clone Reloc3r
```bash
git clone --recursive https://github.com/ffrivera0/reloc3r.git
cd reloc3r
# if you have already cloned reloc3r:
# git submodule update --init --recursive
```

2. Create the environment using conda
```bash
conda create -n reloc3r python=3.11 cmake=3.14.0
conda activate reloc3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia # use the correct version of cuda for your system
pip install -r requirements.txt
# optional: you can also install additional packages to:
# - add support for HEIC images
pip install -r requirements_optional.txt
```

3. Optional: Compile the CUDA kernels for RoPE
```bash
# Reloc3r relies on RoPE positional embeddings, for which you can compile CUDA kernels for a faster runtime.
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
```

4. Optional: Download the checkpoints [Reloc3r-224](https://huggingface.co/siyan824/reloc3r-224)/[Reloc3r-512](https://huggingface.co/siyan824/reloc3r-512). This step can be skipped: the pre-trained weights are downloaded automatically when running the evaluation and demo code below. A prefetch sketch follows.
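If you prefer to fetch the weights ahead of time, the snippet below mirrors the Hugging Face model repo into the local cache using the standard `huggingface_hub` client; the repo id `siyan824/reloc3r-512` comes from the checkpoint link above.

```python
# Optional prefetch of the Reloc3r-512 checkpoint from Hugging Face.
# snapshot_download mirrors the whole model repo into the local HF cache
# and returns the path of the cached directory.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="siyan824/reloc3r-512")
print("checkpoint files cached at:", local_dir)
```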
## Usage
Using Reloc3r, you can estimate camera poses for your own captured images and videos.
For relative pose estimation, try the demo code in `wild_relpose.py`. We provide some [image pairs](https://drive.google.com/drive/folders/1TmoSKrtxR50SlFoXOwC4a9aGr18h00yy?usp=sharing) used in our paper.
```bash
# replace the args with your paths
python wild_relpose.py --v1_path ./data/wild_images/zurich0.jpg --v2_path ./data/wild_images/zurich1.jpg --output_folder ./data/wild_images/
```
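Once the command finishes, you can inspect the saved pose directly. This is a minimal sketch that assumes `pose2to1.txt` stores a 4x4 homogeneous transform, as the filename and the visualization script's `--pose_path` argument suggest; verify against the actual output format.

```python
# Inspect the saved relative pose (assumes a 4x4 homogeneous transform).
import numpy as np

T_2to1 = np.loadtxt("./data/wild_images/pose2to1.txt")
assert T_2to1.shape == (4, 4), "unexpected pose file format"
R, t = T_2to1[:3, :3], T_2to1[:3, 3]

# Rotation magnitude in degrees between the two views.
angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
# The regressed translation is up to scale, so only its direction is meaningful.
direction = t / (np.linalg.norm(t) + 1e-12)
print(f"rotation: {angle:.1f} deg, translation direction: {direction}")
```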
Visualize the relative pose:

```bash
# replace the args with your paths
python visualization.py --mode relpose --pose_path ./data/wild_images/pose2to1.txt
```

For visual localization, the demo code in `wild_visloc.py` estimates absolute camera poses from sampled frames in self-captured videos.
> [!IMPORTANT]
> The demo simply uses the first and last frames as the database, which requires overlapping regions among all images. This demo does not support linear motion. We provide some [videos](https://drive.google.com/drive/folders/1sbXiXScts5OjESAfSZQwLrAQ5Dta1ibS?usp=sharing) as examples.

```bash
# replace the args with your paths
python wild_visloc.py --video_path ./data/wild_video/ids.MOV --output_folder ./data/wild_video
```

Visualize the absolute poses:
```bash
# replace the args with your paths
python visualization.py --mode visloc --pose_folder ./data/wild_video/ids_poses/
```
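The database sampling the demo describes (first and last frame of the video) is easy to reproduce with plain OpenCV; this is a generic sketch, not the repo's code.

```python
# Grab the first and last frames of a video with OpenCV.
import cv2

cap = cv2.VideoCapture("./data/wild_video/ids.MOV")
# Note: CAP_PROP_FRAME_COUNT can be approximate for some containers/codecs.
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
ok_first, first = cap.read()                      # frame 0
cap.set(cv2.CAP_PROP_POS_FRAMES, n_frames - 1)    # seek to last frame
ok_last, last = cap.read()
cap.release()
assert ok_first and ok_last, "failed to read frames"
```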
## Evaluation on Relative Camera Pose Estimation

To reproduce our evaluation on ScanNet1500, download the dataset [here](https://drive.google.com/drive/folders/16g--OfRHb26bT6DvOlj3xhwsb1kV58fT?usp=sharing) and unzip it to `./data/scannet1500`.
Then run the following script.
```bash
bash scripts/eval_scannet1500.sh
```

To reproduce our evaluation on MegaDepth1500, download the dataset [here](https://drive.google.com/drive/folders/16g--OfRHb26bT6DvOlj3xhwsb1kV58fT?usp=sharing) and unzip it to `./data/megadepth1500`.
Then run the following script.
```bash
bash scripts/eval_megadepth1500.sh
```

> [!NOTE]
> To achieve faster inference speed, set `--amp=1`. This enables evaluation with `fp16`, which increases speed from 24 FPS to 40 FPS on an RTX 4090 with Reloc3r-512, without any accuracy loss.
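For reference, these benchmarks score relative poses by angular error. A generic sketch of the standard metrics (not the repo's exact evaluation code):

```python
# Standard angular error metrics for relative pose benchmarks
# such as ScanNet1500/MegaDepth1500.
import numpy as np

def rotation_error_deg(R_gt, R_est):
    # Geodesic distance between two 3x3 rotation matrices.
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_angle_deg(t_gt, t_est):
    # Relative translation is known only up to scale, so compare directions;
    # the absolute value makes the metric sign-invariant, as is common practice.
    cos = abs(np.dot(t_gt, t_est)) / (np.linalg.norm(t_gt) * np.linalg.norm(t_est))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
```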
## Evaluation on Visual Localization

To reproduce our evaluation on 7Scenes, download the dataset [here](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/) and unzip it to `./data/7scenes`.
Then run the following script.
```bash
bash scripts/eval_7scenes.sh
```

To reproduce our evaluation on Cambridge, download the dataset [here](https://drive.google.com/file/d/1XcJIVRMma4_IClJdRq6rwBKX3ZPet5az/view?usp=sharing) and unzip it to `./data/cambridge`.
Then run the following script.
```bash
bash scripts/eval_cambridge.sh
```
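Results on these localization benchmarks are typically reported as median translation and rotation errors per scene. A generic per-image sketch, assuming 4x4 camera-to-world matrices (not the repo's exact evaluation code):

```python
# Per-image absolute pose errors, as commonly used on 7Scenes/Cambridge.
import numpy as np

def absolute_pose_errors(T_gt, T_est):
    """T_gt, T_est: 4x4 camera-to-world matrices (assumed convention).
    Returns (translation error in meters, rotation error in degrees)."""
    t_err = np.linalg.norm(T_gt[:3, 3] - T_est[:3, 3])
    cos = (np.trace(T_gt[:3, :3].T @ T_est[:3, :3]) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return t_err, r_err
```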
## Training

We follow [DUSt3R](https://github.com/naver/dust3r) to process the training data. Download the datasets: [CO3Dv2](https://github.com/facebookresearch/co3d), [ScanNet++](https://kaldir.vc.in.tum.de/scannetpp/), [ARKitScenes](https://github.com/apple/ARKitScenes), [BlendedMVS](https://github.com/YoYo000/BlendedMVS), [MegaDepth](https://www.cs.cornell.edu/projects/megadepth/), [DL3DV](https://dl3dv-10k.github.io/DL3DV-10K/), [RealEstate10K](https://google.github.io/realestate10k/).
For each dataset, we provide a preprocessing script in the `datasets_preprocess` directory and, where needed, an archive containing the list of [pairs](https://drive.google.com/drive/folders/193Lv5YB-2OVkqK3k6vnZJi36-FRPcAuu?usp=sharing). You have to download the datasets yourself from their official sources, agree to their licenses, and run the preprocessing scripts.
We provide a sample script to train Reloc3r with ScanNet++ on an RTX 3090 GPU:
```bash
bash scripts/train_small.sh
```

To reproduce our training for Reloc3r-512 with 8 H800 GPUs, run the following script:
```bash
bash scripts/train.sh
```

> [!NOTE]
> These scripts are not strictly equivalent to what was used to train Reloc3r, but they should be close enough.

## Citation
If you find our work helpful in your research, please consider citing:
```
@article{reloc3r,
  title={Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization},
  author={Dong, Siyan and Wang, Shuzhe and Liu, Shaohui and Cai, Lulu and Fan, Qingnan and Kannala, Juho and Yang, Yanchao},
  journal={arXiv preprint arXiv:2412.08376},
  year={2024}
}
```

## Acknowledgments
Thanks to these great repositories: [CroCo](https://github.com/naver/croco), [DUSt3R](https://github.com/naver/dust3r), [Marepo](https://github.com/nianticlabs/marepo), and many other inspiring works in the community.