https://github.com/markyu98/madpose

[CVPR 2025] Official implementation of the solvers and estimators proposed in the paper "Relative Pose Estimation through Affine Corrections of Monocular Depth Priors"
https://github.com/markyu98/madpose

monocular-depth-estimation multi-view-geometry relative-pose-estimation

Last synced: over 1 year ago
JSON representation

[CVPR 2025] Official implementation of the solvers and estimators proposed in the paper "Relative Pose Estimation through Affine Corrections of Monocular Depth Priors"

Host: GitHub
URL: https://github.com/markyu98/madpose
Owner: MarkYu98
License: bsd-3-clause
Created: 2024-11-02T15:00:39.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-02-27T08:59:34.000Z (over 1 year ago)
Last Synced: 2025-03-30T08:07:11.313Z (over 1 year ago)
Topics: monocular-depth-estimation, multi-view-geometry, relative-pose-estimation
Language: C++
Homepage: https://arxiv.org/abs/2501.05446
Size: 14.9 MB
Stars: 116
Watchers: 6
Forks: 6
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# MADPose

This repo contains the official implementation of the solvers and estimators proposed in the paper "Relative Pose Estimation through Affine Corrections of Monocular Depth Priors". The solvers and estimators are implemented using C++, and we provide easy-to-use Python bindings.

Note: "**MAD**" is an acronym for "**M**onocular **A**ffine **D**epth".

![](resources/pipeline.png)
**Overview**: Our method takes a pair of images as input, runs off-the-shelf feature matching and monocular depth
estimation, then jointly estimates the relative pose, scale and shift parameters of the two depth maps, and optionally the focal lengths.

## 📢 News
- 2025-02-27: Our paper is accepted to CVPR 2025! See you in Nashville, TN!

## Evaluation
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities, tailored for three setups of cameras: calibrated, shared-focal, and unknown focal lengths (two-focal). The solvers are further combined with classic point-based solvers and epipolar constraints in our hybrid RANSAC estimators. Our estimators show consistent improvements over classic point-based methods (e.g. [PoseLib](https://github.com/PoseLib/PoseLib)) and recent two-view reconstruction methods [DUSt3R](https://github.com/naver/dust3r) and [MASt3R](https://github.com/naver/mast3r), across different datasets with different feature matchers and monocular depth estimation models.

Some highlight results (probably best viewed in light theme on GitHub!):

#### Pose Error AUCs on ScanNet-1500 with shared-focal setting

Matches
Method
MD Model
AUC@5°
AUC@10°
AUC@20°

SP+SG
PoseLib-6pt
-
12.84
28.13
45.64

Ours-sf
DA-met.
18.35
37.54
57.58

RoMa
PoseLib-6pt
-
27.17
49.24
67.42

Ours-sf
DA-met.
29.81
53.11
71.15

MASt3R
PoseLib-6pt
-
30.28
54.16
72.87

Ours-sf
DA-met.
31.87
56.20
74.51

MASt3R
32.58
56.99
74.91

Reference entry - DUSt3R
25.90
48.45
68.03

Reference entry - MASt3R
23.94
46.44
66.18

#### Pose Error AUCs on 2D-3D-S sampled image pairs with unknown focal lengths

Matches
Method
MD Model
AUC@2°
AUC@5°
AUC@10°
AUC@20°

SP+LG
PoseLib-7pt
-
5.85
13.95
21.94
30.71

Ours-tf
DAv2-met.
9.15
22.22
32.80
43.26

RoMa
PoseLib-7pt
-
8.73
20.31
30.45
41.48

Ours-tf
DAv2-met.
13.50
29.19
42.18
54.42

MASt3R
PoseLib-7pt
-
12.58
30.27
45.57
59.85

Ours-tf
DAv2-met.
18.05
39.92
56.64
70.86

MASt3R
22.44
48.02
64.79
76.55

Reference entry - DUSt3R
6.43
24.47
42.39
58.36

Reference entry - MASt3R
13.39
38.41
57.92
71.91

Please refer to the paper for more results and discussions.

## Installation
### Install from PyPI
We are working on setting up wheel for easy installation using PyPI. Currently please use the following method to install from source.

### Install from source
#### Install dependencies
```bash
sudo apt-get install libeigen3-dev libceres-dev libopencv-dev
```
_Note: The two-focal estimator currently relies on `cv::recoverPose` from OpenCV, we plan to remove dependency on OpenCV in future updates._

#### Clone the repo
```bash
git clone --recursive https://github.com/MarkYu98/madpose
```

#### Build and install the Python bindings
```bash
pip install .
```
If you would like to see the building process (e.g. CMake logs) you can add `-v` option to the above command.

Note: MADPose also depends on [PoseLib](https://github.com/PoseLib/PoseLib). By default, CMake will automatically build PoseLib using `FetchContent`. Set `FETCH_POSELIB` CMake option to `OFF` if you prefer to use a self-installed version.

#### Check the installation
```bash
python -c "import madpose"
```
You should not see any errors if MADPose is successfully installed.

## Usage
### Estimators
We provide Python bindings of our 3 hybrid estimators for image pairs with calibrated cameras, shared-focal cameras, and cameras with unknown focal lengths (two-focal).

The estimators take `HybridLORansacOptions` and `EstimatorConfig` for related settings, some useful settings are:
```python
import madpose

options = madpose.HybridLORansacOptions()
options.min_num_iterations = 100
options.max_num_iterations = 1000
options.success_probability = 0.9999
options.random_seed = 0 # for reproducibility
options.final_least_squares = True
options.threshold_multiplier = 5.0
options.num_lo_steps = 4
# squared px thresholds for reprojection error and epipolar error
options.squared_inlier_thresholds = [reproj_pix_thres ** 2, epipolar_pix_thres ** 2]
# weight when scoring for the two types of errors
options.data_type_weights = [1.0, epipolar_weight]

est_config = madpose.EstimatorConfig()
# if enabled, the input min_depth values are guaranteed to be positive with the estimated depth offsets (shifts), default: True
est_config.min_depth_constraint = True
# if disabled, will model the depth with only scale (only applicable to the calibrated camera case)
est_config.use_shift = True
# best set to the number of PHYSICAL CPU cores
est_config.ceres_num_threads = 8
```

**Note on performance**: if speed is a primary concern, best set `est_config.ceres_num_threads` to the number of _physical_ CPU cores on your machine, also tune `options.min_num_iterations`, `options.max_num_iterations`, and `options.num_lo_steps` for trade-off between speed and accuracy. In addition, `export OPENBLAS_NUM_THREADS=1` to use single-thread dense solver in Ceres to avoid overheads due to the small-sized nature of the Jacobians.

We provide a example image pairs and code snippets in [examples/](examples/) to test the hybrid estimators. More demos and evaluations will be added in the future.

We use `numpy` and `cv2` to handle pre-computed data in Python in the examples, you can easily install them using pip:
```bash
pip install numpy opencv-python
```

#### Calibrated estimator
```python
pose, stats = madpose.HybridEstimatePoseScaleOffset(
mkpts0, mkpts1,
depth0, depth1,
[depth_map0.min(), depth_map1.min()],
K0, K1, options, est_config
)
# rotation and translation of the estimated pose
R_est, t_est = pose.R(), pose.t()
# scale and offsets of the affine corrected depth maps
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1
```
The parameters are: keypoint matches(`mkpts0`, `mkpts1`), their corresponding depth prior values(`depth0`, `depth1`), min depth values for both views (used when `est_config.min_depth_constraint` is `True`), camera intrinsics(`K0`, `K1`),`options`, and `est_config`.

See [examples/calibrated.py](examples/calibrated.py) for a complete code example, evaluation, and comparison with point-based estimation using PoseLib.

#### Shared-focal estimator
```python
pose, stats = madpose.HybridEstimatePoseScaleOffsetSharedFocal(
mkpts0, mkpts1,
depth0, depth1,
[depth_map0.min(), depth_map1.min()],
pp0, pp1, options, est_config
)
# rotation and translation of the estimated pose
R_est, t_est = pose.R(), pose.t()
# scale and offsets of the affine corrected depth maps
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1
# estimated shared focal length
f_est = pose.focal
```
Different from the calibrated estimator, now instead of intrinsics(`K0`, `K1`), the shared-focal estimator now takes as input the principal points(`pp0`, `pp1`).

See [examples/shared_focal.py](examples/shared_focal.py) for complete example.

#### Two-focal estimator
```python
pose, stats = madpose.HybridEstimatePoseScaleOffsetTwoFocal(
mkpts0, mkpts1,
depth0, depth1,
[depth_map0.min(), depth_map1.min()],
pp0, pp1, options, est_config
)
# rotation and translation of the estimated pose
R_est, t_est = pose.R(), pose.t()
# scale and offsets of the affine corrected depth maps
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1
# the estimated two focal lengths
f0_est, f1_est = pose.focal0, pose.focal1
```
The parameters are same with the shared-focal estimator, but now the estimator will estimate two independent focal lengths.

See [examples/two_focal.py](examples/two_focal.py) for complete example.

#### Point-based baseline
You can compare with point-based estimators from [PoseLib](https://github.com/PoseLib/PoseLib). You need to install the [Poselib's Python bindings](https://github.com/PoseLib/PoseLib?tab=readme-ov-file#python-bindings).

The corresponding point-based estimation are included in each of the three example scripts above.

### Solvers
Our hybrid estimators combine our newly proposed depth-aware solvers with the point-based solvers. The Python implementation of the solvers can be found under [solver_py/](solver_py/), which also include calls to the C++ solvers through Python bindings for comparison.

As described in the paper, the proposed solvers all work in a 2-step manner, first solve for the depth scale and shifts (affine parameters) of the depth maps and then recover the pose by finding the rigid transformation between the back-projected points. We provide Python bindings for the first step (solving scale and shifts) as well as combined wrapper for both the pose and the affine parameters.

The bindings for solving the affine parameters:
```python
# Calibrated case
affine_params = madpose.solve_scale_and_shift(x1.T, x2.T, d1, d2)
# Shared-focal
affine_params = madpose.solve_scale_and_shift_shared_focal(x1.T, x2.T, d1, d2)
# Two-focal
affine_params = madpose.solve_scale_and_shift_two_focal(x1.T, x2.T, d1, d2)
```
The input `x1` and `x2` are `np.array` (or list) of homogeneous normalized image coordinates (`[x, y, 1]`) of the minimal set of $M$ matched keypoints; `d1` and `d2` are `np.array` (or list) of the corresponding depth values. The returned `affine_params` is a list of candidate solutions `(a1, a2, b1, b2)` with `a1` always equal to 1 and therefore `a2, b1, b2` correspond to the scale and shift values $\alpha, \beta_1, \beta_2$.

Or we can use the combined wrapper that solves affine parameters and also recover the candidate poses:
```python
# Calibrated case
poses = madpose.solve_scale_shift_pose(x1.T, x2.T, d1, d2)
# Shared-focal
poses = madpose.solve_scale_shift_pose_shared_focal(x1.T, x2.T, d1, d2)
# Two-focal
poses = madpose.solve_scale_shift_pose_two_focal(x1.T, x2.T, d1, d2)
```
Note that for simplicity the return solutions are all named `poses`, the list contains objects of different classes for the three cases (`madpose.PoseScaleOffset`, `madpose.PoseScaleOffsetSharedFocal`, and `madpose.PoseScaleOffsetTwoFocal`). Retrieve the pose, focal lengths, and affine parameters by:
```python
R_est, t_est = pose.R(), pose.t()
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1

# for PoseScaleOffsetSharedFocal
f_est = pose.focal

# for PoseScaleOffsetTwoFocal
f0_est, f1_est = pose.focal0, pose.focal1
```

## TODO List

- [ ] Remove dependency on OpenCV.
- [ ] Setup wheel for PyPI
- [ ] Add experiment scirpts on datasets

## Acknowledgement
This codebase is inspired by and built upon many research work and opensource projects, we thank the authors and contributors for their work.
- [PoseLib](https://github.com/PoseLib/PoseLib)
- [COLMAP](https://github.com/colmap/colmap)
- [LIMAP](https://github.com/cvg/limap)
- [SuperGlue](https://github.com/magicleap/SuperGluePretrainedNetwork)
- [RansacLib](https://github.com/tsattler/RansacLib)
- [pybind11](https://github.com/pybind/pybind11)

## Citation
If you find our work useful in your research, please consider citing our paper:
```
@inproceedings{yu2025madpose,
title={Relative Pose Estimation through Affine Corrections of Monocular Depth Priors},
author={Yifan Yu and Shaohui Liu and Rémi Pautrat and Marc Pollefeys and Viktor Larsson},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/markyu98/madpose

Awesome Lists containing this project

README