RePoseD: Efficient Relative Pose Estimation With Known Depth Information
ICCV 2025 Oral
Abstract
Recent advances in monocular depth estimation (MDE) methods and their improved accuracy open new possibilities for their applications. In this paper, we investigate how monocular depth estimates can be used for relative pose estimation. In particular, we are interested in answering the question of whether using MDEs improves results over traditional point-based methods. We propose a novel framework for estimating the relative pose of two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined up to an unknown scale, or even up to both unknown scale and shift parameters, our solvers jointly estimate the scale, or both the scale and shift parameters, along with the relative pose. We derive efficient solvers considering different types of depths for three camera configurations: (1) two calibrated cameras, (2) two cameras with an unknown shared focal length, and (3) two cameras with unknown different focal lengths. Our new solvers outperform state-of-the-art depth-aware solvers in terms of speed and accuracy. In extensive experiments on multiple real-world datasets and with various MDEs, we discuss which depth-aware solvers are preferable in which situation.
Results



Extended Results
Since the camera-ready version of the paper, we have implemented an improved optimization strategy. Similarly to MADPose, we optimize the Sampson error jointly with both the forward and backward reprojection errors. Implementing this strategy within PoseLib leads to significantly better results than those reported in our paper. With the improved strategy, our method surpasses the results of MADPose while being 10-20x faster. Below we provide a sample of these results on the PhotoTourism dataset for the calibrated case.
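The joint cost described above can be sketched as follows. This is a minimal NumPy illustration, not our PoseLib implementation: it assumes calibrated (normalized) homogeneous image points, hypothetical per-camera scale/shift parameters `s1, b1, s2, b2` applied to the monocular depths, and simply sums the three unweighted residual terms (the actual optimizer uses robust weighting).

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def joint_cost(R, t, s1, b1, s2, b2, d1, d2, x1, x2):
    """Illustrative joint cost: Sampson error plus forward and backward
    reprojection errors of depth-lifted points.

    R, t     : relative pose mapping camera-1 points into camera 2
    s*, b*   : hypothetical per-camera scale and shift correcting the depths
    d1, d2   : (N,) monocular depths of the correspondences
    x1, x2   : (N, 3) homogeneous normalized image points
    """
    E = skew(t) @ R                                  # essential matrix
    Ex1, Etx2 = x1 @ E.T, x2 @ E
    sampson = np.sum(x2 * Ex1, axis=1) ** 2 / (
        Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2)
    # forward: lift with corrected depth in camera 1, project into camera 2
    X2 = ((s1 * d1 + b1)[:, None] * x1) @ R.T + t
    fwd = np.sum((X2[:, :2] / X2[:, 2:3] - x2[:, :2]) ** 2, axis=1)
    # backward: lift in camera 2, transform by the inverse pose, project into camera 1
    X1 = (((s2 * d2 + b2)[:, None] * x2) - t) @ R
    bwd = np.sum((X1[:, :2] / X1[:, 2:3] - x1[:, :2]) ** 2, axis=1)
    return np.mean(sampson + fwd + bwd)
```

For noise-free correspondences with correctly scaled depths, all three terms vanish; mis-estimated scale or shift leaves the Sampson term at zero but inflates the two reprojection terms, which is what makes the joint cost depth-aware.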
Preview - PhotoTourism (calibrated)
| Depth | Method | Scale | Shift | SP+LG $\epsilon(^\circ)\downarrow$ | SP+LG mAA $\uparrow$ | SP+LG Runtime (ms) | RoMA $\epsilon(^\circ)\downarrow$ | RoMA mAA $\uparrow$ | RoMA Runtime (ms) |
|---|---|---|---|---|---|---|---|---|---|
| - | 5-Point | | | 1.42 | 76.56 | 63.79 | 0.78 | 86.18 | 264.61 |
| MoGe | 3P-RelDepth | | | 8.12 | 53.40 | 55.85 | 1.69 | 67.22 | 221.06 |
| MoGe | P3P | | | 1.40 | 77.37 | 32.95 | 0.78 | 86.42 | 148.76 |
| MoGe | MADPose | ✔ | ✔ | 1.27 | 80.28 | 788.18 | 0.87 | 86.85 | 1753.49 |
| MoGe | RePoseD | ✔ | ✔ | 1.24 | 81.34 | 28.93 | 0.74 | 88.58 | 125.66 |
| MoGe | RePoseD | ✔ | | 1.75 | 80.29 | 30.11 | 1.03 | 88.02 | 135.95 |
| UniDepth | 3P-RelDepth | | | 4.07 | 51.60 | 52.49 | 1.33 | 67.56 | 214.73 |
| UniDepth | P3P | | | 1.40 | 77.47 | 34.30 | 0.78 | 86.43 | 150.95 |
| UniDepth | MADPose | ✔ | ✔ | 1.15 | 82.09 | 720.34 | 0.78 | 87.60 | 1695.57 |
| UniDepth | RePoseD | ✔ | ✔ | 1.04 | 83.71 | 30.88 | 0.69 | 89.27 | 131.52 |
| UniDepth | RePoseD | ✔ | | 1.16 | 84.56 | 31.19 | 0.81 | 90.18 | 137.26 |
| Depth | Method | Scale | Shift | MASt3R $\epsilon(^\circ)\downarrow$ | MASt3R mAA $\uparrow$ | MASt3R Runtime (ms) |
|---|---|---|---|---|---|---|
| - | 5-Point | | | 1.14 | 81.66 | 137.75 |
BibTeX
```bibtex
@inproceedings{ding2025reposed,
  title={RePoseD: Efficient Relative Pose Estimation With Known Depth Information},
  author={Ding, Yaqing and Kocur, Viktor and V{\'a}vra, V{\'a}clav and Haladov{\'a}, Zuzana Berger and Yang, Jian and Sattler, Torsten and Kukelova, Zuzana},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```