RePoseD: Efficient Relative Pose Estimation With Known Depth Information
ICCV 2025 Oral
Abstract
Recent advances in monocular depth estimation (MDE) methods and their improved accuracy open new possibilities for their applications. In this paper, we investigate how monocular depth estimates can be used for relative pose estimation. In particular, we are interested in answering the question of whether using MDEs improves results over traditional point-based methods. We propose a novel framework for estimating the relative pose of two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined up to an unknown scale, or even up to both unknown scale and shift parameters, our solvers jointly estimate the scale, or both the scale and shift parameters, along with the relative pose. We derive efficient solvers considering different types of depths for three camera configurations: (1) two calibrated cameras, (2) two cameras with an unknown shared focal length, and (3) two cameras with unknown different focal lengths. Our new solvers outperform state-of-the-art depth-aware solvers in terms of speed and accuracy. In extensive experiments on multiple real-world datasets and with various MDEs, we discuss which depth-aware solvers are preferable in which situation.
Results



Extended Results
Since the camera-ready version of the paper, we have implemented an improved optimization strategy. Similarly to MADPose, we optimize the Sampson error jointly with both the forward and backward reprojection errors. Implementing this strategy within PoseLib leads to significantly better results than those reported in our paper. With the improved strategy, our method surpasses the results of MADPose while being 10-20x faster. Below we provide a sample of these results on the PhotoTourism dataset for the calibrated case.
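The joint cost described above can be sketched as follows. This is a minimal NumPy illustration, not our PoseLib implementation: it assumes calibrated (normalized) homogeneous image points, hypothetical per-camera scale/shift parameters `s1, b1, s2, b2` applied to the monocular depths, and simply sums the three unweighted residual terms (the actual optimizer uses robust weighting).

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def joint_cost(R, t, s1, b1, s2, b2, d1, d2, x1, x2):
    """Illustrative joint cost: Sampson error plus forward and backward
    reprojection errors of depth-lifted points.

    R, t     : relative pose mapping camera-1 points into camera 2
    s*, b*   : hypothetical per-camera scale and shift correcting the depths
    d1, d2   : (N,) monocular depths of the correspondences
    x1, x2   : (N, 3) homogeneous normalized image points
    """
    E = skew(t) @ R                                  # essential matrix
    Ex1, Etx2 = x1 @ E.T, x2 @ E
    sampson = np.sum(x2 * Ex1, axis=1) ** 2 / (
        Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2)
    # forward: lift with corrected depth in camera 1, project into camera 2
    X2 = ((s1 * d1 + b1)[:, None] * x1) @ R.T + t
    fwd = np.sum((X2[:, :2] / X2[:, 2:3] - x2[:, :2]) ** 2, axis=1)
    # backward: lift in camera 2, transform by the inverse pose, project into camera 1
    X1 = (((s2 * d2 + b2)[:, None] * x2) - t) @ R
    bwd = np.sum((X1[:, :2] / X1[:, 2:3] - x1[:, :2]) ** 2, axis=1)
    return np.mean(sampson + fwd + bwd)
```

For noise-free correspondences with correctly scaled depths, all three terms vanish; mis-estimated scale or shift leaves the Sampson term at zero but inflates the two reprojection terms, which is what makes the joint cost depth-aware.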
Preview - PhotoTourism (calibrated)
| Depth | Method | Scale | Shift | SP+LG $\epsilon(^\circ)\downarrow$ | SP+LG mAA $\uparrow$ | SP+LG Runtime (ms) | RoMA $\epsilon(^\circ)\downarrow$ | RoMA mAA $\uparrow$ | RoMA Runtime (ms) |
|---|---|---|---|---|---|---|---|---|---|
| - | 5-Point | | | 1.42 | 76.56 | 63.79 | 0.78 | 86.18 | 264.61 |
| MoGe | 3P-RelDepth | | | 8.12 | 53.40 | 55.85 | 1.69 | 67.22 | 221.06 |
| MoGe | P3P | | | 1.40 | 77.37 | 32.95 | 0.78 | 86.42 | 148.76 |
| MoGe | MADPose | ✔ | ✔ | 1.27 | 80.28 | 788.18 | 0.87 | 86.85 | 1753.49 |
| MoGe | RePoseD | ✔ | ✔ | 1.24 | 81.34 | 28.93 | 0.74 | 88.58 | 125.66 |
| MoGe | RePoseD | ✔ | | 1.75 | 80.29 | 30.11 | 1.03 | 88.02 | 135.95 |
| UniDepth | 3P-RelDepth | | | 4.07 | 51.60 | 52.49 | 1.33 | 67.56 | 214.73 |
| UniDepth | P3P | | | 1.40 | 77.47 | 34.30 | 0.78 | 86.43 | 150.95 |
| UniDepth | MADPose | ✔ | ✔ | 1.15 | 82.09 | 720.34 | 0.78 | 87.60 | 1695.57 |
| UniDepth | RePoseD | ✔ | ✔ | 1.04 | 83.71 | 30.88 | 0.69 | 89.27 | 131.52 |
| UniDepth | RePoseD | ✔ | | 1.16 | 84.56 | 31.19 | 0.81 | 90.18 | 137.26 |
| Depth | Method | Scale | Shift | MASt3R $\epsilon(^\circ)\downarrow$ | MASt3R mAA $\uparrow$ | MASt3R Runtime (ms) |
|---|---|---|---|---|---|---|
| - | 5-Point | | | 1.14 | 81.66 | 137.75 |
BibTeX
```bibtex
@inproceedings{ding2025reposed,
  title={RePoseD: Efficient Relative Pose Estimation With Known Depth Information},
  author={Ding, Yaqing and Kocur, Viktor and V{\'a}vra, V{\'a}clav and Haladov{\'a}, Zuzana Berger and Yang, Jian and Sattler, Torsten and Kukelova, Zuzana},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```