Deep Learning Methods for Point Matching, Visual Localization and 3D Reconstruction
School of Science | Doctoral thesis (article-based) | Defence date: 2024-12-13
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
72 + 106
Series
Aalto University publication series DOCTORAL THESES, 255/2024
Abstract
This doctoral thesis explores advanced deep learning methods for three pivotal tasks in 3D computer vision: point matching, visual localization, and 3D reconstruction. These tasks are crucial for enabling machines to perceive and understand 3D environments, which is essential for applications ranging from virtual reality to robotics and autonomous driving. In point matching, the thesis investigates learning-based, visual descriptor-free matching pipelines that leverage geometric and color cues in conjunction with Graph Neural Networks. This approach significantly enhances the accuracy of 2D-3D keypoint matching while reducing memory usage and improving data privacy. For visual localization, the thesis introduces two hierarchical scene coordinate network architectures that establish dense 2D-3D matches for accurate 6-DoF camera pose estimation. These architectures incorporate a conditioning mechanism and a transformer to encode global context into local patches, overcoming scene ambiguities. Additionally, a novel few-shot learning setting is proposed, which reduces the training load for scene coordinate regression and cuts training time from days to minutes. A significant contribution of the thesis is the development of an end-to-end dense unconstrained 3D reconstruction pipeline based on Vision Transformers. This pipeline directly regresses point coordinates from image pairs without relying on camera parameters, simplifying traditional 3D reconstruction methods and introducing a unified framework for monocular and binocular reconstruction tasks. The thesis also explores how to improve local feature matching by calculating the curvature of local 3D surface patches for detected points, which enhances matching accuracy with off-the-shelf learned matchers. Furthermore, it addresses the challenge of continual learning for visual localization by proposing an experience-replay-based baseline that prevents catastrophic forgetting and reduces computational and storage costs.
Throughout the thesis, the importance of end-to-end learning is emphasized, where models are trained to directly produce desired outputs from raw input data. This paradigm shift can simplify development pipelines and enhance the adaptability and scalability of 3D computer vision systems. By integrating deep learning with traditional geometric principles, this research provides a comprehensive framework for addressing key challenges in point matching, visual localization, and 3D reconstruction.
Supervising professor
Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland
Parts
-
[Publication 1]: Shuzhe Wang, Juho Kannala, and Daniel Barath. DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
DOI: 10.1109/CVPR52733.2024.01973
-
[Publication 2]: Shuzhe Wang, Juho Kannala, Marc Pollefeys, Daniel Barath. Guiding Local Feature Matching with Surface Curvature. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 17981–17991, 2023.
Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202401312182
DOI: 10.1109/ICCV51070.2023.01648
-
[Publication 3]: Xiaotian Li, Shuzhe Wang, Yi Zhao, Jakob Verbeek, and Juho Kannala. Hierarchical Scene Coordinate Classification and Regression for Visual Localization. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11983–11992, 2020.
DOI: 10.1109/CVPR42600.2020.01200
-
[Publication 4]: Shuzhe Wang, Zakaria Laskar, Iaroslav Melekhov, Xiaotian Li, Yi Zhao, Giorgos Tolias, and Juho Kannala. HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer. International Journal of Computer Vision (IJCV), pp. 1–22, 2024.
Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202407045058
DOI: 10.1007/s11263-023-01982-9
-
[Publication 5]: Siyan Dong*, Shuzhe Wang*, Yixin Zhuang, Juho Kannala, Marc Pollefeys, and Baoquan Chen. Visual Localization via Few-shot Scene Region Classification. International Conference on 3D Vision (3DV), pp. 393–402, 2022.
DOI: 10.1109/3DV57658.2022.00051
-
[Publication 6]: Shuzhe Wang*, Zakaria Laskar*, Iaroslav Melekhov, Xiaotian Li, and Juho Kannala. Continual Learning for Image-based Camera Localization. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3252–3262, 2021.
Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202203032068
DOI: 10.1109/ICCV48922.2021.00324
-
[Publication 7]: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D Vision Made Easy. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
DOI: 10.1109/CVPR52733.2024.01956