Deep Learning Methods for Image Matching and Camera Relocalization

School of Science | Doctoral thesis (article-based) | Defence date: 2020-02-21
Pages
59 + app. 85
Aalto University publication series DOCTORAL DISSERTATIONS, 23/2020
Deep learning and convolutional neural networks have revolutionized computer vision and become a dominant tool in many applications, such as image classification, semantic segmentation, object recognition, and image retrieval. Their strength lies in the ability to learn an efficient representation of images that makes a subsequent learning task easier. This thesis presents deep learning approaches for a number of closely related fundamental computer vision problems: image matching, image-based localization, ego-motion estimation, and scene understanding.
In image matching, the thesis studies two methods that use a Siamese network architecture to learn patch-level and image-level descriptors whose similarity is measured with Euclidean distance. Next, it introduces a coarse-to-fine CNN-based approach for dense pixel correspondence estimation that leverages the advantages of optical flow methods and extends them to the case of a wide baseline between two images. The method demonstrates good generalization performance and is applicable to image matching as well as to image alignment and relative camera pose estimation.
One of the contributions of the thesis is a novel approach for recovering the absolute camera pose from ego-motion. In contrast to existing CNN-based localization algorithms, the proposed method can be applied directly to scenes that are not available at the training stage and does not require scene-specific training of the network, which improves scalability. The thesis also shows that a Siamese architecture can be successfully used for relative camera pose estimation, achieving better performance in challenging scenarios than traditional image descriptors.
Lastly, the thesis demonstrates how advances in visual geometry can help to efficiently learn depth, camera ego-motion, and optical flow for the task of scene understanding. More specifically, it introduces a method that leverages temporally consistent geometric priors between frames of monocular video sequences and jointly estimates ego-motion and depth maps in a self-supervised manner.
Supervising professor
Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland
Rahtu, Esa, Prof., Tampere University, Finland
Keywords
computer vision, machine learning, deep learning, camera relocalization, image matching, scene understanding, ego-motion, image alignment
Other note
  • [Publication 1]: Iaroslav Melekhov, Juho Kannala, and Esa Rahtu. Image Patch Matching Using Convolutional Descriptors with Euclidean Distance. Asian Conference on Computer Vision. Workshop on Interpretation and Visualization of Deep Neural Nets (ACCVW), pp. 638–653, 2016.
    DOI: 10.1007/978-3-319-54526-4_46
  • [Publication 2]: Iaroslav Melekhov, Juho Kannala, and Esa Rahtu. Siamese Network Features for Image Matching. International Conference on Pattern Recognition (ICPR), pp. 378–383, December 2016.
    DOI: 10.1109/ICPR.2016.7899663
  • [Publication 3]: Iaroslav Melekhov, Aleksei Tiulpin, Torsten Sattler, Marc Pollefeys, Esa Rahtu, and Juho Kannala. DGC-Net: Dense Geometric Correspondence Network. IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1034–1042, January 2019.
    DOI: 10.1109/WACV.2019.00115
  • [Publication 4]: Iaroslav Melekhov, Juha Ylionas, Juho Kannala, and Esa Rahtu. Relative Camera Pose Estimation Using Convolutional Neural Networks. International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS), pp. 675–687, 2017.
    DOI: 10.1007/978-3-319-70353-4_57
  • [Publication 5]: Iaroslav Melekhov, Juha Ylionas, Juho Kannala, and Esa Rahtu. Image-based Localization Using Hourglass Networks. IEEE International Conference on Computer Vision. Geometry Meets Deep Learning Workshop (ICCVW), pp. 879–886, 2017.
    DOI: 10.1109/ICCVW.2017.107
  • [Publication 6]: Zakaria Laskar, Iaroslav Melekhov, Surya Kalia, and Juho Kannala. Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Networks. IEEE International Conference on Computer Vision. Geometry Meets Deep Learning Workshop (ICCVW), pp. 929–938, 2017.
    DOI: 10.1109/ICCVW.2017.113
  • [Publication 7]: Iaroslav Melekhov, Esa Rahtu, Juho Kannala, and Alex Kendall. TC-Net: Self-Supervised Monocular Video Scene Understanding Using Temporally Consistent Geometric Prior. International Conference on Machine Learning. Self-Supervised Learning Workshop (ICMLW), 5 pages, April 2019.