Deep Learning Methods for Image Matching and Camera Relocalization

dc.contributorAalto Universityen
dc.contributor.authorMelekhov, Iaroslav
dc.contributor.departmentTietotekniikan laitosfi
dc.contributor.departmentDepartment of Computer Scienceen
dc.contributor.schoolPerustieteiden korkeakoulufi
dc.contributor.schoolSchool of Scienceen
dc.contributor.supervisorKannala, Juho, Prof., Aalto University, Department of Computer Science, Finland
dc.contributor.supervisorRahtu, Esa, Prof., Tampere University, Finland
dc.description.abstractDeep learning and convolutional neural networks have revolutionized computer vision and become a dominant tool in many applications, such as image classification, semantic segmentation, object recognition, and image retrieval. Their strength lies in the ability to learn an efficient representation of images that makes a subsequent learning task easier. This thesis presents deep learning approaches for a number of closely related fundamental computer vision problems: image matching, image-based localization, ego-motion estimation, and scene understanding. In image matching, the thesis studies two methods that use a Siamese network architecture to learn patch-level and image-level descriptors whose similarity is measured with Euclidean distance. Next, it introduces a coarse-to-fine CNN-based approach for dense pixel correspondence estimation that can leverage the advantages of optical flow methods and extend them to the case of a wide baseline between two images. The method demonstrates good generalization performance and is applicable to image matching as well as to image alignment and relative camera pose estimation. One of the contributions of the thesis is a novel approach for recovering the absolute camera pose from ego-motion. In contrast to existing CNN-based localization algorithms, the proposed method can be directly applied to scenes that are not available at the training stage and does not require scene-specific training of the network, thus improving scalability. The thesis also shows that a Siamese architecture can be successfully used for relative camera pose estimation, achieving better performance in challenging scenarios than traditional image descriptors. Lastly, the thesis demonstrates how advances in visual geometry can help to efficiently learn depth, camera ego-motion, and optical flow for the task of scene understanding. 
More specifically, it introduces a method that can leverage temporally consistent geometric priors between frames of monocular video sequences and jointly estimate ego-motion and depth maps in a self-supervised manner.en
dc.format.extent59 + app. 85
dc.identifier.isbn978-952-60-8945-4 (electronic)
dc.identifier.isbn978-952-60-8944-7 (printed)
dc.identifier.issn1799-4942 (electronic)
dc.identifier.issn1799-4934 (printed)
dc.identifier.issn1799-4934 (ISSN-L)
dc.opnLempitsky, Victor, Prof., Skolkovo Institute of Science and Technology, Russia
dc.publisherAalto Universityen
dc.relation.haspart[Publication 1]: Iaroslav Melekhov, Juho Kannala, and Esa Rahtu. Image Patch Matching Using Convolutional Descriptors with Euclidean Distance. Asian Conference on Computer Vision. Workshop on Interpretation and Visualization of Deep Neural Nets (ACCVW), pp. 638–653, 2016. DOI: 10.1007/978-3-319-54526-4_46
dc.relation.haspart[Publication 2]: Iaroslav Melekhov, Juho Kannala, and Esa Rahtu. Siamese Network Features for Image Matching. International Conference on Pattern Recognition (ICPR), pp. 378–383, December 2016. DOI: 10.1109/ICPR.2016.7899663
dc.relation.haspart[Publication 3]: Iaroslav Melekhov, Aleksei Tiulpin, Torsten Sattler, Marc Pollefeys, Esa Rahtu, and Juho Kannala. DGC-Net: Dense Geometric Correspondence Network. IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1034–1042, January 2019. DOI: 10.1109/WACV.2019.00115
dc.relation.haspart[Publication 4]: Iaroslav Melekhov, Juha Ylionas, Juho Kannala, and Esa Rahtu. Relative Camera Pose Estimation Using Convolutional Neural Networks. International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS), pp. 675–687, 2017. DOI: 10.1007/978-3-319-70353-4_57
dc.relation.haspart[Publication 5]: Iaroslav Melekhov, Juha Ylionas, Juho Kannala, and Esa Rahtu. Image-based Localization Using Hourglass Networks. IEEE International Conference on Computer Vision. Geometry Meets Deep Learning Workshop (ICCVW), pp. 879–886, 2017. DOI: 10.1109/ICCVW.2017.107
dc.relation.haspart[Publication 6]: Zakaria Laskar, Iaroslav Melekhov, Surya Kalia, and Juho Kannala. Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Networks. IEEE International Conference on Computer Vision. Geometry Meets Deep Learning Workshop (ICCVW), pp. 929–938, 2017. DOI: 10.1109/ICCVW.2017.113
dc.relation.haspart[Publication 7]: Iaroslav Melekhov, Esa Rahtu, Juho Kannala, Alex Kendall. TC-Net: Self-Supervised Monocular Video Scene Understanding Using Temporally Consistent Geometric Prior. International Conference on Machine Learning. Self-Supervised Learning Workshop (ICMLW), 5 pages, April 2019.
dc.relation.ispartofseriesAalto University publication series DOCTORAL DISSERTATIONSen
dc.revMaki, Atsuto, Prof., KTH Royal Institute of Technology, Sweden
dc.revBalntas, Vassileios, Dr., Scape Technologies, UK
dc.subject.keywordcomputer visionen
dc.subject.keywordmachine learningen
dc.subject.keyworddeep learningen
dc.subject.keywordcamera relocalizationen
dc.subject.keywordimage matchingen
dc.subject.keywordscene understandingen
dc.subject.keywordimage alignmenten
dc.subject.otherComputer scienceen
dc.titleDeep Learning Methods for Image Matching and Camera Relocalizationen
dc.typeG5 Artikkeliväitöskirjafi
dc.type.ontasotDoctoral dissertation (article-based)en
dc.type.ontasotVäitöskirja (artikkeli)fi
local.aalto.acrisexportstatuschecked 2020-03-21_0934