Browsing by Author "Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland; Solin, Arno, Pof., Aalto University, Department of Computer Science, Finland"
Now showing 1 - 1 of 1
- Results Per Page
- Sort Options
- Learning Latent Image Representations with Prior Knowledge
School of Science | Doctoral dissertation (article-based)(2022) Hou, YuxinDeep learning has become a dominant tool in many computer vision applications due to the superior performance of extracting low-dimensional latent representations from images. However, though there is prior knowledge for many applications already, most existing methods learn image representations from large-scale training data in a black-box way, which is not good for interpretability and controllability. This thesis explores approaches that integrate different types of prior knowledge into deep neural networks. Instead of learning image representations from scratch, leveraging the prior knowledge in latent space can softly regularize the training and obtain more controllable representations.The models presented in the thesis mainly address three different problems: (i) How to encode epipolar geometry in deep learning architectures for multi-view stereo. The key of multi-view stereo is to find the matched correspondence across images. In this thesis, a learning-based method inspired by the classical plane sweep algorithm is studied. The method aims to improve the correspondence matching in two parts: obtaining better potential correspondence candidates with a novel plane sampling strategy and learning the multiplane representations instead of using hand-crafted cost metrics. (ii) How to capture the correlations of input data in the latent space. Multiple methods that introduce Gaussian process in the latent space to encode view priors are explored in the thesis. According to the availability of relative motion of frames, there is a hierarchy of three covariance functions which are presented as Gaussian process priors, and the correlated latent representations can be obtained via latent nonparametric fusion. Experimental results show that the correlated representations lead to more temporally consistent predictions for depth estimation, and they can also be applied to generative models to synthesize images in new views. (iii) How to use the known factors of variation to learn disentangled representations. Both equivariant representations and factorized representations are studied for novel view synthesis and interactive fashion retrieval respectively. In summary, this thesis presents three different types of solutions that use prior domain knowledge to learn more powerful image representations. For depth estimation, the presented methods integrate the multi-view geometry into the deep neural network. For image sequences, the correlated representations obtained from inter-frame reasoning make more consistent and stable predictions. The disentangled representations provide explicit flexible control over specific known factors of variation.