Browsing by Author "Heljakka, Ari"
- Deep Generative Neural Network Models for Capturing Complex Patterns in Visual Data
School of Science | Doctoral dissertation (article-based) (2020) Heljakka, Ari
Deep learning methods underlie much of the recent rapid progress in computer vision. These approaches, however, tend to require costly labeled data. Task-specific models such as classifiers are not intended for learning maximally general internal representations. Furthermore, these models cannot simulate the data-generating process to synthesize new samples nor modify input samples. Unsupervised deep generative models have the potential to avoid these problems. However, the two dominant families of generative models, Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE), each come with their characteristic problems. GAN-based models are architecturally relatively complex, with a disposable discriminator network but, usually, no encoder to accept inputs. Also, GAN training is often unstable and prone to ignoring parts of the training distribution ("mode collapse" or "mode dropping"). VAEs, on the other hand, tend to overestimate the variance in some regions of the distribution, resulting in blurry generated images. This work introduces and evaluates models and techniques that considerably reduce the problems above, and generate sharp image outputs with a simple autoencoder architecture. This is achieved by virtue of two overarching principles. First, a suitable combination of techniques from GAN models is integrated into the recently introduced VAE-like Adversarial Generator-Encoder. Second, the recursive nature of the networks is leveraged in several ways. The Automodulator represents a new category of autoencoders characterized by the use of the latent representation for modulating the statistics of the decoder layers. The network can take multiple images as inputs from which it generates a fused synthetic sample, with some scales of the output driven by one input and the other scales by another, allowing instantaneous 'style-mixing' and other new applications.
Finally, with a Gaussian process framework, the image encoder-decoder setup is extended from single images to image sequences, including video and camera runs. To this end, auxiliary image metadata is leveraged in the form of a non-parametric prior in the latent space of a generative model. This makes it possible, for instance, to smooth and freely interpolate the image sequence. In doing so, an elegant connection is provided between Gaussian processes and computer vision methods, suggesting far-reaching implications in combining the two. This work provides several examples in which the adversarial training principle, without its typical manifestation in a GAN-like network architecture, is sufficient for high-fidelity image manipulation and synthesis. Hence, this often overlooked distinction appears increasingly significant.
- Pioneer Networks: Progressively Growing Generative Autoencoder
A4 Article in conference proceedings (2019) Heljakka, Ari; Solin, Arno; Kannala, Juho
We introduce a novel generative autoencoder network model that learns to encode and reconstruct images with high quality and resolution, and supports smooth random sampling from the latent space of the encoder. Generative adversarial networks (GANs) are known for their ability to simulate random high-quality images, but they cannot reconstruct existing images. Previous works have attempted to extend GANs to support such inference but, so far, have not delivered satisfactory high-quality results. Instead, we propose the Progressively Growing Generative Autoencoder (Pioneer) network, which achieves high-quality reconstruction of images without requiring a GAN discriminator. We merge recent techniques for progressively building up the parts of the network with the recently introduced adversarial encoder–generator network. The ability to reconstruct input images is crucial in many real-world applications, and allows for precise intelligent manipulation of existing images. We show promising results in image synthesis and inference, with state-of-the-art results in CelebA inference tasks.
- Recursive Chaining of Reversible Image-to-Image Translators for Face Aging
A4 Article in conference proceedings (2018-01-01) Heljakka, Ari; Solin, Arno; Kannala, Juho
This paper addresses the modeling and simulation of progressive changes over time, such as human face aging. By treating the age phases as a sequence of image domains, we construct a chain of transformers that map images from one age domain to the next. Leveraging recent adversarial image translation methods, our approach requires no training samples of the same individual at different ages. Here, the model must be flexible enough to translate a child face to a young adult, and all the way through adulthood to old age. We find that some transformers in the chain can be recursively applied on their own output to cover multiple phases, compressing the chain. The structure of the chain also unearths information about the underlying physical process. We demonstrate the performance of our method with precise and intuitive metrics, and visually match the face aging state-of-the-art.