Unsupervised Networks, Stochasticity and Optimization in Deep Learning

School of Science | Doctoral thesis (article-based) | Defence date: 2017-04-11
Pages: 102 + app. 112
Aalto University publication series DOCTORAL DISSERTATIONS, 40/2017
Abstract
Deep learning has recently received a lot of attention for enabling breakthroughs in complex machine learning tasks across a wide array of problem domains. The rapid development of the field has been driven by several factors: increases in computational capacity, the availability of large datasets, innovations in model structures, and advances in optimization algorithms.

This dissertation presents advances in optimization algorithms and model structures, with an emphasis on models that are unsupervised or have stochastic hidden states. In addition to reviewing previously known model structures such as the restricted Boltzmann machine, the multilayer perceptron, and the recurrent neural network, it presents the Ladder Network and Tagger, unsupervised networks designed to be easily combined with supervised learning. These networks use the denoising of representations corrupted with noise as the unsupervised task. A novel interpretation of bidirectional recurrent neural networks as generative models is also presented.

The stochastic hidden states in restricted Boltzmann machines and binary stochastic feedforward networks complicate their training, which requires estimating the gradient. The properties of gradient estimates in both models are studied, and new estimators are proposed for binary stochastic feedforward networks. Finally, new methods are presented for optimizing neural networks, including gradient-based hyperparameter tuning and transformations of the nonlinearities of feedforward networks that speed up their optimization.
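The denoising idea mentioned in the abstract can be illustrated with a minimal sketch. This is my own toy illustration, not code from the thesis or its publications: a small network is trained to reconstruct the clean input from a noise-corrupted copy, so the reconstruction error of corrupted data serves as the unsupervised objective. All sizes and hyperparameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples on a one-dimensional manifold embedded in 5-D.
t = rng.uniform(-1, 1, size=(200, 1))
X = np.hstack([t, t**2, t**3, 0.5 * t, -t])

# One-hidden-layer autoencoder trained on a denoising objective.
d, h = X.shape[1], 8
W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, d)); b2 = np.zeros(d)
lr, sigma = 0.05, 0.3

def forward(X_tilde):
    H = np.tanh(X_tilde @ W1 + b1)   # hidden representation
    return H, H @ W2 + b2            # reconstruction of the clean input

losses = []
for epoch in range(300):
    X_tilde = X + sigma * rng.normal(size=X.shape)  # corrupt with noise
    H, X_hat = forward(X_tilde)
    err = X_hat - X                  # compare against the *clean* input
    losses.append(float(np.mean(err**2)))
    # Backpropagate the mean-squared denoising loss.
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H**2)   # tanh derivative
    gW1 = X_tilde.T @ dH / len(X); gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(losses[0], losses[-1])  # reconstruction error before vs. after training
```

The key detail is that the target of the reconstruction is the uncorrupted input, so the network must learn the structure of the data to undo the noise; the Ladder Network applies the same principle to representations at every layer rather than only at the input.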
Supervising professor
Karhunen, Juha, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Raiko, Tapani, Assistant Prof., Aalto University, Department of Computer Science, Finland
Keywords
unsupervised networks, stochasticity, deep learning, neural networks, deep neural networks
Other note
  • [Publication 1]: Mathias Berglund, Tapani Raiko, Mikko Honkala, Leo Kärkkäinen, Akos Vetek, and Juha Karhunen. Bidirectional Recurrent Neural Networks as Generative Models. In Advances in Neural Information Processing Systems 28, pp. 856–864, December 2015.
  • [Publication 2]: Mathias Berglund, Tapani Raiko, and Kyunghyun Cho. Measuring the usefulness of hidden units in Boltzmann machines with mutual information. Neural Networks, Volume 64, pp. 12–18, September 2014.
    DOI: 10.1016/j.neunet.2014.09.004
  • [Publication 3]: Tapani Raiko, Mathias Berglund, Guillaume Alain, and Laurent Dinh. Techniques for Learning Binary Stochastic Feedforward Neural Networks. In Proceedings of the International Conference on Learning Representations, May 2015.
  • [Publication 4]: Jelena Luketina, Mathias Berglund, Klaus Greff, and Tapani Raiko. Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters. In Proceedings of The 33rd International Conference on Machine Learning, pp. 2952–2960, June 2016.
  • [Publication 5]: Mathias Berglund. Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence. In Proceedings of The 24th European Symposium on Artificial Neural Networks, pp. 521–526, April 2016.
  • [Publication 6]: Antti Rasmus, Harri Valpola, Mikko Honkala, Mathias Berglund, and Tapani Raiko. Semi-supervised Learning with Ladder Networks. In Advances in Neural Information Processing Systems 28, pp. 3546–3554, December 2015.
  • [Publication 7]: Klaus Greff, Antti Rasmus, Mathias Berglund, Tele Hotloo Hao, Jürgen Schmidhuber, and Harri Valpola. Tagger: Deep Unsupervised Perceptual Grouping. In Advances in Neural Information Processing Systems 29, pp. 4484–4492, December 2016.
  • [Publication 8]: Tapani Raiko, Mathias Berglund, Tommi Vatanen, Juha Karhunen, and Harri Valpola. Transformations in Activation Functions Push the Gradient Towards the Natural Gradient. Neural Processing Letters, submitted, 2016.