Unsupervised Networks, Stochasticity and Optimization in Deep Learning

School of Science | Doctoral thesis (article-based) | Defence date: 2017-04-11
Pages: 102 + app. 112
Aalto University publication series DOCTORAL DISSERTATIONS, 40/2017
Abstract
Deep learning has recently received a lot of attention for enabling breakthroughs in complex machine learning tasks across a wide array of problem domains. The rapid development of the field has been driven by several factors: increases in computational capacity, the availability of large datasets, innovations in model structures, and advances in optimization algorithms.

This dissertation presents advances in optimization algorithms and model structures, with an emphasis on models that are unsupervised or have stochastic hidden states. In addition to reviewing previously known model structures such as the restricted Boltzmann machine, the multilayer perceptron, and the recurrent neural network, it presents the Ladder Network and Tagger, unsupervised networks designed to be easily combined with supervised learning. These networks use the denoising of representations corrupted with noise as the unsupervised task. A novel interpretation of bidirectional recurrent neural networks as generative models is also presented.

The stochastic hidden states in restricted Boltzmann machines and binary stochastic feedforward networks complicate their training, which requires estimating the gradient. The properties of gradient estimates in both models are studied, and new estimators are proposed for binary stochastic feedforward networks. Finally, new methods are presented for optimizing neural networks, including gradient-based hyperparameter tuning and transformations of the nonlinearities of feedforward networks that speed up their optimization.
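The denoising idea mentioned in the abstract can be illustrated with a minimal sketch. This is my own toy illustration, not code from the thesis or its publications: a small network is trained to reconstruct the clean input from a noise-corrupted copy, so the reconstruction error of corrupted data serves as the unsupervised objective. All sizes and hyperparameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples on a one-dimensional manifold embedded in 5-D.
t = rng.uniform(-1, 1, size=(200, 1))
X = np.hstack([t, t**2, t**3, 0.5 * t, -t])

# One-hidden-layer autoencoder trained on a denoising objective.
d, h = X.shape[1], 8
W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, d)); b2 = np.zeros(d)
lr, sigma = 0.05, 0.3

def forward(X_tilde):
    H = np.tanh(X_tilde @ W1 + b1)   # hidden representation
    return H, H @ W2 + b2            # reconstruction of the clean input

losses = []
for epoch in range(300):
    X_tilde = X + sigma * rng.normal(size=X.shape)  # corrupt with noise
    H, X_hat = forward(X_tilde)
    err = X_hat - X                  # compare against the *clean* input
    losses.append(float(np.mean(err**2)))
    # Backpropagate the mean-squared denoising loss.
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H**2)   # tanh derivative
    gW1 = X_tilde.T @ dH / len(X); gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(losses[0], losses[-1])  # reconstruction error before vs. after training
```

The key detail is that the target of the reconstruction is the uncorrupted input, so the network must learn the structure of the data to undo the noise; the Ladder Network applies the same principle to representations at every layer rather than only at the input.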
Supervising professor
Karhunen, Juha, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Raiko, Tapani, Assistant Prof., Aalto University, Department of Computer Science, Finland
Keywords
unsupervised networks, stochasticity, deep learning, neural networks, deep neural networks
Other note
  • [Publication 1]: Mathias Berglund, Tapani Raiko, Mikko Honkala, Leo Kärkkäinen, Akos Vetek, and Juha Karhunen. Bidirectional Recurrent Neural Networks as Generative Models. In Advances in Neural Information Processing Systems 28, pp. 856–864, December 2015.
  • [Publication 2]: Mathias Berglund, Tapani Raiko, and Kyunghyun Cho. Measuring the usefulness of hidden units in Boltzmann machines with mutual information. Neural Networks, Volume 64, pp. 12–18, September 2014.
    DOI: 10.1016/j.neunet.2014.09.004
  • [Publication 3]: Tapani Raiko, Mathias Berglund, Guillaume Alain, and Laurent Dinh. Techniques for Learning Binary Stochastic Feedforward Neural Networks. In Proceedings of the International Conference on Learning Representations, May 2015.
  • [Publication 4]: Jelena Luketina, Mathias Berglund, Klaus Greff, and Tapani Raiko. Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters. In Proceedings of The 33rd International Conference on Machine Learning, pp. 2952–2960, June 2016.
  • [Publication 5]: Mathias Berglund. Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence. In Proceedings of The 24th European Symposium on Artificial Neural Networks, pp. 521–526, April 2016.
  • [Publication 6]: Antti Rasmus, Harri Valpola, Mikko Honkala, Mathias Berglund, and Tapani Raiko. Semi-supervised Learning with Ladder Networks. In Advances in Neural Information Processing Systems 28, pp. 3546–3554, December 2015.
  • [Publication 7]: Klaus Greff, Antti Rasmus, Mathias Berglund, Tele Hotloo Hao, Jürgen Schmidhuber, and Harri Valpola. Tagger: Deep Unsupervised Perceptual Grouping. In Advances in Neural Information Processing Systems 29, pp. 4484–4492, December 2016.
  • [Publication 8]: Tapani Raiko, Mathias Berglund, Tommi Vatanen, Juha Karhunen, and Harri Valpola. Transformations in Activation Functions Push the Gradient Towards the Natural Gradient. Neural Processing Letters, submitted, 2016.