Unsupervised Networks, Stochasticity and Optimization in Deep Learning

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Raiko, Tapani, Assistant Prof., Aalto University, Department of Computer Science, Finland
dc.contributor.author: Berglund, Mathias
dc.contributor.department: Tietotekniikan laitos [fi]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.lab: Deep Learning and Bayesian Modeling [en]
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.school: School of Science [en]
dc.contributor.supervisor: Karhunen, Juha, Prof., Aalto University, Department of Computer Science, Finland
dc.date.accessioned: 2017-03-08T10:00:36Z
dc.date.available: 2017-03-08T10:00:36Z
dc.date.defence: 2017-04-11
dc.date.issued: 2017
dc.description.abstract: Deep learning has recently received considerable attention for enabling breakthroughs in complex machine learning tasks across a wide array of problem domains. The rapid development of the field has been enabled by multiple factors, including increases in computational capacity, the availability of large datasets, innovations in model structures, and developments in optimization algorithms. This dissertation presents some of these advances in optimization algorithms and model structures, with an emphasis on models that are unsupervised or have stochastic hidden states. In addition to reviewing previously known model structures such as the restricted Boltzmann machine, the multilayer perceptron and the recurrent neural network, the dissertation presents the Ladder Network and the Tagger, unsupervised networks designed to be easily combined with supervised learning. These networks use denoising of representations corrupted with noise as the unsupervised task. A novel interpretation of bidirectional recurrent neural networks as generative models is also presented. The stochastic hidden states in restricted Boltzmann machines and binary stochastic feedforward networks complicate their training, which requires estimating the gradient. The properties of gradient estimates in both models are studied, and new estimators are proposed for binary stochastic feedforward networks. Finally, new methods are presented for optimizing neural networks, including gradient-based hyperparameter tuning and transformations of the nonlinearities of feedforward networks that speed up their optimization. [en]
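The abstract notes that stochastic hidden states complicate training because the gradient must be estimated. The following is a minimal NumPy sketch of that general problem: a single binary stochastic hidden layer trained with the straight-through estimator, one commonly used biased estimator for binary units. It is an illustration only, not the specific estimators proposed in the dissertation; the toy data, layer sizes, and variable names are all hypothetical.

# Minimal sketch (NumPy, illustrative only): one binary stochastic hidden
# layer trained with the straight-through gradient estimator.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 8-dimensional binary inputs, scalar regression target.
X = rng.integers(0, 2, size=(256, 8)).astype(float)
y = X.sum(axis=1, keepdims=True) / 8.0

W1 = rng.normal(0, 0.1, size=(8, 16))   # input -> binary stochastic hidden layer
W2 = rng.normal(0, 0.1, size=(16, 1))   # hidden -> output
lr = 0.1

for step in range(2000):
    # Forward pass: sample binary hidden states h ~ Bernoulli(p).
    p = sigmoid(X @ W1)
    h = (rng.random(p.shape) < p).astype(float)
    out = h @ W2
    err = out - y                         # d(loss)/d(out) for squared error

    # Gradient w.r.t. W2 is exact given the sampled h.
    grad_W2 = h.T @ err / len(X)

    # Straight-through estimator: backpropagate through the sampling step
    # as if h were the deterministic probability p.
    dh = err @ W2.T                       # d(loss)/dh
    dp = dh * p * (1.0 - p)               # chain through the sigmoid
    grad_W1 = X.T @ dp / len(X)

    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

print("final mean squared error:", float(np.mean((h @ W2 - y) ** 2)))

The sampling step makes the exact gradient of the expected loss intractable to compute directly, which is why biased or high-variance estimators such as the one sketched above are used in practice.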
dc.format.extent: 102 + app. 112
dc.format.mimetype: application/pdf [en]
dc.identifier.isbn: 978-952-60-7323-1 (electronic)
dc.identifier.isbn: 978-952-60-7324-8 (printed)
dc.identifier.issn: 1799-4942 (electronic)
dc.identifier.issn: 1799-4934 (printed)
dc.identifier.issn: 1799-4934 (ISSN-L)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/24774
dc.identifier.urn: URN:ISBN:978-952-60-7323-1
dc.language.iso: en [en]
dc.opn: Welling, Max, Prof., University of Amsterdam, Netherlands
dc.publisher: Aalto University [en]
dc.publisher: Aalto-yliopisto [fi]
dc.relation.haspart: [Publication 1]: Mathias Berglund, Tapani Raiko, Mikko Honkala, Leo Kärkkäinen, Akos Vetek, and Juha Karhunen. Bidirectional Recurrent Neural Networks as Generative Models. In Advances in Neural Information Processing Systems 28, pp. 856-864, December 2015
dc.relation.haspart: [Publication 2]: Mathias Berglund, Tapani Raiko, and Kyunghyun Cho. Measuring the usefulness of hidden units in Boltzmann machines with mutual information. Neural Networks, Volume 64, pp. 12-18, September 2014. DOI: 10.1016/j.neunet.2014.09.004
dc.relation.haspart: [Publication 3]: Tapani Raiko, Mathias Berglund, Guillaume Alain, and Laurent Dinh. Techniques for Learning Binary Stochastic Feedforward Neural Networks. In Proceedings of the International Conference on Learning Representations, May 2015
dc.relation.haspart: [Publication 4]: Jelena Luketina, Mathias Berglund, Klaus Greff, and Tapani Raiko. Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters. In Proceedings of The 33rd International Conference on Machine Learning, pp. 2952-2960, June 2016
dc.relation.haspart: [Publication 5]: Mathias Berglund. Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence. In Proceedings of The 24th European Symposium on Artificial Neural Networks, pp. 521-526, April 2016
dc.relation.haspart: [Publication 6]: Antti Rasmus, Harri Valpola, Mikko Honkala, Mathias Berglund, and Tapani Raiko. Semi-supervised Learning with Ladder Networks. In Advances in Neural Information Processing Systems 28, pp. 3546-3554, December 2015
dc.relation.haspart: [Publication 7]: Klaus Greff, Antti Rasmus, Mathias Berglund, Tele Hotloo Hao, Jürgen Schmidhuber, and Harri Valpola. Tagger: Deep Unsupervised Perceptual Grouping. In Advances in Neural Information Processing Systems 29, pp. 4484-4492, December 2016
dc.relation.haspart: [Publication 8]: Tapani Raiko, Mathias Berglund, Tommi Vatanen, Juha Karhunen, and Harri Valpola. Transformations in Activation Functions Push the Gradient Towards the Natural Gradient. Neural Processing Letters, submitted, 2016
dc.relation.ispartofseries: Aalto University publication series DOCTORAL DISSERTATIONS [en]
dc.relation.ispartofseries: 40/2017
dc.rev: Bornschein, Jörg, Research Scientist, DeepMind, United Kingdom
dc.rev: Binder, Alexander, Prof., Singapore University of Technology and Design, Singapore
dc.subject.keyword: unsupervised networks [en]
dc.subject.keyword: stochasticity [en]
dc.subject.keyword: deep learning [en]
dc.subject.keyword: neural networks [en]
dc.subject.keyword: deep neural networks [en]
dc.subject.other: Computer science [en]
dc.title: Unsupervised Networks, Stochasticity and Optimization in Deep Learning [en]
dc.type: G5 Artikkeliväitöskirja [fi]
dc.type.dcmitype: text [en]
dc.type.ontasot: Doctoral dissertation (article-based) [en]
dc.type.ontasot: Väitöskirja (artikkeli) [fi]
local.aalto.archive: yes
local.aalto.formfolder: 2017_03_08_klo_07_53
Files
Original bundle
Name: isbn9789526073231.pdf
Size: 17.46 MB
Format: Adobe Portable Document Format