Algorithms for Data-Efficient Training of Deep Neural Networks

School of Science | Doctoral thesis (article-based) | Defence date: 2020-12-16
Pages
97 + app. 131
Aalto University publication series DOCTORAL DISSERTATIONS, 198/2020
Deep Neural Networks ("deep learning") have become a ubiquitous choice for Machine Learning applications. These systems often achieve human-level or even super-human performance across a variety of tasks such as computer vision, natural language processing, speech recognition, reinforcement learning, generative modeling and healthcare. This success can be attributed to their ability to learn complex representations directly from raw input data, which removes hand-crafted feature extraction from the pipeline. However, a caveat remains: because Deep Neural Networks have an extremely large number of trainable parameters, their generalization ability depends heavily on the availability of a large amount of labeled data. In many machine learning applications, gathering a large amount of labeled data is not feasible due to privacy, cost, time or expertise constraints. Such applications are abundant in healthcare; one example is predicting the effect of a medicine on a new patient when the medicine has previously been administered to only a few patients.
This thesis addresses the problem of improving the generalization ability of Deep Neural Networks using a limited amount of labeled data. More specifically, it explores a class of methods that build into the learning algorithm an inductive bias about how a Deep Neural Network should "behave" in between the training samples, both in the input space and in the hidden space. Across several publications included in this thesis, the author demonstrates that such methods can outperform conventional baseline methods and achieve state-of-the-art performance in supervised, unsupervised, semi-supervised, adversarial training and graph-based learning settings.
In addition to these algorithms, the author proposes a mutual-information-based method for learning representations for "graph-level" tasks in an unsupervised and semi-supervised manner. Finally, the author proposes a method to improve the generalization of ResNets based on the iterative inference view.
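As a rough illustration of what constraining a network's behavior in between training samples can look like, the following minimal sketch builds interpolated training pairs in the input space. It is not taken from the thesis: the function names `interpolate` and `mixup_batch`, the parameter `alpha`, and the Beta-distributed mixing coefficient follow the commonly used mixup recipe and are assumptions made here for illustration. A hidden-space variant would presumably apply the same convex combination to intermediate layer outputs instead of raw inputs.

```python
import random


def interpolate(a, b, lam):
    """Return the convex combination lam * a + (1 - lam) * b of two vectors."""
    return [lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)]


def mixup_batch(inputs, targets, alpha=1.0, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch.

    inputs and targets are lists of equal-length float vectors
    (targets are assumed to be one-hot or soft label vectors).
    alpha is a hypothetical hyperparameter: the mixing coefficient
    lam is drawn once per batch from Beta(alpha, alpha).
    """
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)

    # Pair sample i with sample perm[i] via a random permutation.
    perm = list(range(len(inputs)))
    rng.shuffle(perm)

    mixed_inputs = [interpolate(inputs[i], inputs[j], lam)
                    for i, j in enumerate(perm)]
    mixed_targets = [interpolate(targets[i], targets[j], lam)
                     for i, j in enumerate(perm)]
    return mixed_inputs, mixed_targets, lam
```

Training on `mixed_inputs` against `mixed_targets` asks the model to respond linearly along the segment between two samples, which is one way to express the kind of inductive bias the abstract describes.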
Supervising professor
Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Bengio, Yoshua, Prof., Mila - Quebec AI Institute (Mila - Institut québécois d'intelligence artificielle), Canada
Raiko, Tapani, Prof., Aalto University, Department of Computer Science, Finland
Karhunen, Juha, Prof., Aalto University, Department of Computer Science, Finland
Keywords
deep neural networks, machine learning
Other note
  • [Publication 1]: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, David Lopez-Paz, Yoshua Bengio. Manifold Mixup: Better Representations by Interpolating Hidden States. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, California, USA, volume 97, pages 6438–6447, 2019.
  • [Publication 2]: Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R Devon Hjelm, Yoshua Bengio. On Adversarial Mixup Resynthesis. In 2019 Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, pages 4346–4357, 2019.
  • [Publication 3]: Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio. Interpolated Adversarial Training: Achieving Robust Neural Networks Without Sacrificing Too Much Accuracy. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security (AISec’19), London, United Kingdom, pages 95–103, 2019.
    DOI: 10.1145/3338501.3357369
  • [Publication 4]: Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, David Lopez-Paz. Interpolation Consistency Training for Semi-Supervised Learning. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, pages 3635–3641, 2019.
    DOI: 10.24963/ijcai.2019/504
  • [Publication 5]: Vikas Verma, Meng Qu, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang. GraphMix: Improved Training of Graph Neural Networks for Semi-Supervised Learning. Submitted for review, January 2020, 8 pages.
  • [Publication 6]: Fan-Yun Sun, Jordan Hoffmann, Vikas Verma, Jian Tang. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In Eighth International Conference on Learning Representations (ICLR 2020, spotlight), Addis Ababa, Ethiopia, 2020.
  • [Publication 7]: Stanislaw Jastrzebski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio. Residual Connections Encourage Iterative Inference. In 6th International Conference on Learning Representations (ICLR 2018), Vancouver, Canada, 2018.