Scalable Bayesian neural networks
Perustieteiden korkeakoulu (School of Science) |
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Date
2021-06-14
Department
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
69
Series
Abstract
The ability to output accurate predictive uncertainty estimates is vital to a reliable classifier. Standard neural networks (NNs), while being powerful machine learning models that can learn complex patterns from large datasets, do not possess such an ability. Therefore, one cannot reliably detect when an NN makes a wrong prediction. This shortcoming prevents applying NNs in safety-critical domains such as healthcare and autonomous vehicles. Bayesian neural networks (BNNs) have emerged as one of the promising solutions, combining the learning capacity of NNs with probabilistic representations of uncertainty. By treating its weights as random variables, a BNN produces a distribution over its outputs from which uncertainty can be quantified. As a result, a BNN can provide better predictive performance while being more robust against out-of-distribution (OOD) samples than a corresponding deterministic NN. Unfortunately, training large BNNs is challenging due to the inherent complexity of these models. Therefore, BNNs trained by standard Bayesian inference methods typically produce lower classification accuracy than their deterministic counterparts, hindering their practical application despite their potential. This thesis introduces implicit Bayesian neural networks (iBNNs), scalable BNN models that can be applied to large architectures. This model treats the weights as deterministic parameters and augments the input nodes of each layer with latent variables as an alternative method of inducing predictive uncertainty. To train an iBNN, we only need to infer the posterior distribution of these low-dimensional auxiliary variables while learning a point estimate of the weights. Through comprehensive experiments, we show that iBNNs provide competitive performance compared to other existing scalable BNN approaches and are more robust against OOD samples despite having fewer parameters.
Furthermore, with minimal overhead, we can convert a pretrained deterministic NN to a corresponding iBNN with better generalisation performance and predictive uncertainty. Thus, we can use iBNNs with pretrained weights of state-of-the-art deep NNs as a computationally efficient post-processing step to further improve the performance of those models.
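The core mechanism described in the abstract can be illustrated with a small sketch. The function below is not the thesis's actual implementation; it is a minimal NumPy illustration under the assumption that the latent variables multiplicatively perturb each layer's inputs and follow a Gaussian variational posterior. Note how the layer carries only one latent mean and scale per input node, far fewer parameters than the weight matrix itself, and how initialising the latent mean at one recovers the pretrained deterministic network.

```python
import numpy as np

rng = np.random.default_rng(0)

def ibnn_layer(x, W, b, z_mean, z_log_std, n_samples=8, rng=rng):
    """Illustrative iBNN layer (a sketch, not the thesis code).

    W and b are deterministic point estimates; uncertainty enters only
    through low-dimensional latent variables z that augment (here:
    multiply) the inputs, with variational posterior
    q(z) = N(z_mean, exp(z_log_std)^2).
    """
    # Draw latent input-augmentation variables: shape (n_samples, d_in)
    eps = rng.standard_normal((n_samples, z_mean.shape[0]))
    z = z_mean + np.exp(z_log_std) * eps
    # Perturb the inputs, then apply the shared deterministic affine
    # map to every sample: result has shape (n_samples, d_out)
    return (x * z) @ W + b

d_in, d_out = 4, 3
x = rng.standard_normal(d_in)              # one input vector
W = rng.standard_normal((d_in, d_out))     # pretrained point-estimate weights
b = np.zeros(d_out)
z_mean = np.ones(d_in)                     # mean 1: starts at the deterministic net
z_log_std = np.full(d_in, -2.0)            # small initial input uncertainty
out = ibnn_layer(x, W, b, z_mean, z_log_std)
print(out.shape)  # (8, 3): one output per latent sample
```

Averaging a classifier's softmax outputs over the latent samples gives the predictive distribution, and the spread across samples quantifies the uncertainty; only `z_mean` and `z_log_std` (2·d_in numbers per layer) need variational inference, which is what makes the approach scale to large architectures.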
Supervisor
Kaski, Samuel
Thesis advisor
Heinonen, Markus
Keywords
Bayesian neural network, deep learning, neural network, uncertainty quantification