Scalable Bayesian neural networks

Perustieteiden korkeakoulu (School of Science) | Master's thesis

Date

2021-06-14

Major/Subject

Machine Learning, Data Science and Artificial Intelligence

Mcode

SCI3044

Degree programme

Master’s Programme in Computer, Communication and Information Sciences

Language

en

Pages

69

Abstract

The ability to output accurate predictive uncertainty estimates is vital to a reliable classifier. Standard neural networks (NNs), while powerful machine learning models that can learn complex patterns from large datasets, lack this ability. As a result, one cannot reliably detect when an NN makes a wrong prediction, which prevents the use of NNs in safety-critical domains such as healthcare and autonomous vehicles. Bayesian neural networks (BNNs) have emerged as a promising solution, combining the learning capacity of NNs with probabilistic representations of uncertainty. By treating its weights as random variables, a BNN produces a distribution over its outputs from which uncertainty can be quantified. Consequently, a BNN can provide better predictive performance and greater robustness against out-of-distribution (OOD) samples than a corresponding deterministic NN. Unfortunately, training large BNNs is challenging due to the inherent complexity of these models. BNNs trained by standard Bayesian inference methods therefore typically achieve lower classification accuracy than their deterministic counterparts, which hinders their practical application despite their potential.

This thesis introduces implicit Bayesian neural networks (iBNNs), scalable BNN models that can be applied to large architectures. An iBNN treats the weights as deterministic parameters and instead augments the input nodes of each layer with latent variables as an alternative way of inducing predictive uncertainty. To train an iBNN, we only need to infer the posterior distribution of these low-dimensional auxiliary variables while learning a point estimate of the weights. Through comprehensive experiments, we show that iBNNs provide competitive performance compared to existing scalable BNN approaches and are more robust against OOD samples, despite having fewer parameters. Furthermore, with minimal overhead, we can convert a pretrained deterministic NN into a corresponding iBNN with better generalisation performance and predictive uncertainty. iBNNs can thus be applied to the pretrained weights of state-of-the-art deep NNs as a computationally efficient post-processing step that further improves the performance of those models.
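To make the layer-wise construction described above concrete, the following is a minimal PyTorch sketch of one such layer: the weights are an ordinary point estimate, while a low-dimensional Gaussian latent vector is sampled on each forward pass and multiplied into the layer input, so only the variational parameters of that vector need to be inferred. The multiplicative form, the Gaussian prior centred at one, and all names (`ImplicitBayesLinear`, `z_mu`, `z_logvar`) are illustrative assumptions, not the thesis's exact parameterisation.

```python
import math
import torch
import torch.nn as nn

class ImplicitBayesLinear(nn.Module):
    """Sketch of an iBNN-style layer: deterministic weights plus a
    per-input Gaussian latent variable that augments the layer input."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Point-estimate weights, as in a standard deterministic layer.
        self.linear = nn.Linear(in_features, out_features)
        # Variational posterior q(z) = N(mu, diag(sigma^2)) over the
        # low-dimensional auxiliary variables; far fewer parameters
        # than a full posterior over the weights.
        self.z_mu = nn.Parameter(torch.ones(in_features))
        self.z_logvar = nn.Parameter(torch.full((in_features,), -5.0))

    def forward(self, x):
        # Reparameterised sample z ~ q(z), drawn once per forward pass.
        std = torch.exp(0.5 * self.z_logvar)
        z = self.z_mu + std * torch.randn_like(std)
        # Augment the input, then apply the deterministic weights.
        return self.linear(x * z)

    def kl(self, prior_mu=1.0, prior_var=0.1):
        # KL(q(z) || p(z)) against a Gaussian prior centred at one
        # (a hypothetical choice), added to the training loss as in
        # standard variational inference.
        var = torch.exp(self.z_logvar)
        return 0.5 * torch.sum(
            math.log(prior_var) - self.z_logvar
            + (var + (self.z_mu - prior_mu) ** 2) / prior_var
            - 1.0
        )
```

At test time, predictive uncertainty comes from averaging class probabilities over several Monte Carlo samples of the latent variables, for example:

```python
layer = ImplicitBayesLinear(784, 10)
x = torch.randn(32, 784)
# Average the softmax outputs over 8 samples of z.
probs = torch.stack([layer(x).softmax(dim=-1) for _ in range(8)]).mean(dim=0)
```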

Supervisor

Kaski, Samuel

Thesis advisor

Heinonen, Markus

Keywords

Bayesian neural network, deep learning, neural network, uncertainty quantification
