Browsing by Author "Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland"
- Advances in distributed Bayesian inference and graph neural networks
School of Science | Doctoral dissertation (article-based) (2021) Mesquita, Diego
Bayesian statistics and graph neural networks comprise a bag of tools widely employed in machine learning and applied sciences. The former rests on solid theoretical foundations, but its application depends on techniques that scale poorly as data increase. The latter is notorious for large-scale applications (e.g., in bioinformatics and natural language processing), but is largely only based on empirical intuitions. This thesis aims to i) broaden the scope of applications for Bayesian inference, and ii) deepen the understanding of core design principles of graph neural networks. First, we focus on distributed Bayesian inference under limited communication. We advance the state-of-the-art of embarrassingly parallel Markov chain Monte Carlo (MCMC) with a novel method that leverages normalizing flows as density estimators. On the same front, we also propose an extension of stochastic gradient Langevin dynamics for federated data, which are inherently distributed in a non-IID manner and cannot be centralized due to privacy constraints. Second, we develop a methodology for meta-analysis which allows the combination of Bayesian posteriors from different studies. Our approach is agnostic to study-specific complexities, which are all encapsulated in their respective posteriors. This extends the application of Bayesian meta-analysis to likelihood-free posteriors, which would otherwise be challenging. Our method also enables us to reuse posteriors from computationally costly analyses and update them post-hoc, without rerunning the analyses. Finally, we revisit two popular graph neural network components: spectral graph convolutions and pooling layers. Regarding convolutions, we propose a novel architecture and show that it is possible to achieve state-of-the-art performance by adding a minimal set of features to the most basic formulation of polynomial spectral convolutions.
On the topic of pooling, we challenge the need for intricate pooling schemes and show that they do not play a role in the performance of graph networks in relevant benchmarks.
- Bayesian Multi-View Factor Models for Drug Response and Brain Imaging Studies
School of Science | Doctoral dissertation (article-based) (2018) Leppäaho, Eemeli
This thesis investigates knowledge inference from measurements of multiple data sources, motivated by technologies in a wide range of domains allowing effective measurement of several related, but heterogeneous data sources. In life sciences, examples of this kind of "multi-view" data are brain imaging data of multiple subjects along with description of the experimental stimuli, as well as drug response studies including measurements regarding the expression level, copy number variation and mutation of genes in cell lines. Data analyses have been typically related to analyzing the structure of a single data source, or the effect of one data source on another. The multi-view data inspected in this thesis results in a more complex problem: besides the structure of each of the data sources, the relations between the data sources are of high interest as well. This thesis addresses modern multi-view data analysis problems using Bayesian latent variable models. They are a natural choice for developing models in order to gain knowledge about multiple data sources and their relations; they allow for missing values in the data, incorporating prior information into the modelling problem and estimating the uncertainty present in the inference. The key contributions of this thesis include formulating a low-rank data source relation model and presenting biclustering using sparse priors, as well as a relaxed formulation of tensor factorization. All the developed models have been published as open-source software, enabling widespread use and further development. The presented machine learning tools are demonstrated using drug response and brain imaging studies, for both of which predictive performance above state-of-the-art level is achieved.
In the drug response studies, the models were able to accurately relate similar drugs, as well as detect known cancer genes affecting the responsiveness of cells to certain drugs. In the brain response studies the benefits of the presented methods were shown via increased accuracy in predicting brain responses, whereas the relaxed tensor decomposition allowed for a novel way of utilizing measurements for multiple subjects. Finally, the advantage of using a low-dimensional latent space is illustrated in a genome-wide association study in an especially challenging domain: when there exist measurements for only two hundred subjects, yet there exist some thousands of features regarding the subjects, with the study discovering a relevant gene associated with components of brain activity.
- Bayesian multi-view models for data-driven drug response analysis
School of Science | Doctoral dissertation (article-based) (2015) Khan, Suleiman Ali
A central challenge faced by biological and medical research is to understand the impact of chemical entities on living cells. Identifying the relationships between the chemical structures and their cellular responses is valuable for improving drug design and targeted therapies. The chemical structures and their detailed molecular responses need to be combined through a systematic analysis to learn the complex dependencies, which can then assist in improving understanding of the molecular mechanisms of drugs as well as predictions on the effects of unknown molecules. Moreover, with emerging drug-response data sets being profiled over several disease types and phenotypic details, it is pertinent to develop advanced computational methods that can be used to study multiple sets of data together. In this thesis, a novel multi-disciplinary challenge is undertaken for computationally analyzing interactions between multiple biological responses and chemical properties of drugs, while simultaneously advancing the computational methods to better learn these interactions. Specifically, multi-view dependency modeling of paired data sets is formulated as a means of systematically studying the drug-response relationships. First, the systematic analysis of drug structures and their genome-wide responses is presented as a multi-set dependency modeling problem and established methods are adopted to test the novel hypothesis. Several novel extensions of the drug-response analysis are then presented that explore responses measured over multiple disease types and multiple levels of phenotypic detail, uncovering novel biological insights of potential impact. These analyses are made possible by novel advancements in multi-view methods.
Specifically, the first Bayesian tensor canonical correlation analysis and its extensions are introduced to capture the underlying multi-way structure and applied in analyzing novel toxicogenomic interactions. The results illustrate that modeling the precise multi-view and multi-way formulation of the data is valuable for discovering interpretable latent components as well as for the prediction of unseen responses of drugs. Therefore, the original contribution to knowledge in this dissertation is two-fold: first, the data-driven identification of relationships between structural properties of drugs and their genome-wide responses in cells and, second, novel advancements of multi-view methods that find dependencies between paired data sets. Open source implementations of the new methods have been released to facilitate further research.
- Deep Visual Understanding and Beyond - Saliency, Uncertainty, and Bridges to Natural Language
School of Science | Doctoral dissertation (article-based) (2024) Wang, Tzu-Jui Julius
Visual understanding concerns to what extent a cognitive system can reason about the visual surroundings before it reacts accordingly. While visual understanding is considered crucial, what goes beyond it is the capability of multi-modal reasoning, which also involves other modalities. That is, a cognitive system is often faced with a daunting process: how to capitalize on the inputs, usually from one or more modalities, to adapt itself to the world of multiple modalities. More importantly, different machine learning paradigms may be exploited to learn both uni-modal and multi-modal reasoning tasks. This defines the main research question initiating the research endeavour presented in this thesis. In response to the dissertation's core research question, the work provides a number of methods empowered by different machine learning paradigms for both uni-modal and multi-modal contexts. More concretely, it is shown that one can estimate visual saliency, which is one of the most crucial fundamentals of visual understanding, with visual cues learned in an unsupervised fashion. The semi-supervised learning principle is found to be effective in combating class-imbalance issues in scene graph generation, which aims at discovering relationships among visual objects in an image. Moreover, to overcome the primary drawback in vision-language (VL) pre-training and other VL applications, which conventionally necessitate annotated image-text pairs, a novel weakly supervised approach is introduced. Besides, several enhancements have been made to supervised learning applications: Firstly, an improved dense image captioning model is proposed to better exploit different types of relationships between visual objects in an image. Secondly, an enhanced video captioning model is proposed to alleviate the impact brought by the modality gap, which can be commonly found in the widely adopted Transformer models.
Lastly, an uncertainty-aware classification model is proposed to learn more robustly under noisy supervision when accounting for data and model uncertainties. These results suggest the usefulness and wide applicability of different learning paradigms. In terms of models' robustness, several breakthroughs have been made and elaborated for both uni-modal and multi-modal applications. The research outcomes encompass numerous findings related to computer vision techniques and their bridges to natural language. The thesis concludes with a discussion on the limitations of each published work and potential future endeavours in both uni-modal and multi-modal research.
- Differentially private approximate Bayesian inference of probabilistic models
School of Science | Doctoral dissertation (article-based) (2023) Jälkö, Joonas
Learning population level characteristics from a set of individuals, belonging to the said population, is the typical aim of statistical inference. When the inference is based on confidential data, measures should be taken to make sure that no individual's sensitive data can be deduced from the results of the statistical inference. In this thesis, I study approximate Bayesian inference under the strict privacy constraint of differential privacy (DP), which allows inferring many of the key features from the data while limiting the effect each individual has on the results, thus keeping the contribution to the analysis a secret. In this thesis, I focus on two families of approximate Bayesian inference methods: variational inference (VI) and Markov chain Monte Carlo (MCMC). Both of these families of methods are applicable for a wide variety of probabilistic models and are widely applied in practice. However, these methods rely on individuals' data through the log-likelihood computation, thus creating a possible channel of privacy leakage. I demonstrate that for a stochastic gradient based VI algorithm, the privacy leakage can be limited with minor modifications to the algorithm that guarantee DP. For a specific type of MCMC algorithm we can have an even more striking result: the algorithm itself guarantees DP as long as the log-likelihood satisfies certain smoothness conditions. This property follows from carefully analyzing the noise arising from the stochasticity of the MCMC method. I show that this noise is enough to limit the individual sample's effect on the results and to guarantee DP. Finally, I study an important application of the DP probabilistic inference: producing privacy-preserving synthetic data. Privacy-preserving synthetic data is typically a data set drawn from a generative model trained under DP.
Probabilistic models can be seen as a set of instructions for generating data. Therefore, the probabilistic models trained under DP with the aforementioned DP approximate Bayesian inference techniques can be used to produce DP synthetic data sets. I demonstrate how equipping the probabilistic models with prior information about the data generating process can drastically improve the downstream utility of the synthetic data without compromising the privacy further.
- Gaussian Process Modelling of Genome-wide High-throughput Sequencing Time Series
School of Science | Doctoral dissertation (article-based) (2018) Topa, Hande
During the last decade, high-throughput sequencing (HTS) has become the mainstream technique for simultaneously studying an enormous number of genetic features present in the genome, transcriptome, or epigenome of an organism. Besides the static experiments which compare genetic features between two or more distinct biological conditions, time series experiments which monitor genetic features over time provide valuable information about the dynamics of complex mechanisms in various biological processes. However, analysis of the currently available HTS time series data sets involves challenges as these data sets often consist of short and irregularly sampled time series which lack sufficient biological replication. In addition, quantification of the genetic features from HTS data is inherently subject to uncertainty due to the limitations of HTS platforms such as short read lengths and varying sequencing depths. This thesis presents a Gaussian process (GP)-based approach for modelling and ranking HTS time series by taking into account the characteristics of the data sets. GPs are one of the most suitable tools for modelling sparse and irregularly sampled time series and they can capture the temporal correlations between observations at different time points via suitable covariance functions. On the other hand, naive application of GP modelling may suffer from over-fitting, leading to an increased number of false positives if the characteristics of the data are not taken into account. In this thesis, this problem has been mitigated by regularizing the models by introducing bounds to the hyperparameter values of the GP prior. Firstly, the range of the values of length-scale parameters has been restricted to values compatible with the spacing of the sampled time points.
Secondly, application-dependent variance models have been developed to infer the uncertainty levels on the observations, which have then been incorporated into the GP models as lower bounds for the noise variance. Regularizing the GP models by setting realistic bounds to their hyperparameters makes the GP models more robust against the uncertainty in the data without increasing the complexity of the models, and thus makes the method applicable to large genome-wide studies. The publications included in this thesis suggest a number of techniques for modelling the variance in RNA-seq and Pool-seq applications, which are the HTS techniques specifically designed to sequence RNA transcripts and pooled DNA sequences, respectively. Variance models utilize the information obtained through pre-processing stages of the data depending on, for example, the number of replicates or varying sequencing depth levels. Performance evaluation of the GP models under different experiment settings indicates that the variance incorporation into the GP models can yield a higher average precision than the naive application of GP modelling. Motivated by these results, an open-source software package, GPrank, has been implemented in R in order to enable researchers to easily apply the proposed GP-based method to their own HTS time series data sets for detecting the temporally most active genetic features.
- Humans as Information Sources in Bayesian Optimization
School of Science | Doctoral dissertation (article-based) (2024) Mikkola, Petrus
Humans are at the heart of the current computational revolution, not only as end-users, but also as integral contributors to computational systems such as machine learning (ML) solutions. This is because these systems depend on data that mainly originate from human activities, such as textual content, artistic creations, or transcribed audio clips. This data is not the only human-derived information flowing into the process, as human expertise plays an important role at all stages of ML development. This thesis reviews methodologies for expert knowledge elicitation, and delves into a promising approach to harnessing humans as a source of information, which is based on the following two ideas. The first idea is to assume the existence of a latent "intuition function" that describes an expert's knowledge over the problem of interest. The intuition function can only be accessed through queries that allow for human feedback, such as preferential queries. Learning the intuition function presents a tractable machine learning problem that can be approached through Gaussian process learning with a probabilistic user model on how the expert data is generated. The second idea pertains to how queries should be selected for an expert and how the expert's knowledge should be applied to the problem of interest. Multi-fidelity Bayesian optimization (MFBO) is a global optimization approach that incorporates multiple information sources with differing levels of accuracy and cost, accelerating the search for optimal solutions. Treating humans as auxiliary information sources within the MFBO framework effectively tackles issues concerning knowledge integration and sample-efficiency.
This thesis addresses three problems that arise when humans serve as information sources in Bayesian optimization: (i) the requirement for natural human interaction, (ii) the inherent unreliability of human input, and (iii) the high cost associated with human labor. The articles included in the thesis present novel algorithms as viable solutions to the problems (i), (ii), and (iii). Specifically, we identify problem (ii) as an issue of negative transfer, and we provide an algorithm that establishes theoretical bounds on the negative transfer gap.
- Interaction Detection with Probabilistic Deep Learning for Genetics
School of Science | Doctoral dissertation (article-based) (2023) Cui, Tianyu
Deep learning is an important machine learning tool in genetics because of its ability to model nonlinear relations between genotypes and phenotypes, such as genetic interactions, without any assumptions about the forms of relations. However, current deep learning approaches are restricted in genetics applications by (i) the lack of well-calibrated uncertainty estimation about the model and (ii) limited accessible individual-level data for model training. This thesis aims to design principled approaches to tackle the shortcomings of deep learning with two relevant statistical genetics applications: gene-gene interaction detection and genotype-phenotype prediction. First, we focus on interaction detection with deep learning. We provide calibrated uncertainty estimations to interaction detection in deep learning with Bayesian principles, which are used to control statistical errors, e.g., false positive rate and false negative rate, of detected interactions. In genetic interaction detection applications, we design a novel neural network architecture to increase the power of detecting complex gene-gene interactions by learning gene representations that aggregate information from all SNPs (single-nucleotide polymorphisms) of the genes being analyzed and considering complex interaction forms between them beyond only the currently considered multiplicative interactions. Moreover, we propose a new permutation procedure that gives calibrated null distributions of genetic interactions from the neural network. Second, we study deep learning models in the low-data regime. We improve deep learning prediction by incorporating domain knowledge with informative priors.
Specifically, we design informative Gaussian scale mixture priors that explicitly encode prior beliefs about feature sparsity and data signal-to-noise ratio into deep learning models, which improve their accuracy on regression tasks, such as genotype-phenotype prediction, especially when only a small training set is available. Moreover, we study how to better understand the working mechanism of low-data deep learning models that share knowledge from multiple similar domains, such as transfer learning, with representation similarity. We find that current representation similarities of deep learning models on multiple domains give counter-intuitive conclusions about their functional similarities due to the confounding effect of the input data structure. Therefore, we introduce a deconfounding step to adjust for the confounder, which improves the consistency of representation similarities w.r.t. functional similarities of models.
- Interactive Knowledge Elicitation for Decision-Support Models in Precision Medicine
School of Science | Doctoral dissertation (article-based) (2023) Sundin, Iiris
This thesis develops human-in-the-loop machine learning methods that aim at improving the performance of a machine learning model in precision medicine tasks. Many problems in precision medicine are still difficult for machine learning due to lack of data, and human experts' knowledge can provide a valuable source of information to reduce a model's prediction error and uncertainty. Such expert knowledge elicitation requires methods that address the following problems: How to leverage indirect expert knowledge instead of querying labels as in active learning, how to make the interaction less laborious to the expert than in traditional prior elicitation, and how to select the interaction so that it is the most beneficial to the prospective task of the model. The first contribution of the thesis is to develop an interactive knowledge elicitation method for "small n large p" problems where data is insufficient, that allows even a small amount of sequentially chosen noisy, indirect feedback from an expert to complement the data and improve the accuracy of the model's predictions. The effectiveness of the method is evaluated in a user study. The method is further extended to a high-dimensional genomics prediction task where we demonstrate, for the first time, how sequentially selected domain expert's feedback improves personalized prediction of the cancer cell's sensitivity to drugs. The second main contribution of the thesis is to introduce two goal-oriented data acquisition strategies that aim at selecting queries that are maximally useful for a prospective task where the model is to be used: First, targeted Bayesian optimal experimental design to increase the accuracy of a single personalized prediction, and second, active learning that takes the down-the-line decision-making task into account by modeling the probability of a wrong decision.
The last part of this thesis applies human-in-the-loop methods to a new, promising, and as yet unexplored application domain of de novo molecular design. The last contribution is how the goal of molecule generation can be inferred via human-in-the-loop interaction, to create an adaptive objective function for a reinforcement learning algorithm, so that the resulting system generates more molecules that match the user's goal.
- Machine learning methods for improving drug response prediction in cancer
School of Science | Doctoral dissertation (article-based) (2017) Ammad-ud-din, Muhammad
Personalizing medicine, by choosing therapies that maximize effectiveness and minimize side effects for individual patients, is one of the prime challenges in cancer treatment. At the core of personalized medicine is a machine learning problem: Given a set of patients whose response to some drugs has been observed, predict the response of a new patient or to a new drug. Computationally predicted responses can then be used to generate hypotheses for selecting therapies tailored to individual patients. However, the prediction task is exceedingly challenging, raising the need for the development of new machine learning methods. This thesis undertakes a unique multi-disciplinary approach to predict drug responses by utilizing multiple data sources in cancer, while simultaneously advancing the computational methods to improve accuracy. Specifically, the thesis presents a new Bayesian multi-view multi-task method that outperformed existing computational models in an international crowdsourcing challenge to predict drug responses. The method is further extended to solve the more challenging task of predicting drug responses in multiple cancer types. Notably, the thesis extends the kernelized Bayesian matrix factorization method with component-wise multiple kernel learning for effectively inferring associations between a large number of biologically motivated data sources and the latent factors. The results demonstrate that the new formulation of the method, supplemented with prior biological knowledge, is helpful for discovering interpretable associations as well as for predicting the drug responses of new cancer cells. The original contribution of this thesis is two-fold: First, the thesis proposes novel multi-view and multi-task methods to predict drug responses in cancer cells with increased accuracy.
Second, new ways of incorporating prior biological knowledge are explored to further improve drug response predictions. Open source implementations of the new methods have been released to facilitate further research.
- Machine Learning Methods for Interactive Search Interfaces and Cognitive Models
School of Science | Doctoral dissertation (article-based) (2018) Kangasrääsiö, Antti
Computer systems that users interact with are becoming more and more driven by artificial intelligence and machine learning components. This means that the ability of the users to efficiently interact with these intelligent systems on one hand, and the ability of these intelligent systems to understand the users on the other hand, are becoming more and more important for productive human-computer interaction. This thesis proposes new methods to improve both of these aspects. The first contribution of this thesis is to improve the ability of the users to predict the consequences of their actions, and to observe possible inconsistencies in the feedback they give, when interacting with an information retrieval system that performs interactive user modelling. The proposed solutions for improving predictability are interactive visualization of the consequences of user actions and changing the behavior of the user model to better match user expectations. The proposed solutions for detecting inconsistencies in user feedback are visualization of past user feedback and interactive modelling of the accuracy of the feedback. Experiments demonstrate that the proposed methods improve user satisfaction and the usability of the search system. The second contribution is to develop generally applicable methods for inferring the parameter values for various types of models of the user's cognition. The inherent difficulty in estimating these parameter values is caused by the complicated relation between the parameters of these cognitive models and the observation data: the likelihood function. The proposed solution is to use likelihood-free Bayesian inference, which is applicable for various different cognitive models and also able to quantify the uncertainty of the parameter estimates.
Experiments demonstrate that the proposed solution enables efficient inference of cognitive model parameter values in multiple settings, and also allows informative quantification of parameter uncertainty across the parameter space.
- Methods for probabilistic modeling of knowledge elicitation for improving machine learning predictions
School of Science | Doctoral dissertation (article-based) (2020) Afrabandpey, Homayun
Many applications of supervised machine learning consist of training data with a large number of features and small sample size. Constructing models with reliable predictive performance in such applications is challenging. To alleviate these challenges, either more samples are required, which could be very difficult or even impossible in some applications to obtain, or additional sources of information are required to regularize models. One of the additional sources of information is the domain expert. However, extracting knowledge from a human expert can itself be difficult; it requires computer systems that experts can interact with effectively and effortlessly. This thesis proposes novel knowledge elicitation approaches to improve the predictive performance of statistical models. The first contribution of this thesis is to develop methods that incorporate different types of knowledge on features, extracted from a domain expert, into the construction of the machine learning model. Several solutions are proposed for knowledge elicitation, including interactive visualization of the effect of feedback on features, and active learning. Experiments demonstrate that the proposed methods improve the predictive performance of an underlying model through adoption of limited interaction with the user. The second contribution of the thesis is to develop a new approach to the interpretability of Bayesian predictive models to facilitate the interaction of human users with Bayesian black-box predictive models. The proposed approach separates model specification from model interpretation, via a two-stage decision-theoretical approach: first construct a highly predictive model without compromising accuracy and then optimize the interpretability.
The experiments demonstrate that the proposed method constructs models which are more accurate, and yet more interpretable, than the alternative practice of incorporating interpretability constraints into the model specification via the prior distribution.
- Model-based Multi-agent Reinforcement Learning for AI Assistants
School of Science | Doctoral dissertation (article-based) (2023) Çelikok, Mustafa Mert
Interaction of humans and AI systems is becoming ubiquitous. Specifically, recent advances in machine learning have allowed AI agents to interactively learn from humans how to perform their tasks. The main focus of this line of research has been to develop AI systems that eventually learn to automate tasks for humans, where the end goal is to remove the human from the loop, even though humans are involved during training. However, this perspective limits the applications of AI systems to cases where full automation is the desired outcome. In this thesis, we focus on settings where an AI agent and a human must collaborate to perform a task, and the end goal of the AI is not to replace human intelligence, but to augment it. AI-assistance for humans involves at least two agents: an AI agent and a human. System designers have no control over the humans, and must develop learning agents that have the capabilities to assist and augment them. To do so, the AI agent must be able to infer the goals, bounds, constraints, and future behaviour of its human partner. In this thesis, we propose a model-based multi-agent reinforcement learning approach, where the AI agent infers a model of its human partner, and uses this model to behave in a way that is maximally helpful for the human. In order to learn a mathematical model of the human from interaction, the AI agent first must have a model space. Since data scarcity is a key problem in human-AI collaboration, defining a model space that is expressive enough to capture human behaviour, yet constrained enough to allow sample-efficient inference is important. Determining the minimal and realistic set of prior assumptions on human behaviour in order to define such model spaces is an open problem.
To address this problem, we bring in prior knowledge from cognitive science and behavioural economics, where various mathematical models of human decision-making have been developed. However, incorporating this prior knowledge in multi-agent reinforcement learning is not trivial. We demonstrate that, using the methods developed in this thesis, sufficient statistics of human behaviour can be drawn from these models, and incorporated into multi-agent reinforcement learning. We demonstrate the effectiveness of our approach of incorporating models of human behaviour into multi-agent reinforcement learning in three types of tasks where: (I) The AI must learn the preferences of the human from their feedback to assist them, (II) The AI must teach the human conceptual knowledge to assist them, (III) The AI must infer the cognitive bounds and biases of the human to improve their decisions. In all tasks, our simulated empirical results show that the AI agent can learn to assist the human and improve the human-AI team's performance. Our user study for case (I) supports the simulated results. We present a theoretical result for case (III) which determines the limits of AI-assistance when the human user disagrees with the AI.
- Modelling non-stationary functions with Gaussian processes
School of Science | Doctoral dissertation (article-based)(2019) Remes, Sami
Gaussian processes (GPs) are a central piece of non-parametric Bayesian methods, which allow placing priors over functions in settings such as classification and regression. The prior is described using a kernel function that encodes a similarity between any two points in the input space, and thus defines the properties of functions that are modelled by the GP. In applying Gaussian processes, the choice of the kernel is crucial, and the commonly used standard kernels often offer unsatisfactory performance because they assume stationarity. This thesis presents approaches to modelling non-stationarity from two different perspectives in Gaussian processes. First, this thesis presents a formulation of a non-stationary spectral mixture kernel for univariate outputs, focusing on modelling the non-stationarity in the input space. The construction is based on the spectral mixture (SM) kernel, which has been derived for stationary functions using the Fourier duality implied by Bochner's theorem. The work done in this thesis extends the SM kernel into the non-stationary case. This is achieved by two complementary approaches, based on replacing the constant frequency parameters by input-dependent functions. The first approach is based on modelling the latent functions describing the frequency surface as Gaussian processes. In the second approach, the functions are directly modelled as a neural network, the parameters of which are optimized with respect to the variational evidence lower bound (ELBO). Second, this thesis presents a kernel suitable for modelling non-stationary couplings between multiple output variables of interest in the context of multi-task or multi-output GP regression. The construction of the kernel is based on a Hadamard product of two kernels, which model the different aspects of dependencies between the outputs.
The part of the kernel modelling the input-dependent couplings is based on a generalized Wishart process, which is a stochastic process on time-varying positive-definite matrices, in this case describing the changing dependencies between the outputs. The proposed Hadamard product kernel is applied in a latent factor model to enrich the latent variable prior distribution, that is, to model correlations within the latent variables explicitly. This results in the latent correlation Gaussian process model (LCGP). This thesis additionally considers novel, flexible models for classification of multi-view data, specifically one based on a mixture of group factor analyzers (GFA). The model has a close relationship to the LCGP that builds a classifier in the latent variable space, while the classifier in the GFA mixture is based on the mixture assignments. GFA also allows modelling dependencies between groups of variables, which is not done by the LCGP. Applying Gaussian processes and adapting the proposed multi-output kernel would make the multi-view model even more general. The methods introduced in this thesis now allow modelling non-stationary functions in Gaussian processes in a flexible way. The proposed kernels can be applied very generally, and the approaches introduced to derive them can also be applied to derive other types of non-stationary kernels.
- Probabilistic user modelling methods for improving human-in-the-loop machine learning for prediction
School of Science | Doctoral dissertation (article-based)(2021) Daee, Pedram
In many machine learning applications, and in particular those with little training data, human involvement in the form of a data provider or task expert is crucial. However, human interaction with a machine learning model is constrained by (i) the interaction channels, i.e., how human knowledge can be applied in the model, and (ii) the interaction budget, i.e., how much the user is willing to interact with the model. This thesis presents new methods to address these constraints in human-in-the-loop machine learning. The core idea of the thesis is to jointly model the available data with a model of the human user, i.e., the user model, in a unified probabilistic model and then perform sequential probabilistic inference on the joint model to design improved interaction. The thesis contributes to two types of prediction tasks. The first task is expert knowledge elicitation for high-dimensional prediction. Experts in a field usually have information beyond training data which can help to improve the prediction performance. User models, as priors and likelihood functions, are proposed to directly connect expert knowledge about the relevance of parameters to a model responsible for prediction. The user model can account for complex user behaviour such as users updating their knowledge during the interaction. Furthermore, sequential experimental design on the joint model is employed to query the most informative expert knowledge earlier to minimize the amount of interaction. The second task is personalized recommendation where the goal is to predict the most relevant item for a user with as few interactions as possible. The interactions are based on user relevance feedback on the recommendations. The thesis proposes user models that are able to receive and integrate feedback on multiple domains and sources by providing a joint probabilistic model connecting all feedback types.
Sequential inference on the joint model, using Thompson sampling, was employed to find the targeted recommendation with minimum interaction. Simulated experiments and user studies in both tasks demonstrate improved prediction performance after only a few interactions with the users. The research highlights the benefits of joint probabilistic modelling of the user and prediction model in interactive tasks.
- Real-time and sample-efficient learning of computationally rational user models
School of Science | Doctoral dissertation (article-based)(2024) Keurulainen, Antti
To effectively collaborate with humans, Artificial Intelligence (AI) systems must understand human behavior and the factors influencing it, including their goals, preferences, and abilities. Interactions with humans are typically costly, and in many real-life situations, AI must adapt to human behavior after only a few interactions. Additionally, when AI interacts with humans to learn about their behavior, the interactions need to be conducted without any noticeable delay for the human, which in turn necessitates adaptation in real-time. This thesis investigates how an AI system can learn about other agents in a sample-efficient and real-time manner, using methods based on reinforcement learning. The first contribution of this thesis is a method for learning representations of goal-driven agents' behaviors with neural networks from incomplete observations, showing that they can be used for improving performance in cooperative decision-making tasks. The second contribution concerns the creation of an automated method for producing task distributions and related ground truth data for training a meta-learner to assess the skill level and adapt quickly to the behavior of a cooperating partner. The third contribution presents a novel method for designing informative experiments for estimating the parameters of simulation-based user models that lack closed-form likelihood functions and are grounded in cognitive science. This method simultaneously amortizes the estimation of these parameters and the design of experiments. These contributions cover a wide range of settings where useful representations of behavior for improving cooperation are learned, along with the efficient learning of complex user models. The implications of the methods developed, as well as their strengths and limitations, are discussed.
- Steps Forward in Approximate Computational Inference
School of Science | Doctoral dissertation (article-based)(2019) Lintusaari, Jarno
This thesis deals with approximate computational inference, particularly with a relatively recent approach known as approximate Bayesian computation (ABC). ABC deals with simulator-based models whose likelihood function is intractable. To overcome the intractability of the likelihood, ABC uses simulations from the model and a principled approximation of the posterior that is traditionally defined via a distance function and a threshold. I represent the ABC approximation as an approximation of the underlying likelihood function of the simulator-based model. This interpretation provides an intuitive way of understanding the approximation in ABC. I also consider the bias and Monte Carlo error in ABC, and demonstrate that better results can be acquired with a proper approximation than with a corresponding exact method in a given computational time. I further propose using an approximation of the likelihood function in investigating the reliability of ABC inferences. This approach reveals identifiability issues with a well-known disease transmission model for tuberculosis. A new transmission model is proposed that resolves these issues by more closely modelling the epidemiological process of tuberculosis. Updated estimates of the epidemiological parameters are then provided together with an estimate of the underlying infectious population that is better aligned with the epidemiological knowledge of the disease. Apart from ABC, I consider modelling computational inference problems with graphs, and how the graph representations can be used at the algorithmic level. The graph representations are used in learning Bayesian networks with more granular dependency structures. Finally, graphs are used for effective modelling of the ABC procedure and streamlining many aspects of the inference in a new open-source software called ELFI.
In addition to graph-based modelling, ELFI provides distributed parallelization, data re-use and many other practical features for performing ABC inferences.
- Strengthening nonparametric Bayesian methods with structured kernels
School of Science | Doctoral dissertation (article-based)(2022) Shen, Zheyang
This thesis covers an assortment of topics at the intersection of Bayesian nonparametrics and kernel machines, with the aim of proposing more efficient, kernel-based solutions to nonparametric Bayesian machine learning tasks. In chronological order, we provide summaries for the 4 publications on 3 interconnected topics: (i) expressive and nonstationary covariance kernels for Gaussian processes (GPs); (ii) scalable approximate inference of GP models via pseudo-inputs; (iii) Bayesian sampling of un-normalized target distributions via the simulation of interacting particle systems (IPSs). GPs are flexible priors on functions, which inform the hypothesis spaces of infinitely wide neural networks. However, to fully exploit their tractable uncertainty measures, careful selection of flexible covariance kernels is required for pattern discovery. Highly parametrized, stationary kernels have been proposed for handling extrapolations in GPs, but the translation invariance implied by stationarity caps their expressiveness. We propose nonstationary generalizations of such expressive kernels, both in parametric and nonparametric forms, and explore the implications of those kernels with respect to their spectral properties. Another restrictive aspect of GP models lies in the cumbersome cubic scaling of their inference. We can draw upon a smaller set of pseudo-inputs, or inducing points, to obtain sparse and more scalable approximate posteriors. Myriad studies of sparse GPs have established a separation between model parameters, which can either be optimized or inferred, and variational parameters, which only require optimization and no priors. The inducing point locations, however, exist somewhat outside this dichotomy, but the common practice is to simply find point estimates via optimization.
We demonstrate that a fully Bayesian treatment of inducing inputs is equally valid in sparse GPs, and leads to a more flexible inferential framework with measurable practical benefits. Lastly, we turn to the sampling of un-normalized densities, a ubiquitous task in Bayesian inference. Apart from Markov Chain Monte Carlo (MCMC) sampling, we can also draw samples by deterministically transporting a set of interacting particles, i.e., the simulation of IPSs. Despite their ostensible differences in mechanism, a duality exists between the subtypes of the two sampling regimes, namely Langevin diffusion (LD) and Stein variational gradient descent (SVGD), where SVGD can be seen as a "kernelized" counterpart of LD. We demonstrate that kernelized, deterministic approximations exist for all diffusion-based MCMCs, which we denote as MCMC dynamics. Drawing upon this extended duality, we obtain deterministic samplers that emulate the behavior of other MCMC diffusion processes.
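The relationship between Langevin diffusion and SVGD described above can be illustrated with a minimal sketch of the standard SVGD update. This is not the generalized family of samplers developed in the thesis. The 1-D Gaussian target, the RBF kernel with a median-heuristic bandwidth, the step size, and the names `svgd_step` and `grad_log_gaussian` are all assumptions chosen for illustration.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1):
    """One standard SVGD update for particles x of shape (n, d)."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]       # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)       # pairwise squared distances
    # Median-heuristic bandwidth (an assumed, commonly used choice).
    h = np.median(sq_dist) / np.log(n + 1) + 1e-8
    k = np.exp(-sq_dist / h)                   # RBF kernel matrix, symmetric
    # Driving term: kernel-weighted average of the score at all particles.
    drive = k @ grad_log_p(x)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = (2/h) sum_j k_ij (x_i - x_j).
    repulse = (2.0 / h) * np.sum(k[:, :, None] * diff, axis=1)
    return x + step * (drive + repulse) / n

def grad_log_gaussian(x, mean=2.0):
    """Score of an assumed toy target N(mean, 1), known up to a constant."""
    return -(x - mean)

rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 0.5, size=(50, 1))   # deliberately poor start
for _ in range(1000):
    particles = svgd_step(particles, grad_log_gaussian)
```

In this sketch the update is fully deterministic once the initial particles are drawn. The repulsive term keeps the particles spread out, a role loosely comparable to the injected noise in Langevin diffusion; without it, the particles would be expected to collapse toward the mode of the target.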