Browsing by Author "Peltola, Tomi"
Item Artificial Intelligence for chemical risk assessment (Elsevier, 2020-02) Wittwehr, Clemens; Blomstedt, Paul; Gosling, John Paul; Peltola, Tomi; Raffael, Barbara; Richarz, Andrea Nicole; Sienkiewicz, Marta; Whaley, Paul; Worth, Andrew; Whelan, Maurice; Department of Computer Science; Centre of Excellence in Computational Inference, COIN; European Commission Joint Research Centre; University of Leeds; Lancaster University
As the basis for managing the risks of chemical exposure, the Chemical Risk Assessment (CRA) process can impact a substantial part of the economy, the health of hundreds of millions of people, and the condition of the environment. However, the number of properly assessed chemicals falls short of societal needs due to a lack of experts for evaluation, interference of third party interests, and the sheer volume of potentially relevant information on the chemicals from disparate sources. In order to explore ways in which computational methods may help overcome this discrepancy between the number of chemical risk assessments required on the one hand and the number and adequateness of assessments actually being conducted on the other, the European Commission's Joint Research Centre organised a workshop on Artificial Intelligence for Chemical Risk Assessment (AI4CRA). The workshop identified a number of areas where Artificial Intelligence could potentially increase the number and quality of regulatory risk management decisions based on CRA, involving process simulation, supporting evaluation, identifying problems, facilitating collaboration, finding experts, evidence gathering, systematic review, knowledge discovery, and building cognitive models.
Although these are interconnected, they are organised and discussed under two main themes: scientific-technical process and social aspects and the decision making process.

Item Bayesian Theory of Mind Models (2019-05-14) Hämäläinen, Alex; Peltola, Tomi; Perustieteiden korkeakoulu; Hyvönen, Eero

Item Bayesian Variable Selection in Searching for Additive and Dominant Effects in Genome-Wide Data (2012) Peltola, Tomi; Marttinen, Pekka; Jula, Antti; Salomaa, Veikko; Perola, Markus; Vehtari, Aki; Department of Computer Science
Although complex diseases and traits are thought to have multifactorial genetic basis, the common methods in genome-wide association analyses test each variant for association independent of the others. This computational simplification may lead to reduced power to identify variants with small effect sizes and requires correcting for multiple hypothesis tests with complex relationships. However, advances in computational methods and increase in computational resources are enabling the computation of models that adhere more closely to the theory of multifactorial inheritance. Here, a Bayesian variable selection and model averaging approach is formulated for searching for additive and dominant genetic effects. The approach considers simultaneously all available variants for inclusion as predictors in a linear genotype-phenotype mapping and averages over the uncertainty in the variable selection. This leads to naturally interpretable summary quantities on the significances of the variants and their contribution to the genetic basis of the studied trait. We first characterize the behavior of the approach in simulations. The results indicate a gain in the causal variant identification performance when additive and dominant variation are simulated, with a negligible loss of power in the purely additive case.
An application to the analysis of high- and low-density lipoprotein cholesterol levels in a dataset of 3895 Finns is then presented, demonstrating the feasibility of the approach at the current scale of single-nucleotide polymorphism data. We describe a Markov chain Monte Carlo algorithm for the computation and give suggestions on the specification of prior parameters using commonly available prior information. An open-source software implementing the method is available at http://www.lce.hut.fi/research/mm/bmagwa/ and https://github.com/to-mi/.

Item Brain-to-brain hyperclassification reveals action-specific motor mapping of observed actions in humans (2017-12-11) Smirnov, Dmitry; Lachat, Fanny; Peltola, Tomi; Lahnakoski, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Vehtari, Aki; Hari, Riitta; Sams, Mikko; Nummenmaa, Lauri; Department of Neuroscience and Biomedical Engineering; Department of Computer Science; Department of Art; Centre of Excellence in Computational Inference, COIN; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Probabilistic Machine Learning; Professorship Vehtari Aki
Seeing an action may activate the corresponding action motor code in the observer. It remains unresolved whether seeing and performing an action activates similar action-specific motor codes in the observer and the actor. We used a novel hyperclassification approach to reveal shared brain activation signatures of action execution and observation in interacting human subjects. In the first experiment, two "actors" performed four types of hand actions while their haemodynamic brain activations were measured with 3-T functional magnetic resonance imaging (fMRI). The actions were videotaped and shown to 15 "observers" during a second fMRI experiment. Eleven observers saw the videos of one actor, and the remaining four observers saw the videos of the other actor.
In a control fMRI experiment, one of the actors performed actions with closed eyes, and five new observers viewed these actions. Bayesian canonical correlation analysis was applied to functionally realign observers' and actors' fMRI data. Hyperclassification of the seen actions was performed with Bayesian logistic regression trained on actors' data and tested with observers' data. Without the functional realignment, between-subjects accuracy was at chance level. With the realignment, the accuracy increased on average by 15 percentage points, exceeding both the chance level and the accuracy without functional realignment. The highest accuracies were observed in occipital, parietal and premotor cortices. Hyperclassification exceeded chance level also when the actor did not see her own actions. We conclude that the functional brain activation signatures underlying action execution and observation are partly shared, yet these activation signatures may be anatomically misaligned across individuals.

Item A decision-theoretic approach for model interpretability in Bayesian framework (Springer Netherlands, 2020-09-01) Afrabandpey, Homayun; Peltola, Tomi; Piironen, Juho; Vehtari, Aki; Kaski, Samuel; Department of Computer Science; Centre of Excellence in Computational Inference, COIN; Probabilistic Machine Learning; Helsinki Institute for Information Technology (HIIT); Professorship Vehtari Aki; Finnish Center for Artificial Intelligence, FCAI; Professorship Kaski Samuel
A salient approach to interpretable machine learning is to restrict modeling to simple models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users’ preferences, not the data generation mechanism; it is more natural to formulate interpretability as a utility function.
In this work, we propose an interpretability utility, which explicates the trade-off between explanation fidelity and interpretability in the Bayesian framework. The method consists of two steps. First, a reference model, possibly a black-box Bayesian predictive model which does not compromise accuracy, is fitted to the training data. Second, a proxy model from an interpretable model family that best mimics the predictive behaviour of the reference model is found by optimizing the interpretability utility function. The approach is model agnostic (neither the interpretable model nor the reference model is restricted to a certain class of models), and the optimization problem can be solved using standard tools. Through experiments on real-world data sets, using decision trees as interpretable models and Bayesian additive regression models as reference models, we show that for the same level of interpretability, our approach generates more accurate models than the alternative of restricting the prior. We also propose a systematic way to measure stability of interpretable models constructed by different interpretability approaches and show that our proposed approach generates more stable models.

Item Depression, depressive symptoms and treatments in women who have recently given birth: UK cohort study (2018-10-24) Petersen, Irene; Peltola, Tomi; Kaski, Samuel; Walters, Kate R; Hardoon, Sarah; Department of Computer Science; Centre of Excellence in Computational Inference, COIN; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Probabilistic Machine Learning; University College London
Objectives: To investigate how depression is recognised in the year after childbirth and treatment given in clinical practice. Design: Cohort study based on UK primary care electronic health records. Setting: Primary care. Participants: Women who have given live birth between 2000 and 2013.
Outcomes: Prevalence of postnatal depression, depression diagnoses, depressive symptoms, antidepressant and non-pharmacological treatment within a year after birth. Results: Of 206 517 women, 23 623 (11%) had a record of depressive diagnosis or symptoms in the year after delivery and more than one in eight women received antidepressant treatment. Recording and treatment peaked 6-8 weeks after delivery. Initiation of selective serotonin reuptake inhibitors (SSRI) treatment has become earlier in the more recent years. Thus, the initiation rate of SSRI treatment per 100 pregnancies (95% CI) at 8 weeks was 2.6 (2.5 to 2.8) in 2000-2004, increasing to 3.0 (2.9 to 3.1) in 2005-2009 and 3.8 (3.6 to 3.9) in 2010-2013. The overall rate of initiation of SSRI within the year after delivery, however, has not changed noticeably. A third of the women had at least one record suggestive of depression at any time prior to delivery and of these one in four received SSRI treatment in the year after delivery. Younger women were most likely to have records of depression and depressive symptoms. (Relative risk for postnatal depression: age 15-19: 1.92 (1.76 to 2.10), age 20-24: 1.49 (1.39 to 1.59) versus age 30-34). The risk of depression, postnatal depression and depressive symptoms increased with increasing social deprivation. Conclusions: More than 1 in 10 women had electronic health records indicating depression diagnoses or depressive symptoms within a year after delivery and more than one in eight women received antidepressant treatment in this period. Women aged below 30 and from the most deprived areas were at highest risk of depression and most likely to receive antidepressant treatment.

Item Elicitation of Non-Linearity from Expert Drawing (2019-06-17) Chauhan, Rohan; Peltola, Tomi; Perustieteiden korkeakoulu; Kaski, Samuel
Machine learning methods do not perform very well with little data because there is not enough information to learn.
The choice is to either obtain more data or elicit knowledge from an expert. Obtaining more data might be infeasible because of the associated cost or required time. In such cases, we opt for expert knowledge elicitation. Current expert knowledge elicitation methods either query the user for data points or regarding the relevance of parameters. However, there is no method which allows expressing the non-linearity intuitively without requiring knowledge of Bayesian statistics. We propose expert knowledge elicitation through drawing where the expert draws the fit through data points. We then combine the observed data and drawing data to select the right kernel for a Gaussian process. We also conduct a user study for testing the usability of the proposed method. We obtain better performance with the proposed model for kernel selection and extrapolation in comparison to the baseline model using only observed data.

Item Finite Adaptation and Multistep Moves in the Metropolis-Hastings Algorithm for Variable Selection in Genome-Wide Association Analysis (2012) Peltola, Tomi; Marttinen, Pekka; Vehtari, Aki; Helsinki Institute for Information Technology HIIT; Tietojenkäsittelytieteen laito; BECS; Department of Computer Science
High-dimensional datasets with large amounts of redundant information are nowadays available for hypothesis-free exploration of scientific questions. A particular case is genome-wide association analysis, where variations in the genome are searched for effects on disease or other traits. Bayesian variable selection has been demonstrated as a possible analysis approach, which can account for the multifactorial nature of the genetic effects in a linear regression model. Yet, the computation presents a challenge and application to large-scale data is not routine.
Here, we study aspects of the computation using the Metropolis-Hastings algorithm for the variable selection: finite adaptation of the proposal distributions, multistep moves for changing the inclusion state of multiple variables in a single proposal and multistep move size adaptation. We also experiment with a delayed rejection step for the multistep moves. Results on simulated and real data show an increase in the sampling efficiency. We also demonstrate that with application-specific proposals, the approach can overcome a specific mixing problem in real data with 3822 individuals and 1,051,811 single nucleotide polymorphisms and uncover a variant pair with synergistic effect on the studied trait. Moreover, we illustrate multimodality in the real dataset related to a restrictive prior distribution on the genetic effect sizes and advocate a more flexible alternative.

Item Improving genomics-based predictions for precision medicine through active elicitation of expert knowledge (2018-06-27) Sundin, Iiris; Peltola, Tomi; Micallef, Luana; Afrabandpey, Homayun; Soare, Marta; Majumder, Muntasir Mamun; Daee, Pedram; He, Chen; Serim, Baris; Havulinna, Aki; Heckman, Caroline; Jacucci, Giulio; Marttinen, Pekka; Kaski, Samuel; Department of Computer Science; Probabilistic Machine Learning; Helsinki Institute for Information Technology (HIIT); Professorship Kaski Samuel; Centre of Excellence in Computational Inference, COIN; Institute for Molecular Medicine Finland; University of Helsinki
Motivation: Precision medicine requires the ability to predict the efficacies of different treatments for a given individual using high-dimensional genomic measurements. However, identifying predictive features remains a challenge when the sample size is small. Incorporating expert knowledge offers a promising approach to improve predictions, but collecting such knowledge is laborious if the number of candidate features is very large.
Results: We introduce a probabilistic framework to incorporate expert feedback about the impact of genomic measurements on the outcome of interest and present a novel approach to collect the feedback efficiently, based on Bayesian experimental design. The new approach outperformed other recent alternatives in two medical applications: prediction of metabolic traits and prediction of sensitivity of cancer cells to different drugs, both using genomic features as predictors. Furthermore, the intelligent approach to collect feedback reduced the workload of the expert to approximately 11%, compared to a baseline approach. Availability and implementation: Source code implementing the introduced computational methods is freely available at https://github.com/AaltoPML/knowledge-elicitation-for-precision-medicine. Supplementary information: Supplementary data are available at Bioinformatics online.

Item Knowledge elicitation via sequential probabilistic inference for high-dimensional prediction (2017-07-12) Daee, Pedram; Peltola, Tomi; Soare, Marta; Kaski, Samuel; Department of Computer Science; Probabilistic Machine Learning; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Centre of Excellence in Computational Inference, COIN
Prediction in a small-sized sample with a large number of covariates, the “small n, large p” problem, is challenging. This setting is encountered in multiple applications, such as in precision medicine, where obtaining additional data can be extremely costly or even impossible, and extensive research effort has recently been dedicated to finding principled solutions for accurate prediction. However, a valuable source of additional information, domain experts, has not yet been efficiently exploited. We formulate knowledge elicitation generally as a probabilistic inference process, where expert knowledge is sequentially queried to improve predictions.
In the specific case of sparse linear regression, where we assume the expert has knowledge about the relevance of the covariates, or of values of the regression coefficients, we propose an algorithm and computational approximation for fast and efficient interaction, which sequentially identifies the most informative features on which to query expert knowledge. Evaluations of the proposed method in experiments with simulated and real users show improved prediction accuracy already with a small effort from the expert.

Item Multivariate exploratory analysis of the habitual diets of patients with type 1 diabetes (2009) Peltola, Tomi; Vehtari, Aki; Mäkinen, Ville-Petteri; Elektroniikan, tietoliikenteen ja automaation tiedekunta; Teknillinen korkeakoulu; Helsinki University of Technology; Kaski, Kimmo
Type 1 diabetes requires life-long treatment with insulin to balance the disrupted physiological control of blood glucose level. Diet plays an essential role in the treatment and in the general health of the patients. Notably, diabetes predisposes to long-term vascular complications. The adverse effects of the western lifestyle, such as overweight and metabolic disorders, occur also frequently in patients with type 1 diabetes. Yet the knowledge of the habitual diets of the patients is scarce. The aim of this study is to explore and describe the habitual diets of patients with type 1 diabetes in a multivariate context, and to examine the applicability of recent exploratory data analysis methods for the data. The data consists of diet questionnaires completed by 1175 patients with type 1 diabetes along with clinical and biochemical data gathered in the nation-wide FinnDiane study. Correlation networks and factor analysis were applied to characterize the linear associations in the data. Several weak associations were identified between the dietary variables.
A reliable analysis of associations to the clinical and biochemical variables was hindered by differences in the time periods of the data collection. Regression modelling was applied to uncover variation related to self-reported compliance with dietary guidance. The tendency to choose low salt products was found to be the most prominent feature in dimensionality reduction performed in the feature space of the regression model. A large effect of the scale of the food frequency questions for dimensionality reduction is also highlighted, and a comparison of a selection of dimensionality reduction methods is presented regarding their neighbourhood preservation capabilities. In conclusion, the dietary variables were dominated by weak associations, and no strong patterns in the dietary habits of the patients were found (except the requirements of celiac disease treatment). Some limitations in the application of the statistical methods were identified. Notably, the discrete and skewed distributions of the variables provide challenges. The current visual displays of the dimensionality reduction methods could be improved to detect weak trends.

Item Phenotype-driven identification of epithelial signalling clusters (2018-03-05) Marques, Elsa; Peltola, Tomi; Kaski, Samuel; Klefström, Juha; Department of Computer Science; Centre of Excellence in Computational Inference, COIN; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Probabilistic Machine Learning; University of Helsinki
In metazoans, epithelial architecture provides a context that dynamically modulates most if not all epithelial cell responses to intrinsic and extrinsic signals, including growth or survival signalling and transforming oncogene action. Three-dimensional (3D) epithelial culture systems provide tractable models to interrogate the function of human genetic determinants in establishment of context-dependency.
We performed an arrayed genetic shRNA screen in mammary epithelial 3D cultures to identify new determinants of epithelial architecture, finding that the key phenotype-impacting shRNAs altered not only the data population average but even more noticeably the population distribution. The broad distributions were attributable to sporadic gene silencing actions by shRNA in unselected populations. We employed the Maximum Mean Discrepancy concept to capture similar population distribution patterns and demonstrate here the feasibility of the test in identifying an impact of shRNA in populations of 3D structures. Integration of the clustered morphometric data with protein-protein interactions data enabled hypothesis generation of novel biological pathways underlying similar 3D phenotype alterations. The results present a new strategy for 3D phenotype-driven pathway analysis, which is expected to accelerate discovery of context-dependent gene functions in epithelial biology and tumorigenesis.

Item Probabilistic Expert Knowledge Elicitation of Feature Relevances in Sparse Linear Regression (Rheinisch-Westfaelische Technische Hochschule Aachen, 2016) Daee, Pedram; Peltola, Tomi; Soare, Marta; Kaski, Samuel; Department of Computer Science; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Centre of Excellence in Computational Inference, COIN

Item Risk-return patterns in merger arbitrage (2006) Peltola, Tomi; Laskentatoimen ja rahoituksen laitos; Department of Accounting and Finance; Kauppakorkeakoulu; School of Business

Item Solukalvojen rakenne ja toiminta [Structure and function of cell membranes] (Teknillinen korkeakoulu, 2006) Peltola, Tomi; Sähkö- ja tietoliikennetekniikan osasto; Turunen, Markus

Item Sparse Bayesian Linear Models: Computational Advances and Applications in Epidemiology (Aalto University, 2014) Peltola, Tomi; Vehtari, Aki, Prof., Aalto University, Department of Biomedical Engineering and Computational Science, Finland; Marttinen, Pekka, Dr., Aalto University, Department of
Information and Computer Science, Finland; Lääketieteellisen tekniikan ja laskennallisen tieteen laitos; Department of Biomedical Engineering and Computational Science; Perustieteiden korkeakoulu; School of Science; Lampinen, Jouko, Prof., Aalto University, Department of Biomedical Engineering and Computational Science, Finland
Recent advances in measurement technologies have transformed the landscape of studies in the genetic and metabolic determinants of diseases and other complex traits. DNA and blood samples can be cost- and time-efficiently interrogated for millions of genetic markers and hundreds of circulating metabolites. While the scale and unbiased nature of the characterization of the individual samples creates opportunities for new discoveries, they also pose a challenge for the statistical analysis of the data. One approach for tackling the issues, and a focus of much recent research in statistical methodology, is searching for linear relationships with a sparsity assumption, that is, the presence of only a limited number of practically relevant relationships among the vast number of possibilities. This thesis studies aspects of the statistical modelling and computation with the linearity and sparsity assumptions in the framework of Bayesian data analysis. First, a hierarchical extension of the spike and slab prior distribution for sparse linear regression modelling, to allow additive and dominant effects in genome-wide association analysis, is presented. The model is applied to search for genetic markers related to blood cholesterol levels. A tailored, finitely adaptive Markov chain Monte Carlo algorithm is studied for the computation. Second, an approach for constructing deterministic Gaussian approximations for Bayesian linear latent variable models using the expectation propagation method is described. The main advance is an efficient numerical solution to the moment integrals for bilinear probability factors.
Third, a model for the prediction of the risk of adverse cardiovascular events in diabetic individuals using candidate biomarkers is presented. The model is extended hierarchically to include data from non-diabetic individuals. Shrinkage priors and projective covariate selection are applied to identify biomarkers with predictive value. The results of the studies demonstrate benefits from the hierarchical Bayesian modelling. Despite the advances here and generally in the literature, the computation in sparse models and large datasets remains challenging. On the other hand, given the fast pace in the development of deterministic approximation methods, assessing their role in predictive covariate selection would seem timely.

Item Theory of Mind Based Models in Human-AI Interaction (2018-12-10) Çelikok, Mustafa; Peltola, Tomi; Perustieteiden korkeakoulu; Kaski, Samuel
Humans are social animals. They have goals, they make plans, they collaborate and compete. The richness of human-human interaction is immense. Yet, the way modern AI systems model their interaction with human users does not take these aspects into account. Often, human feedback is modelled as samples from an unknown but fixed probability distribution. These models are not able to capture the active planning aspect of real humans. The underlying motivation of this thesis is that the performance of human-AI collaboration is limited by the parties' ability to model each other's minds. In human-human interaction, this ability is called the theory of mind, and it is shown to be a limiting factor in human teams' task performance by cognitive science studies. In order to examine the effects of having theory of mind based user models, we define a multi-armed bandit setting where the system takes into account that the user is able to anticipate the system's behaviour multiple steps ahead, and strategically plan her feedback.
We compare the performance of our proposed setting to the standard multi-armed bandit setting where the feedback is assumed to be samples from an unknown probability distribution. Empirical results demonstrate that better reward performance and ranking of arms are achieved when users can behave strategically and the system takes this into account. The results indicate that the performance of human-AI teams increases based on how well the parties can model each other and use their models to plan their interaction.

Item User Modelling for Avoiding Overfitting in Interactive Knowledge Elicitation for Prediction (2018-03-08) Daee, Pedram; Peltola, Tomi; Vehtari, Aki; Kaski, Samuel; School services, SCI; Department of Computer Science; Probabilistic Machine Learning; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Centre of Excellence in Computational Inference, COIN; Professorship Vehtari Aki
In human-in-the-loop machine learning, the user provides information beyond that in the training data. Many algorithms and user interfaces have been designed to optimize and facilitate this human-machine interaction; however, fewer studies have addressed the potential defects the designs can cause. Effective interaction often requires exposing the user to the training data or its statistics. The design of the system is then critical, as this can lead to double use of data and overfitting, if the user reinforces noisy patterns in the data. We propose a user modelling methodology, by assuming simple rational behaviour, to correct the problem. We show, in a user study with 48 participants, that the method improves predictive performance in a sparse linear regression sentiment analysis task, where graded user knowledge on feature relevance is elicited. We believe that the key idea of inferring user knowledge with probabilistic user models has general applicability in guarding against overfitting and improving interactive machine learning.