### Browsing by Department "Columbia University"

Now showing 1 - 12 of 12

#### Beta bubbles (2018-06)
Jylhä, Petri; Suominen, Matti; Tomunen, Tuomas; Department of Finance; Columbia University

We show that an increase in a stock’s breadth of institutional ownership or turnover is followed by a significant, but temporary, increase in its CAPM beta estimate and a decrease in its CAPM alpha. The increasing effect of breadth of ownership on beta estimates is mainly driven by short-term investors. These transitory, trading-activity-driven components of beta estimates contribute to the empirical failure of the CAPM and to the large returns of long-short portfolios that bet against beta. The relations between ownership breadth, turnover, and betas that we document help explain the puzzling fact that, on average, betas increase after seasoned equity offerings and stock splits and decrease after stock repurchases.

#### Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning (IEEE, 2022)
Ji, Shaoxiong; Jiang, Wenqi; Walid, Anwar; Li, Xue; Department of Computer Science; Columbia University; Nokia Bell Labs USA; University of Queensland

Federated learning (FL) is a machine learning setting that enables on-device intelligence via decentralized training and federated optimization. The rapid development of deep neural networks has made it possible to model complex problems, and under the federated setting this capability has emerged as federated deep learning. However, the sheer number of model parameters places a heavy transmission load on the communication network. This article introduces two approaches for improving communication efficiency: dynamic sampling and top-k selective masking. The former dynamically controls the fraction of client models selected in each round, while the latter selects the parameters with the top-k largest differences for federated updating.
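The top-k selective masking step just described can be sketched in a few lines (a hypothetical illustration, not the authors' implementation; the flat parameter lists and the function name `topk_mask` are assumptions):

```python
def topk_mask(old, new, k):
    """Top-k selective masking: keep only the k parameters whose update
    magnitude |new - old| is largest; all other entries keep their old values."""
    diffs = [abs(n - o) for o, n in zip(old, new)]
    # indices of the k largest update magnitudes
    top = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:k]
    keep = set(top)
    return [n if i in keep else o for i, (o, n) in enumerate(zip(old, new))]
```

Only the k entries with the largest update magnitudes need to be transmitted; the rest revert to their previous values, reducing the per-round communication payload.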
Experiments on convolutional image classification and recurrent language modeling, conducted on three public datasets, demonstrate the effectiveness of the proposed methods.

#### Expectation Propagation as a Way of Life (Microtome Publishing, 2020)
Vehtari, Aki; Gelman, Andrew; Sivula, Tuomas; Jylanki, Pasi; Tran, Dustin; Sahai, Swupnil; Blomstedt, Paul; Cunningham, John P.; Schiminovich, David; Robert, Christian P.; Probabilistic Machine Learning; Columbia University; Centre of Excellence in Computational Inference, COIN; Department of Computer Science

A common divide-and-conquer approach for Bayesian computation with big data is to partition the data, perform local inference for each piece separately, and combine the results to obtain a global posterior approximation. While conceptually and computationally appealing, this method involves the problematic need to also split the prior for the local inferences; these weakened priors may not provide enough regularization for each separate computation, eliminating one of the key advantages of Bayesian methods. To resolve this dilemma while retaining the generality of the underlying local inference method, we apply the idea of expectation propagation (EP) as a framework for distributed Bayesian inference. The central idea is to iteratively update approximations to the local likelihoods given the state of the other approximations and the prior. The present paper has two roles: we review the steps needed to keep EP algorithms numerically stable, and we suggest a general approach, inspired by EP, to data partitioning problems that achieves the computational benefits of parallelism while allowing each local update to make use of relevant information from the other sites. In addition, we demonstrate how the method can be applied in a hierarchical context to exploit partitioning of both data and parameters.
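The combination step at the heart of this EP-style scheme can be illustrated for the Gaussian case (a minimal sketch under assumptions not in the abstract: one-dimensional site approximations represented as (precision, precision × mean) pairs):

```python
def combine_gaussian_sites(prior, sites):
    """EP-style combination of Gaussian factors in natural parameters.
    Each factor is a (precision, precision * mean) pair; the natural
    parameters of independent Gaussian factors simply add."""
    prec = prior[0] + sum(p for p, _ in sites)
    prec_mean = prior[1] + sum(pm for _, pm in sites)
    return prec, prec_mean / prec  # global precision and posterior mean
```

In the full algorithm each site approximation is itself refreshed iteratively given the product of the other sites and the prior; this sketch shows only how the pieces combine into the global approximation.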
The paper describes a general algorithmic framework, rather than a specific algorithm, and presents an example implementation of it.

#### Fast Methods for Posterior Inference of Two-Group Normal-Normal Models (International Society for Bayesian Analysis, 2023-09)
Greengard, Philip; Hoskins, Jeremy; Margossian, Charles C.; Gabry, Jonah; Gelman, Andrew; Vehtari, Aki; Columbia University; University of Chicago; Computer Science Professors; Department of Computer Science

We describe a class of algorithms for evaluating posterior moments of certain Bayesian linear regression models with a normal likelihood and a normal prior on the regression coefficients. The proposed methods can be used for hierarchical mixed-effects models with partial pooling over one group of predictors, as well as random-effects models with partial pooling over two groups of predictors. We demonstrate the performance of the methods on two applications, one involving U.S. opinion polls and one involving the modeling of COVID-19 outbreaks in Israel using survey data. The algorithms involve analytical marginalization of the regression coefficients followed by numerical integration of the remaining low-dimensional density. The dominant cost of the algorithms is an eigendecomposition computed once for each value of the outside parameter of integration. Our approach drastically reduces run times compared to state-of-the-art Markov chain Monte Carlo (MCMC) algorithms.
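The normal-normal structure these algorithms exploit can be seen in the simplest conjugate case (a toy sketch for a one-dimensional mean with known noise scale; the paper's algorithms handle the full hierarchical versions):

```python
def normal_normal_posterior(y, sigma, mu0, tau0):
    """Conjugate update for a normal likelihood with known sigma and a
    normal prior N(mu0, tau0^2) on the mean: returns posterior (mean, sd).
    Precisions add, and the posterior mean is the precision-weighted average."""
    n = len(y)
    prec = 1.0 / tau0**2 + n / sigma**2
    mean = (mu0 / tau0**2 + sum(y) / sigma**2) / prec
    return mean, prec ** -0.5
```

Because this marginalization is available in closed form, only the remaining low-dimensional hyperparameter density needs numerical integration, which is where the speedup over MCMC comes from.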
MCMC algorithms, in addition to being computationally expensive, can also be difficult to tune when applied to hierarchical models.

#### An importance sampling approach for reliable and efficient inference in Bayesian ordinary differential equation models (John Wiley & Sons, 2023-09-18)
Timonen, Juho; Siccha, Nikolas; Bales, Ben; Lähdesmäki, Harri; Vehtari, Aki; Department of Computer Science; Columbia University; Computer Science Professors

Statistical models can involve implicitly defined quantities, such as solutions to nonlinear ordinary differential equations (ODEs), that must be numerically approximated in order to evaluate the model. The approximation error inherently biases statistical inference results, but the size of this bias is generally unknown and often ignored in Bayesian parameter inference. We propose a computationally efficient method for verifying the reliability of posterior inference for such models when the inference is performed using Markov chain Monte Carlo methods. We validate the efficiency and reliability of our workflow in experiments using simulated and real data and different ODE solvers.
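The basic reweighting idea behind such a check can be sketched as self-normalized importance weights (a simplified illustration; the paper's method uses Pareto smoothed importance sampling, which additionally stabilizes the raw weights shown here):

```python
import math

def importance_weights(log_post_accurate, log_post_coarse):
    """Self-normalized importance weights for posterior draws obtained with
    a coarse ODE solver, targeting the posterior under an accurate solver.
    Inputs are per-draw unnormalized log posterior densities."""
    log_w = [a - c for a, c in zip(log_post_accurate, log_post_coarse)]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]
```

If the coarse-solver posterior is close to the accurate one, the weights are nearly uniform; badly behaved weights signal that the solver tolerance is biasing the inference.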
We highlight problems that arise with commonly used adaptive ODE solvers and propose robust and effective alternatives which, accompanied by our workflow, can be adopted without losing reliability of the inferences.

#### The Open Brain Consent (Wiley, 2021-05)
Bannier, Elise; Barker, Gareth; Borghesani, Valentina; Broeckx, Nils; Clement, Patricia; Emblem, Kyrre E.; Ghosh, Satrajit; Glerean, Enrico; Gorgolewski, Krzysztof J.; Havu, Marko; Halchenko, Yaroslav O.; Herholz, Peer; Hespel, Anne; Heunis, Stephan; Hu, Yue; Hu, Chuan Peng; Huijser, Dorien; de la Iglesia Vayá, María; Jancalek, Radim; Katsaros, Vasileios K.; Kieseler, Marie Luise; Maumet, Camille; Moreau, Clara A.; Mutsaerts, Henk Jan; Oostenveld, Robert; Ozturk-Isik, Esin; Pascual Leone Espinosa, Nicolas; Pellman, John; Pernet, Cyril R.; Pizzini, Francesca Benedetta; Trbalić, Amira Šerifović; Toussaint, Paule Joanne; Visconti di Oleggio Castello, Matteo; Wang, Fengjuan; Wang, Cheng; Zhu, Hua; CHU de Rennes; King's College London; University of California San Francisco; University of Antwerp; Ghent University; University of Oslo; Harvard Medical School; School Services, SCI; Stanford University; Department of Neuroscience and Biomedical Engineering; Dartmouth College; McGill University; Eindhoven University of Technology; Heinrich Heine University Düsseldorf; Nanjing Normal University; Erasmus University Rotterdam; Centro de Investigacion Principe Felipe; Masaryk University; National and Kapodistrian University of Athens; Centre National de la Recherche Scientifique (CNRS); Institut Pasteur; University of Amsterdam; Radboud University Nijmegen; Bogazici University; Columbia University; University of Edinburgh; University of Verona; University of Tuzla; University of California Berkeley; Nanyang Technological University; Fujian Medical University; Beihang University

Having the means to share research data openly is essential to modern science.
For human research, a key aspect of this endeavor is obtaining consent from participants, not just to take part in a study, which is a basic ethical principle, but also to share their data with the scientific community. To ensure that participants' privacy is respected, national and/or supranational regulations and laws are in place. It is, however, not always clear to researchers what these imply, nor how to comply with them. The Open Brain Consent (https://open-brain-consent.readthedocs.io) is an international initiative that aims to provide researchers in the brain imaging community with information about data sharing options and tools. We present here a short history of this project and its latest developments, and share pointers to consent forms, including a template consent form that is compliant with the EU General Data Protection Regulation. We also share pointers to an associated data user agreement that is useful not only in the EU context but also for researchers dealing with personal (clinical) data elsewhere.

#### Pathfinder: Parallel quasi-Newton variational inference (Microtome Publishing, 2022)
Zhang, Lu; Carpenter, Bob; Gelman, Andrew; Vehtari, Aki; University of Southern California; Flatiron Institute; Columbia University; Computer Science Professors; Department of Computer Science

We propose Pathfinder, a variational method for approximately sampling from differentiable probability densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the target distribution.
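The KL criterion used for this selection can be estimated by simple Monte Carlo (a generic estimator sketch; the function names and interfaces here are assumptions for illustration, not Pathfinder's API):

```python
import random

def mc_kl(log_q, log_p, sample_q, n=100, seed=0):
    """Monte Carlo estimate of KL(q || p) from draws x_i ~ q:
    KL ≈ (1/n) * sum_i [log q(x_i) - log p(x_i)]."""
    rng = random.Random(seed)
    draws = [sample_q(rng) for _ in range(n)]
    return sum(log_q(x) - log_p(x) for x in draws) / n
```

Each candidate normal approximation along the optimization path gets such an estimate, and the one with the lowest estimated divergence supplies the returned draws.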
We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing the 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, at saddle points, or in minor modes. The Monte Carlo KL divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores.

#### Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors (Microtome Publishing, 2022)
Yao, Yuling; Vehtari, Aki; Gelman, Andrew; Flatiron Institute; Computer Science Professors; Columbia University; Department of Computer Science

When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. Even if the most important modes can be found, it is difficult to evaluate their relative weights in the posterior. Here we propose an approach that uses parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible, and then combines these using Bayesian stacking, a scalable method for constructing a weighted average of distributions.
The result from stacking efficiently samples from the multimodal posterior distribution, minimizes cross-validation prediction error, and represents the posterior uncertainty better than variational inference, but it is not necessarily equivalent, even asymptotically, to fully Bayesian inference. We present a theoretical consistency result, together with an example in which the stacked inference approximates the true data-generating process despite a misspecified model and a non-mixing sampler; its predictive performance is better than that of full Bayesian inference, so the multimodality can be considered a blessing rather than a curse under model misspecification. We demonstrate practical implementation in several model families: latent Dirichlet allocation, Gaussian process regression, hierarchical regression, horseshoe variable selection, and neural networks.

#### Two-thirds of global cropland area impacted by climate oscillations (2018-12-01)
Heino, Matias; Puma, Michael J.; Ward, Philip J.; Gerten, Dieter; Heck, Vera; Siebert, Stefan; Kummu, Matti; Water and Environmental Eng.; Columbia University; Vrije Universiteit Amsterdam; Potsdam Institute for Climate Impact Research; University of Bonn; Department of Built Environment

The El Niño Southern Oscillation (ENSO) peaked strongly during the boreal winter of 2015-2016, leading to food insecurity in many parts of Africa, Asia, and Latin America. Besides ENSO, the Indian Ocean Dipole (IOD) and the North Atlantic Oscillation (NAO) are known to impact crop yields worldwide. Here we assess, for the first time in a unified framework, the relationships between ENSO, the IOD, and the NAO and simulated crop productivity at the sub-country scale. Our findings reveal that during 1961-2010, crop productivity was significantly influenced by at least one large-scale climate oscillation in two-thirds of the global cropland area.
Besides identifying possible new links, especially for the NAO in Africa and the Middle East, our analyses confirm several known relationships between crop productivity and these oscillations. Our results improve the understanding of the climatological drivers of crop productivity, which is essential for enhancing food security in many of the most vulnerable places on the planet.

#### Uncertainty Quantification for the Horseshoe (with Discussion): Comment (International Society for Bayesian Analysis, 2017-12)
Piironen, Juho; Betancourt, Michael; Simpson, Daniel; Vehtari, Aki; Department of Computer Science; Columbia University; University of Toronto

#### Using Stacking to Average Bayesian Predictive Distributions (with Discussion) (2018-09)
Yao, Yuling; Vehtari, Aki; Simpson, Daniel; Gelman, Andrew; Columbia University; Professorship Vehtari Aki; University of Toronto; Department of Computer Science

Bayesian model averaging is flawed in the M-open setting, in which the true data-generating process is not one of the candidate models being fit. We take the idea of stacking from the point-estimation literature and generalize it to the combination of predictive distributions. We extend the utility function to any proper scoring rule and use Pareto smoothed importance sampling to efficiently compute the required leave-one-out posterior distributions. We compare stacking of predictive distributions to several alternatives: stacking of means, Bayesian model averaging (BMA), pseudo-BMA, and a variant of pseudo-BMA that is stabilized using the Bayesian bootstrap.
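For intuition, the stacking objective for two models reduces to a one-dimensional search over a mixture weight (a toy sketch with a hypothetical grid search; the actual method optimizes over a simplex of weights for any number of models, using leave-one-out predictive densities):

```python
import math

def stacking_weight(dens_a, dens_b, grid=101):
    """Grid search for the weight w on model A that maximizes the summed
    log score of the mixture w*p_A + (1-w)*p_B over held-out points.
    dens_a, dens_b: pointwise predictive densities (not log densities)."""
    best_w, best_score = 0.0, -math.inf
    for j in range(grid):
        w = j / (grid - 1)
        score = sum(math.log(w * pa + (1.0 - w) * pb + 1e-300)
                    for pa, pb in zip(dens_a, dens_b))
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```

When one model dominates the held-out densities everywhere, stacking pushes all the weight onto it; with complementary models, intermediate weights maximize the mixture's log score.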
Based on simulations and real-data applications, we recommend stacking of predictive distributions, with bootstrapped pseudo-BMA as an approximate alternative when computational cost is an issue.

#### Why state-of-the-art deep learning barely works as good as a linear classifier in extreme multi-label text classification (2020)
Mohammadnia Qaraei, Mohammadreza; Khandagale, Sujay; Babbar, Rohit; Professorship Babbar Rohit; Columbia University; Department of Computer Science

Extreme Multi-label Text Classification (XMTC) refers to supervised learning of a classifier that can predict a small subset of relevant labels for a document from an extremely large label set. Even though deep learning algorithms have surpassed linear and kernel methods for most natural language processing tasks over the last decade, recent works show that state-of-the-art deep learning methods can only barely manage to work as well as a linear classifier for the XMTC task. The goal of this work is twofold: (i) to investigate the reasons for the comparable performance of these two strands of methods for XMTC, and (ii) to document this observation explicitly, as the efficacy of linear classifiers in this regime has been ignored in many relevant recent works.