DOLDA: a regularized supervised topic model for high-dimensional multi-class regression

Loading...
Thumbnail Image

Access rights

openAccess
publishedVersion

URL

Journal Title

Journal ISSN

Volume Title

A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä

Major/Subject

Mcode

Degree programme

Language

en

Pages

27

Series

Computational Statistics, Volume 35, issue 1, pp. 175-201

Abstract

Generating user interpretable multi-class predictions in data-rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant probit model (Johndrow et al., in: Proceedings of the sixteenth international conference on artificial intelligence and statistics, 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al. in Biometrika 97:465–480, 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model’s predictive accuracy and scalability, and demonstrate DOLDA’s advantage in interpreting the generated predictions.

Description

Other note

Citation

Magnusson, M, Jonsson, L & Villani, M 2020, 'DOLDA : a regularized supervised topic model for high-dimensional multi-class regression', Computational Statistics, vol. 35, no. 1, pp. 175-201. https://doi.org/10.1007/s00180-019-00891-1