DOLDA: a regularized supervised topic model for high-dimensional multi-class regression
Access rights
openAccess
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Date
2020-03-01
Language
en
Pages
27
Series
Computational Statistics
Abstract
Generating user-interpretable multi-class predictions in data-rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle many classes as well as many covariates. To handle many classes, we use the recently proposed Diagonal Orthant probit model (Johndrow et al., in: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., in: Biometrika 97:465–480, 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model's predictive accuracy and scalability, and demonstrate DOLDA's advantage in interpreting the generated predictions.
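The abstract's claim that classes need no reference class can be illustrated with a minimal sketch of the class probabilities in a Diagonal Orthant probit model, as we understand the construction of Johndrow et al. (2013): each class has its own latent Gaussian variable, and exactly one of them is positive. The function names and the example linear predictors `eta` are hypothetical and chosen only for illustration. In DOLDA the predictors would depend on topic proportions and covariates, which this sketch does not model, and this is not the authors' implementation.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def diagonal_orthant_probs(eta):
    """Class probabilities under a diagonal orthant probit link (sketch).

    eta holds one linear predictor per class (hypothetical inputs).
    Class k is selected when its latent variable is positive and all
    others are negative, so its unnormalized weight is
    Phi(eta_k) * prod_{j != k} (1 - Phi(eta_j)).
    """
    cdf = [std_normal_cdf(e) for e in eta]
    weights = []
    for k in range(len(eta)):
        w = cdf[k]
        for j in range(len(eta)):
            if j != k:
                w *= 1.0 - cdf[j]
        weights.append(w)
    total = sum(weights)
    # Normalize over the K admissible orthants.
    return [w / total for w in weights]


# Every class has its own predictor; none is fixed as a baseline.
probs = diagonal_orthant_probs([1.0, 0.0, -1.0])
```

Because each class carries its own predictor, a coefficient can be read as evidence for or against that class directly, rather than relative to a baseline class.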
Keywords
Diagonal Orthant probit model, Horseshoe prior, Interpretable models, Latent Dirichlet Allocation, Text classification
Citation
Magnusson, M, Jonsson, L & Villani, M 2020, 'DOLDA: a regularized supervised topic model for high-dimensional multi-class regression', Computational Statistics, vol. 35, no. 1, pp. 175-201. https://doi.org/10.1007/s00180-019-00891-1