Pathfinder: Parallel quasi-Newton variational inference
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Zhang, Lu | en_US |
dc.contributor.author | Carpenter, Bob | en_US |
dc.contributor.author | Gelman, Andrew | en_US |
dc.contributor.author | Vehtari, Aki | en_US |
dc.contributor.department | Department of Computer Science | en |
dc.contributor.groupauthor | Computer Science Professors | en |
dc.contributor.groupauthor | Computer Science - Artificial Intelligence and Machine Learning (AIML) | en |
dc.contributor.groupauthor | Probabilistic Machine Learning | en |
dc.contributor.groupauthor | Helsinki Institute for Information Technology (HIIT) | en |
dc.contributor.groupauthor | Professorship Vehtari Aki | en |
dc.contributor.organization | University of Southern California | en_US |
dc.contributor.organization | Flatiron Institute | en_US |
dc.contributor.organization | Columbia University | en_US |
dc.date.accessioned | 2022-12-14T10:16:20Z | |
dc.date.available | 2022-12-14T10:16:20Z | |
dc.date.issued | 2022 | en_US |
dc.description.abstract | We propose Pathfinder, a variational method for approximately sampling from differentiable probability densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the target distribution. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores. | en |
dc.description.version | Peer reviewed | en |
dc.format.extent | 1-49 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Zhang, L, Carpenter, B, Gelman, A & Vehtari, A 2022, 'Pathfinder: Parallel quasi-Newton variational inference', Journal of Machine Learning Research, vol. 23, pp. 1-49. <https://www.jmlr.org/papers/volume23/21-0889/21-0889.pdf> | en
dc.identifier.issn | 1532-4435 | |
dc.identifier.issn | 1533-7928 | |
dc.identifier.other | PURE UUID: 52999c12-0d19-4efe-a4d3-bc15561263f2 | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/52999c12-0d19-4efe-a4d3-bc15561263f2 | en_US |
dc.identifier.other | PURE LINK: https://www.jmlr.org/papers/volume23/21-0889/21-0889.pdf | en_US |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/94601507/SCI_Zhang_etal_JMLR_2022.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/118143 | |
dc.identifier.urn | URN:NBN:fi:aalto-202212146883 | |
dc.language.iso | en | en |
dc.publisher | Microtome Publishing | |
dc.relation.ispartofseries | Journal of Machine Learning Research | en |
dc.relation.ispartofseries | Volume 23 | en |
dc.rights | openAccess | en |
dc.title | Pathfinder: Parallel quasi-Newton variational inference | en |
dc.type | A1 Original article in a scientific journal | en |
dc.type.version | publishedVersion |