Pathfinder: Parallel quasi-Newton variational inference

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Zhang, Lu [en_US]
dc.contributor.author: Carpenter, Bob [en_US]
dc.contributor.author: Gelman, Andrew [en_US]
dc.contributor.author: Vehtari, Aki [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Computer Science Professors [en]
dc.contributor.groupauthor: Computer Science - Artificial Intelligence and Machine Learning (AIML) [en]
dc.contributor.groupauthor: Probabilistic Machine Learning [en]
dc.contributor.groupauthor: Helsinki Institute for Information Technology (HIIT) [en]
dc.contributor.groupauthor: Professorship Vehtari Aki [en]
dc.contributor.organization: University of Southern California [en_US]
dc.contributor.organization: Flatiron Institute [en_US]
dc.contributor.organization: Columbia University [en_US]
dc.date.accessioned: 2022-12-14T10:16:20Z
dc.date.available: 2022-12-14T10:16:20Z
dc.date.issued: 2022 [en_US]
dc.description.abstract: We propose Pathfinder, a variational method for approximately sampling from differentiable probability densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the target distribution. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 49
dc.format.extent: 1-49
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Zhang, L, Carpenter, B, Gelman, A & Vehtari, A 2022, 'Pathfinder: Parallel quasi-Newton variational inference', Journal of Machine Learning Research, vol. 23, pp. 1-49. <https://www.jmlr.org/papers/volume23/21-0889/21-0889.pdf> [en]
dc.identifier.issn: 1532-4435
dc.identifier.issn: 1533-7928
dc.identifier.other: PURE UUID: 52999c12-0d19-4efe-a4d3-bc15561263f2 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/52999c12-0d19-4efe-a4d3-bc15561263f2 [en_US]
dc.identifier.other: PURE LINK: https://www.jmlr.org/papers/volume23/21-0889/21-0889.pdf [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/94601507/SCI_Zhang_etal_JMLR_2022.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/118143
dc.identifier.urn: URN:NBN:fi:aalto-202212146883
dc.language.iso: en [en]
dc.publisher: Microtome Publishing
dc.relation.ispartofseries: Journal of Machine Learning Research [en]
dc.relation.ispartofseries: Volume 23 [en]
dc.rights: openAccess [en]
dc.title: Pathfinder: Parallel quasi-Newton variational inference [en]
dc.type: A1 Original article in a scientific journal (fi: Alkuperäisartikkeli tieteellisessä aikakauslehdessä)
dc.type.version: publishedVersion
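
The abstract above describes the core algorithm compactly: follow a quasi-Newton optimization path, form a normal approximation at each iterate with covariance taken from the optimizer's inverse-Hessian estimate, score each approximation by a Monte Carlo ELBO (equivalently, estimated KL divergence to the target up to a constant), and return draws from the best one. The following is a minimal Python sketch of that idea, not the authors' implementation: it assumes a dense BFGS update in place of the paper's L-BFGS, uses a fixed step size instead of a line search, and omits the multi-run importance resampling; all function names and parameters are illustrative.

import numpy as np

def pathfinder_sketch(log_p, grad_log_p, x0, n_iters=30, n_draws=20, step=0.1, seed=0):
    """Follow a quasi-Newton path, fit N(x_k, H_k) at each iterate, and
    return draws from the approximation with the highest Monte Carlo
    ELBO, i.e. the lowest estimated KL divergence to the target."""
    rng = np.random.default_rng(seed)
    d = x0.size
    x = x0.astype(float)
    H = np.eye(d)                      # running inverse-Hessian estimate of -log_p
    g = -grad_log_p(x)                 # gradient of the negative log density
    best_elbo, best_draws = -np.inf, None
    for _ in range(n_iters):
        x_new = x - step * H @ g       # quasi-Newton step (fixed step size for brevity)
        g_new = -grad_log_p(x_new)
        s, y = x_new - x, g_new - g
        if s @ y > 1e-10:              # curvature condition keeps H positive definite
            rho = 1.0 / (s @ y)
            V = np.eye(d) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)   # BFGS inverse-Hessian update
        x, g = x_new, g_new
        # Local normal approximation along the path: q_k = N(x, H).
        L = np.linalg.cholesky(H)
        z = x + rng.standard_normal((n_draws, d)) @ L.T
        maha = np.sum(np.linalg.solve(L, (z - x).T) ** 2, axis=0)
        log_q = -0.5 * maha - np.log(np.diag(L)).sum() - 0.5 * d * np.log(2 * np.pi)
        elbo = np.mean(np.array([log_p(zi) for zi in z]) - log_q)  # estimates -KL(q||p) + const
        if elbo > best_elbo:
            best_elbo, best_draws = elbo, z
    return best_draws

# Usage: approximate draws from a correlated 2-D Gaussian target.
cov = np.array([[2.0, 0.9], [0.9, 1.0]])
prec = np.linalg.inv(cov)
draws = pathfinder_sketch(lambda t: -0.5 * t @ prec @ t,   # unnormalized log density
                          lambda t: -prec @ t,
                          x0=np.array([5.0, -4.0]))

Because each ELBO estimate depends only on its own iterate and inverse-Hessian estimate, the per-iterate scoring in the loop above is what the abstract calls embarrassingly parallelizable, as are independent runs from different initializations.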
