Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning
Access rights
embargoedAccess
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
Other link related to publication (opens in new window)
Date
2024-07-11
Language
en
Pages
17
Series
American Journal of Human Genetics, Volume 111, issue 7, pp. 1431-1447
Abstract
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online-results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
Description
Publisher Copyright: © 2024 American Society of Human Genetics
Keywords
autoimmune diseases, biobank studies, cross-biobank analysis, ensemble learning, genetic risk, genetic variability, genome-wide association studies, GWAS, method evaluation, PGS, phenotype prediction, polygenic scores
Citation
Monti, R, Eick, L, Hudjashov, G, Läll, K, Kanoni, S, Wolford, B N, Wingfield, B, Pain, O, Wharrie, S, Jermy, B, McMahon, A, Hartonen, T, Heyne, H, Mars, N, Lambert, S, Hveem, K, Inouye, M, van Heel, D A, Mägi, R, Marttinen, P, Ripatti, S, Ganna, A, Lippert, C & Genes and Health Research Team 2024, 'Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning', American Journal of Human Genetics, vol. 111, no. 7, pp. 1431-1447. https://doi.org/10.1016/j.ajhg.2024.06.003