Calibrated prediction intervals for polygenic scores across diverse contexts

Kangcheng Hou; Ziqi Xu; Yi Ding; Arbel Harpak; Bogdan Pasaniuc

doi:10.1101/2023.07.24.23293056


medRxiv [Preprint]. 2023 Jul 27:2023.07.24.23293056. doi: 10.1101/2023.07.24.23293056.

Authors

Kangcheng Hou¹, Ziqi Xu², Yi Ding¹, Arbel Harpak³,⁴, Bogdan Pasaniuc¹,⁵,⁶,⁷

Affiliations

¹ Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA.
² Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA.
³ Department of Population Health, The University of Texas at Austin, Austin, TX, USA.
⁴ Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA.
⁵ Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
⁶ Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
⁷ Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA.

Abstract

Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields, from agriculture to personalized medicine. We analyze data from two large biobanks, All of Us in the US and UK Biobank in the UK, and find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, affect PGS accuracy with magnitudes similar to that of genetic ancestry. PGSs trained in single-ancestry and multi-ancestry cohorts show similar context specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well calibrated (that is, they contain the trait value with 90% probability in every context), whereas methods that ignore context are miscalibrated. We show that prediction intervals need to be adjusted for all traits considered, with adjustments ranging from 10% for diastolic blood pressure to 80% for waist circumference. The required adjustment also depends on the dataset; for example, prediction intervals for years of education need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path toward using PGS as a prediction tool for all individuals regardless of their contexts, while highlighting the importance of comprehensive profiling of context information in study design and data collection.
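The contrast between context-agnostic and context-specific prediction intervals can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the authors' method. It assumes the PGS is the predicted trait mean and that residuals are Gaussian. It uses one hypothetical binary context ("A" and "B") with made-up noise levels, and it estimates a separate residual standard deviation per context group instead of modeling all contexts jointly. The helper names `simulate`, `interval_halfwidth`, and `coverage_by_group` are hypothetical.

```python
import random
from statistics import NormalDist, pstdev

# Hypothetical simulation (illustrative values, not from the paper):
# trait = PGS + Gaussian noise whose SD depends on a binary context group.
def simulate(n_per_group, noise_sd_by_group, seed):
    rng = random.Random(seed)
    rows = []
    for group, sd in noise_sd_by_group.items():
        for _ in range(n_per_group):
            pgs = rng.gauss(0.0, 1.0)
            y = pgs + rng.gauss(0.0, sd)
            rows.append((group, pgs, y))
    return rows

def interval_halfwidth(sd, level=0.90):
    # Two-sided normal interval: prediction +/- z_{(1+level)/2} * sd
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd

def coverage_by_group(rows, sd_for_group, level=0.90):
    # Fraction of individuals in each group whose trait falls inside the interval
    hits, counts = {}, {}
    for group, pgs, y in rows:
        hw = interval_halfwidth(sd_for_group(group), level)
        counts[group] = counts.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(abs(y - pgs) <= hw)
    return {g: hits[g] / counts[g] for g in counts}

noise = {"A": 0.5, "B": 1.5}  # assumed: context B has noisier residuals
train = simulate(5000, noise, seed=1)
test = simulate(5000, noise, seed=2)

# Context-agnostic: one pooled residual SD for everyone
pooled_sd = pstdev([y - pgs for _, pgs, y in train])
# Context-specific: one residual SD per context group
group_sd = {
    g: pstdev([y - pgs for gg, pgs, y in train if gg == g]) for g in noise
}

naive = coverage_by_group(test, lambda g: pooled_sd)
calibrated = coverage_by_group(test, lambda g: group_sd[g])
print("pooled SD coverage:", naive)
print("per-context SD coverage:", calibrated)
```

With a single pooled standard deviation, the interval over-covers in the low-noise group and under-covers in the high-noise group, even though overall coverage is near 90%. Per-group widths bring each group close to the nominal level, which is the sense of "calibrated in all contexts" used in the abstract.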

Publication types

Preprint
