Biobank-scale inference of multi-individual identity by descent and gene conversion

Sharon R Browning; Brian L Browning

doi:10.1016/j.ajhg.2024.02.015

Am J Hum Genet. 2024 Apr 4;111(4):691-700. doi: 10.1016/j.ajhg.2024.02.015. Epub 2024 Mar 20.

Authors

Sharon R Browning¹, Brian L Browning²

Affiliations

¹ Department of Biostatistics, University of Washington, Seattle, WA, USA. Electronic address: sguy@uw.edu.
² Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA. Electronic address: browning@uw.edu.

PMID: 38513668
PMCID: PMC11023918 (available on 2024-10-04)

Abstract

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
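The contrast between pairwise output and cluster output can be illustrated with a minimal sketch. It is not the ibd-cluster implementation or its file format: the function name `implied_ibd_pairs`, the haplotype labels, and the cluster assignment are all hypothetical. It only shows that one cluster label per haplotype at a locus (output linear in the number of haplotypes) implicitly encodes a number of IBD pairs that can grow quadratically with cluster size.

```python
from collections import Counter


def implied_ibd_pairs(cluster_of):
    """Count the haplotype pairs implied to be IBD at one locus.

    cluster_of maps a haplotype label to its cluster id at that locus
    (a hypothetical representation chosen for illustration).
    """
    sizes = Counter(cluster_of.values())
    # A cluster of n haplotypes implies n * (n - 1) / 2 IBD pairs.
    return sum(n * (n - 1) // 2 for n in sizes.values())


# Toy locus (made-up data): six haplotypes in clusters of size 3, 2, and 1.
cluster_of = {"h1": 0, "h2": 0, "h3": 0, "h4": 1, "h5": 1, "h6": 2}

records = len(cluster_of)              # one cluster label per haplotype
pairs = implied_ibd_pairs(cluster_of)  # pairs a pairwise method would report

# A single large cluster makes the gap obvious: 1,000 labels vs. many pairs.
big_cluster = {f"h{i}": 0 for i in range(1000)}
big_pairs = implied_ibd_pairs(big_cluster)
```

In this toy case the six cluster labels stand in for four pairwise relationships, while the single 1,000-haplotype cluster stands in for 499,500 pairs, which is why per-locus cluster output stays manageable as sample size grows.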

MeSH terms

Biological Specimen Banks*
Chromosomes
Gene Conversion*
Haplotypes / genetics
Humans
Polymorphism, Single Nucleotide
Software
