Biobank-scale inference of multi-individual identity by descent and gene conversion

Am J Hum Genet. 2024 Apr 4;111(4):691-700. doi: 10.1016/j.ajhg.2024.02.015. Epub 2024 Mar 20.

Abstract

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
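The linear-scaling claim can be illustrated with a toy count, which is only a sketch and not the ibd-cluster algorithm. It assumes the IBD clusters at one locus are already known and are stored as one cluster label per haplotype; the function names and the label encoding are hypothetical. A cluster of k haplotypes implies k(k-1)/2 IBD pairs, so pairwise output can grow quadratically with sample size, whereas one label per haplotype grows linearly.

```python
from collections import defaultdict


def clusters_at_locus(cluster_ids):
    """Group haplotype indices by their (hypothetical) cluster label at one locus."""
    groups = defaultdict(list)
    for hap_index, cluster_id in enumerate(cluster_ids):
        groups[cluster_id].append(hap_index)
    return dict(groups)


def pairwise_count(cluster_ids):
    """Number of haplotype pairs implied by the clusters: sum of k*(k-1)/2 over clusters."""
    groups = clusters_at_locus(cluster_ids)
    return sum(len(members) * (len(members) - 1) // 2 for members in groups.values())


def cluster_output_size(cluster_ids):
    """Cluster-based output needs one label per haplotype at the locus."""
    return len(cluster_ids)


if __name__ == "__main__":
    # Toy locus: 1,000 haplotypes that all fall in a single IBD cluster.
    labels = [0] * 1000
    print("pairwise records:", pairwise_count(labels))
    print("cluster labels:  ", cluster_output_size(labels))
```

In this toy case a pairwise representation must record every pair in the cluster, while the cluster representation records each haplotype once.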

MeSH terms

  • Biological Specimen Banks*
  • Chromosomes
  • Gene Conversion*
  • Haplotypes / genetics
  • Humans
  • Polymorphism, Single Nucleotide
  • Software