Fast and accurate relatedness estimation from high-throughput sequencing data in the presence of inbreeding

Gigascience. 2019 May 1;8(5):giz034. doi: 10.1093/gigascience/giz034.

Abstract

Background: Estimating relatedness between pairs of possibly inbred individuals from high-throughput sequencing (HTS) data has not previously been possible for samples without reliable genotype calls, as is the case for low-coverage data.

Results: We introduce ngsRelateV2, a major revision of ngsRelateV1, a program that originally estimated relatedness from HTS data among non-inbred individuals only. The revised version accounts for possible inbreeding by estimating the nine condensed Jacquard coefficients along with various other relatedness statistics. The program is threaded and scales linearly with the number of cores allocated to the process.
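
As an illustration of how summary statistics follow from the nine condensed Jacquard coefficients, the sketch below applies the standard textbook identities for the inbreeding coefficient of each individual and for the pairwise kinship coefficient. It is not taken from the ngsRelate source code. The function and class names are hypothetical, and the input is assumed to follow the conventional ordering Delta1 to Delta9; the program's own output columns may be ordered or named differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PairSummary:
    """Summary statistics for a pair of individuals a and b (hypothetical container)."""
    inbreeding_a: float
    inbreeding_b: float
    kinship: float


def summarize_jacquard(j: Sequence[float], tol: float = 1e-9) -> PairSummary:
    """Derive inbreeding and kinship from condensed Jacquard coefficients.

    Assumes j = (Delta1, ..., Delta9) in the conventional ordering:
    Delta1-4 are the states where a's two alleles are IBD,
    Delta1, 2, 5, 6 are the states where b's two alleles are IBD.
    """
    if len(j) != 9:
        raise ValueError("expected exactly nine coefficients")
    if any(x < -tol for x in j) or abs(sum(j) - 1.0) > tol:
        raise ValueError("coefficients must be non-negative and sum to 1")

    d1, d2, d3, d4, d5, d6, d7, d8, _d9 = j

    # Inbreeding: probability that the two alleles within one individual are IBD.
    f_a = d1 + d2 + d3 + d4
    f_b = d1 + d2 + d5 + d6

    # Kinship: probability that one allele drawn at random from a and one
    # drawn at random from b are IBD.
    theta = d1 + 0.5 * (d3 + d5 + d7) + 0.25 * d8

    return PairSummary(inbreeding_a=f_a, inbreeding_b=f_b, kinship=theta)


if __name__ == "__main__":
    # Non-inbred full siblings: Delta7 = 0.25, Delta8 = 0.5, Delta9 = 0.25.
    print(summarize_jacquard([0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25]))
```

For non-inbred pairs only Delta7, Delta8, and Delta9 can be non-zero, so the two inbreeding coefficients are zero and the kinship reduces to Delta7/2 + Delta8/4.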

Conclusion: The program is available as an open-source C/C++ program under the GPL and is hosted at https://github.com/ANGSD/ngsRelate. To facilitate analysis, the program works directly on the most commonly used container formats for raw sequence data (BAM/CRAM) and summary data (VCF/BCF).

Keywords: Jacquard coefficients; genotype likelihood; high-throughput sequencing data; inbreeding; next-generation sequencing; population genetics; relatedness estimation; threading.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetics, Population*
  • Genotype
  • Genotyping Techniques*
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Inbreeding*
  • Polymorphism, Single Nucleotide / genetics
  • Sequence Analysis, DNA
  • Software