G-STRATEGY: Optimal Selection of Individuals for Sequencing in Genetic Association Studies

Miaoyan Wang; Johanna Jakobsdottir; Albert V Smith; Mary Sara McPeek

doi:10.1002/gepi.21982


Genet Epidemiol. 2016 Sep;40(6):446-60. doi: 10.1002/gepi.21982. Epub 2016 Jun 3.

Authors

Miaoyan Wang¹, Johanna Jakobsdottir², Albert V Smith²,³, Mary Sara McPeek¹,⁴

Affiliations

¹ Department of Statistics, University of Chicago, Chicago, Illinois, United States of America.
² Icelandic Heart Association, Kopavogur, Iceland.
³ University of Iceland, Reykjavik, Iceland.
⁴ Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America.

Abstract

In a large-scale genetic association study, the number of phenotyped individuals available for sequencing may, in some cases, be greater than the study's sequencing budget will allow. In that case, it can be important to prioritize individuals for sequencing in a way that optimizes power for association with the trait. Suppose a cohort of phenotyped individuals is available, with some subset of them possibly already sequenced, and one wants to choose an additional fixed-size subset of individuals to sequence in such a way that the power to detect association is maximized. When the phenotyped sample includes related individuals, power for association can be gained by including partial information, such as phenotype data of ungenotyped relatives, in the analysis, and this should be taken into account when assessing whom to sequence. We propose G-STRATEGY, which uses simulated annealing to choose a subset of individuals for sequencing that maximizes the expected power for association. In simulations, G-STRATEGY performs extremely well for a range of complex disease models and outperforms other strategies with, in many cases, relative power increases of 20-40% over the next best strategy, while maintaining correct type 1 error. G-STRATEGY is computationally feasible even for large datasets and complex pedigrees. We apply G-STRATEGY to data on high-density lipoprotein and low-density lipoprotein from the AGES-Reykjavik and REFINE-Reykjavik studies, in which G-STRATEGY is able to closely approximate the power of sequencing the full sample by selecting only a small subset of the individuals for sequencing.
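The abstract describes the selection problem as a search over fixed-size subsets of not-yet-sequenced individuals. Some individuals may already be sequenced, and the goal is to maximize expected association power. Below is a minimal, generic simulated-annealing sketch of that kind of search. It is not G-STRATEGY's implementation.

Several parts are assumptions made for illustration:

* The swap move (drop one chosen individual, add one unchosen individual).
* The geometric cooling schedule.
* All parameter names.
* The `score` callback. It is a hypothetical stand-in: G-STRATEGY's actual objective, the expected power of a pedigree-based association test, is not reproduced here.

```python
import math
import random
from typing import Callable, FrozenSet, Set, Tuple


def anneal_subset(
    n: int,
    k: int,
    score: Callable[[FrozenSet[int]], float],
    already: FrozenSet[int] = frozenset(),
    n_iter: int = 20000,
    t0: float = 1.0,
    cooling: float = 0.999,
    seed: int = 0,
) -> Tuple[Set[int], float]:
    """Pick k new individuals from range(n), excluding `already`, to maximize score.

    `score` is a hypothetical placeholder for expected power; it is always
    evaluated on the full sequenced set (already sequenced + newly chosen).
    """
    rng = random.Random(seed)
    candidates = [i for i in range(n) if i not in already]
    if not 0 < k <= len(candidates):
        raise ValueError("k must be between 1 and the number of unsequenced individuals")
    current = set(rng.sample(candidates, k))
    current_score = score(frozenset(already | current))
    best, best_score = set(current), current_score
    temp = t0
    for _ in range(n_iter):
        outside = [i for i in candidates if i not in current]
        if not outside:  # every candidate is already chosen; nothing to swap
            break
        # Assumed move: swap one chosen individual for one unchosen individual.
        drop = rng.choice(sorted(current))
        add = rng.choice(outside)
        proposal = (current - {drop}) | {add}
        proposal_score = score(frozenset(already | proposal))
        delta = proposal_score - current_score
        # Metropolis rule: accept improvements; accept worse moves with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temp *= cooling  # assumed geometric cooling schedule
    return best, best_score


if __name__ == "__main__":
    # Hypothetical per-person weights standing in for each person's contribution to power.
    weights = [((7 * i) % 20) / 20 for i in range(20)]
    chosen, value = anneal_subset(
        20, 5, lambda s: sum(weights[i] for i in s), already=frozenset({0, 1})
    )
    print(sorted(chosen), round(value, 2))
```

The toy objective in the usage example is additive, so the optimum is simply the top-weighted individuals. A realistic power objective depends jointly on who is sequenced and on the phenotypes of their unsequenced relatives, and that interaction is what makes a global search such as annealing worthwhile.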

Keywords: association mapping; family data; selective genotyping; sequence; simulated annealing.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Genetic Association Studies*
Genotype
Humans
Polymorphism, Single Nucleotide
Quantitative Trait Loci
Software*

Grants and funding

R01 HG001645/HG/NHGRI NIH HHS/United States