Examining population stratification via individual ancestry estimates versus self-reported race

Cancer Epidemiol Biomarkers Prev. 2005 Jun;14(6):1545-51. doi: 10.1158/1055-9965.EPI-04-0832.

Abstract

Population stratification has the potential to affect the results of genetic marker studies. Estimating individual ancestry provides a continuous measure for assessing population structure in case-control studies of complex disease, as an alternative to self-reported racial groups. We estimated individual ancestry from the Federal Bureau of Investigation CODIS Core set of 13 short tandem repeat loci, using two different analysis methods, in a case-control study of early-onset lung cancer. Individual ancestry proportions were estimated for "European" and "West African" groups using published allele frequencies. The majority of non-Hispanic Caucasians had >50% European ancestry, whereas the majority of African Americans had <20% European ancestry, regardless of ancestry estimation method, although the ancestry distributions of the two self-reported groups also overlapped significantly. When we further investigated the effect of ancestry and self-reported race on the frequency of a lung cancer risk genotype, we found that the frequency of the GSTM1 null genotype varied by individual European ancestry and by case-control status within self-reported race (particularly for African Americans). Genetic risk models showed that adjusting for individual European ancestry fit the data better than either a model with no group adjustment or a model adjusted for self-reported race. This study suggests that significant population substructure differences exist that self-reported race alone does not capture, and that individual ancestry may be confounded with disease status and/or a candidate gene risk genotype.
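The abstract does not name the two ancestry estimation methods, so the following is only a minimal illustrative sketch of one common approach: a maximum-likelihood estimate of an individual's European ancestry proportion under a simple two-population admixture model. All function names, locus names, and allele frequencies here are hypothetical placeholders, not the study's methods or the published CODIS frequencies.

```python
import math

def log_likelihood(q, genotype, freq_eur, freq_afr):
    """Log-likelihood of a genotype given European ancestry proportion q.

    Assumes (for illustration) that each allele copy is drawn independently
    from a mixture: q * European frequency + (1 - q) * West African frequency.
    """
    total = 0.0
    for locus, alleles in genotype.items():
        for allele in alleles:
            p = (q * freq_eur[locus].get(allele, 0.0)
                 + (1.0 - q) * freq_afr[locus].get(allele, 0.0))
            # Floor the probability to avoid log(0) for unseen alleles.
            total += math.log(max(p, 1e-12))
    return total

def estimate_european_ancestry(genotype, freq_eur, freq_afr, steps=1000):
    """Grid-search the value of q in [0, 1] that maximizes the log-likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotype, freq_eur, freq_afr))

# Hypothetical allele frequencies for two made-up loci (NOT real CODIS data).
freq_eur = {"locus_A": {"10": 0.7, "11": 0.3}, "locus_B": {"5": 0.6, "6": 0.4}}
freq_afr = {"locus_A": {"10": 0.2, "11": 0.8}, "locus_B": {"5": 0.1, "6": 0.9}}

# A hypothetical individual heterozygous at both loci.
mixed = {"locus_A": ("10", "11"), "locus_B": ("5", "6")}
q_hat = estimate_european_ancestry(mixed, freq_eur, freq_afr)
print(f"Estimated European ancestry proportion: {q_hat:.3f}")
```

With only a handful of loci, such an estimate is imprecise for any one individual, which is consistent with treating ancestry as a continuous covariate in the risk models rather than as a hard group assignment.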

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Black People / genetics*
  • Case-Control Studies
  • Genotype
  • Glutathione Transferase / genetics
  • Humans
  • Lung Neoplasms / genetics*
  • Models, Theoretical*
  • Pedigree
  • Reproducibility of Results
  • Risk Assessment
  • Tandem Repeat Sequences*
  • White People / genetics*

Substances

  • Glutathione Transferase
  • glutathione S-transferase M1