The difficulty of avoiding false positives in genome scans for natural selection

Genome Res. 2009 May;19(5):922-33. doi: 10.1101/gr.086512.108.

Abstract

Several studies have found evidence for more positive selection on the chimpanzee lineage compared with the human lineage since the two species split. A potential concern, however, is that these findings may simply reflect artifacts of the data: inaccuracies in the underlying chimpanzee genome sequence, which is of lower quality than the human sequence. To test this hypothesis, we generated de novo genome assemblies of chimpanzee and macaque and aligned them with human. We also implemented a novel bioinformatic procedure for producing alignments of closely related species that uses synteny information to remove misassembled and misaligned regions, and sequence quality scores to remove nucleotides that are less reliable. We applied this procedure to re-examine 59 genes recently identified as candidates for positive selection in chimpanzees. The great majority of these signals disappear after application of our new bioinformatic procedure. We also carried out laboratory-based resequencing of 10 of the regions in multiple chimpanzees and humans, and found that our alignments were correct wherever there was a conflict with the published results. These results throw into question previous findings that there has been more positive selection in chimpanzees than in humans since the two species diverged. Our study also highlights the challenges of searching the extreme tails of distributions for signals of natural selection. Inaccuracies in the genome sequence at even a tiny fraction of genes can produce false-positive signals, which make it difficult to identify loci that have genuinely been targets of selection.
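The quality-score step of the procedure described above (removing less reliable nucleotides before counting substitutions) can be illustrated with a small sketch. It is not the authors' pipeline. The Phred threshold of 20, the use of "N" as the mask character, and the function names `mask_low_quality` and `count_differences` are all hypothetical choices made for illustration. The synteny-based removal of misassembled and misaligned regions is not shown. The sketch masks low-quality bases in an aligned sequence and then counts human-chimpanzee differences only at columns where both bases are present and unmasked.

```python
from typing import Sequence, Tuple

GAP = "-"
MASK = "N"  # hypothetical mask character for unreliable bases


def mask_low_quality(aligned: str, quals: Sequence[int], min_q: int = 20) -> str:
    """Replace bases whose Phred quality is below min_q with MASK.

    `quals` holds one score per ungapped base of `aligned`, in order;
    gap characters carry no score. The threshold min_q=20 is an assumption.
    """
    n_bases = sum(1 for c in aligned if c != GAP)
    if n_bases != len(quals):
        raise ValueError("need exactly one quality score per ungapped base")
    q_iter = iter(quals)
    out = []
    for c in aligned:
        if c == GAP:
            out.append(c)  # gaps pass through unchanged
        else:
            out.append(c if next(q_iter) >= min_q else MASK)
    return "".join(out)


def count_differences(a: str, b: str) -> Tuple[int, int]:
    """Return (compared_sites, differences), skipping gap and masked columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if GAP in (x, y) or MASK in (x, y):
            continue  # column is uninformative or unreliable
        compared += 1
        diffs += int(x != y)
    return compared, diffs


if __name__ == "__main__":
    # Toy alignment: the only human/chimp mismatch sits on a low-quality chimp base.
    human = "ATGGCC-TTA"
    chimp = "ATGACCGTTA"
    chimp_quals = [40, 40, 40, 8, 40, 40, 35, 40, 40, 40]
    print(count_differences(human, chimp))  # raw alignment
    print(count_differences(human, mask_low_quality(chimp, chimp_quals)))
```

In this toy case the raw alignment reports one difference over 9 compared sites, whereas after masking the single low-quality chimpanzee base the count drops to zero differences over 8 sites. The apparent substitution was a sequencing error, which is the kind of artifact that can inflate lineage-specific substitution counts in the extreme tail of a genome scan.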

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Evolution, Molecular
  • Genome*
  • Genomics
  • Humans
  • Molecular Sequence Data
  • Pan troglodytes / genetics
  • Selection, Genetic*
  • Sequence Alignment
  • Sequence Analysis, DNA*
  • Synteny

Associated data

  • GENBANK/FJ821202
  • GENBANK/FJ821203
  • GENBANK/FJ821204
  • GENBANK/FJ821205
  • GENBANK/FJ821206
  • GENBANK/FJ821207
  • GENBANK/FJ821208
  • GENBANK/FJ821209
  • GENBANK/FJ821210
  • GENBANK/FJ821211
  • GENBANK/FJ821212
  • GENBANK/FJ821213
  • GENBANK/FJ821214
  • GENBANK/FJ821215
  • GENBANK/FJ821216
  • GENBANK/FJ821217
  • GENBANK/FJ821218
  • GENBANK/FJ821219
  • GENBANK/FJ821220
  • GENBANK/FJ821221
  • GENBANK/FJ821222
  • GENBANK/FJ821223
  • GENBANK/FJ821224
  • GENBANK/FJ821225
  • GENBANK/FJ821226
  • GENBANK/FJ821227
  • GENBANK/FJ821228
  • GENBANK/FJ821229
  • GENBANK/FJ821230
  • GENBANK/FJ821231
  • GENBANK/FJ821232
  • GENBANK/FJ821233
  • GENBANK/FJ821234
  • GENBANK/FJ821235
  • GENBANK/FJ821236
  • GENBANK/FJ821237
  • GENBANK/FJ821238
  • GENBANK/FJ821239
  • GENBANK/FJ821240
  • GENBANK/FJ821241
  • GENBANK/FJ821242
  • GENBANK/FJ821243
  • GENBANK/FJ821244
  • GENBANK/FJ821245
  • GENBANK/FJ821246
  • GENBANK/FJ821247
  • GENBANK/FJ821248
  • GENBANK/FJ821249
  • GENBANK/FJ821250
  • GENBANK/FJ821251
  • GENBANK/FJ821252
  • GENBANK/FJ821253
  • GENBANK/FJ821254
  • GENBANK/FJ821255
  • GENBANK/FJ821256
  • GENBANK/FJ821257
  • GENBANK/FJ821258
  • GENBANK/FJ821259
  • GENBANK/FJ821260
  • GENBANK/FJ821261
  • GENBANK/FJ821262
  • GENBANK/FJ821263
  • GENBANK/FJ821264
  • GENBANK/FJ821265
  • GENBANK/FJ821266
  • GENBANK/FJ821267
  • GENBANK/FJ821268
  • GENBANK/FJ821269
  • GENBANK/FJ821270
  • GENBANK/FJ821271
  • GENBANK/FJ821272
  • GENBANK/FJ821273
  • GENBANK/FJ821274
  • GENBANK/FJ821275
  • GENBANK/FJ821276
  • GENBANK/FJ821277
  • GENBANK/FJ821278
  • GENBANK/FJ821279
  • GENBANK/FJ821280
  • GENBANK/FJ821281
  • GENBANK/FJ821282
  • GENBANK/FJ821283
  • GENBANK/FJ821284
  • GENBANK/FJ821285
  • GENBANK/FJ821286
  • GENBANK/FJ821287
  • GENBANK/FJ821288