Prioritizing natural-selection signals from the deep-sequencing genomic data suggests multi-variant adaptation in Tibetan highlanders

Natl Sci Rev. 2019 Nov;6(6):1201-1222. doi: 10.1093/nsr/nwz108. Epub 2019 Aug 7.

Abstract

Human genetic adaptation to high altitudes (>2500 m) has been extensively studied over the last few years, but few functional adaptive genetic variants have been identified, largely owing to the lack of deep genome sequencing data available to previous studies. Here, we build a list of putative adaptive variants, including 63 missense variants, 7 loss-of-function variants, 1,298 evolutionarily conserved variants and 509 expression quantitative trait loci. Notably, the top signal of selection is located in TMEM247, a transmembrane protein-coding gene. The Tibetan version of TMEM247 harbors one high-frequency (76.3%) missense variant, rs116983452 (c.248C>T; p.Ala83Val), with the T allele derived from archaic ancestry and carried by >94% of Tibetans but absent or at low frequency (<3%) in non-Tibetan populations. The rs116983452-T allele is strongly and positively correlated with altitude and significantly associated with reduced hemoglobin concentration (p = 5.78 × 10⁻⁵), red blood cell count (p = 5.72 × 10⁻⁷) and hematocrit (p = 2.57 × 10⁻⁶). In particular, TMEM247-rs116983452 shows a greater effect size and better predicts the phenotypic outcome than any EPAS1 variant in association with adaptive traits in Tibetans. Modeling the interaction between TMEM247-rs116983452 and EPAS1 variants indicates weak but statistically significant epistatic effects. Our results support the view that multiple variants may jointly contribute to the fitness of Tibetans on the plateau, and that a complex model is needed to elucidate the mechanism of adaptive evolution.
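
The two frequencies quoted for rs116983452 (76.3% and >94%) can be reconciled with a short calculation. The sketch below assumes that 76.3% is the T allele frequency and that genotypes in Tibetans are close to Hardy-Weinberg proportions; neither assumption is stated in the abstract, and the function name `carrier_fraction` is purely illustrative.

```python
def carrier_fraction(allele_freq: float) -> float:
    """Fraction of individuals carrying at least one copy of an allele.

    Assumes Hardy-Weinberg proportions (an assumption made here, not a
    claim from the abstract): non-carriers are homozygous for the other
    allele, with frequency (1 - p)^2.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - allele_freq) ** 2


# Assumed reading: 76.3% is the frequency of the T allele in Tibetans.
t_allele_freq = 0.763
print(f"Expected carriers: {carrier_fraction(t_allele_freq):.1%}")
# prints "Expected carriers: 94.4%"
```

Under these assumptions, an allele frequency of 76.3% implies that about 94.4% of individuals carry at least one T allele, which agrees with the ">94% of Tibetans" figure.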

Keywords: Tibetan; adaptive genetic variant; archaic ancestry; expression quantitative trait loci (eQTL); hemoglobin concentration; high-altitude adaptation; hypoxia; next-generation sequencing (NGS); tissue-specific expression.