Efficient Gene Tree Correction Guided by Genome Evolution

PLoS One. 2016 Aug 11;11(8):e0159559. doi: 10.1371/journal.pone.0159559. eCollection 2016.

Abstract

Motivations: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, which use species tree information in addition to sequence information, often rely on a computationally intensive tree space search, which precludes their application to large genomic databases.

Results: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny.
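To make the input to such a correction concrete, here is a minimal Python sketch of one plausible first step: collapsing branches whose support falls below a threshold, so that the weakly supported parts of the gene tree become unresolved multifurcations. This is an illustrative assumption, not the ProfileNJ code. The names Node, contract_weak_branches and to_newick, as well as the 0.5 threshold, are hypothetical, and the later resolution of the collapsed nodes with a species tree and a distance matrix is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical gene tree node; not taken from the ProfileNJ code."""
    name: Optional[str] = None        # leaf label (None for internal nodes)
    support: Optional[float] = None   # support of the branch above this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def contract_weak_branches(node: Node, threshold: float) -> Node:
    """Return a copy of the tree in which every internal branch whose
    support is below `threshold` is collapsed into its parent."""
    if node.is_leaf():
        return Node(name=node.name, support=node.support)
    new_children: List[Node] = []
    for child in node.children:
        contracted = contract_weak_branches(child, threshold)
        weak = (
            not contracted.is_leaf()
            and contracted.support is not None
            and contracted.support < threshold
        )
        if weak:
            # Drop the weak branch: its children attach to the current node.
            new_children.extend(contracted.children)
        else:
            new_children.append(contracted)
    return Node(name=node.name, support=node.support, children=new_children)


def to_newick(node: Node) -> str:
    """Serialize the topology only (supports are omitted for brevity)."""
    if node.is_leaf():
        return str(node.name)
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# Toy gene tree (((a,b)0.95,(c,d)0.40)0.30,e); the support values are made up.
example = Node(children=[
    Node(support=0.30, children=[
        Node(support=0.95, children=[Node("a"), Node("b")]),
        Node(support=0.40, children=[Node("c"), Node("d")]),
    ]),
    Node("e"),
])

collapsed = contract_weak_branches(example, threshold=0.5)
print(to_newick(collapsed))  # ((a,b),c,d,e)
```

In this toy case only the well-supported clade (a,b) survives. The remaining multifurcation is the part that a correction method would then re-resolve using information beyond the alignment.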

Availability: A web interface called RefineTree, which includes ProfileNJ as well as other gene tree correction methods that we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ and the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available.

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology / methods*
  • Evolution, Molecular*
  • Genes / genetics*
  • Genome / genetics*
  • Humans
  • Phylogeny*
  • Sequence Analysis, DNA

Grants and funding

MS, LG, BB and ET were supported by the French Agence Nationale de la Recherche (ANR) Grant ANR-10-BINF-01-01 “Ancestrome”. EN, ML, JS and NEM were supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), and the “Fonds de recherche du Québec Nature et technologies” (FRQNT) of Quebec. Computations were made on the supercomputer “Briarée” from Université de Montréal, managed by Calcul Québec and Compute Canada.