Mirage: estimation of ancestral gene-copy numbers by considering different evolutionary patterns among gene families

Bioinform Adv. 2021 Jul 30;1(1):vbab014. doi: 10.1093/bioadv/vbab014. eCollection 2021.

Abstract

Motivation: Reconstruction of gene copy number evolution is an essential approach for understanding how complex biological systems have been organized. Although various models of gene copy number evolution have been proposed, existing models do not adequately account for the fact that gene gain/loss rates can differ greatly among gene families.

Results: In this study, we developed Mirage (MIxtuRe model for Ancestral Genome Estimation), which allows different gene families to have flexible gene gain/loss rates. Mirage can use three models for formulating heterogeneous evolution among gene families: the discretized Γ model, probability distribution-free model and pattern mixture (PM) model. Simulation analysis showed that Mirage can accurately estimate heterogeneous gene gain/loss rates and reconstruct gene-content evolutionary history. Application to empirical datasets demonstrated that the PM model fits genome data from various taxonomic groups better than the other heterogeneous models. Using Mirage, we revealed that metabolic function-related gene families displayed frequent gene gains and losses in all taxa investigated.
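The idea of letting gene families differ in their gain/loss rates can be illustrated with a small mixture calculation. The following is a minimal sketch, not Mirage's implementation: it assumes a simplified two-state (absent/present) continuous-time Markov chain rather than a full copy-number model, treats each branch as an independent observation with known end states instead of summing over ancestral states on a tree, and uses hypothetical rate multipliers and weights. All function and variable names are illustrative.

```python
import math

def transition_matrix(gain, loss, t):
    """Transition probabilities of a two-state chain (0 = absent, 1 = present)
    over a branch of length t, given a gain rate and a loss rate."""
    total = gain + loss
    decay = math.exp(-total * t)
    p_present = gain / total  # stationary probability of presence
    p_absent = loss / total   # stationary probability of absence
    return [
        [p_absent + p_present * decay, p_present * (1.0 - decay)],
        [p_absent * (1.0 - decay), p_present + p_absent * decay],
    ]

def family_likelihood(branches, gain, loss):
    """Likelihood of one gene family. Simplifying assumption: each branch is an
    independent (start_state, end_state, length) observation."""
    like = 1.0
    for start, end, t in branches:
        like *= transition_matrix(gain, loss, t)[start][end]
    return like

def category_posteriors(branches, multipliers, weights, gain=1.0, loss=1.0):
    """Posterior probability that a family belongs to each rate category,
    where category k scales both rates by multipliers[k]."""
    joint = [
        w * family_likelihood(branches, m * gain, m * loss)
        for m, w in zip(multipliers, weights)
    ]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical rate categories (slow, medium, fast) with equal prior weights.
multipliers = [0.1, 1.0, 10.0]
weights = [1 / 3, 1 / 3, 1 / 3]

# A family that flips state on every branch, and one that never changes.
volatile = [(0, 1, 0.1), (1, 0, 0.1), (0, 1, 0.1)]
stable = [(1, 1, 0.1), (1, 1, 0.1), (1, 1, 0.1)]

print(category_posteriors(volatile, multipliers, weights))
print(category_posteriors(stable, multipliers, weights))
```

In this toy setting the family that changes state on every branch is assigned mostly to the fast category, and the unchanging family mostly to the slow one. This is the kind of family-specific rate heterogeneity that a single shared gain/loss rate cannot capture.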

Availability and implementation: The source code of Mirage is freely available at https://github.com/fukunagatsu/Mirage.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.