Genome-wide patterns of non-coding and protein-coding sequence variation in the major fungal pathogen Aspergillus fumigatus

G3 (Bethesda). 2024 May 2:jkae091. doi: 10.1093/g3journal/jkae091. Online ahead of print.

Abstract

A. fumigatus is a deadly fungal pathogen, responsible for >400,000 infections/year and high mortality rates. A. fumigatus strains exhibit variation in infection-relevant traits, including virulence. However, most A. fumigatus protein-coding genes, including those that modulate its virulence, are shared between A. fumigatus strains and closely related non-pathogenic relatives. We hypothesized that A. fumigatus genes exhibit substantial genetic variation in the non-coding regions immediately upstream of their start codons, which could reflect differences in gene regulation between strains. To begin testing this hypothesis, we identified 5,812 single-copy orthologs across the genomes of 263 A. fumigatus strains. In general, A. fumigatus non-coding regions showed higher levels of sequence variation than their corresponding protein-coding regions. Focusing on 2,482 genes whose protein-coding sequence identity scores ranged between 75% and 99%, we identified 478 genes with signatures of positive selection only in their non-coding regions and 65 genes with signatures only in their protein-coding regions. Of these, 28 of the 478 non-coding regions and 5 of the 65 protein-coding regions under selection are associated with genes known to modulate A. fumigatus virulence. Non-coding region variation between A. fumigatus strains included single nucleotide polymorphisms and insertions or deletions of at least a few nucleotides. These results show that non-coding regions of A. fumigatus genes harbor greater sequence variation than protein-coding regions, raising the hypothesis that this variation may contribute to A. fumigatus phenotypic heterogeneity.
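
The abstract does not state how sequence identity scores were computed, so the following is only a minimal sketch of one common approach: percent identity over two pre-aligned sequences, followed by a filter on the 75% to 99% window mentioned above. The function names (`percent_identity`, `in_identity_window`), the treatment of gaps as mismatches, and the inclusive window bounds are all assumptions made for illustration and are not taken from the paper.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumptions of this sketch (not from the paper):
    - columns where both sequences have a gap ('-') are skipped;
    - a gap opposite a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    matches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column, ignore it
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def in_identity_window(identity: float, low: float = 75.0, high: float = 99.0) -> bool:
    """True if an identity score falls in the (assumed inclusive) window."""
    return low <= identity <= high


if __name__ == "__main__":
    # Toy aligned sequences, invented purely for illustration.
    score = percent_identity("ATGCATGC", "ATGCATGA")
    print(score, in_identity_window(score))
```

With a table of per-gene identity scores, the same filter could be applied to coding and upstream non-coding alignments separately to compare their levels of variation.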

Keywords: Aspergillus fumigatus; divergence; evolution; fungal genomics; non-coding region; pathobiology; polymorphism; selection; strain; strain heterogeneity; virulence factor.