Integrating whole genome sequencing, methylation, gene expression, topological associated domain information in regulatory mutation prediction: A study of follicular lymphoma

Comput Struct Biotechnol J. 2022 Mar 23;20:1726-1742. doi: 10.1016/j.csbj.2022.03.023. eCollection 2022.

Abstract

A major challenge in human genetics is the analysis of the interplay between genetic and epigenetic factors in multifactorial diseases such as cancer. Here, a novel methodology is proposed to investigate genome-wide regulatory mechanisms in cancer, using follicular lymphoma (FL) as a case study. In the first phase, a new machine-learning method is designed to identify differentially methylated regions (DMRs) by computing six attributes. In the second phase, an integrative data analysis method is developed to study regulatory mutations in FL by combining differential methylation information with DNA sequence variation, differential gene expression, the 3D organization of the genome (e.g., topologically associated domains), and enriched biological pathways. The resulting mutation block-gene pairs are then ranked to identify the significant ones. With this approach, BCL2 and BCL6 were identified as top-ranking FL-related genes, each with several mutation blocks and DMRs acting on its regulatory regions. Two additional genes, CDCA4 and CTSO, also ranked highly, with significant DNA sequence variation and differential methylation in neighboring regions, pointing to their potential use as biomarkers for FL. This work combines genomic and epigenomic information to investigate genome-wide gene regulatory mechanisms in cancer and contributes to devising novel treatment strategies.
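The abstract does not specify the six DMR attributes, the ranking function, or any thresholds, so the following is only a minimal illustrative sketch of the general idea of linking mutation blocks, DMRs, differentially expressed genes, and TADs. It assumes (1) a single DMR attribute, the group mean difference (GMD) of methylation beta values between FL and normal samples, with a hypothetical cutoff of 0.2; (2) that a mutation block and a gene are paired only when both lie inside the same TAD; and (3) a toy additive score. The names `call_dmrs`, `containing_tad`, and `rank_pairs`, the gene and block names, and all coordinates are hypothetical and do not represent the authors' method or data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    def within(self, other: "Interval") -> bool:
        """True if this interval lies entirely inside `other`."""
        return (self.chrom == other.chrom
                and other.start <= self.start and self.end <= other.end)


def group_mean_difference(tumor: list[float], normal: list[float]) -> float:
    """GMD of one region: mean beta value in tumor minus mean beta value in normal."""
    return mean(tumor) - mean(normal)


def call_dmrs(regions, min_abs_gmd: float = 0.2) -> list[Interval]:
    """Keep regions whose |GMD| reaches a hypothetical cutoff.

    regions: iterable of (Interval, tumor_betas, normal_betas).
    """
    return [iv for iv, t, n in regions
            if abs(group_mean_difference(t, n)) >= min_abs_gmd]


def containing_tad(iv: Interval, tads: list[Interval]) -> Optional[Interval]:
    """Return the first TAD that fully contains `iv`, or None."""
    for tad in tads:
        if iv.within(tad):
            return tad
    return None


def rank_pairs(blocks, genes, dmrs, tads, degs):
    """Pair mutation blocks with genes in the same TAD and rank by a toy score.

    blocks: {name: (Interval, n_snvs)}; genes: {name: Interval}; degs: set of DEG names.
    Toy score (hypothetical) = SNVs in block + DMRs in the shared TAD + 2 if the gene is a DEG.
    """
    ranked = []
    for bname, (biv, n_snvs) in blocks.items():
        tad = containing_tad(biv, tads)
        if tad is None:
            continue  # block outside any known TAD: no pairing in this sketch
        n_dmrs = sum(d.within(tad) for d in dmrs)
        for gname, giv in genes.items():
            if containing_tad(giv, tads) == tad:
                score = n_snvs + n_dmrs + (2 if gname in degs else 0)
                ranked.append((bname, gname, score))
    # Highest score first; names break ties deterministically.
    return sorted(ranked, key=lambda r: (-r[2], r[0], r[1]))


if __name__ == "__main__":
    tads = [Interval("chr1", 0, 1_000_000), Interval("chr1", 1_000_000, 2_000_000)]
    regions = [
        (Interval("chr1", 100_000, 101_000), [0.80, 0.85, 0.90], [0.20, 0.25, 0.30]),  # GMD 0.6
        (Interval("chr1", 500_000, 501_000), [0.50, 0.52], [0.49, 0.51]),              # GMD 0.01
    ]
    dmrs = call_dmrs(regions)
    blocks = {"block1": (Interval("chr1", 200_000, 205_000), 5),
              "block2": (Interval("chr1", 1_500_000, 1_501_000), 1)}
    genes = {"geneA": Interval("chr1", 300_000, 350_000),
             "geneB": Interval("chr1", 1_200_000, 1_250_000)}
    print(rank_pairs(blocks, genes, dmrs, tads, degs={"geneA"}))
    # → [('block1', 'geneA', 8), ('block2', 'geneB', 1)]
```

Restricting pairs to a shared TAD reflects the common assumption that regulatory elements mostly act on genes within the same 3D domain; a real pipeline would replace the toy score with a statistically calibrated ranking.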

Keywords: 3D chromatin domain; Cancer; Epigenome; Genome; Integrative data analysis; Machine learning; Regulatory mutation; T-distributed stochastic neighbor embedding, t-SNE; differentially expressed gene, DEG; differentially methylated region, DMR; follicular lymphoma, FL; group mean difference, GMD; principal component analysis, PCA; single nucleotide variation, SNV; topologically associated domain, TAD.