FastMix: a versatile data integration pipeline for cell type-specific biomarker inference

Bioinformatics. 2022 Oct 14;38(20):4735-4744. doi: 10.1093/bioinformatics/btac585.

Abstract

Motivation: Flow cytometry (FCM) and transcription profiling are the two widely used assays in translational immunology research. However, there is no data integration pipeline for analyzing these two types of assays together with experiment variables for biomarker inference. Current FCM data analysis mainly relies on subjective manual gating analysis, which is difficult to be directly integrated with other automated computational methods. Existing deconvolutional analysis of bulk transcriptomics relies on predefined marker genes in the transcriptomics data, which are unavailable for novel cell types and does not utilize the FCM data that provide canonical phenotypic definitions of the cell types.

Results: We developed a novel analytics pipeline-FastMix-for computational immunology, which integrates flow cytometry, bulk transcriptomics and clinical covariates for identifying cell type-specific gene expression signatures and biomarker genes. FastMix addresses the 'large p, small n' problem in the gene expression and flow cytometry integration analysis via a linear mixed effects model (LMER) for both cross-sectional and longitudinal studies. Its novel moment-based estimator not only reduces bias in parameter estimation but also is more efficient than iterative optimization. The FastMix pipeline also includes a cutting-edge flow cytometry data analysis method-DAFi-for identifying cell populations of interest and their characteristics. Simulation studies showed that FastMix produced smaller type I/II errors than competing methods. Validation using real data of two vaccine studies showed that FastMix identified a consistent set of signature genes as in independent single-cell RNA-seq analysis, producing additional interesting findings.

Availability and implementation: Source code of FastMix is publicly available at https://github.com/terrysun0302/FastMix.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers
  • Cross-Sectional Studies
  • Data Analysis
  • Software*
  • Transcriptome*

Substances

  • Biomarkers