Modeling of inter-sample variation in flow cytometric data with the joint clustering and matching procedure

Cytometry A. 2016 Jan;89(1):30-43. doi: 10.1002/cyto.a.22789. Epub 2015 Oct 22.

Abstract

We present an algorithm for modeling flow cytometry data in the presence of large inter-sample variation. Large-scale cytometry datasets often exhibit within-class variation due to technical effects, such as instrumental differences and variations in data acquisition, as well as subtle biological heterogeneity within the class of samples. Failure to account for such variation in the model may lead to inaccurate matching of populations across a batch of samples and poor performance in classification of unlabeled samples. In this paper, we describe the Joint Clustering and Matching (JCM) procedure for simultaneous segmentation and alignment of cell populations across multiple samples. Under the JCM framework, a multivariate mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample. The components of the mixture model may then correspond to the various populations of cells with similar marker expressions (that is, clusters) that make up the sample. For each class of samples, an overall class template is formed by adopting random-effects terms to model the inter-sample variation within the class. The construction of a parametric template for each class allows for direct quantification of the differences between the template and each sample, and also between each pair of samples, both within and between classes. A new unclassified sample is then classified by assigning it to the class whose template density is closest to the sample's fitted mixture density. For illustration, we use a symmetric form of the Kullback-Leibler divergence as the distance measure between two densities, but other distance measures can also be applied.
We demonstrate on four real datasets how the JCM procedure can be used to carry out automated clustering and alignment of cell populations, and supervised classification of samples.
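
The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the JCM implementation: Gaussian mixtures stand in for the skew mixture models named in the keywords, the symmetric Kullback-Leibler divergence is approximated by Monte Carlo sampling, and all function names and mixture parameters (gmm_logpdf, gmm_sample, symmetric_kl, classify, the templates "A" and "B") are hypothetical and chosen only for the example.

```python
import numpy as np

def gmm_logpdf(x, weights, means, covs):
    """Log-density of a Gaussian mixture evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    comp = []
    for w, m, c in zip(weights, means, covs):
        chol = np.linalg.cholesky(np.asarray(c, dtype=float))
        z = np.linalg.solve(chol, (x - np.asarray(m, dtype=float)).T)
        maha = np.sum(z ** 2, axis=0)                # squared Mahalanobis distance
        logdet = 2.0 * np.sum(np.log(np.diag(chol)))
        comp.append(np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    comp = np.array(comp)                            # shape (components, points)
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))  # log-sum-exp

def gmm_sample(n, weights, means, covs, rng):
    """Draw n points from a Gaussian mixture."""
    labels = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in labels])

def symmetric_kl(p, q, n=5000, seed=0):
    """Monte Carlo estimate of KL(p||q) + KL(q||p) for two mixtures."""
    rng = np.random.default_rng(seed)
    xp = gmm_sample(n, *p, rng)
    xq = gmm_sample(n, *q, rng)
    kl_pq = np.mean(gmm_logpdf(xp, *p) - gmm_logpdf(xp, *q))
    kl_qp = np.mean(gmm_logpdf(xq, *q) - gmm_logpdf(xq, *p))
    return kl_pq + kl_qp

def classify(sample, templates):
    """Assign the sample to the class whose template is nearest."""
    dists = {name: symmetric_kl(sample, t) for name, t in templates.items()}
    return min(dists, key=dists.get), dists

# Made-up two-marker, two-population mixtures: (weights, means, covariances).
eye = np.eye(2)
templates = {
    "A": ([0.6, 0.4], [np.array([0.0, 0.0]), np.array([3.0, 3.0])], [eye, eye]),
    "B": ([0.6, 0.4], [np.array([0.0, 0.0]), np.array([6.0, 0.0])], [eye, eye]),
}
# A hypothetical fitted density for a new sample, close to template "A".
new_sample = ([0.55, 0.45],
              [np.array([0.2, -0.1]), np.array([3.1, 2.9])],
              [eye, eye])

label, dists = classify(new_sample, templates)
print(label, dists)
```

In this sketch each density is a tuple of weights, means, and covariance matrices. The Monte Carlo estimate is used because the Kullback-Leibler divergence between two mixtures has no closed form in general; any other distance between densities could be substituted in symmetric_kl, as the abstract notes.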

Keywords: EM algorithm; JCM; class template; classification; clustering; flow cytometry; inter-sample variation; matching; skew mixture models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biomarkers / blood*
  • Cluster Analysis
  • Computational Biology / methods*
  • Data Interpretation, Statistical
  • Electronic Data Processing / methods*
  • Flow Cytometry / methods*
  • Humans
  • Leukemia, Myeloid, Acute / diagnosis
  • Lymphoma, Follicular / diagnosis
  • Membrane Proteins / analysis*
  • Models, Theoretical
  • Pattern Recognition, Automated / methods*
  • West Nile Fever / diagnosis

Substances

  • Biomarkers
  • Membrane Proteins