Modeling of inter-sample variation in flow cytometric data with the joint clustering and matching procedure

Sharon X Lee; Geoffrey J McLachlan; Saumyadipta Pyne

doi:10.1002/cyto.a.22789


Cytometry A. 2016 Jan;89(1):30-43. doi: 10.1002/cyto.a.22789. Epub 2015 Oct 22.

Authors

Sharon X Lee¹, Geoffrey J McLachlan¹, Saumyadipta Pyne²,³

Affiliations

¹ Department of Mathematics, University of Queensland, St. Lucia, Queensland, 4072, Australia.
² Indian Institute of Public Health Hyderabad (IIPHH), Plot No. 1, A.N.V. Arcade, Amar Co-op Society, Kavuri Hills, Madhapur, Hyderabad, AP, 500033, India.
³ CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, University of Hyderabad Campus, Hyderabad, AP, 500046, India.

PMID: 26492316
DOI: 10.1002/cyto.a.22789

Abstract

We present an algorithm for modeling flow cytometry data in the presence of large inter-sample variation. Large-scale cytometry datasets often exhibit within-class variation arising from technical effects, such as instrumental differences and variations in data acquisition, as well as from subtle biological heterogeneity within a class of samples. Failure to account for such variation in the model may lead to inaccurate matching of populations across a batch of samples and to poor performance in the classification of unlabeled samples. In this paper, we describe the Joint Clustering and Matching (JCM) procedure for the simultaneous segmentation and alignment of cell populations across multiple samples. Under the JCM framework, a multivariate mixture distribution is used to model the expressions of a fixed set of markers for each cell in a sample, so that the components of the mixture model can correspond to the cell populations (that is, clusters of cells with similar marker expressions) that make up the sample. For each class of samples, an overall class template is formed by adopting random-effects terms to model the inter-sample variation within the class. Constructing a parametric template for each class allows direct quantification of the differences between the template and each sample, and also between each pair of samples, whether within or between classes. A new unclassified sample is then classified by assigning it to the class whose template density is closest to the sample's fitted mixture density. For illustration, we use a symmetric form of the Kullback-Leibler divergence as the distance measure between two densities, but other distance measures can also be applied. We demonstrate on four real datasets how the JCM procedure can be used to carry out automated clustering and alignment of cell populations, and supervised classification of samples.
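The nearest-template classification rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the JCM implementation: mixture components are taken to be multivariate normal (JCM itself uses skew mixture models), the class templates are fixed mixtures with made-up parameters rather than templates estimated with random-effects terms via the EM algorithm, and the symmetric Kullback-Leibler divergence is taken as KL(p‖q) + KL(q‖p), estimated by Monte Carlo because it has no closed form for mixtures. All function names and the "healthy"/"disease" class labels are hypothetical.

```python
import numpy as np

# Assumption: a mixture is a tuple (weights, means, covs) with Gaussian components.
# This is a hypothetical illustration, not the JCM code.

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of x."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    # Solve L z = (x - mean)^T so that sum(z**2) is the squared Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def mixture_logpdf(x, weights, means, covs):
    """Log-density of a finite mixture, combining components with log-sum-exp."""
    comp = np.stack([np.log(w) + mvn_logpdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)])
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))

def sample_mixture(rng, n, weights, means, covs):
    """Draw n points: pick a component by weight, then sample from it."""
    labels = rng.choice(len(weights), size=n, p=weights)
    out = np.empty((n, means[0].shape[0]))
    for k in range(len(weights)):
        idx = labels == k
        out[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return out

def symmetric_kl(p, q, rng, n=20000):
    """Monte Carlo estimate of KL(p||q) + KL(q||p) between two mixtures."""
    xp = sample_mixture(rng, n, *p)
    xq = sample_mixture(rng, n, *q)
    kl_pq = np.mean(mixture_logpdf(xp, *p) - mixture_logpdf(xp, *q))
    kl_qp = np.mean(mixture_logpdf(xq, *q) - mixture_logpdf(xq, *p))
    return kl_pq + kl_qp

def classify(sample_fit, templates, rng):
    """Assign the sample to the class template with the smallest divergence."""
    dists = {name: symmetric_kl(sample_fit, tmpl, rng)
             for name, tmpl in templates.items()}
    return min(dists, key=dists.get), dists

# Usage with made-up two-marker templates (hypothetical parameters).
eye = np.eye(2)
healthy = (np.array([0.6, 0.4]), [np.array([0.0, 0.0]), np.array([4.0, 4.0])], [eye, eye])
disease = (np.array([0.3, 0.7]), [np.array([0.0, 0.0]), np.array([4.0, -4.0])], [eye, eye])
new_sample = (np.array([0.35, 0.65]),
              [np.array([0.2, -0.1]), np.array([3.8, -4.1])], [eye, 1.2 * eye])

templates = {"healthy": healthy, "disease": disease}
label, dists = classify(new_sample, templates, np.random.default_rng(0))
print(label, dists)
```

Monte Carlo is used here only because it is simple and works for any mixture whose density can be evaluated and sampled; with skew components the same rule applies once `mvn_logpdf` and the sampler are replaced by the corresponding skew density and generator.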

Keywords: EM algorithm; JCM; class template; classification; clustering; flow cytometry; inter-sample variation; matching; skew mixture models.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Biomarkers / blood*
Cluster Analysis
Computational Biology / methods*
Data Interpretation, Statistical
Electronic Data Processing / methods*
Flow Cytometry / methods*
Humans
Leukemia, Myeloid, Acute / diagnosis
Lymphoma, Follicular / diagnosis
Membrane Proteins / analysis*
Models, Theoretical
Pattern Recognition, Automated / methods*
West Nile Fever / diagnosis

Substances

Biomarkers
Membrane Proteins