An overview of clustering applied to molecular biology

Methods Mol Biol. 2010;620:369-404. doi: 10.1007/978-1-60761-580-4_12.

Abstract

In molecular biology, we are often interested in determining the group structure in, e.g., a population of cells or microarray gene expression data. Clustering methods identify groups of similar observations, but the results can depend on the chosen method's assumptions and starting parameter values. In this chapter, we give a broad overview of both attribute- and similarity-based clustering, describing both the methods and their performance. The parametric and nonparametric approaches presented differ in whether they require the number of clusters to be known in advance and in the shapes of the clusters they estimate. Additionally, we include a biclustering algorithm that incorporates variable selection into the clustering procedure. We finish with a discussion of some common methods for comparing two clustering solutions (possibly from different methods). The user is advised to devote time and attention to determining the appropriate clustering approach (and any corresponding parameter values) for the specific application prior to analysis.
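
As an illustration of what comparing two clustering solutions can look like in practice, the following minimal Python sketch computes the Rand index, the fraction of observation pairs on which two labelings agree (both place the pair in the same cluster, or both place it in different clusters). The abstract does not say which comparison measures the chapter covers, so the choice of the Rand index, the function name rand_index, and the toy labels are assumptions made here for illustration only.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the Rand index between two clusterings of the same observations.

    Each argument is a sequence of cluster labels, one per observation.
    The label values themselves are arbitrary; only co-membership matters.
    (Illustrative helper, not taken from the chapter.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are required")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement when both solutions treat it alike.
        if same_in_a == same_in_b:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Hypothetical labels for six observations from two clustering runs.
solution_1 = [0, 0, 1, 1, 2, 2]
solution_2 = [1, 1, 0, 0, 0, 2]
score = rand_index(solution_1, solution_2)
print(f"Rand index: {score:.2f}")
```

A value of 1 means the two solutions induce the same partition, even if the cluster labels are permuted. The unadjusted index does not correct for chance agreement, which is one reason a chance-adjusted variant is often used instead.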

Publication types

  • Review

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Models, Statistical
  • Molecular Biology / methods*