ConsensusCluster: a software tool for unsupervised cluster discovery in numerical data

OMICS. 2010 Feb;14(1):109-13. doi: 10.1089/omi.2009.0083.

Abstract

We have created a stand-alone software tool, ConsensusCluster, for the analysis of high-dimensional single nucleotide polymorphism (SNP) and gene expression microarray data. Our software implements the consensus clustering algorithm and principal component analysis to stratify the data into a given number of robust clusters. The robustness is achieved by combining clustering results from data and sample resampling as well as by averaging over various algorithms and parameter settings to achieve accurate, stable clustering results. We have implemented several different clustering algorithms in the software, including K-Means, Partition Around Medoids, Self-Organizing Map, and Hierarchical clustering methods. After clustering the data, ConsensusCluster generates a consensus matrix heatmap to give a useful visual representation of cluster membership, and automatically generates a log of selected features that distinguish each pair of clusters. ConsensusCluster gives more robust and more reliable clusters than common software packages and, therefore, is a powerful unsupervised learning tool that finds hidden patterns in data that might shed light on its biological interpretation. This software is free and available from http://code.google.com/p/consensus-cluster .

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis*
  • Principal Component Analysis
  • Software