ClusterSignificance: a bioconductor package facilitating statistical analysis of class cluster separations in dimensionality reduced data

Bioinformatics. 2017 Oct 1;33(19):3126-3128. doi: 10.1093/bioinformatics/btx393.

Abstract

Summary: Multi-dimensional data generated via high-throughput experiments are increasingly used in conjunction with dimensionality reduction methods to ascertain whether the resulting separations of the data correspond to known classes. This is particularly useful for determining whether a subset of the variables, e.g. genes in a specific pathway, can alone separate samples into these established classes. Despite this, the evaluation of class separations is often subjective and performed via visualization. Here we present the ClusterSignificance package, a set of tools designed to assess the statistical significance of class separations downstream of dimensionality reduction algorithms. In addition, we demonstrate the design and utility of the ClusterSignificance package and use it to determine the importance of long non-coding RNA expression in the identity of multiple hematological malignancies.
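
The general idea of testing a class separation statistically, rather than judging it by eye, can be illustrated with a label-permutation test. The following is a minimal sketch only: it is written in Python rather than R, it does not use or reproduce the ClusterSignificance API or its projection and scoring steps, and the separation statistic (distance between two class centroids), the function names and the toy coordinates are all assumptions made for illustration.

```python
import math
import random


def centroid(points):
    """Mean position of a list of equal-length coordinate lists."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def centroid_distance(points, labels):
    """Hypothetical separation score: distance between the two class centroids."""
    first, second = sorted(set(labels))  # assumes exactly two classes
    a = [p for p, lab in zip(points, labels) if lab == first]
    b = [p for p, lab in zip(points, labels) if lab == second]
    return math.dist(centroid(a), centroid(b))


def permutation_pvalue(points, labels, n_perm=1000, seed=0):
    """Share of label shufflings that separate the classes at least as well
    as the observed labels (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = centroid_distance(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between position and class
        if centroid_distance(points, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy coordinates standing in for samples after dimensionality reduction.
points = [
    [0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.0, 0.3], [0.4, 0.2],
    [2.1, 2.0], [2.3, 2.2], [1.9, 2.4], [2.2, 1.8], [2.0, 2.1],
]
labels = ["A"] * 5 + ["B"] * 5

print(permutation_pvalue(points, labels))
```

A small p-value here indicates that random reassignment of class labels rarely yields a separation as large as the observed one. The choice of statistic is the main design decision in such a test; the centroid distance is used only because it is simple to follow.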

Availability and implementation: ClusterSignificance is an R package available via Bioconductor (https://bioconductor.org/packages/ClusterSignificance) under GPL-3.

Contact: dan.grander@ki.se.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Data Interpretation, Statistical
  • Gene Expression Profiling / methods*
  • Hematologic Neoplasms / genetics
  • Hematologic Neoplasms / metabolism
  • Humans
  • RNA, Long Noncoding / metabolism
  • Software*

Substances

  • RNA, Long Noncoding