Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data

J Neurosci Methods. 2014 Oct 30:236:19-25. doi: 10.1016/j.jneumeth.2014.08.001. Epub 2014 Aug 10.

Abstract

Background: Machine learning studies in neuroimaging have largely used supervised algorithms, which require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be 'trained' for a prediction task. This approach may be suboptimal, or not possible at all, when the global structure of the data is not well known and the researcher has no a priori model to fit to the data.

New method: We set out to investigate the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated, and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm.
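
The clustering step described above can be illustrated with a minimal sketch. It assumes a 2D embedding has already been produced (the t-SNE step itself is not reproduced here), and the coordinates, the function name kmeans, and the farthest-point initialisation are all hypothetical choices made for illustration; the abstract does not state how K-means was configured.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm on a list of equal-length coordinate tuples."""
    # Deterministic farthest-point initialisation (an assumption, chosen
    # here only so the example is reproducible).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids

# Hypothetical 2D embedding coordinates, one tuple per subject.
embedding = [
    (-10.2, 3.1), (-9.8, 2.7), (-11.0, 3.5),
    (9.9, -2.8), (10.4, -3.3), (11.1, -2.5),
]
labels, centroids = kmeans(embedding, k=2)
print(labels)
```

With k=2, the two well-separated groups in this toy embedding end up in different clusters; on real data the result would depend on the embedding and on the initialisation.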

Comparison with existing methods: t-SNE was evaluated against classical principal component analysis.

Conclusion: Based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters that corresponded to subjects' gender labels (cluster silhouette index = 0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model that correctly identified the gender of 93.5% of subjects. From a neuropsychiatric perspective, this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders.
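
As a rough illustration of the two quantities mentioned above, the sketch below computes a mean silhouette index for a given clustering and assigns a new point to the nearest cluster centroid. It is one plausible reading of a "minimum distance" model, assuming Euclidean distance and cluster means as prototypes; the abstract does not specify these details, and all data values and function names are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def assign_min_distance(point, centroids):
    """Return the index of the nearest centroid (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))

def silhouette_index(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(points)
                      if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        if labels[i] not in mean_dist:
            scores.append(0.0)  # singleton cluster: silhouette taken as 0
            continue
        a = mean_dist[labels[i]]  # mean distance within own cluster
        b = min(d for c, d in mean_dist.items() if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Hypothetical 2D embedding and cluster assignments.
points = [
    (-10.2, 3.1), (-9.8, 2.7), (-11.0, 3.5),
    (9.9, -2.8), (10.4, -3.3), (11.1, -2.5),
]
cluster_labels = [0, 0, 0, 1, 1, 1]

centroids = [
    centroid([p for p, lab in zip(points, cluster_labels) if lab == c])
    for c in (0, 1)
]
new_subject = (8.7, -1.9)  # hypothetical unseen point
assigned = assign_min_distance(new_subject, centroids)
score = silhouette_index(points, cluster_labels)
print(assigned, round(score, 2))
```

A silhouette value close to 1 indicates compact, well-separated clusters, while values near 0 indicate overlapping ones. The toy data here are far more cleanly separated than real scan data would be, so the score it produces should not be compared with the reported 0.79.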

Keywords: Big data; Multimodal neuroimaging; Research domain criteria (RDoC); Unsupervised machine learning; t-Distributed stochastic neighbour embedding (t-SNE).

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Algorithms
  • Artificial Intelligence*
  • Brain / anatomy & histology
  • Brain / physiology
  • Cluster Analysis
  • Female
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Magnetic Resonance Imaging / methods*
  • Male
  • Middle Aged
  • Multimodal Imaging / methods*
  • Neuroimaging / methods*
  • Principal Component Analysis
  • Sex Characteristics
  • Stochastic Processes
  • Young Adult