Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data

Benson Mwangi; Jair C Soares; Khader M Hasan

doi:10.1016/j.jneumeth.2014.08.001

J Neurosci Methods. 2014 Oct 30;236:19-25. doi: 10.1016/j.jneumeth.2014.08.001. Epub 2014 Aug 10.

Authors

Benson Mwangi¹, Jair C Soares², Khader M Hasan³

Affiliations

¹ UT Center of Excellence on Mood Disorders, Department of Psychiatry and Behavioral Sciences, UT Houston Medical School, Houston, TX, USA. Electronic address: benson.irungu@uth.tmc.edu.
² UT Center of Excellence on Mood Disorders, Department of Psychiatry and Behavioral Sciences, UT Houston Medical School, Houston, TX, USA.
³ The University of Texas Health Science Center at Houston, Department of Diagnostic & Interventional Imaging, Houston, TX, USA.

PMID: 25117552
DOI: 10.1016/j.jneumeth.2014.08.001

Abstract

Background: Neuroimaging machine learning studies have largely used supervised algorithms, which require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be 'trained' for a prediction task. Notably, this approach may not be optimal, or even possible, when the global structure of the data is not well known and the researcher has no a priori model to fit to the data.

New method: We investigated the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated, and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized in a 2D scatter plot and further analyzed using the K-means clustering algorithm.
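
The embedding and clustering steps described above can be sketched roughly as follows. This is an illustration only, not the authors' implementation: it assumes scikit-learn is available, and the feature matrix is synthetic random data standing in for the atlas-based multimodal features. The function name `embed_and_cluster`, the perplexity value, and the seeds are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def embed_and_cluster(features, n_clusters=2, perplexity=30.0, seed=0):
    """Embed subjects in 2D with t-SNE, then cluster the embedding with K-means."""
    # t-SNE maps each subject's high-dimensional feature vector to a 2D point.
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(features)
    # K-means is run on the 2D embedding, not on the original features.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(
        embedding
    )
    return embedding, labels


# Synthetic stand-in data (assumption): 92 "subjects", 50 features each,
# drawn from two groups whose feature means differ.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 46)
features = rng.normal(size=(92, 50)) + 3.0 * group[:, None]

embedding, labels = embed_and_cluster(features)
```

The 2D `embedding` is what would be drawn as a scatter plot; `labels` holds the K-means cluster assignment for each subject.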

Comparison with existing methods: t-SNE was evaluated against classical principal component analysis.

Conclusion: Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters that corresponded to subjects' gender labels (cluster silhouette index = 0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model that correctly identified the gender of 93.5% of subjects. From a neuropsychiatric perspective, this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders.
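
The two quantities mentioned above, a cluster silhouette index and a minimum distance model, can be illustrated with a small standard-library sketch. It assumes Euclidean distance and reads "minimum distance" as assignment to the nearest cluster centroid, which is one common interpretation and may differ from the authors' exact procedure. The points are toy 2D coordinates, not study data, and the function names are hypothetical.

```python
import math


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return groups


def cluster_centroids(points, labels):
    """Mean coordinate of each cluster."""
    groups = group_by_label(points, labels)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


def nearest_centroid(point, centroids):
    """Minimum distance rule: return the label of the closest centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


def silhouette_index(points, labels):
    """Mean silhouette value over all points (Euclidean distance)."""
    groups = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance to itself is 0, so dividing by len(own) - 1 excludes it).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in groups.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2D points (assumption), standing in for an embedding with two clusters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]

centroids = cluster_centroids(points, labels)
score = silhouette_index(points, labels)
new_label = nearest_centroid((2, 2), centroids)
```

A silhouette index close to 1 indicates compact, well-separated clusters; values near 0 indicate overlapping clusters.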

Keywords: Big data; Multimodal neuroimaging; Research domain criteria (RDoC); Unsupervised machine learning; t-Distributed stochastic neighbour embedding (t-SNE).

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Adolescent
Adult
Algorithms
Artificial Intelligence*
Brain / anatomy & histology
Brain / physiology
Cluster Analysis
Female
Humans
Image Processing, Computer-Assisted / methods*
Magnetic Resonance Imaging / methods*
Male
Middle Aged
Multimodal Imaging / methods*
Neuroimaging / methods*
Principal Component Analysis
Sex Characteristics
Stochastic Processes
Young Adult
