A Novel Method for Clustering Cellular Data to Improve Classification

ArXiv [Preprint]. 2024 Mar 5:arXiv:2403.03318v1.

Abstract

Many fields, such as neuroscience, are experiencing the vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.

Publication types

  • Preprint