clusterMLD: An Efficient Hierarchical Clustering Method for Multivariate Longitudinal Data

Junyi Zhou; Ying Zhang; Wanzhu Tu

doi:10.1080/10618600.2022.2149540

J Comput Graph Stat. 2023;32(3):1131-1144. doi: 10.1080/10618600.2022.2149540. Epub 2023 Jan 12.

Authors

Junyi Zhou¹, Ying Zhang², Wanzhu Tu¹

Affiliations

¹ Department of Biostatistics and Health Data Science, Indiana University.
² Department of Biostatistics, University of Nebraska Medical Center.

Abstract

Clustering longitudinal data is challenging because the grouping has to account for the similarity of individual trajectories when observation times are sparse and irregular. This paper puts forward a hierarchical agglomerative clustering method based on a dissimilarity metric that quantifies the cost of merging two distinct groups of curves, with the repeatedly measured data represented by B-splines. Extensive simulations show that the proposed method performs better in determining the number of clusters, in classifying individuals into the correct clusters, and in computational efficiency. Importantly, the method is suitable not only for clustering multivariate longitudinal data with sparse and irregular measurements but also for densely measured functional data. To this end, we provide an R package that implements such analyses. To illustrate the use of the proposed clustering method, two large data sets from real-world clinical studies are analyzed.
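The idea of agglomerative clustering driven by a merging cost can be sketched in a few lines. The Python example below is only an illustration under stated assumptions, not the authors' implementation or the interface of their R package. It assumes a single outcome, a fixed cubic B-spline basis with one interior knot, a merging cost defined as the increase in pooled residual sum of squares (a Ward-type stand-in for the paper's dissimilarity metric), and a number of clusters supplied by the user. The names `bspline_basis`, `merge_cost`, and `agglomerate` are hypothetical.

```python
import numpy as np

def bspline_basis(t, knots, degree=3):
    """Evaluate every B-spline basis function at times t (Cox-de Boor recursion)."""
    knots = np.asarray(knots, dtype=float)
    # Nudge t == last knot into the final interval so the basis still sums to 1 there.
    t = np.clip(np.asarray(t, dtype=float), knots[0], np.nextafter(knots[-1], -np.inf))
    # Degree 0: indicator of each half-open knot interval (empty for repeated knots).
    B = np.stack([(knots[i] <= t) & (t < knots[i + 1])
                  for i in range(len(knots) - 1)], axis=1).astype(float)
    for d in range(1, degree + 1):
        cols = []
        for i in range(len(knots) - 1 - d):
            col = np.zeros_like(t)
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                col += (t - knots[i]) / left * B[:, i]
            if right > 0:
                col += (knots[i + d + 1] - t) / right * B[:, i + 1]
            cols.append(col)
        B = np.stack(cols, axis=1)
    return B

def rss(X, y):
    """Residual sum of squares of the least-squares spline fit to one group."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return float(r @ r)

def merge_cost(a, b):
    """Assumed cost: extra RSS incurred by fitting one shared curve to both groups."""
    X = np.vstack([a[0], b[0]])
    y = np.concatenate([a[1], b[1]])
    return rss(X, y) - rss(*a) - rss(*b)

def agglomerate(subjects, knots, n_clusters):
    """Greedy bottom-up merging; subjects is a list of (times, values) pairs."""
    clusters = [([i], bspline_basis(t, knots), np.asarray(y, dtype=float))
                for i, (t, y) in enumerate(subjects)]
    while len(clusters) > n_clusters:
        # Exhaustive search for the cheapest pair to merge (fine for a small demo).
        _, i, j = min((merge_cost(clusters[i][1:], clusters[j][1:]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        a, b = clusters[i], clusters[j]
        merged = (a[0] + b[0], np.vstack([a[1], b[1]]), np.concatenate([a[2], b[2]]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    labels = np.empty(len(subjects), dtype=int)
    for label, (members, _, _) in enumerate(clusters):
        labels[members] = label
    return labels

# Synthetic demo (made-up data): two groups with irregular observation times.
rng = np.random.default_rng(0)
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]      # clamped cubic knots, one interior knot
truth = [0] * 6 + [1] * 6
subjects = []
for g in truth:
    n_obs = rng.integers(8, 13)             # each subject has its own number of visits
    t = np.sort(rng.uniform(0, 1, size=n_obs))
    mean = t if g == 0 else 3 - t           # group-specific mean trajectory
    subjects.append((t, mean + rng.normal(0, 0.05, size=n_obs)))

labels = agglomerate(subjects, knots, n_clusters=2)
print(labels)
```

Because every subject is projected onto the same basis, irregular visit times pose no difficulty: groups are compared through their fitted curves rather than through observations aligned in time. The sketch refits every candidate pair at each step, which is adequate for a dozen subjects but makes no attempt to reproduce the computational efficiency reported in the abstract.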

Keywords: B-splines; Dissimilarity metric; Functional Data; Longitudinal data; Multiple outcomes.
