Deconstructing heterogeneity in schizophrenia through language: a semi-automated linguistic analysis and data-driven clustering approach

Schizophrenia (Heidelb). 2022 Nov 29;8(1):102. doi: 10.1038/s41537-022-00306-z.

Abstract

Previous work has highlighted the relevance of automated language analysis for predicting diagnosis in schizophrenia, but deeper language-based, data-driven investigation of clinical heterogeneity across the illness course has generally been neglected. Here we used a semi-automated multidimensional linguistic analysis, innovatively combined with a machine-driven clustering technique, to characterize the speech of 67 individuals with schizophrenia. Clusters were then compared on psychopathological, cognitive, and functional characteristics. We identified two subgroups with distinctive linguistic profiles: one with higher fluency and lower lexical variety but greater use of psychological lexicon; the other with reduced fluency and greater lexical variety but reduced use of psychological lexicon. The former cluster was associated with lower symptom levels and better quality of life, pointing to the existence of specific language profiles that also show clinically meaningful differences. These findings highlight the importance of treating language disturbances in schizophrenia as multifaceted and of approaching them in automated, data-driven ways.
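
As a rough illustration of how multidimensional linguistic features can be grouped in a data-driven way, the sketch below standardizes a small feature matrix and partitions it into two clusters. It is not the authors' pipeline: the abstract does not name the clustering algorithm, so k-means with a deterministic farthest-first initialization is an assumption, as are the three feature columns (a fluency measure, a lexical variety measure, and a psychological lexicon proportion) and every numeric value, which are synthetic toy data rather than study results.

```python
import math

def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k=2, max_iters=50):
    """Plain k-means (an assumed stand-in for the unspecified clustering step)."""
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))

    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Synthetic toy data (NOT from the study). Hypothetical columns:
# [fluency, lexical_variety, psychological_lexicon_proportion]
features = [
    [140, 0.42, 0.09],
    [150, 0.40, 0.10],
    [145, 0.45, 0.08],
    [90, 0.62, 0.03],
    [85, 0.60, 0.04],
    [95, 0.65, 0.02],
]

labels, centroids = kmeans(standardize(features), k=2)
print(labels)
```

In this toy example the first three rows mimic the higher-fluency, lower-variety profile and the last three the opposite profile, so the two groups fall into separate clusters. With real data, the number of clusters and the choice of algorithm would need to be justified empirically, and cluster membership could then be compared against clinical measures as described above.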