Benchmarking differential abundance methods for finding condition-specific prototypical cells in multi-sample single-cell datasets

Haidong Yi; Alec Plotkin; Natalie Stanley

doi:10.1186/s13059-023-03143-0

Genome Biol. 2024 Jan 3;25(1):9. doi: 10.1186/s13059-023-03143-0.

Authors

Haidong Yi¹, Alec Plotkin², Natalie Stanley³ ⁴

Affiliations

¹ Department of Computer Science, University of North Carolina at Chapel Hill, 27599, Chapel Hill, NC, USA.
² Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, 27599, Chapel Hill, NC, USA.
³ Department of Computer Science, University of North Carolina at Chapel Hill, 27599, Chapel Hill, NC, USA. natalies@cs.unc.edu.
⁴ Computational Medicine Program, University of North Carolina at Chapel Hill, 27599, Chapel Hill, NC, USA. natalies@cs.unc.edu.

Abstract

Background: To analyze the large volume of data generated by single-cell technologies and to identify cellular correlates of particular clinical or experimental outcomes, differential abundance analyses are often applied. These algorithms identify subgroups of cells whose abundances change significantly in response to disease progression, or to an experimental perturbation. Despite the effectiveness of differential abundance analyses in identifying critical cell-states, there is currently no systematic benchmarking study to compare their applicability, usefulness, and accuracy in practice across single-cell modalities.

Results: Here, we perform a comprehensive benchmarking study to objectively evaluate and compare the benefits and potential downsides of current state-of-the-art differential abundance testing methods. We benchmarked six single-cell testing methods on several practical tasks, using both synthetic and real single-cell datasets. The tasks evaluated include effectiveness in identifying true differentially abundant subpopulations, accuracy in handling batch effects, runtime efficiency, and hyperparameter usability and robustness. Based on these evaluation results, this paper gives dataset-specific suggestions for the practical use of differential abundance testing approaches.

Conclusions: Based on our benchmarking study, we provide a set of recommendations for the optimal usage of single-cell DA testing methods in practice, particularly with respect to factors such as the presence of technical noise (for example batch effects), dataset size, and hyperparameter sensitivity.

Keywords: Benchmarking; Clinical phenotyping; Differential abundance (DA); Single-cell bioinformatics.

MeSH terms

Algorithms*
Benchmarking*
Research Design
Single-Cell Analysis / methods
