TMBstable: a variant caller controls performance variation across heterogeneous sequencing samples

Brief Bioinform. 2024 Mar 27;25(3):bbae159. doi: 10.1093/bib/bbae159.

Abstract

Variant calling in cancer genomics has advanced, but traditional evaluations based on mean accuracy are inadequate for counting-based biomarkers such as tumor mutation burden, because calling performance varies significantly across samples and this variation affects immunotherapy patient selection and threshold settings. In this study, we introduce TMBstable, a method that uses a meta-learning framework to dynamically select the optimal variant calling strategy for specific genomic regions, in contrast to traditional callers that apply one uniform strategy to the whole sample. TMBstable first segments the sample into windows and extracts meta-features for clustering, then uses a pre-trained meta-model to select a suitable algorithm for each cluster. This addresses strategy-sample mismatches, reduces performance fluctuations and ensures consistent performance across samples. We evaluated TMBstable on both simulated and real non-small cell lung cancer and nasopharyngeal carcinoma samples and compared it with advanced callers. The assessment involved 300 simulated and 106 real tumor samples and focused on stability measures, such as the variance and coefficient of variation of the false positive rate, false negative rate, precision and recall. In the benchmark, TMBstable showed the lowest variance and coefficient of variation across these performance metrics, indicating superior stability and highlighting its effectiveness in analyzing counting-based biomarkers. The TMBstable algorithm is available at https://github.com/hello-json/TMBstable for academic use only.
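
The stability measures named in the abstract can be illustrated with a minimal sketch. It is not taken from the TMBstable code base: the names `Counts`, `rates` and `stability` are hypothetical, the confusion counts are made-up toy values rather than results, and the choice of the sample (n - 1) variance is an assumption, since the abstract does not say which estimator was used. Given per-sample confusion counts for one caller, the sketch computes the false positive rate, false negative rate, precision and recall for each sample, then the variance and coefficient of variation of each metric across samples.

```python
from dataclasses import dataclass
from statistics import mean, stdev, variance


@dataclass
class Counts:
    """Per-sample confusion counts for one caller (hypothetical container)."""
    tp: int  # true variants that were called
    fp: int  # calls that are not true variants
    fn: int  # true variants that were missed
    tn: int  # non-variant sites correctly left uncalled


def rates(c: Counts) -> dict:
    """Per-sample metrics; assumes every denominator is non-zero."""
    return {
        "fpr": c.fp / (c.fp + c.tn),
        "fnr": c.fn / (c.fn + c.tp),
        "precision": c.tp / (c.tp + c.fp),
        "recall": c.tp / (c.tp + c.fn),
    }


def stability(samples: list) -> dict:
    """Variance and coefficient of variation of each metric across samples.

    Assumption: sample (n - 1) estimators, so at least two samples are needed.
    """
    per_sample = [rates(c) for c in samples]
    summary = {}
    for name in per_sample[0]:
        values = [r[name] for r in per_sample]
        m = mean(values)
        summary[name] = {
            "variance": variance(values),
            # CV = standard deviation / mean; undefined when the mean is zero
            "cv": stdev(values) / m if m else float("nan"),
        }
    return summary


# Toy counts for illustration only; these are not data from the study.
toy_samples = [
    Counts(tp=90, fp=10, fn=10, tn=890),
    Counts(tp=80, fp=20, fn=20, tn=880),
]

for metric, stats in stability(toy_samples).items():
    print(f"{metric}: variance={stats['variance']:.6f}, cv={stats['cv']:.4f}")
```

Under this reading, a caller is more stable when these values are lower: its error rates change less from one sample to the next, regardless of its mean accuracy.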

Keywords: counting-based biomarker; immunotherapy; meta-learning approach; sequencing data analysis; tumor mutation burden; variant calling.

MeSH terms

  • Algorithms
  • Carcinoma, Non-Small-Cell Lung*
  • Genome
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Lung Neoplasms*