Assessing Interobserver Variability in the Delineation of Structures in Radiation Oncology: A Systematic Review

Int J Radiat Oncol Biol Phys. 2023 Apr 1;115(5):1047-1060. doi: 10.1016/j.ijrobp.2022.11.021. Epub 2022 Nov 22.

Abstract

Purpose: The delineation of target volumes and organs at risk is the main source of uncertainty in radiation therapy. Numerous interobserver variability (IOV) studies have been conducted, often with unclear methodology and nonstandardized reporting. We aimed to identify the parameters chosen in conducting delineation IOV studies and to assess their performance and limitations.

Methods and materials: We conducted a systematic literature review to highlight major points of heterogeneity and missing data in IOV studies published between 2018 and 2021. For the most commonly used metrics, we performed in silico analyses to assess their limitations in specific clinical situations.

Results: All disease sites were represented in the 66 studies examined. Organs at risk were studied independently of tumor site in 29% of reviewed IOV studies. Statistical analyses were performed in 65% of studies. No gold standard (GS; ie, reference) was defined in 36% of studies. A single expert was considered as the GS in 21% of studies, without testing intraobserver variability. All studies reported both absolute and relative indices, including the Dice similarity coefficient (DSC) in 68% and the Hausdorff distance (HD) in 42%. In silico analyses showed limitations of the DSC for small structures and a dependence of the HD on irregular shapes. Variations in DSC values were large between studies, and their thresholds were inconsistent. Most studies (51%) included 1 to 10 cases. The median number of observers or experts was 7 (range, 2-35). The intraclass correlation coefficient was reported in only 9% of cases. When the feasibility of studying IOV in delineation was investigated, a minimum of 8 observers with 3 cases, or 11 observers with 2 cases, was required to demonstrate moderate reproducibility.
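The two metric limitations noted above can be illustrated with a minimal Python sketch. It is not the authors' in silico analysis: the shapes, sizes, and helper names (`disk`, `dice`, `hausdorff`) are assumptions chosen for illustration, contours are modeled as filled 2D pixel sets, and the HD is computed by brute force over all pixels rather than over extracted surfaces.

```python
from math import hypot


def disk(cx, cy, r):
    """Pixels (x, y) on an integer grid inside a disk of radius r centered at (cx, cy)."""
    return {
        (x, y)
        for x in range(cx - r, cx + r + 1)
        for y in range(cy - r, cy + r + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r
    }


def dice(a, b):
    """Dice similarity coefficient: 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty pixel sets (brute force)."""
    def directed(p, q):
        # Largest distance from a pixel in p to its nearest pixel in q.
        return max(min(hypot(px - qx, py - qy) for qx, qy in q) for px, py in p)
    return max(directed(a, b), directed(b, a))


# Hypothetical scenario 1: the same 2-pixel shift applied to a small and a large structure.
small_ref, small_obs = disk(0, 0, 3), disk(2, 0, 3)
large_ref, large_obs = disk(0, 0, 10), disk(2, 0, 10)
dsc_small, dsc_large = dice(small_ref, small_obs), dice(large_ref, large_obs)
hd_small, hd_large = hausdorff(small_ref, small_obs), hausdorff(large_ref, large_obs)

# Hypothetical scenario 2: one stray pixel added to an otherwise identical large contour.
spiky_obs = large_ref | {(20, 0)}
dsc_spike, hd_spike = dice(large_ref, spiky_obs), hausdorff(large_ref, spiky_obs)

print(f"small structure: DSC={dsc_small:.3f}, HD={hd_small:.1f} px")
print(f"large structure: DSC={dsc_large:.3f}, HD={hd_large:.1f} px")
print(f"single outlier:  DSC={dsc_spike:.3f}, HD={hd_spike:.1f} px")
```

In this toy setup, the identical 2-pixel displacement gives an HD of 2 pixels for both structures but a markedly lower DSC for the small one, whereas a single outlying pixel leaves the DSC close to 1 and raises the HD to 10 pixels.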

Conclusions: Future IOV studies would benefit from a more standardized methodology: they should provide clear definitions of the gold standard and metrics, and justify the tradeoffs made in choosing the number of observers and the number of delineated cases.

Publication types

  • Systematic Review
  • Review

MeSH terms

  • Humans
  • Observer Variation
  • Radiation Oncology*
  • Radiotherapy Planning, Computer-Assisted / methods
  • Reproducibility of Results