Comparison of interreader reproducibility of the Prostate Imaging Reporting and Data System and Likert scales for evaluation of multiparametric prostate MRI

AJR Am J Roentgenol. 2013 Oct;201(4):W612-8. doi: 10.2214/AJR.12.10173.

Abstract

Objective: The objective of our study was to compare the interreader reproducibility of two scales for prostate cancer localization using multiparametric MRI: the recently proposed "Prostate Imaging Reporting and Data System," or "PI-RADS," scale, which incorporates fixed criteria, and a standard Likert scale based on overall impression.

Materials and methods: Fifty-five patients who underwent a 3-T prostate MRI examination using a pelvic phased-array coil and incorporating T2-weighted imaging, diffusion-weighted imaging, and dynamic contrast-enhanced imaging were included in the study. Three radiologists (6, 4, and 1 year of experience) independently scored 18 regions (12 in the peripheral zone [PZ] and six in the transition zone [TZ]) using PI-RADS (range, 3-15) and Likert (range, 1-5) scales, which were based on fixed criteria and overall impression, respectively. Interreader reproducibility was evaluated using the concordance correlation coefficient (CCC), which assesses exact agreement between scores (minimal, < 0.2; poor, 0.2 to < 0.4; moderate, 0.4 to < 0.6; strong, 0.6 to < 0.8; almost perfect, ≥ 0.8).
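
As an illustration of the agreement measure described above, the following minimal sketch computes a CCC for two readers' paired scores and maps it to the categories listed in the methods. The abstract does not name the estimator, so Lin's concordance correlation coefficient is assumed here; the function names and the sample scores are hypothetical, and the sketch ignores any adjustment for multiple regions scored within the same patient.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired scores.

    Assumption: this is the CCC variant meant by the abstract, which
    does not specify the estimator.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes a systematic offset between
    # readers, which a plain correlation coefficient would ignore.
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when all scores are one constant")
    return 2 * cov_xy / denom


def agreement_category(ccc):
    """Map a CCC value to the categories given in the methods."""
    if ccc < 0.2:
        return "minimal"
    if ccc < 0.4:
        return "poor"
    if ccc < 0.6:
        return "moderate"
    if ccc < 0.8:
        return "strong"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical Likert scores (1-5) from two readers for eight regions;
    # these are made-up values, not data from the study.
    reader_a = [1, 2, 3, 4, 5, 2, 3, 1]
    reader_b = [1, 3, 3, 5, 4, 2, 2, 1]
    ccc = concordance_correlation(reader_a, reader_b)
    print(f"CCC = {ccc:.3f} ({agreement_category(ccc)})")
```

Unlike a correlation coefficient, this measure falls below 1 when one reader scores consistently higher than the other, which is why it is described as assessing exact agreement between scores.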

Results: Agreement between experienced readers was strong in the PZ and TZ combined and in the PZ for both the PI-RADS and Likert scales (CCC = 0.608-0.677), moderate in the TZ for the Likert scale (CCC = 0.519), and poor in the TZ for PI-RADS (CCC = 0.376). Agreement between experienced and inexperienced readers was moderate to poor in the PZ and TZ combined for PI-RADS (CCC = 0.340-0.477), moderate in the PZ and TZ combined for the Likert scale (CCC = 0.471-0.497), moderate in the PZ for PI-RADS and Likert scales (CCC = 0.472-0.542), minimal to poor in the TZ for PI-RADS (CCC = 0.094-0.283), and poor in the TZ for the Likert scale (CCC = 0.287-0.400).

Conclusion: Interreader reproducibility tended to be higher for relatively experienced readers than for less experienced readers and to be higher in the PZ than in the TZ. For the relatively experienced readers, reproducibility was similar for PI-RADS and Likert scales in the PZ but was somewhat higher for the Likert scale than for PI-RADS in the TZ.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Algorithms*
  • Humans
  • Image Enhancement / methods
  • Image Interpretation, Computer-Assisted / methods*
  • Magnetic Resonance Imaging
  • Male
  • Middle Aged
  • Observer Variation
  • Prostatic Neoplasms / pathology*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Severity of Illness Index