Comparison of interreader reproducibility of the Prostate Imaging Reporting and Data System and Likert scales for evaluation of multiparametric prostate MRI

Andrew B Rosenkrantz; Ruth P Lim; Mershad Haghighi; Molly B Somberg; James S Babb; Samir S Taneja

doi:10.2214/AJR.12.10173

AJR Am J Roentgenol. 2013 Oct;201(4):W612-8. doi: 10.2214/AJR.12.10173.

Authors

Andrew B Rosenkrantz¹, Ruth P Lim, Mershad Haghighi, Molly B Somberg, James S Babb, Samir S Taneja

Affiliation

¹ Department of Radiology, NYU Langone Medical Center, 660 First Ave, New York, NY 10016.

PMID: 24059400

Abstract

Objective: The objective of our study was to compare interreader reproducibility of the recently proposed "Prostate Imaging Reporting and Data System," or "PI-RADS," scale incorporating fixed criteria and a standard Likert scale based on overall impression for prostate cancer localization using multiparametric MRI.

Materials and methods: Fifty-five patients who underwent a 3-T prostate MRI examination using a pelvic phased-array coil and incorporating T2-weighted imaging, diffusion-weighted imaging, and dynamic contrast-enhanced imaging were included in the study. Three radiologists (6, 4, and 1 year of experience) independently scored 18 regions (12 in the peripheral zone [PZ] and six in the transition zone [TZ]) using PI-RADS (range, 3-15) and Likert (range, 1-5) scales, which were based on fixed criteria and overall impression, respectively. Interreader reproducibility was evaluated using the concordance correlation coefficient (CCC), which assesses exact agreement between scores (minimal, < 0.2; poor, 0.2-<0.4; moderate, 0.4-<0.6; strong, 0.6-<0.8; almost perfect, ≥ 0.8).
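
As an illustration of the agreement measure described above, the following is a minimal sketch of how a concordance correlation coefficient between two readers' scores could be computed and mapped to the stated categories. It assumes Lin's formulation of the CCC with population (1/n) variances; the abstract does not specify the exact estimator or software used, and the function names `concordance_correlation` and `agreement_category` are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired score lists.

    Assumption: population (1/n) variances and covariance are used; the
    study's exact estimator is not stated in the abstract.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 scores")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        # Both readers gave the same constant score everywhere.
        return 1.0
    return 2 * cov_xy / denom


def agreement_category(ccc):
    """Map a CCC value to the agreement categories given in the abstract."""
    if ccc < 0.2:
        return "minimal"
    if ccc < 0.4:
        return "poor"
    if ccc < 0.6:
        return "moderate"
    if ccc < 0.8:
        return "strong"
    return "almost perfect"
```

Unlike the Pearson correlation, this coefficient penalizes a systematic offset between readers: two readers whose scores always differ by one point are perfectly correlated but do not reach a CCC of 1.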

Results: Agreement between experienced readers was strong in the PZ and TZ combined and in the PZ for both the PI-RADS and Likert scales (CCC = 0.608-0.677), moderate in the TZ for the Likert scale (CCC = 0.519), and poor in the TZ for PI-RADS (CCC = 0.376). Agreement between experienced and inexperienced readers was moderate to poor in the PZ and TZ combined for PI-RADS (CCC = 0.340-0.477), moderate in the PZ and TZ combined for the Likert scale (CCC = 0.471-0.497), moderate in the PZ for PI-RADS and Likert scales (CCC = 0.472-0.542), minimal to poor in the TZ for PI-RADS (CCC = 0.094-0.283), and poor in the TZ for the Likert scale (CCC = 0.287-0.400).

Conclusion: Interreader reproducibility tended to be higher for relatively experienced readers than for less experienced readers and to be higher in the PZ than in the TZ. For the relatively experienced readers, reproducibility was similar for PI-RADS and Likert scales in the PZ but was somewhat higher for the Likert scale than for PI-RADS in the TZ.

Publication types

Comparative Study
Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Adult
Aged
Algorithms*
Humans
Image Enhancement / methods
Image Interpretation, Computer-Assisted / methods*
Magnetic Resonance Imaging
Male
Middle Aged
Observer Variation
Prostatic Neoplasms / pathology*
Reproducibility of Results
Sensitivity and Specificity
Severity of Illness Index