Peer overmarking and insufficient diagnosticity: the impact of the rating method for peer assessment

Adv Health Sci Educ Theory Pract. 2022 Oct;27(4):1049-1066. doi: 10.1007/s10459-022-10130-w. Epub 2022 Jul 24.

Abstract

The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page essay on an air pollutant. A first cohort (N = 260) relied on analytical rating using criteria to assess their peers' essays. A total of 1297 evaluations were made, and each essay received at least four peer ratings. Results indicate a small correlation between peer and teacher marks, and three essays of substandard quality were not recognised by the group of peer raters. A second cohort (N = 230) used comparative judgement. They completed 1289 comparisons, from which a rank order was calculated. Results suggest a large correlation between the university teacher marks and the peer scores, as well as acceptable reliability of the rank order. In addition, the three essays of substandard quality were discerned as such by the group of peer raters. Although replication research is warranted, the results provide the first evidence that, when peer raters overmark and fail to identify substandard work using analytical rating with criteria, university teachers may consider changing the rating method of the peer assessment to comparative judgement.
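The abstract states that a rank order was calculated from the 1289 pairwise comparisons but does not name the scaling model. Comparative judgement is commonly scaled with a Bradley-Terry(-Luce) model, so the sketch below assumes that model and fits it with the standard minorisation-maximisation (MM) update. It then shows how concurrent validity could be checked as a correlation between peer-derived scores and teacher marks; the choice of Pearson's r is also an assumption. The function names `bradley_terry` and `pearson`, the essay IDs and the teacher marks are all hypothetical and purely illustrative, not data or code from the study.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths to (winner, loser) pairs using the MM algorithm.

    Hypothetical sketch. It assumes no self-comparisons, a connected comparison
    graph, and at least one win per essay; otherwise the maximum-likelihood
    estimates do not exist.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)      # number of comparisons each essay won
    n_pair = defaultdict(int)    # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1

    p = {i: 1.0 for i in items}  # start with equal strengths
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            denom = 0.0
            for pair, n in n_pair.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (p[i] + p[j])
            new_p[i] = wins[i] / denom
        # The scale is identified only up to a constant, so renormalise.
        total = sum(new_p.values())
        new_p = {i: v * len(items) / total for i, v in new_p.items()}
        delta = max(abs(new_p[i] - p[i]) for i in items)
        p = new_p
        if delta < tol:
            break
    # Report log-strengths (logit scale); sort these to obtain the rank order.
    return {i: math.log(v) for i, v in p.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Illustrative usage with four hypothetical essays and invented judgements.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
              ("B", "D"), ("A", "D"), ("D", "A")]
scores = bradley_terry(judgements)
rank_order = sorted(scores, key=scores.get, reverse=True)
teacher_marks = {"A": 8, "B": 7, "C": 5, "D": 4}  # hypothetical marks
r = pearson([scores[e] for e in rank_order],
            [teacher_marks[e] for e in rank_order])
```

The MM update is used here because it needs only the standard library and increases the likelihood at every step. Dedicated comparative-judgement tools may instead use other estimation procedures and also report a reliability index for the rank order.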

Keywords: Analytical rating; Comparative judgement; Concurrent validity; Peer assessment; Reliability.

MeSH terms

  • Air Pollutants*
  • Humans
  • Judgment
  • Peer Group*
  • Peer Review
  • Reproducibility of Results

Substances

  • Air Pollutants