Differential Weighting for Subcomponent Measures of Integrated Clinical Encounter Scores Based on the USMLE Step 2 CS Examination: Effects on Composite Score Reliability and Pass-Fail Decisions

Yoon Soo Park; Matthew Lineberry; Abbas Hyderi; Georges Bordage; Kuan Xing; Rachel Yudkowsky

doi:10.1097/ACM.0000000000001359

Acad Med. 2016 Nov;91(11 Association of American Medical Colleges Learn Serve Lead: Proceedings of the 55th Annual Research in Medical Education Sessions):S24-S30. doi: 10.1097/ACM.0000000000001359.

Authors

Yoon Soo Park¹, Matthew Lineberry, Abbas Hyderi, Georges Bordage, Kuan Xing, Rachel Yudkowsky

Affiliation

¹ Y.S. Park is assistant professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois.
M. Lineberry is director of Simulation Research, Assessment, and Outcomes, Zamierowski Institute for Experiential Learning, and assistant professor, Department of Health Policy and Management, University of Kansas Medical Center, Kansas City, Kansas.
A. Hyderi is associate dean for curriculum and associate professor, Department of Family Medicine, University of Illinois at Chicago College of Medicine, Chicago, Illinois.
G. Bordage is professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois.
K. Xing is a doctoral student, Department of Educational Psychology, University of Illinois at Chicago College of Education, Chicago, Illinois.
R. Yudkowsky is director, Graham Clinical Performance Center, and associate professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois.

PMID: 27779506

Abstract

Purpose: Medical schools administer locally developed graduation competency examinations (GCEs) modeled on the structure of the United States Medical Licensing Examination Step 2 Clinical Skills. These examinations combine standardized patient (SP)-based physical examination and patient note (PN) scores to create integrated clinical encounter (ICE) scores. This study examines how different subcomponent scoring weights in a locally developed GCE affect composite score reliability and pass-fail decisions for ICE scores, contributing internal structure and consequential validity evidence.

Method: Data from two cohorts of fourth-year medical students (M4; 2014: n = 177; 2015: n = 182) were used. The reliability of SP encounter (history taking and physical examination), PN, and communication and interpersonal skills scores was estimated with generalizability studies. Composite score reliability was estimated for varying weight combinations. Faculty were surveyed for their preferred weights on the SP encounter and PN scores. Composite scores based on Kane's method were compared with weighted mean scores.
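
As an illustration of how composite score reliability can depend on subcomponent weights, the following is a minimal Python sketch. It assumes the classical Mosier formula for the reliability of a weighted composite, which may differ from the generalizability-theory estimates used in the study. The function names and all input values are hypothetical and are not taken from the study data.

```python
import numpy as np

def composite_reliability(weights, sds, reliabilities, corr):
    """Reliability of a weighted composite (Mosier's formula, an assumption here).

    weights, sds, reliabilities: one value per subcomponent.
    corr: observed-score correlation matrix between subcomponents.
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(sds, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    R = np.asarray(corr, dtype=float)
    ws = w * s                                 # weighted standard deviations
    composite_var = ws @ R @ ws                # variance of the weighted composite
    error_var = np.sum(ws**2 * (1.0 - r))      # weighted sum of error variances
    return 1.0 - error_var / composite_var

def sweep_pn_weight(sds, reliabilities, corr, step=0.1):
    """Two subcomponents (SP encounter, PN): vary the PN weight from 0 to 1."""
    results = []
    for pn_w in np.arange(0.0, 1.0 + 1e-9, step):
        rel = composite_reliability([1.0 - pn_w, pn_w], sds, reliabilities, corr)
        results.append((round(float(pn_w), 2), float(rel)))
    return results

if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    sds = [1.0, 1.0]
    rels = [0.60, 0.70]
    corr = [[1.0, 0.5], [0.5, 1.0]]
    for pn_w, rel in sweep_pn_weight(sds, rels, corr):
        print(f"PN weight {pn_w:.1f}: composite reliability {rel:.3f}")
```

Sweeping the PN weight in this way shows where the composite reliability peaks for a given set of subcomponent reliabilities and correlations; that statistical optimum need not coincide with the weights content experts prefer.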

Results: Faculty suggested weighting PNs higher (60%-70%) than the SP encounter scores (30%-40%). Statistically, composite score reliability was maximized when PN scores were weighted at 40% to 50%. Composite score reliability of ICE scores increased by up to 0.20 points when SP-history taking (SP-Hx) scores were included; excluding SP-Hx increased composite score reliability by only 0.09 points. Classification accuracy for pass-fail decisions between composite and weighted mean scores was 0.77; misclassification was < 5%.
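
One way to compare pass-fail decisions from two scoring methods can be sketched as follows. The abstract does not state which statistic underlies the reported classification accuracy, so this sketch computes both raw agreement and Cohen's kappa as assumptions. The function name, cut scores, and example scores are hypothetical.

```python
import numpy as np

def pass_fail_agreement(scores_a, scores_b, cut_a, cut_b):
    """Compare pass-fail decisions from two scoring methods for the same examinees.

    Returns raw agreement, the proportion of discordant decisions,
    and Cohen's kappa (chance-corrected agreement).
    """
    a = np.asarray(scores_a, dtype=float) >= cut_a   # True = pass under method A
    b = np.asarray(scores_b, dtype=float) >= cut_b   # True = pass under method B
    p_obs = float(np.mean(a == b))                   # observed agreement
    # Agreement expected by chance from the marginal pass and fail rates.
    p_exp = float(np.mean(a) * np.mean(b) + np.mean(~a) * np.mean(~b))
    kappa = (p_obs - p_exp) / (1.0 - p_exp) if p_exp < 1.0 else float("nan")
    return {"agreement": p_obs, "discordant": 1.0 - p_obs, "kappa": kappa}

if __name__ == "__main__":
    # Hypothetical scores for four examinees under each method.
    composite = [50, 60, 70, 80]
    weighted_mean = [55, 58, 72, 81]
    print(pass_fail_agreement(composite, weighted_mean, cut_a=60, cut_b=60))
```

Raw agreement can look high simply because most examinees pass under both methods, which is why a chance-corrected index such as kappa is often reported alongside it.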

Conclusions: Medical schools and certification agencies should consider the implications of assigning weights with respect to composite score reliability and the consequences for pass-fail decisions.

MeSH terms

Clinical Competence / standards*
Cohort Studies
Education, Medical, Undergraduate / standards*
Educational Measurement / methods*
Humans
Medical History Taking / standards*
Physical Examination / standards*
Reproducibility of Results
Surveys and Questionnaires
United States