Comparative Studies in the Shoulder Literature Lack Statistical Robustness: A Fragility Analysis

Robert L Parisien; David P Trofa; Patrick K Cronin; Jesse Dashe; Emily J Curry; Josef K Eichinger; William N Levine; Paul Tornetta 3rd; Xinning Li

doi:10.1016/j.asmr.2021.08.017

Arthrosc Sports Med Rehabil. 2021 Oct 12;3(6):e1899-e1904. doi: 10.1016/j.asmr.2021.08.017. eCollection 2021 Dec.

Authors

Robert L Parisien¹, David P Trofa², Patrick K Cronin³, Jesse Dashe¹, Emily J Curry⁴, Josef K Eichinger⁵, William N Levine², Paul Tornetta 3rd¹, Xinning Li¹

Affiliations

¹ Boston University Medical Center, Boston, Massachusetts.
² Columbia University Medical Center, New York, New York.
³ Harvard-Combined Orthopaedic Residency Program, Boston, Massachusetts.
⁴ Boston University School of Public Health, Boston, Massachusetts.
⁵ Medical University of South Carolina, Charleston, South Carolina, U.S.A.

Abstract

Purpose: Evidence-based decision-making is rooted in comparative clinical studies; however, a small number of outcome event reversals have the potential to change study significance. The purpose of this study was to determine the utility of applying fragility analysis to comparative studies in the published orthopaedic shoulder literature.

Methods: Comparative clinical shoulder research studies reporting 1:1 dichotomous categorical data were analyzed in 6 leading orthopaedic journals between 2006 and 2016. Statistical significance was defined as a P value of less than 0.05. The fragility index (FI) for each study outcome was determined as the number of event reversals required to move the P value across the 0.05 threshold in either direction, thus changing the study conclusions. The associated fragility quotient (FQ) was determined by dividing the FI by the total population comprising a particular outcome.
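The FI and FQ calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the significance test, so a two-sided Fisher exact test is assumed here, and the search strategy (reversing outcomes in one group at a time while group sizes stay fixed) is a simplifying assumption. The function names `fisher_exact_two_sided`, `fragility_index`, and `fragility_quotient` are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Assumption: the abstract does not state which test was used;
    Fisher's exact test is a common choice for dichotomous outcomes.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Smallest number of outcome reversals that flips significance.

    Works in both directions: a significant result becoming non-significant,
    or a non-significant result becoming significant.
    Assumption: reversals are applied to a single group at a time.
    """
    def significant(ea, eb):
        return fisher_exact_two_sided(ea, n_a - ea, eb, n_b - eb) < alpha

    baseline = significant(events_a, events_b)
    for k in range(1, max(n_a, n_b) + 1):
        candidates = [
            (events_a + k, events_b), (events_a - k, events_b),
            (events_a, events_b + k), (events_a, events_b - k),
        ]
        for ea, eb in candidates:
            in_range = 0 <= ea <= n_a and 0 <= eb <= n_b
            if in_range and significant(ea, eb) != baseline:
                return k
    return None  # significance cannot be flipped under this search


def fragility_quotient(fi, n_total):
    """FI divided by the total population for the outcome."""
    return fi / n_total


# Hypothetical outcome (not data from the study): 1/20 vs 9/20 events.
fi = fragility_index(1, 20, 9, 20)
fq = fragility_quotient(fi, 20 + 20)
print(fi, fq)
```

Dividing by the total population is what lets the FQ compare fragility across outcomes of different sizes: the same FI represents a larger share of a small study than of a large one.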

Results: Of the 23,897 studies screened, 3,591 met search criteria, with 198 comparative studies ultimately included for analysis, 67 of which were randomized controlled trials. There were 357 total outcome events, with 74 reported as significant and 283 as not significant. The FI was 4 (interquartile range [IQR] 2-6) with an associated FQ of 0.066 (IQR 0.038-0.102). There was no difference in statistical fragility between randomized and nonrandomized trials, with both revealing an FI of 4 and an FQ of 0.068 (IQR 0.044-0.107) and 0.065 (IQR 0.031-0.101), respectively.

Conclusions: This analysis reveals that comparative shoulder studies published in 6 leading orthopaedic journals are at risk of statistical fragility. As such, contemporary clinical shoulder literature may not be as robust as traditionally perceived, with the reversal of only a few outcome events required to change study significance. Therefore, we advocate the reporting of both FI and FQ in addition to the P value as statistical complements to all comparative investigations to provide a more comprehensive understanding of trial stability and significance in the published shoulder literature.

Clinical relevance: Comparative study designs are commonly employed in shoulder research. Several studies in both the general medical and orthopaedic literature have identified a lack of statistical robustness through comprehensive fragility analysis. Our findings demonstrate that the P value may be an inadequate independent statistical metric, requiring the complement of the FI and FQ to aid in the interpretation and understanding of study significance for clinical decision-making.