New improved Aggregator: predicting which clinical trial articles derive from the same registered clinical trial

JAMIA Open. 2020 Oct 28;3(3):338-341. doi: 10.1093/jamiaopen/ooaa042. eCollection 2020 Oct.

Abstract

Objectives: To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence.

Materials and methods: We updated our previous model by creating larger, more recent, and more diverse positive and negative training sets consisting of article pairs that were (or were not) linked to the same ClinicalTrials.gov trial registry number. Features were extracted from PubMed metadata; pairwise similarity scores were modeled using logistic regression and used to form clusters of articles that are likely to arise from the same registered clinical trial.
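The pipeline described above (metadata features for each article pair, a logistic model over those features, then clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three features, the weights and bias (a real model would be fitted on the labeled training pairs), the 0.5 threshold, the toy article records, and the use of connected components over high-scoring pairs as the clustering step.

```python
import math
from itertools import combinations

# Hypothetical weights and bias, chosen by hand for illustration only.
WEIGHTS = {"shared_authors": 2.0, "title_overlap": 3.0, "same_journal": 0.5}
BIAS = -3.0

# Made-up toy records standing in for PubMed metadata.
EXAMPLE_ARTICLES = [
    {"pmid": 1, "title": "Drug A versus placebo in asthma",
     "authors": ["Smith J", "Lee K"], "journal": "J1"},
    {"pmid": 2, "title": "Drug A versus placebo in asthma: follow-up",
     "authors": ["Smith J", "Lee K"], "journal": "J2"},
    {"pmid": 3, "title": "Exercise therapy for back pain",
     "authors": ["Chen L"], "journal": "J1"},
]


def jaccard(a, b):
    """Jaccard similarity of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(x, y):
    """Similarity features for one article pair (illustrative choices)."""
    return {
        "shared_authors": jaccard(x["authors"], y["authors"]),
        "title_overlap": jaccard(x["title"].lower().split(),
                                 y["title"].lower().split()),
        "same_journal": 1.0 if x["journal"] == y["journal"] else 0.0,
    }


def pair_probability(x, y):
    """Logistic model: estimated probability that x and y share a trial."""
    features = pair_features(x, y)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def cluster(articles, threshold=0.5):
    """Group articles by linking every pair scored at or above threshold."""
    parent = {a["pmid"]: a["pmid"] for a in articles}

    def find(p):
        # Follow parent links to the cluster root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for x, y in combinations(articles, 2):
        if pair_probability(x, y) >= threshold:
            parent[find(x["pmid"])] = find(y["pmid"])

    groups = {}
    for a in articles:
        groups.setdefault(find(a["pmid"]), set()).add(a["pmid"])
    return sorted(groups.values(), key=min)
```

Calling `cluster(EXAMPLE_ARTICLES)` groups the first two toy records together and leaves the third on its own. Linking by connected components is the simplest option and tends to merge clusters through a single strong pair; a stricter linkage rule may be preferable in practice.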

Results: Articles from the same trial were identified with high accuracy (F1 = 0.859), nominally better than the previous model (F1 = 0.843). Predicted clusters showed a low splitting error rate of 8-11% (ie, cases in which 2 articles belonged to the same trial but were assigned to different clusters). Performance was similar whether only randomized controlled trial articles or a more diverse set of clinical trial articles were processed.
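As a rough illustration of the two metrics reported above, the sketch below computes F1 from pair-level counts and a splitting error rate from cluster assignments. It assumes that the splitting rate is the fraction of same-trial article pairs that end up in different predicted clusters; the abstract does not state the exact denominator, so this is one plausible reading. The function names and the trial and cluster labels used with them are hypothetical.

```python
from itertools import combinations


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from pair-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def splitting_error_rate(true_trial, predicted_cluster):
    """Fraction of same-trial pairs assigned to different clusters.

    Both arguments map an article ID to a label (trial ID or cluster ID).
    Assumed definition: split pairs divided by all same-trial pairs.
    """
    same_trial_pairs = 0
    split_pairs = 0
    for a, b in combinations(sorted(true_trial), 2):
        if true_trial[a] == true_trial[b]:
            same_trial_pairs += 1
            if predicted_cluster[a] != predicted_cluster[b]:
                split_pairs += 1
    return split_pairs / same_trial_pairs if same_trial_pairs else 0.0
```

For example, if articles 1, 2, and 3 belong to one trial and article 3 is placed in its own cluster, two of the three same-trial pairs are split, giving a rate of 2/3 under this definition.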

Discussion: Metadata are surprisingly accurate in predicting when 2 articles derive from the same underlying clinical trial.

Conclusion: We have continued confidence in the Aggregator tool, which can be accessed publicly at http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.

Keywords: clinical trials; evidence-based medicine; informatics; information retrieval; systematic reviews.