Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval

Tracy Edinger; Dina Demner-Fushman; Aaron M Cohen; Steven Bedrick; William Hersh

AMIA Annu Symp Proc. 2018 Apr 16;2017:660-669. eCollection 2017.

Authors

Tracy Edinger¹, Dina Demner-Fushman², Aaron M Cohen¹, Steven Bedrick¹, William Hersh¹

Affiliations

¹ Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA.
² National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

PMID: 29854131
PMCID: PMC5977655

Abstract

Objective: Secondary use of electronic health record (EHR) data depends on accurate and complete retrieval of the relevant patient cohort, which requires searching both structured and unstructured data. Clinical text is difficult to search, although chart notes contain structure that may facilitate accurate retrieval.

Methods: We developed rules that identify clinical document sections, which can then be indexed in search engines that support faceted search, such as Lucene or Essie, an NLM search engine. We developed 22 clinical cohorts and two queries for each cohort: one using section headings and one searching the whole document. We manually evaluated a subset of the retrieved documents to compare query performance.

Results: Compared with whole-document queries, section-based queries had lower recall (0.83 vs 0.95) but higher precision (0.73 vs 0.54) and higher F₁ (0.78 vs 0.69).

Conclusion: This evaluation suggests that searching specific sections may improve precision under certain conditions, often at the cost of recall.
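The Methods contrast two query styles over segmented chart notes. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the heading rules in `HEADING_PATTERNS` are hypothetical (the abstract does not list the actual rules), the notes are invented toy data, and plain substring matching stands in for a Lucene or Essie index with one field per section. It also checks that the reported F₁ values follow from the reported precision and recall as their harmonic mean.

```python
from __future__ import annotations

import re

# Hypothetical heading rules (NOT the authors' rules): a line that starts
# with a known heading followed by a colon opens a new section.
HEADING_PATTERNS = {
    "history_of_present_illness": re.compile(r"^\s*(HPI|history of present illness)\s*:", re.I),
    "past_medical_history": re.compile(r"^\s*(PMH|past medical history)\s*:", re.I),
    "family_history": re.compile(r"^\s*(FH|family history)\s*:", re.I),
    "medications": re.compile(r"^\s*(medications|meds)\s*:", re.I),
    "assessment_and_plan": re.compile(r"^\s*(assessment and plan|A/P)\s*:", re.I),
}


def segment(note: str) -> dict[str, str]:
    """Split a note into {section_name: text}.

    Text before the first recognized heading is filed under 'unlabeled'.
    """
    sections: dict[str, list[str]] = {}
    current = "unlabeled"
    for line in note.splitlines():
        for name, pattern in HEADING_PATTERNS.items():
            match = pattern.match(line)
            if match:
                current = name
                line = line[match.end():]  # keep text after the heading
                break
        sections.setdefault(current, []).append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def matches_whole_document(note: str, term: str) -> bool:
    """Whole-document query: the term may appear anywhere in the note."""
    return term.lower() in note.lower()


def matches_section(note: str, term: str, section: str) -> bool:
    """Section query: the term must appear inside the named section."""
    return term.lower() in segment(note).get(section, "").lower()


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy notes (invented for illustration): only note_a describes a patient
# with diabetes; note_b mentions diabetes only in the family history.
notes = {
    "note_a": "HPI: Follow-up visit.\nPMH: type 2 diabetes, hypertension\nMedications: metformin",
    "note_b": "HPI: Cough for 3 days.\nFamily history: mother with diabetes\nMedications: none",
}

whole_hits = [n for n, text in notes.items() if matches_whole_document(text, "diabetes")]
section_hits = [n for n, text in notes.items()
                if matches_section(text, "diabetes", "past_medical_history")]

print(whole_hits)    # ['note_a', 'note_b']  (note_b is a false positive)
print(section_hits)  # ['note_a']
print(round(f1(0.73, 0.83), 2))  # 0.78, reported section-query F1
print(round(f1(0.54, 0.95), 2))  # 0.69, reported whole-document F1
```

In this toy case, restricting the query to the past medical history drops the family-history false positive, which raises precision. The same restriction would also miss a true case documented only in, for example, the HPI, which is one way the recall loss reported in the Results can arise.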

MeSH terms

Abstracting and Indexing
Electronic Health Records*
Humans
Information Storage and Retrieval / methods*
Search Engine*