Using Enriched Samples for Semi-Automated Vocabulary Expansion to Identify Rare Events in Clinical Text: Sexual Orientation as a Use Case

Stud Health Technol Inform. 2019 Aug 21;264:1532-1533. doi: 10.3233/SHTI190520.

Abstract

We demonstrate the utility of concept lexicon expansion and evaluation using enriched samples of patients and documents, with sexual orientation as a use case for rare event detection in electronic medical records. Using this approach, we found 7 additional words and 21 misspellings beyond our initial set of 5 seed words. The expanded vocabulary can support further development of a full natural language processing system that identifies instances where sexual orientation is documented.
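
As a rough illustration of how misspellings of seed words might be surfaced for manual review, the sketch below flags tokens within a small edit distance of any seed word. The abstract does not describe the authors' actual procedure, so this is an assumption throughout: the function names, the seed words in the usage note, and the distance threshold are all hypothetical, and Levenshtein distance is only one plausible similarity measure.

```python
import re
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def candidate_misspellings(documents, seed_words, max_distance=1):
    """Map each near-miss token to (closest matching seed, corpus frequency).

    Hypothetical helper: the output is a list of candidates for a human
    reviewer, not a confirmed vocabulary.
    """
    seeds = sorted({w.lower() for w in seed_words})
    counts = Counter(
        tok for doc in documents for tok in re.findall(r"[a-z]+", doc.lower())
    )
    candidates = {}
    for token, n in counts.items():
        if token in seeds:
            continue  # exact seed matches are already in the lexicon
        for seed in seeds:
            # Cheap length filter before the quadratic distance computation.
            if abs(len(token) - len(seed)) > max_distance:
                continue
            if edit_distance(token, seed) <= max_distance:
                candidates[token] = (seed, n)
                break
    return candidates
```

Called with a list of note texts and illustrative seeds such as `["homosexual", "bisexual"]` (not the authors' seed list, which the abstract does not give), the function returns near-miss tokens with their frequencies. A reviewer would still need to accept or reject each candidate, which keeps the expansion semi-automated.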

Keywords: Electronic Health Records; Natural Language Processing.

MeSH terms

  • Electronic Health Records
  • Female
  • Gender Identity
  • Humans
  • Male
  • Natural Language Processing
  • Vocabulary*
  • Vocabulary, Controlled