Developing and Validating a Computable Phenotype for the Identification of Transgender and Gender Nonconforming Individuals and Subgroups

AMIA Annu Symp Proc. 2021 Jan 25;2020:514-523. eCollection 2020.

Abstract

Transgender and gender nonconforming (TGNC) individuals face significant marginalization, stigma, and discrimination. Under-reporting of TGNC individuals is common because they are often unwilling to self-identify. Meanwhile, the rapid adoption of electronic health record (EHR) systems has made large-scale, longitudinal, real-world clinical data available for research and has provided a unique opportunity to identify TGNC individuals from their EHRs, a promising approach to routine health surveillance. Building on existing work, we developed and validated a computable phenotype (CP) algorithm for identifying TGNC individuals and their natal sex (i.e., male-to-female or female-to-male) using both structured EHR data and unstructured clinical notes. Our CP algorithm achieved an F1-score of 0.955 on the training data and a perfect F1-score on the independent testing data. Consistent with the literature, we observed an increasing percentage of TGNC individuals and a disproportionate burden of adverse health outcomes in this population, especially sexually transmitted infections and mental health distress.
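
For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation only, not the authors' evaluation code; the function name `f1_score` and the toy labels are hypothetical and do not come from the study.

```python
def f1_score(predicted, actual):
    """Return (precision, recall, f1) for two equal-length lists of booleans.

    predicted[i] is True when the algorithm flags record i as a case;
    actual[i] is True when manual chart review confirms record i as a case.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if a and not p)    # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels for illustration only (not data from the study).
predicted = [True, True, True, False, False, True]
actual = [True, True, False, False, True, True]
precision, recall, f1 = f1_score(predicted, actual)
```

An F1-score of 1.0, as reported for the independent testing data, means the algorithm produced no false positives and no false negatives on that set.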

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Child
  • Child, Preschool
  • Decision Support Techniques*
  • Electronic Health Records*
  • Female
  • Gender Identity*
  • Hormone Replacement Therapy / methods
  • Humans
  • Infant
  • Male
  • Middle Aged
  • Phenotype
  • Reproducibility of Results
  • Sex Reassignment Procedures
  • Sexual and Gender Minorities / psychology*
  • Transgender Persons / psychology*
  • Young Adult