Searching in the Dark: Phenotyping Diabetic Retinopathy in a De-Identified Electronic Medical Record Sample of African Americans

AMIA Jt Summits Transl Sci Proc. 2016 Jul 20:2016:221-30. eCollection 2016.

Abstract

A hurdle to EMR-based studies is the characterization and extraction of complex phenotypes that are not readily defined by a single diagnostic or procedural code. Here we developed an algorithm that uses data mining techniques to identify a diabetic retinopathy (DR) cohort of African Americans with type 2 diabetes from the Vanderbilt University de-identified EMR system. The algorithm combines diagnostic codes, Current Procedural Terminology (CPT) billing codes, medications, and text matching to identify DR when gold-standard digital photography results were unavailable. DR cases were identified with a positive predictive value of 75.3% and an accuracy of 84.8%. Controls were classified with a negative predictive value of 1.0, as far as could be assessed. Few studies of DR have been performed in African Americans, who are at elevated risk of DR. Identification of EMR-based African American cohorts may help stimulate new biomedical studies that could elucidate differences in risk for the development of DR and other complex diseases.
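As an illustration of how the reported performance measures are defined, the following minimal Python sketch computes positive predictive value, negative predictive value, and accuracy from manual chart-review counts. The class `ReviewCounts`, the function names, and the counts themselves are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCounts:
    """Chart-review tallies for an algorithm's case/control calls (hypothetical)."""
    true_pos: int   # flagged as DR case, confirmed on review
    false_pos: int  # flagged as DR case, not confirmed on review
    true_neg: int   # flagged as control, confirmed free of DR
    false_neg: int  # flagged as control, found to have DR


def ppv(c: ReviewCounts) -> float:
    """Positive predictive value: confirmed cases / all algorithm-flagged cases."""
    return c.true_pos / (c.true_pos + c.false_pos)


def npv(c: ReviewCounts) -> float:
    """Negative predictive value: confirmed controls / all algorithm-flagged controls."""
    return c.true_neg / (c.true_neg + c.false_neg)


def accuracy(c: ReviewCounts) -> float:
    """Accuracy: correct calls / all reviewed records."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return (c.true_pos + c.true_neg) / total


# Illustrative counts only (assumption): not taken from the study.
counts = ReviewCounts(true_pos=30, false_pos=10, true_neg=50, false_neg=0)

print(f"PPV:      {ppv(counts):.1%}")
print(f"NPV:      {npv(counts):.1%}")
print(f"Accuracy: {accuracy(counts):.1%}")
```

The sketch assumes every denominator is nonzero, that is, at least one flagged case and one flagged control were reviewed; production code would need to guard against empty review sets.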