High-Throughput Algorithm for Discovering New Drug Indications by Utilizing Large-Scale Electronic Medical Record Data

Do-Hoon Kim; Jung-Eun Lee; Yong-Gil Kim; Yura Lee; Dong-Woo Seo; Kye Hwa Lee; Jae-Ho Lee; Woo Sung Kim; Young-Hak Kim; Ji Seon Oh

doi:10.1002/cpt.1980

Clin Pharmacol Ther. 2020 Dec;108(6):1299-1307. doi: 10.1002/cpt.1980. Epub 2020 Aug 13.

Authors

Do-Hoon Kim¹, Jung-Eun Lee², Yong-Gil Kim³, Yura Lee¹, Dong-Woo Seo¹ ⁴, Kye Hwa Lee¹, Jae-Ho Lee¹ ⁴, Woo Sung Kim¹ ⁵, Young-Hak Kim¹ ⁶ ⁷, Ji Seon Oh¹ ⁶

Affiliations

¹ Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea.
² Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea.
³ Division of Rheumatology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
⁴ Department of Emergency Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
⁵ Department of Pulmonary and Critical Care Medicine, University of Ulsan College of Medicine, Seoul, Republic of Korea.
⁶ Health Innovation Big Data Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea.
⁷ Department of Cardiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

PMID: 32621536
DOI: 10.1002/cpt.1980

Abstract

Drug repositioning is an effective way to mitigate the production problem in the pharmaceutical industry. Electronic medical record (EMR) databases harbor a large amount of data on drug prescriptions and laboratory test results and may thus be useful for finding new indications for existing drugs. Here, we present a novel high-throughput, data-driven algorithm that uses large-scale EMR data to identify and prioritize drug candidates that show significant effects on specific clinical indicators. We chose four laboratory tests as clinical indicators: hemoglobin A1c (HbA1c), low-density lipoprotein (LDL) cholesterol, triglycerides (TGs), and high-density lipoprotein (HDL) cholesterol. From a 5-year EMR database, we generated datasets of paired data: for each patient and each drug, the averaged measurement values during the periods on and off that drug, adjusted for co-administered drug effects at each timepoint. We then applied a one-sample t-test with the Bonferroni correction for statistical analysis. Among 1,774 drugs, 45 were associated with increases in HDL cholesterol, and 41, 146, and 65 were associated with reductions in HbA1c, LDL cholesterol, and TGs, respectively. We compared the list of candidate drugs with the list of drugs indicated for the relevant clinical conditions and found that the algorithm had high values for both sensitivity (range 0.95-1.00) and negative predictive value (range 0.95-1.00). Our algorithm rediscovered well-known drugs that are used for diabetes and dyslipidemia and also revealed potential candidates that have no current indication but have shown promising results in the literature. Our algorithm may facilitate the repositioning of drugs with proven safety profiles.
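
The statistical step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-patient on-minus-off differences have already been computed and adjusted for co-administered drugs, and it tests each drug's differences against zero with a one-sample t-test at a Bonferroni-corrected threshold. The function names (`t_two_sided_p`, `screen_drugs`), the drug labels, and the numbers are hypothetical, and the p-value is obtained by numerically integrating the Student's t density so that only the Python standard library is needed.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the pdf."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    # Log of the normalising constant of the t density (lgamma avoids overflow).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def screen_drugs(diffs_by_drug, alpha=0.05):
    """One-sample t-test of (on - off) differences against 0 for each drug.

    diffs_by_drug maps a drug name to a list of per-patient differences
    (assumed here to be already adjusted for co-administered drugs).
    """
    threshold = alpha / len(diffs_by_drug)  # Bonferroni correction
    results = {}
    for drug, diffs in diffs_by_drug.items():
        n = len(diffs)
        if n < 2 or stdev(diffs) == 0:
            continue  # the t statistic is undefined for these inputs
        m = mean(diffs)
        t = m / (stdev(diffs) / math.sqrt(n))
        p = t_two_sided_p(t, n - 1)
        results[drug] = {"mean_diff": m, "t": t, "p": p,
                         "significant": p < threshold}
    return results


# Hypothetical HbA1c differences (on-drug mean minus off-drug mean, per patient).
example = {
    "drug_a": [-0.8, -0.5, -0.9, -0.6, -0.7, -0.4, -0.8, -0.6],
    "drug_b": [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.1],
}
results = screen_drugs(example)
for name, r in results.items():
    print(name, round(r["mean_diff"], 3), round(r["t"], 2), r["significant"])
```

In this sketch a negative `mean_diff` with `significant` set to true corresponds to a candidate associated with a reduction in the indicator. With 1,774 drugs, as in the abstract, the Bonferroni threshold would be 0.05 / 1,774, roughly 2.8e-5, if a nominal level of 0.05 is assumed.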

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Algorithms*
Biomarkers / blood
Cholesterol, HDL / blood
Cholesterol, LDL / blood
Data Mining*
Databases, Factual
Drug Prescriptions
Drug Repositioning*
Electronic Health Records*
Glycated Hemoglobin / analysis
Humans
Hypoglycemic Agents / adverse effects
Hypoglycemic Agents / therapeutic use*
Hypolipidemic Agents / adverse effects
Hypolipidemic Agents / therapeutic use*
Reproducibility of Results
Time Factors
Triglycerides / blood

Substances

Biomarkers
Cholesterol, HDL
Cholesterol, LDL
Glycated Hemoglobin A
Hypoglycemic Agents
Hypolipidemic Agents
Triglycerides
hemoglobin A1c protein, human