A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development

JMIR Med Inform. 2020 May 21;8(5):e17645. doi: 10.2196/17645.

Abstract

Background: Knowledge graph embedding is an effective semantic representation method for entities and relations in knowledge graphs. Several translation-based algorithms, including TransE, TransH, TransR, TransD, and TranSparse, have been proposed to learn effective embedding vectors from typical knowledge graphs in which the relations between head and tail entities are deterministic. However, in medical knowledge graphs, the relations between head and tail entities are inherently probabilistic. This difference introduces a challenge in embedding medical knowledge graphs.

Objective: We aimed to address the challenge of incorporating the probability values of triplets into the learned representation vectors by enhancing the existing TransX (where X is E, H, R, D, or Sparse) algorithms in two ways: (1) constructing a mapping function between the score value and the probability, and (2) introducing a probability-based loss for triplets into the original margin-based loss function.
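The abstract does not give the exact mapping function or loss formulation used by PrTransX, so the following is only a minimal illustrative sketch of the two enhancements described above, assuming a TransE-style score, a sigmoid mapping from score to probability, and a simple additive probability-based term; the parameter names and functional forms are hypothetical.

```python
# Hypothetical sketch of a probability-aware translation loss (PyTorch).
# The actual mapping g(score) -> probability and loss weighting used by
# PrTransX are not specified in this abstract; the forms below are
# illustrative assumptions, not the authors' published definitions.
import torch


def transe_score(h, r, t):
    # TransE-style plausibility score: lower ||h + r - t|| means more plausible.
    return torch.norm(h + r - t, p=2, dim=-1)


def score_to_probability(score, a=1.0, b=0.0):
    # Assumed monotone mapping from score to probability via a sigmoid;
    # a and b are hypothetical calibration parameters.
    return torch.sigmoid(-(a * score + b))


def probability_aware_loss(pos_score, neg_score, triplet_prob, margin=1.0):
    # Original margin-based ranking loss between a true and a corrupted triplet,
    # plus a term that pulls the predicted probability of the true triplet
    # toward its observed probability in the medical knowledge graph.
    margin_loss = torch.clamp(margin + pos_score - neg_score, min=0.0)
    prob_loss = (score_to_probability(pos_score) - triplet_prob) ** 2
    return (margin_loss + prob_loss).mean()
```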

Methods: We applied the proposed PrTransX algorithm to a medical knowledge graph built from large-scale, real-world electronic medical record data and evaluated the learned embeddings on the link prediction task.

Results: The proposed PrTransX outperformed the corresponding TransX algorithms on all evaluation indicators, achieving a higher proportion of correct entities ranked in the top 10, a higher normalized discounted cumulative gain of the top 10 predicted tail entities, and a lower mean rank.
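For context, the three indicators reported above are commonly computed from the rank of the correct entity in each test prediction. The sketch below shows one conventional way to compute them; the paper's exact formulas and tie-breaking rules are not given in this abstract, so the definitions here are assumptions.

```python
# Illustrative computation of the reported link prediction metrics from the
# ranks of the correct tail entity (1-indexed), following common definitions.
import numpy as np


def link_prediction_metrics(ranks, k=10):
    ranks = np.asarray(ranks, dtype=float)
    hits_at_k = np.mean(ranks <= k)    # proportion of correct entities ranked in the top k
    mean_rank = np.mean(ranks)         # lower is better
    # nDCG@k with a single relevant entity per query: DCG = 1/log2(rank + 1)
    # when the correct entity appears within the top k, else 0; ideal DCG = 1.
    ndcg_at_k = np.mean(np.where(ranks <= k, 1.0 / np.log2(ranks + 1), 0.0))
    return hits_at_k, mean_rank, ndcg_at_k


# Example: ranks of the correct tail entity over five hypothetical test triplets.
print(link_prediction_metrics([1, 3, 12, 5, 80]))
```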

Conclusions: The proposed PrTransX successfully incorporated the uncertainty of the knowledge triplets into the embedding vectors.

Keywords: PrTransX; decision support systems, clinical; electronic health records; graph embedding; knowledge graph; medical informatics; natural language processing; probabilistic medical knowledge graph; representation learning.