Leveraging graph topology and semantic context for pharmacovigilance through twitter-streams

Ryan Eshleman; Rahul Singh

doi:10.1186/s12859-016-1220-5

BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):335. doi: 10.1186/s12859-016-1220-5.

Authors

Ryan Eshleman¹, Rahul Singh²,³

Affiliations

¹ Department of Computer Science, San Francisco State University, San Francisco, CA, 94132, USA.
² Department of Computer Science, San Francisco State University, San Francisco, CA, 94132, USA. rahul@sfsu.edu.
³ Center for Discovery and Innovation in Parasitic Diseases, University of California, San Diego, USA. rahul@sfsu.edu.

Abstract

Background: Adverse drug events (ADEs) constitute one of the leading causes of post-therapeutic death, and their identification is an important challenge of modern precision medicine. Unfortunately, the onset and effects of ADEs are often underreported, complicating timely intervention. At over 500 million posts per day, Twitter is a commonly used social media platform. The ubiquity of day-to-day personal information exchange on Twitter makes it a promising target for data mining for ADE identification and intervention. Three technical challenges are central to this problem: (1) identification of salient medical keywords in (noisy) tweets, (2) mapping drug-effect relationships, and (3) classification of such relationships as adverse or non-adverse.

Methods: We use a bipartite graph-theoretic representation called a drug-effect graph (DEG) for modeling drug and side effect relationships by representing the drugs and side effects as vertices. We construct individual DEGs on two data sources. The first DEG is constructed from the drug-effect relationships found in FDA package inserts as recorded in the SIDER database. The second DEG is constructed by mining the history of Twitter users. We use dictionary-based information extraction to identify medically relevant concepts in tweets. Drugs and co-occurring symptoms are connected with edges weighted by temporal distance and frequency. Finally, information from the SIDER DEG is integrated with the Twitter DEG, and edges are classified as either adverse or non-adverse using supervised machine learning.
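The DEG construction described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionaries, the 48-hour co-occurrence window, the edge attributes `count` and `min_gap_hours`, and the `in_sider` flag are all assumptions made for illustration, since the abstract does not specify how temporal distance and frequency are combined into a weight.

```python
from datetime import datetime

# Hypothetical toy dictionaries; a real system would use far larger lexicons.
DRUGS = {"ibuprofen", "prozac"}
EFFECTS = {"nausea", "headache", "insomnia"}


def extract(text):
    """Dictionary-based extraction: return (drugs, effects) mentioned in a tweet."""
    tokens = {tok.strip(".,!?#@").lower() for tok in text.split()}
    return tokens & DRUGS, tokens & EFFECTS


def build_deg(timeline, window_hours=48.0):
    """Build a bipartite drug-effect graph from one user's (timestamp, text) tweets.

    Returns {(drug, effect): {"count": int, "min_gap_hours": float}}.
    The window and the two edge attributes are illustrative assumptions.
    """
    mentions = [(ts, *extract(text)) for ts, text in timeline]
    edges = {}
    for ts_d, drugs, _ in mentions:
        for ts_e, _, effects in mentions:
            gap = abs((ts_e - ts_d).total_seconds()) / 3600.0
            if gap > window_hours:
                continue  # too far apart in time to count as co-occurring
            for d in drugs:
                for e in effects:
                    edge = edges.setdefault((d, e), {"count": 0, "min_gap_hours": gap})
                    edge["count"] += 1  # frequency component
                    edge["min_gap_hours"] = min(edge["min_gap_hours"], gap)  # temporal component
    return edges


def enrich_with_sider(twitter_deg, sider_edges):
    """Flag each Twitter edge by whether the same drug-effect pair is a known edge."""
    return {edge: {**attrs, "in_sider": edge in sider_edges}
            for edge, attrs in twitter_deg.items()}


timeline = [
    (datetime(2016, 1, 1, 8, 0), "Took ibuprofen this morning"),
    (datetime(2016, 1, 1, 12, 0), "ugh, nausea all day"),
    (datetime(2016, 1, 5, 9, 0), "headache again"),
]
deg = build_deg(timeline)
enriched = enrich_with_sider(deg, sider_edges={("ibuprofen", "nausea")})
```

In a sketch like this, the enriched edge attributes would serve as the feature vector passed to a supervised classifier that labels each edge as adverse or non-adverse.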

Results: We examine both graph-theoretic and semantic features for the classification task. The proposed approach identifies adverse drug effects with high accuracy, with precision exceeding 85 % and F1 exceeding 81 %. Compared with leading state-of-the-art methods, which employ un-enriched graph-theoretic analysis alone, our method leads to improvements ranging between 5 and 8 % in terms of the aforementioned measures. Additionally, we employ our method to discover several ADEs which, though present in medical literature and Twitter-streams, are not represented in the SIDER database.
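For readers unfamiliar with the measures reported above, the following sketch shows how precision and F1 are conventionally computed from the counts of an edge classifier. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 8 adverse edges found correctly, 2 false alarms, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Because F1 is the harmonic mean of precision and recall, it can never exceed the larger of the two.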

Conclusions: We present a DEG integration model as a powerful formalism for the analysis of drug-effect relationships that is general enough to accommodate diverse data sources, yet rigorous enough to provide a strong mechanism for ADE identification.

Keywords: Biological Modeling; Pharmacology; Pharmacovigilance; Social Media; Text Mining.

MeSH terms

Data Accuracy
Data Mining / methods*
Databases, Factual
Drug-Related Side Effects and Adverse Reactions / classification*
Drug-Related Side Effects and Adverse Reactions / diagnosis
Humans
Models, Theoretical*
Pharmacovigilance*
Semantics*
Social Media*