Phenome-wide identification of therapeutic genetic targets, leveraging knowledge graphs, graph neural networks, and UK Biobank data

Lawrence Middleton; Ioannis Melas; Chirag Vasavda; Arwa Raies; Benedek Rozemberczki; Ryan S Dhindsa; Justin S Dhindsa; Blake Weido; Quanli Wang; Andrew R Harper; Gavin Edwards; Slavé Petrovski; Dimitrios Vitsios

doi:10.1126/sciadv.adj1424

Sci Adv. 2024 May 10;10(19):eadj1424. doi: 10.1126/sciadv.adj1424. Epub 2024 May 8.

Affiliations

¹ Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
² Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA 02451, USA.
³ Biological Insights Knowledge Graph (BIKG), Research D&A, R&D IT, AstraZeneca, Cambridge, UK.
⁴ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
⁵ Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA.
⁶ Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA.
⁷ Department of Medicine, University of Melbourne, Austin Health, Melbourne, Victoria, Australia.

Abstract

The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework integrating AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets, to assess gene-disease probabilities throughout the phenome. We use graph neural networks, capturing the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate a 6.9% average classification power boost, achieving a median receiver operating characteristic (ROC) area under curve (AUC) score of 0.90 across 5220 diseases from Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating against underpowered PheWAS associations. Results are exposed through an interactive web resource.
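The training scheme summarized in the abstract (many balanced datasets, semi-supervised learning, ROC AUC evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a positive-unlabeled setup in which unlabeled genes serve as provisional negatives, and it swaps in a toy nearest-centroid scorer for the graph neural networks. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def centroid_score(x, pos_centroid, neg_centroid):
    """Toy stand-in classifier: maps relative centroid distance into (0, 1)."""
    d_pos = math.dist(x, pos_centroid)
    d_neg = math.dist(x, neg_centroid)
    return 1.0 / (1.0 + math.exp(d_pos - d_neg))


def balanced_pu_scores(features, positives, n_rounds=20, n_folds=4, seed=0):
    """Average out-of-fold scores over repeated balanced resamples.

    features: dict mapping gene -> feature vector (assumed input format).
    positives: set of known disease-associated (seed) genes.
    Unlabeled genes are treated as provisional negatives in each round.
    """
    rng = random.Random(seed)
    pos = sorted(positives)
    unlabeled = sorted(g for g in features if g not in positives)
    collected = {g: [] for g in features}
    for _ in range(n_rounds):
        neg = rng.sample(unlabeled, len(pos))  # balanced: as many negatives as positives
        labeled = [(g, 1) for g in pos] + [(g, 0) for g in neg]
        rng.shuffle(labeled)
        for k in range(n_folds):
            test = labeled[k::n_folds]
            train = [item for i, item in enumerate(labeled) if i % n_folds != k]
            pc = centroid([features[g] for g, y in train if y == 1])
            nc = centroid([features[g] for g, y in train if y == 0])
            for g, _ in test:
                collected[g].append(centroid_score(features[g], pc, nc))
    # Genes never drawn into a balanced set have no out-of-fold score.
    return {g: mean(s) for g, s in collected.items() if s}


def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 4 seed genes well separated from 10 unlabeled genes.
features = {f"P{i}": [2.0, 2.0 + 0.1 * i] for i in range(4)}
features.update({f"U{i}": [0.0, 0.1 * i] for i in range(10)})
positives = {f"P{i}" for i in range(4)}

scores = balanced_pu_scores(features, positives)
genes = sorted(scores)
auc = roc_auc([scores[g] for g in genes], [int(g in positives) for g in genes])
```

Averaging out-of-fold scores across many balanced resamples lets every sampled gene receive a prediction from models that never saw it during training, which is one common way to handle a small positive set against a large unlabeled background.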

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Biological Specimen Banks*
Computational Biology / methods
Databases, Genetic
Genetic Predisposition to Disease
Genome-Wide Association Study / methods
Genomics / methods
Humans
Neural Networks, Computer*
Phenomics / methods
Phenotype
UK Biobank
United Kingdom