ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R

GigaByte. 2023 Sep 14:2023:1-10. doi: 10.46471/gigabyte.91. eCollection 2023.

Abstract

We present ensemblQueryR, an R package for querying Ensembl linkage disequilibrium (LD) endpoints. This package is flexible, fast and user-friendly, and optimised for high-throughput querying. ensemblQueryR uses functions that are intuitive and amenable to custom code integration, familiar R object types as inputs and outputs as well as providing parallelisation functionality. For each Ensembl LD endpoint, ensemblQueryR provides two functions, permitting both single- and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate improved computational performance of ensemblQueryR over an exisiting tool in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through Docker and singularity images, making this tool widely accessible to the scientific community.

Grants and funding

AFB was supported through the award of a Biotechnology and Biological Sciences Research Council (BBSRC UK) London Interdisciplinary Doctoral Fellowship. SGR and MR were supported through the award of a Tenure Track Clinician Scientist Fellowship to MR (MR/N008324/1). AH was funded by a BBSRC award (BB/R006075/1).