Human variation database: an open-source database template for genomic discovery

Bioinformatics. 2011 Apr 15;27(8):1155-6. doi: 10.1093/bioinformatics/btr100. Epub 2011 Mar 2.

Abstract

Motivation: Current public variation databases are based upon collaboratively pooling data into a single database with a single interface available to the public. This gives collaborators little control over how the database is mined and requires them to share their data freely with the owners of the repository. We aim to provide an alternative mechanism: distributing the source code and application programming interface (API) of a database, so that researchers can set up local versions without investing heavily in the development of the resource while keeping confidential information secure.

Results: We describe an open-source database that can be installed easily at any research facility for the storage and analysis of thousands of next-generation sequencing variations. This database is built using PostgreSQL 8.4 (The PostgreSQL Global Development Group, http://www.postgresql.org) and provides a novel method for collating and searching across the reported results from thousands of next-generation sequence samples, as well as rapidly accessing vital information on the origin of the samples. The schema of the database makes rapid and insightful queries simple and enables easy annotation of novel or known genetic variations. A modular and cross-platform Java API is provided to perform common functions, such as generation of standard experimental reports and graphical summaries of modifications to genes. Included libraries allow adopters of the database to quickly develop their own queries.
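
As a rough illustration of what collating reported variants across many samples can mean, the following in-memory Java sketch groups variant calls by position and allele and counts the distinct samples that report each one. It is not the package's API or schema: the class VariantCollator, its methods, the key format and the coordinates used in any example call are all hypothetical, and a real deployment would presumably express the same grouping as a query against the PostgreSQL database.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only: none of these names come from the
// Vancouver Short Read Analysis Package or its database schema.
final class VariantCollator {
    // Maps a variant key such as "chr1:100:C>T" to the set of samples reporting it.
    private final Map<String, Set<String>> samplesByVariant = new TreeMap<>();

    // Builds a key from chromosome, position, reference allele and alternate allele.
    static String key(String chrom, int pos, String ref, String alt) {
        return chrom + ":" + pos + ":" + ref + ">" + alt;
    }

    // Records that a sample reports the given variant; repeated reports
    // from the same sample are counted once.
    void report(String sampleId, String chrom, int pos, String ref, String alt) {
        samplesByVariant
            .computeIfAbsent(key(chrom, pos, ref, alt), k -> new HashSet<>())
            .add(sampleId);
    }

    // Returns the number of distinct samples that report the variant.
    int sampleCount(String chrom, int pos, String ref, String alt) {
        Set<String> samples = samplesByVariant.get(key(chrom, pos, ref, alt));
        return samples == null ? 0 : samples.size();
    }

    // Lists the keys of variants reported by at least minSamples distinct
    // samples, in the lexicographic order of the keys.
    List<String> variantsSeenInAtLeast(int minSamples) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : samplesByVariant.entrySet()) {
            if (entry.getValue().size() >= minSamples) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Under these assumptions, calling report once per variant call and then variantsSeenInAtLeast(2) would list the variants shared by two or more samples, which is the kind of cross-sample search the abstract describes.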

Availability: The software is available for download through the Vancouver Short Read Analysis Package on SourceForge, http://vancouvershortr.sourceforge.net. Instructions for use and deployment are provided on the accompanying wiki pages.

Contact: afejes@bcgsc.ca.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Nucleic Acid*
  • Genetic Variation*
  • Genome, Human
  • Genomics
  • Humans
  • Software