Computational approaches to interpreting genomic sequence variation

Graham Rs Ritchie; Paul Flicek

doi:10.1186/s13073-014-0087-1

Genome Med. 2014 Oct 22;6(10):87. doi: 10.1186/s13073-014-0087-1. eCollection 2014.

Authors

Graham Rs Ritchie¹, Paul Flicek¹

Affiliation

¹ European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

Abstract

Identifying sequence variants that play a mechanistic role in human disease and other phenotypes is a fundamental goal in human genetics and will be important in translating the results of variation studies. Experimental validation to confirm that a variant causes the biochemical changes responsible for a given disease or phenotype is considered the gold standard, but this cannot currently be applied to the 3 million or so variants expected in an individual genome. This has prompted the development of a wide variety of computational approaches that use several different sources of information to identify functional variation. Here, we review and assess the limitations of computational techniques for categorizing variants according to functional classes, prioritizing variants for experimental follow-up and generating hypotheses about the possible molecular mechanisms to inform downstream experiments. We discuss the main current bioinformatics approaches to identifying functional variation, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.
