Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem

Brief Bioinform. 2002 Mar;3(1):23-31. doi: 10.1093/bib/3.1.23.

Abstract

With the consensus human genome sequenced and many other sequencing projects at varying stages of completion, greater attention is being paid to the genetic differences among individuals and the abilities of those differences to predict phenotypes. A significant obstacle to such work is the difficulty and expense of determining haplotypes--sets of variants genetically linked because of their proximity on the genome--for large numbers of individuals for use in association studies. This paper presents some algorithmic considerations in a new approach for haplotype determination: inferring haplotypes from localised polymorphism data gathered from short genome 'fragments.' Formalised models of the biological system under consideration are examined, given a variety of assumptions about the goal of the problem and the character of optimal solutions. Some theoretical results and algorithms for handling haplotype assembly given the different models are then sketched. The primary conclusion is that some important simplified variants of the problem yield tractable problems while more general variants tend to be intractable in the worst case.

MeSH terms

  • Algorithms*
  • Base Sequence
  • DNA
  • Haplotypes*
  • Models, Theoretical
  • Polymorphism, Single-Stranded Conformational*

Substances

  • DNA