Querying highly similar sequences

Carl Barton; Mathieu Giraud; Costas S Iliopoulos; Thierry Lecroq; Laurent Mouchard; Solon P Pissis

doi:10.1504/IJCBDD.2013.052206

Int J Comput Biol Drug Des. 2013;6(1-2):119-30. doi: 10.1504/IJCBDD.2013.052206. Epub 2013 Feb 21.

Authors

Carl Barton¹, Mathieu Giraud, Costas S Iliopoulos, Thierry Lecroq, Laurent Mouchard, Solon P Pissis

Affiliation

¹ Department of Informatics, King's College London, London, UK. carl.barton@kcl.ac.uk

PMID: 23428478

Abstract

In this paper, we present a solution to the extreme similarity sequencing problem. The problem consists of finding occurrences of a pattern p in a set S(0), S(1), …, S(k) of sequences of equal length, where S(i), for all 1 ≤ i ≤ k, differs from S(0) by a constant number of errors - around 10 in practice. We present an asymptotically fast O(n + occ log occ) time algorithm, as well as a practical O(nk/w) time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k, the number of sequences.
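
To make the problem statement concrete, the following is a minimal Python sketch of a naive baseline, not the algorithm of the paper. It assumes that every error is a substitution (consistent with the sequences having equal length) and that each S(i) is stored as a dictionary mapping a position to the substituted character. The names `find_all` and `query_similar` are hypothetical. The sketch searches S(0) once, then re-examines only those windows that overlap a substituted position in each S(i).

```python
def find_all(text, pattern):
    """Return every start position of pattern in text (overlaps allowed)."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def query_similar(reference, diffs, pattern):
    """Report occurrences of pattern in S(0) and in each S(i).

    reference -- the sequence S(0)
    diffs     -- diffs[i - 1] is a dict {position: character} listing the
                 substitutions that turn S(0) into S(i) (an assumed encoding)
    Returns a dict {i: sorted list of start positions in S(i)}.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n, m = len(reference), len(pattern)
    base_hits = find_all(reference, pattern)
    results = {0: base_hits}
    for i, subs in enumerate(diffs, start=1):
        # Start positions whose length-m window covers a substituted position.
        touched = set()
        for pos in subs:
            touched.update(range(max(0, pos - m + 1), min(pos, n - m) + 1))
        # Windows untouched by any substitution behave exactly as in S(0).
        hits = [j for j in base_hits if j not in touched]
        # Touched windows are verified directly against the substituted text.
        for j in touched:
            if all(subs.get(j + t, reference[j + t]) == pattern[t]
                   for t in range(m)):
                hits.append(j)
        results[i] = sorted(hits)
    return results
```

For example, `query_similar("ACGTACTTAC", [{6: "G"}, {1: "G"}], "ACG")` returns `{0: [0], 1: [0, 4], 2: []}`: the substitution in S(1) creates a new occurrence at position 4, and the substitution in S(2) destroys the occurrence at position 0. After the initial scan of S(0), each S(i) costs time proportional to the number of its errors times the square of the pattern length, so this baseline only illustrates why storing differences against S(0) avoids rescanning every sequence; it does not achieve the bounds stated in the abstract.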

MeSH terms

Algorithms*
Computational Biology
High-Throughput Nucleotide Sequencing
Sequence Analysis, DNA / methods*
Sequence Homology, Nucleic Acid*