Local homology recognition and distance measures in linear time using compressed amino acid alphabets

We describe how local similarities between two sequences can be identified with high reliability in O(L) time, and show how this can improve alignment speed with negligible reduction in accuracy.

R. C. Edgar

2004

Scholarcy highlights

  • Biological databases are growing exponentially, and algorithms that minimize processor time and memory requirements are becoming increasingly important
  • We used a set of 1484 pairs of protein structures selected by Sadreyev and Grishin. These pairs were chosen to be representative of alignable structures in the FSSP database
  • We show that use of a compressed amino acid alphabet can increase the coverage of the method with a negligible increase in errors compared with full dynamic programming
  • On a test set of 1848 sequence pairs selected by Sadreyev and Grishin from the FSSP database, we find that this method achieves comparable coverage to the fast Fourier transform method used by MAFFT and is more than an order of magnitude faster
  • We investigate the use of k-mer counting as a fast estimate of evolutionary distance
  • We show that k-mer distances correlate well with the fractional identity computed from a global alignment
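
The k-mer counting and compressed-alphabet ideas in the highlights above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The six-group alphabet in `GROUPS`, the default `k=3`, and the function names are assumptions chosen for illustration, and the similarity shown (shared k-mer count divided by the number of k-mers in the shorter sequence) is one plausible normalization rather than a confirmed formula from the source. Counting k-mers with a hash table takes a single pass over each sequence, which is where the linear-time behaviour comes from.

```python
from collections import Counter

# Hypothetical 6-group compressed alphabet, for illustration only.
# Each group of similar residues is mapped to a single symbol.
GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
COMPRESS = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def compress(seq):
    """Rewrite a protein sequence in the compressed alphabet.

    Residues outside the 20 standard amino acids are mapped to 'X'.
    """
    return "".join(COMPRESS.get(aa, "X") for aa in seq.upper())


def kmer_counts(seq, k):
    """Count every contiguous subsequence of length k (one pass, O(L))."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(a, b, k=3):
    """Shared k-mer count, normalized by the k-mer count of the shorter sequence."""
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    denom = min(len(a), len(b)) - k + 1
    return shared / denom if denom > 0 else 0.0


def kmer_distance(a, b, k=3, compressed=False):
    """Distance estimate in [0, 1]: 0 means all k-mers are shared."""
    if compressed:
        a, b = compress(a), compress(b)
    return 1.0 - kmer_similarity(a, b, k)


if __name__ == "__main__":
    x, y = "ILILKR", "VMVMHK"
    print(kmer_distance(x, y))                   # full 20-letter alphabet
    print(kmer_distance(x, y, compressed=True))  # compressed alphabet
```

In this toy pair the two sequences share no exact 3-mers, yet every residue falls in the same group as its counterpart, so the compressed alphabet lets the k-mer comparison register a similarity that exact matching misses. That is the intuition behind using compression to increase coverage on distantly related proteins.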
