Pairwise sequence alignment methods find the best-matching local (piecewise) or global alignment of two protein (amino acid) or DNA (nucleic acid) sequences.
Typically, the purpose is to find homologues (relatives) of a gene or gene product in a database of known examples. This information is useful for answering a variety of biological questions. The most important application of pairwise alignment methods is characterising sequences of unknown structure or function by relating them to known ones. Another important use of these techniques is in studies of molecular evolution.
A global alignment between two sequences is an alignment in which all of the characters in both sequences participate in the alignment.
Global alignments are useful mostly for finding closely related sequences. As these sequences are also easily identified by local alignment methods, global alignment is now somewhat deprecated as a technique. Further, several complications of molecular evolution (such as domain shuffling, described below) limit the usefulness of global methods.
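As an illustration of global alignment, the following is a minimal sketch of the Needleman-Wunsch dynamic programming approach. It assumes a simple scoring scheme (fixed match, mismatch and linear gap scores) instead of a substitution matrix; the function name and default values are chosen for this example only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner so every character is used.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because the traceback starts at the final cell and runs to the origin, every character of both sequences appears in the result, which is what makes the alignment global.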
Local alignment methods find related regions within sequences. In other words, a local alignment can consist of a subset of the characters within each sequence (e.g. positions 20-40 of sequence A might align with positions 50-70 of sequence B).
This is a more flexible technique than global alignment and has the advantage that related regions which appear in a different order in the two proteins (a situation known as domain shuffling) can be identified as being related. This is not possible with global alignment methods.
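A local alignment can be sketched with the Smith-Waterman recurrence, which differs from the global case in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell instead of the corner. The sketch below assumes simple match, mismatch and linear gap scores; the function name and defaults are illustrative only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, start_a, start_b, aligned_a, aligned_b) for the best
    local alignment. start_a and start_b are 0-based offsets into a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # The 0 option lets an alignment restart anywhere.
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + pair(i, j),
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, i, j, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("GGGGACGTACGT", "ACGTACGTCCCC"))
```

In the usage line, the shared region sits at different offsets in the two inputs, and only that region is reported; the flanking characters are left out of the alignment.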
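Two important issues for sequence alignment are how an alignment should be scored so that the score reflects biological relatedness, and how statistically significant a given score is.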
The two questions are related, obviously. The first can be addressed by developing a model of how likely certain changes between characters in the sequences are. There are lots of ways to do this, none of which is obviously superior overall. These models are derived empirically using related sequences, and are expressed as substitution matrices. These matrices are used by the algorithms named below to give each possible alignment between two sequences a score. The highest-scoring alignments possible are generated by the algorithm. The actual biological quality of the alignments then depends upon the evolutionary model used to generate the score.
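To make the role of a substitution matrix concrete, here is a small sketch that scores an existing gapped alignment column by column. The matrix values and the gap penalty are made up for illustration and cover only three residues; they are not taken from a published matrix, and the names `TOY_MATRIX` and `score_alignment` are hypothetical.

```python
# Illustrative scores only; a real matrix covers all residue pairs.
TOY_MATRIX = {
    ("A", "A"): 4, ("S", "S"): 4, ("W", "W"): 11,
    ("A", "S"): 1, ("A", "W"): -3, ("S", "W"): -3,
}


def substitution_score(x, y, matrix):
    """Look up a symmetric pair score regardless of argument order."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def score_alignment(aligned_a, aligned_b, matrix, gap=-4):
    """Sum the per-column scores of two equal-length gapped sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap  # linear gap penalty, an assumption of this sketch
        else:
            total += substitution_score(x, y, matrix)
    return total


print(score_alignment("ASW", "SSW", TOY_MATRIX))
print(score_alignment("A-W", "ASW", TOY_MATRIX))
```

An alignment algorithm searches for the arrangement that maximises this kind of sum, so changing the matrix changes which alignment is reported as best.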
The second question is purely statistical. Extensive work has established a few hard theoretical results and many approximations. It is now generally accepted that the scores of alignments between random sequences follow the extreme value distribution. Pairwise alignment programs such as BLAST use simulation methods to estimate the parameters of this distribution for a particular parameter set (consisting of the query, database, substitution matrix and certain other parameters). Alignments can then be given a statistical significance value, allowing judgements to be made about possible relationships between sequences.
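The extreme value model above can be sketched numerically. Under the commonly used form, the expected number of chance alignments scoring at least S between sequences of lengths m and n is E = K * m * n * exp(-lambda * S), and the probability of seeing at least one is 1 - exp(-E). The parameter values in the usage lines are placeholders chosen for this example, not values used by any particular program.

```python
import math


def expected_hits(score, m, n, lam, k):
    """Expected number of chance alignments with at least this score."""
    return k * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, k):
    """Probability of at least one chance alignment with this score.

    Uses -expm1(-E), which equals 1 - exp(-E) but stays accurate
    when E is very small.
    """
    return -math.expm1(-expected_hits(score, m, n, lam, k))


# Placeholder parameters: query length 300, database length 1,000,000.
for s in (50, 100):
    print(s, expected_hits(s, 300, 1_000_000, 0.27, 0.04),
          p_value(s, 300, 1_000_000, 0.27, 0.04))
```

Note that m * n grows with the database, so the same score becomes less significant as the database gets larger.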
Multiple alignment is an extension of pairwise alignment to incorporate several sequences. Several methods for this exist, one of the most popular being the progressive alignment strategy used by the CLUSTAL family of programs. Instead of searching a database, multiple alignment methods take a few sequences and find regions common to them all. This is typically used in cladistics as a method for building phylogenetic trees, as well as for creating sequence profiles which can be used to search sequence databases for more distant relatives (the two most popular methods for remote-homologue detection, PSI-BLAST and Hidden Markov model (HMM) based methods, both work on this principle).
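The idea of a sequence profile can be sketched as the residue frequencies observed in each column of a multiple alignment. The example below is a deliberately simplified assumption: it ignores gaps when counting and applies no sequence weighting or pseudocounts, which practical profile methods typically add. The function name `column_profile` is hypothetical.

```python
from collections import Counter


def column_profile(alignment):
    """Return one {residue: frequency} dict per alignment column.

    `alignment` is a list of equal-length strings using '-' for gaps.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values())
        # A column made only of gaps yields an empty dict.
        profile.append({r: n / total for r, n in counts.items()} if total else {})
    return profile


for position, freqs in enumerate(column_profile(["ACG-", "ACGT", "AGGT"])):
    print(position, freqs)
```

A new sequence can then be compared against these per-column frequencies instead of against a single sequence, which is what allows more distant relatives to be detected.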
Many strategies exist; however, no algorithm has yet been described that is both computationally practical for many sequences and guaranteed to find the best possible alignment.
Algorithms
Needleman-Wunsch: Pairwise. Global alignment only.
Smith-Waterman: Pairwise. Local alignment.
Software
SSearch: Implements the standard Smith-Waterman algorithm. Considerably slower than the more modern BLAST and FASTA methods.
BLAST: Stands for Basic Local Alignment Search Tool. Pairwise local search. Uses heuristics to run much faster than the original Smith-Waterman algorithm, at the cost of no longer guaranteeing the optimal alignment. A BLAST server is hosted at the NCBI.
Fasta: Pairwise local search. Superseded by BLAST.
Clustal: Progressive multiple alignment method. Comes in several varieties (ClustalW, ClustalX etc.).
Significance of Alignments
It is important to realise that the actual biological meaning of any alignment can never be absolutely guaranteed. However, statistical methods can be used to assess the likelihood of finding an alignment between two regions (or sequences) by chance, given the size of the database and its composition.