7.5: Local Alignments
We have so far discussed how to align two sequences over their entire length, called a global alignment. Often, however, it is more useful to align two sequences over only part of their lengths, called a local alignment. In bioinformatics, the algorithm for global alignment is called "Needleman-Wunsch," and that for local alignment "Smith-Waterman." Local alignments are useful, for instance, when searching a long genome sequence for alignments to a short DNA segment. They are also useful when aligning two protein sequences since proteins can consist of multiple domains, and only a single domain may align.
If for simplicity we consider a constant gap penalty \(g\) (a negative score added for each indel), then a local alignment can be obtained using the rule
\[T(i, j)=\max \left\{\begin{array}{l} 0, \\[4pt] T(i-1, j-1)+S\left(a_{i}, b_{j}\right), \\[4pt] T(i-1, j)+g, \\[4pt] T(i, j-1)+g . \end{array}\right. \nonumber \]
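After the dynamic matrix is computed using this rule, the traceback algorithm starts at the matrix element with the highest score and stops at the first encountered zero score. The zero in the maximum prevents any score from becoming negative, so a new alignment can begin at any position in either sequence.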
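The recursion above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `local_score_matrix` and the default scores for `match`, `mismatch`, and `gap` are assumptions chosen for the example, and the similarity score \(S(a_i, b_j)\) is simplified to a match/mismatch test.

```python
def local_score_matrix(a, b, match=2, mismatch=-1, gap=-2):
    """Fill the local-alignment score matrix T for sequences a (rows) and b (columns).

    The scoring values are illustrative defaults, not fixed by the algorithm.
    """
    # Row 0 and column 0 stay at zero: an alignment may start anywhere.
    T = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # Simplified S(a_i, b_j): one score for a match, one for a mismatch.
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(
                0,                    # restart the alignment here
                T[i - 1][j - 1] + s,  # align a_i with b_j
                T[i - 1][j] + gap,    # a_i opposite a gap
                T[i][j - 1] + gap,    # b_j opposite a gap
            )
    return T
```

The only difference from a global-alignment fill in this sketch is the extra `0` in the maximum and the all-zero first row and column.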
If we apply the Smith-Waterman algorithm to locally align the two sequences GGAT and GAATT considered previously, with a match scored as \(+2\), a mismatch as \(-1\) and an indel as \(-2\), the dynamic matrix is
\[\begin{array}{ccccccc} & - & \mathrm{G} & \mathrm{A} & \mathrm{A} & \mathrm{T} & \mathrm{T} \\[4pt] - & 0 & 0 & 0 & 0 & 0 & 0 \\[4pt] \mathrm{G} & 0 & 2 & 0 & 0 & 0 & 0 \\[4pt] \mathrm{G} & 0 & 2 & 1 & 0 & 0 & 0 \\[4pt] \mathrm{A} & 0 & 0 & 4 & 3 & 1 & 0 \\[4pt] \mathrm{T} & 0 & 0 & 2 & 3 & 5 & 3\end{array} \nonumber \]
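As a check on one entry, take the element in row 3 and column 2, assuming rows are indexed by GGAT and columns by GAATT, with both indices starting from zero at the gap row and column. Both letters are A, so \(S(\mathrm{A}, \mathrm{A})=+2\), and with \(g=-2\) the rule gives

\[T(3,2)=\max \left\{\begin{array}{l} 0, \\[4pt] T(2,1)+S(\mathrm{A}, \mathrm{A})=2+2=4, \\[4pt] T(2,2)+g=1-2=-1, \\[4pt] T(3,1)+g=0-2=-2 \end{array}\right\}=4 . \nonumber \]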
The traceback algorithm starts at the highest score, here the 5 in matrix element \((4,4)\), and ends at the 0 in matrix element \((0,0)\). The resulting local alignment is
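\[\begin{array}{cccc} \mathrm{G} & \mathrm{G} & \mathrm{A} & \mathrm{T} \\[4pt] : & & : & : \\[4pt] \mathrm{G} & \mathrm{A} & \mathrm{A} & \mathrm{T} \end{array} \nonumber \]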
which has a score of five (three matches and one mismatch, \(2-1+2+2=5\)), larger than the previous global alignment score of three.
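The traceback for this example can be sketched as follows. The function name `traceback` and its tie-breaking order (diagonal first, then up, then left) are assumptions made for illustration; other orders may return a different alignment with the same score when ties occur. The matrix is typed in directly from the text, with rows indexed by GGAT and columns by GAATT.

```python
def traceback(T, a, b, match=2, mismatch=-1, gap=-2):
    """Recover one best local alignment from a filled score matrix T."""
    # Start at the cell holding the highest score (first one found on ties).
    i, j = max(
        ((r, c) for r in range(len(T)) for c in range(len(T[0]))),
        key=lambda rc: T[rc[0]][rc[1]],
    )
    score = T[i][j]
    top, bottom = [], []
    # Walk back until the first zero entry is reached.
    while T[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if T[i][j] == T[i - 1][j - 1] + s:   # diagonal: a_i aligned with b_j
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif T[i][j] == T[i - 1][j] + gap:   # up: a_i opposite a gap
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:                                # left: b_j opposite a gap
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    # Letters were collected from the end of the alignment backwards.
    return "".join(reversed(top)), "".join(reversed(bottom)), score


# Dynamic matrix from the text (rows: -, G, G, A, T; columns: -, G, A, A, T, T).
T = [
    [0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0],
    [0, 0, 4, 3, 1, 0],
    [0, 0, 2, 3, 5, 3],
]
print(traceback(T, "GGAT", "GAATT"))  # ('GGAT', 'GAAT', 5)
```

The path visits elements \((4,4)\), \((3,3)\), \((2,2)\), \((1,1)\) and stops at \((0,0)\), so every step is diagonal and no gaps appear in the alignment.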