7.1: DNA
The minimum you need to know about DNA chemistry and the genetic code
In one of the most important scientific papers ever published, James Watson and Francis Crick, pictured in Fig. 7.2, determined the structure of DNA using a three-dimensional molecular model that makes plain the chemical basis of heredity. The DNA molecule consists of two strands wound around each other to form the now famous double helix. By an arbitrary convention, the sequencing group labels one strand the positive strand and the other the negative strand. The two strands of the DNA molecule bind to each other by base pairing: the bases of one strand pair with the bases of the other strand. Adenine (A) always pairs with thymine (T), and guanine (G) always pairs with cytosine (C): A with T, G with C. In RNA, T is replaced by uracil (U).
When reading the sequence of nucleotides from a single strand, the direction of reading must be specified, and this is possible by referring to the chemical bonds of the DNA backbone. There are only two possible directions in which to read a linear sequence of bases, and these are denoted \(5'\)-to-\(3'\) and \(3'\)-to-\(5'\). Importantly, the two separate strands of the DNA molecule are oriented in opposite directions. Below is the beginning of the DNA coding sequence for the human hemoglobin beta chain protein discussed earlier:
It is important to realize that there are two unique DNA sequences here, and either one, or even both, can be coding. Reading from \(5'\)-to-\(3'\), the upper sequence begins 'GTGCACCTG...', while the lower sequence ends '...CAGGTGCAC'. Here, only the upper sequence codes for the human hemoglobin beta chain, and the lower sequence is non-coding.
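The relationship between the two strands can be checked with a few lines of code. The following is a minimal Python sketch, not a standard library routine: the function names `reverse_complement` and `to_rna` are chosen here for illustration. It applies the pairing rules (A with T, G with C) and then reverses the result, because the two strands run in opposite directions.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read 5'-to-3'.

    Each base is swapped for its pairing partner, and the result is
    reversed because the two strands are oriented in opposite directions.
    """
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_rna(seq: str) -> str:
    """Write a DNA sequence in the RNA alphabet (T replaced by U)."""
    return seq.upper().replace("T", "U")


# First bases of the upper strand quoted in the text (read 5'-to-3').
upper_start = "GTGCACCTG"

print(reverse_complement(upper_start))  # CAGGTGCAC
print(to_rna(upper_start))              # GUGCACCUG
```

The first printed string, 'CAGGTGCAC', is the end of the lower sequence quoted above: the beginning of one strand pairs with the end of the other when both are read \(5'\)-to-\(3'\).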
Sequence alignment by brute force
Sequence alignment by dynamic programming
Gap opening and gap extension penalties
7.5: Local alignments
7.6: Software
If you have in hand two or more sequences that you would like to align, there is a choice of software tools available. For relatively short sequences, you can use the LALIGN program for global or local alignments: http://embnet.vital-it.ch/software/LALIGN_form.html
For longer sequences, the BLAST software has a flavor that permits local alignment of two sequences:
http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi
Another useful tool for global alignment of two or more long DNA sequences is PipMaker:
http://pipmaker.bx.psu.edu/pipmaker/
Multiple global alignments of protein sequences can be computed with ClustalW or T-Coffee.
Most users of sequence alignment software want to compare a given sequence against a database of sequences. BLAST is the most widely used software for this purpose, and it comes in several versions depending on the type of sequence and the database search being performed:
http://www.ncbi.nlm.nih.gov/BLAST/