7.1: DNA

The minimum you need to know about DNA chemistry and the genetic code

Figure 7.1: Multiple alignment of the hemoglobin beta-chain for Human, Chimpanzee, Rat and Zebra fish, obtained using ClustalW.

In one of the most important scientific papers ever published, James Watson and Francis Crick, pictured in Fig. 7.2, determined the structure of DNA using a three-dimensional molecular model that makes plain the chemical basis of heredity. The DNA molecule consists of two strands wound around each other to form the now famous double helix. By an arbitrary convention, the sequencing group labels one strand the positive strand and the other the negative strand. The two strands of the DNA molecule bind to each other by base pairing: the bases of one strand pair with the bases of the other strand. Adenine (A) always pairs with thymine (T), and guanine (G) always pairs with cytosine (C). In RNA, thymine is replaced by uracil (U).

When reading the sequence of nucleotides from a single strand, the direction of reading must be specified, and this is done by referring to the chemical bonds of the DNA backbone. There are only two possible directions in which to read a linear sequence of bases, and these are denoted \(5^{\prime}\)-to-\(3^{\prime}\) and \(3^{\prime}\)-to-\(5^{\prime}\). Importantly, the two separate strands of the DNA molecule are oriented in opposite directions. Below is the beginning of the DNA coding sequence for the human hemoglobin beta chain protein discussed earlier:

5'-GTGCACCTG...-3'
3'-CACGTGGAC...-5'

It is important to realize that there are two distinct DNA sequences here, and either one, or even both, can be coding. Reading from \(5^{\prime}\)-to-\(3^{\prime}\), the upper sequence begins 'GTGCACCTG...', while the lower sequence ends '...CAGGTGCAC'. Here, only the upper sequence codes for the human hemoglobin beta chain; the lower sequence is non-coding.
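
The relationship between the two strands can be checked with a short script. The following is a minimal Python sketch, not part of the original text: the helper name `reverse_complement` is hypothetical. It applies the base-pairing rules (A with T, G with C) and then reverses the result, so that the opposite strand is also read 5'-to-3'.

```python
# Base-pairing rules from the text: A pairs with T, G pairs with C.
PAIRING = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read 5'-to-3' (hypothetical helper)."""
    complement = strand.translate(PAIRING)  # pair each base with its partner
    return complement[::-1]                 # opposite strand runs the other way


upper = "GTGCACCTG"                # start of the upper sequence quoted in the text
lower = reverse_complement(upper)  # end of the lower sequence, read 5'-to-3'
print(lower)  # CAGGTGCAC
```

Applying the function twice returns the original sequence, which reflects the fact that each strand determines the other.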

Figure 7.2: James Watson and Francis Crick posing in front of their DNA model. The original photograph was taken in 1953, the year of discovery, and was recreated in 2003, fifty years later. Francis Crick, the man on the right, died in 2004.

Figure 7.3: The genetic code.

Each triplet of nucleotides codes for a single amino acid. The triplet coding of nucleotides for amino acids is the famous genetic code, shown in Fig. 7.3. Here, the translation to amino acid sequence is 'VHL...', where we have used the genetic code, written in the RNA alphabet with U in place of T: 'GUG' = V, 'CAC' = H, 'CUG' = L. The three of the twenty amino acids used here are V = Valine, H = Histidine, and L = Leucine.
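
The triplet lookup can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the function names `transcribe` and `translate` are hypothetical, the coding strand is assumed to be copied to RNA simply by replacing T with U, and the table is the standard genetic code written compactly with the bases in the order U, C, A, G.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated in the
# order UUU, UUC, UUA, UUG, UCU, ... ; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Write the coding strand in the RNA alphabet (T replaced by U)."""
    return dna.replace("T", "U")


def translate(rna: str) -> str:
    """Translate successive triplets; trailing bases that do not fill a
    triplet are ignored, and stop codons appear as '*' (no early stop)."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(transcribe("GTGCACCTG"))
print(protein)  # VHL
```

The nine bases quoted in the text give the three amino acids V, H and L, in agreement with the codons 'GUG', 'CAC' and 'CUG' read from the table.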

Sequence alignment by brute force

Sequence alignment by dynamic programming

Gap opening and gap extension penalties

7.5 Local alignments

7.6 Software

If you have two or more sequences that you would like to align, several software tools are available. For relatively short sequences, you can use the LALIGN program for global or local alignments:

http://embnet.vital-it.ch/software/LALIGN_form.html

For longer sequences, the BLAST software has a flavor that permits local alignment of two sequences:

http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

Another useful program for global alignment of two or more long DNA sequences is PipMaker:

http://pipmaker.bx.psu.edu/pipmaker/

Multiple global alignments of protein sequences can be computed with ClustalW or T-Coffee:

http://www.clustal.org/

Most users of sequence alignment software want to compare a given sequence against a database of sequences. BLAST is the most widely used software for this purpose, and it comes in several versions depending on the type of sequence and database search being performed:

http://www.ncbi.nlm.nih.gov/BLAST/
