An Application to Statistical Principal Component Analysis
Linear algebra is important in multivariate analysis in statistics, and we conclude with a very short look at one application of diagonalization in this area. A main feature of probability and statistics is the idea of a random variable \(X\), that is, a real-valued function that takes its values according to a probability law (called its distribution). Random variables occur in a wide variety of contexts; examples include the number of meteors falling per square kilometre in a given region, the price of a share of a stock, and the duration of a long-distance telephone call from a certain city.
The values of a random variable \(X\) are distributed about a central number \(\mu\), called the mean of \(X\). The mean can be calculated from the distribution as the expectation \(E(X) = \mu\) of the random variable \(X\). Functions of a random variable are again random variables. In particular, \((X - \mu)^{2}\) is a random variable, and the variance of the random variable \(X\), denoted \(\func{var}(X)\), is defined to be the number
\[\func{var}(X) = E\{(X - \mu)^2\} \quad \mbox{ where } \mu = E(X) \nonumber \]
It is not difficult to see that \(\func{var}(X) \geq 0\) for every random variable \(X\). The number \(\sigma = \sqrt{\func{var}(X)}\) is called the standard deviation of \(X\), and is a measure of how much the values of \(X\) are spread about the mean \(\mu\) of \(X\). A main goal of statistical inference is finding reliable methods for estimating the mean and the standard deviation of a random variable \(X\) by sampling the values of \(X\).
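As a small illustration (not part of the text), the mean and standard deviation can be estimated from a sample exactly as these definitions suggest. The sketch below uses Python with NumPy and simulated data; the distribution, seed, and sample size are arbitrary choices of my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample of a random variable X (here normal with mu = 5, sigma = 2).
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_hat = x.mean()                     # estimate of the mean mu = E(X)
var_hat = ((x - mu_hat) ** 2).mean()  # estimate of var(X) = E{(X - mu)^2}
sigma_hat = np.sqrt(var_hat)          # estimate of the standard deviation sigma

print(mu_hat, var_hat, sigma_hat)     # close to 5, 4, and 2 for a large sample
```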
If two random variables \(X\) and \(Y\) are given, and their joint distribution is known, then functions of \(X\) and \(Y\) are also random variables. In particular, \(X + Y\) and \(aX\) are random variables for any real number \(a\), and we have
\[E(X + Y) = E(X) + E(Y) \quad \mbox{ and } \quad E(aX) = aE(X) \nonumber \]

(Hence \(E(\ )\) is a linear transformation from the vector space of all random variables to the space of real numbers.)
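Sample means obey these same identities exactly (up to rounding), so the linearity can be checked numerically. A minimal sketch, again in Python/NumPy with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000)      # sample of X
y = rng.uniform(low=0.0, high=3.0, size=1_000)  # sample of Y
a = 4.0

# Sample means satisfy the same linearity as E( ).
print(np.isclose((x + y).mean(), x.mean() + y.mean()))  # True
print(np.isclose((a * x).mean(), a * x.mean()))         # True
```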
An important question is how much the random variables \(X\) and \(Y\) depend on each other. One measure of this is the covariance of \(X\) and \(Y\), denoted \(\func{cov}(X, Y)\), defined by
\[\func{cov}(X, Y) = E\{(X - \mu)(Y - \upsilon)\} \quad \mbox{ where } \mu = E(X) \mbox{ and } \upsilon = E(Y) \nonumber \]
Clearly, \(\func{cov}(X, X) = \func{var}(X)\). If \(\func{cov}(X, Y) = 0\), then \(X\) and \(Y\) have little linear relationship to each other and are said to be uncorrelated.\(^{1}\)
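The defining formula translates directly into a sample computation. The following sketch (Python/NumPy, simulated data; the helper name `cov` is my own) also confirms that \(\func{cov}(X, X) = \func{var}(X)\):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=5_000)
y = 0.5 * x + rng.normal(size=5_000)   # Y depends partly on X

def cov(u, v):
    """Sample analogue of cov(U, V) = E{(U - mu)(V - nu)}."""
    return ((u - u.mean()) * (v - v.mean())).mean()

print(cov(x, y))   # roughly 0.5 here, since Y = 0.5 X + independent noise
print(np.isclose(cov(x, x), ((x - x.mean()) ** 2).mean()))  # True: cov(X, X) = var(X)
```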
Multivariate statistical analysis deals with a family \(X_{1}, X_{2}, \dots, X_{n}\) of random variables with means \(\mu_{i} = E(X_{i})\) and variances \(\sigma_{i}^2 = \func{var}(X_{i})\) for each \(i\). Let \(\sigma_{ij} = \func{cov}(X_{i}, X_{j})\) denote the covariance of \(X_{i}\) and \(X_{j}\). Then the covariance matrix of the random variables \(X_{1}, X_{2}, \dots, X_{n}\) is defined to be the \(n \times n\) matrix
\[\Sigma = [\sigma_{ij}] \nonumber \]
whose \((i, j)\)-entry is \(\sigma_{ij}\). The matrix \(\Sigma\) is clearly symmetric; in fact, it can be shown that \(\Sigma\) is positive semidefinite in the sense that \(\lambda \geq 0\) for every eigenvalue \(\lambda\) of \(\Sigma\). (In reality, \(\Sigma\) is positive definite in most cases of interest.) So suppose that the eigenvalues of \(\Sigma\) are \(\lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_{n} \geq 0\). The principal axes theorem shows that an orthogonal matrix \(P\) exists such that
\[P^T\Sigma P = \func{diag}(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}) \nonumber \]
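In practice \(\Sigma\) is estimated from data and then diagonalized numerically. Here is a sketch under assumptions of my own (Python/NumPy, with an invented \(3 \times 3\) mixing matrix used only to generate correlated samples):

```python
import numpy as np

rng = np.random.default_rng(3)

# Rows are observations; columns are the random variables X1, X2, X3.
mix = np.array([[2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.5, 0.5, 0.2]])
data = rng.normal(size=(2_000, 3)) @ mix

Sigma = np.cov(data, rowvar=False)   # sample covariance matrix [sigma_ij]

# Symmetric eigendecomposition: the columns of P are orthonormal eigenvectors.
eigvals, P = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]    # reorder so that lambda_1 >= ... >= lambda_n
eigvals, P = eigvals[order], P[:, order]

print(np.allclose(P.T @ Sigma @ P, np.diag(eigvals)))  # True
print(eigvals)                       # nonnegative, in decreasing order
```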
If we write \(\overline{X} = (X_{1}, X_{2}, \dots, X_{n})\), the procedure for diagonalizing a quadratic form gives new variables \(\overline{Y} = (Y_{1}, Y_{2}, \dots, Y_{n})\) defined by
\[\overline{Y} = P^T\overline{X} \nonumber \]
These new random variables \(Y_{1}, Y_{2}, \dots, Y_{n}\) are called the principal components of the original random variables \(X_{i}\), and are linear combinations of the \(X_{i}\). Furthermore, it can be shown that
\[\func{cov}(Y_{i}, Y_{j}) = 0 \mbox{ if } i \neq j \quad \mbox{ and } \quad \func{var}(Y_{i}) = \lambda_{i} \quad \mbox{ for each } i \nonumber \]
Of course the principal components \(Y_{i}\) point along the principal axes of the quadratic form \(q = \overline{X}^T\Sigma\overline{X}\).
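Continuing the sketch above (reusing `data`, `Sigma`, `P`, and `eigvals` from it), the principal components are obtained by applying \(P^T\) to each centred observation, and their sample covariance matrix is indeed \(\func{diag}(\lambda_{1}, \dots, \lambda_{n})\):

```python
# Continuation of the previous sketch: same data, Sigma, P, eigvals.
centred = data - data.mean(axis=0)
Y = centred @ P                      # row k holds (Y_1, ..., Y_n) for observation k

cov_Y = np.cov(Y, rowvar=False)
print(np.allclose(cov_Y, np.diag(eigvals)))   # True: cov(Y_i, Y_j) = 0 for i != j
                                              #       and var(Y_i) = lambda_i
```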
The sum of the variances of a set of random variables is called the total variance of the variables, and determining the source of this total variance is one of the benefits of principal component analysis. The fact that the matrices \(\Sigma\) and \(\func{diag}(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n})\) are similar means that they have the same trace, that is,
\[\sigma_{11} + \sigma_{22} + \cdots + \sigma_{nn} = \lambda_{1} + \lambda_{2} + \cdots + \lambda_{n} \nonumber \]
This means that the principal components \(Y_{i}\) have the same total variance as the original random variables \(X_{i}\). Moreover, the fact that \(\lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_{n} \geq 0\) means that most of this variance resides in the first few \(Y_{i}\). In practice, statisticians find that studying these first few \(Y_{i}\) (and ignoring the rest) gives an accurate analysis of the total system variability. This results in substantial data reduction since often only a few \(Y_{i}\) suffice for all practical purposes. Furthermore, these \(Y_{i}\) are easily obtained as linear combinations of the \(X_{i}\). Finally, the analysis of the principal components often reveals relationships among the \(X_{i}\) that were not previously suspected, and so results in interpretations that would not otherwise have been made.
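A final continuation of the same sketch illustrates both the trace identity and the data-reduction idea: the cumulative proportion of total variance carried by the leading components shows how few of the \(Y_{i}\) are typically needed.

```python
# Continuation: total variance is preserved by the change of variables.
total_var_X = np.trace(Sigma)                 # sigma_11 + ... + sigma_nn
total_var_Y = eigvals.sum()                   # lambda_1 + ... + lambda_n
print(np.isclose(total_var_X, total_var_Y))   # True

# Cumulative proportion of the total variance carried by Y_1, then Y_1 and Y_2, ...
explained = np.cumsum(eigvals) / eigvals.sum()
print(explained)    # with this invented data, Y_1 alone carries most of it
```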
1. If \(X\) and \(Y\) are independent in the sense of probability theory, then they are uncorrelated; however, the converse is not true in general.