3.3: Measures of Variation
Next, we will discuss the measures of variation or spread that describe how much data varies from the center. Consider the two sets of players:
| Team 1 | Team 2 |
|---|---|
| 72, 73, 76, 76, 78 | 67, 72, 76, 76, 84 |
The mean of each group is \(75\), but clearly the two teams are not the same. One difference between the two data sets, which no measure of center can capture, is the amount of variety within each set. To describe that difference quantitatively, we use a descriptive measure that indicates the amount of variation, or spread, in a data set. Such descriptive measures are referred to as measures of variation or measures of spread.
Measures of variation and spread are numerical summaries used to describe the distribution of data points in a dataset. They provide insights into how much the data varies and how spread out the values are.
Just as there are several different measures of center, there are also several different measures of variation. In this section, we will examine the most frequently used measures of variation.
One example of a measure of variation is the range.
The range of a data set is the difference between the maximum (largest) and minimum (smallest) observations.
For example, the range of Team 1 is \(78-72=6\) and the range of Team 2 is \(84-67=17\).
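These computations are easy to check in a few lines of Python (a sketch, using the heights from the worked example: 72, 73, 76, 76, 78 and 67, 72, 76, 76, 84; the variable names are ours):

```python
# Heights (in inches) of the players on the two teams.
team1 = [72, 73, 76, 76, 78]
team2 = [67, 72, 76, 76, 84]

def data_range(data):
    """Range = maximum observation minus minimum observation."""
    return max(data) - min(data)

print(sum(team1) / len(team1))  # both means are 75.0
print(sum(team2) / len(team2))
print(data_range(team1))        # 78 - 72 = 6
print(data_range(team2))        # 84 - 67 = 17
```

Identical means, very different ranges: this is exactly why a measure of center alone cannot distinguish the two teams.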
The range only measures the total spread and doesn't capture any variation between the minimum and maximum observed values. It is neither the only nor the best way to measure variation!
In contrast to the range, the standard deviation takes into account all the observations and is the preferred measure of variation when the mean is used as the measure of center. The formulas for finding the standard deviations of a sample and a population differ slightly.
First, assume that we are dealing with a population. We are going to list all the values in the first column and compute the population mean:
\(\mu=75\)
Next, we subtract the mean from each observation so that every observation contributes to the measure. If we simply added these differences, the total would be zero, because the positive and negative differences cancel each other out; instead, we square each difference so that every deviation counts as positive. Now it makes sense to add them all together to obtain the sum of squared differences:
\(\Sigma(x_i-\mu)^2=24\)
However, the total sum of squared differences is not a good way to measure the variation, just as the grand total of a student's grades is not a good way to measure their academic achievement. We naturally divide the sum by \(N\) to find the "average":
\(\frac{\Sigma(x_i-\mu)^2}{N}=\frac{24}{5}=4.8=\sigma^2\)
This measure is labeled \(\sigma^2\) and is called the population variance. It is very significant in theory; however, from a practical point of view, we can make it better by taking the square root:
\(\sqrt{\frac{\Sigma(x_i-\mu)^2}{N}}=\sqrt{4.8}\approx2.19=\sigma\)
This measure is labeled with the Greek letter \(\sigma\) and is called the population standard deviation. Its advantage is that it has the same units as the original data (in this case, inches) and can therefore be used as a measuring stick!
| \(x_i\) | \(x_i-\mu\) | \((x_i-\mu)^2\) |
|---|---|---|
| 72 | \(72-75=-3\) | 9 |
| 73 | \(73-75=-2\) | 4 |
| 76 | \(76-75=1\) | 1 |
| 76 | \(76-75=1\) | 1 |
| 78 | \(78-75=3\) | 9 |
| \(\mu=75\) |  | \(\Sigma(x_i-\mu)^2=24\) |

| Population Variance | \(\frac{\Sigma(x_i-\mu)^2}{N}=\frac{24}{5}=4.8=\sigma^2\) |
|---|---|
| Population Standard Deviation | \(\sqrt{\frac{\Sigma(x_i-\mu)^2}{N}}=\sqrt{4.8}\approx2.19=\sigma\) |
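The computation in the table can be sketched in a few lines of Python (the variable names are ours; the standard library's `statistics.pvariance` and `statistics.pstdev` give the same results):

```python
import math

heights = [72, 73, 76, 76, 78]  # treated here as the entire population

N = len(heights)
mu = sum(heights) / N                          # population mean: 75.0
squared_diffs = [(x - mu) ** 2 for x in heights]
total = sum(squared_diffs)                     # sum of squared differences: 24.0
pop_variance = total / N                       # divide by N: 24/5 = 4.8
pop_std = math.sqrt(pop_variance)              # about 2.19 inches

print(pop_variance, round(pop_std, 2))
```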
Roughly speaking, the standard deviation measures variation by indicating how far, on average, the observations are from the mean. For a data set with a large amount of variation, the observations will, on average, be far from the mean, so the standard deviation will be large. For a data set with a small amount of variation, the observations will, on average, be close to the mean, so the standard deviation will be small.
Remember! Do not perform any rounding until the computation is complete; otherwise, substantial roundoff error can occur.
Now, let's assume that we are actually interested in the standard deviation of the heights of all players in the league, but we only have access to the given \(5\) players, which now become a sample. We label the mean \(\bar{x}\) rather than \(\mu\) and perform the first few steps the same way as before, up until it is time to divide. This time, we divide by \(4\) instead of \(5\). We will learn why later, but for now let's just register the fact that to compute the sample variance we divide by \(n-1\) rather than the sample size \(n\):
\(\frac{\Sigma(x_i-\bar{x})^2}{n-1}=\frac{24}{4}=6=s^2\)
We finish by taking the square root:
\(\sqrt{\frac{\Sigma(x_i-\bar{x})^2}{n-1}}=\sqrt{6}\approx2.45=s\)
This measure is called the sample standard deviation and it has the same units as the original data (in this case inches).
| \(x_i\) | \(x_i-\bar{x}\) | \((x_i-\bar{x})^2\) |
|---|---|---|
| 72 | \(72-75=-3\) | 9 |
| 73 | \(73-75=-2\) | 4 |
| 76 | \(76-75=1\) | 1 |
| 76 | \(76-75=1\) | 1 |
| 78 | \(78-75=3\) | 9 |
| \(\bar{x}=75\) |  | \(\Sigma(x_i-\bar{x})^2=24\) |

| Sample Variance | \(\frac{\Sigma(x_i-\bar{x})^2}{n-1}=\frac{24}{4}=6=s^2\) |
|---|---|
| Sample Standard Deviation | \(\sqrt{\frac{\Sigma(x_i-\bar{x})^2}{n-1}}=\sqrt{6}\approx2.45=s\) |
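The sample version differs only in the final division (a sketch; note that the standard library's `statistics.variance` and `statistics.stdev` also divide by \(n-1\)):

```python
import math
import statistics

heights = [72, 73, 76, 76, 78]  # now viewed as a sample of n = 5 players

n = len(heights)
xbar = sum(heights) / n                        # sample mean: 75.0
ss = sum((x - xbar) ** 2 for x in heights)     # sum of squared differences: 24.0
sample_variance = ss / (n - 1)                 # divide by n-1: 24/4 = 6.0
sample_std = math.sqrt(sample_variance)        # about 2.45 inches

# The standard library's sample variance divides by n-1 as well:
assert statistics.variance(heights) == sample_variance
```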
Usually, we use the same formula for a population's parameter and a sample's statistic; however, as we have just seen, that is not the case with the variance and standard deviation:
| Population | Sample |
|---|---|
| \(\sigma^2=\frac{\Sigma(x_i-\mu)^2}{N}\) | \(s^2=\frac{\Sigma(x_i-\bar{x})^2}{n-1}\) |
| \(\sigma=\sqrt{\frac{\Sigma(x_i-\mu)^2}{N}}\) | \(s=\sqrt{\frac{\Sigma(x_i-\bar{x})^2}{n-1}}\) |
So why do we divide by \(n-1\) instead of \(n\)?
Remember that the main purpose of computing a statistic is to estimate an unknown parameter (in this case, the population variance \(\sigma^2\)) using the statistic (in this case, the sample variance \(s^2\)). If we divide by \(n\) instead of \(n-1\), we obtain a value that, on average, underestimates the true population variance by a factor of \(\frac{n-1}{n}\), so we correct it by multiplying by \(\frac{n}{n-1}\).

After the correction, we get a formula for the desired unbiased estimator for the population variance which we call a sample variance:
\(\frac{\Sigma(x_i-\bar{x})^2}{n}\cdot\frac{n}{n-1}=\frac{\Sigma(x_i-\bar{x})^2}{n-1}=s^2\)
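A quick way to see this bias is a simulation (a sketch; the "league" below is hypothetical, and the population parameters 70 and 3 are arbitrary choices): repeatedly draw small samples, compute the sum of squared deviations, and compare dividing by \(n\) with dividing by \(n-1\).

```python
import random

random.seed(0)

# A hypothetical league of 10,000 player heights (mean 70 in, sd 3 in).
population = [random.gauss(70, 3) for _ in range(10_000)]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true population variance

n, trials = 5, 20_000
divide_by_n = divide_by_n_minus_1 = 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    divide_by_n += ss / n                # biased estimator
    divide_by_n_minus_1 += ss / (n - 1)  # corrected (sample variance)

# Averaged over many samples, ss/n lands near (n-1)/n * sigma2 (too small),
# while ss/(n-1) lands near sigma2 itself.
print(sigma2, divide_by_n / trials, divide_by_n_minus_1 / trials)
```

With \(n=5\), the divide-by-\(n\) average comes out roughly \(\frac{4}{5}\) of the true variance, while the divide-by-\(n-1\) average tracks the true variance closely.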
We discussed the most frequently used measures of variation or spread that describe how much data varies from the center.


