9.3: Measures of Variation

Another component of describing a data set is how much “spread” there is in the data, that is, how much the data values vary from one another. It may seem that once we know the center of a data set, we know everything there is to know. The first example demonstrates why we also need measures of variation (or spread).

There are several ways to measure the spread of a data set. The three most common measures are the range, standard deviation, and quartiles. In this section we will learn about the range and standard deviation. We will discuss quartiles in the following section.

We will focus first on the simplest measure of spread, called the range.

Range

The range is the difference between the maximum value and the minimum value of the data set.

Example $\PageIndex{1}$

Consider these three sets of quiz scores:

Section A	5	5	5	5	5	5	5	5	5	5
Section B	0	0	0	0	0	10	10	10	10	10
Section C	4	4	4	5	5	5	5	6	6	6

Solution

All three of these sets of data have a mean of 5 and a median of 5. If we only calculated a measure of center for each set of scores, we would conclude that the three sets are identical, yet the sets of scores are clearly quite different. Calculating a measure of variability (or spread) will help identify how they are different.

For section $\mathrm{A}$, the range is 0 since both the maximum and the minimum are 5 and $5-5=0$.

For section $\mathrm{B}$, the range is 10 since $10-0=10$.

For section $\mathrm{C}$, the range is 2 since $6-4=2$.
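The same subtraction can be checked with a short Python sketch. The dictionary `sections` and the helper `data_range` are names chosen here for illustration, and Section B is entered with five 0s and five 10s, which is what its mean of 5 requires.

```python
# Quiz scores from Example 1.
# Assumption: Section B has five 0s and five 10s (ten scores, mean 5).
sections = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "C": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
}

def data_range(values):
    """Return the range: the maximum value minus the minimum value."""
    return max(values) - min(values)

ranges = {name: data_range(scores) for name, scores in sections.items()}

for name, r in ranges.items():
    print(f"Section {name}: range = {r}")
```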

You Try It $\PageIndex{1}$

The price of a jar of peanut butter at 5 stores was: $3.29, $3.59, $3.79, $3.75, and $3.99. Find the range of the prices.

Answer: The range of the data is $0.70.

In Example 1, the range seems to reveal how spread out the data is. However, suppose we add a fourth section, Section D, with scores 0 5 5 5 5 5 5 5 5 10.

This section also has a mean and median of 5. The range is 10, yet this data set is quite different from Section B. To better illuminate the differences, we’ll have to turn to more sophisticated measures of variation.

Standard Deviation

The standard deviation is a measure of variation based on measuring how far, on average, each data value deviates, or is different, from the mean. A few important characteristics:

Standard deviation is never negative. It is zero if all the data values are equal, and it gets larger as the data spreads out.
Standard deviation has the same units as the original data.
Standard deviation, like the mean, can be highly influenced by outliers.

Using the data from Section D,

Section D	0	5	5	5	5	5	5	5	5	10

we could compute for each data value the difference between the data value and the mean. This will give us an idea of “how far” each value in the data set lies away from the mean.

data value	deviation: data value - mean
0	0-5 = -5
5	5-5 = 0
5	5-5 = 0
5	5-5 = 0
5	5-5 = 0
5	5-5 = 0
5	5-5 = 0
5	5-5 = 0
5	5-5 = 0
10	10-5 = 5

We would like to get an idea of the "average" deviation from the mean, but if we average the values in the second column, the negative and positive values cancel each other out (deviations from the mean always sum to zero). So instead we square every value in the second column:

data value	deviation: data value - mean	deviation squared
0	0-5 = -5	(-5)² = 25
5	5-5 = 0	0² = 0
5	5-5 = 0	0² = 0
5	5-5 = 0	0² = 0
5	5-5 = 0	0² = 0
5	5-5 = 0	0² = 0
5	5-5 = 0	0² = 0
5	5-5 = 0	0² = 0
5	5-5 = 0	0² = 0
10	10-5 = 5	(5)² = 25
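The two tables above can be reproduced with a few lines of Python. This is only a sketch; the variable names (`scores_d`, `deviations`, `squared`) are chosen here for illustration. It shows that the raw deviations cancel while the squared deviations do not.

```python
# Section D quiz scores.
scores_d = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]

mean_d = sum(scores_d) / len(scores_d)

# Second column of the table: data value minus the mean.
deviations = [x - mean_d for x in scores_d]

# Third column of the table: each deviation squared.
squared = [d ** 2 for d in deviations]

sum_dev = sum(deviations)   # positives and negatives cancel
sum_sq = sum(squared)       # squares cannot cancel

print("mean:", mean_d)
print("sum of deviations:", sum_dev)
print("sum of squared deviations:", sum_sq)
```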

We then add the squared deviations to get $25+0+0+0+0+0+0+0+0+25=50$. Ordinarily we would then divide by the number of scores, $n$ (in this case, 10), to find the mean of the squared deviations. But we only do this if the data set represents a population; if the data set represents a sample (as it almost always does), we instead divide by $n-1$ (in this case, $10-1=9$).

Note

The reason we do this is highly technical, but we can see how it might be useful by considering the case of a small sample from a population that contains an outlier, which would increase the average deviation: the outlier very likely won't be included in the sample, so the mean deviation of the sample would underestimate the mean deviation of the population; thus we divide by a slightly smaller number to get a slightly bigger average deviation.
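One way to see the effect described in this note is to list every possible small sample and average the results. The sketch below rests on assumptions made only for illustration: Section D is treated as the whole population, samples have size 2, and values are drawn with replacement. The helper `avg_squared_deviation` is a name chosen here, not part of the text.

```python
from itertools import product

# Assumption: Section D is the entire population.
population = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]

def avg_squared_deviation(data, divisor):
    """Sum of squared deviations from the mean of `data`, divided by `divisor`."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / divisor

# Population value: divide by n = 10.
pop_value = avg_squared_deviation(population, len(population))

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average over all samples when dividing by n = 2 ...
avg_div_n = sum(avg_squared_deviation(s, 2) for s in samples) / len(samples)
# ... and when dividing by n - 1 = 1.
avg_div_n_minus_1 = sum(avg_squared_deviation(s, 1) for s in samples) / len(samples)

print("population value:       ", pop_value)
print("samples, divide by n:   ", avg_div_n)
print("samples, divide by n-1: ", avg_div_n_minus_1)
```

In this small setting, dividing by $n$ gives sample results that average only half of the population value, while dividing by $n-1$ averages out to the population value itself.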

So in our example, we would have 50/10 = 5 if section D represents a population and 50/9 = about 5.56 if section D represents a sample. These values (5 and 5.56) are called, respectively, the population variance and the sample variance for section D.

Variance can be a useful statistical concept, but note that the units of variance in this instance would be points-squared since we squared all of the deviations. What are points-squared? Good question. We would rather deal with the units we started with (points in this case), so to convert back we take the square root and get:
population standard deviation $=\sqrt{\frac{50}{10}}=\sqrt{5} \approx 2.2$

sample standard deviation $=\sqrt{\frac{50}{9}} \approx 2.4$

If we are unsure whether the data set is a sample or a population, we will usually assume it is a sample, and we will round answers to one more decimal place than the original data, as we have done above.
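For readers who want to check these numbers by computer, Python's standard `statistics` module provides both versions: `pvariance` and `pstdev` divide by $n$, while `variance` and `stdev` divide by $n-1$. The sketch below applies them to Section D; the variable names are illustrative.

```python
import statistics

scores_d = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]

pop_var = statistics.pvariance(scores_d)   # population variance: 50 / 10
samp_var = statistics.variance(scores_d)   # sample variance: 50 / 9
pop_sd = statistics.pstdev(scores_d)       # square root of population variance
samp_sd = statistics.stdev(scores_d)       # square root of sample variance

print("population variance:", pop_var)
print("sample variance:    ", round(samp_var, 2))
print("population std dev: ", round(pop_sd, 1))
print("sample std dev:     ", round(samp_sd, 1))
```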

To Compute Standard Deviation

Find the mean, then find the deviation of each data value from the mean. In other words, subtract the mean from each data value.
Square each deviation.
Add the squared deviations.
Divide by n, the number of data values, if the data represents a whole population; divide by n – 1 if the data is from a sample.
Compute the square root of the result.
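The steps above can be sketched as a single Python function. The function name `standard_deviation` and its `sample` flag are choices made here for illustration; following the convention stated earlier, the flag defaults to treating the data as a sample.

```python
import math

def standard_deviation(data, sample=True):
    """Standard deviation computed by the steps listed above.

    sample=True divides by n - 1; sample=False divides by n (population).
    """
    n = len(data)
    if n < 2 and sample:
        raise ValueError("a sample needs at least two data values")
    mean = sum(data) / n
    deviations = [x - mean for x in data]     # subtract the mean from each value
    squared = [d ** 2 for d in deviations]    # square each deviation
    total = sum(squared)                      # add the squared deviations
    divisor = n - 1 if sample else n          # n - 1 for a sample, n for a population
    return math.sqrt(total / divisor)         # square root of the result

section_d = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]
print(round(standard_deviation(section_d, sample=False), 1))  # 2.2
print(round(standard_deviation(section_d), 1))                # 2.4
```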

Example $\PageIndex{2}$

Computing the standard deviation for Section B above, we first calculate that the mean is 5. Using a table can help keep track of your computations for the standard deviation:

Solution

data value	deviation: data value - mean	deviation squared
0	0-5 = -5	(-5)² = 25
0	0-5 = -5	(-5)² = 25
0	0-5 = -5	(-5)² = 25
0	0-5 = -5	(-5)² = 25
0	0-5 = -5	(-5)² = 25
10	10-5=5	(5)² = 25
10	10-5=5	(5)² = 25
10	10-5=5	(5)² = 25
10	10-5=5	(5)² = 25
10	10-5=5	(5)² = 25

Assuming this data represents a population, we add the squared deviations, divide by 10 (the number of data values), and take the square root:

\[
\sqrt{\frac{25+25+25+25+25+25+25+25+25+25}{10}}=\sqrt{\frac{250}{10}}=5
\]

Notice that the standard deviation of this data set is much larger than that of section $\mathrm{D}$ since the data in this set is more spread out.

For comparison, the standard deviations of all four sections (each treated as a population) are:

Section A	5	5	5	5	5	5	5	5	5	5	Standard deviation: 0
Section B	0	0	0	0	0	10	10	10	10	10	Standard deviation: 5
Section C	4	4	4	5	5	5	5	6	6	6	Standard deviation: 0.8
Section D	0	5	5	5	5	5	5	5	5	10	Standard deviation: 2.2
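The values in this table are consistent with the population formula (dividing by $n$). As a rough check, the sketch below recomputes them with `statistics.pstdev` from Python's standard library; the dictionary name `sections` is illustrative, and Section B is entered with five 0s and five 10s as in Example 2.

```python
import statistics

sections = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "C": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    "D": [0, 5, 5, 5, 5, 5, 5, 5, 5, 10],
}

# Population standard deviation of each section, rounded to one decimal place.
pop_sds = {name: round(statistics.pstdev(scores), 1)
           for name, scores in sections.items()}

for name, sd in pop_sds.items():
    print(f"Section {name}: standard deviation = {sd}")
```

Sections B and D share the same mean, median, and range, yet their standard deviations differ, which is the distinction the range could not capture.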

You Try It $\PageIndex{2}$

The prices of a jar of peanut butter at 5 stores were: $3.29, $3.59, $3.79, $3.75, and $3.99. Find the standard deviation of the prices.

Answer: Treating the prices as a sample, the standard deviation of the data is $0.26.
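The answer of $0.26 corresponds to the sample formula (dividing by $n-1$). A quick check in Python, using the standard `statistics` module and illustrative variable names, shows both versions along with the range of the prices:

```python
import statistics

prices = [3.29, 3.59, 3.79, 3.75, 3.99]

price_range = max(prices) - min(prices)
sample_sd = statistics.stdev(prices)        # divides by n - 1
population_sd = statistics.pstdev(prices)   # divides by n

print("range:             ", round(price_range, 2))
print("sample std dev:    ", round(sample_sd, 2))
print("population std dev:", round(population_sd, 2))
```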

Calculator Instructions for Finding Summary Statistics Using TI-83/84:

Turn on the calculator
Press the “STAT” key
Hit “Enter” on option 1: “Edit”. This will bring you to a screen that contains lists: L1, L2, L3, etc.
Enter the data values (one value per row) into L1. For any negative values you need to use the (-) key, not the subtraction key. Continue until all data is entered into L1.
Press the “STAT” key again
Use the arrow key to scroll over to “CALC”.
Select option 1: “1-Var Stats”
Indicate that the data is in L1
Scroll down to “Calculate” and hit “Enter”

The summary statistics should now be displayed. You may scroll down with the arrow key to see the remaining statistics. Using “1-Var Stats” you can get the sample mean, sample standard deviation, population standard deviation, and five-number summary.
