6.2: Descriptive Statistics: Measures of Center, Measures of Variation, and the Five-Number Summary


Page ID: 4886


Measures of Central Tendency

Definition: Mean, Median, Mode

Mean: Add all of the data values and divide the sum by the number of data values.
Median: Arrange the data values in numerical order. The median is the middle data value. If there is an even number of data values, the median is the mean of the two middle values.
Mode: The data value that occurs most often.

Example \(\PageIndex{1}\):

Given the following set A, determine the mean, median, and mode of the set.

\(A = \{1, 1, 1, 2, 3, 5, 5, 7, 9, 12, 23\}\)

\(Mean = \displaystyle \frac{\Sigma A}{n}\), or the sum of all terms of A divided by the number of terms in A

\( = \displaystyle \frac{1 + 1 + 1 + 2 + 3 + 5 + 5 + 7 + 9 + 12 + 23}{11}\)

\(= \displaystyle \frac{69}{11}\)

\(= 6 \displaystyle \frac{3}{11}\)

\(Median = 5\)

\(Mode = 1\)
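The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`mean_A`, `median_A`, `mode_A`) are illustrative and not part of the original text.

```python
from fractions import Fraction
from statistics import median, mode

A = [1, 1, 1, 2, 3, 5, 5, 7, 9, 12, 23]

# Mean: the sum of the values divided by how many values there are.
# Fraction keeps the result exact instead of rounding to a decimal.
mean_A = Fraction(sum(A), len(A))

# Median: the middle value of the sorted data. statistics.median sorts
# the data itself and averages the two middle values when the count is even.
median_A = median(A)

# Mode: the value that occurs most often.
mode_A = mode(A)

print(mean_A, median_A, mode_A)  # 69/11 5 1
```

Here 69/11 is the same number as the mixed fraction \(6 \frac{3}{11}\) found above.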

Measurements of Variation

Measurements of variation are well named. These quantities describe how far apart the data points can be from each other. If a data set is imagined as a bull's eye, measurements of variation will describe the size of the target, as well as where there are groups of points or gaps in points. We use the following measures to describe the dispersion of data:

Definition: Measures of Dispersion

Range describes the span of the data, or how far apart the biggest and smallest values are. It is calculated by subtracting the minimum value from the maximum value.
Clusters occur when groups of data occur together, and apart from the rest of the data points. There may be one or more clusters in any given data set.
Gaps are places where data is expected to occur but does not.
Outliers are data points which occur individually and do not behave according to the trend described by the rest of the data.
Standard Deviation, given by \(s\), describes how far, on average, the observed data values are from the mean.
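As a rough illustration of the two numerical measures in this list, the sketch below computes the range and the standard deviation of set A from Example 1. The section does not state a formula for \(s\), so the code assumes the usual sample standard deviation, which divides the sum of squared deviations by \(n - 1\); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

A = [1, 1, 1, 2, 3, 5, 5, 7, 9, 12, 23]

# Range: maximum value minus minimum value.
data_range = max(A) - min(A)

# Sample standard deviation by hand (assumption: n - 1 divisor).
n = len(A)
x_bar = mean(A)
squared_deviations = [(x - x_bar) ** 2 for x in A]
s_manual = sqrt(sum(squared_deviations) / (n - 1))

# The standard library's stdev uses the same n - 1 convention.
s = stdev(A)

print(data_range)            # 22
print(round(s_manual, 3), round(s, 3))
```

If the data were an entire population rather than a sample, `statistics.pstdev`, which divides by \(n\), would be the analogous call.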

With \(n\) denoting the number of observations, we can also determine the following quantities:

Definition: Describing the Data

Variance describes how far from the average the values in a data set are expected to fall. It is the square of the standard deviation: Variance = s²

The first quartile (Q₁) is the median of the part of the entire data set that lies at or below the median of the data set.

The second quartile (Q₂) is the median of the data set.

The third quartile (Q₃) is the median of the part of the entire data set that lies at or above the median of the data set.

Interquartile Range describes the difference between the first and third quartiles. IQR = Q₃ – Q₁

Five-Number Summary consists of five numbers that describe a data set:

The data's minimum value
The first quartile
The median
The third quartile
The data's maximum value
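The definitions above can be turned into a short function. This is a sketch under one stated assumption: "at or below the median" and "at or above the median" are read by position, so that for an odd number of values the middle value belongs to both halves. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumption: for an odd count, the middle value is included
    in both the lower half and the upper half.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2      # size of each half; halves overlap by one value when n is odd
    lower = xs[:half]        # values at or below the median position
    upper = xs[n - half:]    # values at or above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

A = [1, 1, 1, 2, 3, 5, 5, 7, 9, 12, 23]
summary = five_number_summary(A)
iqr = summary[3] - summary[1]   # IQR = Q3 - Q1

print(summary, iqr)  # (1, 1.5, 5, 8.0, 23) 6.5
```

Other quartile conventions exist, and software packages differ, so the same data can produce slightly different quartiles elsewhere.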

There are many ways in which a set of data can be distributed. In this course, we will focus on five distributions: uniform, skewed to the right, skewed to the left, bimodal, and normal.

Example \(\PageIndex{2}\):

Given data set B, give the five-number summary of the set.

\(B = 1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 6, 7, 8, 8, 9, 9, 9, 10, 12, 14, 16, 22, 29\)

Remember, the five-number summary of a set is

The data's minimum value
The first quartile
The median
The third quartile
The data's maximum value

Let's start with the minimum, maximum, and median values, as those are the simplest.

1. The minimum value is 1.

3. The median value is the twelfth value (there are 23 values in all): 7

5. The maximum value is 29.

For the first and third quartiles, things get a little more complicated. When determining the first quartile, we include the median in the lower half of the data. When we switch to the third quartile, however, we do not use the median again, so the upper half excludes it. Whether the median is used for the first or the third quartile is an arbitrary choice, but since Microsoft Excel uses the median to determine the first quartile, that is what we will do. Not all software does this, so be aware that results may differ from one tool to another.

2. The first quartile is the average of the sixth and seventh terms (there are 12 terms in the first half, median included): \(\displaystyle \frac{5 + 5}{2} = 5\)

4. The third quartile is the 18th term (there are 11 terms in the second half, median excluded; the middle one is the sixth of these, and 12 + 6 = 18): \(10\)

So, the five-number summary is:

\(\{1, \, 5, \, 7, \, 10, \, 29\}\)
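The steps of this example can be reproduced in code. The sketch below follows the convention used in the example (for an odd number of values, the median goes into the lower half and is left out of the upper half); the function name `five_number_summary_example` is hypothetical.

```python
from statistics import median

def five_number_summary_example(data):
    """Return (min, Q1, median, Q3, max) using this example's convention."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        lower = xs[:mid + 1]   # lower half, median included
        upper = xs[mid + 1:]   # upper half, median excluded
    else:
        lower = xs[:mid]       # even count: the halves split cleanly
        upper = xs[mid:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

B = [1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 6, 7, 8, 8, 9, 9, 9,
     10, 12, 14, 16, 22, 29]

print(five_number_summary_example(B))  # (1, 5.0, 7, 10, 29)
```

The lower half has 12 values, so its median is the average of the sixth and seventh, which is why Q₁ prints as the float `5.0`.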

Exercise \(\PageIndex{1}\): Five-number summary

Given the data \(3, 2, 0, 5, 5, 3, 1, 0, 3, 2\):

Obtain the five-number summary for these data.

Identify potential outliers, if any.
Construct a boxplot.
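This section does not state a rule for flagging potential outliers. One widely used convention, assumed here, is the 1.5 × IQR rule: a value is a potential outlier if it lies more than 1.5 × IQR below Q₁ or above Q₃. The sketch below applies that rule to data set B from Example 2, not to the exercise data, using the quartiles found in that example.

```python
B = [1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 6, 7, 8, 8, 9, 9, 9,
     10, 12, 14, 16, 22, 29]

# Quartiles of B as found in Example 2.
q1, q3 = 5, 10
iqr = q3 - q1

# Fences under the assumed 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are potential outliers.
outliers = [x for x in B if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)  # -2.5 17.5 [22, 29]
```

In a boxplot drawn with this convention, the whiskers would extend to the most extreme values inside the fences (1 and 16 for B), and the flagged values would be plotted as individual points.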

Note:

What is the meaning of the interpolated median? How does it differ from the median?