Mathematics LibreTexts

3.6: Five-Number Summaries and Boxplots


    Previously, we learned how to use the mean and the standard deviation of a data set to describe the shape of its distribution and to identify outliers. Next, we will learn how to do the same using a different set of numerical summaries.

    Definition: Five-number summary

    The values Q0, Q1, Q2, Q3, and Q4 together are called the five-number summary.

    Alternatively, we can think of the list as the 0th, 25th, 50th, 75th, 100th percentiles:

    Q0 (Minimum)     Q1                Q2 (Median)       Q3                Q4 (Maximum)
    0th percentile   25th percentile   50th percentile   75th percentile   100th percentile

    Before finding the five-number summary by hand, we have to make sure that the data are sorted in ascending order, which makes the minimum and maximum easy to read off. We already know how to find the median, which divides the data set into two halves, bottom and top. For simplicity, we define Q1 as the median of the bottom half and Q3 as the median of the top half. When the number of observations is odd, the median itself belongs to neither half.

    • Q0 (Minimum) – the smallest observation
    • Q1 – the median of the bottom half
    • Q2 (Median) – the middle observation
    • Q3 – the median of the top half
    • Q4 (Maximum) – the largest observation
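
    The rule above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name five_number_summary and the sample data are assumptions made for this example, and other software may use slightly different quartile rules.

```python
from statistics import median

def five_number_summary(data):
    """Return (Q0, Q1, Q2, Q3, Q4) using the median-of-halves rule."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    values = sorted(data)          # the rule assumes ascending order
    n = len(values)
    half = n // 2                  # number of observations in each half
    lower = values[:half]          # bottom half
    upper = values[n - half:]      # top half (skips the middle value when n is odd)
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Hypothetical data, not taken from the text
print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))   # (1, 3, 7, 11, 13)
```
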
    Example \(\PageIndex{1}\)

    Consider the ages of the US presidents at their inauguration. For convenience, the data are already sorted in ascending order.

    42  43  46  47  47  47  48  49  49  50  51  51  51  51  51
    52  52  54  54  54  54  54  55  55  55  55  56  56  56  57
    57  57  57  58  60  61  61  61  62  64  64  65  68  69  70

    So let’s determine the five-number summary:

    • The minimum is 42 and the maximum is 70.
    • The number of observations is 45, which is odd, so the median is the 23rd observation, 55. It divides the data into the lower 22 observations and the upper 22 observations.
    • In the bottom half, the number of observations is 22, which is even, so the median of the bottom half is the average of its 11th and 12th observations (51 and 51), which is 51.
    • Similarly, in the top half, the number of observations is 22, which is even, so the median of the top half is the average of its 11th and 12th observations (58 and 60), which is 59.

    The five-number summary is provided in the table:

    Min   Q1   Median   Q3   Max
    42    51   55       59   70
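
    As a check on the arithmetic, the same steps can be traced in Python. This is only a sketch that mirrors the hand calculation; the variable names are chosen for this illustration.

```python
ages = [42, 43, 46, 47, 47, 47, 48, 49, 49, 50, 51, 51, 51, 51, 51,
        52, 52, 54, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 57,
        57, 57, 57, 58, 60, 61, 61, 61, 62, 64, 64, 65, 68, 69, 70]

n = len(ages)                      # 45 observations, already sorted
q2 = ages[n // 2]                  # 23rd observation (index 22)
lower = ages[:n // 2]              # the 22 observations below the median
upper = ages[n // 2 + 1:]          # the 22 observations above the median
q1 = (lower[10] + lower[11]) / 2   # average of the 11th and 12th of the bottom half
q3 = (upper[10] + upper[11]) / 2   # average of the 11th and 12th of the top half

summary = (ages[0], q1, q2, q3, ages[-1])
print(summary)
```
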

    One of the two goals that we are trying to accomplish is to learn how to identify the outliers. For that we are going to need the following vocabulary.

    Definition: Interquartile Range (IQR)

    The interquartile range (IQR) is the difference between Q3 and Q1, that is, IQR = Q3 − Q1. In other words, it is the width of the middle 50% of the data.

    Definition: Lower and upper limits and potential outliers

    The values Q1 − 1.5 × IQR and Q3 + 1.5 × IQR are called the lower limit and the upper limit of a data set, respectively. Values that are greater than the upper limit or less than the lower limit are potential outliers.
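
    A minimal Python sketch of these two definitions follows. The function names and the sample data are assumptions made for illustration, and the quartiles are taken as already computed with the median-of-halves rule.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) from the two quartiles."""
    iqr = q3 - q1                          # width of the middle 50%
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(data, q1, q3):
    """Return the values strictly outside the lower and upper limits."""
    lower_limit, upper_limit = outlier_limits(q1, q3)
    return [x for x in data if x < lower_limit or x > upper_limit]

# Hypothetical sorted data; its quartiles are Q1 = 10.5 and Q3 = 15.5
data = [2, 10, 11, 12, 13, 14, 15, 16, 30]
print(outlier_limits(10.5, 15.5))            # (3.0, 23.0)
print(potential_outliers(data, 10.5, 15.5))  # [2, 30]
```
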

    The other goal that we are trying to accomplish is to learn how to visualize a data set. For that we are going to need the following vocabulary.

    Definition: Boxplot (box-and-whisker diagram)

    A boxplot, also called a box-and-whisker diagram, is based on the five-number summary, and can be used to provide a graphical display of the center and variation of a data set. To construct a boxplot, we also need the concept of adjacent values. The adjacent values of a data set are the most extreme observations that still lie within the lower and upper limits; they are the most extreme observations that are not potential outliers. Note that, if a data set has no potential outliers, the adjacent values are just the minimum and maximum observations.
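
    The adjacent values can be found by discarding the potential outliers and taking the extremes of what remains. The sketch below assumes the limits are already known; the function name and the data are hypothetical.

```python
def adjacent_values(data, lower_limit, upper_limit):
    """Return the smallest and largest observations within the limits."""
    # A value equal to a limit is not a potential outlier, so it counts as inside.
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return min(inside), max(inside)

# Hypothetical data with limits 3.0 and 23.0: 2 and 30 are potential outliers
data = [2, 10, 11, 12, 13, 14, 15, 16, 30]
print(adjacent_values(data, 3.0, 23.0))   # (10, 16)
```

    When no observation falls outside the limits, the function simply returns the minimum and maximum, in agreement with the note in the definition.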

    To Construct a Boxplot:

    1. Determine the quartiles.
    2. Determine potential outliers and adjacent values.
    3. Draw a horizontal axis on which the numbers obtained in Steps 1 and 2 can be located. Above this axis, mark the quartiles and the adjacent values with vertical lines.
    4. Connect the quartiles to make a box, and then connect the box to the adjacent values with lines.
    5. Plot each potential outlier with an asterisk.

    Note: One can skip steps 2 and 5 if not concerned about the outliers. In such a case the adjacent values are the minimum and maximum.
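
    Steps 1 and 2 produce every number that has to be marked on the axis in Steps 3 to 5. The following sketch collects them in one place. It assumes the median-of-halves rule for the quartiles, and the function name boxplot_elements and the data are made up for this illustration.

```python
from statistics import median

def boxplot_elements(data):
    """Collect the numbers needed to draw a boxplot by hand."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    # Step 1: quartiles (median-of-halves rule)
    q1, q2, q3 = median(values[:half]), median(values), median(values[n - half:])
    # Step 2: limits, potential outliers, and adjacent values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < low or x > high]
    inside = [x for x in values if low <= x <= high]
    # Steps 3 to 5: what to mark on the axis
    return {
        "box": (q1, q3),                      # edges of the box
        "median": q2,                         # line inside the box
        "whiskers": (inside[0], inside[-1]),  # adjacent values
        "outliers": outliers,                 # plotted as asterisks
    }

# Hypothetical data, not taken from the text
elements = boxplot_elements([2, 10, 11, 12, 13, 14, 15, 16, 30])
print(elements)
```
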

    Example \(\PageIndex{2}\)

    Let’s construct the boxplot for the presidents’ ages at inauguration, for which we already found the five-number summary.

    IQR = Q3 − Q1 = 59 − 51 = 8

    1.5 × IQR = 1.5 × 8 = 12

    LL = Q1 − 1.5 × IQR = 51 − 12 = 39

    UL = Q3 + 1.5 × IQR = 59 + 12 = 71

    Since all the values lie between the lower limit (39) and the upper limit (71), there are no potential outliers, so the adjacent values are the minimum (42) and the maximum (70). We draw the boxplot by creating a horizontal axis and drawing a vertical line for each value in the five-number summary:

    [Figure: boxplot of the presidents’ ages at inauguration]
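
    For readers without access to the figure, a rough text rendering can be produced with a few lines of Python. This is only a sketch: the function text_boxplot is hypothetical, it assumes integer values, and it uses one character per year on the axis.

```python
def text_boxplot(low, q1, med, q3, high, start, stop):
    """Return a one-line boxplot with one character per unit on the axis.

    Assumes all five inputs are integers between start and stop.
    """
    row = []
    for x in range(start, stop + 1):
        if x in (low, med, high):
            row.append("|")          # adjacent values and median
        elif x == q1:
            row.append("[")          # left edge of the box
        elif x == q3:
            row.append("]")          # right edge of the box
        elif q1 < x < q3:
            row.append("=")          # inside the box
        elif low < x < high:
            row.append("-")          # whiskers
        else:
            row.append(" ")          # beyond the adjacent values
    return "".join(row)

# Values from the example: there are no potential outliers, so the
# whiskers end at the minimum (42) and the maximum (70).
picture = text_boxplot(42, 51, 55, 59, 70, start=40, stop=72)
print(picture)
```

    The printed line starts at 40 and ends at 72 on the axis, with the box spanning 51 to 59 and the median marked at 55.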

    After the boxplot is constructed, we can check the following chart to identify the shape of the distribution:

    [Figure: chart relating boxplot shapes to distribution shapes]

    Example \(\PageIndex{3}\)

    [Figure: boxplot of the presidents’ ages at inauguration, repeated from the previous example]

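    In the boxplot of the presidents’ ages, the median sits in the middle of the box (55 − 51 = 59 − 55 = 4) and the whiskers have similar lengths (51 − 42 = 9 and 70 − 59 = 11), so the distribution appears roughly symmetric, which is consistent with a normal shape.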

    We discussed how to use the five-number summary to identify the outliers and to visualize the data by constructing a boxplot.


    This page titled 3.6: Five-Number Summaries and Boxplots is shared under a not declared license and was authored, remixed, and/or curated by Anton Butenko.
