Mathematics LibreTexts

2.2: Visual Summaries of Quantitative Data


    Section 1: Small Discrete Data

    Next, we are going to learn how to organize and summarize quantitative data. What changes when we consider small discrete data? Consider the following data set, obtained by asking twenty students how many hours they worked per day:

    5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3

    Note that we classify these data as "small" discrete because a part-time employee rarely works more than 8 hours per day, so only a few distinct values are possible. Can we treat this data set the same way we treated qualitative data sets previously? The answer is yes! We can list all possible values, do the tally, find the frequencies and the total, and then compute the relative frequencies.

    Data value   Tally     Frequency   Relative frequency
    2            |||       3           3/20 = 0.15 = 15%
    3            |||||     5           5/20 = 0.25 = 25%
    4            |||       3           3/20 = 0.15 = 15%
    5            ||||| |   6           6/20 = 0.30 = 30%
    6            ||        2           2/20 = 0.10 = 10%
    7            |         1           1/20 = 0.05 = 5%
    Total:                 20          20/20 = 1 = 100%
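    The list, tally, and divide procedure can be sketched in Python for readers who want to check the table by computer. This is a minimal illustration rather than part of the original method: `frequency_table` is a hypothetical helper, and `Fraction` is used only so that the relative frequencies add up to exactly 1.

```python
from collections import Counter
from fractions import Fraction

# Hours worked per day, as reported by the twenty students above.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]


def frequency_table(data):
    """Return (value, frequency, relative frequency) rows, sorted by value.

    frequency_table is a hypothetical helper written for this illustration.
    """
    counts = Counter(data)  # the "tally": value -> number of occurrences
    n = len(data)           # total number of observations
    return [(value, counts[value], Fraction(counts[value], n))
            for value in sorted(counts)]


rows = frequency_table(hours)
for value, freq, rel in rows:
    # Show each relative frequency as a decimal and as a percentage.
    print(f"{value:>2} {freq:>3} {float(rel):>6.2f} {float(rel):>5.0%}")
print("Total:", sum(freq for _, freq, _ in rows), sum(rel for _, _, rel in rows))
```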


    We can construct a relative frequency distribution table for small discrete data the same way we did earlier for qualitative data! Can we also construct the frequency bar chart for "small" discrete data the same way we did earlier for ordinal data? The answer is yes! The same goes for the relative frequency bar chart!

    Figure \(\PageIndex{1.1}\): A "frequency bar plot" showing hours worked per day by 20 students: 3 students work 2 hours, 5 work 3, 3 work 4, 6 work 5, 2 work 6, and 1 works 7.
    Figure \(\PageIndex{1.2}\): A "relative frequency bar plot" showing hours worked per day by 20 students: 15% of the students work 2 hours, 25% work 3, 15% work 4, 30% work 5, 10% work 6, and 5% work 7.

    However, there are a few things that we are going to do differently this time. Since quantitative (unlike qualitative) data is associated with the real number line, we will draw the bars touching each other, so that no number on the number line is left uncovered unless the corresponding frequency is zero. To highlight the difference between quantitative and qualitative data, we call this chart a histogram instead of a bar chart. As expected, a histogram that uses frequencies on the vertical axis is called a frequency histogram. Similarly, a histogram that uses relative frequencies on the vertical axis is called a relative frequency histogram.

    Figure \(\PageIndex{1.3}\): A frequency histogram showing hours worked per day by 20 students: 3 students work 2 hours, 5 work 3, 3 work 4, 6 work 5, 2 work 6, and 1 works 7.
    Figure \(\PageIndex{1.4}\): A relative frequency histogram showing hours worked per day by 20 students: 15% of the students work 2 hours, 25% work 3, 15% work 4, 30% work 5, 10% work 6, and 5% work 7.

    We discussed how to visually summarize small discrete data. In short, we can treat "small" discrete data the same way as qualitative data, except that the bar charts are now called histograms and the bars must have no gaps between them unless the corresponding frequency is zero.


    Section 2: Large Discrete Data

    Next, we are going to learn how to visualize large discrete data. Consider the following data set obtained by recording the presidential ages at inauguration:

    President        Age   President       Age   President       Age
    Washington       57    Lincoln         52    Hoover          54
    J. Adams         61    A. Johnson      56    F. Roosevelt    51
    Jefferson        57    Grant           46    Truman          60
    Madison          57    Hayes           54    Eisenhower      62
    Monroe           58    Garfield        49    Kennedy         43
    J. Q. Adams      57    Arthur          51    L. Johnson      55
    Jackson          61    Cleveland       47    Nixon           56
    Van Buren        54    B. Harrison     55    Ford            61
    W. H. Harrison   68    Cleveland       55    Carter          52
    Tyler            51    McKinley        54    Reagan          69
    Polk             49    T. Roosevelt    42    G. H. W. Bush   64
    Taylor           64    Taft            51    Clinton         47
    Fillmore         50    Wilson          56    G. W. Bush      54
    Pierce           48    Harding         55    Obama           47
    Buchanan         65    Coolidge        51    Trump           70

    Will we run into any issues if we try to treat "large" discrete data the same way as small discrete data? If we do that, we first end up with the following frequency table:

    Frequency table for the ages of US presidents

    President's age at inauguration   Frequency
    42                                1
    43                                1
    46                                1
    47                                3
    48                                1
    49                                2
    50                                1
    51                                5
    52                                2
    54                                5
    55                                4
    56                                3
    57                                4
    58                                1
    60                                1
    61                                3
    62                                1
    64                                2
    65                                1
    68                                1
    69                                1
    70                                1
    Total:                            45

    Then we would end up with the following frequency histogram:

    Figure \(\PageIndex{2.1}\): A "frequency bar plot" showing ages at inauguration of 45 US presidents. (Copyright; author via source)

    The problem with such a summary is that when more than half of the frequencies are 0s and 1s, the summary isn't very informative! To deal with this issue, we first group the observations into classes (also known as categories or bins) and then treat each class as a distinct value. Each class consists of the values from its lower class limit up to, but not including, its upper class limit. We define the lower class limit (LCL) as the smallest value that could go in a class, and the upper class limit (UCL) as the lower class limit of the next higher class. We define the class midpoint (CM) as the average of the lower and upper limits of a class, and the class width (CW) as the difference between the upper and lower limits of a class.
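    In symbols, these definitions give the two relations below; as a check, take the class that runs from 45 up to, but not including, 50.

```latex
\[
\text{CM} = \frac{\text{LCL} + \text{UCL}}{2}, \qquad \text{CW} = \text{UCL} - \text{LCL}
\]
\[
\text{LCL} = 45,\ \text{UCL} = 50:\qquad \text{CM} = \frac{45 + 50}{2} = 47.5, \qquad \text{CW} = 50 - 45 = 5
\]
```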

    Note that because of the relations among these quantities, we only need two of the four of them to define the entire class structure! For example, if the first class has LCL = 15 and CW = 10, then the classes are 15-25, 25-35, 35-45, 45-55, and so on.

    Also note that each value can only belong to one class! For example, 35 will belong to 35-45 and not to 25-35.
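    Because each class includes its lower limit but excludes its upper limit, the class containing a value can be computed directly from the first lower class limit and the class width. A minimal Python sketch, assuming all classes have the same width; `class_index` and `class_bounds` are hypothetical names used only for this illustration.

```python
import math


def class_index(x, first_lcl, width):
    """Return the 0-based index of the half-open class [LCL, LCL + width) containing x.

    Assumes equal-width classes and x >= first_lcl.
    """
    return math.floor((x - first_lcl) / width)


def class_bounds(index, first_lcl, width):
    """Return (LCL, UCL) for the class with the given index."""
    lcl = first_lcl + index * width
    return lcl, lcl + width


# Classes 15-25, 25-35, 35-45, ... as in the example above.
i = class_index(35, first_lcl=15, width=10)
print(class_bounds(i, first_lcl=15, width=10))  # 35 falls in 35-45, not 25-35
```

    With non-integer class widths, floating-point rounding can occasionally put a value that sits exactly on a class limit into the neighboring class, so exact arithmetic (for example, Python's `fractions.Fraction`) is safer in that case.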

    As a rule of thumb, the sample size suggests the number of classes as follows:

    Number of observations   Number of classes
    25 or fewer              5–6
    25–50                    7–14
    Over 50                  15–20
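    The rule of thumb can be written as a small lookup. A sketch in Python; `suggested_class_range` is a hypothetical name, and because the table lists 25 in two rows, the sketch assumes that exactly 25 observations belong to the first row.

```python
def suggested_class_range(n):
    """Return the (min, max) number of classes suggested for n observations.

    Follows the rule-of-thumb table; treating n == 25 as "25 or fewer"
    is an assumption, since the table lists 25 in two rows.
    """
    if n <= 25:
        return (5, 6)
    if n <= 50:
        return (7, 14)
    return (15, 20)


print(suggested_class_range(45))   # 45 presidential inaugurations
print(suggested_class_range(100))  # 100 soccer-player heights
```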

    We are going to use the following guideline for choosing the classes:

    1. Decide on the (approximate) number of classes.
    2. Calculate an approximate class width as \(\frac{\text{maximum observation} - \text{minimum observation}}{\text{number of classes}}\) and use the result to decide on a convenient class width.
    3. Choose a number for the lower limit of the first class, noting that it must be less than or equal to the minimum observation.
    4. Obtain the other lower-class limits by successively adding the class width chosen in Step 2.
    5. Use the results of Step 4 to specify all the classes.
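    The five steps translate almost line for line into code. A minimal Python sketch: the function names are hypothetical, and Step 1 and the "convenient" rounding in Step 2 are left to the reader's judgment, so they are passed in as arguments. The loop stops once a class contains the maximum observation.

```python
def approximate_width(minimum, maximum, num_classes):
    """Step 2: (maximum observation - minimum observation) / number of classes."""
    return (maximum - minimum) / num_classes


def make_classes(minimum, maximum, first_lcl, width):
    """Steps 3-5: list the classes [LCL, LCL + width) until the maximum is covered.

    width is the convenient class width chosen after Step 2, and first_lcl
    (Step 3) must be less than or equal to the minimum observation.
    """
    if first_lcl > minimum:
        raise ValueError("the first lower class limit must not exceed the minimum")
    classes = []
    k = 0
    # Step 4: lower limits first_lcl, first_lcl + width, first_lcl + 2*width, ...
    while first_lcl + k * width <= maximum:
        lcl = first_lcl + k * width
        classes.append((lcl, lcl + width))  # Step 5: one class per lower limit
        k += 1
    return classes


# Presidential ages: minimum 42, maximum 70, about 7 classes.
print(approximate_width(42, 70, 7))  # rounded up by hand to a width of 5
print(make_classes(42, 70, first_lcl=40, width=5))
```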

    Let’s take another look at our data before we split it into classes. Here are the presidential ages at inauguration again, in chronological order:

    57 61 57 57 58 57 61 54 68 51 49 64 50 48 65

    52 56 46 54 49 51 47 55 55 54 42 51 56 55 51

    54 51 60 62 43 55 56 61 52 69 64 47 54 47 70

    The step-by-step procedure for choosing the classes:

    1. Decide on the (approximate) number of classes. With 45 observations, the table suggests about 7–14 classes; we use 7.

    2. Calculate an approximate class width:

    \(\frac{70-42}{7}=\frac{28}{7}=4\), which we round up to 5 because 5 is a "nicer" class width than 4.

    3. Choose 40 as the lower limit of the first class (it must be at most the minimum observation, 42).
    4. Obtain the other lower class limits by successively adding the class width 5:

    40, 45, 50, 55, 60, 65, 70

    5. Use the results of Step 4 to specify all the classes:

    40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75

    Once we have identified the classes, we can tally by assigning each value to its class, and then count the frequencies, find the total, and compute the relative frequencies.

    Classes    Tally               Frequency   Relative frequency
    40 to 45   ||                  2           2/45 = 0.044 = 4.4%
    45 to 50   ||||| ||            7           7/45 = 0.156 = 15.6%
    50 to 55   ||||| ||||| |||     13          13/45 = 0.289 = 28.9%
    55 to 60   ||||| ||||| ||      12          12/45 = 0.267 = 26.7%
    60 to 65   ||||| ||            7           7/45 = 0.156 = 15.6%
    65 to 70   |||                 3           3/45 = 0.067 = 6.7%
    70 to 75   |                   1           1/45 = 0.022 = 2.2%
    Total:                         45          45/45 = 1 = 100%
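    The grouped tally can be reproduced in a few lines of Python. A minimal sketch, assuming equal-width half-open classes starting at 40 with width 5, as chosen above; the constant names are illustrative only.

```python
from collections import Counter

# Presidential ages at inauguration, in the order listed above.
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
        52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
        54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 47, 54, 47, 70]

FIRST_LCL, WIDTH, NUM_CLASSES = 40, 5, 7  # classes 40-45, 45-50, ..., 70-75

# Tally: the class index of an age is floor((age - FIRST_LCL) / WIDTH).
tally = Counter((age - FIRST_LCL) // WIDTH for age in ages)

n = len(ages)
for k in range(NUM_CLASSES):
    lcl = FIRST_LCL + k * WIDTH
    freq = tally[k]  # a Counter returns 0 for classes with no observations
    print(f"{lcl} to {lcl + WIDTH}: {freq:>2}  {freq / n:.3f} = {freq / n:.1%}")
print("Total:", sum(tally.values()))
```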

    Basically, once the classes are created, we can construct a relative frequency distribution table for large discrete data the same way we did earlier for small discrete data!

    Once we have a complete frequency table, we can now construct a frequency histogram:

    Figure \(\PageIndex{2.2}\): A frequency histogram showing ages at inauguration of 45 US presidents of which there are 2 presidents between 40 and 45, 7 between 45-50, 13 between 50-55, 12 between 55-60, 7 between 60-65, 3 between 65-70, 1 between 70-75. (Copyright; author via source)

    Similarly, we can obtain the relative frequency histogram!

    Figure \(\PageIndex{2.3}\): A relative frequency histogram showing ages at inauguration of 45 US presidents of which there are 4.4% presidents between 40 and 45, 15.6% between 45-50, 28.9% between 50-55, 26.7% between 55-60, 15.6% between 60-65, 6.7% between 65-70, 2.2% between 70-75. (Copyright; author via source)

    Once the idea of grouping is understood, constructing the (relative) frequency table and the histogram is very intuitive!

    In summary, to construct a frequency table for discrete data:

    1. List the classes in the first column of a table.
    2. For each observation, place a tally mark in the second column of the table in the row of the appropriate class.
    3. Count the tallies for each class and record the totals in the third column of the table.

    In summary, to construct a frequency (relative frequency) histogram for discrete data:

    1. Obtain a frequency (relative frequency) distribution of the data.
    2. Draw a horizontal axis on which to place the bars and a vertical axis on which to display the frequencies (relative frequencies).
    3. For each class, construct a vertical bar whose height equals the frequency (relative frequency) of that class.
    4. Label the bars with the classes, the horizontal axis with the name of the variable, and the vertical axis with “Frequency” (“Relative frequency” or “Percent”).
      • For single-value grouping, we use the distinct values of the observations to label the bars, with each such value centered under its bar.
      • For class grouping, we use the midpoints to label the bars. Note: some statisticians and software packages use the class limits to label the bars instead.
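    To draw a histogram with software, each bar needs a left edge, a right edge, a height, and a label. A minimal Python sketch using the presidential-age class frequencies and the midpoint labeling convention; `histogram_bars` is a hypothetical helper, not a library function.

```python
from fractions import Fraction


def histogram_bars(first_lcl, width, freqs, relative=False):
    """Return (left edge, right edge, height, midpoint label) for each class.

    histogram_bars is a hypothetical helper. Consecutive classes share a limit,
    so each bar's right edge equals the next bar's left edge and the bars touch.
    With relative=True the heights are relative frequencies (frequency / total).
    """
    total = sum(freqs)
    bars = []
    for k, freq in enumerate(freqs):
        left = first_lcl + k * width          # lower class limit
        right = left + width                  # upper class limit
        height = Fraction(freq, total) if relative else freq
        bars.append((left, right, height, (left + right) / 2))  # label = class midpoint
    return bars


# Frequencies of the classes 40-45, 45-50, ..., 70-75 (presidential ages).
freqs = [2, 7, 13, 12, 7, 3, 1]
for left, right, height, label in histogram_bars(40, 5, freqs):
    print(f"bar from {left} to {right}, height {height}, labeled {label}")
```

    If matplotlib is installed, these tuples could be passed to `matplotlib.pyplot.bar` with `align="edge"` and `width` equal to the class width, so that each bar starts at its lower class limit; this mapping is one possible choice, not something the text prescribes.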

    While it may appear that we used different approaches to organize "small" and "large" discrete data, both can be described as grouping: single-value grouping for "small" discrete data and interval grouping for "large" discrete data. The two methods have more in common than it appears at first sight, because single-value grouping can be viewed as interval grouping in which each single value \(X\) defines a class of width 1 with \(X\) in the middle!
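    The claim that single-value grouping is interval grouping with width 1 can be checked directly: give each value \(X\) the class from \(X - 0.5\) up to \(X + 0.5\), whose midpoint is \(X\). A sketch in Python with the hours-worked data from Section 1; the half-unit offset is the assumption that centers each value in its class.

```python
from collections import Counter

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# Single-value grouping: one class per distinct value.
single_value = Counter(hours)

# Interval grouping with width 1 and first lower limit min(hours) - 0.5,
# so the classes are 1.5-2.5, 2.5-3.5, ... with midpoints 2, 3, ...
first_lcl, width = min(hours) - 0.5, 1
interval = Counter()
for x in hours:
    k = int((x - first_lcl) // width)  # index of the class [LCL, LCL + 1) containing x
    midpoint = first_lcl + k * width + width / 2
    interval[midpoint] += 1

print(sorted(single_value.items()))
print(sorted(interval.items()))
print(single_value == interval)  # same counts, keyed by value vs. by class midpoint
```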

    We discussed how to organize large discrete data and concluded that we organize it essentially the same way as qualitative and small discrete data, with the addition of one step called binning, or creating classes.


    Section 3: Continuous Data

    Next, we will discuss how to produce a visual summary of continuous data. Consider the following data set obtained from recording the heights (in inches) of 100 semiprofessional soccer players:

    66.18 64.49 72.24 64.70 65.80 67.55 64.15 65.42 70.62 65.85

    67.74 69.11 68.26 65.98 69.45 65.93 67.55 73.63 70.55 64.53

    67.03 67.48 66.19 61.69 69.30 61.57 66.01 68.62 70.72 65.81

    64.87 68.30 63.71 63.66 66.36 64.68 62.49 67.19 72.99 64.96

    68.12 59.83 67.97 67.33 64.98 66.09 65.56 67.62 67.66 63.07

    72.22 68.00 69.43 65.15 63.27 63.23 64.13 69.45 63.10 65.40

    68.92 67.57 64.21 68.36 68.88 66.39 68.28 67.27 68.75 67.59

    63.67 70.50 67.52 64.06 73.95 65.36 67.62 65.06 67.30 68.42

    66.08 65.91 64.82 69.23 68.40 68.29 73.27 69.35 69.80 68.42

    67.32 73.00 69.58 64.66 68.59 62.77 67.29 66.56 68.35 65.08

    Does this continuous data set look more like small or large discrete data? Almost every value is distinct, so it certainly looks like we are going to need interval grouping here!

    The step-by-step procedure for choosing the classes:

    1. Decide on the (approximate) number of classes. With 100 observations, the table suggests about 15–20 classes; we use 15.

    2. Calculate an approximate class width:

    \(\frac{73.95-59.83}{15}=\frac{14.12}{15}\approx 0.94\), which we round up to the convenient class width of 1.

    3. Choose 59 as the lower limit of the first class (it must be at most the minimum observation, 59.83).
    4. Obtain the other lower class limits:

    59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73

    5. Use the results of Step 4 to specify all the classes:

    59-60, 60-61, 61-62, 62-63, 63-64, 64-65, 65-66, 66-67,
    67-68, 68-69, 69-70, 70-71, 71-72, 72-73, 73-74

    Now every entry can be assigned to exactly one class, and we can tally the classes just as we did for qualitative data!

    Classes    Tally                    Frequency   Relative frequency
    59 to 60   |                        1           0.01
    60 to 61                            0           0.00
    61 to 62   ||                       2           0.02
    62 to 63   ||                       2           0.02
    63 to 64   ||||| ||                 7           0.07
    64 to 65   ||||| ||||| |||          13          0.13
    65 to 66   ||||| ||||| |||          13          0.13
    66 to 67   ||||| |||                8           0.08
    67 to 68   ||||| ||||| ||||| |||    18          0.18
    68 to 69   ||||| ||||| ||||| |      16          0.16
    69 to 70   ||||| ||||               9           0.09
    70 to 71   ||||                     4           0.04
    71 to 72                            0           0.00
    72 to 73   |||                      3           0.03
    73 to 74   ||||                     4           0.04
    Total:                              100         1.00
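    For continuous data the tally works exactly as before, with each height assigned to the class [LCL, LCL + 1) that contains it. A minimal Python sketch with the 100 heights listed above, assuming the classes 59-60 through 73-74 chosen earlier; it reports every class, including the empty ones 60-61 and 71-72.

```python
from collections import Counter

# Heights (in inches) of the 100 semiprofessional soccer players listed above.
heights = [
    66.18, 64.49, 72.24, 64.70, 65.80, 67.55, 64.15, 65.42, 70.62, 65.85,
    67.74, 69.11, 68.26, 65.98, 69.45, 65.93, 67.55, 73.63, 70.55, 64.53,
    67.03, 67.48, 66.19, 61.69, 69.30, 61.57, 66.01, 68.62, 70.72, 65.81,
    64.87, 68.30, 63.71, 63.66, 66.36, 64.68, 62.49, 67.19, 72.99, 64.96,
    68.12, 59.83, 67.97, 67.33, 64.98, 66.09, 65.56, 67.62, 67.66, 63.07,
    72.22, 68.00, 69.43, 65.15, 63.27, 63.23, 64.13, 69.45, 63.10, 65.40,
    68.92, 67.57, 64.21, 68.36, 68.88, 66.39, 68.28, 67.27, 68.75, 67.59,
    63.67, 70.50, 67.52, 64.06, 73.95, 65.36, 67.62, 65.06, 67.30, 68.42,
    66.08, 65.91, 64.82, 69.23, 68.40, 68.29, 73.27, 69.35, 69.80, 68.42,
    67.32, 73.00, 69.58, 64.66, 68.59, 62.77, 67.29, 66.56, 68.35, 65.08,
]

FIRST_LCL, WIDTH, NUM_CLASSES = 59, 1, 15  # classes 59-60, 60-61, ..., 73-74

# Tally: class index = floor((height - FIRST_LCL) / WIDTH).
tally = Counter(int((h - FIRST_LCL) // WIDTH) for h in heights)

n = len(heights)
for k in range(NUM_CLASSES):
    lcl = FIRST_LCL + k * WIDTH
    print(f"{lcl} to {lcl + WIDTH}: {tally[k]:>3}  {tally[k] / n:.2f}")
print("Total:", sum(tally.values()))
```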

    Now we can turn the frequency distribution table into the frequency histogram:

    Figure \(\PageIndex{3.1}\): A frequency histogram showing heights of 100 semiprofessional soccer players (in inches) of which there is 1 player between 59 and 60, 2 between 61-62, 2 between 62-63, 7 between 63-64, 13 between 64-65, 13 between 65-66, 8 between 66-67, 18 between 67-68, 16 between 68-69, 9 between 69-70, 4 between 70-71, 3 between 72-73, 4 between 73-74. (Copyright; author via source)

    And the relative frequency histogram:

    Figure \(\PageIndex{3.2}\): A relative frequency histogram showing heights of 100 semiprofessional soccer players (in inches) of which there is 1% between 59 and 60, 2% between 61-62, 2% between 62-63, 7% between 63-64, 13% between 64-65, 13% between 65-66, 8% between 66-67, 18% between 67-68, 16% between 68-69, 9% between 69-70, 4% between 70-71, 3% between 72-73, 4% between 73-74. (Copyright; author via source)

    We discussed how to summarize continuous data! In short, we use interval grouping and treat it similarly to large discrete data!


    This page titled 2.2: Visual Summaries of Quantitative Data is shared under a not declared license and was authored, remixed, and/or curated by Anton Butenko.
