2.2: Visual Summaries of Quantitative Data
Section 1: Small Discrete Data
Next, we are going to learn how to organize and summarize quantitative data. What will change when we consider the small discrete type of data? Consider the following data set obtained from asking twenty students how many hours they worked per day:
5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3
Note that this discrete data is classified as "small" discrete because a part-time employee rarely works more than 8 hours per day, so only a few distinct values occur. Can we treat this data set the same way we treated qualitative data sets previously? The answer is yes! We can list all possible values, do the tally, find the frequencies and the total, and then compute the relative frequencies.
| DATA VALUE | TALLY | FREQUENCY | RELATIVE FREQUENCY |
|---|---|---|---|
| 2 | \|\|\| | 3 | 3/20 = 0.15 = 15% |
| 3 | \|\|\|\|\| | 5 | 5/20 = 0.25 = 25% |
| 4 | \|\|\| | 3 | 3/20 = 0.15 = 15% |
| 5 | \|\|\|\|\| \| | 6 | 6/20 = 0.30 = 30% |
| 6 | \|\| | 2 | 2/20 = 0.10 = 10% |
| 7 | \| | 1 | 1/20 = 0.05 = 5% |
| Total: | | 20 | 20/20 = 1 = 100% |
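The tally can also be checked by computer. The following is a minimal Python sketch, not part of the original lesson; the function name `frequency_table` is an illustrative choice, and the data are the twenty values listed above.

```python
from collections import Counter

# Hours worked per day reported by the twenty students (data from the text).
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]


def frequency_table(data):
    """Return rows of (value, frequency, relative frequency), sorted by value."""
    counts = Counter(data)  # plays the role of the tally column
    total = len(data)
    return [(value, counts[value], counts[value] / total) for value in sorted(counts)]


table = frequency_table(hours)
for value, freq, rel in table:
    # Show the relative frequency both as a decimal and as a percentage.
    print(f"{value:>5} {freq:>9} {rel:>8.2f} {rel:>7.0%}")
print(f"Total {len(hours):>9}")
```

Each printed row corresponds to one row of the table above: the data value, its frequency, and its relative frequency as a decimal and as a percent.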
We can construct a relative frequency distribution table for small discrete data the same way we did earlier for qualitative data! Can we construct a frequency bar chart for "small" discrete data the same way we did earlier for ordinal data? The answer is yes, and the same goes for the relative frequency bar chart!
However, there are a few things that we are going to do differently this time. Since quantitative (unlike qualitative) data is associated with the real number line we will draw the bars touching each other so that no number on the number line is left uncovered unless the corresponding frequency is zero. To highlight the differences between the quantitative and qualitative data, we are going to call this chart a histogram instead of a bar chart. As expected, a histogram that uses frequencies on the vertical axis is called a frequency histogram. Similarly, a histogram that uses relative frequencies on the vertical axis is called a relative frequency histogram.
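To illustrate the "no gaps unless the frequency is zero" rule, here is a rough text-only sketch in Python. It draws the histogram sideways, one row per integer on the number line, so a value that never occurs still gets its own empty slot. The function name `text_histogram` and the `#` bar character are illustrative assumptions, not part of the lesson.

```python
from collections import Counter


def text_histogram(data):
    """Return one text bar per integer from min(data) to max(data), gaps included."""
    counts = Counter(data)
    lines = []
    for value in range(min(data), max(data) + 1):
        # Counter returns 0 for values that never occur, so such a value
        # still occupies its position on the axis with an empty bar.
        lines.append(f"{value:>2} | " + "#" * counts[value])
    return lines


hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
for line in text_histogram(hours):
    print(line)
```

For the hours data every value from 2 to 7 occurs, so every row has a bar; a data set such as 2, 2, 4 would show an empty row at 3.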
We discussed how to visually summarize small discrete data. In short, we can treat "small" discrete data the same way as qualitative data, except that the bar charts are now called histograms and the bars must have no gaps between them unless the corresponding frequency is zero.
Section 2: Large Discrete Data
Next, we are going to learn how to visualize large discrete data. Consider the following data set obtained by recording the presidential ages at inauguration:
| President | Age | President | Age | President | Age |
|---|---|---|---|---|---|
| Washington | 57 | Lincoln | 52 | Hoover | 54 |
| J. Adams | 61 | A. Johnson | 56 | F. Roosevelt | 51 |
| Jefferson | 57 | Grant | 46 | Truman | 60 |
| Madison | 57 | Hayes | 54 | Eisenhower | 62 |
| Monroe | 58 | Garfield | 49 | Kennedy | 43 |
| J. Q. Adams | 57 | Arthur | 51 | L. Johnson | 55 |
| Jackson | 61 | Cleveland | 47 | Nixon | 56 |
| Van Buren | 54 | B. Harrison | 55 | Ford | 61 |
| W. H. Harrison | 68 | Cleveland | 55 | Carter | 52 |
| Tyler | 51 | McKinley | 54 | Reagan | 69 |
| Polk | 49 | T. Roosevelt | 42 | G.H.W. Bush | 64 |
| Taylor | 64 | Taft | 51 | Clinton | 47 |
| Fillmore | 50 | Wilson | 56 | G. W. Bush | 54 |
| Pierce | 48 | Harding | 55 | Obama | 47 |
| Buchanan | 65 | Coolidge | 51 | Trump | 70 |
Will we run into any issues if we try to treat "large" discrete data the same way as "small" discrete data? If we do that, we first end up with the following frequency table:
| President's age at inauguration | Frequency |
|---|---|
| 42 | 1 |
| 43 | 1 |
| 46 | 1 |
| 47 | 3 |
| 48 | 1 |
| 49 | 2 |
| 50 | 1 |
| 51 | 5 |
| 52 | 2 |
| 54 | 5 |
| 55 | 4 |
| 56 | 3 |
| 57 | 4 |
| 58 | 1 |
| 60 | 1 |
| 61 | 3 |
| 62 | 1 |
| 64 | 2 |
| 65 | 1 |
| 68 | 1 |
| 69 | 1 |
| 70 | 1 |
| Total: | 45 |
Then with the following frequency histogram:
The problem with such a summary is that when more than half of the frequencies are 0s and 1s, the summary isn't very informative! To deal with this issue, we first group the observations into classes (also known as categories or bins) and then treat each class as a distinct value. Each class is a range of values from its lower-class limit up to, but not including, its upper-class limit. We define:
- the lower-class limit (LCL) as the smallest value that could go in a class;
- the upper-class limit (UCL) as the lower-class limit of the next higher class;
- the class midpoint (CM) as the average of the lower limit and the upper limit of the class;
- the class width (CW) as the difference between the upper limit and the lower limit of the class.
Note that because of the relations between the quantities, we only need two of the four of them to define the entire class structure! For example, if LCL = 15 and CW = 10 then the classes are 15-25, 25-35, 35-45, 45-55 etc.
Also note that each value can only belong to one class! For example, 35 will belong to 35-45 and not to 25-35.
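The class structure described above can be sketched in a few lines of Python. This is only an illustration under the conventions stated in the text (each class includes its lower limit and excludes its upper limit); the helper names `make_classes`, `class_midpoint`, and `find_class` are hypothetical.

```python
def make_classes(first_lcl, width, count):
    """Return (lower, upper) limits for `count` consecutive classes of equal width."""
    return [(first_lcl + i * width, first_lcl + (i + 1) * width) for i in range(count)]


def class_midpoint(lower, upper):
    """Class midpoint: the average of the lower and upper class limits."""
    return (lower + upper) / 2


def find_class(value, classes):
    """Return the single class with lower <= value < upper, or None if none fits."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None


classes = make_classes(15, 10, 4)
print(classes)                  # [(15, 25), (25, 35), (35, 45), (45, 55)]
print(find_class(35, classes))  # (35, 45)
print(class_midpoint(35, 45))   # 40.0
```

Because the upper limit is excluded, the boundary value 35 lands in 35-45 only, which is what keeps every value in exactly one class.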
The sample size determines the number of classes in the following way:
| Number of observations | Number of classes |
|---|---|
| 25 or fewer | 5–6 |
| 25–50 | 7–14 |
| Over 50 | 15–20 |
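As a quick lookup, the table can be written as a small Python function. This is a sketch only: the name `suggested_class_count` is hypothetical, and because the table lists 25 in both of its first two rows, the sketch assumes that exactly 25 observations fall under "25 or fewer".

```python
def suggested_class_count(n):
    """Return the (low, high) range of class counts suggested for n observations."""
    if n <= 25:
        # Assumption: a sample of exactly 25 is treated as "25 or fewer".
        return (5, 6)
    if n <= 50:
        return (7, 14)
    return (15, 20)


print(suggested_class_count(45))   # (7, 14)
print(suggested_class_count(100))  # (15, 20)
```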
We are going to use the following guideline for choosing the classes:
1. Decide on the (approximate) number of classes.
2. Calculate an approximate class width as \(\frac{\text{Maximum observation} - \text{Minimum observation}}{\text{Number of classes}}\) and use the result to decide on a convenient class width.
3. Choose a number for the lower limit of the first class, noting that it must be less than or equal to the minimum observation.
4. Obtain the other lower-class limits by successively adding the class width chosen in Step 2.
5. Use the results of Step 4 to specify all the classes.
Let’s take another look at our data before we decide to split it into classes. Consider the following data set obtained by recording the presidential ages at inauguration:
57 61 57 57 58 57 61 54 68 51 49 64 50 48 65
52 56 46 54 49 51 47 55 55 54 42 51 56 55 51
54 51 60 62 43 55 56 61 52 69 64 47 54 47 70
The step-by-step procedure for choosing the classes:
1. Decide on the (approximate) number of classes: with 45 observations, \(\sim(7-14)\).
2. Calculate an approximate class width: \(\frac{70-42}{7}=\frac{28}{7}=4\), which we round up to 5 because 5 is a more convenient width than 4.
3. Choose a number for the lower limit of the first class: 40.
4. Obtain the other lower-class limits: 40, 45, 50, 55, 60, 65, 70
5. Use the results of Step 4 to specify all the classes: 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75
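The same procedure can be traced in Python. This is a sketch under the choices made above (7 classes for the rough width, then a convenient width of 5 and a first lower limit of 40, both picked by hand); the function name `lower_class_limits` is an illustrative assumption.

```python
# Presidential ages at inauguration (the 45 values listed in the text).
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
        52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
        54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 47, 54, 47, 70]


def lower_class_limits(data, first_lcl, width):
    """Return lower class limits from first_lcl upward until max(data) is covered."""
    if first_lcl > min(data):
        raise ValueError("first lower limit must not exceed the minimum observation")
    limits = []
    lcl = first_lcl
    while lcl <= max(data):
        limits.append(lcl)
        lcl += width
    return limits


data_range = max(ages) - min(ages)  # Step 2: 70 - 42
approx_width = data_range / 7       # rough width using 7 classes
width = 5                           # convenient width chosen by hand
limits = lower_class_limits(ages, 40, width)         # Steps 3 and 4
classes = [(lcl, lcl + width) for lcl in limits]     # Step 5

print(data_range, approx_width)
print(limits)
print(classes)
```

Note that the maximum age, 70, sits exactly on a lower limit, so a final class 70-75 is needed to hold it.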
Once we have identified the classes, we can tally the data by assigning each value to one of the classes, then count the frequencies, find the total, and compute the relative frequencies.
| Classes | Tally | Frequency | Relative Frequency |
|---|---|---|---|
| 40 to 45 | \|\| | 2 | 2/45 = 0.044 = 4.4% |
| 45 to 50 | \|\|\|\|\| \|\| | 7 | 7/45 = 0.156 = 15.6% |
| 50 to 55 | \|\|\|\|\| \|\|\|\|\| \|\|\| | 13 | 13/45 = 0.289 = 28.9% |
| 55 to 60 | \|\|\|\|\| \|\|\|\|\| \|\| | 12 | 12/45 = 0.267 = 26.7% |
| 60 to 65 | \|\|\|\|\| \|\| | 7 | 7/45 = 0.156 = 15.6% |
| 65 to 70 | \|\|\| | 3 | 3/45 = 0.067 = 6.7% |
| 70 to 75 | \| | 1 | 1/45 = 0.022 = 2.2% |
| Total: | 45 | 45 | 45/45 = 1 = 100% |
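The grouped tally can be reproduced with a short Python sketch. It assumes the classes chosen above (first lower limit 40, width 5, seven classes) and the convention that a class includes its lower limit but not its upper limit; `grouped_frequencies` is a hypothetical helper name.

```python
# Presidential ages at inauguration (the 45 values listed in the text).
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
        52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
        54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 47, 54, 47, 70]


def grouped_frequencies(data, first_lcl, width, n_classes):
    """Count how many observations fall in each class [lcl, lcl + width)."""
    freqs = [0] * n_classes
    for x in data:
        # Integer division gives the position of the class that contains x.
        index = int((x - first_lcl) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"observation {x} is outside the classes")
        freqs[index] += 1
    return freqs


first_lcl, width, n_classes = 40, 5, 7
freqs = grouped_frequencies(ages, first_lcl, width, n_classes)
total = sum(freqs)
for i, freq in enumerate(freqs):
    lower = first_lcl + i * width
    print(f"{lower} to {lower + width}: {freq:>2}  {freq / total:.3f}")
print(f"Total: {total}")
```

The printed frequencies and relative frequencies should agree with the table above, rounded to three decimal places.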
Basically, once the classes are created, we can construct a relative frequency distribution table for large discrete data the same way we did earlier for small discrete data!
Once we have a complete frequency table, we can now construct a frequency histogram:
Similarly, we can obtain the relative frequency histogram!
Once the idea of grouping is understood the process of constructing the (relative) frequency table and the histogram is very intuitive!
In summary, to construct a frequency table for discrete data:
- List the classes in the first column of a table.
- For each observation, place a tally mark in the second column of the table in the row of the appropriate class.
- Count the tallies for each class and record the totals in the third column of the table.
In summary, to construct a frequency (relative frequency) histogram for discrete data:
- Obtain a frequency (relative frequency) distribution of the data.
- Draw a horizontal axis on which to place the bars and a vertical axis on which to display the frequencies (relative frequencies).
- For each class, construct a vertical bar whose height equals the frequency (relative frequency) of that class.
- Label the bars with the classes, the horizontal axis with the name of the variable, and the vertical axis with “Frequency” (“Relative frequency” or “Percent”).
- For single-value grouping, we use the distinct values of the observations to label the bars, with each such value centered under its bar.
- For class grouping, we use the midpoints to label the bars. Note: Some statisticians and technology use class limits to label the bars.
While it may appear that we used different approaches to organize "small" and "large" discrete data, both can be described by the term grouping: single-value grouping for "small" discrete data and interval grouping for "large" discrete data. The two methods have more in common than it appears at first sight, because single-value grouping can be viewed as interval grouping in which each single value \(X\) defines a class of width 1 with \(X\) in the middle!
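A small Python sketch can make this equivalence concrete. It reuses the hours data from Section 1 and assumes that the width-1 class for a value \(X\) runs from \(X - 0.5\) up to \(X + 0.5\); the helper name `unit_class` is hypothetical.

```python
from collections import Counter

# Hours worked per day from Section 1.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]


def unit_class(x):
    """Return the width-1 class (lower, upper) that has the integer x as its midpoint."""
    return (x - 0.5, x + 0.5)


# Single-value grouping: count each distinct value.
single_value = Counter(hours)
# Interval grouping: count each width-1 class instead.
interval = Counter(unit_class(x) for x in hours)

for x in sorted(single_value):
    lower, upper = unit_class(x)
    print(f"value {x}: class {lower} to {upper}, frequency {interval[(lower, upper)]}")
```

Both groupings give the same frequencies; only the labels differ.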
We discussed how to organize large discrete data and concluded that it is essentially the same way we organize qualitative and small discrete data, with the addition of a step called binning, or creating classes.
Section 3: Continuous Data
Next, we will discuss how to produce a visual summary of continuous data. Consider the following data set obtained from recording the heights (in inches) of 100 semiprofessional soccer players:
66.18 64.49 72.24 64.70 65.80 67.55 64.15 65.42 70.62 65.85
67.74 69.11 68.26 65.98 69.45 65.93 67.55 73.63 70.55 64.53
67.03 67.48 66.19 61.69 69.30 61.57 66.01 68.62 70.72 65.81
64.87 68.30 63.71 63.66 66.36 64.68 62.49 67.19 72.99 64.96
68.12 59.83 67.97 67.33 64.98 66.09 65.56 67.62 67.66 63.07
72.22 68.00 69.43 65.15 63.27 63.23 64.13 69.45 63.10 65.40
68.92 67.57 64.21 68.36 68.88 66.39 68.28 67.27 68.75 67.59
63.67 70.50 67.52 64.06 73.95 65.36 67.62 65.06 67.30 68.42
66.08 65.91 64.82 69.23 68.40 68.29 73.27 69.35 69.80 68.42
67.32 73.00 69.58 64.66 68.59 62.77 67.29 66.56 68.35 65.08
Does a continuous data set look more like small discrete data or large discrete data? It certainly looks like we are going to need some interval grouping here!
The step-by-step procedure for choosing the classes:
1. Decide on the (approximate) number of classes: \(\sim(15-20)\)
2. Calculate an approximate class width: \(\frac{73.95-59.83}{15}=\frac{14.12}{15}\approx 0.94\approx 1\)
3. Choose a number for the lower limit of the first class: 59.
4. Obtain the other lower-class limits: 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73
5. Use the results of Step 4 to specify all the classes: 59-60, 60-61, 61-62, 62-63, 63-64, 64-65, 65-66, 66-67, 67-68, 68-69, 69-70, 70-71, 71-72, 72-73, 73-74
Now every entry can be assigned to a single class, and we can treat the classes the same way we treated the values of qualitative data!
| Classes | Tally | Frequency | Relative Frequency |
|---|---|---|---|
| 59 to 60 | \| | 1 | 0.01 |
| 60 to 61 | | 0 | 0.00 |
| 61 to 62 | \|\| | 2 | 0.02 |
| 62 to 63 | \|\| | 2 | 0.02 |
| 63 to 64 | \|\|\|\|\| \|\| | 7 | 0.07 |
| 64 to 65 | \|\|\|\|\| \|\|\|\|\| \|\|\| | 13 | 0.13 |
| 65 to 66 | \|\|\|\|\| \|\|\|\|\| \|\|\| | 13 | 0.13 |
| 66 to 67 | \|\|\|\|\| \|\|\| | 8 | 0.08 |
| 67 to 68 | \|\|\|\|\| \|\|\|\|\| \|\|\|\|\| \|\|\| | 18 | 0.18 |
| 68 to 69 | \|\|\|\|\| \|\|\|\|\| \|\|\|\|\| \| | 16 | 0.16 |
| 69 to 70 | \|\|\|\|\| \|\|\|\| | 9 | 0.09 |
| 70 to 71 | \|\|\|\| | 4 | 0.04 |
| 71 to 72 | | 0 | 0.00 |
| 72 to 73 | \|\|\| | 3 | 0.03 |
| 73 to 74 | \|\|\|\| | 4 | 0.04 |
| Total: | | 100 | 1 |
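The counts for the heights can be checked with a short Python sketch. It assumes the classes chosen above (first lower limit 59, width 1, fifteen classes) and again treats each class as including its lower limit but not its upper limit; the variable names are illustrative.

```python
import math

# Heights (in inches) of the 100 semiprofessional soccer players from the text.
heights = [
    66.18, 64.49, 72.24, 64.70, 65.80, 67.55, 64.15, 65.42, 70.62, 65.85,
    67.74, 69.11, 68.26, 65.98, 69.45, 65.93, 67.55, 73.63, 70.55, 64.53,
    67.03, 67.48, 66.19, 61.69, 69.30, 61.57, 66.01, 68.62, 70.72, 65.81,
    64.87, 68.30, 63.71, 63.66, 66.36, 64.68, 62.49, 67.19, 72.99, 64.96,
    68.12, 59.83, 67.97, 67.33, 64.98, 66.09, 65.56, 67.62, 67.66, 63.07,
    72.22, 68.00, 69.43, 65.15, 63.27, 63.23, 64.13, 69.45, 63.10, 65.40,
    68.92, 67.57, 64.21, 68.36, 68.88, 66.39, 68.28, 67.27, 68.75, 67.59,
    63.67, 70.50, 67.52, 64.06, 73.95, 65.36, 67.62, 65.06, 67.30, 68.42,
    66.08, 65.91, 64.82, 69.23, 68.40, 68.29, 73.27, 69.35, 69.80, 68.42,
    67.32, 73.00, 69.58, 64.66, 68.59, 62.77, 67.29, 66.56, 68.35, 65.08,
]

first_lcl, width, n_classes = 59, 1, 15
freqs = [0] * n_classes
for h in heights:
    # Position of the class [lcl, lcl + width) that contains h.
    index = math.floor((h - first_lcl) / width)
    if not 0 <= index < n_classes:
        raise ValueError(f"height {h} is outside the classes")
    freqs[index] += 1

total = len(heights)
for i, freq in enumerate(freqs):
    lower = first_lcl + i * width
    print(f"{lower} to {lower + width}: {freq:>2}  {freq / total:.2f}")
print(f"Total: {sum(freqs)}")
```

Classes with no observations, such as 60 to 61, are printed with frequency 0 so that the histogram keeps a slot for them.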
Now we can turn the frequency distribution table into the frequency histogram:
And the relative frequency histogram:
We discussed how to summarize continuous data! In short, we use interval grouping and treat it similarly to large discrete data!


