4.4: Other Summaries
- Create and interpret stemplots, line graphs, and time series graphs for univariate data.
- Create and interpret contingency tables and scatter plots for bivariate data.
Stem-and-Leaf Diagrams or Stemplots
One simple graph, the stem-and-leaf graph or stemplot, comes from the field of exploratory data analysis. It is a good choice when the data sets are small. To create the plot, divide each observation of data into a stem and a leaf. The leaf consists of a final significant digit. For example, 23 has stem two and leaf three. The number 432 has stem 43 and leaf two. Likewise, the number 5,432 has stem 543 and leaf two. The decimal 9.3 has stem nine and leaf three. Write the stems in a vertical line from smallest to largest. Draw a vertical line to the right of the stems. Then write the leaves in increasing order next to their corresponding stem.
For Susan Dean's spring pre-calculus class, scores for the first exam were as follows (smallest to largest):
33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94; 96; 100
| Stem | Leaf |
|---|---|
| 3 | 3 |
| 4 | 2 9 9 |
| 5 | 3 5 5 |
| 6 | 1 3 7 8 8 9 9 |
| 7 | 2 3 4 8 |
| 8 | 0 3 8 8 8 |
| 9 | 0 2 4 4 4 4 6 |
| 10 | 0 |
The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight of the 31 scores, or approximately 26% \(\left(\frac{8}{31}\right)\), were in the 90s or 100, a fairly high number of As.
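The construction described above can be sketched in Python for whole-number data such as these exam scores. This is a minimal sketch, assuming non-negative integers whose leaf is the ones digit and whose stem is everything to its left (so 100 has stem 10 and leaf 0); the function name `stemplot` is our own, not a library routine. It also recomputes the 8/31 share of scores in the 90s or 100.

```python
from collections import defaultdict

def stemplot(values):
    """Split each non-negative integer into a stem (tens and up) and a leaf (ones digit)."""
    rows = defaultdict(list)
    for v in sorted(values):        # sorting first keeps each row's leaves in increasing order
        stem, leaf = divmod(v, 10)  # e.g. 432 -> stem 43, leaf 2
        rows[stem].append(leaf)
    return dict(sorted(rows.items()))  # stems from smallest to largest

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

for stem, leaves in stemplot(scores).items():
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")

# Share of scores in the 90s or 100
high = sum(1 for s in scores if s >= 90)
print(f"{high}/{len(scores)} = {high / len(scores):.0%}")  # 8/31 = 26%
```

Sorting the raw values once, before splitting them, is what guarantees the leaves on each row come out in increasing order without a second sort per stem.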
For the Park City basketball team, scores for the last 30 games were as follows (smallest to largest):
32; 32; 33; 34; 38; 40; 42; 42; 43; 44; 46; 47; 47; 48; 48; 48; 49; 50; 50; 51; 52; 52; 52; 53; 54; 56; 57; 57; 60; 61
Construct a stem plot for the data.
Answer
| Stem | Leaf |
|---|---|
| 3 | 2 2 3 4 8 |
| 4 | 0 2 2 3 4 6 7 7 8 8 8 9 |
| 5 | 0 0 1 2 2 2 3 4 6 7 7 |
| 6 | 0 1 |
The stemplot is a quick way to graph data and gives an exact picture of the data. You want to look for an overall pattern and any outliers. An outlier is an observation of data that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500) while others may indicate that something unusual is happening. It takes some background information to explain outliers, so we will cover them in more detail later.
The data are the distances (in kilometers) from a home to local supermarkets. Create a stemplot using the data:
1.1; 1.5; 2.3; 2.5; 2.7; 3.2; 3.3; 3.3; 3.5; 3.8; 4.0; 4.2; 4.5; 4.5; 4.7; 4.8; 5.5; 5.6; 6.5; 6.7; 12.3
Do the data seem to have any concentration of values?
HINT: The leaves are to the right of the decimal.
Answer
The value 12.3 may be an outlier. Values appear to concentrate at three and four kilometers.
| Stem | Leaf |
|---|---|
| 1 | 1 5 |
| 2 | 3 5 7 |
| 3 | 2 3 3 5 8 |
| 4 | 0 2 5 5 7 8 |
| 5 | 5 6 |
| 6 | 5 7 |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | 3 |
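When the data have one decimal place, the stem is the whole-number part and the leaf is the tenths digit, and every stem between the smallest and largest is listed even if it has no leaves. The following is a minimal sketch under the assumption of non-negative values recorded to exactly one decimal place; `decimal_stemplot` is a hypothetical helper name.

```python
def decimal_stemplot(values):
    """Stem = whole-number part, leaf = tenths digit; every stem from min to max is listed."""
    # Convert to whole tenths with round() so that 4.7 becomes 47, not 46.999...
    tenths = sorted(round(v * 10) for v in values)
    first, last = tenths[0] // 10, tenths[-1] // 10
    rows = {stem: [] for stem in range(first, last + 1)}  # empty stems keep gaps visible
    for t in tenths:
        stem, leaf = divmod(t, 10)
        rows[stem].append(leaf)
    return rows

distances_km = [1.1, 1.5, 2.3, 2.5, 2.7, 3.2, 3.3, 3.3, 3.5, 3.8, 4.0,
                4.2, 4.5, 4.5, 4.7, 4.8, 5.5, 5.6, 6.5, 6.7, 12.3]

for stem, leaves in decimal_stemplot(distances_km).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Keeping the empty stems 7 through 11 is a deliberate choice: the visible gap is what makes 12.3 stand out as a possible outlier.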
The following data show the distances (in miles) from the homes of off-campus statistics students to the college. Create a stem plot using the data and identify any outliers:
0.5; 0.7; 1.1; 1.2; 1.2; 1.3; 1.3; 1.5; 1.5; 1.7; 1.7; 1.8; 1.9; 2.0; 2.2; 2.5; 2.6; 2.8; 2.8; 2.8; 3.5; 3.8; 4.4; 4.8; 4.9; 5.2; 5.5; 5.7; 5.8; 8.0
Answer
| Stem | Leaf |
|---|---|
| 0 | 5 7 |
| 1 | 1 2 2 3 3 5 5 7 7 8 9 |
| 2 | 0 2 5 6 8 8 8 |
| 3 | 5 8 |
| 4 | 4 8 9 |
| 5 | 2 5 7 8 |
| 6 | |
| 7 | |
| 8 | 0 |
The value 8.0 may be an outlier. Values appear to concentrate at one and two miles.
A side-by-side stem-and-leaf plot allows a comparison of two data sets in two columns. In a side-by-side stem-and-leaf plot, two sets of leaves share the same stem. The leaves are to the left and the right of the stems, and on both sides they increase as they move away from the stem. The following two tables show the ages of presidents at their inauguration and at their death. Construct a side-by-side stem-and-leaf plot using these data.
| President | Age at Inauguration | President | Age | President | Age |
|---|---|---|---|---|---|
| Pierce | 48 | Harding | 55 | Obama | 47 |
| Polk | 49 | T. Roosevelt | 42 | G.H.W. Bush | 64 |
| Fillmore | 50 | Wilson | 56 | G. W. Bush | 54 |
| Tyler | 51 | McKinley | 54 | Reagan | 69 |
| Van Buren | 54 | B. Harrison | 55 | Ford | 61 |
| Washington | 57 | Lincoln | 52 | Hoover | 54 |
| Jefferson | 57 | Grant | 46 | Truman | 60 |
| Madison | 57 | Hayes | 54 | Eisenhower | 62 |
| J. Q. Adams | 57 | Arthur | 51 | L. Johnson | 55 |
| Monroe | 58 | Garfield | 49 | Kennedy | 43 |
| J. Adams | 61 | A. Johnson | 56 | F. Roosevelt | 51 |
| Jackson | 61 | Cleveland | 47 | Nixon | 56 |
| Taylor | 64 | Taft | 51 | Clinton | 47 |
| Buchanan | 65 | Coolidge | 51 | Trump | 70 |
| W. H. Harrison | 68 | Cleveland | 55 | Carter | 52 |
| President | Age at Death | President | Age | President | Age |
|---|---|---|---|---|---|
| Washington | 67 | Lincoln | 56 | Hoover | 90 |
| J. Adams | 90 | A. Johnson | 66 | F. Roosevelt | 63 |
| Jefferson | 83 | Grant | 63 | Truman | 88 |
| Madison | 85 | Hayes | 70 | Eisenhower | 78 |
| Monroe | 73 | Garfield | 49 | Kennedy | 46 |
| J. Q. Adams | 80 | Arthur | 56 | L. Johnson | 64 |
| Jackson | 78 | Cleveland | 71 | Nixon | 81 |
| Van Buren | 79 | B. Harrison | 67 | Ford | 93 |
| W. H. Harrison | 68 | Cleveland | 71 | Reagan | 93 |
| Tyler | 71 | McKinley | 58 | | |
| Polk | 53 | T. Roosevelt | 60 | | |
| Taylor | 65 | Taft | 72 | | |
| Fillmore | 74 | Wilson | 67 | | |
| Pierce | 64 | Harding | 57 | | |
| Buchanan | 77 | Coolidge | 60 | | |
Answer
| Ages at Inauguration | Stem | Ages at Death |
|---|---|---|
| 9 9 8 7 7 7 6 3 2 | 4 | 6 9 |
| 8 7 7 7 7 6 6 6 5 5 5 5 4 4 4 4 4 2 2 1 1 1 1 1 0 | 5 | 3 6 6 7 8 |
| 9 8 5 4 4 2 1 1 1 0 | 6 | 0 0 3 3 4 4 5 6 7 7 7 8 |
| 0 | 7 | 0 1 1 1 2 3 4 7 8 8 9 |
| | 8 | 0 1 3 5 8 |
| | 9 | 0 0 3 3 |
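A side-by-side plot is two ordinary stemplots that share one column of stems, with the left-hand leaves written in reverse. The sketch below is a minimal illustration, assuming whole-number ages whose stem is the tens digit; `side_by_side` is a hypothetical helper, and the two lists are typed in column by column from the inauguration and death tables above.

```python
from collections import defaultdict

def side_by_side(left_values, right_values):
    """Build rows of (left leaves, stem, right leaves) for whole numbers; stem = tens digit."""
    left, right = defaultdict(list), defaultdict(list)
    for v in left_values:
        left[v // 10].append(v % 10)
    for v in right_values:
        right[v // 10].append(v % 10)
    stems = set(left) | set(right)
    rows = []
    for s in range(min(stems), max(stems) + 1):
        # Left leaves are listed largest first so that, on both sides,
        # leaves increase as they move away from the shared stem.
        rows.append((sorted(left[s], reverse=True), s, sorted(right[s])))
    return rows

inauguration = [48, 49, 50, 51, 54, 57, 57, 57, 57, 58, 61, 61, 64, 65, 68,
                55, 42, 56, 54, 55, 52, 46, 54, 51, 49, 56, 47, 51, 51, 55,
                47, 64, 54, 69, 61, 54, 60, 62, 55, 43, 51, 56, 47, 70, 52]
death = [67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65, 74, 64, 77,
         56, 66, 63, 70, 49, 56, 71, 67, 71, 58, 60, 72, 67, 57, 60,
         90, 63, 88, 78, 46, 64, 81, 93, 93]

for left, stem, right in side_by_side(inauguration, death):
    print(f"{' '.join(map(str, left)):>50} | {stem} | {' '.join(map(str, right))}")
```

Counting the leaves on each side (45 inauguration ages, 39 death ages) is a quick check that no value was dropped when the plot was built by hand.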
The table shows the number of wins and losses the Atlanta Hawks have had in 42 seasons. Create a side-by-side stem-and-leaf plot of these wins and losses.
| Losses | Wins | Year | Losses | Wins | Year |
|---|---|---|---|---|---|
| 34 | 48 | 1968–1969 | 41 | 41 | 1989–1990 |
| 34 | 48 | 1969–1970 | 39 | 43 | 1990–1991 |
| 46 | 36 | 1970–1971 | 44 | 38 | 1991–1992 |
| 46 | 36 | 1971–1972 | 39 | 43 | 1992–1993 |
| 36 | 46 | 1972–1973 | 25 | 57 | 1993–1994 |
| 47 | 35 | 1973–1974 | 40 | 42 | 1994–1995 |
| 51 | 31 | 1974–1975 | 36 | 46 | 1995–1996 |
| 53 | 29 | 1975–1976 | 26 | 56 | 1996–1997 |
| 51 | 31 | 1976–1977 | 32 | 50 | 1997–1998 |
| 41 | 41 | 1977–1978 | 19 | 31 | 1998–1999 |
| 36 | 46 | 1978–1979 | 54 | 28 | 1999–2000 |
| 32 | 50 | 1979–1980 | 57 | 25 | 2000–2001 |
| 51 | 31 | 1980–1981 | 49 | 33 | 2001–2002 |
| 40 | 42 | 1981–1982 | 47 | 35 | 2002–2003 |
| 39 | 43 | 1982–1983 | 54 | 28 | 2003–2004 |
| 42 | 40 | 1983–1984 | 69 | 13 | 2004–2005 |
| 48 | 34 | 1984–1985 | 56 | 26 | 2005–2006 |
| 32 | 50 | 1985–1986 | 52 | 30 | 2006–2007 |
| 25 | 57 | 1986–1987 | 45 | 37 | 2007–2008 |
| 32 | 50 | 1987–1988 | 35 | 47 | 2008–2009 |
| 30 | 52 | 1988–1989 | 29 | 53 | 2009–2010 |
Answer
Atlanta Hawks Wins and Losses
| Number of Wins | Stem | Number of Losses |
|---|---|---|
| 3 | 1 | 9 |
| 9 8 8 6 5 | 2 | 5 5 6 9 |
| 8 7 6 6 5 5 4 3 1 1 1 1 0 | 3 | 0 2 2 2 2 4 4 5 6 6 6 9 9 9 |
| 8 8 7 6 6 6 3 3 3 2 2 1 1 0 | 4 | 0 0 1 1 2 4 5 6 6 7 7 8 9 |
| 7 7 6 3 2 0 0 0 0 | 5 | 1 1 1 2 3 4 4 6 7 |
| | 6 | 9 |
Line Graphs or Frequency Polygons
Another type of graph that is useful for specific data values is a line graph or a frequency polygon. In the particular line graph shown in the example below, the x-axis (horizontal axis) consists of data values and the y-axis (vertical axis) consists of frequency points. The frequency points are connected using line segments.
In a survey, 40 mothers were asked how many times per week a teenager must be reminded to do his or her chores. The results are shown in the following table and in the accompanying figure.
| Number of times teenager is reminded | Frequency |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |
| 3 | 14 |
| 4 | 7 |
| 5 | 4 |
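The vertices of this line graph are simply the (data value, frequency) pairs from the table, taken in order of the data value. The sketch below, which uses only plain Python and a dictionary we named `reminders` for illustration, lists those vertices, pairs consecutive ones into the line segments that get drawn, and checks that the frequencies account for all 40 mothers.

```python
# Frequency table from the survey: reminders per week -> number of mothers
reminders = {0: 2, 1: 5, 2: 8, 3: 14, 4: 7, 5: 4}

# Each vertex of the line graph is (data value, frequency), ordered by data value.
vertices = sorted(reminders.items())

# Consecutive vertices are joined by line segments.
segments = list(zip(vertices, vertices[1:]))

print(vertices)  # [(0, 2), (1, 5), (2, 8), (3, 14), (4, 7), (5, 4)]
print(f"{len(segments)} segments; total frequency = {sum(reminders.values())}")
```

If matplotlib is available, passing the x-values and y-values of `vertices` to `matplotlib.pyplot.plot` with `marker="o"` draws the same graph.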
In a survey, 40 people were asked how many times per year they had their car in the shop for repairs. The results are shown in the table below. Construct a line graph.
| Number of times in shop | Frequency |
|---|---|
| 0 | 7 |
| 1 | 10 |
| 2 | 14 |
| 3 | 9 |
Answer
Frequency polygons can also be used for comparing distributions. This is achieved by overlaying the frequency polygons drawn for different data sets.
We will construct an overlay frequency polygon comparing the final exam scores with the students’ final numeric grade.
Final exam scores:
| Range | Midpoint | Frequency | Cumulative Frequency |
|---|---|---|---|
| 49.5-59.5 | 54.5 | 5 | 5 |
| 59.5-69.5 | 64.5 | 10 | 15 |
| 69.5-79.5 | 74.5 | 30 | 45 |
| 79.5-89.5 | 84.5 | 40 | 85 |
| 89.5-99.5 | 94.5 | 15 | 100 |
Final numeric grades:
| Range | Midpoint | Frequency | Cumulative Frequency |
|---|---|---|---|
| 49.5-59.5 | 54.5 | 10 | 10 |
| 59.5-69.5 | 64.5 | 10 | 20 |
| 69.5-79.5 | 74.5 | 30 | 50 |
| 79.5-89.5 | 84.5 | 45 | 95 |
| 89.5-99.5 | 94.5 | 5 | 100 |
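The Midpoint and Cumulative Frequency columns in both tables can be derived from the class boundaries and the Frequency column. The sketch below assumes both tables use the same five classes, as they do here; because the two polygons share the same midpoints on the x-axis, their vertices can be drawn on one set of axes to form the overlay. The variable names are illustrative only.

```python
from itertools import accumulate

# Class boundaries (lower, upper) shared by both tables above
classes = [(49.5, 59.5), (59.5, 69.5), (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
first_freq = [5, 10, 30, 40, 15]    # Frequency column of the first table
second_freq = [10, 10, 30, 45, 5]   # Frequency column of the second table

# The x-coordinate of each polygon vertex is the class midpoint.
midpoints = [(low + high) / 2 for low, high in classes]

for label, freq in [("first", first_freq), ("second", second_freq)]:
    vertices = list(zip(midpoints, freq))  # (midpoint, frequency) pairs to connect
    cumulative = list(accumulate(freq))     # running totals for the last column
    print(label, vertices, cumulative)
```

The final cumulative value of 100 in each table is a built-in check that every observation was assigned to exactly one class.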
Time Series Graphs
Suppose that we want to study the temperature range of a region for an entire month. Every day at noon we note the temperature and write this down in a log. A variety of statistical studies could be done with this data. We could find the mean or the median temperature for the month. We could construct a histogram displaying the number of days that temperatures reach a certain range of values. However, all of these methods ignore a portion of the data that we have collected.
One feature of the data that we may want to consider is that of time. Since each date is paired with the temperature reading for the day, we don't have to think of the data as being random. We can instead use the times given to impose a chronological order on the data. A graph that recognizes this ordering and displays the changing temperature as the month progresses is called a time series graph.
To construct a time series graph, we must look at both pieces of our paired data set. We start with a standard Cartesian coordinate system. The horizontal axis is used to plot the date or time increments, and the vertical axis is used to plot the values of the variable that we are measuring. By doing this, we make each point on the graph correspond to a date and a measured quantity. The points on the graph are typically connected by straight lines in the order in which they occur.
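The two steps in this paragraph, ordering the paired data by time and then joining consecutive points, can be sketched in a few lines. As an assumption for illustration, the (year, value) pairs below are the Annual column of the CPI table that follows, deliberately listed out of order to mimic an unsorted log; the name `log` is hypothetical.

```python
# Hypothetical unordered log: (year, annual CPI) pairs copied from the Annual
# column of the CPI table, deliberately listed out of order.
log = [(2008, 215.303), (2003, 184.0), (2012, 229.594), (2005, 195.3),
       (2010, 218.056), (2004, 188.9), (2011, 224.939), (2006, 201.6),
       (2009, 214.537), (2007, 207.342)]

# Sorting on the first element (the year) imposes chronological order.
series = sorted(log)

# Join consecutive points with straight segments, in time order.
segments = list(zip(series, series[1:]))
for (t0, y0), (t1, y1) in segments:
    print(f"{t0} ({y0}) -> {t1} ({y1})")

# Read off a feature the ordered graph makes visible: years in which the index fell.
drops = [t1 for (t0, y0), (t1, y1) in segments if y1 < y0]
print(drops)  # [2009]
```

Without the sort, the segments would zigzag between unrelated years, which is why the chronological ordering is the defining step of a time series graph.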
The following data show the Consumer Price Index for each month over ten years, along with the annual value in the last column. Construct a time series graph for the annual Consumer Price Index data only.
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul |
|---|---|---|---|---|---|---|---|
| 2003 | 181.7 | 183.1 | 184.2 | 183.8 | 183.5 | 183.7 | 183.9 |
| 2004 | 185.2 | 186.2 | 187.4 | 188.0 | 189.1 | 189.7 | 189.4 |
| 2005 | 190.7 | 191.8 | 193.3 | 194.6 | 194.4 | 194.5 | 195.4 |
| 2006 | 198.3 | 198.7 | 199.8 | 201.5 | 202.5 | 202.9 | 203.5 |
| 2007 | 202.416 | 203.499 | 205.352 | 206.686 | 207.949 | 208.352 | 208.299 |
| 2008 | 211.080 | 211.693 | 213.528 | 214.823 | 216.632 | 218.815 | 219.964 |
| 2009 | 211.143 | 212.193 | 212.709 | 213.240 | 213.856 | 215.693 | 215.351 |
| 2010 | 216.687 | 216.741 | 217.631 | 218.009 | 218.178 | 217.965 | 218.011 |
| 2011 | 220.223 | 221.309 | 223.467 | 224.906 | 225.964 | 225.722 | 225.922 |
| 2012 | 226.665 | 227.663 | 229.392 | 230.085 | 229.815 | 229.478 | 229.104 |
| Year | Aug | Sep | Oct | Nov | Dec | Annual |
|---|---|---|---|---|---|---|
| 2003 | 184.6 | 185.2 | 185.0 | 184.5 | 184.3 | 184.0 |
| 2004 | 189.5 | 189.9 | 190.9 | 191.0 | 190.3 | 188.9 |
| 2005 | 196.4 | 198.8 | 199.2 | 197.6 | 196.8 | 195.3 |
| 2006 | 203.9 | 202.9 | 201.8 | 201.5 | 201.8 | 201.6 |
| 2007 | 207.917 | 208.490 | 208.936 | 210.177 | 210.036 | 207.342 |
| 2008 | 219.086 | 218.783 | 216.573 | 212.425 | 210.228 | 215.303 |
| 2009 | 215.834 | 215.969 | 216.177 | 216.330 | 215.949 | 214.537 |
| 2010 | 218.312 | 218.439 | 218.711 | 218.803 | 219.179 | 218.056 |
| 2011 | 226.545 | 226.889 | 226.421 | 226.230 | 225.672 | 224.939 |
| 2012 | 230.379 | 231.407 | 231.317 | 230.221 | 229.601 | 229.594 |
Answer
The following table is a portion of a data set from www.worldbank.org. Use the table to construct a time series graph for CO\(_2\) emissions for the United States.
| Year | Ukraine | United Kingdom | United States |
|---|---|---|---|
| 2003 | 352,259 | 540,640 | 5,681,664 |
| 2004 | 343,121 | 540,409 | 5,790,761 |
| 2005 | 339,029 | 541,990 | 5,826,394 |
| 2006 | 327,797 | 542,045 | 5,737,615 |
| 2007 | 328,357 | 528,631 | 5,828,697 |
| 2008 | 323,657 | 522,247 | 5,656,839 |
| 2009 | 272,176 | 474,579 | 5,299,563 |
Time series graphs are important tools in various applications of statistics. When recording values of the same variable over an extended period of time, sometimes it is difficult to discern any trend or pattern. However, once the same data points are displayed graphically, some features jump out. Time series graphs make trends easy to spot.
Bivariate Data
Earlier we discussed methods of summarizing and organizing the data obtained from observing one variable. Such data are called univariate data. When we are interested in studying the relationship between a pair of variables, we must collect and organize the data on two variables at the same time. Such data are called bivariate data, and there are several ways to organize them depending on the types of the variables.
| Univariate Data | Bivariate Data |
|---|---|
| The mileage of 5 cars | The ages and mileage of 5 cars |
Scatter Plots
When both variables are quantitative, we can construct a scatter plot to organize the data. To construct a scatter plot, we use a horizontal axis for the observations of one variable and a vertical axis for the observations of the other. When picking which axis to use for each variable, consider whether you suspect that one variable depends on the other. The independent variable goes on the x-axis and the dependent variable goes on the y-axis. Each pair of observations is then plotted as a point.
The summary of the ages and the mileages of a sample of 5 cars is shown below:
| Car | \(x\) (age in years) | \(y\) (mileage in miles) |
|---|---|---|
| 1 | 7 | 78524 |
| 2 | 2 | 12574 |
| 3 | 1 | 24914 |
| 4 | 5 | 65813 |
| 5 | 3 | 39824 |
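To make "each pair of observations is plotted as a point" concrete, the sketch below pairs each car's age with its mileage and prints a crude text scatter plot. It is only an illustration, not a substitute for a real plotting tool; the 10,000-mile bands on the vertical axis are an arbitrary choice made for this example.

```python
# Paired observations from the table above: x = age in years, y = mileage in miles
ages = [7, 2, 1, 5, 3]
mileages = [78524, 12574, 24914, 65813, 39824]

# Pair values by position so each car's age stays with its own mileage.
points = list(zip(ages, mileages))

# Rough text rendering: one row per 10,000-mile band (highest band at the top),
# one column per year of age. A '*' marks the band and age of a car.
width = max(ages) + 1
for band in range(max(mileages) // 10_000, -1, -1):
    row = [" "] * width
    for x, y in points:
        if y // 10_000 == band:
            row[x] = "*"
    print(f"{band * 10_000:>6} | {''.join(row)}")
print(" " * 7 + "+" + "-" * (width + 1))
print(" " * 9 + "".join(str(x % 10) for x in range(width)))  # age axis labels
```

Age is placed on the horizontal axis because mileage is the variable we would expect to depend on it.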
Amelia plays basketball for her high school. She wants to improve enough to play at the college level. She notices that the number of points she scores in a game goes up in response to the number of hours she practices her jump shot each week. She records the following data:
| \(X\) (hours practicing jump shot) | \(Y\) (points scored in a game) |
|---|---|
| 5 | 15 |
| 7 | 22 |
| 9 | 28 |
| 10 | 31 |
| 11 | 33 |
| 12 | 36 |
Construct a scatter plot and state whether what Amelia thinks appears to be true.
Answer
Figure: scatter plot of \(Y\) (points scored in a game) against \(X\) (hours practicing jump shot)
Yes, Amelia’s assumption appears to be correct. The number of points Amelia scores per game goes up when she practices her jump shot more.
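The visual judgment "the points go up" can be checked directly on the recorded pairs: order the points from left to right along the x-axis and confirm that each step to the right is also a step up. This is a minimal sketch; it describes only these six observations and says nothing about data Amelia has not recorded.

```python
hours = [5, 7, 9, 10, 11, 12]      # X: hours practicing jump shot
points = [15, 22, 28, 31, 33, 36]  # Y: points scored in a game

pairs = sorted(zip(hours, points))  # order the points from left to right on the x-axis
# Her claim holds for these data if every step to the right is also a step up.
always_up = all(y1 > y0 for (_, y0), (_, y1) in zip(pairs, pairs[1:]))
print(always_up)  # True
```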
Contingency Tables
When we are interested in studying the relationship between a pair of variables, we must collect and organize the data on two variables at the same time. Recall that such data are called bivariate data and that there are several ways to organize them depending on the types of the variables. Below are examples of univariate and bivariate qualitative data.
| Univariate Data | Bivariate Data |
|---|---|
| The colors of 5 cars | The color and condition of 5 cars |
A table called a contingency table can be used to organize bivariate data in which one or both variables are qualitative.
To construct the table, we arrange the observed frequencies into rows and columns. The intersection of a row and a column of a contingency table is called a cell. Each cell shows the frequency of observations that fit the description in the corresponding row and column.
| Color\Condition | Like new | Used |
|---|---|---|
| Light | 1 | 1 |
| Dark | 1 | 2 |
It often makes sense to add a column and a row of totals, giving the total for each row, for each column, and for the entire table.
| Color\Condition | Like new | Used | Total |
|---|---|---|---|
| Light | 1 | 1 | 2 |
| Dark | 1 | 2 | 3 |
| Total | 2 | 3 | 5 |
Alternatively, the contingency table can be referred to as a two-way frequency table. Why? In the example above, cover the second and third columns, and what's left is the frequency table for the colors of the cars! Cover the second and third rows instead, and what's left is the frequency table for the conditions of the cars! Also note that every contingency table can be broken down into two frequency tables, but the two frequency tables cannot be combined back into a contingency table if the original data are lost!
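Both claims in this paragraph can be checked with a short sketch. Row totals give the frequency table for color and column totals give the frequency table for condition; the second table below is a made-up example (not from the text) with different cells but exactly the same totals, which shows why the two frequency tables alone cannot recover the contingency table. The helper name `margins` is hypothetical.

```python
def margins(table):
    """Return (row totals, column totals, grand total) of a two-way table."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    return row_totals, col_totals, sum(row_totals.values())

cars = {"Light": {"Like new": 1, "Used": 1},
        "Dark": {"Like new": 1, "Used": 2}}
print(margins(cars))
# ({'Light': 2, 'Dark': 3}, {'Like new': 2, 'Used': 3}, 5)

# A different, made-up table with the same row and column totals
other = {"Light": {"Like new": 0, "Used": 2},
         "Dark": {"Like new": 2, "Used": 1}}
print(margins(other) == margins(cars))  # True
```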
Destiny surveyed the students in her neighborhood and obtained the following data. Construct the contingency table that summarizes the school enrollment by level and type.
| Student | School level | School type |
|---|---|---|
| Student 1 | Middle School | Public |
| Student 2 | Middle School | Private |
| Student 3 | Elementary School | Public |
| Student 4 | Middle School | Public |
| Student 5 | Elementary School | Public |
| Student 6 | High School | Public |
| Student 7 | High School | Private |
| Student 8 | Middle School | Public |
| Student 9 | High School | Public |
| Student 10 | Elementary School | Public |
| Student 11 | Middle School | Private |
| Student 12 | Elementary School | Private |
| Student 13 | Elementary School | Private |
| Student 14 | Elementary School | Private |
| Student 15 | Elementary School | Private |
| Student 16 | Middle School | Private |
| Student 17 | Middle School | Private |
| Student 18 | High School | Private |
| Student 19 | Elementary School | Private |
Answer
The following contingency table summarizes the school enrollment by level and type:
| School level | Public | Private | Total |
|---|---|---|---|
| Elementary School | 3 | 5 | 8 |
| Middle School | 3 | 4 | 7 |
| High School | 2 | 2 | 4 |
| Total | 8 | 11 | 19 |
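Building a contingency table from raw records amounts to counting each (row, column) combination once per student. The sketch below uses Python's `collections.Counter` on the 19 (level, type) pairs from Destiny's survey, typed in the order listed; the short level names and the plain-text layout are our own choices for illustration.

```python
from collections import Counter

# (school level, school type) for Students 1-19, in the order listed
records = [
    ("Middle", "Public"), ("Middle", "Private"), ("Elementary", "Public"),
    ("Middle", "Public"), ("Elementary", "Public"), ("High", "Public"),
    ("High", "Private"), ("Middle", "Public"), ("High", "Public"),
    ("Elementary", "Public"), ("Middle", "Private"), ("Elementary", "Private"),
    ("Elementary", "Private"), ("Elementary", "Private"), ("Elementary", "Private"),
    ("Middle", "Private"), ("Middle", "Private"), ("High", "Private"),
    ("Elementary", "Private"),
]

cells = Counter(records)  # each cell counts one (level, type) combination
levels = ["Elementary", "Middle", "High"]
types = ["Public", "Private"]

print(f"{'':<12}" + "".join(f"{t:>9}" for t in types) + f"{'Total':>9}")
for lvl in levels:
    row = [cells[(lvl, t)] for t in types]  # Counter returns 0 for unseen pairs
    print(f"{lvl:<12}" + "".join(f"{n:>9}" for n in row) + f"{sum(row):>9}")
col_totals = [sum(cells[(lvl, t)] for lvl in levels) for t in types]
print(f"{'Total':<12}" + "".join(f"{n:>9}" for n in col_totals) + f"{len(records):>9}")
```

The grand total is taken from the number of records rather than by adding the cells, so it doubles as a check that every student landed in exactly one cell.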