2.5: Organizing Bivariate Data
Section 1: Scatterplots
Earlier we discussed the methods of summarizing and organizing the data obtained from observing one variable. Such data are called univariate data.
| Car | Mileage (mi) |
|---|---|
| 1 | 78524 |
| 2 | 12574 |
| 3 | 24914 |
| 4 | 65813 |
| 5 | 39824 |
When we are interested in studying the relationship between a pair of variables, we must collect and organize data on both variables at the same time. Such data are called bivariate data, and there are several ways to organize them depending on the types of the variables.
The ages and mileages of a sample of 5 cars.

| Car | Age (years) | Mileage (mi) |
|---|---|---|
| 1 | 7 | 78524 |
| 2 | 2 | 12574 |
| 3 | 1 | 24914 |
| 4 | 5 | 65813 |
| 5 | 3 | 39824 |
When both variables are quantitative, we can construct a scatter plot to organize the data. To construct a scatter plot, we use a horizontal axis for the observations of one variable and a vertical axis for the observations of the other. When choosing which axis to use for each variable, consider whether you suspect that one variable depends on the other: the independent variable goes on the x-axis and the dependent variable goes on the y-axis. Each pair of observations is then plotted as a point. The example here summarizes the observed relation between the age and the mileage of the cars.
In the scatter plot above, each point corresponds to one car. For example, car 1 is the point whose x-coordinate is 7 and whose y-coordinate is about 78.5 thousand, car 2 is the point whose x-coordinate is 2 and whose y-coordinate is about 12.6 thousand, and so on.
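The pairing of observations into points can be sketched in a few lines of Python. This is only an illustration using the five cars from the table; the variable names are chosen here for the example, and the mileage is expressed in thousands of miles as in the text above.

```python
# Sample data from the table: one entry per car, in car order 1..5.
ages = [7, 2, 1, 5, 3]                            # years, plotted on the x-axis
mileages = [78524, 12574, 24914, 65813, 39824]    # miles, plotted on the y-axis

# Each car becomes one point (x, y) = (age, mileage in thousands of miles).
points = [(age, miles / 1000) for age, miles in zip(ages, mileages)]

for car, (x, y) in enumerate(points, start=1):
    print(f"car {car}: x = {x}, y = {y:.1f} thousand")
```

A plotting library can then draw these points; for example, matplotlib's `pyplot.scatter` takes the x-values and the y-values as two sequences.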
Here is another example, which summarizes the observed relation between third exam scores and final exam scores. Notice that the third exam score is on the x-axis, since we may suspect that it has an effect on the final exam score.
When one of the variables is time and there is only one observation for each moment in time, we can construct a scatter plot:
| Year | CO2 emissions in the United States |
|---|---|
| 2003 | 5,681,664 |
| 2004 | 5,790,761 |
| 2005 | 5,826,394 |
| 2006 | 5,737,615 |
| 2007 | 5,828,697 |
| 2008 | 5,656,839 |
| 2009 | 5,299,563 |
We can then connect the points on the scatter plot with line segments; such a graph is called a time series. The example here summarizes the observed relation between the year and the CO2 emissions for the United States.
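As a rough sketch of what "connecting the points" means, the Python snippet below orders the observations by year and pairs each point with the next one, giving the line segments of the time series. The dictionary name and layout are assumptions made for this example; the values are copied from the table, in the units used there.

```python
# Year and CO2 emissions pairs from the table (units as given in the source).
emissions = {
    2003: 5_681_664,
    2004: 5_790_761,
    2005: 5_826_394,
    2006: 5_737_615,
    2007: 5_828_697,
    2008: 5_656_839,
    2009: 5_299_563,
}

# Order the points by time, then join each point to the one that follows it.
points = sorted(emissions.items())
segments = list(zip(points, points[1:]))

for (year_a, value_a), (year_b, value_b) in segments:
    print(f"{year_a} -> {year_b}: {value_a:,} to {value_b:,}")
```

Seven yearly observations give six segments, one for each pair of consecutive years.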
Section 2: Contingency Tables
When we are interested in studying the relationship between a pair of variables, we must collect and organize data on both variables at the same time. Such data are called bivariate data, and there are several ways to organize them depending on the types of the variables.
| Car | Color |
|---|---|
| 1 | Light |
| 2 | Dark |
| 3 | Dark |
| 4 | Light |
| 5 | Dark |

| Car | Color | Condition |
|---|---|---|
| 1 | Light | Like new |
| 2 | Dark | Used |
| 3 | Dark | Like new |
| 4 | Light | Used |
| 5 | Dark | Used |
A table called a contingency table can be used to organize bivariate data in which one or both variables are qualitative. To construct the table, we arrange the observed frequencies into rows and columns. The intersection of a row and a column of a contingency table is called a cell. Each cell shows the frequency of observations that fit the description in the corresponding row and column.
| Color\Condition | Like new | Used |
|---|---|---|
| Light | 1 | 1 |
| Dark | 1 | 2 |
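The counting that fills in the cells can be sketched in Python as follows. This is a minimal illustration built on the five cars listed earlier; the names `cars`, `colors`, `conditions`, and `table` are assumptions made for the example.

```python
from collections import Counter

# Raw bivariate data: (color, condition) for cars 1..5.
cars = [
    ("Light", "Like new"),
    ("Dark", "Used"),
    ("Dark", "Like new"),
    ("Light", "Used"),
    ("Dark", "Used"),
]

colors = ["Light", "Dark"]           # row categories
conditions = ["Like new", "Used"]    # column categories

# Count how many cars fall into each (color, condition) cell.
cell_counts = Counter(cars)

# Arrange the counts into rows (colors) and columns (conditions).
table = [[cell_counts[(color, cond)] for cond in conditions] for color in colors]

for color, row in zip(colors, table):
    print(color, row)
```

Each inner list is one row of the contingency table, so the row for "Dark" holds the counts for "Like new" and "Used" in that order.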
It often makes sense to add a row and a column of totals: one total for each row, one for each column, and one for the entire table.
| Color\Condition | Like new | Used | Total |
|---|---|---|---|
| Light | 1 | 1 | 2 |
| Dark | 1 | 2 | 3 |
| Total | 2 | 3 | 5 |
Alternatively, this table can be referred to as a two-way frequency table. Why? Cover the second and third columns, and what is left is the frequency table for the colors of the cars! Cover the second and third rows instead, and what is left is the frequency table for the conditions of the cars! Also note that every contingency table can easily be broken down into two frequency tables, but the two frequency tables cannot be combined back into a contingency table if the original data are lost!
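The row and column totals are exactly these two one-variable frequency tables. A short Python sketch, assuming the cell frequencies are stored in a nested dictionary (a layout chosen here only for illustration), shows how they are obtained from the cells:

```python
# Cell frequencies from the contingency table: rows are colors, columns are conditions.
table = {
    "Light": {"Like new": 1, "Used": 1},
    "Dark": {"Like new": 1, "Used": 2},
}

# Row totals: the frequency table for color.
row_totals = {color: sum(cells.values()) for color, cells in table.items()}

# Column totals: the frequency table for condition.
column_totals = {}
for cells in table.values():
    for condition, count in cells.items():
        column_totals[condition] = column_totals.get(condition, 0) + count

# Grand total: the number of cars in the sample.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

The computation only goes one way: the totals follow from the cells, but the four cell frequencies cannot be recovered from the totals alone.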
Here is another example of a contingency table. We can see, for instance, that 280 of the 755 people are cell phone users who had no speeding violation in the last year:
| | Speeding violations in the last year | No speeding violations in the last year | Total |
|---|---|---|---|
| Cell phone user | 25 | 280 | 305 |
| Not a cell phone user | 45 | 405 | 450 |
| Total | 70 | 685 | 755 |
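As a quick check of this table, the totals can be recomputed from the four cell counts. The sketch below assumes the counts are stored as a list of rows (a layout chosen for this example) and also computes the share of all surveyed people who are cell phone users with no speeding violation.

```python
# Cell counts from the table: rows are cell phone use, columns are violation status.
counts = [
    [25, 280],   # cell phone user: violations, no violations
    [45, 405],   # not a cell phone user: violations, no violations
]

row_totals = [sum(row) for row in counts]
column_totals = [sum(column) for column in zip(*counts)]
grand_total = sum(row_totals)

# Share of all surveyed people who use a cell phone and had no violation.
share = counts[0][1] / grand_total

print(row_totals, column_totals, grand_total)
print(f"{share:.3f}")
```

The row totals come out to 305 and 450 and the grand total to 755, so 280 out of 755 is roughly 37% of the sample.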


