Mathematics LibreTexts

2.5: Organizing Bivariate Data

    Section 1: Scatterplots

    Earlier we discussed the methods of summarizing and organizing the data obtained from observing one variable. Such data are called univariate data.

    The mileages of a sample of 5 cars.

    Car    Mileage (mi)
    1      78524
    2      12574
    3      24914
    4      65813
    5      39824

    When we are interested in studying the relationship between a pair of variables, we must collect and organize the data on both variables at the same time. Such data are called bivariate data, and there are several ways to organize them depending on the types of the variables.

    The ages and mileages of a sample of 5 cars.

    Car    Age (years)    Mileage (mi)
    1      7              78524
    2      2              12574
    3      1              24914
    4      5              65813
    5      3              39824

    When both variables are quantitative, we can construct a scatterplot to organize the data. To construct a scatterplot, we use a horizontal axis for the observations of one variable and a vertical axis for the observations of the other. When choosing the axis for each variable, consider whether you suspect that one variable depends on the other: the independent variable goes on the x-axis and the dependent variable goes on the y-axis. Each pair of observations is then plotted as a point. The example here summarizes the observed relation between the age and the mileage of cars.

    The scatter plot that has the age (in years) of a vehicle on the x-axis and the mileage (in miles) on the y-axis with 5 points with the following coordinates (7,78524), (2,12574), (1,24914), (5,65813), (3,39824)
    Figure \(\PageIndex{1}\): The scatterplot summarizing the ages and mileages of a sample of 5 cars. (Copyright; author via source)

    In the scatterplot above, each point corresponds to one car. For example, car 1 is the point with x-coordinate 7 and y-coordinate 78,524 (about 78.5 thousand), and car 2 is the point with x-coordinate 2 and y-coordinate 12,574 (about 12.6 thousand).
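
    The pairing of observations into points can be sketched in a few lines of Python. This is only an illustration, not part of the original text; the names `ages`, `mileages`, and `points` are chosen here for the example.

```python
# Observations from the table above, listed in car order (cars 1 to 5).
ages = [7, 2, 1, 5, 3]                          # independent variable (x), in years
mileages = [78524, 12574, 24914, 65813, 39824]  # dependent variable (y), in miles

# Each car contributes exactly one point (x, y) to the scatterplot.
points = list(zip(ages, mileages))

for car, (x, y) in enumerate(points, start=1):
    print(f"car {car}: ({x}, {y})")
```

    Keeping the two lists in the same car order is what guarantees that each age is matched with the mileage of the same car.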

    Here is another scatterplot, summarizing the observed relation between the third exam score and the final exam score. The third exam score is on the x-axis because we suspect that it may have an effect on the final exam score.

    The scatter plot that has the third exam score (in %) of a student on the x-axis and the final exam score (in %) on the y-axis with 10 points with the following coordinates (65, 87.5), (67, 66.5), (71, 92.5), (71, 81.5), (66, 63), (75, 99), (67, 76.5), (70, 81.5), (71, 79.5), (69, 75.5)
    Figure \(\PageIndex{2}\): The scatterplot summarizing the third exam scores and the final exam scores of a sample of 10 students. (Copyright; author via source)

    When one of the variables is time and there is only one observation for each moment of time, we can construct a scatterplot with time on the x-axis:

    CO2 emissions in the United States between 2003-2009

    Year    CO2 emissions in the United States
    2003    5,681,664
    2004    5,790,761
    2005    5,826,394
    2006    5,737,615
    2007    5,828,697
    2008    5,656,839
    2009    5,299,563

    The scatter plot that has the years from 2003-2009 on the x-axis and the amount of CO2 emissions on the y-axis with 7 points with the following coordinates (2003, 5681664), (2004, 5790761), (2005, 5826394), (2006, 5737615), (2007, 5828697), (2008, 5656839), (2009, 5299563)
    Figure \(\PageIndex{3}\): The scatterplot summarizing the CO2 emissions in the US between 2003 and 2009. (Copyright; author via source)

    We can then connect consecutive points on the scatterplot with line segments. Such a graph is called a time series. The example here summarizes the observed relation between the year and the CO2 emissions for the United States:

    The time series that has the years from 2003-2009 on the x-axis and the amount of CO2 emissions on the y-axis with 7 points with the following coordinates (2003, 5681664), (2004, 5790761), (2005, 5826394), (2006, 5737615), (2007, 5828697), (2008, 5656839), (2009, 5299563) and the line segments connecting the pairs of consecutive points.
    Figure \(\PageIndex{4}\): The time series summarizing the CO2 emissions in the US between 2003 and 2009. (Copyright; author via source)
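
    The step from scatterplot to time series, joining each point to the next one in time, can be sketched as follows. This is a minimal illustration under the assumption that the data are held in two plain Python lists; the names `years`, `emissions`, and `segments` are hypothetical.

```python
# Data from the CO2 emissions table above.
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009]
emissions = [5681664, 5790761, 5826394, 5737615, 5828697, 5656839, 5299563]

# Sort by year so that "consecutive" means consecutive in time.
points = sorted(zip(years, emissions))

# A time series joins each point to the one that follows it.
segments = list(zip(points, points[1:]))

for (x0, y0), (x1, y1) in segments:
    print(f"{x0} -> {x1}: change of {y1 - y0:+,}")
```

    With 7 points there are 6 segments, one for each pair of consecutive years.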

    Section 2: Contingency Tables

    As in Section 1, studying the relationship between a pair of variables requires collecting and organizing data on both variables at the same time. Compare the univariate and bivariate data below, in which the variables are qualitative.

    Univariate Data: The colors of a sample of 5 cars.

    Car    Color
    1      Light
    2      Dark
    3      Dark
    4      Light
    5      Dark

    Bivariate Data: The colors and conditions of a sample of 5 cars.

    Car    Color    Condition
    1      Light    Like new
    2      Dark     Used
    3      Dark     Like new
    4      Light    Used
    5      Dark     Used

    A contingency table can be used to organize bivariate data in which one or both variables are qualitative. To construct the table, we arrange the observed frequencies into rows and columns. The intersection of a row and a column of a contingency table is called a cell. Each cell shows the frequency of observations that fit the description in the corresponding row and column.

    Contingency table summarizing the colors and conditions of a sample of 5 cars.

    Color\Condition    Like new    Used
    Light              1           1
    Dark               1           2
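
    Counting the observations that fall into each cell can be sketched with Python's standard `collections.Counter`. This is an illustration only; the list `cars` simply restates the bivariate data for the 5 cars, and the variable names are chosen here for the example.

```python
from collections import Counter

# (color, condition) for cars 1 to 5, as in the bivariate data above.
cars = [
    ("Light", "Like new"),
    ("Dark", "Used"),
    ("Dark", "Like new"),
    ("Light", "Used"),
    ("Dark", "Used"),
]

# Each cell of the contingency table is the frequency of one (row, column) pair.
cells = Counter(cars)

colors = ["Light", "Dark"]          # row labels
conditions = ["Like new", "Used"]   # column labels
for color in colors:
    row = [cells[(color, condition)] for condition in conditions]
    print(color, row)
```

    A `Counter` returns 0 for a pair that never occurs, so an empty cell would not need special handling.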

    It often makes sense to add a row and a column that show the totals for each column, for each row, and for the entire table.

    Contingency table summarizing the colors and conditions of a sample of 5 cars with totals.

    Color\Condition    Like new    Used    Total
    Light              1           1       2
    Dark               1           2       3
    Total              2           3       5
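
    The totals can be computed directly from the cell frequencies. The sketch below assumes the table is stored as a nested dictionary, which is just one convenient representation; the names `table`, `row_totals`, and `column_totals` are hypothetical.

```python
# Cell frequencies: outer keys are colors (rows), inner keys are conditions (columns).
table = {
    "Light": {"Like new": 1, "Used": 1},
    "Dark": {"Like new": 1, "Used": 2},
}

# Row totals: add up the cells across each row.
row_totals = {color: sum(row.values()) for color, row in table.items()}

# Column totals: add up the cells down each column.
column_totals = {}
for row in table.values():
    for condition, count in row.items():
        column_totals[condition] = column_totals.get(condition, 0) + count

# The grand total is the same whether we add the row totals or the column totals.
grand_total = sum(row_totals.values())
assert grand_total == sum(column_totals.values())

print(row_totals, column_totals, grand_total)
```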

    Alternatively, the table with totals can be referred to as a two-way frequency table. Why? Cover the second and third columns, and what is left is the frequency table for the colors of the cars. Cover the second and third rows instead, and what is left is the frequency table for the conditions of the cars. Also note that every contingency table can easily be broken down into two frequency tables, but the two frequency tables cannot be combined back into a contingency table if the original data are lost.
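
    To see why the two frequency tables alone are not enough, the sketch below compares the observed table with a made-up one. The table `hypothetical` is invented purely for illustration and does not come from the data; the helper `margins` is likewise a name chosen here.

```python
def margins(table):
    """Return (row totals, column totals) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    return row_totals, column_totals

# Rows: Light, Dark. Columns: Like new, Used.
observed = [[1, 1], [1, 2]]       # the contingency table from the 5 cars
hypothetical = [[2, 0], [0, 3]]   # a different, made-up table

print(margins(observed))
print(margins(hypothetical))
```

    Both tables produce the same two frequency tables (2 light and 3 dark cars; 2 like new and 3 used cars), yet their cells differ, so the totals by themselves do not determine the cells.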

    Here is another example of a contingency table. It shows, for instance, that 280 of the 755 drivers are cell phone users who had no speeding violations in the last year:

    Contingency table with totals summarizing whether a driver is a cell phone user and whether they received a speeding violation in the last year.

                             Speeding violations in the last year    No speeding violations in the last year    Total
    Cell phone user          25                                      280                                        305
    Not a cell phone user    45                                      405                                        450
    Total                    70                                      685                                        755
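
    Because every total is a sum of cells, the totals of a contingency table can be checked by recomputing them. The following sketch does this for the cell phone table, using only the four cell frequencies; the variable names are chosen here for the example.

```python
# Rows: cell phone user, not a cell phone user.
# Columns: speeding violations, no speeding violations (in the last year).
cells = [
    [25, 280],
    [45, 405],
]

row_totals = [sum(row) for row in cells]
column_totals = [sum(column) for column in zip(*cells)]
grand_total = sum(row_totals)

# The row totals and the column totals must add up to the same grand total.
assert sum(column_totals) == grand_total

# Share of all drivers who are cell phone users with no speeding violations.
share = cells[0][1] / grand_total

print(row_totals, column_totals, grand_total, round(share, 3))
```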

    This page titled 2.5: Organizing Bivariate Data is shared under a not declared license and was authored, remixed, and/or curated by Anton Butenko.