4.4: Fitting Linear Models to Data

    Learning Objectives

    • Draw and interpret scatter diagrams.
    • Distinguish between linear and nonlinear relations.

    A professor is attempting to identify trends among final exam scores. His class has a mixture of students, so he wonders if there is any relationship between age and final exam scores. One way for him to analyze the scores is by creating a diagram that relates the age of each student to the exam score received. In this section, we will examine one such diagram known as a scatter plot.

    Drawing and Interpreting Scatter Plots

    A scatter plot is a graph of plotted points that may show a relationship between two sets of data. If the relationship is from a linear model, or a model that is nearly linear, the professor can draw conclusions using his knowledge of linear functions. Figure \(\PageIndex{1}\) shows a sample scatter plot.

    Scatter plot, titled 'Final Exam Score VS Age'. The x-axis is the age, and the y-axis is the final exam score. Ages range from the 20s to the 50s, and scores range from the upper 50s to the 90s.
    Figure \(\PageIndex{1}\): A scatter plot of age and final exam score variables

    Notice this scatter plot does not indicate a linear relationship. The points do not appear to follow a trend. In other words, there does not appear to be a relationship between the age of the student and the score on the final exam.

    Example \(\PageIndex{1}\): Using a Scatter Plot to Investigate Cricket Chirps

    Table \(\PageIndex{1}\) shows the number of cricket chirps in 15 seconds for several different air temperatures, in degrees Fahrenheit[1]. Plot this data, and determine whether the data appears to be linearly related.

    Table \(\PageIndex{1}\)
    Chirps (in 15 seconds)  44    35    20.4  33  31  35  18.5  37    26
    Temperature (°F)        80.5  70.5  57    66  68  72  52    73.5  53

    Solution

    Plotting this data, as depicted in Figure \(\PageIndex{2}\), suggests that there may be a trend: the number of chirps increases as the temperature increases. The trend appears to be roughly linear, though certainly not perfectly so.

    Scatter plot, titled 'Cricket Chirps Vs Air Temperature'. The x-axis is the Cricket Chirps in 15 Seconds, and the y-axis is the Temperature (F). The trend is generally positive.
    Figure \(\PageIndex{2}\): Scatter plot of cricket chirps and air temperature, showing a generally positive trend.
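
    Without a graphing tool, a rough version of the same check can be done numerically. The following is a minimal Python sketch, not part of the original example: it sorts the points in Table \(\PageIndex{1}\) by chirp count and counts how often the temperature rises from one point to the next. The variable names are illustrative.

```python
# Cricket data from Table 1: chirps counted in 15 seconds and air temperature (°F).
chirps = [44, 35, 20.4, 33, 31, 35, 18.5, 37, 26]
temperature = [80.5, 70.5, 57, 66, 68, 72, 52, 73.5, 53]

# Pair each chirp count with its temperature and sort by chirp count,
# which mirrors reading the scatter plot from left to right.
points = sorted(zip(chirps, temperature))

for c, t in points:
    print(f"{c:>5} chirps -> {t:>5} °F")

# Count how often the temperature rises from one point to the next.
rises = sum(1 for (_, t0), (_, t1) in zip(points, points[1:]) if t1 > t0)
steps = len(points) - 1
print(f"temperature rises in {rises} of {steps} left-to-right steps")
```

    The temperature rises in 6 of the 8 steps, which agrees with the generally increasing but imperfect trend seen in the plot. This count is only an informal check; it does not measure how close to linear the trend is.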

    Finding the Line of Best Fit

    Once we recognize a need for a linear function to model the data, the natural follow-up question is “what is that linear function?” One way to approximate our linear function is to sketch the line that seems to best fit the data. Then we can extend the line until it crosses the vertical axis and read off the y-intercept. We can approximate the slope of the line by choosing two points on it and estimating the \(\frac{\text{rise}}{\text{run}}\) between them.

    Example \(\PageIndex{2}\): Finding a Line of Best Fit

    Find a linear function that fits the data in Table \(\PageIndex{1}\) by “eyeballing” a line that seems to fit.

    Solution

    On a graph, we could try sketching a line.

    Using the starting and ending points of our hand-drawn line, \((0, 30)\) and \((50, 90)\), the line has a slope of

    \[m=\dfrac{90-30}{50-0}=\dfrac{60}{50}=1.2\]

    and a y-intercept at 30. This gives an equation of

    \[T(c)=1.2c+30\]

    where \(c\) is the number of chirps in 15 seconds, and \(T(c)\) is the temperature in degrees Fahrenheit. The resulting equation is represented in Figure \(\PageIndex{3}\).

    Scatter plot, showing the line of best fit. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.
    Figure \(\PageIndex{3}\): Scatter plot, showing the line of best fit.
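
    The slope and intercept calculation above can be sketched in Python. The helper name `line_through` is hypothetical and chosen only for illustration; the two points are the endpoints read from the hand-drawn line.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points with different x-values."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = slope * x1 + intercept
    return slope, intercept

# Endpoints read off the hand-drawn line in Example 2.
m, b = line_through((0, 30), (50, 90))
print(m, b)  # 1.2 30.0
```

    A different hand-drawn line would give different endpoints and therefore a slightly different equation, so the result is an estimate rather than a unique answer.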

    Analysis

    This linear equation can then be used to approximate answers to various questions we might ask about the trend.
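
    Before relying on the model, it can help to see how far its predictions fall from the observed temperatures. The sketch below is an illustration added here, not part of the original example: it evaluates \(T(c)=1.2c+30\) at each chirp count in Table \(\PageIndex{1}\) and reports the difference between the observed and predicted temperatures.

```python
# Cricket data from Table 1.
chirps = [44, 35, 20.4, 33, 31, 35, 18.5, 37, 26]
temperature = [80.5, 70.5, 57, 66, 68, 72, 52, 73.5, 53]

def T(c):
    """Eyeballed model from Example 2: temperature (°F) from chirps in 15 seconds."""
    return 1.2 * c + 30

# Observed temperature minus predicted temperature for each data point.
errors = [t - T(c) for c, t in zip(chirps, temperature)]

for c, t, e in zip(chirps, temperature, errors):
    print(f"c = {c:>5}: observed {t:>5}, predicted {T(c):6.2f}, difference {e:+.2f}")

# The single largest miss, in either direction.
largest = max(errors, key=abs)
print(f"largest difference: {largest:+.2f} °F")
```

    Most predictions land within a few degrees of the data; the largest miss is about 8.2 degrees too high, at 26 chirps.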

    Recognizing Interpolation or Extrapolation

    While the data for most examples does not fall perfectly on the line, the equation is our best guess as to how the relationship will behave outside of the values for which we have data. We use a process known as interpolation when we predict a value inside the domain and range of the data. The process of extrapolation is used when we predict a value outside the domain and range of the data.

    Figure \(\PageIndex{4}\) compares the two processes for the cricket-chirp data addressed in Example \(\PageIndex{2}\). We can see that interpolation would occur if we used our model to predict temperature when the values for chirps are between 18.5 and 44. Extrapolation would occur if we used our model to predict temperature when the values for chirps are less than 18.5 or greater than 44.

    There is a difference between making predictions inside the domain and range of values for which we have data and outside that domain and range. Predicting a value outside of the domain and range has its limitations. When our model no longer applies after a certain point, it is sometimes called model breakdown. For example, predicting a cost function for a period of two years may involve examining the data where the input is the time in years and the output is the cost. But if we try to extrapolate a cost when \(x=50\), that is in 50 years, the model would not apply because we could not account for factors fifty years in the future.

    Scatter plot, showing the line of best fit and where interpolation and extrapolation occur. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.
    Figure \(\PageIndex{4}\): Interpolation occurs within the domain and range of the provided data, whereas extrapolation occurs outside.

    Interpolation and Extrapolation

    Different methods of making predictions are used to analyze data.

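    • The method of interpolation involves predicting a value inside the domain and/or range of the data.
    • The method of extrapolation involves predicting a value outside the domain and/or range of the data.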
    • Model breakdown occurs at the point when the model no longer applies.
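
    The distinction between the two methods can be expressed as a simple range check. This is a minimal sketch with a hypothetical function name, `classify_prediction`; it compares the value we want to predict from against the smallest and largest observed values.

```python
def classify_prediction(value, observed):
    """Label a prediction by whether `value` lies within the observed data."""
    low, high = min(observed), max(observed)
    return "interpolation" if low <= value <= high else "extrapolation"

# Chirp counts from Table 1 (they run from 18.5 to 44).
chirps = [44, 35, 20.4, 33, 31, 35, 18.5, 37, 26]

print(classify_prediction(25, chirps))  # interpolation
print(classify_prediction(50, chirps))  # extrapolation
```

    The check says nothing about where model breakdown occurs; that depends on the situation being modeled, not on the data alone.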

    Example \(\PageIndex{3}\): Understanding Interpolation and Extrapolation

    Use the cricket data from Table \(\PageIndex{1}\) to answer the following questions:

    a. Would predicting the temperature when crickets are chirping 30 times in 15 seconds be interpolation or extrapolation? Make the prediction, and discuss whether it is reasonable.
    b. Would predicting the number of chirps crickets will make at 40 degrees be interpolation or extrapolation? Make the prediction, and discuss whether it is reasonable.

    Solution

    a. The number of chirps in the data provided varied from 18.5 to 44. A prediction at 30 chirps per 15 seconds is inside the domain of our data, so this would be interpolation. Using our model:

    \[\begin{align} T(30)&=30+1.2(30) \\ &=66 \text{ degrees} \end{align}\]

    Based on the data we have, this value seems reasonable.

    b. The temperature values varied from 52 to 80.5. Predicting the number of chirps at 40 degrees is extrapolation because 40 is outside the range of our data. Using our model:

    \[\begin{align} 40&=30+1.2c \\ 10&=1.2c \\ c&\approx8.33 \end{align}\]

    We can compare the regions of interpolation and extrapolation using Figure \(\PageIndex{5}\).

    Scatter plot, showing the line of best fit and where interpolation and extrapolation occur. It is titled 'Cricket Chirps Vs Air Temperature'. The x-axis is 'c, Number of Chirps', and the y-axis is 'T(c), Temperature (F)'.
    Figure \(\PageIndex{5}\): Scatter plot, showing the line of best fit and where interpolation and extrapolation occur.

    Analysis

    Our model predicts the crickets would chirp 8.33 times in 15 seconds. While this might be possible, we have no reason to believe our model is valid outside the domain and range of our data. In fact, crickets generally stop chirping altogether below about 50 degrees.
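
    Both predictions in this example can be reproduced with a short Python sketch. The function names are illustrative; the second function simply solves \(T=1.2c+30\) for \(c\), as in part b.

```python
def temperature_from_chirps(c):
    """Model from Example 2: temperature (°F) given chirps in 15 seconds."""
    return 1.2 * c + 30

def chirps_from_temperature(t):
    """Inverse of the model: solve t = 1.2c + 30 for c."""
    return (t - 30) / 1.2

# Part a: 30 chirps lies inside the observed chirp counts (interpolation).
print(round(temperature_from_chirps(30), 2))   # 66.0

# Part b: 40 degrees lies below the observed temperatures (extrapolation).
print(round(chirps_from_temperature(40), 2))   # 8.33
```

    The code returns a number for any input, including temperatures at which crickets would not chirp at all, so deciding whether a prediction is reasonable remains a separate step.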

    Exercise \(\PageIndex{1}\)

    According to the data from Table \(\PageIndex{1}\) and the model from Example \(\PageIndex{2}\), what temperature would we predict if we counted 20 chirps in 15 seconds?

    Solution

    54°F
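
    One way to arrive at this answer, assuming the eyeballed model \(T(c)=1.2c+30\) from Example \(\PageIndex{2}\) is the one intended, is to substitute \(c=20\):

    \[\begin{align} T(20)&=30+1.2(20) \\ &=54 \text{ degrees} \end{align}\]

    Because 20 chirps lies between 18.5 and 44, this prediction is an interpolation.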

     

    Key Concepts

    • Scatter plots show the relationship between two sets of data.
    • Scatter plots may represent linear or non-linear models.
    • Interpolation can be used to predict values inside the domain and range of the data, whereas extrapolation can be used to predict values outside the domain and range of the data.

    This page titled 4.4: Fitting Linear Models to Data is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.