8.2: Fitting Exponential Models to Data
In this section, you will:
- Build an exponential model from data.
- Build a logarithmic model from data.
- Build a logistic model from data.
In previous sections of this chapter, we were either given a function explicitly to graph or evaluate, or we were given a set of points that were guaranteed to lie on the curve. Then we used algebra to find the equation that fit the points exactly. In this section, we use a modeling technique called regression analysis to find a curve that models data collected from real-world observations. With regression analysis, we don’t expect all the points to lie perfectly on the curve. The idea is to find a model that best fits the data. Then we use the model to make predictions about future events.
Do not be confused by the word model. In mathematics, we often use the terms function, equation, and model interchangeably, even though they each have their own formal definition. The term model is typically used to indicate that the equation or function approximates a real-world situation.
We will concentrate on three types of regression models in this section: exponential, logarithmic, and logistic. Having already worked with each of these functions gives us an advantage. Knowing their formal definitions, the behavior of their graphs, and some of their real-world applications gives us the opportunity to deepen our understanding. As each regression model is presented, key features and definitions of its associated function are included for review. Take a moment to rethink each of these functions, reflect on the work we’ve done so far, and then explore the ways regression is used to model real-world phenomena.
Building an Exponential Model from Data
As we’ve learned, there are a multitude of situations that can be modeled by exponential functions, such as investment growth, radioactive decay, atmospheric pressure changes, and temperatures of a cooling object. What do these phenomena have in common? For one thing, all the models either increase or decrease as time moves forward. But that’s not the whole story. It’s the way data increase or decrease that helps us determine whether it is best modeled by an exponential equation. Knowing the behavior of exponential functions in general allows us to recognize when to use exponential regression, so let’s review exponential growth and decay.
Recall that exponential functions have the form \(y=ab^x\) or \(y=A_0e^{kx}\). When performing regression analysis, we use the form most commonly used on graphing utilities, \(y=ab^x\). Take a moment to reflect on the characteristics we’ve already learned about the exponential function \(y=ab^x\) (assume \(a>0\)):
- \(b\) must be greater than zero and not equal to one.
- The initial value of the model is \(y=a\).
- If \(b>1\), the function models exponential growth. As \(x\) increases, the outputs of the model increase slowly at first, but then increase more and more rapidly, without bound.
- If \(0<b<1\), the function models exponential decay. As \(x\) increases, the outputs for the model decrease rapidly at first and then level off to become asymptotic to the x-axis. In other words, the outputs never become equal to or less than zero.
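The two exponential forms, \(y=ab^x\) and \(y=A_0e^{kx}\), describe the same family of curves. The short calculation below is a sketch of why; it uses only the identity \(b^x=e^{x\ln b}\) and is not a step a graphing utility displays.

```latex
\begin{align*}
y &= ab^{x} = a\,e^{x\ln b} = A_0e^{kx}, \qquad A_0 = a,\quad k = \ln b,\\
b &> 1 \iff k > 0 \quad \text{(growth)}, \qquad 0 < b < 1 \iff k < 0 \quad \text{(decay)}.
\end{align*}
```

For instance, \(y=3\cdot 2^x\) can be rewritten as \(y=3e^{(\ln 2)x}\approx 3e^{0.693x}\).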
As part of the results, your calculator will display a number known as the correlation coefficient, labeled by the variable \(r\), or \(r^2\). (You may have to change the calculator’s settings for these to be shown.) The values are an indication of the “goodness of fit” of the regression equation to the data. We more commonly use the value of \(r^2\) instead of \(r\), but the closer either value is to 1, the better the regression equation approximates the data.
Exponential Regression
Exponential regression is used to model situations in which growth begins slowly and then accelerates rapidly without bound, or where decay begins rapidly and then slows down to get closer and closer to zero. We use the command “ExpReg” on a graphing utility to fit an exponential function to a set of data points. This returns an equation of the form \(y=ab^x\).
Note that:
- \(b\) must be greater than zero and not equal to one.
- when \(b>1\), we have an exponential growth model.
- when \(0<b<1\), we have an exponential decay model.
How To
Given a set of data, perform exponential regression using a graphing utility.
- Use the STAT then EDIT menu to enter given data.
  - Clear any existing data from the lists.
  - List the input values in the L1 column.
  - List the output values in the L2 column.
- Graph and observe a scatter plot of the data using the STATPLOT feature.
  - Use ZOOM [9] to adjust axes to fit the data.
  - Verify the data follow an exponential pattern.
- Find the equation that models the data.
  - Select “ExpReg” from the STAT then CALC menu.
  - Use the values returned for a and b to record the model, \(y=ab^x\).
- Graph the model in the same window as the scatterplot to verify it is a good fit for the data.
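It can help to see what a command like “ExpReg” may be doing behind the scenes. A common approach, and the one assumed in this sketch, is to take logarithms so that the model becomes a straight line in \(x\) and \(\ln y\), and then to fit that line by ordinary least squares. The symbols \(n\), \(\bar{x}\), and \(\overline{\ln y}\) (the number of data points and the two sample means) are introduced here for illustration only.

```latex
\begin{align*}
y = ab^{x} \quad &\Longrightarrow \quad \ln y = \ln a + (\ln b)\,x,\\
\ln b &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})\left(\ln y_i-\overline{\ln y}\right)}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}},\\
\ln a &= \overline{\ln y} - (\ln b)\,\bar{x}.
\end{align*}
```

Because the fit is carried out on \(\ln y\), every output value in the data must be positive.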
Example 1
Using Exponential Regression to Fit a Model to Data
In 2007, a university study was published investigating the crash risk of alcohol impaired driving. Data from 2,871 crashes were used to measure the association of a person’s blood alcohol level (BAC) with the risk of being in an accident. Table 1 shows results from the study 9 . The relative risk is a measure of how many times more likely a person is to crash. So, for example, a person with a BAC of 0.09 is 3.54 times as likely to crash as a person who has not been drinking alcohol.
| BAC | 0 | 0.01 | 0.03 | 0.05 | 0.07 | 0.09 |
| Relative Risk of Crashing | 1 | 1.03 | 1.06 | 1.38 | 2.09 | 3.54 |
| BAC | 0.11 | 0.13 | 0.15 | 0.17 | 0.19 | 0.21 |
| Relative Risk of Crashing | 6.41 | 12.6 | 22.1 | 39.05 | 65.32 | 99.78 |
- Answer
- ⓐ Using the STAT then EDIT menu on a graphing utility, list the BAC values in L1 and the relative risk values in L2. Then use the STATPLOT feature to verify that the scatterplot follows the exponential pattern shown in Figure 1.
Use the “ExpReg” command from the STAT then CALC menu to obtain the exponential model,
\[y=0.58304829{(2.20720213\text{E}10)}^{x}\]
Converting from scientific notation, we have:
\[y=0.58304829{(22{,}072{,}021{,}300)}^{x}\]
Notice that \(r^2\approx 0.97\), which indicates the model is a good fit to the data. To see this, graph the model in the same window as the scatterplot to verify it is a good fit as shown in Figure 2.
- ⓑ Use the model to estimate the risk associated with a BAC of 0.16. Substitute 0.16 for \(x\) in the model and solve for \(y\):
\[y=0.58304829{(22{,}072{,}021{,}300)}^{0.16}\approx 26.35\]
If a 160-pound person drives after having 6 drinks, he or she is about 26.35 times more likely to crash than if driving while sober.
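The regression in this example can also be reproduced without a calculator. The sketch below assumes the fit is done by least squares on the pairs \((x,\ln y)\), which is a common way to fit \(y=ab^x\); the function name `exp_regression` is ours, and the data are copied from Table 1.

```python
import math

# Table 1 data: blood alcohol concentration and relative crash risk.
bac = [0, 0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, 0.17, 0.19, 0.21]
risk = [1, 1.03, 1.06, 1.38, 2.09, 3.54, 6.41, 12.6, 22.1, 39.05, 65.32, 99.78]


def exp_regression(xs, ys):
    """Fit y = a * b**x by least squares on (x, ln y); return (a, b, r_squared)."""
    logs = [math.log(y) for y in ys]           # requires every y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sll = sum((l - mean_l) ** 2 for l in logs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxl / sxx                           # estimate of ln(b)
    intercept = mean_l - slope * mean_x         # estimate of ln(a)
    r_squared = sxl ** 2 / (sxx * sll)          # computed on the transformed data
    return math.exp(intercept), math.exp(slope), r_squared


a, b, r2 = exp_regression(bac, risk)
estimate = a * b ** 0.16                        # relative risk at a BAC of 0.16

print(f"a = {a:.4f}, b = {b:.4e}, r^2 = {r2:.2f}")
print(f"relative risk at BAC 0.16: {estimate:.2f}")
```

The printed estimate should land close to the 26.35 quoted in the answer. Note that the fitted initial value \(a\) is well below the observed relative risk of 1 at a BAC of 0, a reminder that a high \(r^2\) does not mean every point is matched closely.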
Try It #1
Table 2 shows a recent graduate’s credit card balance each month after graduation.
| Month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Debt ($) | 620.00 | 761.88 | 899.80 | 1039.93 | 1270.63 | 1589.04 | 1851.31 | 2154.92 |
Q&A
Is it reasonable to assume that an exponential regression model will represent a situation indefinitely?
No. Remember that models are formed by real-world data gathered for regression. It is usually reasonable to make estimates within the interval of original observation (interpolation). However, when a model is used to make predictions, it is important to use reasoning skills to determine whether the model makes sense for inputs far beyond the original observation interval (extrapolation).
Building a Logarithmic Model from Data
Just as with exponential functions, there are many real-world applications for logarithmic functions: intensity of sound, pH levels of solutions, yields of chemical reactions, production of goods, and growth of infants. As with exponential models, data modeled by logarithmic functions are either always increasing or always decreasing as time moves forward. Again, it is the way they increase or decrease that helps us determine whether a logarithmic model is best.
Recall that logarithmic functions increase or decrease rapidly at first, but then steadily slow as time moves on. By reflecting on the characteristics we’ve already learned about this function, we can better analyze real-world situations that reflect this type of growth or decay. When performing logarithmic regression analysis, we use the form of the logarithmic function most commonly used on graphing utilities, \(y=a+b\ln(x)\). For this function:
- All input values, \(x\), must be greater than zero.
- The point \((1,a)\) is on the graph of the model.
- If \(b>0\), the model is increasing. Growth increases rapidly at first and then steadily slows over time.
- If \(b<0\), the model is decreasing. Decay occurs rapidly at first and then steadily slows over time.
Logarithmic Regression
Logarithmic regression is used to model situations where growth or decay accelerates rapidly at first and then slows over time. We use the command “LnReg” on a graphing utility to fit a logarithmic function to a set of data points. This returns an equation of the form \(y=a+b\ln(x)\).
Note that
- all input values, \(x\), must be greater than zero.
- when \(b>0\), the model is increasing.
- when \(b<0\), the model is decreasing.
How To
Given a set of data, perform logarithmic regression using a graphing utility.
- Use the STAT then EDIT menu to enter given data.
  - Clear any existing data from the lists.
  - List the input values in the L1 column.
  - List the output values in the L2 column.
- Graph and observe a scatter plot of the data using the STATPLOT feature.
  - Use ZOOM [9] to adjust axes to fit the data.
  - Verify the data follow a logarithmic pattern.
- Find the equation that models the data.
  - Select “LnReg” from the STAT then CALC menu.
  - Use the values returned for a and b to record the model, \(y=a+b\ln(x)\).
- Graph the model in the same window as the scatterplot to verify it is a good fit for the data.
Example 2
Using Logarithmic Regression to Fit a Model to Data
Due to advances in medicine and higher standards of living, life expectancy has been increasing in most developed countries since the beginning of the 20th century.
Table 3 shows the average life expectancies, in years, of Americans from 1900–2010 10 .
| Year | 1900 | 1910 | 1920 | 1930 | 1940 | 1950 |
| Life Expectancy (Years) | 47.3 | 50.0 | 54.1 | 59.7 | 62.9 | 68.2 |
| Year | 1960 | 1970 | 1980 | 1990 | 2000 | 2010 |
| Life Expectancy (Years) | 69.7 | 70.8 | 73.7 | 75.4 | 76.8 | 78.7 |
- Answer
- ⓐ Using the STAT then EDIT menu on a graphing utility, list the years using values 1–12 in L1 and the corresponding life expectancy in L2. Then use the STATPLOT feature to verify that the scatterplot follows a logarithmic pattern as shown in Figure 3.
Use the “LnReg” command from the STAT then CALC menu to obtain the logarithmic model,
\[y=42.52722583+13.85752327\ln(x)\]
Next, graph the model in the same window as the scatterplot to verify it is a good fit as shown in Figure 4.
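The same fit can be sketched in a few lines of code. This version assumes the regression is ordinary least squares on the pairs \((\ln x, y)\), a common way to fit \(y=a+b\ln(x)\); the function name `ln_regression` is ours, and the data come from Table 3 with the decades coded as 1 through 12.

```python
import math

# Table 3 data: decades 1900-2010 coded as x = 1..12, life expectancy in years.
xs = list(range(1, 13))
life = [47.3, 50.0, 54.1, 59.7, 62.9, 68.2, 69.7, 70.8, 73.7, 75.4, 76.8, 78.7]


def ln_regression(xs, ys):
    """Fit y = a + b*ln(x) by least squares on (ln x, y); return (a, b)."""
    logs = [math.log(x) for x in xs]            # requires every x > 0
    n = len(xs)
    mean_l = sum(logs) / n
    mean_y = sum(ys) / n
    sll = sum((l - mean_l) ** 2 for l in logs)
    sly = sum((l - mean_l) * (y - mean_y) for l, y in zip(logs, ys))
    b = sly / sll
    a = mean_y - b * mean_l
    return a, b


a, b = ln_regression(xs, life)
print(f"y = {a:.4f} + {b:.4f} ln(x)")

# Compare observed and fitted values decade by decade.
for x, y in zip(xs, life):
    print(1890 + 10 * x, y, round(a + b * math.log(x), 1))
```

Coding the years as 1–12 rather than 0–11 matters here: the first input would otherwise be 0, and \(\ln(0)\) is undefined.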
Try It #2
Sales of a video game released in the year 2000 took off at first, but then steadily slowed as time moved on. Table 4 shows the number of games sold, in thousands, from the years 2000–2010.
| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
| Number Sold (thousands) | 142 | 149 | 154 | 155 | 159 | 161 |
| Year | 2006 | 2007 | 2008 | 2009 | 2010 | - |
| Number Sold (thousands) | 163 | 164 | 164 | 166 | 167 | - |
Building a Logistic Model from Data
Like exponential and logarithmic growth, logistic growth increases over time. One of the most notable differences with logistic growth models is that, at a certain point, growth steadily slows and the function approaches an upper bound, or limiting value. Because of this, logistic regression is best for modeling phenomena where there are limits in expansion, such as availability of living space or nutrients.
It is worth pointing out that logistic functions actually model resource-limited exponential growth. There are many examples of this type of growth in real-world situations, including population growth and spread of disease, rumors, and even stains in fabric. When performing logistic regression analysis, we use the form most commonly used on graphing utilities:
\[y=\frac{c}{1+ae^{-bx}}\]
Recall that:
- \(\frac{c}{1+a}\) is the initial value of the model.
- when \(b>0\), the model increases rapidly at first until it reaches its point of maximum growth rate, \(\left(\frac{\ln(a)}{b},\frac{c}{2}\right)\). At that point, growth steadily slows and the function becomes asymptotic to the upper bound \(y=c\).
- \(c\) is the limiting value, sometimes called the carrying capacity, of the model.
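These three facts follow directly from the formula \(y=\frac{c}{1+ae^{-bx}}\). The derivation below is a sketch, assuming \(a>0\), \(b>0\), and \(c>0\); it relies on the standard fact that a logistic curve grows fastest where its output is half the limiting value.

```latex
\begin{align*}
x = 0: \quad & y = \frac{c}{1+ae^{0}} = \frac{c}{1+a},\\
x \to \infty: \quad & e^{-bx} \to 0 \ \Longrightarrow\ y \to c,\\
y = \frac{c}{2}: \quad & 1 + ae^{-bx} = 2 \ \Longrightarrow\ e^{-bx} = \frac{1}{a} \ \Longrightarrow\ x = \frac{\ln(a)}{b}.
\end{align*}
```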
Logistic Regression
Logistic regression is used to model situations where growth accelerates rapidly at first and then steadily slows to an upper limit. We use the command “Logistic” on a graphing utility to fit a logistic function to a set of data points. This returns an equation of the form
\[y=\frac{c}{1+ae^{-bx}}\]
Note that
- The initial value of the model is \(\frac{c}{1+a}\).
- Output values for the model grow closer and closer to \(y=c\) as time increases.
How To
Given a set of data, perform logistic regression using a graphing utility.
- Use the STAT then EDIT menu to enter given data.
  - Clear any existing data from the lists.
  - List the input values in the L1 column.
  - List the output values in the L2 column.
- Graph and observe a scatter plot of the data using the STATPLOT feature.
  - Use ZOOM [9] to adjust axes to fit the data.
  - Verify the data follow a logistic pattern.
- Find the equation that models the data.
  - Select “Logistic” from the STAT then CALC menu.
  - Use the values returned for a, b, and c to record the model, \(y=\frac{c}{1+ae^{-bx}}\).
- Graph the model in the same window as the scatterplot to verify it is a good fit for the data.
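Unlike the exponential and logarithmic cases, a logistic model cannot be turned into a straight line unless one parameter is fixed, so graphing utilities generally rely on an iterative fit. The sketch below is a simplified alternative, not the calculator's algorithm: it assumes the limiting value \(c\) is already known, in which case \(\ln\left(\frac{c}{y}-1\right)=\ln(a)-bx\) is linear in \(x\). The function name `logistic_fit_known_c` and the synthetic data are ours.

```python
import math


def logistic(x, a, b, c):
    """Logistic model y = c / (1 + a * e**(-b*x))."""
    return c / (1 + a * math.exp(-b * x))


def logistic_fit_known_c(xs, ys, c):
    """Estimate a and b when c is assumed known.

    Uses ln(c/y - 1) = ln(a) - b*x, which is linear in x.
    Every y must satisfy 0 < y < c.
    """
    zs = [math.log(c / y - 1) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx                     # equals -b
    intercept = mean_z - slope * mean_x   # equals ln(a)
    return math.exp(intercept), -slope


# Synthetic data generated from a known curve (a = 9, b = 0.5, c = 100).
xs = list(range(0, 11))
ys = [logistic(x, 9, 0.5, 100) for x in xs]

a_hat, b_hat = logistic_fit_known_c(xs, ys, c=100)
print(round(a_hat, 6), round(b_hat, 6))
```

With exact data the known parameters are recovered. With real measurements the result depends on the assumed value of \(c\), which is why a full logistic regression estimates all three parameters together.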
Example 3
Using Logistic Regression to Fit a Model to Data
Mobile telephone service has increased rapidly in America since the mid 1990s. Today, almost all residents have cellular service. Table 5 shows the percentage of Americans with cellular service between the years 1995 and 2012 11 .
| Year | Americans with Cellular Service (%) | Year | Americans with Cellular Service (%) |
|---|---|---|---|
| 1995 | 12.69 | 2004 | 62.852 |
| 1996 | 16.35 | 2005 | 68.63 |
| 1997 | 20.29 | 2006 | 76.64 |
| 1998 | 25.08 | 2007 | 82.47 |
| 1999 | 30.81 | 2008 | 85.68 |
| 2000 | 38.75 | 2009 | 89.14 |
| 2001 | 45.00 | 2010 | 91.86 |
| 2002 | 49.16 | 2011 | 95.28 |
| 2003 | 55.15 | 2012 | 98.17 |
- Answer
- ⓐ Using the STAT then EDIT menu on a graphing utility, list the years using values 0–17 in L1 and the corresponding percentage in L2. Then use the STATPLOT feature to verify that the scatterplot follows a logistic pattern as shown in Figure 5.
Use the “Logistic” command from the STAT then CALC menu to obtain the logistic model,
\[y=\frac{105.7379526}{1+6.88328979e^{-0.2595440013x}}\]
Next, graph the model in the same window as the scatterplot to verify it is a good fit, as shown in Figure 6.
- ⓑ To approximate the percentage of Americans with cellular service in the year 2013, substitute \(x=18\) in the model and solve for \(y\):
\[y=\frac{105.7379526}{1+6.88328979e^{-0.2595440013(18)}}\approx 99.3\]
According to the model, about 99.3% of Americans had cellular service in 2013.
- ⓒ The model gives a limiting value of about 105. This means that the maximum possible percentage of Americans with cellular service would be 105%, which is impossible. (How could over 100% of a population have cellular service?) If the model were exact, the limiting value would be \(c=100\) and the model’s outputs would get very close to, but never actually reach, 100%. After all, there will always be someone out there without cellular service!
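The numbers quoted in this answer can be checked by evaluating the model directly. The parameter values in the sketch below are an assumed calculator output for the “Logistic” command; they are chosen to be consistent with the limiting value of about 105 and the 99.3% estimate quoted above, and they are not derived in the code.

```python
import math

# Assumed calculator output for Example 3 (not derived in this sketch).
A, B, C = 6.88328979, 0.2595440013, 105.7379526


def cell_service(x):
    """Modeled percentage with cellular service, x = years since 1995."""
    return C / (1 + A * math.exp(-B * x))


estimate_2013 = cell_service(18)         # 2013 corresponds to x = 18
initial_value = C / (1 + A)              # model output at x = 0
inflection_x = math.log(A) / B           # input where growth is fastest

print(f"2013 estimate: {estimate_2013:.1f}%")
print(f"initial value: {initial_value:.2f}% (observed 12.69%)")
print(f"fastest growth near {1995 + inflection_x:.1f}")
print(f"limiting value: {C:.1f}%")
```

Because the limiting value exceeds 100, the model's outputs eventually pass 100% for large enough inputs, which is one reason to be cautious about extrapolating far beyond 2012.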
Try It #3
Table 6 shows the population, in thousands, of harbor seals in the Wadden Sea over the years 1997 to 2012.
| Year | Seal Population (Thousands) | Year | Seal Population (Thousands) |
|---|---|---|---|
| 1997 | 3.493 | 2005 | 19.590 |
| 1998 | 5.282 | 2006 | 21.955 |
| 1999 | 6.357 | 2007 | 22.862 |
| 2000 | 9.201 | 2008 | 23.869 |
| 2001 | 11.224 | 2009 | 24.243 |
| 2002 | 12.964 | 2010 | 24.344 |
| 2003 | 16.226 | 2011 | 24.919 |
| 2004 | 18.137 | 2012 | 25.108 |
Choosing an Appropriate Model for Data
Now that we have discussed various mathematical models, we need to learn how to choose the appropriate model for the raw data we have. Many factors influence the choice of a mathematical model, among which are experience, scientific laws, and patterns in the data itself. Not all data can be described by elementary functions. Sometimes, a function is chosen that approximates the data over a given interval. For instance, suppose data were gathered on the number of homes bought in the United States from the years 1960 to 2013. After plotting these data in a scatter plot, we notice that the shape of the data from the years 2000 to 2013 follows a logarithmic curve. We could restrict the interval from 2000 to 2010, apply regression analysis using a logarithmic model, and use it to predict the number of home buyers for the year 2015.
Three kinds of functions that are often useful in mathematical models are linear functions, exponential functions, and logarithmic functions. If the data lies on a straight line, or seems to lie approximately along a straight line, a linear model may be best. If the data is non-linear, we often consider an exponential or logarithmic model, though other models, such as quadratic models, may also be considered.
In choosing between an exponential model and a logarithmic model, we look at the way the data curves. This is called the concavity. If we draw a line between two data points, and all (or most) of the data between those two points lies above that line, we say the curve is concave down. We can think of it as a bowl that bends downward and therefore cannot hold water. If all (or most) of the data between those two points lies below the line, we say the curve is concave up. In this case, we can think of a bowl that bends upward and can therefore hold water. An exponential curve, whether rising or falling, whether representing growth or decay, is always concave up away from its horizontal asymptote. A logarithmic curve is always concave away from its vertical asymptote. In the case of positive data, which is the most common case, an exponential curve is always concave up, and a logarithmic curve always concave down.
A logistic curve changes concavity. It starts out concave up and then changes to concave down beyond a certain point, called a point of inflection.
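The concavity test described above can be approximated numerically. The sketch below is one simple way to do it, assuming the inputs are equally spaced and the data are free of noise: it looks at the sign of the second differences \(y_{i-1}-2y_i+y_{i+1}\). The function name `concavity` and the sample curves are ours.

```python
import math


def concavity(ys, tol=1e-9):
    """Classify equally spaced data by the sign of its second differences.

    Returns "up", "down", "changes", or "linear".
    """
    second = [ys[i - 1] - 2 * ys[i] + ys[i + 1] for i in range(1, len(ys) - 1)]
    has_pos = any(d > tol for d in second)
    has_neg = any(d < -tol for d in second)
    if has_pos and has_neg:
        return "changes"
    if has_pos:
        return "up"
    if has_neg:
        return "down"
    return "linear"


xs = range(1, 10)
exponential = [math.exp(0.5 * x) for x in xs]
logarithmic = [2 * math.log(x) for x in xs]
logistic = [100 / (1 + 9 * math.exp(-0.5 * x)) for x in range(11)]
linear = [3 * x + 1 for x in xs]

for name, ys in [("exponential", exponential), ("logarithmic", logarithmic),
                 ("logistic", logistic), ("linear", linear)]:
    print(name, "->", concavity(ys))
```

The four sample curves are classified as up, down, changes, and linear, in that order. Measured data are rarely this clean, so in practice a scatter plot remains the more reliable guide.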
After using the graph to help us choose a type of function to use as a model, we substitute points, and solve to find the parameters. We reduce round-off error by choosing points as far apart as possible.
Does a linear, exponential, logarithmic, or logistic model best fit the values listed in Table \( \PageIndex{ 1 } \)? Find the model, and use a graph to check your choice.
| \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| \(y\) | 0 | 1.386 | 2.197 | 2.773 | 3.219 | 3.584 | 3.892 | 4.159 | 4.394 |
- Solution
-
First, plot the data on a graph as in Figure \( \PageIndex{ 8 } \). For the purpose of graphing, round the data to two decimal places.
Figure \( \PageIndex{ 8 } \)
Clearly, the points do not lie on a straight line, so we reject a linear model. If we draw a line between any two of the points, most or all of the points between those two points lie above the line, so the graph is concave down, suggesting a logarithmic model. We can try \(y=a\ln(bx)\). Plugging in the first point, \((1,0)\), gives \(0=a\ln(b)\). We reject the case that \(a=0\) (if it were, all outputs would be 0), so we know \(\ln(b)=0\). Thus \(b=1\) and \(y=a\ln(x)\). Next we can use the point \((9,4.394)\) to solve for \(a\):
\[\begin{align*} y&=a\ln(x) \\ 4.394&=a\ln(9) \\ a&=\frac{4.394}{\ln(9)} \end{align*}\]
Because \(a=\frac{4.394}{\ln(9)}\approx 2\), an appropriate model for the data is \(y=2\ln(x)\).
To check the accuracy of the model, we graph the function together with the given points as in Figure \( \PageIndex{ 9 } \).
Figure \( \PageIndex{ 9 } \): The graph of \(y=2\ln(x)\).
We can conclude that the model is a good fit to the data.
Compare Figure \( \PageIndex{ 9 } \) to the graph of \(y=\ln(x^2)\) shown in Figure \( \PageIndex{ 10 } \).
Figure \( \PageIndex{ 10 } \): The graph of \(y=\ln(x^2)\).
The graphs appear to be identical when \(x>0\). A quick check confirms this conclusion: \(y=\ln(x^2)=2\ln(x)\) for \(x>0\).
However, if \(x<0\), the graph of \(y=\ln(x^2)\) includes an “extra” branch, as shown in Figure \( \PageIndex{ 11 } \). This occurs because, while \(y=2\ln(x)\) cannot have negative values in the domain (as such values would force the argument to be negative), the function \(y=\ln(x^2)\) can have negative domain values.
Figure \( \PageIndex{ 11 } \)
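The arithmetic in this example is easy to confirm numerically. The sketch below recomputes \(a\) from the point \((9, 4.394)\), measures how far the model \(y=2\ln(x)\) strays from the table values, and checks the identity \(\ln(x^2)=2\ln(x)\) for positive inputs. The variable names are ours.

```python
import math

# Table data from the example: x = 1..9 and the corresponding y values.
xs = list(range(1, 10))
ys = [0, 1.386, 2.197, 2.773, 3.219, 3.584, 3.892, 4.159, 4.394]

# The point (1, 0) forces b = 1, so the model is y = a*ln(x).
# The point (9, 4.394) then determines a.
a = 4.394 / math.log(9)
a_rounded = round(a)

# Largest gap between the rounded model y = 2 ln(x) and the table.
max_error = max(abs(a_rounded * math.log(x) - y) for x, y in zip(xs, ys))

# ln(x^2) and 2 ln(x) agree for positive x, but only ln(x^2) accepts x < 0.
same_for_positive = all(
    math.isclose(math.log(x ** 2), 2 * math.log(x)) for x in xs
)

print(f"a = {a:.4f}, rounded to {a_rounded}")
print(f"max |model - data| = {max_error:.4f}")
print("ln(x^2) == 2 ln(x) for x > 0:", same_for_positive)
print("ln((-3)^2) =", round(math.log((-3) ** 2), 4))
```

The largest gap is smaller than the three-decimal rounding used in the table, which supports the conclusion that \(y=2\ln(x)\) fits the data.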
Does a linear, exponential, or logarithmic model best fit the data in Table \( \PageIndex{ 2 } \)? Find the model.
| \(x\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| \(y\) | 3.297 | 5.437 | 8.963 | 14.778 | 24.365 | 40.172 | 66.231 | 109.196 | 180.034 |