7.6: The Normal Distribution - An Extended Numeric Example


Page ID: 83957


We want to look at an extended example where we realistically want to find a definite integral but need to use numerical methods rather than finding an antiderivative and applying the fundamental theorem of calculus. Most students are familiar with the concept of a course that is graded on a curve. Formally, that means there is a preset distribution of grades in the class, with a certain percentage of the students getting an A, a certain percentage getting a B, and so forth. Most college students are also familiar with the ACT, SAT, or other standardized tests, where the scores typically follow a normal, or bell, curve. A result we borrow from more advanced mathematics is that many phenomena, such as height, weight, and hat size, also follow a bell curve. In a business setting, we are often concerned with whether a portion of a market will be big enough to support a specialty store. We also want to know how much of our production should be allocated to each size of a product. These questions often boil down to finding the area under a specified portion of the normal curve.

Background from probability

We want to borrow some definitions and results from the theory of probability. In particular, we want a description of the function whose area we are finding, and of the related area function.

Definition: Probability Density Function

A Probability Density Function is a function that spreads a total area of 1 over the entire real line, with the obvious understanding that no value can have a negative probability.

In calculus terms, a Probability Density Function is a function $f(x)$ defined for $-\infty\lt x \lt \infty $ such that $f(x)\ge 0$ and $\int_{-\infty}^{\infty}f(x)dx=1\text{.}$

A probability density function is also called a continuous distribution function. The probability density function that is of most interest to us is the normal distribution. The normal density function is given by

\[ f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right) \nonumber \]

where sigma, $\sigma\text{,}$ and mu, $\mu\text{,}$ are, respectively, the standard deviation and the mean of the distribution. For this course, the mean is the center of the distribution and the standard deviation is a measure of how tightly packed the distribution is. If we set the mean to 0 and the standard deviation to 1, we have the standardized normal distribution, or the familiar bell curve.
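To make the formula concrete, here is a minimal Python sketch of the normal density (Python is our choice for illustration, and the function name `normal_pdf` is hypothetical, not from the text). It evaluates $f(x)$ for any mean and standard deviation, with the standardized curve as the default.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Peak of the standardized bell curve (x = 0) is 1/sqrt(2*pi)
print(round(normal_pdf(0.0), 4))  # 0.3989

# Peak of a curve with mean 70 and SD 3: one third as tall, about 0.133
print(round(normal_pdf(70.0, mu=70.0, sigma=3.0), 4))
```

Tripling the standard deviation spreads the same total area of 1 three times as wide, so the peak drops to one third of its standardized height.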

Thus, when I note that adult men in the United States have heights that are normally distributed with a mean of 70 inches and a standard deviation of 3 inches, the distribution is

\[ f(x)=\frac{1}{3\sqrt{2\pi}}\exp\left(\frac{-(x-70)^2}{2\cdot 3^2}\right) \nonumber \]

[Figure: graph of the height distribution $f(x)$]

Thus, finding the percentage of men less than 5 feet (60 inches) tall reduces to evaluating the appropriate integral. Since finding the percentage of the population that fits in our market reduces to finding the area under a specified portion of this curve, we are also interested in the anti-derivative of the distribution.

Definition: Cumulative Distribution Function

Given a probability density function, $f(x)$, the related Cumulative Distribution Function, $CDFf(x)\text{,}$ is a function that measures how much area is over the interval $(-\infty,x]\text{.}$

In calculus terms, $CDFf(x)\text{,}$ the Cumulative Distribution Function of $f(x)\text{,}$ is $\int_{-\infty}^x f(t)dt\text{.}$

You will notice that the techniques we have for anti-differentiation do not work on the normal distribution. In fact, the normal density has no closed-form anti-derivative in terms of the functions we are familiar with. Thus we need to use numeric methods.
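Numeric methods are also built into programming libraries. As a hedged illustration in Python (our choice of language; the name `normal_cdf` is hypothetical), the cumulative distribution function can be written with `math.erf`, the error function. The error function is itself a special, non-elementary function that the library evaluates numerically, so this is consistent with the statement above. The sketch finds the share of men shorter than 5 feet.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the normal density over (-infinity, x].

    math.erf is a special (non-elementary) function that the library
    evaluates numerically.
    """
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Share of adult men shorter than 5 feet (60 in.), with mean 70 in. and SD 3 in.
share_under_60 = normal_cdf(60.0, mu=70.0, sigma=3.0)
print(f"{share_under_60:.4%}")  # roughly 0.04%
```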

Example 7.6.1: Tall Men in an Area

In the United States, the height of men follows a normal distribution with a mean of 70 inches (5' 10") and a standard deviation of 3 inches. I want to set up a specialty shop for men who are at least 6’ tall, but no more than 7' tall. In an area with 100,000 adult men, how big is my potential market?

Solution set up. My density function is $\frac{1}{3\sqrt{2\pi}}\exp\left(\frac{-(x-70)^2}{2\cdot 3^2}\right)\text{.}$ Since I have a population of 100,000 and am interested in the men who are between 72 and 84 inches tall, my potential market is

\[ 100000\int_{72}^{84} \frac{1}{3\sqrt{2\pi}} \exp\left(\frac{-(x-70)^2}{2\cdot 3^2} \right)dx. \nonumber \]

As an alternative, I can convert the problem so it is expressed in terms of standard deviations. Then I use the standardized normal distribution and my limits of integration are

\[ \text{lower bound in SDs} = \frac{\text{lower bound}-\text{mean}}{\text{SD}} = \frac{72-70}{3} = \frac{2}{3} \nonumber \]

\[ \text{upper bound in SDs} = \frac{\text{upper bound}-\text{mean}}{\text{SD}} = \frac{84-70}{3} = \frac{14}{3}. \nonumber \]

Then my potential market is

\[ 100000\int_{2/3}^{14/3} \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)dx. \nonumber \]
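The switch to standard deviations is a change of variables. As a worked check that the two integrals agree (the letter $z$ for the standardized variable is our notation), substitute $z=(x-70)/3$:

\[ z=\frac{x-70}{3},\qquad dx=3\,dz,\qquad x=72 \mapsto z=\frac{2}{3},\qquad x=84 \mapsto z=\frac{14}{3} \nonumber \]

\[ 100000\int_{72}^{84} \frac{1}{3\sqrt{2\pi}} \exp\left(\frac{-(x-70)^2}{2\cdot 3^2}\right)dx = 100000\int_{2/3}^{14/3} \frac{1}{3\sqrt{2\pi}} \exp\left(\frac{-(3z)^2}{2\cdot 3^2}\right)3\,dz = 100000\int_{2/3}^{14/3} \frac{1}{\sqrt{2\pi}} \exp(-z^2/2)\,dz. \nonumber \]

The factor of 3 from $dx=3\,dz$ cancels the 3 in the denominator, which is why the standardized density contains no $\sigma\text{.}$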

Solution

Solution 1, using Excel and Riemann Sums: I want to set up a spreadsheet to find the area under the curve. Since I expect to do this for several problems, I want to set up the worksheet as a template that I can simply fill in. It will make my life easier if I recast the problem in terms of standard deviations. My potential market is $100000\int_{2/3}^{14/3} \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)dx\text{.}$ I am ready to set up a Riemann sum worksheet as we did in Section 7.1.

[Figure: Riemann sum worksheet for Example 7.6.1 (setup)]

In cells F3 through F5 we convert the lower bound, upper bound, and $\Delta x$ to standard deviations. Recall that we get better accuracy by evaluating the function at the midpoint of each rectangle. The midpoint of the $n$th rectangle lies $(n-0.5)\Delta x$ above the lower bound. As in previous sections, we use the OFFSET command to bring our answer into the top region. When we look at the numbers, we see that the potential market is 25,249.
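To check the worksheet outside Excel, the same midpoint Riemann sum can be sketched in Python. The helper names are hypothetical, and the number of rectangles (`n=1000`) is our assumption, since the worksheet's row count is not stated in the text.

```python
import math

def std_normal_pdf(z: float) -> float:
    """Standardized normal density (mean 0, SD 1)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def midpoint_riemann(f, a: float, b: float, n: int) -> float:
    """Midpoint Riemann sum of f over [a, b] with n equal rectangles."""
    dx = (b - a) / n
    # Rectangle k (k = 1..n) is evaluated at its midpoint, (k - 0.5)*dx above a
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

# Example 7.6.1 in standard deviations: from 2/3 to 14/3, population 100,000
market = 100000 * midpoint_riemann(std_normal_pdf, 2 / 3, 14 / 3, n=1000)
print(round(market))  # about 25,249, in line with the worksheet
```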

[Figure: Riemann sum worksheet for Example 7.6.1 (computed values)]

Solution 1a, using Excel Statistics Commands: By this point in the course, you should expect that if a computation is important and performed often in business, Excel has a command for it.

The function we are interested in is

\[ \hbox{NORM.DIST(x, mean, standard deviation, cumulative).} \nonumber \]

Here $x\text{,}$ mean, and standard deviation have their obvious meanings. The cumulative parameter is either TRUE or FALSE: if it is TRUE, we get the cumulative distribution function; if it is FALSE, we get the probability density function. If we are working with the standardized normal distribution, where the mean is 0 and the standard deviation is 1, the command is

\[ \hbox{NORM.S.DIST(x, cumulative).} \nonumber \]

(If you are using older versions of Excel, the syntax of the command is a little different. Check the appropriate help page if you are using an older version of Excel.) With these commands, our spreadsheet is noticeably simpler.
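Python's standard library has a close analogue of these commands in `statistics.NormalDist` (Python 3.8 and later), whose `cdf` and `pdf` methods return the cumulative and density values. The wrapper names `norm_dist` and `norm_s_dist` below are hypothetical; they only imitate the argument order of Excel's NORM.DIST and NORM.S.DIST and are not part of Excel or Python. The sketch computes the Example 7.6.1 market both in inches and in standard deviations.

```python
from statistics import NormalDist

def norm_dist(x: float, mean: float, sd: float, cumulative: bool) -> float:
    """Hypothetical stand-in mirroring Excel's NORM.DIST argument order."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(x) if cumulative else dist.pdf(x)

def norm_s_dist(x: float, cumulative: bool) -> float:
    """Hypothetical stand-in for NORM.S.DIST (mean 0, SD 1)."""
    return norm_dist(x, 0.0, 1.0, cumulative)

# Example 7.6.1: area between two bounds = CDF(upper) - CDF(lower)
in_inches = 100000 * (norm_dist(84, 70, 3, True) - norm_dist(72, 70, 3, True))
in_sds = 100000 * (norm_s_dist(14 / 3, True) - norm_s_dist(2 / 3, True))
print(round(in_inches), round(in_sds))
```

Because the cumulative function already accumulates area from $-\infty\text{,}$ the area between two bounds is just the difference of two cumulative values, with no sum over rectangles.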

[Figure: worksheet for Example 7.6.1 using the statistics commands (setup)]

When we look at the values, we get a target population of 25,249. This agrees with our estimate to 5 significant figures.

[Figure: worksheet for Example 7.6.1 using the statistics commands (computed values)]

Solution 1b, using Wolfram|Alpha: Once I have reduced the problem to evaluating a definite integral, I can find a numeric solution with a computer algebra system (CAS) such as Wolfram|Alpha. The integral

\[ 100000\int_{72}^{84} \frac{1}{3\sqrt{2\pi}} \exp\left(\frac{-(x-70)^2}{2\cdot 3^2 }\right)dx \nonumber \]

becomes

100000*integrate(exp(-(x-70)^2/(2*3^2))/(3*sqrt(2*pi))) from 72 to 84.

We get our familiar answer of 25,249.
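We don't know which numeric method Wolfram|Alpha uses internally. As an independent check in the original units (no conversion to standard deviations), here is a Python sketch using composite Simpson's rule, a different technique from the midpoint sums used in the worksheet. The helper names and the choice of 200 subintervals are our assumptions.

```python
import math

def height_pdf(x: float) -> float:
    """Density of adult male height: mean 70 in., SD 3 in."""
    return math.exp(-((x - 70.0) ** 2) / (2.0 * 3.0 ** 2)) / (3.0 * math.sqrt(2.0 * math.pi))

def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        # Interior weights alternate 4, 2, 4, ..., 4
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

market = 100000 * simpson(height_pdf, 72.0, 84.0, n=200)
print(round(market))
```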

[Figure: Wolfram|Alpha result for Example 7.6.1]

When we compute a target population, we sometimes want to include a tail of the distribution. We might, for example, be concerned with all women who are 5 feet tall or less. This sets up an integral over an infinite interval, which we can’t compute directly as a Riemann sum. The first workaround notes that the tails are very small: if the heights of all humans who have ever lived were normally distributed, fewer than one of them would be more than 7 standard deviations from the mean. Taking the integral down to $-7$ standard deviations is therefore practically the same as integrating down to $-\infty\text{.}$ The second workaround uses the symmetry of the normal distribution. Writing $SND(x)$ for the standardized normal density, exactly half of the area lies to the left of 0, so

\[ \int_{-\infty}^a SND(x)\,dx=\int_{-\infty}^0 SND(x)\,dx+\int_0^a SND(x)\,dx=0.5+\int_0^a SND(x)\,dx. \nonumber \]
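Both workarounds can be checked numerically. The Python sketch below (helper names hypothetical, rectangle counts our choice) uses as the upper limit $a=-4/2.75\text{,}$ the bound that appears in Example 7.6.2. In the symmetry version the upper limit lies below the lower limit, so $\Delta x$ is negative, the sum comes out negative, and adding it to 0.5 subtracts that area.

```python
import math

def std_normal_pdf(z: float) -> float:
    """Standardized normal density SND(z)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def midpoint_riemann(f, a: float, b: float, n: int) -> float:
    """Signed midpoint sum: if b < a, dx is negative and so is the result."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

a = -4 / 2.75  # upper limit in standard deviations (Example 7.6.2)

# Workaround 1: replace -infinity with -7 standard deviations
truncated = midpoint_riemann(std_normal_pdf, -7.0, a, n=2000)

# Workaround 2: symmetry, 0.5 plus the signed area from 0 to a
by_symmetry = 0.5 + midpoint_riemann(std_normal_pdf, 0.0, a, n=1000)

print(f"{truncated:.6f} {by_symmetry:.6f}")
```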

Example 7.6.2: Finding Short Women.

In the United States, the height of women follows a normal distribution with a mean of 64 inches (5' 4") and a standard deviation of 2.75 inches. I want to set up a specialty shop for women who are no more than 5' tall. In an area with 50,000 adult women, how big is my potential market?

Solution Set up: Using the reasoning above, 5 feet (60 inches) is $(60-64)/2.75=-4/2.75$ standard deviations from the mean, so my market is 50% of the population minus the percentage lying between $-4/2.75$ standard deviations and the mean.

Solution

Using Excel and Riemann Sums: Because we set up the first worksheet as a template, the Riemann sum problem is now just a matter of changing the parameters and subtracting from 0.5 before multiplying by the market size.

[Figure: Riemann sum worksheet for Example 7.6.2]

We notice that since we are finding the area under the standardized normal distribution from 0 to a negative number, we get a negative area. Our potential market is composed of 3,645 women.

Using Excel Statistics Commands: With the statistics commands, the cumulative area function is already zero at $-\infty\text{,}$ so we simply evaluate

NORM.S.DIST(right-hand limit, TRUE)

and multiply by the market size.
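In Python, `statistics.NormalDist.cdf` plays the role of NORM.DIST or NORM.S.DIST with the cumulative argument set to TRUE; since it already measures area from $-\infty\text{,}$ one call at the right-hand limit gives the share. This sketch (Python, not Excel) evaluates Example 7.6.2 both in inches and in standard deviations.

```python
from statistics import NormalDist

# Example 7.6.2: women's height, mean 64 in., SD 2.75 in.; market is height <= 60 in.
share_inches = NormalDist(mu=64, sigma=2.75).cdf(60)  # like NORM.DIST(60, 64, 2.75, TRUE)
share_sds = NormalDist().cdf(-4 / 2.75)               # like NORM.S.DIST(-4/2.75, TRUE)
print(f"{share_inches:.4%} {share_sds:.4%}")
```

Multiplying the share by the number of adult women in the area gives the potential market.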

[Figure: worksheet for Example 7.6.2 using NORM.S.DIST]

Once again, we get a potential market of 3,645 women.

While the normal distribution spreads a population over the real numbers, most products come in discrete sizes. Depending on the kind of shoe, sizes come in whole or half numbers; you can’t buy a shoe of size 8.764. The standard procedure is to divide the population at the midpoints between adjacent sizes, so that, for example, everyone between sizes 6.5 and 7.5 is assigned to size 7.

Example 7.6.3: Women's Shoes.

In the United States, women's shoe sizes follow a normal distribution with a mean of 8 and a standard deviation of 1.5. I want to order 1000 pairs of shoes. If the shoes are only available in full sizes, how many pairs of size 7 should I order?

Solution

I want the portion of the population between sizes 6.5 and 7.5, which I enter into my Riemann sum worksheet.

[Figure: Riemann sum worksheet for Example 7.6.3]

Of the 1000 pairs of shoes, 211 should be size 7.
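The midpoint rule for discrete sizes can be sketched in Python with `statistics.NormalDist`. Here `pairs_for_size` is a hypothetical helper, and extending the calculation to a plan for sizes 5 through 11 is our illustration, not part of the example. A `half_width` of 0.5 suits full sizes; 0.25 would suit a style that also comes in half sizes.

```python
from statistics import NormalDist

shoe_sizes = NormalDist(mu=8, sigma=1.5)  # women's shoe sizes (Example 7.6.3)
total_pairs = 1000

def pairs_for_size(size: float, half_width: float = 0.5) -> float:
    """Pairs to order for one size: the population share between
    size - half_width and size + half_width, times the order size."""
    share = shoe_sizes.cdf(size + half_width) - shoe_sizes.cdf(size - half_width)
    return total_pairs * share

print(round(pairs_for_size(7)))  # 211

# Hypothetical extension: an order plan for full sizes 5 through 11
plan = {size: round(pairs_for_size(size)) for size in range(5, 12)}
print(plan)
```

The plan totals roughly 2% less than 1000 because the tails below size 4.5 and above size 11.5 are left out, and rounding each size can shift the total by a pair or two.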

We have looked at three methods for finding a portion of a normally distributed population: Excel with Riemann sums, Excel with statistics commands, and a CAS. Each has advantages and disadvantages. The Riemann sum method takes the most work to set up, but it is conceptually the most straightforward and the most flexible. It is the easiest to adapt if a population follows some nonstandard distribution, and it shows intermediate values, which helps when our question is less sharply defined and we are still exploring the data and deciding which business question to ask. The Excel statistics commands require us to learn special commands, but they take less work; this would probably be the favored method if we were doing many of these computations. Excel also has corresponding commands for the other standard probability distributions. The CAS method does not require special commands, but it takes us out of our Excel environment, so it does not leave behind a well-documented worksheet that someone else asking similar questions can easily modify.

class="

Exercises: Normal Distribution Problems

Exercise 1:

Assume that women’s shoe sizes are normally distributed with a mean of 8 and a standard deviation of 1.5. A particular style of shoe is available in full and half sizes. I plan to make 10,000 pairs of this style.

Express, as an integral, the number of pairs I should make of size 9.
How many pairs of size 9 shoes should I make?
How do your answers change if the shoes are only to be made in full sizes?

Answer

\[ \hbox{pairs size 9}=10000\int_{8.75}^{9.25} \frac{1}{1.5\sqrt{2\pi}}e^{\left({\frac{-(x-8)^2}{2\cdot 1.5^2}}\right)}\ dx \nonumber \]
