
3.8: Probability


Learning Objectives
  • Study examples of probability density functions for a continuous random variable.
  • Compute probability using tools of calculus.
  • Calculate measures of center: mean, median.
  • Calculate measures of variability: variance, standard deviation.

You perhaps have at least a rudimentary understanding of discrete probability, which measures the likelihood of an "event" when there are a finite number of possibilities. For example, when an ordinary six-sided die is rolled, the probability of getting any particular number is 1/6. In general, the probability of an event is the number of ways the event can happen divided by the number of ways that "anything" can happen.

For a slightly more complicated example, consider the case of two six-sided dice. The dice are physically distinct, so there are 36 possible outcomes:

Table 3.8.1: Set of possible outcomes when rolling two dice
| First Die \ Second Die | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1, 1 | 1, 2 | 1, 3 | 1, 4 | 1, 5 | 1, 6 |
| 2 | 2, 1 | 2, 2 | 2, 3 | 2, 4 | 2, 5 | 2, 6 |
| 3 | 3, 1 | 3, 2 | 3, 3 | 3, 4 | 3, 5 | 3, 6 |
| 4 | 4, 1 | 4, 2 | 4, 3 | 4, 4 | 4, 5 | 4, 6 |
| 5 | 5, 1 | 5, 2 | 5, 3 | 5, 4 | 5, 5 | 5, 6 |
| 6 | 6, 1 | 6, 2 | 6, 3 | 6, 4 | 6, 5 | 6, 6 |

which means that rolling a "2, 5" is different than rolling a "5, 2"; each is an equally likely event out of a total of 36 ways the dice can land, so each has a probability of 1/36.

Most interesting events are not so simple. More interesting is the probability of rolling a certain sum out of the possibilities 2 through 12. It is clearly not true that all sums are equally likely: the only way to roll a 2 is to roll "1, 1", while there are many ways to roll a 7. Because the number of possibilities is quite small, and because a pattern quickly becomes evident, it is easy to see that the probabilities of the various sums are:

\[\begin{aligned}
P(2)&=P(12)=1/36 & P(3)&=P(11)=2/36 & P(4)&=P(10)=3/36\\
P(5)&=P(9)=4/36 & P(6)&=P(8)=5/36 & P(7)&=6/36
\end{aligned}\]

Here we use P(n) to mean "the probability of rolling an n.'' Since we have correctly accounted for all possibilities, the sum of all these probabilities is 36/36=1; the probability that the sum is one of 2 through 12 is 1, because there are no other possibilities.

The study of probability is concerned with more difficult questions as well; for example, suppose the two dice are rolled many times. On the average, what sum will come up? In the language of probability, this average is called the expected value of the sum. This is at first a little misleading, as it does not tell us what to "expect'' when the two dice are rolled, but what we expect the long term average will be.

Suppose that two dice are rolled 36 million times. Based on the probabilities, we would expect about 1 million rolls to be 2, about 2 million to be 3, and so on, with a roll of 7 topping the list at about 6 million. The sum of all rolls would be 1 million times 2 plus 2 million times 3, and so on, and dividing by 36 million we would get the average:

\[\begin{aligned}
\mu &= \bigl(2\cdot 10^6 + 3(2\cdot 10^6) + \cdots + 7(6\cdot 10^6) + \cdots + 12\cdot 10^6\bigr)\frac{1}{36\cdot 10^6}\\
&= 2\frac{10^6}{36\cdot 10^6} + 3\frac{2\cdot 10^6}{36\cdot 10^6} + \cdots + 7\frac{6\cdot 10^6}{36\cdot 10^6} + \cdots + 12\frac{10^6}{36\cdot 10^6}\\
&= 2P(2)+3P(3)+\cdots+7P(7)+\cdots+12P(12)\\
&= \sum_{i=2}^{12} iP(i) = 7.
\end{aligned}\]

There is nothing special about the 36 million in this calculation. No matter what the number of rolls, once we simplify the average, we get the same \(\sum_{i=2}^{12} iP(i)\). While the actual average value of a large number of rolls will not be exactly 7, the average should be close to 7 when the number of rolls is large. Turning this around, if the average is not close to 7, we should suspect that the dice are not fair.
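This expected-value computation is easy to verify with a short script. The sketch below (illustrative only, using Python's exact-arithmetic `fractions` module) enumerates the 36 outcomes from Table 3.8.1 and evaluates \(\sum_{i=2}^{12} iP(i)\):

```python
from fractions import Fraction

# Probability of each sum s = 2..12 when rolling two fair dice:
# count the (i, j) pairs with i + j = s, out of 36 equally likely outcomes.
P = {s: Fraction(sum(1 for i in range(1, 7) for j in range(1, 7) if i + j == s), 36)
     for s in range(2, 13)}

# Expected value: sum of s * P(s) over all possible sums.
mu = sum(s * P[s] for s in range(2, 13))
print(mu)  # 7
```

The exact fractions match the table of probabilities above, e.g. \(P(2)=1/36\) and \(P(7)=6/36\).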

A variable, say \(X\), that can take certain values, each with a corresponding probability, is called a random variable; in the example above, the random variable was the sum of the two dice. If the possible values for \(X\) are \(x_1, x_2, \dots, x_n\), then the expected value of the random variable is

\[E(X)=\sum_{i=1}^{n} x_i P(x_i).\]

The expected value is also called the mean, and denoted by the Greek letter μ.

When the possible values of X are a discrete set, such as 2,3,4,...,12, as above, or even 0,1,2,3,..., we say that X is a discrete random variable.

However, in many applications of probability, the possible values of a random variable fall on a continuum. For example, we may be interested in the weight of a die manufactured by a company, or the length of a certain species of fish, or the amount of time a person must wait in a line at a post office before they are served. To deal with this, we need a different approach, and since there is a sum involved, it should not be wholly surprising that integration turns out to be a useful tool. It then turns out that even when the number of possibilities is large but finite, it is frequently easier to pretend that the number is infinite.

Definition: Probability Density Function

Let \(f:\mathbb{R}\to\mathbb{R}\) be a function. If \(f(x)\ge 0\) for every \(x\) and \(\int_{-\infty}^{\infty} f(x)\,dx=1\), then \(f\) is a probability density function (PDF).

We associate a probability density function with a random variable X by stipulating that the probability that X is between a and b is

\[P(a<X<b)=\int_a^b f(x)\,dx.\]

Because of the requirement that the integral from \(-\infty\) to \(\infty\) be 1, all probabilities are less than or equal to 1, and the probability that \(X\) takes on some value between \(-\infty\) and \(\infty\) is 1, as it should be.

Consider again the two dice example; we can view it in a way that more resembles the probability density function approach. Consider a random variable X that takes on any real value with probabilities given by the probability density function in Figure 3.8.1. The function f consists of just the top edges of the rectangles, with vertical sides drawn for clarity; the function is zero below 1.5 and above 12.5. The area of each rectangle is the probability of rolling the sum in the middle of the bottom of the rectangle, or

\[P(n)=\int_{n-1/2}^{n+1/2} f(x)\,dx.\]

The probability of rolling a 4, 5, or 6 is \(\int_{7/2}^{13/2} f(x)\,dx\). Of course, we could also compute probabilities that don't make sense in the context of the dice, such as the probability that \(X\) is between 4 and 5.8.


Figure 3.8.1: Histogram plotting the probability density for the sum of 2 dice.

The function \(F(x)=P(X\le x)=\int_{-\infty}^{x} f(t)\,dt\) is called the cumulative distribution function (CDF) or simply (probability) distribution.

Example 3.8.1

Consider the function

\[f(x)=\begin{cases}
0 & x<0\\
x & 0\le x<1\\
2-x & 1\le x<2\\
0 & x\ge 2
\end{cases}\]

  1. Show that f is a probability density function.
  2. Compute \(P\left(\frac12<X<1\right)\).
  3. Compute the cumulative distribution function.

Solution

First, let's look at the graph of the given function.

Graph of the given PDF. Details in caption.​​​​​
Figure 3.8.2: Graph of the function f given in Example 3.8.1. For x<0 and x2, the graph has a horizontal line along the x-axis. In between, the graph makes an upside-down 'V' shape, reaching height 1 at x=1. (CC-BY-SA-4.0 via Desmos)
  1. To show that \(f\) is a probability density function, we need to check that \(f(x)\) is never negative, and show that \(\int_{-\infty}^{\infty} f(x)\,dx=1\).

\[\int_{-\infty}^{\infty} f(x)\,dx=\int_0^2 f(x)\,dx=\int_0^1 x\,dx+\int_1^2 (2-x)\,dx=\frac12+\frac12=1\]

  2. This is the probability that \(X\) takes on a value between \(\frac12\) and 1:

\[P\left(\tfrac12<X<1\right)=\int_{1/2}^{1} f(x)\,dx=\int_{1/2}^{1} x\,dx=\left.\tfrac12 x^2\right|_{1/2}^{1}=\frac12-\frac18=\frac38\]

  3. To find the CDF, we compute \(F(x)=P(X\le x)=\int_{-\infty}^{x} f(t)\,dt\). For the given PDF, since \(f(t)=0\) for \(t<0\),

\[F(x)=P(X\le x)=\int_0^x f(t)\,dt.\]

For \(0\le x<1\), \(\displaystyle\int_0^x f(t)\,dt=\int_0^x t\,dt=\frac12 x^2\).

For \(1\le x<2\), \(\displaystyle\int_0^x f(t)\,dt=\int_0^1 t\,dt+\int_1^x (2-t)\,dt=\frac12+\left.\left(2t-\tfrac12 t^2\right)\right|_1^x=-\frac12 x^2+2x-1\).

Putting these together, we have that

\[F(x)=\begin{cases}
0 & x<0\\
\frac12 x^2 & 0\le x<1\\
-\frac12 x^2+2x-1 & 1\le x<2\\
1 & x\ge 2
\end{cases}\]
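The numbers in this example are easy to confirm numerically. The following sketch (illustrative; the simple midpoint-rule integrator is my own helper, not part of the text) checks the total area, \(P(\tfrac12<X<1)\), and one CDF value:

```python
def f(x):
    # PDF from Example 3.8.1: triangular "tent" on [0, 2]
    if 0 <= x < 1:
        return x
    if 1 <= x < 2:
        return 2 - x
    return 0.0

def integrate(g, a, b, n=100_000):
    # simple midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

total = integrate(f, 0, 2)    # should be close to 1
p = integrate(f, 0.5, 1)      # should be close to 3/8 = 0.375
F_15 = integrate(f, 0, 1.5)   # CDF at 1.5: -1.5**2/2 + 2*1.5 - 1 = 0.875
```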

Common Probability Density Functions

Next, we discuss some commonly used probability density functions for continuous random variables.

Uniform Distribution

Definition: Uniform Distribution

Suppose that a<b. The uniform distribution has probability density function

\[f(x)=\begin{cases}
\dfrac{1}{b-a} & \text{if } a\le x\le b\\
0 & \text{otherwise.}
\end{cases}\]

When a random variable \(X\) has such a distribution, we say that \(X\) is uniformly distributed from \(a\) to \(b\), and write \(X\sim U(a,b)\).

The uniform distribution is useful when all of the values of a variable on an interval are equally likely to occur. For example, if a person shows up to catch a train that arrives every 20 minutes, then X, the amount of time the person must wait for the train, falls between 0 minutes and 20 minutes, with no value within the interval [0,20] more likely than another.
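For a uniform density the integral \(\int_{lo}^{hi}\frac{1}{b-a}\,dx\) is just the length of the interval divided by \(b-a\), so probabilities reduce to arithmetic. A small sketch (the helper function name is my own) for the train example:

```python
def uniform_prob(a, b, lo, hi):
    # P(lo < X < hi) for X ~ U(a, b): length of the overlap of
    # (lo, hi) with (a, b), divided by (b - a)
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

# Waiting time for a train that arrives every 20 minutes: X ~ U(0, 20).
p_under_5 = uniform_prob(0, 20, 0, 5)  # 5/20 = 0.25
```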

Exponential Distribution

Definition: Exponential Distribution

The exponential distribution has probability density function

\[f(x)=\begin{cases}
0 & x<0\\
ce^{-cx} & x\ge 0
\end{cases}\]

where c is a positive constant.

When a random variable \(X\) has such a distribution, we say that \(X\) is exponentially distributed with parameter \(c\), and write \(X\sim \text{Exp}(c)\).

The exponential distribution is useful to model a random variable that represents the time until a specific event occurs. For example, the time until a screen cracks on a phone, or the amount of time until a website is visited. An important property of the exponential distribution is that it's "memoryless". For example, if we want to know the probability that a phone's screen cracks within the next 1 year, then knowing that a phone's screen hasn't cracked in the last 2 years will not affect that probability.
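The memoryless property can be seen directly from the density: since \(P(X>x)=\int_x^\infty ce^{-ct}\,dt=e^{-cx}\), the conditional probability \(P(X>s+t\mid X>s)=e^{-c(s+t)}/e^{-cs}=e^{-ct}=P(X>t)\). A quick numeric illustration (the parameter value \(c=0.5\) here is an arbitrary choice):

```python
import math

def exp_survival(c, x):
    # P(X > x) for X ~ Exp(c): the integral of c*e^(-ct) from x to
    # infinity, which evaluates to e^(-cx)
    return math.exp(-c * x)

c = 0.5
# Memorylessness: P(X > 2 + 1 | X > 2) should equal P(X > 1)
conditional = exp_survival(c, 3) / exp_survival(c, 2)
unconditional = exp_survival(c, 1)
```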

Normal Distribution

Consider the function \(f(x)=e^{-x^2/2}\). What can we say about the following integral?

\[\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\]

We cannot find an antiderivative of \(f\), but we can see that this integral is some finite number. Notice that \(0<f(x)=e^{-x^2/2}\le e^{-x/2}\) for \(x>1\). This implies that the area under \(e^{-x^2/2}\) is less than the area under \(e^{-x/2}\) over the interval \([1,\infty)\). It is easy to compute the latter area, namely

\[\int_1^{\infty} e^{-x/2}\,dx=\frac{2}{\sqrt{e}},\]

so

\[\int_1^{\infty} e^{-x^2/2}\,dx\]

is some finite number smaller than \(2/\sqrt{e}\).

Because \(f\) is symmetric around the \(y\)-axis, \(\int_{-\infty}^{-1} e^{-x^2/2}\,dx=\int_1^{\infty} e^{-x^2/2}\,dx\). This means that \[\int_{-\infty}^{\infty} e^{-x^2/2}\,dx=\int_{-\infty}^{-1} e^{-x^2/2}\,dx+\int_{-1}^{1} e^{-x^2/2}\,dx+\int_1^{\infty} e^{-x^2/2}\,dx=A\] for some finite positive number \(A\). Now if we let \(g(x)=f(x)/A\), \[\int_{-\infty}^{\infty} g(x)\,dx=\frac{1}{A}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx=\frac{1}{A}A=1,\] so \(g\) is a probability density function.

Through comparison, we have shown that \(A\) is some finite number without computing it. We cannot compute it with the techniques we have available. By using some techniques from multivariable calculus, it can be shown that \(A=\sqrt{2\pi}\). So \(g(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\) is a probability density function, and an important one.
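Although we cannot evaluate \(A\) by antidifferentiation, we can approximate it numerically and compare against \(\sqrt{2\pi}\). A sketch (the midpoint-rule integrator and the truncation to \([-20,20]\) are my own choices; \(e^{-x^2/2}\) is vanishingly small beyond that range):

```python
import math

def integrate(g, a, b, n=200_000):
    # midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

# e^(-x^2/2) decays so fast that [-20, 20] captures essentially all the area.
A = integrate(lambda x: math.exp(-x * x / 2), -20, 20)
print(A, math.sqrt(2 * math.pi))  # the two values agree to many digits
```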

Definition: Standard Normal Distribution

The standard normal distribution has probability density function

\[g(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\]

This density function has the following graph (Figure 3.8.3).

Figure 3.8.3: The standard normal distribution. This is an even function with absolute maximum value occurring at x=0. The curve tapers down on either side of x=0 in a symmetric way with inflection points occurring at x=±1.

The letter \(Z\) is typically used for a random variable that has the standard normal distribution. We write \(Z\sim N(0,1)\).

This basic density function, when transformed by a horizontal stretch by a factor of \(\sigma\), followed by a vertical compression to ensure the total area under the graph remains 1, followed by a horizontal shift by \(\mu\), becomes one of the most applicable distributions in probability and statistics. This results in the following probability density function for the normal distribution.

Definition: Normal Distribution

The normal distribution has probability density function

\[f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

When a random variable \(X\) has the normal distribution, we say that \(X\) is normally distributed with parameters \(\mu\) and \(\sigma\), and write \(X\sim N(\mu,\sigma)\).

The normal distribution, more commonly known as the bell curve, is the most frequently used density function. It is useful in modeling many natural phenomena that are influenced by multiple factors, such as heights of human adults, lengths of pregnancies, etc.

Measures of Center

The mean or expected value of a random variable is quite useful, as hinted at in our discussion of dice. Recall that the mean for a discrete random variable is \(E(X)=\sum_{i=1}^{n} x_i P(x_i)\). In the more general context we use an integral in place of the sum.

Definition: Mean

The mean of a continuous random variable \(X\) with probability density function \(f\) is \[\mu=E(X)=\int_{-\infty}^{\infty} xf(x)\,dx,\] provided the integral converges.

When the mean exists it is unique, since it is the result of an explicit calculation. The mean does not always exist.

The mean might look familiar; it is essentially identical to the center of mass of a one-dimensional beam, as discussed in section 2.6. The probability density function \(f\) plays the role of the physical density function, but now the "beam" has infinite length. If we consider only a finite portion of the beam, say between \(a\) and \(b\), then the center of mass is \[\bar{x}=\frac{\int_a^b xf(x)\,dx}{\int_a^b f(x)\,dx}.\] If we extend the beam to infinity, we get \[\bar{x}=\frac{\int_{-\infty}^{\infty} xf(x)\,dx}{\int_{-\infty}^{\infty} f(x)\,dx}=\int_{-\infty}^{\infty} xf(x)\,dx=E(X),\] because \(\int_{-\infty}^{\infty} f(x)\,dx=1\). In the center of mass interpretation, this integral is the total mass of the beam, which is always 1 when \(f\) is a probability density function.

Example 3.8.2

Find the mean of the standard normal distribution, whose density function is given by \(g(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\).

Solution

The mean of the standard normal distribution is

\[\int_{-\infty}^{\infty} \frac{xe^{-x^2/2}}{\sqrt{2\pi}}\,dx.\]

We compute the two halves:

\[\int_{-\infty}^{0} \frac{xe^{-x^2/2}}{\sqrt{2\pi}}\,dx=\lim_{t\to-\infty}\left.-\frac{e^{-x^2/2}}{\sqrt{2\pi}}\right|_t^0=-\frac{1}{\sqrt{2\pi}}\]

and

\[\int_{0}^{\infty} \frac{xe^{-x^2/2}}{\sqrt{2\pi}}\,dx=\lim_{t\to\infty}\left.-\frac{e^{-x^2/2}}{\sqrt{2\pi}}\right|_0^t=\frac{1}{\sqrt{2\pi}}.\]

The sum of these is 0, which is the mean.

Example 3.8.3

Find the mean of the exponential distribution, whose density function is given by \(f(x)=ce^{-cx}\) for \(x\ge 0\).

Solution

\[\begin{aligned}
\mu=E(X)&=\int_{-\infty}^{\infty} x\,ce^{-cx}\,dx=\int_0^{\infty} cxe^{-cx}\,dx\\
&=\lim_{t\to\infty}\int_0^t cxe^{-cx}\,dx\\
&=\lim_{t\to\infty}\left.\left[-xe^{-cx}-\frac{1}{c}e^{-cx}\right]\right|_0^t &&\text{using integration by parts}\\
&=\lim_{t\to\infty}\left[-te^{-ct}-\frac{1}{c}e^{-ct}-\left(0-\frac{1}{c}\right)\right]\\
&=\frac{1}{c} &&\text{using L'Hôpital's Rule}
\end{aligned}\]
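The result \(\mu=1/c\) can be sanity-checked numerically. In the sketch below (the midpoint-rule integrator and the choice \(c=2\) are illustrative; the upper limit \(50/c\) stands in for \(\infty\), since the integrand is negligible beyond it) we approximate \(\int_0^\infty cxe^{-cx}\,dx\):

```python
import math

def integrate(g, a, b, n=200_000):
    # midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

c = 2.0
# Mean of Exp(c): the integrand c*x*e^(-cx) is negligible past x = 50/c.
mu = integrate(lambda x: x * c * math.exp(-c * x), 0, 50 / c)
print(mu, 1 / c)  # the two values agree closely
```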

Definition: Median

Given a probability density function f for a continuous random variable X, the median value of X, denoted M, is a number such that

\[\int_{-\infty}^{M} f(x)\,dx=\int_{M}^{\infty} f(x)\,dx=\frac12.\]


Example 3.8.4

Find the median of the exponential distribution, whose density function is given by \(f(x)=ce^{-cx}\) for \(x\ge 0\).

Solution

The median of a probability distribution is the value \(M\) such that \(P(X\le M)=0.5\).

For the exponential distribution, the cumulative distribution function (CDF) is \(F(x)=P(X\le x)=1-e^{-cx}\), \(x\ge 0\).

To find the median, we set \(F(M)=0.5\) and solve for \(M\):

\[1-e^{-cM}=0.5\]

\[e^{-cM}=0.5\]

Taking the natural logarithm on both sides,

\[-cM=\ln(0.5)\]

\[M=-\frac{\ln(0.5)}{c}.\]

Since \(\ln(0.5)=-\ln 2\), we obtain \(M=\dfrac{\ln 2}{c}\).
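As a check, the formula \(M=\ln 2/c\) should satisfy \(F(M)=1/2\) exactly. A small sketch (the parameter value \(c=4\) is an arbitrary choice):

```python
import math

def exp_median(c):
    # Median of Exp(c): solve 1 - e^(-cM) = 1/2 for M
    return math.log(2) / c

c = 4.0
M = exp_median(c)
F_M = 1 - math.exp(-c * M)  # CDF evaluated at the median; should be 0.5
```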

Measures of Variability

While the mean is very useful, it typically is not enough information to properly evaluate a situation. For example, suppose we could manufacture an 11-sided die, with the faces numbered 2 through 12 so that each face is equally likely to be down when the die is rolled. The value of a roll is the value on this lower face. Rolling the die gives the same range of values as rolling two ordinary dice, but now each value occurs with probability 1/11. The expected value of a roll is \[\frac{2}{11}+\frac{3}{11}+\cdots+\frac{12}{11}=7.\] The mean does not distinguish the two cases, though of course they are quite different.

If \(f\) is a probability density function for a random variable \(X\), with mean \(\mu\), we would like to measure how far a "typical" value of \(X\) is from \(\mu\). One way to measure this distance is \((X-\mu)^2\); we square the difference so as to measure all distances as positive. To get the typical such squared distance, we compute the mean. For two dice, for example, we get \[(2-7)^2\frac{1}{36}+(3-7)^2\frac{2}{36}+\cdots+(7-7)^2\frac{6}{36}+\cdots+(11-7)^2\frac{2}{36}+(12-7)^2\frac{1}{36}=\frac{35}{6}.\] Because we squared the differences, this does not directly measure the typical distance we seek; if we take the square root of this we do get such a measure, \(\sqrt{35/6}\approx 2.42\). Doing the computation for the strange 11-sided die we get \[(2-7)^2\frac{1}{11}+(3-7)^2\frac{1}{11}+\cdots+(7-7)^2\frac{1}{11}+\cdots+(11-7)^2\frac{1}{11}+(12-7)^2\frac{1}{11}=10,\] with square root approximately 3.16. Comparing 2.42 to 3.16 tells us that the two-dice rolls clump somewhat more closely near 7 than the rolls of the weird die, which of course we already knew because these examples are quite simple.
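Both of these discrete variance computations can be reproduced exactly in a few lines (a sketch using Python's `fractions` module for exact arithmetic):

```python
from fractions import Fraction

# Sum of two fair dice: P(s) is the count of (i, j) pairs with
# i + j = s, out of 36 equally likely outcomes.
P_two = {s: Fraction(sum(1 for i in range(1, 7) for j in range(1, 7) if i + j == s), 36)
         for s in range(2, 13)}
var_two = sum((s - 7) ** 2 * P_two[s] for s in range(2, 13))  # 35/6

# Hypothetical 11-sided die: faces 2..12, each with probability 1/11.
var_eleven = sum((s - 7) ** 2 * Fraction(1, 11) for s in range(2, 13))  # 10
```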

To perform the same computation for a probability density function the sum is replaced by an integral, just as in the computation of the mean.

Definition: Variance and Standard Deviation

Let \(X\) be a continuous random variable with probability density function \(f\). Then the expected value of the squared distance, \[E\left((X-\mu)^2\right)=\int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx,\] is called the variance of \(X\). The square root of the variance is the standard deviation of \(X\), denoted \(\sigma\).

Example 3.8.5

Find the standard deviation of the standard normal distribution, whose density function is given by \(g(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\).

Solution

The variance of X is

\[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx\]

To compute the antiderivative, use integration by parts, with \(u=x\) and \(dv=xe^{-x^2/2}\,dx\).

This gives

\[\int x^2 e^{-x^2/2}\,dx=-xe^{-x^2/2}+\int e^{-x^2/2}\,dx\]

We cannot do the new integral, but we know its value when the limits are \(-\infty\) to \(\infty\), from our discussion of the standard normal distribution. Thus

\[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx=\lim_{t\to\infty}\left.-\frac{1}{\sqrt{2\pi}}xe^{-x^2/2}\right|_{-t}^{t}+\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx=0+\frac{1}{\sqrt{2\pi}}\sqrt{2\pi}=1\]

The standard deviation is then \(\sqrt{1}=1\).

Example 3.8.6

Find the variance and standard deviation of the exponential distribution, whose density function is given by \(f(x)=ce^{-cx}\) for \(x\ge 0\).

Solution

The variance σ2 of a distribution is given by

\[\begin{aligned}
\sigma^2=E\left((X-\mu)^2\right)&=\int_{-\infty}^{\infty}(x-\mu)^2 ce^{-cx}\,dx=\int_{-\infty}^{\infty}\left(x-\frac{1}{c}\right)^2 ce^{-cx}\,dx\\
&=\int_0^{\infty}\left(x-\frac{1}{c}\right)^2 ce^{-cx}\,dx=\lim_{t\to\infty}\int_0^t \left(x-\frac{1}{c}\right)^2 ce^{-cx}\,dx\\
&=\lim_{t\to\infty}\left.\left[-\left(x-\frac{1}{c}\right)^2 e^{-cx}-\frac{2}{c}\left(x-\frac{1}{c}\right)e^{-cx}-\frac{2}{c^2}e^{-cx}\right]\right|_0^t &&\text{using integration by parts}\\
&=\frac{1}{c^2} &&\text{using L'Hôpital's Rule}
\end{aligned}\]

The standard deviation \(\sigma\) is the square root of the variance. So, \(\sigma=\sqrt{\operatorname{Var}(X)}=\sqrt{1/c^2}=\dfrac{1}{c}\).

For example, for an exponential distribution with \(c=4\), the standard deviation is \(\sigma=\frac{1}{c}=\frac{1}{4}=0.25\).
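The value \(\sigma^2=1/c^2\) can also be checked by direct numerical integration of \(\int_0^\infty (x-\mu)^2 ce^{-cx}\,dx\). A sketch (the midpoint-rule integrator is my own helper, and \(50/c\) stands in for the infinite upper limit):

```python
import math

def integrate(g, a, b, n=200_000):
    # midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

c = 4.0
mu = 1 / c
# Variance of Exp(c): integrand negligible past x = 50/c
var = integrate(lambda x: (x - mu) ** 2 * c * math.exp(-c * x), 0, 50 / c)
sigma = math.sqrt(var)  # should be close to 1/c = 0.25
```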

Here is a simple example showing how these ideas can be useful.

Suppose it is known that, in the long run, 1 out of every 100 computer memory chips produced by a certain manufacturing plant is defective when the manufacturing process is running correctly. Suppose 1000 chips are selected at random and 15 of them are defective. This is more than the 'expected' number (10), but is it so many that we should suspect that something has gone wrong in the manufacturing process? We are interested in the probability that various numbers of defective chips arise; the probability distribution is discrete: there can only be a whole number of defective chips. But - under reasonable assumptions - the distribution is very close to a normal distribution, namely this one:

\[f(x)=\frac{1}{\sqrt{2\pi\cdot 1000(.01)(.99)}}\exp\left(-\frac{(x-10)^2}{2(1000)(.01)(.99)}\right),\]

which is pictured in Figure 3.8.4 (recall that exp(x)=ex).

Figure 3.8.4: Normal density function for the defective chips examples.

Now how do we measure how unlikely it is that under normal circumstances we would see 15 defective chips? We can't compute the probability of exactly 15 defective chips, as this would be \(\int_{15}^{15} f(x)\,dx=0\). We could compute \(\int_{14.5}^{15.5} f(x)\,dx\approx 0.036\); this means there is only a 3.6% chance that the number of defective chips is 15. (We cannot compute these integrals exactly; computer software has been used to approximate the integral values in this discussion.) But this is misleading: \(\int_{9.5}^{10.5} f(x)\,dx\approx 0.126\), which is larger, certainly, but still small, even for the "most likely" outcome. The most useful question, in most circumstances, is this: how likely is it that the number of defective chips is "far from" the mean? For example, how likely, or unlikely, is it that the number of defective chips is different by 5 or more from the expected value of 10? This is the probability that the number of defective chips is less than 5 or larger than 15, namely

\[\int_{-\infty}^{5} f(x)\,dx+\int_{15}^{\infty} f(x)\,dx\approx 0.11.\]

So there is an 11% chance that this happens: not large, but not tiny. Hence the 15 defective chips does not appear to be cause for alarm: about one time in nine we would expect to see the number of defective chips 5 or more away from the expected 10. How about 20? Here we compute \(\int_{-\infty}^{0} f(x)\,dx+\int_{20}^{\infty} f(x)\,dx\approx 0.0015\). So there is only a 0.15% chance that the number of defective chips is more than 10 away from the mean; this would typically be interpreted as too suspicious to ignore; it shouldn't happen if the process is running normally.

The big question, of course, is what level of improbability should trigger concern? It depends to some degree on the application, and in particular on the consequences of getting it wrong in one direction or the other. If we're wrong, do we lose a little money? A lot of money? Do people die? In general, the standard choices are 5% and 1%. So what we should do is find the number of defective chips that has only, let us say, a 1% chance of occurring under normal circumstances, and use that as the relevant number. In other words, we want to know when

\[\int_{-\infty}^{10-r} f(x)\,dx+\int_{10+r}^{\infty} f(x)\,dx<0.01.\]

A bit of trial and error shows that with r=8 the value is about 0.011, and with r=9 it is about 0.004, so if the number of defective chips is 19 or more, or 1 or fewer, we should look for problems. If the number is high, we worry that the manufacturing process has a problem, or conceivably that the process that tests for defective chips is not working correctly and is flagging good chips as defective. If the number is too low, we suspect that the testing procedure is broken, and is not detecting defective chips.
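The tail probabilities quoted in this discussion are easy to reproduce. The sketch below (illustrative; it evaluates the normal CDF via the error function rather than numerical integration) computes \(P(|X-10|>r)\) for this density, whose variance is \(1000(.01)(.99)=9.9\):

```python
import math

mu = 10.0
var = 1000 * 0.01 * 0.99   # variance of the defect count: 9.9
sigma = math.sqrt(var)

def normal_cdf(x):
    # CDF of N(mu, sigma), expressed with the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def tail_prob(r):
    # P(X < 10 - r) + P(X > 10 + r)
    return normal_cdf(10 - r) + (1 - normal_cdf(10 + r))

# tail_prob(5) is about 0.11, tail_prob(8) about 0.011,
# tail_prob(9) about 0.004, matching the trial-and-error values above.
```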

Contributors and Attributions


This page titled 3.8: Probability is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by David Guichard via source content that was edited to the style and standards of the LibreTexts platform.
