10.5: Central Tendency
Consider the following two situations:
- Situation 1. A small town decides to hold a lottery to raise funds for charitable purposes. A total of 10,001 tickets are sold, and the tickets are labeled with numbers from the set {0,1,2,…,10,000}. At a public ceremony, duplicate tickets are placed in a big box, and the mayor draws the winning ticket from the box. Just to heighten the suspense as to who has actually won the prize, the mayor reports that the winning number is at least 7,500. The citizens ooh and aah and they can't wait to see who among them will be the final winner.
- Situation 2. Behind a curtain, a fair coin is tossed 10,000 times, and the number of heads is recorded by an observer, who is reputed to be honest and impartial. Again, the outcome is an integer in the set {0,1,2,…,10,000}. The observer then emerges from behind the curtain and announces that the number of heads is at least 7,500. There is a pause and then someone says “What? Are you out of your mind?”
So we have two probability spaces, both with sample space \(S=\{0,1,2,…,10,000\}\). For each, we have a random variable \(X\): the winning ticket number in the first situation and the number of heads in the second. In each case, the expected value \(E(X)\) of the random variable \(X\) is 5,000. In the first case, we are not all that surprised at an outcome far from the expected value, while in the second, it seems intuitively clear that this is an extraordinary occurrence. The mathematical concept here is referred to as central tendency, and it helps us to understand just how likely a random variable is to stray from its expected value.
For starters, we have the following elementary result, known as Markov's Inequality.
Let \(X\) be a random variable in a probability space \((S,P)\) . Then for every \(k>0\) ,
\(P(|X| \geq k) \leq E(|X|)/k\).
Proof.
Of course, the inequality holds trivially unless \(k>E(|X|)\). For \(k\) in this range, we establish the equivalent inequality: \(kP(|X| \geq k) \leq E(|X|)\).
\(kP(|X| \geq k) = \displaystyle \sum_{r \geq k} kP(|X| = r)\)
\( \leq \displaystyle \sum_{r \geq k} rP(|X| = r)\)
\( \leq \displaystyle \sum_{r>0} rP(|X| = r)\)
\(= E(|X|)\).
To make Markov's inequality more concrete, we see that on the basis of this trivial result, the probability that either the winning lottery ticket or the number of heads is at least 7,500 is at most 5000/7500=2/3. So nothing alarming here in either case. Since we still feel that the two cases are quite different, a more subtle measure will be required.
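The following is a minimal sketch, not part of the original text, that evaluates Markov's bound for the lottery of Situation 1 and compares it with the exact tail probability under the assumption that all 10,001 tickets are equally likely. The helper name `markov_bound` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def markov_bound(expected_abs, k):
    """Upper bound E(|X|)/k on P(|X| >= k), capped at 1 (hypothetical helper)."""
    return min(Fraction(1), Fraction(expected_abs) / k)


# Situation 1, assuming each ticket number 0..10000 is equally likely.
n = 10_000
ticket_prob = Fraction(1, n + 1)

# E(X) computed directly from the definition of expectation.
expected = sum(v * ticket_prob for v in range(n + 1))

# Markov's bound on P(X >= 7500) and the exact tail probability.
bound = markov_bound(expected, 7_500)
exact_tail = sum(ticket_prob for v in range(n + 1) if v >= 7_500)

print("E(X)         =", expected)
print("Markov bound =", bound)
print("exact tail   =", exact_tail, "~", float(exact_tail))
```

The exact tail never exceeds the bound, but the bound depends only on \(E(X)\), so it cannot distinguish the lottery from the coin tosses.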
10.5.1 Variance and Standard Deviation
Again, let \((S,P)\) be a probability space and let \(X\) be a random variable. The quantity \(E((X−E(X))^2)\) is called the variance of \(X\) and is denoted \(var(X)\). Evidently, the variance of \(X\) is a non-negative number. The standard deviation of \(X\), denoted \(\sigma_X\), is then defined as the quantity \(\sqrt{var(X)}\), i.e., \(\sigma_X^2 = var(X)\).
For the spinner shown at the beginning of the chapter, let \(X(i)=i^2\) when the pointer stops in region \(i\). Then we have already noted that the expectation \(E(X)\) of the random variable \(X\) is 109/8. It follows that the variance var(\(X\)) is:
\(var(X) = (1^2 - \dfrac{109}{8})^2 \dfrac{1}{8} + (2^2 - \dfrac{109}{8})^2 \dfrac{1}{4} + (3^2 - \dfrac{109}{8})^2 \dfrac{1}{8} + (4^2 - \dfrac{109}{8})^2 \dfrac{1}{8} + (5^2 - \dfrac{109}{8})^2 \dfrac{3}{8}\)
\( = (101^2 \cdot 1 + 77^2 \cdot 2 + 37^2 \cdot 1 + 19^2 \cdot 1 + 91^2 \cdot 3)/512\)
\(= 48632/512 = 6079/64\)
It follows that the standard deviation \(\sigma_X\) of \(X\) is then \(\sqrt{6079/64} \approx 9.746\).
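As a cross-check, here is a short sketch that evaluates the defining sum \(E((X−E(X))^2)\) with exact rational arithmetic. The region probabilities are an assumption: they are read off from the weights in the variance expansion above, since the spinner figure itself is not reproduced in this section.

```python
from fractions import Fraction
from math import sqrt

# Assumed probabilities of the pointer stopping in regions 1..5,
# taken from the weights used in the variance expansion above.
region_prob = {
    1: Fraction(1, 8),
    2: Fraction(1, 4),
    3: Fraction(1, 8),
    4: Fraction(1, 8),
    5: Fraction(3, 8),
}


def X(i):
    """The random variable X(i) = i^2 from the text."""
    return i * i


# The probabilities must sum to 1 for this to be a probability space.
assert sum(region_prob.values()) == 1

mean = sum(X(i) * p for i, p in region_prob.items())
variance = sum((X(i) - mean) ** 2 * p for i, p in region_prob.items())
std_dev = sqrt(variance)

print("E(X)    =", mean)
print("var(X)  =", variance)
print("sigma_X =", round(std_dev, 3))
```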
Suppose that \(0<p<1\) and consider a series of \(n\) Bernoulli trials with the probability of success being \(p\), and let \(X\) count the number of successes. We have already noted that \(E(X)=np\). Now we claim the variance of \(X\) is given by:
\(var(X) = \displaystyle \sum_{i=0}^n (i -np)^2 \dbinom{n}{i}p^i (1-p)^{n-i} = np(1-p)\)
There are several ways to establish this claim. One way is to proceed directly from the definition, using the same method we used previously to obtain the expectation. But now you need also to calculate the second derivative. Here is a second approach, one that capitalizes on the fact that separate trials in a Bernoulli series are independent.
Let \(\mathcal{F}=\{X_1,X_2,…,X_n\}\) be a family of random variables in a probability space \((S,P)\). We say the family \(\mathcal{F}\) is independent if for each \(i\) and \(j\) with \(1 \leq i<j \leq n\), and for each pair \(a,b\) of real numbers, the following two events are independent: \(\{x \in S:X_i(x) \leq a\}\) and \(\{x \in S:X_j(x) \leq b\}\). When the family is independent, it is straightforward to verify that
\(var(X_1+X_2+ \cdots +X_n)=var(X_1)+var(X_2)+ \cdots +var(X_n)\).
With the aid of this observation, the calculation of the variance of the random variable \(X\) which counts the number of successes becomes a trivial calculation. But in fact, the entire treatment we have outlined here is just a small part of a more complex subject which can be treated more elegantly and ultimately much more compactly—provided you first develop additional background material on families of random variables. For this we will refer you to suitable probability and statistics texts, such as those given in our references.
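The claim \(var(X)=np(1-p)\) can be checked numerically for small cases. The sketch below compares the direct sum from the definition with the value obtained by adding the variances of \(n\) independent single trials. The function names and the test values \(n=12\), \(p=1/3\) are arbitrary choices made for illustration, not part of the original text.

```python
from fractions import Fraction
from math import comb


def binomial_variance_direct(n, p):
    """Variance of the number of successes, summed straight from the definition."""
    mean = n * p
    return sum(
        (i - mean) ** 2 * comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n + 1)
    )


def binomial_variance_by_independence(n, p):
    """Variance obtained by adding the variances of n independent trials."""
    # One trial takes value 1 with probability p and 0 with probability 1 - p,
    # and its expectation is p.
    single_trial = (0 - p) ** 2 * (1 - p) + (1 - p) ** 2 * p
    return n * single_trial


n, p = 12, Fraction(1, 3)  # arbitrary illustrative values
direct = binomial_variance_direct(n, p)
by_independence = binomial_variance_by_independence(n, p)

print(direct, by_independence, n * p * (1 - p))
```

Using `Fraction` keeps every term exact, so the three printed values can be compared for equality rather than up to rounding error.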
Let \(X\) be a random variable in a probability space \((S,P)\) . Then \(var(X) = E(X^2) - E^2(X)\) .
Proof.
Let \(E(X)=μ\). From its definition, we note that
\(var(X) = \displaystyle \sum_{r} (r - μ)^2 prob (X = r)\)
\( = \displaystyle \sum_{r} (r^2 - 2rμ + μ^2)prob(X = r)\)
\( = \displaystyle \sum_{r} r^2 prob(X=r) - 2μ \sum_{r} r \, prob(X = r) + μ^2 \sum_{r} prob(X=r)\)
\( = E(X^2) - 2μ^2 + μ^2\)
\( = E(X^2) - μ^2\)
\( = E(X^2) - E^2(X)\).
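As a quick illustration (a worked example added here, not taken from the original text), apply this formula to a single Bernoulli trial \(X_1\) that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\), so that \(E(X_1)=p\). Since \(X_1^2=X_1\), we have
\(E(X_1^2) = 0^2 \cdot (1-p) + 1^2 \cdot p = p\)
\(var(X_1) = E(X_1^2) - E^2(X_1) = p - p^2 = p(1-p)\).
Adding the variances of \(n\) independent trials then gives \(np(1-p)\), in agreement with the value claimed for the number of successes in a series of \(n\) Bernoulli trials.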
Variance and standard deviation are quite useful tools in discussions of just how likely a random variable is to be near its expected value. This is reflected in the following theorem, known as Chebyshev's Inequality.
Let \(X\) be a random variable in a probability space \((S,P)\) , and let \(k>0\) be a positive real number. If the expectation \(E(X)\) of \(X\) is \(μ\) and the standard deviation is \(σ_X\) , then
prob(\(|X - E(X)| \leq k \sigma_X) \geq 1 - \dfrac{1}{k^2}\).
Proof.
Let \(A=\{r \in \mathbb{R}:|r−μ|>kσ_X\}\).
Then we have:
var(\(X\))=\(E((X−μ)^2)\)
\(= \sum_{r \in \mathbb{R}} (r−μ)^2\) prob(\(X=r\))
\(≥\sum_{r \in A}(r−μ)^2\) prob(\(X=r\))
\(≥k^2σ_X^2 \sum_{r \in A}\) prob(\(X=r\))
\(=k^2σ_X^2\)prob(\(|X−μ|>kσ_X\)).
Since var(\(X\))=\(σ_X^2\), we may now deduce that \(1/k^2 \geq \) prob(\(|X−μ|>kσ_X\)). Therefore, since prob(\(|X−μ|≤kσ_X)=1\)−prob(\(|X−μ|>kσ_X\)), we conclude that
prob(\(|X - μ| \leq k \sigma_X) \geq 1 - \dfrac{1}{k^2}\).
Here's an example of how Chebyshev's Inequality can be applied. Consider \(n\) tosses of a fair coin with \(X\) counting the number of heads. As noted before, \(μ=E(X)=n/2\) and \(var(X)=n/4\), so \(σ_X=\sqrt{n}/2\). When \(n=10,000\), we have \(μ=5,000\) and \(σ_X=50\). Setting \(k=50\) so that \(kσ_X=2500\), we see that the probability that \(X\) is within 2500 of the expected value of 5000 is at least \(1-1/50^2=0.9996\). So the probability that the number of heads is at least 7,500 is at most 0.0004, which makes such an outcome very unlikely indeed.
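The arithmetic in this example can be reproduced with a few lines of code. This is only a sketch of the calculation for the fair-coin case; the function name `chebyshev_lower_bound` is a hypothetical helper, not notation from the text.

```python
from math import sqrt


def chebyshev_lower_bound(k):
    """Lower bound 1 - 1/k^2 on prob(|X - mu| <= k * sigma), for k > 0."""
    return 1 - 1 / k**2


n = 10_000            # number of fair coin tosses
mu = n / 2            # E(X) = n/2
sigma = sqrt(n) / 2   # sigma_X = sqrt(n)/2

deviation = 2_500     # distance from mu to the announced value 7,500
k = deviation / sigma

within = chebyshev_lower_bound(k)  # bound on prob(|X - mu| <= 2500)
outside = 1 - within               # bound on prob(|X - mu| > 2500)

print("k =", k)
print("prob within 2500 of the mean is at least", within)
print("prob outside is at most", round(outside, 6))
```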
Going back to lottery tickets, if we make the rational assumption that all ticket numbers are equally likely, then the probability that the winning number is at least 7,500 is exactly 2501/10001, which is very close to 1/4.
In the case of Bernoulli trials, we can use basic properties of binomial coefficients to make even more accurate estimates. Clearly, in the case of coin tossing, the probability that the number of heads in 10,000 tosses is at least 7,500 is given by
\(\displaystyle \sum_{i = 7,500}^{10,000} \dbinom{10,000}{i}/2^{10,000}\)
Now a computer algebra system can make this calculation exactly, and you are encouraged to check it out just to see how truly small this quantity actually is.
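For readers without a computer algebra system at hand, the sum can also be evaluated exactly with arbitrary-precision integers. The sketch below is one possible approach, not a prescribed method, and the function name `binomial_upper_tail_count` is hypothetical. It builds each binomial coefficient from the previous one using \(\binom{n}{i+1}=\binom{n}{i}(n-i)/(i+1)\).

```python
from fractions import Fraction
from math import comb, log10


def binomial_upper_tail_count(n, threshold):
    """Exact value of C(n, threshold) + C(n, threshold + 1) + ... + C(n, n)."""
    term = comb(n, threshold)
    total = 0
    for i in range(threshold, n + 1):
        total += term
        # C(n, i+1) = C(n, i) * (n - i) / (i + 1); the division is always exact.
        term = term * (n - i) // (i + 1)
    return total


n, threshold = 10_000, 7_500
tail_count = binomial_upper_tail_count(n, threshold)

# Exact probability as a rational number.
probability = Fraction(tail_count, 2**n)

# Base-10 logarithm of the probability, computed from the exact integers.
log10_probability = log10(tail_count) - n * log10(2)

print("log10 of the probability:", log10_probability)
```

The logarithm is reported because the probability itself is far too small to be represented as an ordinary floating-point number, although the `Fraction` value remains exact.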