# 01 Introduction



## What is probability theory?

In this course we'll learn about probability theory. But what exactly *is* probability theory? Like some mathematical fields (but unlike others), it has a dual role:

- It is a **rigorous mathematical theory**, with definitions, lemmas, theorems, proofs, etc.
- It is a **mathematical model** that purports to explain or model real-life phenomena.

We will concentrate on the rigorous mathematical aspects, but we will try not to forget the connections to the intuitive notion of real-life probability. These connections will enhance our intuition, and they make probability an extremely useful tool in all the sciences. And they make the study of probability much more *fun*, too! A note of caution is in order, though: mathematical models are only as good as the assumptions they are based on. So probability can be *used*, and it can be (and quite frequently is) *abused*....

Example |
---|

The theory of differential equations is another mathematical theory which has the dual role of a rigorous theory and an applied mathematical model. Game theory is another one (for which the question of how to apply it to real-life situations is often very contentious). On the other hand, number theory, complex analysis, and algebraic topology are examples of fields which are not normally used to model real-life phenomena. |

## The algebra of events

A central notion in probability is that of the **algebra of events** (we'll clarify later what the word "algebra" means here). We begin with an informal discussion. We imagine that probability is a function, denoted \(\prob\), that takes as its argument an "event" (i.e., the occurrence of something in a real-life situation involving uncertainty) and returns a real number in \([0,1]\) representing how likely this event is to occur. For example, if a fair coin is tossed 10 times and we denote the results of the tosses by \(X_1,X_2,\ldots,X_{10}\) (where each \(X_i\) is \(0\) or \(1\), signifying "tails" or "heads" respectively), then we can write statements like

$$ \prob(X_i = 0) = 1/2, \qquad (1\le i\le 10), $$

$$ \prob\left(\sum_{i=1}^{10} X_i = 4\right) = \frac{\binom{10}{4}}{2^{10}}. $$
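The second formula can be checked by brute force, under the assumption (implicit in the word "fair") that all \(2^{10}\) sequences of tosses are equally likely. The following Python sketch enumerates the sequences and counts those with exactly four heads; the function name `prob_heads_count` is introduced here purely for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_heads_count(n_tosses, n_heads):
    """Exact probability that n_tosses fair coin tosses show n_heads heads.

    Assumption: all 2**n_tosses outcome sequences are equally likely.
    """
    outcomes = product((0, 1), repeat=n_tosses)  # 0 = tails, 1 = heads
    favorable = sum(1 for seq in outcomes if sum(seq) == n_heads)
    return Fraction(favorable, 2 ** n_tosses)


p = prob_heads_count(10, 4)
# The enumeration agrees with the closed formula binom(10, 4) / 2**10.
assert p == Fraction(comb(10, 4), 2 ** 10)
print(p, float(p))  # 105/512 0.205078125
```

Enumeration is feasible here only because \(2^{10} = 1024\) is small; the binomial formula is what scales to larger numbers of tosses.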

Note that if \(A\) and \(B\) represent events (meaning, for the purposes of the present informal discussion, objects that have a well-defined probability), then we expect that the phrases "\(A\) did not occur", "\(A\) and \(B\) both occurred" and "at least one of \(A\) and \(B\) occurred" also represent events. We can use notation borrowed from mathematical logic and denote these new events by \(\lnot A\), \(A\wedge B\), and \(A \vee B\), respectively.

Thus, the set of events is not just a set -- it is a set with some extra structure, namely the ability to perform **negation**, **conjunction** and **disjunction** operations on its elements. Such a set is called an **algebra** in some contexts.
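One way to make these three operations concrete is to represent an event as the set of outcomes for which it occurs. This is an assumption of the sketch below, not something the informal discussion has established. Under that representation negation, conjunction and disjunction become complement, intersection and union. The names `OMEGA`, `event`, `neg`, `conj`, `disj` and `prob` are hypothetical helpers, and the probabilities assume three fair tosses with all outcomes equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: all outcomes of 3 coin tosses (0 = tails, 1 = heads).
OMEGA = frozenset(product((0, 1), repeat=3))


def event(predicate):
    """The event consisting of all outcomes satisfying the predicate."""
    return frozenset(w for w in OMEGA if predicate(w))


def neg(a):
    """'a did not occur': complement within the sample space."""
    return OMEGA - a


def conj(a, b):
    """'a and b both occurred': intersection."""
    return a & b


def disj(a, b):
    """'at least one of a and b occurred': union."""
    return a | b


def prob(a):
    """Probability of an event, assuming equally likely outcomes."""
    return Fraction(len(a), len(OMEGA))


A = event(lambda w: w[0] == 1)    # the first toss is heads
B = event(lambda w: sum(w) >= 2)  # at least two heads in total

print(prob(neg(A)), prob(conj(A, B)), prob(disj(A, B)))  # 1/2 3/8 5/8
# De Morgan's law holds for events just as it does in logic.
assert neg(disj(A, B)) == conj(neg(A), neg(B))
```

The point of the sketch is only that the collection of events is closed under the three operations: each of `neg`, `conj` and `disj` returns another subset of `OMEGA`, which again has a well-defined probability.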

But what if the coin is tossed an *infinite* number of times? In other words, we now imagine an infinite sequence \(X_1,X_2,X_3,\ldots\) of (independent) coin toss results. We want to be able to ask questions such as

\begin{eqnarray*}
\prob(\textrm{infinitely many of the $X_i$'s are 0}) &=& ? \\
\prob\left(\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n X_k = \frac12\right) &=& ? \\
\prob\left(\sum_{k=1}^\infty \frac{2X_k-1}{k}\textrm{ converges}\right) &=& ?
\end{eqnarray*}

Do such questions make sense? (And if they do, can you guess what the answers are?) Maybe it is not enough to have an *informal* discussion to answer this...
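A finite simulation cannot settle questions about infinitely many tosses, but it can suggest what to expect. The Python sketch below tracks the running average \(\frac{1}{n}\sum_{k=1}^n X_k\) and the partial sum \(\sum_{k=1}^n \frac{2X_k-1}{k}\) for simulated fair tosses. The function name `simulate` and the fixed seed are choices made for this illustration only.

```python
import random


def simulate(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses X_1, ..., X_n.

    Returns (running average of the X_k,
             partial sum of (2*X_k - 1) / k for k = 1..n).
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    heads = 0
    partial_sum = 0.0
    for k in range(1, n_tosses + 1):
        x = rng.randint(0, 1)  # 0 = tails, 1 = heads
        heads += x
        partial_sum += (2 * x - 1) / k
    return heads / n_tosses, partial_sum


# With the same seed, each longer run extends the shorter ones.
for n in (100, 10_000, 100_000):
    average, partial_sum = simulate(n)
    print(n, average, partial_sum)
```

Observing the running average settle near \(1/2\), or the partial sums stabilize, is evidence about one finite prefix of one sequence of tosses; it is not a statement about the probability of an event defined through the whole infinite sequence.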

Example 2 |
---|

(a) An urn initially contains a white ball and a black ball. A ball is drawn at random from the urn, then returned to the urn together with one additional white ball. This procedure is repeated infinitely many times, so that after step \(n\) the urn contains 1 black ball and \(n+1\) white balls. For each \(n\ge1\), let \(A_n\) denote the event that at step \(n\) the black ball was drawn. Now let \(A_\infty\) denote the event \[ A_\infty = \textrm{"in total, the black ball was selected infinitely many times"}, \] (i.e., the event that infinitely many of the events \(A_n\) occurred). (b) While this experiment takes place, an identical copy of the experiment is taking place in the next room. The random selections in the two neighboring rooms have no connection to each other, i.e., they are "independent". For each \(n\ge 1\), let \(B_n\) be the event that at step \(n\) the black ball was drawn out of the "copy" experiment urn. Now let \(B_\infty\) denote the event \[ B_\infty = \textrm{"in total, the black ball was selected infinitely many times in the copy experiment"}, \] (in other words, the event that infinitely many of the events \(B_n\) occurred). (c) For each \(n\ge 1\), let \(C_n\) be the event that both \(A_n\) and \(B_n\) occurred, i.e. \[ C_n = A_n \wedge B_n, \] and let \(C_\infty\) denote the event "\(C_n\) occurred for infinitely many values of \(n\)". |

Theorem |
---|

We have \[ \prob(A_\infty) = \prob(B_\infty) = 1, \qquad \prob(C_\infty) = 0. \] |

Proof |
---|

These claims are consequences of the Borel-Cantelli lemmas; here we only sketch, still informally, why \(\prob(C_\infty)=0\). For each \(n\ge 1\) we have \[ \prob(A_n) = \prob(B_n) = \frac{1}{n+1}, \] since at time \(n\) each of the urns contains \(n+1\) balls, only one of which is black. Moreover, the choices in both rooms are made independently, so we have \[ \prob(C_n) = \prob(A_n \wedge B_n) = \prob(A_n) \prob(B_n) = \frac{1}{(n+1)^2}. \] It turns out that to prove that \(\prob(C_\infty)=0\), the only relevant bit of information is that the infinite series \(\sum_{n=1}^\infty \prob(C_n)\) is a convergent series; the precise values of the probabilities are irrelevant. Indeed, we can try to do various manipulations on the definition of the event \(C_\infty\), as follows: \begin{eqnarray*} C_\infty &=& \textrm{"infinitely many of the $C_n$'s occurred"} \\ &=& \textrm{"for all $N\ge 1$, the event $C_n$ occurred for some $n\ge N$"}. \end{eqnarray*} For each \(N\ge 1\), denote by \(D_N\) the event "\(C_n\) occurred for some \(n\ge N\)". Then \begin{eqnarray*} C_\infty &=& \textrm{"for all $N\ge 1$, $D_N$ occurred"} \\ &=& D_1 \wedge D_2 \wedge D_3 \wedge \ldots \quad \textrm{(infinite conjunction)}. \end{eqnarray*} In particular, in order for the event \(C_\infty\) to happen, \(D_N\) must happen for any fixed value of \(N\) (for example, \(D_{100}\) must happen, \(D_{101}\) must happen, etc.). It follows that \(C_\infty\) is at most as likely to happen as any of the \(D_N\)'s; in other words we have \[ \prob(C_\infty) \le \prob(D_N), \qquad (N\ge 1). \] Now, what can we say about \(\prob(D_N)\)? Looking at the definition of \(D_N\), we see that it too can be written as an infinite disjunction of events, namely \begin{eqnarray*} D_N &=& C_N \vee C_{N+1} \vee C_{N+2} \vee \ldots \quad \textrm{(infinite disjunction)}. \end{eqnarray*} If this were a finite disjunction, we could say that the likelihood for at least one of the events to happen is at most the sum of the likelihoods (for example, the probability that it will rain next weekend is at most the probability that it will rain next Saturday, plus the probability that it will rain next Sunday). An informal discussion cannot tell us whether the same bound holds for an infinite disjunction. If we assume that it does, we get for each \(N\ge 1\) the bound \[ \prob(D_N) \le \sum_{n=N}^\infty \prob(C_n). \] But now recall that the infinite series of probabilities \(\sum_{n=1}^\infty \prob(C_n)\) converges. Therefore, for any \(\epsilon>0\), we can find an \(N\) for which the tail \(\sum_{n=N}^\infty \prob(C_n)\) of the series is less than \(\epsilon\). For such an \(N\), we get that \(\prob(D_N)< \epsilon\), and therefore that \(\prob(C_\infty)<\epsilon\).
This is true for any \(\epsilon>0\), so it follows that \(\prob(C_\infty)=0\). |
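The role of convergence in the argument above can be seen numerically. The Python sketch below evaluates the tail \(\sum_{n=N}^\infty \frac{1}{(n+1)^2}\), which bounds \(\prob(D_N)\), using the known value \(\sum_{k=1}^\infty \frac{1}{k^2} = \frac{\pi^2}{6}\). For contrast, it also evaluates the partial sums of \(\prob(A_n) = \frac{1}{n+1}\). The function names `tail_sum` and `harmonic_partial` are illustrative choices, and the values are floating-point approximations.

```python
from math import pi


def tail_sum(N):
    """Approximate sum over n >= N of 1/(n+1)^2, the bound on P(D_N).

    Re-indexing with k = n + 1, this equals the sum over k >= N + 1 of 1/k^2,
    i.e. pi^2/6 minus the first N terms of the series sum 1/k^2.
    """
    return pi ** 2 / 6 - sum(1.0 / k ** 2 for k in range(1, N + 1))


def harmonic_partial(N):
    """Sum over 1 <= n <= N of P(A_n) = 1/(n+1)."""
    return sum(1.0 / (n + 1) for n in range(1, N + 1))


for N in (10, 100, 1000):
    # The tail lies between 1/(N+1) and 1/N, so it shrinks to 0 as N grows.
    assert 1 / (N + 1) < tail_sum(N) < 1 / N
    print(N, tail_sum(N), harmonic_partial(N))
```

The tail for \(C_n\) can be made smaller than any \(\epsilon>0\) by taking \(N > 1/\epsilon\), which is exactly what the proof uses. The partial sums for \(A_n\), by contrast, keep growing (roughly like \(\log N\)), so the same upper-bound argument gives no information about \(\prob(A_\infty)\).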

## Contributors

- Dan Romik (UC Davis)