Mathematics LibreTexts

1.1: Introduction


    What is probability theory?

    In this course we'll learn about probability theory. But what exactly is probability theory? Like some other mathematical fields (but unlike some others), it has a dual role:

    1. It is a rigorous mathematical theory -- with definitions, lemmas, theorems, proofs etc.
    2. It is a mathematical model that purports to explain or model real-life phenomena.

    We will concentrate on the rigorous mathematical aspects, but we will try not to forget the connections to the intuitive notion of real-life probability. These connections will enhance our intuition, and they make probability an extremely useful tool in all the sciences. And they make the study of probability much more fun, too! A note of caution is in order, though: mathematical models are only as good as the assumptions they are based on. So probability can be used, and it can be (and quite frequently is) abused....

    The theory of differential equations is another mathematical theory which has the dual role of a rigorous theory and an applied mathematical model. Game theory is another one (for which the question of how to apply it to real-life situations is often very contentious). On the other hand, number theory, complex analysis, and algebraic topology are examples of fields which are not normally used to model real-life phenomena.

    The algebra of events

    A central notion in probability is that of the algebra of events (we'll clarify later what the word ``algebra'' means here). We begin with an informal discussion. We imagine that probability is a function, denoted \(\prob\), that takes as its argument an ``event'' (i.e., the occurrence of something in a real-life situation involving uncertainty) and returns a real number in \([0,1]\) representing how likely this event is to occur. For example, if a fair coin is tossed 10 times and we denote the results of the tosses by \(X_1,X_2,\ldots,X_{10}\) (where each \(X_i\) is \(0\) or \(1\), signifying ``tails'' or ``heads'' respectively), then we can write statements like

    \[ \prob(X_i = 0) = 1/2, \qquad (1\le i\le 10), \]

    \[ \prob\left(\sum_{i=1}^{10} X_i = 4\right) = \frac{\binom{10}{4}}{2^{10}}. \]

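    Both statements can be checked by brute force. The following is a minimal sketch, assuming that all \(2^{10}\) sequences of toss results are equally likely (which is how we read ``fair coin'' here); the names `outcomes` and `prob` are ours and purely illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

N_TOSSES = 10

# All 2**10 outcomes; each is a tuple (x_1, ..., x_10) of 0s and 1s.
# Assumption of this sketch: every outcome is equally likely.
outcomes = list(product((0, 1), repeat=N_TOSSES))


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for w in outcomes if event(w))
    return Fraction(favourable, len(outcomes))


# P(X_i = 0) for each i (stored at index i - 1 of the tuple).
p_tails = [prob(lambda w, i=i: w[i] == 0) for i in range(N_TOSSES)]

# P(X_1 + ... + X_10 = 4).
p_four_heads = prob(lambda w: sum(w) == 4)

print(all(p == Fraction(1, 2) for p in p_tails))                  # True
print(p_four_heads == Fraction(comb(N_TOSSES, 4), 2**N_TOSSES))   # True
```
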
    Note that if \(A\) and \(B\) represent events (meaning, for the purposes of the present informal discussion, objects that have a well-defined probability), then we expect that the phrases ``\(A\) did not occur'', ``\(A\) and \(B\) both occurred'' and ``at least one of \(A\) and \(B\) occurred'' also represent events. We can use notation borrowed from mathematical logic and denote these new events by \(\lnot A\), \(A\wedge B\), and \(A \vee B\), respectively.

    Thus, the set of events is not just a set -- it is a set with some extra structure, namely the ability to perform negation, conjunction and disjunction operations on its elements. Such a set is called an algebra in some contexts.
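    One concrete way to picture this structure, offered here only as an illustration and not yet as the formal definition, is to represent events as sets of outcomes. Negation, conjunction and disjunction then become complement, intersection and union. The sketch below assumes three fair coin tosses with equally likely outcomes; `OMEGA`, `neg`, `conj` and `disj` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Sample space for 3 fair coin tosses (kept small so the sets are easy to inspect).
OMEGA = frozenset(product((0, 1), repeat=3))


def prob(event):
    """Uniform probability of an event represented as a subset of OMEGA."""
    return Fraction(len(event), len(OMEGA))


def neg(a):
    """'a did not occur': complement within the sample space."""
    return OMEGA - a


def conj(a, b):
    """'a and b both occurred': intersection."""
    return a & b


def disj(a, b):
    """'at least one of a and b occurred': union."""
    return a | b


A = frozenset(w for w in OMEGA if w[0] == 1)     # first toss is heads
B = frozenset(w for w in OMEGA if sum(w) >= 2)   # at least two heads

print(prob(neg(A)))       # 1/2
print(prob(conj(A, B)))   # 3/8
print(prob(disj(A, B)))   # 5/8
```

    Every result of these operations is again a subset of `OMEGA`, so it is again an object with a well-defined probability.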

    But what if the coin is tossed an infinite number of times? In other words, we now imagine an infinite sequence \(X_1,X_2,X_3,\ldots\) of (independent) coin toss results. We want to be able to ask questions such as

    \[ \begin{aligned}
    \prob(\textrm{infinitely many of the } X_i\textrm{'s are 0}) &= \ ? \\
    \prob\left(\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n X_k = \frac12\right) &= \ ? \\
    \prob\left(\sum_{k=1}^\infty \frac{2X_k-1}{k}\textrm{ converges}\right) &= \ ?
    \end{aligned} \]

    Do such questions make sense? (And if they do, can you guess what the answers are?) Maybe it is not enough to have an informal discussion to answer this...
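    A computer can only ever look at finitely many tosses, so it cannot settle questions of this kind, but it can help us form a guess. The sketch below simulates the first \(n\) tosses with Python's pseudo-random generator (an assumption standing in for a fair coin) and reports the number of zeros, the running average, and the partial sum of \(\sum_k (2X_k-1)/k\). The function name `simulate` is ours.

```python
import random


def simulate(n, seed=0):
    """Simulate n fair coin tosses and return three finite-n summaries."""
    rng = random.Random(seed)
    tosses = [rng.randint(0, 1) for _ in range(n)]
    zeros = tosses.count(0)                       # how many X_k equal 0
    average = sum(tosses) / n                     # (1/n) * (X_1 + ... + X_n)
    signed_sum = sum((2 * x - 1) / k              # partial sum of (2 X_k - 1) / k
                     for k, x in enumerate(tosses, start=1))
    return zeros, average, signed_sum


for n in (100, 10_000, 1_000_000):
    zeros, average, signed_sum = simulate(n)
    print(n, zeros, round(average, 4), round(signed_sum, 4))
```

    Whatever pattern the printed numbers suggest, it is evidence about finite prefixes only; statements about the whole infinite sequence need the formal theory.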

    Example 2

    (a) An urn initially contains a white ball and a black ball. A ball is drawn at random from the urn and then returned, and another white ball is added to the urn. This procedure is repeated infinitely many times, so that after step \(n\) the urn contains 1 black ball and \(n+1\) white balls. For each \(n\ge1\), let \(A_n\) denote the event that at step \(n\) the black ball was drawn. Now let \(A_\infty\) denote the event

    \[ A_\infty = \textrm{``in total, the black ball was selected infinitely many times''}, \]

    (i.e., the event that infinitely many of the events \(A_n\) occurred).

    (b) While this experiment takes place, an identical copy of the experiment is taking place in the next room. The random selections in the two neighboring rooms have no connection to each other, i.e., they are ``independent''. For each \(n\ge 1\), let \(B_n\) be the event that at step \(n\) the black ball was drawn out of the ``copy'' experiment urn. Now let \(B_\infty\) denote the event

    \[ \begin{aligned}
    B_\infty = {} & \textrm{``in total, the black ball was selected infinitely many times in} \\
    & \textrm{the second copy of the experiment''},
    \end{aligned} \]

    (in other words, the event that infinitely many of the events \(B_n\) occurred).

    (c) For each \(n\ge 1\), let \(C_n\) be the event that both \(A_n\) and \(B_n\) occurred, i.e.

    \[ \begin{aligned}
    C_n = {} & \textrm{``at step } n \textrm{, the black ball was selected simultaneously} \\
    & \textrm{in both experiments'',}
    \end{aligned} \]

    and let \(C_\infty\) denote the event ``\(C_n\) occurred for infinitely many values of \(n\)''.


    We have

    \[ \prob(A_\infty) = \prob(B_\infty) = 1, \qquad \prob(C_\infty) = 0. \]
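    As a rough sanity check on these claims, we can simulate the two rooms for a finite number of steps. This is only a sketch: it assumes that the draws are uniform over the balls currently in the urn and that Python's pseudo-random generator is an acceptable stand-in for independent randomness, and no finite run can confirm a statement about infinitely many steps. The name `run_two_rooms` is ours.

```python
import random


def run_two_rooms(steps, seed=0):
    """Count black-ball draws in room A, in room B, and simultaneously in both."""
    rng = random.Random(seed)
    a_count = b_count = c_count = 0
    for n in range(1, steps + 1):
        # At step n each urn holds 1 black ball and n white balls (n + 1 in total),
        # so the black ball is drawn with probability 1 / (n + 1).
        a_n = rng.randrange(n + 1) == 0
        b_n = rng.randrange(n + 1) == 0
        a_count += a_n
        b_count += b_n
        c_count += a_n and b_n
    return a_count, b_count, c_count


for steps in (100, 10_000, 1_000_000):
    print(steps, run_two_rooms(steps))
```

    In runs like this one would expect the first two counts to keep creeping upward (slowly) as the number of steps grows, while the third count stops changing early on.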


    These claims are consequences of the Borel-Cantelli lemmas, which we will learn about later in the course. Here is a sketch of the proof that \(\prob(C_\infty)=0\) (remember, this is still an ``informal discussion'', so our ``proof'' is really more of an exploration of what formal assumptions are needed to make the claim hold). For each \(n\) we have

    \[ \prob(A_n) = \prob(B_n) = \frac{1}{n+1}, \]

    since at time \(n\) each of the urns contains \(n+1\) balls, only one of which is black. Moreover, the choices in both rooms are made independently, so we have

    \[ \prob(C_n) = \prob(A_n \wedge B_n) = \prob(A_n) \prob(B_n) = \frac{1}{(n+1)^2}. \]
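    The two families of probabilities behave very differently when summed. The sketch below computes partial sums exactly with rational arithmetic; the helper names `p_A`, `p_C` and `partial_sum` are ours. The partial sums of \(\prob(A_n)\) keep growing (they are harmonic numbers minus 1), while those of \(\prob(C_n)\) stay below \(\pi^2/6 - 1\), the known value of \(\sum_{n\ge 1} 1/(n+1)^2\).

```python
from fractions import Fraction
from math import pi


def p_A(n):
    """P(A_n) = P(B_n) = 1 / (n + 1)."""
    return Fraction(1, n + 1)


def p_C(n):
    """P(C_n) = P(A_n) * P(B_n), using independence of the two rooms."""
    return p_A(n) * p_A(n)


def partial_sum(p, upto):
    """Exact value of p(1) + p(2) + ... + p(upto)."""
    return sum(p(n) for n in range(1, upto + 1))


for upto in (10, 100, 1000):
    print(upto, float(partial_sum(p_A, upto)), float(partial_sum(p_C, upto)))

print("limit of the second column:", pi**2 / 6 - 1)
```
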

    It turns out that to prove that \(\prob(C_\infty)=0\), the only relevant bit of information is that the infinite series \(\sum_{n=1}^\infty \prob(C_n)\) is a convergent series; the precise values of the probabilities are irrelevant. Indeed, we can try to do various manipulations on the definition of the event \(C_\infty\), as follows:

    \[ \begin{aligned}
    C_\infty &= \textrm{``infinitely many of the } C_n\textrm{'s occurred''} \\
    &= \textrm{``for all } N\ge 1 \textrm{, the event } C_n \textrm{ occurred for some } n\ge N \textrm{''.}
    \end{aligned} \]

    For any \(N\ge 1\), denote the event ``\(C_n\) occurred for some \(n\ge N\)'' by \(D_N\). Then

    \[ \begin{aligned}
    C_\infty &= \textrm{``for all } N\ge 1 \textrm{, } D_N \textrm{ occurred''} \\
    &= D_1 \wedge D_2 \wedge D_3 \wedge \ldots && \textrm{(infinite conjunction...?!)} \\
    &= \bigwedge_{N=1}^\infty D_N && \textrm{(shorthand notation for infinite conjunction)}.
    \end{aligned} \]

    In particular, in order for the event \(C_\infty\) to happen, \(D_N\) must happen for every fixed value of \(N\) (for example, \(D_{100}\) must happen, \(D_{101}\) must happen, etc.). It follows that \(C_\infty\) is at most as likely to happen as any of the \(D_N\)'s; in other words we have

    \[ \prob(C_\infty) \le \prob(D_N), \qquad (N\ge 1). \]

    Now, what can we say about \(\prob(D_N)\)? Looking at the definition of \(D_N\), we see that it too can be written as an infinite combination of events, this time a disjunction, namely


    \[ \begin{aligned}
    D_N &= C_N \vee C_{N+1} \vee C_{N+2} \vee \ldots && \textrm{(infinite disjunction)} \\
    &= \bigvee_{n=N}^\infty C_n && \textrm{(shorthand for infinite disjunction)}.
    \end{aligned} \]

    If this were a finite disjunction, we could say that the likelihood for at least one of the events to happen is at most the sum of the likelihoods (for example, the probability that it will rain next weekend is at most the probability that it will rain next Saturday, plus the probability that it will rain next Sunday; of course it might rain on both days, so the sum of the probabilities can be strictly greater than the probability of the disjunction). What can we say for an infinite disjunction? Since this is an informal discussion, it is impossible to answer this without being more formal about the precise mathematical model and its assumptions. As it turns out, the correct thing to do (in the sense that it leads to the most interesting and natural mathematical theory) is to assume that the bound that holds for finite disjunctions also holds for infinite ones. Whether this has any relevance to real life is a different question! If we make this assumption, we get for each \(N\ge 1\) the bound

    \[ \prob(D_N) \le \sum_{n=N}^\infty \prob(C_n). \]
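    For a finite stretch of steps the corresponding bound can be checked exactly. The sketch below makes one extra assumption that is not spelled out above, namely that the draws at different steps are independent of one another, so that the probability that none of \(C_N,\ldots,C_M\) occurs is a product. The names `prob_some_C` and `union_bound` are ours.

```python
from fractions import Fraction


def p_C(n):
    """P(C_n) = 1 / (n + 1)^2, as computed earlier."""
    return Fraction(1, (n + 1) ** 2)


def prob_some_C(N, M):
    """P(C_N or ... or C_M), assuming independence across steps (our assumption)."""
    none_occur = Fraction(1)
    for n in range(N, M + 1):
        none_occur *= 1 - p_C(n)
    return 1 - none_occur


def union_bound(N, M):
    """The sum P(C_N) + ... + P(C_M) bounding the finite disjunction."""
    return sum(p_C(n) for n in range(N, M + 1))


N, M = 10, 200
exact = prob_some_C(N, M)
bound = union_bound(N, M)
print(float(exact), float(bound), exact <= bound)
```

    The exact probability is strictly smaller than the sum, because the sum counts outcomes in which several of the \(C_n\) occur more than once.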

    But now recall that the infinite series of probabilities \(\sum_{n=1}^\infty \prob(C_n)\) converges. Therefore, for any \(\epsilon>0\), we can find an \(N\) for which the tail \(\sum_{n=N}^\infty \prob(C_n)\) of the series is less than \(\epsilon\). For such an \(N\), we get that \(\prob(D_N)< \epsilon\), and therefore that \(\prob(C_\infty)<\epsilon\). This is true for any \(\epsilon>0\), so it follows that \(\prob(C_\infty)=0\).
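
    In this particular example the choice of \(N\) can be made explicit. Since \(\frac{1}{(n+1)^2} < \frac{1}{n(n+1)} = \frac1n - \frac{1}{n+1}\), the tail telescopes to give \(\sum_{n=N}^\infty \prob(C_n) < \frac1N\), so any \(N \ge 1/\epsilon\) works. The sketch below (with our own helper names `choose_N` and `truncated_tail`) picks such an \(N\) and checks a long truncation of the tail numerically.

```python
from math import ceil


def choose_N(eps):
    """Return an N with sum_{n >= N} 1/(n+1)^2 < eps, using the bound tail < 1/N."""
    return max(1, ceil(1 / eps))


def truncated_tail(N, terms=100_000):
    """Sum of 1/(n+1)^2 for n = N, ..., N + terms - 1 (a lower estimate of the tail)."""
    return sum(1 / (n + 1) ** 2 for n in range(N, N + terms))


for eps in (0.1, 0.01, 0.001):
    N = choose_N(eps)
    print(eps, N, truncated_tail(N), truncated_tail(N) < eps)
```
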