$$\newcommand{\id}{\mathrm{id}}$$ $$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\kernel}{\mathrm{null}\,}$$ $$\newcommand{\range}{\mathrm{range}\,}$$ $$\newcommand{\RealPart}{\mathrm{Re}}$$ $$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$ $$\newcommand{\Argument}{\mathrm{Arg}}$$ $$\newcommand{\norm}{\| #1 \|}$$ $$\newcommand{\inner}{\langle #1, #2 \rangle}$$ $$\newcommand{\Span}{\mathrm{span}}$$

# 02 Probability spaces

$$\newcommand{\vecs}{\overset { \rightharpoonup} {\mathbf{#1}} }$$ $$\newcommand{\vecd}{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}}$$

## Basic definitions

We now formalize the concepts introduced in the previous lecture. It turns out that it's easiest to deal with events as subsets of a large set called the probability space, instead of as abstract logical statements. The logical operations of negation, conjunction and disjunction are replaced by the set-theoretic operations of taking the complement, intersection or union, but the intuitive meaning attached to those operations is the same.

Definition: Algebra

If $$\Omega$$ is a set, an algebra of subsets of $$\Omega$$ is a collection $${\cal F}$$ of subsets of $$\Omega$$ that satisfies the following axioms:

$\emptyset \in {\cal F}, \tag{A1}$

$A \in {\cal F} \implies \Omega\setminus A \in {\cal F}, \tag{A2}$

$A,B \in {\cal F} \implies A\cup B \in {\cal F}. \tag{A3}$

A word synonymous with algebra in this context is field.
Definition: Sigma Algebra

A $$\sigma$$-algebra (also called a $$\sigma$$-field) is an algebra $${\cal F}$$ that satisfies the additional axiom

$A_1, A_2, A_3, \ldots \in {\cal F} \implies \cup_{n=1}^\infty A_n \in {\cal F}. \tag{A4}$

Example
If $$\Omega$$ is any set, then $$\{ \emptyset, \Omega \}$$ is a $$\sigma$$-algebra; in fact it is the smallest possible $$\sigma$$-algebra of subsets of $$\Omega$$. Similarly, the power set $${\cal P}(\Omega)$$ of all subsets of $$\Omega$$ is a $$\sigma$$-algebra, and is (obviously) the largest $$\sigma$$-algebra of subsets of $$\Omega$$.
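
For a finite $$\Omega$$ the axioms can be checked mechanically, and countable unions reduce to finite ones, so every algebra is automatically a $$\sigma$$-algebra. The following is a minimal sketch under that finiteness assumption; the helper names `power_set` and `is_algebra` are illustrative and not part of the notes.

```python
from itertools import combinations

def power_set(omega):
    """Return the set of all subsets of omega, each as a frozenset."""
    items = list(omega)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def is_algebra(omega, family):
    """Check axioms (A1)-(A3) for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if frozenset() not in family:                         # (A1)
        return False
    if any(omega - a not in family for a in family):      # (A2)
        return False
    return all(a | b in family for a in family for b in family)  # (A3)

omega = {1, 2, 3, 4}
trivial = [set(), omega]
not_closed = [set(), {1}, omega]   # missing the complement {2, 3, 4}

print(is_algebra(omega, trivial))           # True
print(is_algebra(omega, power_set(omega)))  # True
print(is_algebra(omega, not_closed))        # False
```

The third family fails axiom (A2), which shows that containing $$\emptyset$$ and $$\Omega$$ is not enough on its own.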
Definition: Measurable Space
A measurable space is a pair $$(\Omega,{\cal F})$$ where $$\Omega$$ is a set and $$\cal F$$ is a $$\sigma$$-algebra of subsets of $$\Omega$$.
Definition: Probability Measure

Given a measurable space $$(\Omega, {\cal F})$$, a probability measure on $$(\Omega, {\cal F})$$ is a function $$\prob:{\cal F}\to[0,1]$$ that satisfies the properties:

$\prob(\emptyset) = 0, \quad \prob(\Omega) = 1, \tag{P1}$

\begin{equation}
A_1, A_2, \ldots \in {\cal F}\textrm{ are pairwise disjoint} \implies
\prob(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \prob(A_n).
\tag{P2}
\end{equation}

Definition: Probability Space
A probability space is a triple $$(\Omega, {\cal F}, \prob)$$, where $$(\Omega, {\cal F})$$ is a measurable space, and $$\prob$$ is a probability measure on $$(\Omega, {\cal F})$$.

Intuitively, we think of $$\Omega$$ as representing the set of possible outcomes of a probabilistic experiment, and refer to it as the sample space. The $$\sigma$$-algebra $${\cal F}$$ is the $$\sigma$$-algebra of events, namely those subsets of $$\Omega$$ which have a well-defined probability (as we shall see later, it is not always possible to assign well-defined probabilities to all sets of outcomes). And $$\prob$$ is the "notion" or "measure" of probability on our sample space.

Probability theory can be described loosely as the study of probability spaces (this is of course a gross oversimplification...). A more general mathematical theory called measure theory studies measure spaces, which are like probability spaces except that the measures can take values in $$[0,\infty]$$ instead of $$[0,1]$$, and the total measure of the space is not necessarily equal to $$1$$ (such measures are referred to as $$\sigma$$-additive nonnegative measures). Measure theory is an important and non-trivial theory, and studying it requires a separate concentrated effort. We shall content ourselves with citing and using some of its most basic results. For proofs and more details, refer to Chapter 1 and the measure theory appendix in [Dur2010] or to a measure theory textbook.

## Properties and examples

Lemma: Probability Space Properties

If $$(\Omega, {\cal F}, \prob)$$ is a probability space, then we have:

1. Monotonicity: If $$A,B \in {\cal F}$$, $$A\subset B$$ then $$\prob(A)\le \prob(B)$$.
2. Sub-additivity: If $$A_1,A_2,\ldots \in {\cal F}$$ then $\prob(\cup_{n=1}^\infty A_n) \le \sum_{n=1}^\infty \prob(A_n)$.
3. Continuity from below: If $$A_1,A_2,\ldots \in {\cal F}$$ such that $$A_1 \subset A_2 \subset A_3 \subset \ldots$$, then $\prob(\cup_{n=1}^\infty A_n) = \lim_{n\to\infty} \prob(A_n).$
4. Continuity from above: If $$A_1,A_2,\ldots \in {\cal F}$$ such that $$A_1 \supset A_2 \supset A_3 \supset \ldots$$, then $\prob(\cap_{n=1}^\infty A_n) = \lim_{n\to\infty} \prob(A_n).$
Exercise
Prove the Probability Space Properties lemma above.
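
Before attempting the proof, it can help to see the properties on a concrete case. The sketch below is a numerical illustration, not a proof. It assumes the sample space $$\{1,2,3,\ldots\}$$ with point masses $$p(k)=2^{-k}$$, a choice made here purely for illustration, and uses exact rational arithmetic.

```python
from fractions import Fraction

def prob(event):
    """P(A) for a finite subset A of {1, 2, 3, ...} under p(k) = 2**(-k)."""
    return sum((Fraction(1, 2**k) for k in event), Fraction(0))

# Continuity from below: A_n = {1, ..., n} increases to the whole sample space,
# whose probability is 1, and P(A_n) = 1 - 2**(-n) climbs towards that value.
for n in (1, 2, 5, 10, 20):
    a_n = range(1, n + 1)
    print(n, float(prob(a_n)))

# Monotonicity and sub-additivity on two overlapping events.
a, b = {1, 2, 3}, {3, 4}
print(prob(a) <= prob(a | {7}))          # True: A is a subset of A union {7}
print(prob(a | b) <= prob(a) + prob(b))  # True, and strict since A and B overlap
```

The sub-additivity inequality is strict here because the outcome $$3$$ is counted twice on the right-hand side.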
Example: Discrete Probability Spaces

Let $$\Omega$$ be a countable set and let $$p:\Omega\to[0,1]$$ be a function such that

$\sum_{\omega \in \Omega} p(\omega) = 1.$

This corresponds to the intuitive notion of a probabilistic experiment with a finite or countably infinite number of outcomes, where each individual outcome $$\omega$$ has a probability $$p(\omega)$$ of occurring. We can put such an "elementary" or "discrete" experiment in our more general framework by defining the $$\sigma$$-algebra of events $${\cal F}$$ to be the set of all subsets of $$\Omega$$, and defining the probability measure $$\prob$$ by

$\prob(A) = \sum_{\omega \in A} p(\omega), \qquad A\in {\cal F}.$

If $$\Omega$$ is a finite set, a natural probability measure to consider is the uniform measure, defined by

$\prob(A) = \frac{|A|}{|\Omega|}.$
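
As a concrete instance, here is a minimal sketch of a discrete probability space on the outcomes of a die. The weights of the loaded die and the helper `make_measure` are assumptions made for illustration only.

```python
from fractions import Fraction

def make_measure(p):
    """Return P(A) = sum of p(w) over w in A, for a weight dict p summing to 1."""
    assert sum(p.values()) == 1, "weights must sum to 1"
    return lambda event: sum((p[w] for w in event), Fraction(0))

# A loaded die (hypothetical weights) and a fair die on the same sample space.
omega = range(1, 7)
loaded = {1: Fraction(1, 2), **{w: Fraction(1, 10) for w in range(2, 7)}}
fair = {w: Fraction(1, 6) for w in omega}

P_loaded = make_measure(loaded)
P_fair = make_measure(fair)

even = {2, 4, 6}
print(P_fair(even))    # 1/2, which equals |A| / |Omega| = 3/6
print(P_loaded(even))  # 3/10
# Additivity on disjoint events: {1} and {2, 4, 6} are disjoint.
print(P_loaded({1} | even) == P_loaded({1}) + P_loaded(even))  # True
```

Exact fractions are used so that the additivity check is not affected by floating-point rounding.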

Example: Choosing a random number uniformly in $$(0,1)$$

The archetypical example of a "non-elementary" probability space (i.e., one which does not fall within the scope of the previous example) is the experiment of choosing a random number uniformly in the interval $$(0,1)$$. How do we know that it makes sense to speak of such an experiment? We don't, yet. But let us imagine what constructing such an experiment might entail. We are looking for a hypothetical probability space $$(\Omega, {\cal F}, \prob)$$ in which the sample space $$\Omega$$ is simply $$(0,1)$$, $${\cal F}$$ is some $$\sigma$$-algebra of subsets of $$(0,1)$$, and $$\prob$$ is a probability measure that corresponds to our notion of a "uniform" choice of a random number. One plausible way to formalize this is to require that intervals of the form $$(a,b) \subset (0,1)$$ be considered as events, and that the probability for our "uniform" number to fall in such an interval should be equal to its length $$b-a$$. In other words, we shall require that

$(a,b) \in {\cal F}, \qquad (0\le a<b\le 1),$

and that
\begin{equation} \label{eq:lebesgue}
\prob\big((a,b)\big) = b-a, \qquad (0\le a<b\le 1).
\end{equation}

How do we generate a $$\sigma$$-algebra of subsets of $$(0,1)$$ that contains all the intervals? We already saw that the set of all subsets of $$(0,1)$$ will work. But that is too large! If we take all subsets, we will see in an exercise later that it will be impossible to construct the probability measure $$\prob$$ to satisfy our requirements. So let's try to build the smallest possible $$\sigma$$-algebra. One way (which can perhaps be described as the bottom-up approach) would be to start with the intervals, then take all countable unions of such and add them to our collection of sets, then add all countable intersections of such sets, then add all countable unions, etc. Will this work? In principle it can be made to work, but it is a bit difficult and requires knowing something about transfinite induction. Fortunately there is a more elegant (but somewhat more abstract and less intuitive) way of constructing the minimal $$\sigma$$-algebra, which is outlined in the next exercise below and can be thought of as the top-down approach. The resulting $$\sigma$$-algebra of subsets of $$(0,1)$$ is called the Borel $$\sigma$$-algebra; its elements are called Borel sets.

What about the probability measure $$\prob$$? Here we will simply cite a result from measure theory that says that the measure we are looking for exists, and is unique. This is not too difficult to prove, but doing so would take us a bit too far off course.

Theorem: Lebesgue Measure

Let $${\cal B}$$ be the $$\sigma$$-algebra of Borel sets on $$(0,1)$$, the minimal $$\sigma$$-algebra containing all the sub-intervals of $$(0,1)$$, proved to exist in the exercise below. There exists a unique measure $$\prob$$ on the measurable space $$((0,1), {\cal B})$$ satisfying \eqref{eq:lebesgue}, called Lebesgue measure on $$(0,1)$$.
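
The requirement \eqref{eq:lebesgue} can be sanity-checked empirically. The sketch below is only a simulation: it assumes Python's pseudo-random generator `random.Random` is an acceptable stand-in for a uniform draw, and it works with floating-point numbers rather than real numbers, so it illustrates the requirement without constructing Lebesgue measure. The helper name `empirical_prob` is hypothetical.

```python
import random

def empirical_prob(a, b, n_samples=100_000, seed=0):
    """Fraction of pseudo-random draws from [0, 1) that land in (a, b)."""
    rng = random.Random(seed)
    hits = sum(a < rng.random() < b for _ in range(n_samples))
    return hits / n_samples

for a, b in [(0.0, 0.5), (0.25, 0.3), (0.1, 0.9)]:
    print(f"({a}, {b}): empirical {empirical_prob(a, b):.4f}, length {b - a:.4f}")
```

With this many samples the empirical frequency should typically agree with the interval length $$b-a$$ to about two decimal places.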

Exercise: The $$\sigma$$-algebra generated by a set of subsets of $$\Omega$$
1. Let $$\Omega$$ be a set, and let $$\{{\cal F}_i\}_{i\in I}$$ be some collection of $$\sigma$$-algebras of subsets of $$\Omega$$, indexed by some index set $$I$$. Prove that the intersection of all the $${\cal F}_i$$'s (i.e., the collection of subsets of $$\Omega$$ that are elements of all the $${\cal F}_i$$'s) is also a $$\sigma$$-algebra.
2. Let $$\Omega$$ be a set, and let $${\cal A}$$ be a collection of subsets of $$\Omega$$. Prove that there exists a unique $$\sigma$$-algebra $$\sigma({\cal A})$$ of subsets of $$\Omega$$ that satisfies the following two properties:
    1. $${\cal A} \subset \sigma({\cal A})$$ (in words, $$\sigma({\cal A})$$ contains all the elements of $${\cal A}$$).
    2. $$\sigma({\cal A})$$ is the minimal $$\sigma$$-algebra satisfying property 1 above, in the sense that if $${\cal F}$$ is any other $$\sigma$$-algebra that contains all the elements of $${\cal A}$$, then $$\sigma({\cal A}) \subset {\cal F}$$.

Hint for part 2: Let $$({\cal F}_i)_{i\in I}$$ be the collection of all $$\sigma$$-algebras of subsets of $$\Omega$$ that contain $${\cal A}$$. This is a non-empty collection, since it contains for example $${\cal P}(\Omega)$$, the set of all subsets of $$\Omega$$. Any $$\sigma$$-algebra $$\sigma({\cal A})$$ that satisfies the two properties above is necessarily a subset of any of the $${\cal F}_i$$'s, hence it is also contained in the intersection of all the $${\cal F}_i$$'s, which is a $$\sigma$$-algebra by part 1 of the exercise.

Definition: Generated $$\sigma$$-algebra
If $${\cal A}$$ is a collection of subsets of a set $$\Omega$$, the $$\sigma$$-algebra $$\sigma({\cal A})$$ discussed above is called the $$\sigma$$-algebra generated by $${\cal A}$$.
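
When $$\Omega$$ is finite, $$\sigma({\cal A})$$ can be computed directly by the bottom-up approach: keep adding complements and unions until nothing new appears. The sketch below assumes a finite $$\Omega$$, where this loop is guaranteed to stop; it does not carry over to infinite sample spaces such as $$(0,1)$$. The function name `generated_sigma_algebra` is illustrative.

```python
def generated_sigma_algebra(omega, generators):
    """Close `generators` under complement and union inside the finite set omega."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:          # nothing new: the family is closed
            return family
        family |= new

omega = {1, 2, 3, 4}
sigma = generated_sigma_algebra(omega, [{1}, {1, 2}])
for s in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

For these generators the result has 8 elements: all unions of the three "atoms" $$\{1\}$$, $$\{2\}$$ and $$\{3,4\}$$. The outcomes $$3$$ and $$4$$ are never separated, so $$\{3\}$$ is not an event.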
Example: The space of infinite coin toss sequences

Another archetypical experiment in probability theory is that of a sequence of independent fair coin tosses, so let's try to model this experiment with a suitable probability space. If for convenience we represent the result of each coin as a binary value of $$0$$ or $$1$$, then the sample space $$\Omega$$ is simply the set of infinite sequences of $$0$$'s and $$1$$'s, namely

$\Omega = \big\{ (x_1, x_2, x_3, \ldots) : x_i \in \{0,1\}, i=1,2,\ldots \big\} = \{ 0,1 \}^{\mathbb{N}}.$

What about the $$\sigma$$-algebra $${\cal F}$$? We will take the same approach as we did in the previous example, which is to require certain natural sets to be events, and to take as our $$\sigma$$-algebra the $$\sigma$$-algebra generated by these "elementary" events. In this case, surely, for each $$n\ge 1$$ we would like the set

\begin{equation}
A_n(1) := \{ \mathbf{x}=(x_1,x_2,\ldots) \in \Omega : x_n = 1 \}
\label{eq:specialcase}
\end{equation}

to be an event (in words, this represents the event "the coin toss $$x_n$$ came out Heads"). Therefore we take $${\cal F}$$ to be the $$\sigma$$-algebra generated by the collection of sets of this form.

Finally, the probability measure $$\prob$$ should conform to our notion of a sequence of independent fair coin tosses. Generalizing the notation in \eqref{eq:specialcase}, for $$a\in\{0,1\}$$ define

$A_n(a) = \{ \mathbf{x}=(x_1,x_2,\ldots) \in \Omega : x_n = a \}.$

Then $$\prob$$ should satisfy

$\prob(A_n(a)) = \frac12,$

representing the fact that the $$n$$-th coin toss is unbiased. But more generally, for any $$n\ge 1$$ and $$(a_1,a_2,\ldots,a_n) \in \{0,1\}^n$$, since the first $$n$$ coin tosses are independent, $$\prob$$ should satisfy

\begin{equation} \label{eq:indcointosses}
\prob\big(A_1(a_1) \cap A_2(a_2) \cap \ldots \cap A_n(a_n)\big) = \frac{1}{2^n}.
\end{equation}

As in the example of Lebesgue measure discussed above, the fact that a probability measure $$\prob$$ on $$(\Omega, {\cal F})$$ that satisfies \eqref{eq:indcointosses} exists and is unique follows from some slightly non-trivial facts from measure theory, and we will take it on faith for the time being. Below we quote the relevant theorem from measure theory, which generalizes the setting discussed in this example to the more general situation of a product of probability spaces.
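
Although the measure on infinite sequences needs measure theory, the events above depend only on finitely many tosses, so the required values can be checked on a finite truncation. The sketch below assumes we only look at the first `N` tosses and put the uniform measure on the $$2^N$$ possible prefixes; the truncation length and the helper names are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product

N = 6  # truncation length (an assumption of this sketch)
omega_N = list(product((0, 1), repeat=N))   # all 2**N toss prefixes

def prob(event):
    """Uniform measure on the 2**N prefixes: P(E) = |E| / 2**N."""
    return Fraction(sum(1 for x in omega_N if event(x)), len(omega_N))

def cylinder(pattern):
    """Event that the first len(pattern) tosses equal `pattern`."""
    return lambda x: x[:len(pattern)] == tuple(pattern)

print(prob(lambda x: x[2] == 1))        # 1/2, the event A_3(1)
print(prob(cylinder([1, 0, 1])))        # 1/8 = 1 / 2**3
print(prob(cylinder([0, 0, 0, 0, 1])))  # 1/32 = 1 / 2**5
```

The values do not depend on `N` as long as `N` is at least the length of the pattern, which is the consistency that makes an infinite-sequence measure plausible.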

Theorem: Products of Probability Spaces
Let

$(\Omega_n, {\cal F}_n, \prob_n)_{n=1}^\infty$

be a sequence of probability spaces. Denote

$\Omega = \prod_{n=1}^\infty \Omega_n$

(the Cartesian product of the outcome sets), and let $${\cal F}$$ be the $$\sigma$$-algebra of subsets of $$\Omega$$ generated by sets which are of the form

$\{ (x_1,x_2,\ldots) \in \Omega : x_n \in A \}$

for some $$n\ge1$$ and set $$A \in {\cal F}_n$$. Then there exists a unique probability measure $$\prob$$ on $$(\Omega, {\cal F})$$ such that for any $$n\ge 1$$ and any finite sequence

$(A_1, A_2, \ldots, A_n) \in {\cal F}_1 \times {\cal F}_2 \times \ldots \times {\cal F}_n$

the equation

$\prob\Big(\big\{ (x_1,x_2,\ldots ) \in \Omega : x_1\in A_1, x_2 \in A_2, \ldots, x_n \in A_n \big\} \Big) = \prod_{k=1}^n \prob_k(A_k)$

holds.
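
For finitely many discrete factors the product formula can be verified by brute force. The sketch below assumes three hypothetical factors (a fair coin, a fair die and a biased coin) and compares the product of the marginal probabilities with a direct sum of point masses over the finite product space. It illustrates the formula only and says nothing about the infinite case, which is where the theorem has real content.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Three hypothetical discrete factors, each a dict outcome -> probability.
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}
biased = {0: Fraction(1, 3), 1: Fraction(2, 3)}
factors = [coin, die, biased]

def factor_prob(p, event):
    """P_k(A_k) in a single discrete factor."""
    return sum((p[w] for w in event), Fraction(0))

def rectangle_prob(events):
    """Right-hand side of the theorem: the product of the P_k(A_k)."""
    return prod(factor_prob(p, a) for p, a in zip(factors, events))

def joint_prob(events):
    """Left-hand side, by summing point masses over the finite product."""
    total = Fraction(0)
    for xs in product(*factors):
        if all(x in a for x, a in zip(xs, events)):
            total += prod(p[x] for p, x in zip(factors, xs))
    return total

events = [{"H"}, {2, 4, 6}, {1}]
print(rectangle_prob(events))  # 1/6
print(joint_prob(events))      # 1/6
```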
Exercise
Explain why the "infinite sequence of coin tosses" experiment is a special case of a product of probability spaces, and why the existence and uniqueness of a probability measure satisfying \eqref{eq:indcointosses} follows from the theorem on products of probability spaces above.