
# 04 Random vectors and independence


\lecture{Random vectors and independence}
\subsection{Random vectors}

A random variable is a real-valued measurable function defined on a probability space $$(\Omega, {\cal F}, \prob)$$ (when we talk of $$\R$$ as a measurable space, it is always understood to carry the $$\sigma$$-algebra of Borel sets). Similarly, we now wish to talk about \emph{vector}-valued measurable functions on a probability space, i.e., functions taking values in $$\R^d$$. First, we need to identify a good $$\sigma$$-algebra of subsets of $$\R^d$$. Risking some confusion, we will still call it the Borel $$\sigma$$-algebra and denote it by $${\cal B}$$, or sometimes by $${\cal B}(\R^d)$$.

\begin{defi}
\label{defi-borelrd}
The Borel $$\sigma$$-algebra on $$\R^d$$ is defined in one of the following equivalent ways:
\renewcommand{\labelenumi}{(\roman{enumi})}
\begin{enumerate}
\item It is the $$\sigma$$-algebra generated by boxes of the form
$(a_1,b_1) \times (a_2,b_2) \times \ldots \times (a_d,b_d).$
\item It is the $$\sigma$$-algebra generated by the balls in $$\R^d$$.
\item It is the $$\sigma$$-algebra generated by the open sets in $$\R^d$$.
\item It is the minimal $$\sigma$$-algebra of subsets of $$\R^d$$ such that the coordinate functions $$\pi_i:\R^d\to\R$$ defined by
$\pi_i({\bf x}) = x_i, \qquad i=1,2,\ldots,d,$
are all measurable (where measurability is with respect to the Borel $$\sigma$$-algebra on the target space $$\R$$).
\end{enumerate}
\end{defi}

\begin{exercise} Check that the definitions above are indeed all equivalent.
\end{exercise}

\begin{defi} A \textbf{random ($$d$$-dimensional) vector} (or \textbf{vector random variable}) $$\mathbf{X}=(X_1,X_2,\ldots,X_d)$$ on a probability space $$(\Omega, {\cal F}, \prob)$$ is a function $$\mathbf{X}:\Omega\to\R^d$$ that is measurable (as a function between the measurable spaces $$(\Omega, {\cal F})$$ and $$(\R^d, {\cal B})$$).
\end{defi}

\begin{lem} $$\mathbf{X}=(X_1,\ldots,X_d)$$ is a random vector if and only if $$X_i$$ is a random variable for each $$i=1,\ldots,d$$.
\end{lem}

\begin{proof} If $$\mathbf{X}$$ is a random vector then each of its coordinates $$X_i = \pi_i \circ \mathbf{X}$$ is a composition of two measurable functions and therefore (check!) measurable. Conversely, if $$X_1,\ldots,X_d$$ are random variables then for any box $$E=(a_1,b_1)\times(a_2,b_2)\times\ldots\times(a_d,b_d) \subset \R^d$$ we have
$\mathbf{X}^{-1}(E) = \bigcap_{i=1}^d X_i^{-1}((a_i,b_i)) \in {\cal F}.$
Therefore by Definition~\ref{defi-borelrd} and Exercise~\ref{ex-checkrv}, $$\mathbf{X}$$ is a random vector.
\end{proof}

\begin{exercise} (i) Prove that any continuous function $$f:\R^m\to\R^n$$ is measurable (when each of the spaces is equipped with the respective Borel $$\sigma$$-algebra). \\
(ii) Prove that the composition $$g\circ f$$ of measurable functions $$f:(\Omega_1, {\cal F}_1)\to (\Omega_2,{\cal F}_2)$$ and $$g:(\Omega_2, {\cal F}_2)\to (\Omega_3, {\cal F}_3)$$ (where $$(\Omega_i, {\cal F}_i)$$ are measurable spaces for $$i=1,2,3$$) is a measurable function. \\
%Prove that if $$\mathbf{X}$$ is a $$d$$-dimensional random vector and $$f:\R^d \to \R^k$$ is a continuous function, then $$f(\mathbf{X}) = f\circ \mathbf{X}$$ is a $$k$$-dimensional random vector. \\
(iii) Deduce that the sum $$X_1+\ldots+X_d$$ of random variables is a random variable.
\end{exercise}

\begin{exercise} Prove that if $$X_1, X_2, \ldots$$ is a sequence of random variables (all defined on the same probability space), then the functions
$\inf_n X_n, \quad \sup_n X_n, \quad \limsup_n X_n, \quad \liminf_n X_n$
are all random variables. Note: Part of the question is to generalize the notion of random variable to a function taking values in $$\overline{\mathbb{R}} = \R \cup \{-\infty, +\infty \}$$, or you can solve it first with the additional assumption that all the $$X_i$$'s are uniformly bounded by some constant $$M$$.
\end{exercise}

\subsection{Multi-dimensional distribution functions}

If $$\mathbf{X}=(X_1,\ldots,X_d)$$ is a $$d$$-dimensional random vector, we define its \textbf{distribution} to be the probability measure
$\mu_{\mathbf{X}}(A) = \prob(\mathbf{X}^{-1}(A)) = \prob(\omega \in \Omega : \mathbf{X}(\omega) \in A), \qquad A \in {\cal B}(\R^d),$
similarly to the one-dimensional case. The measure $$\mu_{\mathbf{X}}$$ is also called the \textbf{joint distribution} (or \textbf{joint law}) of the random variables $$X_1,\ldots,X_d$$. Once again, to avoid having to work with measures, we introduce the concept of a \textbf{$$d$$-dimensional distribution function}.

\begin{defi} The $$d$$-dimensional distribution function of a $$d$$-dimensional random vector $$\mathbf{X}=(X_1,\ldots,X_d)$$ (also called the joint distribution function of $$X_1,\ldots,X_d$$) is the function $$F_{\mathbf{X}}:\R^d\to[0,1]$$ defined by
\begin{eqnarray*}
F_{\mathbf{X}}(x_1,x_2,\ldots,x_d) &=& \prob( X_1 \le x_1, X_2 \le x_2, \ldots, X_d \le x_d) \\
&=& \mu_{\mathbf{X}}\Big( (-\infty,x_1] \times (-\infty,x_2]\times \ldots \times (-\infty,x_d] \Big)
\end{eqnarray*}
\end{defi}

\begin{thm}[Properties of distribution functions] If $$F=F_{\mathbf X}$$ is a distribution function of a $$d$$-dimensional random vector, then it has the following properties:
\renewcommand{\labelenumi}{(\roman{enumi})}
\begin{enumerate}
\item $$F$$ is nondecreasing in each coordinate.
\item For any $$1\le i\le d$$, $$\lim_{x_i\to-\infty} F(\mathbf{x}) = 0$$.
\item $$\lim_{\mathbf{x}\to(\infty,\ldots,\infty)} F(\mathbf{x}) = 1$$.
\item $$F$$ is right-continuous, i.e., $$F(\mathbf{x}+) := \lim_{\mathbf{y}\downarrow \mathbf{x}} F(\mathbf{y}) = F(\mathbf{x})$$, where $$\mathbf{y}\downarrow \mathbf{x}$$ means that $$y_i\downarrow x_i$$ in each coordinate.
%\item $$F(x-) := \lim_{y\uparrow x} = \prob(X < x)$$.
%\item $$\prob(X=x) = F(x) - F(x-)$$.
%\item If $$G=F_Y$$ is another distribution function of a random variable $$Y$$, then $$X$$ and $$Y$$ are equal in distribution if and only if $$F \equiv G$$.
\item For $$1\le i\le d$$ and $$a<b$$, denote by $$\Delta_{a,b}^{x_i}$$ the \emph{differencing operator in the variable $$x_i$$}, which takes a function $$f$$ of the real variable $$x_i$$ (and possibly also dependent on other variables) and returns the value
$\Delta_{a,b}^{x_i}f = f\big|_{x_i=b} - f\big|_{x_i=a}.$
Then, for any real numbers $$a_1 < b_1$$, $$a_2<b_2$$, \ldots, $$a_d < b_d$$, we have
$\Delta^{x_1}_{a_1,b_1} \Delta^{x_2}_{a_2,b_2} \ldots \Delta^{x_d}_{a_d,b_d} F \ge 0.$
\end{enumerate}
\label{thm-multidim}
\end{thm}

\begin{proof} See Chapter 1 in [Dur2010] or Appendix A.2 in [Dur2004].
\end{proof}

\begin{thm} Any function $$F$$ satisfying the properties in Theorem \ref{thm-multidim} above is a distribution function of some random vector $$\mathbf{X}$$.
\end{thm}

\begin{proof} See Chapter 1 in [Dur2010] or Appendix A.2 in [Dur2004].
\end{proof}

\subsection{Independence}

\begin{defi} Events $$A, B \in {\cal F}$$ in a probability space $$(\Omega, {\cal F}, \prob)$$ are called independent if
$\prob(A\cap B) = \prob(A)\prob(B).$
More generally, a family $${\cal A}=(A_i)_{i\in {\cal I}}$$ of events in a probability space $$(\Omega, {\cal F}, \prob)$$ is called an independent family if for any finite subset $$A_{i_1}, A_{i_2}, \ldots, A_{i_k} \in {\cal A}$$ of distinct events in the family we have that
$\prob(A_{i_1} \cap A_{i_2} \cap \ldots \cap A_{i_k}) = \prod_{j=1}^k \prob(A_{i_j}).$
\end{defi}

\begin{defi} Random variables $$X, Y$$ on a probability space $$(\Omega, {\cal F}, \prob)$$ are called independent if
$\prob(X \in E, Y \in F) = \prob(X\in E)\prob(Y\in F)$
for any Borel sets $$E,F \subset \R$$. In other words, any two events representing possible statements about the behaviors of $$X$$ and $$Y$$ are independent events.
\end{defi}

\begin{defi} If $$\Omega$$ is a set and $$X:\Omega\to\R$$ is a function, the family of subsets of $$\Omega$$ defined by
$\sigma(X) = \Big\{ X^{-1}(A) : A \in {\cal B}(\R) \Big\}$
is a $$\sigma$$-algebra (check!) called the \textbf{$$\sigma$$-algebra generated by $$X$$}.
It is easy to check that it is the minimal $$\sigma$$-algebra with which $$\Omega$$ can be equipped so as to make $$X$$ into a random variable.
\end{defi}

\begin{defi} If $$(\Omega, {\cal F}, \prob)$$ is a probability space, two $$\sigma$$-algebras $${\cal A}, {\cal B}\subset {\cal F}$$ are called independent if any two events $$A\in {\cal A}$$, $$B\in{\cal B}$$ are independent events.
\end{defi}

It follows from the above definitions that r.v.'s $$X,Y$$ are independent if and only if the $$\sigma$$-algebras $$\sigma(X), \sigma(Y)$$ generated by them are independent $$\sigma$$-algebras.
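
As a small illustration of these definitions (a sketch; the indicator notation $$\mathbf{1}_A$$ is an assumption introduced only for this example), let $$A \in {\cal F}$$ and let $$X = \mathbf{1}_A$$ be the function equal to $$1$$ on $$A$$ and to $$0$$ on $$A^c = \Omega\setminus A$$. For a Borel set $$E\subset\R$$, the preimage $$X^{-1}(E)$$ depends only on which of the two values $$0, 1$$ belong to $$E$$, so
$\sigma(\mathbf{1}_A) = \{ \emptyset, A, A^c, \Omega \}.$
Consequently, for events $$A, B \in {\cal F}$$, the random variables $$\mathbf{1}_A$$ and $$\mathbf{1}_B$$ are independent if and only if $$A$$ and $$B$$ are independent events. One direction is immediate. For the other, if $$\prob(A\cap B) = \prob(A)\prob(B)$$ then
$\prob(A\cap B^c) = \prob(A) - \prob(A)\prob(B) = \prob(A)\prob(B^c),$
and similarly for the pairs $$(A^c, B)$$ and $$(A^c, B^c)$$, while any pair involving $$\emptyset$$ or $$\Omega$$ is trivially independent.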

\begin{defi} If $$(\Omega, {\cal F}, \prob)$$ is a probability space and $$({\cal F}_i)_{i\in I}$$ is some family of sub-$$\sigma$$-algebras of $${\cal F}$$ (i.e., $$\sigma$$-algebras that are subsets of $${\cal F}$$), we say that $$({\cal F}_i)_{i\in I}$$ is an independent family of $$\sigma$$-algebras if for any distinct $$i_1,i_2,\ldots, i_k\in I$$ and events $$A_1 \in {\cal F}_{i_1}, A_2 \in {\cal F}_{i_2}, \ldots, A_k \in {\cal F}_{i_k}$$, the events $$A_1,\ldots,A_k$$ are independent.
\end{defi}

\begin{defi} A family $$(X_i)_{i\in I}$$ of random variables defined on some common probability space $$(\Omega, {\cal F}, \prob)$$ is called an \textbf{independent family of random variables} if the $$\sigma$$-algebras $$\{ \sigma(X_i) \}_{i\in I}$$ form an independent family of $$\sigma$$-algebras.
\end{defi}

Unraveling these somewhat abstract definitions, we see that $$(X_i)_{i\in I}$$ is an independent family of r.v.'s if and only if we have
$\prob(X_{i_1} \in A_1, \ldots, X_{i_k} \in A_k) = \prod_{j=1}^k \prob(X_{i_j} \in A_j)$
for all distinct indices $$i_1,\ldots,i_k \in I$$ and Borel sets $$A_1,\ldots, A_k \in {\cal B}(\R)$$.
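
To see why the product formula is required for every finite subfamily and not only for pairs, here is a standard example (a sketch; the random variables $$X, Y, Z$$ below are hypothetical and chosen only for illustration). Let $$X, Y$$ be independent with $$\prob(X=1)=\prob(X=-1)=\prob(Y=1)=\prob(Y=-1)=\tfrac12$$, and set $$Z = XY$$. Then $$\prob(Z=1) = \prob(X=Y) = \tfrac12$$, and for any $$s,t\in\{-1,1\}$$,
$\prob(X=s, Z=t) = \prob(X=s, Y=st) = \tfrac14 = \prob(X=s)\prob(Z=t).$
Since each variable takes only two values, this shows that $$X$$ and $$Z$$ are independent, and the same computation applies to the pair $$Y, Z$$. So any two of $$X, Y, Z$$ are independent. However,
$\prob(X=1, Y=1, Z=-1) = 0 \neq \tfrac18 = \prob(X=1)\prob(Y=1)\prob(Z=-1),$
so $$(X,Y,Z)$$ is not an independent family.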

\begin{thm} If $$({\cal F}_i)_{i\in I}$$ is a family of sub-$$\sigma$$-algebras of the $$\sigma$$-algebra of events $${\cal F}$$ in a probability space, and for each $$i\in I$$, the $$\sigma$$-algebra $${\cal F}_i$$ is generated by a family $${\cal A}_i$$ of subsets of $$\Omega$$, and each family $${\cal A}_i$$ is closed under taking the intersection of two sets (such a family is called a \textbf{$$\pi$$-system}), then the family $$({\cal F}_i)_{i\in I}$$ is independent if and only if for each $$i_1,\ldots,i_k\in I$$, any finite sequence of events $$A_1 \in {\cal A}_{i_1}, A_2\in {\cal A}_{i_2}, \ldots, A_k \in {\cal A}_{i_k}$$ is independent.
\end{thm}

\begin{proof} This uses Dynkin's $$\pi-\lambda$$ theorem from measure theory. See [Dur2010], Theorem 2.1.3, p.\ 39 or [Dur2004], Theorem (4.2), p.\ 24.
\end{proof}

As a corollary, we get a convenient criterion for checking when the coordinates of a random vector are independent.

\begin{lem} If $$X_1,\ldots,X_d$$ are random variables defined on a common probability space, then they are independent if and only if for all $$x_1,\ldots,x_d \in \R$$ we have that
$F_{X_1,\ldots,X_d}(x_1,\ldots,x_d) = F_{X_1}(x_1) F_{X_2}(x_2) \ldots F_{X_d}(x_d).$
\end{lem}

\begin{exercise} (i) We say that a Riemann-integrable function $$f:\R^d\to[0,\infty)$$ is a ($$d$$-dimensional, or joint) density function for a random vector $$\mathbf{X}=(X_1,\ldots,X_d)$$ if
$F_{\mathbf{X}}(x_1,\ldots,x_d) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \ldots \int_{-\infty}^{x_d} f(u_1,\ldots,u_d)\,du_d\ldots du_1 \qquad \forall x_1,\ldots,x_d \in \R.$
Show that if $$f$$ is a density for $$\mathbf{X}$$ and can be written in the form
$f(x_1,\ldots,x_d) = f_1(x_1) f_2(x_2) \ldots f_d(x_d),$
then $$X_1,\ldots,X_d$$ are independent. \\
(ii) Show that if $$X_1,\ldots,X_d$$ are random variables taking values in a countable set $$S$$, then in order for $$X_1,\ldots,X_d$$ to be independent it is enough that for all $$x_1,\ldots,x_d\in S$$ we have
$\prob(X_1=x_1,\ldots,X_d = x_d) = \prob(X_1=x_1)\ldots \prob(X_d=x_d).$
\end{exercise}
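
As an illustration of the lemma above in the case $$d=2$$ (a sketch; the uniform distributions, the random variable $$U$$ and the helper function $$g$$ are assumptions made only for this example), write $$g(t) = \min(\max(t,0),1)$$ for the distribution function of the uniform distribution on $$[0,1]$$. Suppose first that $$(X,Y)$$ has joint distribution function
$F_{X,Y}(x,y) = g(x)\,g(y), \qquad x,y\in\R,$
which is the case when $$(X,Y)$$ is uniformly distributed on the unit square. Letting $$y\to\infty$$ gives $$F_X(x) = g(x)$$, and similarly $$F_Y(y) = g(y)$$, so $$F_{X,Y}(x,y) = F_X(x)F_Y(y)$$ for all $$x,y$$ and the lemma shows that $$X$$ and $$Y$$ are independent. In contrast, if $$X = Y = U$$ where $$U$$ is uniform on $$[0,1]$$, then
$F_{X,Y}(x,y) = \prob(U \le \min(x,y)) = g(\min(x,y)),$
and at the point $$x=y=\tfrac12$$ this equals $$\tfrac12$$, whereas $$F_X(\tfrac12)F_Y(\tfrac12) = \tfrac14$$. The factorization fails, so $$X$$ and $$Y$$ are not independent, as one would expect.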