
04 Random vectors and independence

\lecture{Random vectors and independence}
\subsection{Random vectors}

A random variable is a real-valued measurable function defined on a probability space \((\Omega, {\cal F}, \prob)\) (when we talk of \(\R\) as a measurable space, it is always understood to be equipped with the \(\sigma\)-algebra of Borel sets). Similarly, we now wish to talk about \emph{vector}-valued measurable functions on a probability space, i.e., functions taking values in \(\R^d\). First, we need to identify a good \(\sigma\)-algebra of subsets of \(\R^d\). At the risk of some confusion, we will still call it the Borel \(\sigma\)-algebra and denote it by \({\cal B}\), or sometimes by \({\cal B}(\R^d)\).

\begin{defi} \label{defi-borelrd}
The Borel \(\sigma\)-algebra on \(\R^d\) is defined in one of the following equivalent ways:
\begin{enumerate}
\item It is the \(\sigma\)-algebra generated by boxes of the form
\[ (a_1,b_1) \times (a_2,b_2) \times \ldots \times (a_d,b_d). \]
\item It is the \(\sigma\)-algebra generated by the balls in \(\R^d\).
\item It is the \(\sigma\)-algebra generated by the open sets in \(\R^d\).
\item It is the minimal \(\sigma\)-algebra of subsets of \(\R^d\) such that the coordinate functions \(\pi_i:\R^d\to\R\) defined by
\[ \pi_i(\mathbf{x}) = x_i, \qquad i=1,2,\ldots,d, \]
are all measurable (where measurability is with respect to the Borel \(\sigma\)-algebra on the target space \(\R\)).
\end{enumerate}
\end{defi}

\begin{exercise} Check that the definitions above are indeed all equivalent.
\end{exercise}

\begin{defi} A \textbf{random (\(d\)-dimensional) vector} (or \textbf{vector random variable}) \(\mathbf{X}=(X_1,X_2,\ldots,X_d)\) on a probability space \((\Omega, {\cal F}, \prob)\) is a function \(\mathbf{X}:\Omega\to\R^d\) that is measurable (as a function between the measurable spaces \((\Omega, {\cal F})\) and \((\R^d, {\cal B})\)).
\end{defi}

\begin{lem} \(\mathbf{X}=(X_1,\ldots,X_d)\) is a random vector if and only if \(X_i\) is a random variable for each \(i=1,\ldots,d\).
\end{lem}

\begin{proof} If \(\mathbf{X}\) is a random vector then each of its coordinates \(X_i = \pi_i \circ \mathbf{X}\) is a composition of two measurable functions and therefore (check!) measurable. Conversely, if \(X_1,\ldots,X_d\) are random variables then for any box \(E=(a_1,b_1)\times(a_2,b_2)\times\ldots\times(a_d,b_d) \subset \R^d\) we have
\[ \mathbf{X}^{-1}(E) = \bigcap_{i=1}^d X_i^{-1}((a_i,b_i)) \in {\cal F}. \]
Since the boxes generate \({\cal B}(\R^d)\), it follows by Definition~\ref{defi-borelrd} and Exercise~\ref{ex-checkrv} that \(\mathbf{X}\) is a random vector.
\end{proof}

\begin{exercise} (i) Prove that any continuous function \(f:\R^m\to\R^n\) is measurable (when each of the spaces is equipped with the respective Borel \(\sigma\)-algebra). \\
(ii) Prove that the composition \(g\circ f\) of measurable functions \(f:(\Omega_1, {\cal F}_1)\to (\Omega_2,{\cal F}_2)\) and \(g:(\Omega_2, {\cal F}_2)\to (\Omega_3, {\cal F}_3)\) (where \((\Omega_i, {\cal F}_i)\) are measurable spaces for \(i=1,2,3\)) is a measurable function. \\
%Prove that if \(\mathbf{X}\) is a \(d\)-dimensional random vector and \(f:\R^d \to \R^k\) is a continuous function, then \(f(\mathbf{X}) = f\circ \mathbf{X}\) is a \(k\)-dimensional random vector. \\
(iii) Deduce that the sum \(X_1+\ldots+X_d\) of random variables is a random variable.
\end{exercise}

\begin{exercise} Prove that if \(X_1, X_2, \ldots\) is a sequence of random variables (all defined on the same probability space), then the functions
\[ \inf_n X_n, \quad \sup_n X_n, \quad \limsup_n X_n, \quad \liminf_n X_n \]
are all random variables. Note: Part of the question is to generalize the notion of random variable to a function taking values in \(\overline{\mathbb{R}} = \R \cup \{-\infty, +\infty \}\). Alternatively, you can first solve it under the additional assumption that all the \(X_i\)'s are uniformly bounded by some constant \(M\).
\end{exercise}

\subsection{Multi-dimensional distribution functions}

If \(\mathbf{X}=(X_1,\ldots,X_d)\) is a \(d\)-dimensional random vector, we define its \textbf{distribution} to be the probability measure
\[ \mu_{\mathbf{X}}(A) = \prob(\mathbf{X}^{-1}(A)) = \prob(\{\omega \in \Omega : \mathbf{X}(\omega) \in A\}),
\qquad A \in {\cal B}(\R^d), \]
similarly to the one-dimensional case. The measure \(\mu_{\mathbf{X}}\) is also called the \textbf{joint distribution} (or \textbf{joint law}) of the random variables \(X_1,\ldots,X_d\).

Once again, to avoid having to work with measures, we introduce the concept of a \textbf{\(d\)-dimensional distribution function}.

\begin{defi} The \(d\)-dimensional distribution function of a \(d\)-dimensional random vector \(\mathbf{X}=(X_1,\ldots,X_d)\) (also called the joint distribution function of \(X_1,\ldots,X_d\)) is the function \(F_{\mathbf{X}}:\R^d\to[0,1]\) defined by
\begin{eqnarray*}
F_{\mathbf{X}}(x_1,x_2,\ldots,x_d) &=& \prob( X_1 \le x_1, X_2 \le x_2, \ldots, X_d \le x_d)
\\ &=& \mu_{\mathbf{X}}\Big(  (-\infty,x_1] \times (-\infty,x_2]\times \ldots \times (-\infty,x_d]  \Big).
\end{eqnarray*}
\end{defi}

\begin{thm}[Properties of distribution functions] \label{thm-multidim}
If \(F=F_{\mathbf X}\) is a distribution function of a \(d\)-dimensional random vector, then it has the following properties:
\begin{enumerate}
\item \(F\) is nondecreasing in each coordinate.
\item For any \(1\le i\le d\), \(\lim_{x_i\to-\infty} F(\mathbf{x}) = 0\).
\item \(\lim_{\mathbf{x}\to(\infty,\ldots,\infty)} F(\mathbf{x}) = 1\).
\item \(F\) is right-continuous, i.e.,
\[ F(\mathbf{x}+) := \lim_{\mathbf{y}\downarrow \mathbf{x}} F(\mathbf{y}) = F(\mathbf{x}), \]
where \(\mathbf{y}\downarrow \mathbf{x}\) means that \(y_i\downarrow x_i\) in each coordinate.
%\item \(F(x-) := \lim_{y\uparrow x} F(y) = \prob(X < x)\).
%\item \(\prob(X=x) = F(x) - F(x-)\).
%\item If \(G=F_Y\) is another distribution function of a random variable \(Y\), then \(X\) and \(Y\) are equal in distribution if and only if \(F \equiv G\).
\item For \(1\le i\le d\) and \(a<b\), denote by
\[ \Delta_{a,b}^{x_i} \]
the \emph{differencing operator in the variable \(x_i\)}, which takes a function \(f\) of the real variable \(x_i\) (and possibly also dependent on other variables) and returns the value
\[ \Delta_{a,b}^{x_i}f = f\big|_{x_i=b} - f\big|_{x_i=a}. \]
Then, for any real numbers \(a_1 < b_1\), \(a_2<b_2\), \ldots, \(a_d < b_d\), we have
\[ \Delta^{x_1}_{a_1,b_1} \Delta^{x_2}_{a_2,b_2}
\ldots \Delta^{x_d}_{a_d,b_d} F \ge 0. \]
\end{enumerate}
\end{thm}
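
As an illustration of the last property (a worked special case, not taken from the cited proofs), take \(d=2\). Expanding the two differencing operators gives
\[ \Delta^{x_1}_{a_1,b_1} \Delta^{x_2}_{a_2,b_2} F = F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2), \]
and by inclusion-exclusion the right-hand side equals
\[ \prob(a_1 < X_1 \le b_1,\ a_2 < X_2 \le b_2), \]
which is nonnegative. In general, the iterated difference is the \(\mu_{\mathbf{X}}\)-measure of the box \((a_1,b_1]\times\ldots\times(a_d,b_d]\).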

\begin{proof} See Chapter 1 in [Dur2010] or Appendix A.2 in [Dur2004].
\end{proof}

\begin{thm} Any function \(F:\R^d\to[0,1]\) satisfying the properties in Theorem~\ref{thm-multidim} above is a distribution function of some random vector \(\mathbf{X}\).
\end{thm}

\begin{proof} See Chapter 1 in [Dur2010] or Appendix A.2 in [Dur2004].
\end{proof}




\begin{defi} Events \(A, B \in {\cal F}\) in a probability space \((\Omega, {\cal F}, \prob)\) are called independent if
\[ \prob(A\cap B) = \prob(A)\prob(B). \]
More generally, a family \({\cal A}=(A_i)_{i\in {\cal I}}\) of events in a probability space \((\Omega, {\cal F}, \prob)\) is called an independent family if for any finite collection \(A_{i_1}, A_{i_2}, \ldots, A_{i_k} \in {\cal A}\) of distinct events in the family we have that
\[ \prob(A_{i_1} \cap A_{i_2} \cap \ldots \cap A_{i_k}) = \prod_{j=1}^k \prob(A_{i_j}). \]
\end{defi}
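
The requirement that the product formula hold for every finite subfamily is genuinely stronger than pairwise independence. A standard example (the sample space and events below are chosen purely for illustration): toss two fair coins, so that \(\Omega = \{HH, HT, TH, TT\}\) with each outcome having probability \(1/4\), and let
\[ A = \{HH, HT\}, \qquad B = \{HH, TH\}, \qquad C = \{HH, TT\}. \]
Each event has probability \(1/2\), and each pairwise intersection equals \(\{HH\}\), which has probability \(1/4 = \frac12\cdot\frac12\), so any two of the events are independent. However,
\[ \prob(A\cap B\cap C) = \prob(\{HH\}) = \frac14 \neq \frac18 = \prob(A)\prob(B)\prob(C), \]
so \((A,B,C)\) is not an independent family.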

\begin{defi} Random variables \(X, Y\) on a probability space \((\Omega, {\cal F}, \prob)\) are called independent if
\[ \prob(X \in E, Y \in F) = \prob(X\in E)\prob(Y\in F) \]
for any Borel sets \(E,F \subset \R\). In other words, any two events representing possible statements about the behaviors of \(X\) and \(Y\) are independent events.
\end{defi}

\begin{defi} If \(\Omega\) is a set and \(X:\Omega\to\R\) is a function, the family of subsets of \(\Omega\) defined by
\[ \sigma(X) = \Big\{ X^{-1}(A) : A \in {\cal B}(\R) \Big\} \]
is a \(\sigma\)-algebra (check!) called the \textbf{\(\sigma\)-algebra generated by \(X\)}. It is easy to check that it is the minimal \(\sigma\)-algebra with which \(\Omega\) can be equipped so as to make \(X\) into a random variable.
\end{defi}
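
For a concrete instance (an illustrative example, not part of the definition), let \(A\subset\Omega\) and let \(X=\mathbf{1}_A\) be the indicator function of \(A\). For a Borel set \(E\), the preimage \(X^{-1}(E)\) depends only on which of the values \(0\) and \(1\) belong to \(E\), so
\[ \sigma(\mathbf{1}_A) = \{ \emptyset, A, A^c, \Omega \}. \]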

If \((\Omega, {\cal F}, \prob)\) is a probability space, two \(\sigma\)-algebras \({\cal A}, {\cal B}\subset {\cal F}\) are called independent if any two events \(A\in {\cal A}, B\in{\cal B}\) are independent events.

It follows from the above definitions that random variables \(X,Y\) are independent if and only if the \(\sigma\)-algebras \(\sigma(X), \sigma(Y)\) generated by them are independent \(\sigma\)-algebras.

\begin{defi} If \((\Omega, {\cal F}, \prob)\) is a probability space and \(({\cal F}_i)_{i\in I}\) is some family of sub-\(\sigma\)-algebras of \({\cal F}\) (i.e., \(\sigma\)-algebras that are subsets of \({\cal F}\)), we say that \(({\cal F}_i)_{i\in I}\) is an independent family of \(\sigma\)-algebras if for any distinct \(i_1,i_2,\ldots, i_k\in I\) and events \(A_1 \in {\cal F}_{i_1}, A_2 \in {\cal F}_{i_2}, \ldots, A_k \in {\cal F}_{i_k}\), the events \(A_1,\ldots,A_k\) are independent.
\end{defi}

\begin{defi} A family \((X_i)_{i\in I}\) of random variables defined on some common probability space \((\Omega, {\cal F}, \prob)\) is called an \textbf{independent family of random variables} if the \(\sigma\)-algebras \(\{ \sigma(X_i) \}_{i\in I}\) form an independent family of \(\sigma\)-algebras.
\end{defi}

Unraveling these somewhat abstract definitions, we see that \((X_i)_{i\in I}\) is an independent family of random variables if and only if we have
\[ \prob(X_{i_1} \in A_1, \ldots, X_{i_k} \in A_k) = \prod_{j=1}^k \prob(X_{i_j} \in A_j) \]
for all distinct indices \(i_1,\ldots,i_k \in I\) and Borel sets \(A_1,\ldots, A_k \in {\cal B}(\R)\).

\begin{thm} Let \(({\cal F}_i)_{i\in I}\) be a family of sub-\(\sigma\)-algebras of the \(\sigma\)-algebra of events \({\cal F}\) in a probability space. Suppose that for each \(i\in I\), the \(\sigma\)-algebra \({\cal F}_i\) is generated by a family \({\cal A}_i\) of subsets of \(\Omega\), and that each family \({\cal A}_i\) is closed under taking the intersection of two sets (such a family is called a \textbf{\(\pi\)-system}). Then the family \(({\cal F}_i)_{i\in I}\) is independent if and only if for any distinct \(i_1,\ldots,i_k\in I\), any finite sequence of events \(A_1 \in {\cal A}_{i_1}, A_2\in {\cal A}_{i_2}, \ldots, A_k \in {\cal A}_{i_k}\) is independent.
\end{thm}

\begin{proof} This uses Dynkin's \(\pi\)-\(\lambda\) theorem from measure theory. See [Dur2010], Theorem 2.1.3, p.\ 39 or [Dur2004], Theorem (4.2), p.\ 24.
\end{proof}

As a corollary, we get a convenient criterion for checking when the coordinates of a random vector are independent.

If \(X_1,\ldots,X_d\) are random variables defined on a common probability space, then they are independent if and only if for all \(x_1,\ldots,x_d \in \R\) we have that
\[ F_{X_1,\ldots,X_d}(x_1,\ldots,x_d) = F_{X_1}(x_1) F_{X_2}(x_2) \ldots F_{X_d}(x_d). \]
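
A brief sketch of how this criterion can be derived from the theorem above (one possible argument, stated under the identifications made here): for each \(i\), the events \(\{X_i \le x\}\), \(x\in\R\), form a \(\pi\)-system, since
\[ \{X_i \le x\} \cap \{X_i \le y\} = \{X_i \le \min(x,y)\}, \]
and this \(\pi\)-system generates \(\sigma(X_i)\) because the half-lines \((-\infty,x]\) generate \({\cal B}(\R)\). The product formula for the distribution functions says exactly that
\[ \prob\Big( \bigcap_{i=1}^d \{X_i \le x_i\} \Big) = \prod_{i=1}^d \prob(X_i \le x_i), \]
and the corresponding identity for a subcollection of the coordinates follows by letting the remaining \(x_j\to\infty\). As a simple illustration, if \((X_1,X_2)\) has joint distribution function \(F(x_1,x_2)=x_1x_2\) for \(0\le x_1,x_2\le 1\) (the uniform distribution on the unit square), then \(F_{X_1}(x_1)=x_1\) and \(F_{X_2}(x_2)=x_2\) on \([0,1]\), so the product formula holds there.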

\begin{exercise} (i) We say that a Riemann-integrable function \(f:\R^d\to[0,\infty)\) is a (\(d\)-dimensional, or joint) density function for a random vector \(\mathbf{X}=(X_1,\ldots,X_d)\) if
\[ F_{\mathbf{X}}(x_1,\ldots,x_d) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \ldots \int_{-\infty}^{x_d}
f(u_1,\ldots,u_d)\,du_d\ldots du_1 \qquad \forall x_1,\ldots,x_d \in \R. \]
Show that if \(f\) is a density for \(\mathbf{X}\) and can be written in the form
\[ f(x_1,\ldots,x_d) = f_1(x_1) f_2(x_2) \ldots f_d(x_d), \]
then \(X_1,\ldots,X_d\) are independent. \\
(ii) Show that if \(X_1,\ldots,X_d\) are random variables taking values in a countable set \(S\), then in order for \(X_1,\ldots,X_d\) to be independent it is enough that for all \(x_1,\ldots,x_d\in S\) we have
\[ \prob(X_1=x_1,\ldots,X_d = x_d) = \prob(X_1=x_1)\ldots \prob(X_d=x_d). \]
\end{exercise}