Mathematics LibreTexts

1.3: Random Variables


    In an upcoming homework exercise we will show an alternative way of proving the existence of the probability space of infinite coin toss sequences using Lebesgue measure on \((0,1)\).

    Random variables and their distributions

    As we have seen, a probability space is an abstract concept that represents our intuitive notion of a probabilistic experiment. Such an experiment, however, can be very long (even infinite) and contain a lot of information. To make things more manageable, we consider numerical-valued functions on probability spaces, which we call random variables. However, not every function will do: a random variable has to relate in a nice way to the measurable space structure, so that we will be able to ask questions like "What is the probability that this random variable takes a value less than 8?", etc. This leads us to the following definitions.


    If \((\Omega_1, {\cal F}_1)\) and \((\Omega_2, {\cal F}_2)\) are two measurable spaces, a function \(X:\Omega_1\to \Omega_2\) is called measurable if for any set \(E \in {\cal F}_2\), the set

    \[ X^{-1}(E) = \{ \omega \in \Omega_1 : X(\omega) \in E \} \]

    is in \({\cal F}_1\).

    If \((\Omega, {\cal F}, \prob)\) is a probability space, a real-valued function \(X:\Omega \to \R\) is called a random variable if it is a measurable function when considered as a function from the measurable space \((\Omega, {\cal F})\) to the measurable space \((\R, {\cal B})\), where \({\cal B}\) is the Borel \(\sigma\)-algebra on \(\R\), namely the \(\sigma\)-algebra generated by the intervals.

    Exercise
    Let \((\Omega_1, {\cal F}_1)\) and \((\Omega_2, {\cal F}_2)\) be two measurable spaces such that \({\cal F}_2\) is the \(\sigma\)-algebra generated by a collection \({\cal A}\) of subsets of \(\Omega_2\). Prove that a function \(X:\Omega_1 \to \Omega_2\) is measurable if and only if \(X^{-1}(A) \in {\cal F}_1\) for all \(A \in {\cal A}\).

    It follows that the random variables are exactly those real-valued functions on \(\Omega\) for which the question

    \[ \textrm{“What is the probability that } a < X < b \textrm{?”} \]

    has a well-defined answer for all \(a<b\). This observation makes it easier in practice to check whether a given function is a random variable, since intervals are much easier to work with than the rather unwieldy (and, until you get used to them, mysterious) Borel sets.

    What can we say about the behavior of a random variable \(X\) defined on a probability space \((\Omega, {\cal F}, \prob)\)? All the information is contained in a new probability measure \(\mu_X\) on the measurable space \((\R, {\cal B})\) that is induced by \(X\), defined by

    \[ \mu_X(A) = \prob(X^{-1}(A)) = \prob(\{\omega \in \Omega : X(\omega) \in A\}). \]

    The number \(\mu_X(A)\) is the probability that \(X\) "falls in \(A\)" (or "takes its value in \(A\)").

    Exercise 1
    Verify that \(\mu_X\) is a probability measure on \((\R, {\cal B})\). This measure is called the distribution of \(X\), or is sometimes referred to more fancifully as the law of \(X\). In some textbooks it is denoted \({\cal L}_X\).
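
    To make the definition of \(\mu_X\) concrete, here is a small Python sketch on a finite probability space. Everything in it is an illustrative assumption, not part of the notes: the space of two fair coin tosses, the random variable counting heads, and the representation of a Borel set \(A\) by a predicate (a Python function that returns True exactly on the points of \(A\)). Exact fractions are used so that the checks of total mass and additivity are exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two tosses of a fair coin,
# each of the 4 outcomes having probability 1/4.
omega = list(product("HT", repeat=2))
prob = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Random variable: the number of heads in the outcome w."""
    return sum(1 for toss in w if toss == "H")

def P(event):
    """Probability of an event, given as a set of outcomes."""
    return sum((prob[w] for w in event), Fraction(0))

def mu_X(A):
    """Distribution of X: mu_X(A) = P(X^{-1}(A)).

    A set A of reals is represented (by assumption) as a predicate on reals.
    """
    return P({w for w in omega if A(X(w))})

print(mu_X(lambda x: True))    # total mass: 1
print(mu_X(lambda x: x == 1))  # probability of exactly one head: 1/2
# Additivity on the disjoint sets (-inf, 1) and [2, inf):
print(mu_X(lambda x: x < 1 or x >= 2),
      mu_X(lambda x: x < 1) + mu_X(lambda x: x >= 2))
```

    On a finite space every subset is an event, so measurability is automatic; the sketch only illustrates the formula \(\mu_X(A) = \prob(X^{-1}(A))\), not the measure-theoretic subtleties.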

    If \(X\) and \(Y\) are two random variables (possibly defined on different probability spaces), we say that \(X\) and \(Y\) are identically distributed (or equal in distribution) if \(\mu_X = \mu_Y\) (meaning that \(\mu_X(A) = \mu_Y(A)\) for any Borel set \(A\subset \R\)). We denote this

    \[ X \stackrel{d}{=} Y. \]

    How can we check if two random variables are identically distributed? Once again, working with Borel sets can be difficult, but since the Borel sets are generated by the intervals, a simpler criterion involving just this generating family of sets exists. The following lemma is a consequence of basic facts in measure theory, which can be found in the Measure Theory appendix in [Dur2010].


    Lemma
    Two probability measures \(\mu_1, \mu_2\) on the measurable space \((\R, {\cal B})\) are equal if and only if they are equal on the generating set of intervals, namely if

    \[ \mu_1\big((a,b)\big) = \mu_2\big((a,b)\big) \]

    for all \(-\infty < a<b < \infty\).

    Distribution functions

    Instead of working with distributions of random variables (which are probability measures on the measurable space \((\R, {\cal B})\) and are themselves quite unwieldy objects), we will encode them in a simpler object called a distribution function (sometimes referred to as a cumulative distribution function, or c.d.f.).


    The cumulative distribution function (or c.d.f., or just distribution function) of a random variable \(X\) defined on a probability space \((\Omega, {\cal F}, \prob)\) is the function \(F_X:\R \to [0,1]\) defined by

    \[ F_X(x) = \prob( X \le x ) = \prob( X^{-1}((-\infty,x])) = \prob( \{ \omega \in \Omega : X(\omega) \le x \} ),
    \quad (x\in \R). \]

    Note that we have introduced here a useful notational device that will be used again many times in the following sections: if \(A\) is a Borel set, we will often write \(\{X \in A\}\) as shorthand for the set \(\{ \omega \in \Omega : X(\omega) \in A \}\). In words, we may refer to this as "the event that \(X\) falls in \(A\)". When discussing its probability, we may omit the curly braces and simply write \(\prob(X\in A)\). Of course, one should always remember that on the formal level this is just the set-theoretic inverse image of a set by a function!

    Theorem: Properties of Distribution Functions

    If \(F=F_X\) is a distribution function, then it has the following properties:

    1. \(F\) is nondecreasing.
    2. \(\lim_{x\to\infty} F(x) = 1, \ \ \lim_{x\to-\infty} F(x) = 0\).
    3. \(F\) is right-continuous, i.e., \(F(x+) := \lim_{y\downarrow x} F(y) = F(x). \)
    4. \(F(x-) := \lim_{y\uparrow x} F(y) = \prob(X < x)\).
    5. \(\prob(X=x) = F(x) - F(x-).\)
    6. If \(G=F_Y\) is the distribution function of another random variable \(Y\), then \(X\) and \(Y\) are equal in distribution if and only if \(F \equiv G\).

    Proof: Exercise (recommended), or see page 9 of [Dur2010].
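
    Properties 3 to 5 can be checked by hand on a simple discrete example. The sketch below assumes, purely for illustration, that \(X\) is the outcome of a fair six-sided die; the helper names F and P_less are hypothetical. One-sided limits are approximated by evaluating \(F\) at \(x \pm \epsilon\) for a small \(\epsilon\), which gives the exact limits here because \(F\) is constant between consecutive integers.

```python
from fractions import Fraction

# Hypothetical example: X is a fair die roll, P(X = k) = 1/6 for k = 1..6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum((q for k, q in pmf.items() if k <= x), Fraction(0))

def P_less(x):
    """P(X < x), which property 4 identifies with the left limit F(x-)."""
    return sum((q for k, q in pmf.items() if k < x), Fraction(0))

eps = Fraction(1, 10**9)  # small step used to approximate one-sided limits
x = 3
print(F(x - eps), P_less(x))      # property 4: both 1/3
print(F(x + eps), F(x))           # property 3 (right-continuity): both 1/2
print(F(x) - F(x - eps), pmf[x])  # property 5 (jump = P(X = 3)): both 1/6
```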

    Definition: Cumulative Distribution Function
    A function \(F:\R\to[0,1]\) satisfying properties 1–3 in the previous theorem is called a cumulative distribution function, or just distribution function.
    Theorem: Existence
    If \(F\) is a distribution function, then there exists a random variable \(X\) such that \(F=F_X\).

    This fact has a measure-theoretic proof similar to the one used to construct Lebesgue measure, but fortunately in this case there is a more probabilistic proof that relies only on the existence of Lebesgue measure. (This is one of many examples of probabilistic ideas turning out to be useful for proving facts in analysis and measure theory.) This proof involves the probabilistic concept of a quantile (a generalization of the concepts of percentile and median that we frequently hear about in news reports).


    If \(X\) is a random variable on a probability space \((\Omega, {\cal F}, \prob)\) and \(0<p<1\), then a real number \(x\) is called a \(p\)-quantile of \(X\) if the inequalities

    \[ \begin{aligned} \prob(X \le x) &\ge p, \\ \prob(X \ge x) &\ge 1-p \end{aligned} \]

    hold.

    Note that the question of whether \(x\) is a \(p\)-quantile of \(X\) can be answered just by knowing the distribution function \(F_X\) of \(X\): since \(\prob(X\le x)=F_X(x)\) and \(\prob(X\ge x)=1-F_X(x-)\), we can write the conditions above as

    \[ F_X(x-) \le p \le F_X(x). \]


    Lemma
    A \(p\)-quantile for \(X\) always exists. Moreover, the set of \(p\)-quantiles of \(X\) is equal to the (possibly degenerate) closed interval \([a_p, b_p]\), where

    \[ \begin{aligned} a_p &= \sup \{ x : F_X(x) < p \}, \\ b_p &= \inf \{ x : F_X(x) > p \}. \end{aligned} \]

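    Exercise: Prove that a \(p\)-quantile of \(X\) always exists and that the set of \(p\)-quantiles is the interval \([a_p, b_p]\) described above.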
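
    The criterion \(F_X(x-) \le p \le F_X(x)\) and the formulas for \(a_p\) and \(b_p\) are easy to try out on a discrete distribution. The following sketch again assumes a fair die, with hypothetical helper names. For a distribution with finitely many atoms, \(F_X\) is a step function, so the supremum and infimum defining \(a_p\) and \(b_p\) are attained at atoms; the function quantile_interval relies on this, and the shortcut does not apply to general distributions.

```python
from fractions import Fraction

# Hypothetical example: X is a fair die roll, P(X = k) = 1/6 for k = 1..6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
atoms = sorted(pmf)

def F(x):
    """F(x) = P(X <= x)."""
    return sum((q for k, q in pmf.items() if k <= x), Fraction(0))

def F_left(x):
    """F(x-) = P(X < x)."""
    return sum((q for k, q in pmf.items() if k < x), Fraction(0))

def is_quantile(x, p):
    """x is a p-quantile iff F(x-) <= p <= F(x)."""
    return F_left(x) <= p <= F(x)

def quantile_interval(p):
    """Return (a_p, b_p) for a distribution with finitely many atoms, 0 < p < 1.

    For such a step function F, sup{x : F(x) < p} is the first atom where
    F reaches p, and inf{x : F(x) > p} is the first atom where F exceeds p.
    """
    a_p = next(k for k in atoms if F(k) >= p)
    b_p = next(k for k in atoms if F(k) > p)
    return a_p, b_p

p = Fraction(1, 2)
a, b = quantile_interval(p)
print(a, b)                            # 3 4: the medians form [3, 4]
print(is_quantile(Fraction(7, 2), p))  # True: 3.5 lies in [a_p, b_p]
print(is_quantile(Fraction(9, 2), p))  # False: 4.5 lies outside
```

    With \(p = 1/2\) the \(p\)-quantiles are the medians; for a fair die they fill the whole interval \([3,4]\), showing that the interval in the lemma can be nondegenerate.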
    Proof: Existence

    Let \(((0,1), {\cal B}, \prob)\) be the unit interval with Lebesgue measure, representing the experiment of drawing a uniform random number in \((0,1)\). We shall construct our random variable \(X\) on this space. Inspired by the discussion of quantiles above, we define

    \[ X(p) = \sup\{ y : F(y) < p \}, \qquad (0<p<1). \]

    If \(F\) were the distribution function of a random variable, then \(X(p)\) would be its (minimal) \(p\)-quantile.

    Note that properties 1 and 2 of \(F\) imply that \(X(p)\) is defined and finite for any \(p \in (0,1)\) and that it is a nondecreasing function on \((0,1)\). In particular, it is measurable (the preimage of any interval under a monotone function is an interval), so it is in fact a random variable on the probability space \(((0,1), {\cal B}, \prob)\). We need to show that \(F\) is its distribution function. We will show that for each \(p\in (0,1)\) and \(x\in \R\), we have \(X(p) \le x\) if and only if \(p \le F(x)\). This will imply that for every \(x\in \R\) we have the equality of sets

    \[ \{ p : X(p) \le x \} = \{ p : p \le F(x) \}, \]

    and, applying \(\prob\) to both sides of this equation we will get

    \[ F_X(x) = \prob(X\le x) = \prob\big(\{ p : p \le F(x) \}\big) = \prob\big( (0,F(x)] \big) = F(x) \]

    (since \(\prob\) is Lebesgue measure).

    To prove the claim, note that if \(p \le F(x)\), then every \(y > x\) satisfies \(F(y) \ge F(x) \ge p\) (since \(F\) is nondecreasing), so all elements of the set \(\{ y : F(y) < p\}\) satisfy \(y\le x\), and therefore the supremum \(X(p)\) of this set also satisfies \(X(p) \le x\). Conversely, if \(p > F(x)\), then, since \(F\) is right-continuous, we have \(p > F(x+\epsilon)\) for some \(\epsilon>0\). It follows that \(X(p)\ge x+\epsilon>x\) (since \(x+\epsilon\) is in the set \(\{ y : F(y)<p \}\)).

    The function \(X\) defined in the proof above is sometimes referred to as the (lower) quantile function of the distribution \(F\). Note that if \(F\) is continuous and strictly increasing, then \(X\) is simply its ordinary (set-theoretic) inverse function.
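
    The quantile function from the proof can also be approximated numerically when \(F\) is only available as a black box. The sketch below is one way to do this, under the assumption that \(F\) is given as a Python callable satisfying properties 1–3; it uses bisection (a choice made here for illustration, not a method from the notes), relying on the fact shown in the proof that \(X(p) \le x\) if and only if \(p \le F(x)\). The die example shows that the construction works even when \(F\) has jumps and flat pieces and so has no ordinary inverse.

```python
import math

def lower_quantile(F, p, iterations=200):
    """Approximate X(p) = sup{y : F(y) < p} for a distribution function F.

    Because F is nondecreasing and right-continuous, {x : F(x) >= p} is the
    interval [X(p), inf), so bisection can locate its left endpoint.
    Assumes 0 < p < 1.
    """
    lo, hi = -1.0, 1.0
    while F(lo) >= p:  # move lo left until F(lo) < p (terminates since F(-inf) = 0)
        lo *= 2
    while F(hi) < p:   # move hi right until F(hi) >= p (terminates since F(inf) = 1)
        hi *= 2
    # Invariant: F(lo) < p <= F(hi), hence lo < X(p) <= hi.
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if F(mid) >= p:
            hi = mid
        else:
            lo = mid
    return hi

def die_cdf(x):
    """Distribution function of a fair die: a step function with no inverse."""
    return min(max(math.floor(x), 0), 6) / 6

print(lower_quantile(die_cdf, 0.5))  # approximately 3.0
print(lower_quantile(die_cdf, 0.9))  # approximately 6.0
```

    The returned value is only accurate up to floating-point resolution, so it should be compared with an exact quantile using a small tolerance.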

    Example: Indicator Random Variables

    If \(A\) is an event in a probability space \((\Omega, {\cal F}, \prob)\), its indicator random variable is the r.v. \(\ind_A\) defined by

    \[ \ind_A(\omega) = \begin{cases} 0 & \omega \notin A, \\ 1 & \omega \in A. \end{cases} \]
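
    As a quick application of the definitions, since \(\ind_A\) takes only the values \(0\) and \(1\), with \(\prob(\ind_A = 1) = \prob(A)\), its distribution function is the step function

    \[ F_{\ind_A}(x) = \begin{cases} 0 & x < 0, \\ 1-\prob(A) & 0 \le x < 1, \\ 1 & x \ge 1. \end{cases} \]

    Its jumps at \(0\) and \(1\) have sizes \(1-\prob(A)\) and \(\prob(A)\), which agree with \(\prob(\ind_A = 0)\) and \(\prob(\ind_A = 1)\), as property 5 of distribution functions predicts.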

    The above discussion shows that to specify the behavior of a random variable, it is enough to specify its distribution function. Another useful concept is that of a density function. If \(F=F_X\) is a distribution function such that for some nonnegative function \(f:\R\to\R\) we have

    \[ F(x) = \int_{-\infty}^x f(y)\,dy, \qquad (x\in\R), \]

    then we say that \(X\) has a density function \(f\). Note that \(f\) determines \(F\) but is itself only determined by \(F\) up to "small" changes that do not affect the integral above (in measure-theoretic terminology we say that \(f\) is determined "up to a set of measure \(0\)"). For example, changing the value of \(f\) at a finite number of points results in a density function that is equally valid for computing \(F\).

    Example: Uniform Random Variables

    We say that \(X\) is a uniform random variable on \((0,1)\) if it has the distribution function

    \[ F(x) = \begin{cases} 0 & x \le 0, \\
    x & 0\le x \le 1, \\
    1 & x \ge 1. \end{cases} \]

    Such a r.v. has density function

    \[ f(x) = \begin{cases} 1 & 0\le x\le 1,\\ 0&\textrm{otherwise.}\end{cases} \]

    More generally, if \(a<b\) we say that \(X\) is a uniform random variable in the interval \((a,b)\) if it has the (respective) distribution and density functions

    \[ F(x) = \begin{cases} 0 & x \le a, \\ \frac{x-a}{b-a} & a\le x\le b, \\ 1 & x\ge b, \end{cases} \qquad
    f(x) = \begin{cases} \frac{1}{b-a} & a\le x\le b, \\ 0 & \textrm{otherwise.} \end{cases} \]
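
    For the uniform distribution on \((a,b)\), the lower quantile function from the existence proof can be written down explicitly: for \(0<p<1\) we have \(F(y) < p\) exactly when \(y < a+(b-a)p\), so \(X(p) = a+(b-a)p\). By the existence proof, this means that if \(U\) is uniform on \((0,1)\), then \(a+(b-a)U\) is uniform on \((a,b)\). The short Python sketch below (function names and the values of \(a, b\) are illustrative) checks numerically that \(F(X(p)) = p\).

```python
def uniform_cdf(x, a, b):
    """Distribution function of the uniform distribution on (a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_quantile(p, a, b):
    """Lower quantile X(p) = sup{y : F(y) < p} for 0 < p < 1.

    On [a, b] the function F is continuous and strictly increasing,
    so X is just the inverse of F there.
    """
    return a + (b - a) * p

a, b = 2.0, 5.0
for p in (0.1, 0.5, 0.9):
    x = uniform_quantile(p, a, b)
    print(p, x, uniform_cdf(x, a, b))  # F(X(p)) recovers p, up to rounding
```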

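    Example: Exponential Distribution

    We say that \(X\) has the exponential distribution if it has the distribution and density functions

    \[ F(x) = \begin{cases} 0 & x\le 0, \\ 1-e^{-x} & x \ge 0, \end{cases} \qquad
    f(x) = \begin{cases} 0 & x < 0, \\ e^{-x} & x \ge 0. \end{cases} \]
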
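
    Since this \(F\) is continuous and strictly increasing on \([0,\infty)\), its quantile function is obtained by solving \(p = 1-e^{-x}\), which gives \(X(p) = -\log(1-p)\). Combined with the existence proof, this yields a way to simulate exponential random variables from uniform ones (often called inverse transform sampling). The sketch below assumes Python's random module as the source of uniform numbers and an arbitrary fixed seed; it compares the empirical fraction of samples that are \(\le x\) with \(F(x)\).

```python
import math
import random

def exp_cdf(x):
    """F(x) = 1 - e^{-x} for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def exp_quantile(p):
    """Inverse of F on [0, 1): solving p = 1 - e^{-x} gives x = -log(1 - p)."""
    return -math.log(1.0 - p)

rng = random.Random(12345)  # arbitrary fixed seed for reproducibility
# rng.random() lies in [0, 1), so 1 - p is never 0 and the log is defined.
samples = [exp_quantile(rng.random()) for _ in range(100_000)]

# Compare the empirical distribution function with F at a few points.
for x in (0.5, 1.0, 2.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(x, round(empirical, 3), round(exp_cdf(x), 3))
```

    With \(10^5\) samples the empirical values should typically agree with \(F(x)\) to about two decimal places; since they are random, they will not match exactly.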
    Example: Standard Normal Distribution

    The standard normal (or Gaussian) distribution is defined in terms of its density function

    \[ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. \]

    The cumulative distribution function is denoted by

    \[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} dy. \]

    This integral cannot be expressed in closed form in terms of elementary functions, but \(\Phi\) is nonetheless an important special function of mathematics.
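
    In practice \(\Phi\) is evaluated numerically. One standard route uses the error function \(\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt\), in terms of which \(\Phi(x) = \tfrac12\big(1 + \operatorname{erf}(x/\sqrt{2})\big)\); Python's standard library provides math.erf. The sketch below compares this formula with a direct trapezoid-rule approximation of the integral defining \(\Phi\); the cutoff at \(-10\) and the number of steps are arbitrary choices made for illustration.

```python
import math

def phi_density(x):
    """Standard normal density f(x) = exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def Phi_numeric(x, lower=-10.0, n=20_000):
    """Trapezoid-rule approximation of the integral of f from `lower` to x.

    The mass below -10 is negligible (far below 1e-20).
    """
    h = (x - lower) / n
    total = 0.5 * (phi_density(lower) + phi_density(x))
    total += sum(phi_density(lower + i * h) for i in range(1, n))
    return total * h

for x in (-1.0, 0.0, 1.96):
    print(x, Phi(x), Phi_numeric(x))
```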