9.2: Catalan Numbers

Last updated
Save as PDF

Page ID: 60116

Joy Morris
University of Lethbridge

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

This is an example that shows even more clearly the power of the generating function method.

The Catalan numbers are a sequence that can be defined in a variety of ways, because they arise in a number of different circumstances. We’ll use the following definition.

Definition: Catalan Number

The \(n^{\text{th}}\) Catalan number, \(C_n\), is the number of different ways in which brackets can be put around \(n\) terms, to indicate different orders of combining the terms.

Thus, for example, \(C_3 = 2\), since three terms can be combined as either

[(_·_) ·_], or [_·_(_·_)].

These numbers have something in common with Example 6.3.2, in which Shawna was building towers from lego. If we’d asked in how many different orders she could combine the blocks to build her tower, assuming that the final order for the blocks was decided in advance, we would have been asking for the Catalan number. So we can use logic similar to the logic we used in that example: in order to create an expression with n terms, our final step must involve combining a set of \(k\) terms (for which the order of combining them has already been determined) and a set of \(n − k\) terms (for which the order of combining them has already been determined). Here, \(k\) may take on any value from \(1\) to \(n − 1\)). This results in the recursive relation:

\(C_n = \sum_{k=1}^{n-1} C_kC_{n−k}\)

This may be easier to see with an example. We’ve worked out \(C_3\) above; in order to work out \(C_4\) using this recursive relation, we also need to know \(C_1\) and \(C_2\). There is only one way to combine a single term (we don’t need brackets at all), so \(C_1 = 1\). We also have \(C_2 = 1\), since there is only one way to put brackets around a pair of terms: (_·_).

Now, to use brackets to order the operations in a four-term expression, our final operation must either combine a group of three terms with a single term; a group of two terms with another group of two terms; or a single term with a group of three terms (this time, the single term is at the left). The first two expressions below come from combining a group of three terms with a single term; the third comes from combining a group of two terms with another group of two terms; and the last two come from combining a single term with a group of three terms.

([(_·_)·_]·_), ([_·(_·_)]·_), [(_·_) · (_·_)] (_·[(_·_)·_]) (_·[_·(_·_)]).

Thus,

\(C_4 = C_3C_1 + C_2C_2 + C_1C_3 = 2 + 1 + 2 = 5\).

Now we want to use generating functions to figure out what we can about the Catalan numbers. Unfortunately, there is a difficulty. Any time we want to use generating functions to solve a recursively-defined sequence, the sequence must start with a \(0^{\text{th}}\) term, to be the coefficient of \(x^0\). With some recursively-defined sequences, we can simply use the recursive relation “backwards” to solve previous terms, going down to \(n = 0\), even if our initial conditions began with much higher terms. For example, if a recursively-defined sequence is given by \(h_2 = 1\), \(h_3 = 5\) and \(h_n = 8h_{n−2} − h_{n−1}\) for every \(n ≥ 4\), we can use \(n = 3\) in this to get

\(h_3 = 5 = 8h_1 − h_2 = 8h_1 − 1\).

Solving for \(h_1\) gives \(h_1 = \dfrac{3}{4}\). Then using the recursive relation with \(n = 2\) gives

\(h_2 = 1 = 8h_0 − h_1 = 8h_0 − \dfrac{3}{4}\).

Solving for \(h_0\) gives \(h_0 = \dfrac{7}{32}\). This allows us to use generating functions on the sequence.

The recursive relation for the Catalan numbers doesn’t have a form that allows us to solve for \(C_0\) by knowing other terms of the sequence, so we do what we have to, in order to make things work. Instead of working with the generating function for the Catalan numbers themselves (since we can’t), we work with the generating function for the sequence \(c_0, c_1, c_2, . . .\), where \(c_i = C_{i+1}\) for every \(i ≥ 0\). In other words, the \(n^{\text{th}}\) term of our new sequence will be the \(n + 1^{\text{st}}\) Catalan number.

Adjusting the recursive relation we’ve determined for the Catalan numbers to this new sequence, gives

\(c_0 = 1\), and \(c_n = \sum_{k=0}^{n-1} c_kc_{n−k−1}\) for every \(n ≥ 1\).

Notice that

\( \begin{equation} \begin{split} c(x)c(x)&= (c_0 + c_1x + c_2x^2 + c_3x^3 + . . .)(c_0 + c_1x + c_2x^2 + c_3x^3 + . . .) \\ &= c_0c_0 + (c_1c_0 + c_0c_1)x + (c_2c_0 + c_1c_1 + c_0c_2)x^2 + (c_0c_3 + c_1c_2 + c_2c_1 + c_3c_0)x^3 + . . ., \end{split} \end{equation} \)

and in general, the coefficient of \(x^m\) in \((c(x))^2\), is

\(\sum_{k=0}^{m}c_kc_{m−k}\).

This should look familiar! In fact, you can see that the coefficient of \(x^m\) in \((c(x))^2\), is the same as the coefficient of \(x^{m+1}\) in \(c(x)\), since the latter is

\(\sum_{k=0}^{m}c_kc_{m−k}\)

also.

Thus, we have an expression for \(c(x)\) in terms of \((c(x))^2\), since multiplying \((c(x))^2\) by x gives all of the terms of \(c(x)\) except \(c_0: c(x) = x(c(x))^2 + c_0\). We can rearrange this equation, to see that

\(x[c(x)]^2 − c(x) + 1 = 0\).

We are about to do something to this generating function that may seem a bit like black magic: we will use the quadratic formula to factor this quadratic equation in \(c(x)\), treating \(x\) as the coefficient of \((c(x))^2\). Thus, in the quadratic formula, we take \(a = x\), \(b = −1\), and \(c = 1\), and obtain

\(c(x) = \dfrac{1 ± \sqrt{1 − 4x}}{2x}\)

Of course, there are two roots to this, and only one of them will give the correct generating function; we need to work out which one (whether to take the plus or the minus).

Using the Generalized Binomial Theorem, we see that

\((1 − 4x)^{\dfrac{1}{2}} = \binom{\dfrac{1}{2}}{0} + \binom{\dfrac{1}{2}}{1}(-4x) + \binom{\dfrac{1}{2}}{2}(-4x)^2 + ... + \binom{\dfrac{1}{2}}{k}(-4x)^k + ... \)

and

\(\binom{\dfrac{1}{2}}{k} = \dfrac{(\dfrac{1}{2})(−\dfrac{1}{2})(−\dfrac{3}{2}). . .(\dfrac{1}{2} − k + 1)}{k!} = \dfrac{(−1)^{k−1} 1 · 3 · 5 · (2k − 3)}{2^kk!} \)

so the coefficient of \(x^k\) in \((1 − 4x)^{\dfrac{1}{2}}\) is

\(\dfrac{(−1)^{k−1} 1 · 3 · 5 · (2k − 3)}{2^kk!} (-4)^k)= \dfrac{(−1) 1 · 3 · 5 · (2k − 3)2^k}{k!}\)

Whichever root we use will require this expression, so let’s work with it a bit more to get it into a nicer form.

\(2^k k! = 2^k (1 · 2 · 3 · . . . · k) = 2 · 4 · 6 · . . . · 2^k,\)

so if we multiply the numerator and denominator of the fraction by \(k!\) (which does not change the result), we see that we have

\(\dfrac{(−1)1 · 3 · 5 · (2k − 3)2 · 4 · 6 . . . · 2k}{k!k!} = \dfrac{(−1)(2k − 2)!2k}{k!k!} = \dfrac{(−1)(2k)!}{(2k − 1)k!k!} = \dfrac{-1}{2k − 1} \binom{2k}{k}, \)

\((1 − 4x)^{\dfrac{1}{2}} = −\sum_{k=0}^{\infty} \dfrac{1}{2k-1} \binom{2k}{k}x^k\)

The coefficients shown on the right-hand side of this equation quickly get big and negative.

\(c(x) = \dfrac{1 + \sqrt{1-4x}}{2x} \)

then for \(n > 0\) the coefficient of \(x^n\) in \(c(x)\) will be half of the coefficient of \(x^{n+1}\) in \((1 − 4x)^{\dfrac{1}{2}}\), which (when \(n\) is large) will be big and negative. But it is easy to see from the recurrence relation that all of the Catalan numbers are positive. To get positive coefficients, we must use

\(c(x) = \dfrac{1 - \sqrt{1-4x}}{2x} \)

Since in this expression we take the negative of the large negative coefficients, the result will be large positive coefficients (even when we divide by \(2\), and look for the coefficient of \(x^{n+1}\)).

Thus,

\(c(x) = \dfrac{1 - (1 − 4x)^{\dfrac{1}{2}}}{2x} = \dfrac{1 + \sum_{k=0}^{\infty} \dfrac{1}{2k-1} \binom{2k}{k}x^k}{2x} \)

From this, we see that for \(n > 0\), the coefficient of \(x^n\) in \(c(x)\) is half of the coefficient of \(x^{n+1}\) in \((1 − 4x)^{\dfrac{1}{2}}\), which is

\( \begin{equation} \begin{split} \dfrac{1}{2} \binom{2(n+1)}{n+1} \dfrac{1}{2n+1}&= \dfrac{(2n+2)!}{2(n + 1)!(n + 1)!(2n + 1)} \\ &= \dfrac{1}{2} \cdot \dfrac{2n + 2}{n+1} \cdot \dfrac{(2n)!}{(n + 1)!n!} \cdot \dfrac{2n+1}{2n+1} \\ &= \dfrac{1}{n+1}\binom{2n}{n}. \end{split} \end{equation} \)

\(c_n = \dfrac{1}{2n+1} \binom{2n}{n}\).

Although we derived this expression for \(n > 0\) only, we can verify that \(c_0 = 1 = \dfrac{1}{0+1} \binom{0}{0}\) since \(0! = 1\), so this expression is true for every \(n ≥ 0\).

Exercise \(\PageIndex{1}\)

Use induction and the recursive relation for Catalan numbers (as adjusted for the values of \(\{c_i\}\), where \(c_i = C_{i+1}\)) to prove that \(c_n > 0\) for every \(n ≥ 0\).
Calculate \(c_4\) using the explicit formula that we calculated in this section.
Calculate \(c_4\) using the recursive relation