# 2.1: Some Examples of Mathematical Induction


## 2.1: Mathematical Induction

The **principle of mathematical induction** states that

In order to prove a statement about an integer \(n\), if we can

- Prove the statement when \(n = b\), for some fixed integer \(b\), and
- Show that the truth of the statement for \(n = k − 1\) implies the truth of the statement for \(n = k\) whenever \(k > b\),

then we can conclude the statement is true for all integers \(n \geq b\).

As an example, let us give yet another proof that a set with \(n\) elements has \(2^n\) subsets. This proof uses essentially the same bijections we used in proving the Pascal Equation. The statement we wish to prove is the statement that “A set of size \(n\) has \(2^n\) subsets.”

Our statement is true when \(n = 0\), because a set of size \(0\) is the empty set and the empty set has \(1 = 2^{0}\) subsets. (This step of our proof is called a base step.)

Now suppose that \(k > 0\) and every set with \(k − 1\) elements has \(2^{k−1}\) subsets. Suppose \(S = \{a_{1}, a_{2}, \dotsc, a_{k}\}\) is a set with \(k\) elements. We partition the subsets of \(S\) into two blocks. Block \(B_{1}\) consists of the subsets that do not contain \(a_{k}\) and block \(B_{2}\) consists of the subsets that do contain \(a_{k}\). Each set in \(B_{1}\) is a subset of \(\{a_{1}, a_{2}, \dotsc, a_{k−1}\}\), and each subset of \(\{a_{1}, a_{2}, \dotsc, a_{k−1}\}\) is in \(B_{1}\). Thus \(B_{1}\) is the set of all subsets of \(\{a_{1}, a_{2}, \dotsc, a_{k−1}\}\). Therefore, by our assumption in the first sentence of this paragraph, the size of \(B_{1}\) is \(2^{k−1}\). Consider the function from \(B_{2}\) to \(B_{1}\) which takes a subset of \(S\) containing \(a_{k}\) and removes \(a_{k}\) from it. This function is defined on \(B_{2}\), because every set in \(B_{2}\) contains \(a_{k}\). This function is onto, because if \(T\) is a set in \(B_{1}\), then \(T \cup \{a_{k}\}\) is a set in \(B_{2}\) which the function sends to \(T\). This function is one-to-one because if \(V\) and \(W\) are two different sets in \(B_{2}\), then removing \(a_{k}\) from them gives two different sets in \(B_{1}\). Thus we have a bijection between \(B_{1}\) and \(B_{2}\), so \(B_{1}\) and \(B_{2}\) have the same size. Therefore, by the sum principle, the size of \(B_{1} \cup B_{2}\) is \(2^{k−1} + 2^{k−1} = 2^{k}\). Therefore \(S\) has \(2^{k}\) subsets. This shows that if a set of size \(k − 1\) has \(2^{k−1}\) subsets, then a set of size \(k\) has \(2^{k}\) subsets. Therefore, by the principle of mathematical induction, a set of size \(n\) has \(2^{n}\) subsets for every nonnegative integer \(n\).
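
The recursion in this proof translates directly into a short program. The Python sketch below (the helper name `subsets` is ours, purely for illustration) builds the subsets of a set exactly as the proof partitions them: block \(B_1\) first, then block \(B_2\) as the image of \(B_1\) under the bijection that adjoins the last element.

```python
def subsets(s):
    """All subsets of the list s, built the way the proof partitions them:
    block B1 (subsets omitting the last element) together with
    block B2 (the image of B1 under the bijection that adjoins it)."""
    if not s:
        return [[]]                    # the empty set has 1 = 2^0 subsets
    b1 = subsets(s[:-1])               # subsets not containing a_k
    b2 = [t + [s[-1]] for t in b1]     # adjoin a_k: the bijection B1 -> B2
    return b1 + b2

# the counting argument predicts 2^n subsets
for n in range(6):
    assert len(subsets(list(range(n)))) == 2 ** n
```

Running it confirms the count \(2^{n}\) for small \(n\); of course, this checks cases rather than proving the theorem.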

The first sentence of the last paragraph is called the **inductive hypothesis**. In an inductive proof we always make an inductive hypothesis as part of proving that the truth of our statement when \(n = k−1\) implies the truth of our statement when \(n = k\). The last paragraph itself is called the **inductive step** of our proof. In an inductive step we derive the statement for \(n = k\) from the statement for \(n = k−1\), thus proving that the truth of our statement when \(n = k−1\) implies the truth of our statement when \(n = k\). The last sentence in the last paragraph is called the **inductive conclusion**. All inductive proofs should have a base step, an inductive hypothesis, an inductive step, and an inductive conclusion.

There are a couple of details worth noticing. First, in this problem, our base step was the case \(n = 0\); in other words, we had \(b = 0\). However, in other proofs, \(b\) could be any integer: positive, negative, or \(0\). Second, our proof that the truth of our statement for \(n = k − 1\) implies the truth of our statement for \(n = k\) required that \(k\) be at least \(1\), so that there would be an element \(a_{k}\) we could remove in order to describe our bijection. However, the second condition in the principle of mathematical induction only requires that we be able to prove the implication for \(k > 0\), so we were allowed to assume \(k > 0\).

## 2.1.1: Strong Mathematical Induction

One way of looking at the principle of mathematical induction is that it tells us that if we know the “first” case of a theorem and we can derive each other case of the theorem from a smaller case, then the theorem is true in all cases. However, the particular way in which we stated the theorem is rather restrictive in that it requires us to derive each case from the immediately preceding case. This restriction is not necessary, and removing it leads us to a more general statement of the principle of mathematical induction which people often call the **strong principle of mathematical induction**. It states:

In order to prove a statement about an integer \(n\), if we can

- Prove our statement when \(n = b\), and
- Prove that the statements we get with \(n = b, n = b + 1, \dotsc, n = k − 1\) imply the statement with \(n = k\),

then our statement is true for all integers \(n \geq b\).

You will find some explicit examples of the use of the strong principle of mathematical induction in Appendix B and will find some uses for it in this chapter.

## 2.1.2: Binomial Coefficients and the Binomial Theorem

\(\bullet\) Exercise \(72\)

When we studied the Pascal Equation and subsets in Chapter 1, it may have appeared that there is no connection between the Pascal relation \(\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}\) and the formula \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\). Of course you probably realize you can prove the Pascal relation by substituting the values the formula gives you into the right-hand side of the equation and simplifying to give the left-hand side. In fact, from the Pascal relation and the facts that \(\binom{n}{0} = 1\) and \(\binom{n}{n} = 1\), you can actually prove the formula for \(\binom{n}{k}\) by induction on \(n\). Do so. (Hint).
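
A numerical sanity check is no substitute for the requested induction proof, but it may be reassuring before you start. The Python sketch below (the name `binom` is ours) computes \(\binom{n}{k}\) from the Pascal relation and its base cases, then compares the result with the factorial formula for small \(n\):

```python
from math import factorial

def binom(n, k):
    """Binomial coefficient computed from the Pascal relation
    together with the base cases C(n, 0) = C(n, n) = 1."""
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)

# check against n!/(k!(n-k)!) for small n -- evidence, not proof
for n in range(10):
    for k in range(n + 1):
        assert binom(n, k) == factorial(n) // (factorial(k) * factorial(n - k))
```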

\(\rightarrow\) Exercise \(73\)

Use the fact that \((x + y)^{n} = (x + y)(x + y)^{n−1}\) to give an inductive proof of the binomial theorem. (Hint).

Exercise \(74\)

Suppose that \(f\) is a function defined on the nonnegative integers such that \(f(0) = 3\) and \(f(n) = 2f(n − 1)\) for \(n > 0\). Find a formula for \(f(n)\) and prove your formula is correct.
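
If you want data to conjecture from, the inductive definition can be transcribed directly into Python (a sketch of ours, not the requested proof):

```python
def f(n):
    """The function of this exercise, computed straight from its
    inductive definition: f(0) = 3 and f(n) = 2 f(n - 1) for n > 0."""
    return 3 if n == 0 else 2 * f(n - 1)

# print a few values to help conjecture a closed formula
print([f(n) for n in range(6)])   # [3, 6, 12, 24, 48, 96]
```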

\(+\) Exercise \(75\)

Prove the conjecture in Problem 13b for an arbitrary positive integer \(m\) without appealing to the general product principle. (Hint).

## 2.1.3: Inductive Definition

You may have seen \(n!\) described by the two equations \(0! = 1\) and \(n! = n(n − 1)!\) for \(n > 0\). By the principle of mathematical induction, we know that this pair of equations defines \(n!\) for all nonnegative integers \(n\). For this reason, we call such a definition an **inductive definition**. An inductive definition is sometimes called a recursive definition. Often we can get very easy proofs of useful facts by using inductive definitions.
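
An inductive definition is precisely a recursive function. Transcribed into Python (an illustration of ours, with the hypothetical name `fact`), the two equations read:

```python
def fact(n):
    """n! from its inductive definition: 0! = 1 and n! = n * (n-1)! for n > 0."""
    return 1 if n == 0 else n * fact(n - 1)

assert [fact(n) for n in range(6)] == [1, 1, 2, 6, 24, 120]
```

The principle of mathematical induction is exactly what guarantees that this recursion produces a value for every nonnegative integer \(n\).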

\(\rightarrow\) Exercise \(76\)

An inductive definition of \(a^{n}\) for nonnegative \(n\) is given by \(a^{0} = 1\) and \(a^{n} = aa^{n−1}\). (Notice the similarity to the inductive definition of \(n!\).) We remarked above that inductive definitions often give us easy proofs of useful facts. Here we apply this inductive definition to prove two useful facts about exponents that you have been using almost since you learned the meaning of exponents: prove that \(a^{m+n} = a^{m}a^{n}\) and \((a^{m})^{n} = a^{mn}\) for nonnegative integers \(m\) and \(n\).

\(+\) Exercise \(77\)

Suppose that \(f\) is a function on the nonnegative integers such that \(f(0) = 0\) and \(f(n) = n + f(n − 1)\). Prove that \(f(n) = n(n + 1)/2\). Notice that this gives a third proof that \(1 + 2 + \dotsi + n = n(n + 1)/2\), because this sum satisfies the two conditions for \(f\). (The sum has no terms and is thus \(0\) when \(n = 0\).)
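
As a quick illustration (again, not a substitute for the induction proof the exercise asks for), the inductive definition and the claimed formula can be compared numerically in Python:

```python
def f(n):
    """f(0) = 0 and f(n) = n + f(n - 1): the sum 1 + 2 + ... + n."""
    return 0 if n == 0 else n + f(n - 1)

# the formula the exercise asks you to prove, checked for small n
for n in range(100):
    assert f(n) == n * (n + 1) // 2
```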

\(\rightarrow\) Exercise \(78\)

Give an inductive definition of the summation notation \(\sum^{n}_{i=1}a_{i}\). Use it and the distributive law \(b(a + c) = ba + bc\) to prove the distributive law \[b\sum_{i=1}^{n} a_i = \sum^{n}_{i=1}ba_{i}.\]
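
The definition you give will almost certainly be the natural one: the empty sum is \(0\), and \(\sum_{i=1}^{n} a_i = \left(\sum_{i=1}^{n-1} a_i\right) + a_n\) for \(n \geq 1\). As a sketch (ours, in Python, with `summation` a hypothetical name), that definition and the distributive law it implies can be checked numerically:

```python
def summation(a, n):
    """sum_{i=1}^{n} a_i defined inductively: the empty sum is 0, and
    sum_{i=1}^{n} a_i = (sum_{i=1}^{n-1} a_i) + a_n.  (a[i-1] plays a_i.)"""
    return 0 if n == 0 else summation(a, n - 1) + a[n - 1]

a, b = [2, 5, 7, 11], 3
# the distributive law the exercise asks you to prove, on one example
assert b * summation(a, 4) == summation([b * x for x in a], 4)
```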

## 2.1.4: Proving the General Product Principle (Optional)

We stated the sum principle as

If we have a partition of a finite set \(S\), then the size of \(S\) is the sum of the sizes of the blocks of the partition.

In fact, the simplest form of the sum principle says that the size of the union of two disjoint (finite) sets is the sum of their sizes.

Exercise \(79\)

Prove the sum principle we stated for partitions of a set from the simplest form of the sum principle. (Hint).

We stated the partition form of the product principle as

If we have a partition of a finite set \(S\) into \(m\) blocks, each of size \(n\), then \(S\) has size \(mn\).

In Problem 11 we gave a more general form of the product principle which can be stated as

If we make a sequence of \(m\) choices for which

- there are \(k_{1}\) possible first choices, and
- for each way of making the first \(i − 1\) choices, there are \(k_{i}\) ways to make the \(i^{\text{th}}\) choice,

then we may make our sequence of choices in \(k_{1} \cdot k_{2} \cdots k_{m} = \prod^{m}_{i=1}k_{i}\) ways.
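
The product principle can be illustrated by a small program. The sketch below (our hypothetical `count_sequences`) makes \(m\) distinct choices from \(\{1, \dotsc, n\}\) one at a time, so that \(k_{i} = n − i + 1\), and confirms the predicted product for one small case:

```python
def count_sequences(n, m):
    """Count sequences of m distinct choices from {1, ..., n} by making
    the choices one at a time; after i - 1 choices, n - (i - 1) options remain."""
    def extend(chosen):
        if len(chosen) == m:
            return 1
        return sum(extend(chosen + [x]) for x in range(1, n + 1) if x not in chosen)
    return extend([])

# the product principle predicts k_1 k_2 ... k_m = n(n-1)...(n-m+1)
assert count_sequences(5, 3) == 5 * 4 * 3
```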

In Problem 14 we stated the general product principle as follows.

Let \(S\) be a set of functions \(f\) from \([n]\) to some set \(X\). Suppose that

- there are \(k_{1}\) choices for \(f(1)\), and
- for each choice of \(f(1), f(2), \dotsc, f(i − 1)\), there are \(k_{i}\) choices for \(f(i)\).

Then the number of functions in the set \(S\) is \(k_{1}k_{2} \cdots k_{n}\).

You may use either way of stating the general product principle in the following Problem.

\(+\) Exercise \(80\)

Prove the general form of the product principle from the partition form of the product principle. (Hint).

## 2.1.5: Double Induction and Ramsey Numbers

In Section 1.3.4 we gave two different descriptions of the Ramsey number \(R(m, n)\). However, if you look carefully, you will see that we never showed that Ramsey numbers actually exist; we merely described what they were and showed that \(R(3, 3)\) and \(R(3, 4)\) exist by computing them directly. As long as we can show that there is some number \(R\) such that when there are \(R\) people together, there are either \(m\) mutual acquaintances or \(n\) mutual strangers, this shows that the Ramsey number \(R(m, n)\) exists, because it is the smallest such \(R\). Mathematical induction allows us to show that one such \(R\) is \(\binom{m+n-2}{m-1}\). The question is, what should we induct on, \(m\) or \(n\)? In other words, do we use the fact that with \(\binom{m+n-3}{m-2}\) people in a room there are at least \(m−1\) mutual acquaintances or at least \(n\) mutual strangers, or do we use the fact that with \(\binom{m+n-3}{m-1}\) people in a room there are at least \(m\) mutual acquaintances or at least \(n − 1\) mutual strangers? It turns out that we use both. Thus we want to be able to simultaneously induct on \(m\) and \(n\). One way to do that is to use yet another variation on the principle of mathematical induction, the *Principle of Double Mathematical Induction*. This principle (which can be derived from one of our earlier ones) states that

In order to prove a statement about integers \(m\) and \(n\), if we can

- Prove the statement when \(m = a\) and \(n = b\), for fixed integers \(a\) and \(b\),
- Prove the statement when \(m = a\) and \(n > b\) and when \(m > a\) and \(n = b\) (for the same fixed integers \(a\) and \(b\)), and
- Show that the truth of the statement for \(m = j\) and \(n = k − 1\) (with \(j \geq a\) and \(k > b\)) and the truth of the statement for \(m = j − 1\) and \(n = k\) (with \(j > a\) and \(k \geq b\)) imply the truth of the statement for \(m = j\) and \(n = k\),

then we can conclude the statement is true for all pairs of integers \(m \geq a\) and \(n \geq b\).

There is a strong version of double induction, and it is actually easier to state. The principle of **strong double mathematical induction** says the following.

In order to prove a statement about integers \(m\) and \(n\), if we can

- Prove the statement when \(m = a\) and \(n = b\), for fixed integers \(a\) and \(b\).
- Show that the truth of the statement for values of \(m\) and \(n\) with \(a + b \leq m + n < k\) implies the truth of the statement for \(m + n = k\),

then we can conclude that the statement is true for all pairs of integers \(m \geq a\) and \(n \geq b\).

\(\rightarrow \; \cdot\) Exercise \(81\)

Prove that \(R(m, n)\) exists by proving that if there are \(\binom{m+n-2}{m-1}\) people in a room, then there are either at least \(m\) mutual acquaintances or at least \(n\) mutual strangers. (Hint).

Exercise \(82\)

Prove that \(R(m, n) \leq R(m − 1, n) + R(m, n − 1)\). (Hint).

Exercise \(83\)

- What does the inequality in Problem 82 tell us about \(R(4, 4)\)?
- Consider \(17\) people arranged in a circle such that each person is acquainted with the first, second, fourth, and eighth person to the right and the first, second, fourth, and eighth person to the left. Can you find a set of four mutual acquaintances? Can you find a set of four mutual strangers? (Hint).
- What is \(R(4, 4)\)?
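
If you would like to check your answer to the second part by machine, a brute-force search (our own Python sketch, not part of the text) examines all \(\binom{17}{4} = 2380\) four-person sets under the stated acquaintance rule:

```python
from itertools import combinations

# People 0..16 sit in a circle; by the rule in part (b), i and j are
# acquainted exactly when their circular distance is 1, 2, 4, or 8.
def acquainted(i, j):
    d = abs(i - j)
    return min(d, 17 - d) in {1, 2, 4, 8}

fours = list(combinations(range(17), 4))
cliques = [q for q in fours
           if all(acquainted(i, j) for i, j in combinations(q, 2))]
strangers = [q for q in fours
             if not any(acquainted(i, j) for i, j in combinations(q, 2))]
print(len(cliques), len(strangers))   # counts of 4-sets of mutual acquaintances/strangers
```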

Exercise \(84\)

(Optional) Prove the inequality of Problem 81 by induction on \(m+n\).

Exercise \(85\)

Use Stirling’s approximation (Problem 46) to convert the upper bound for \(R(n, n)\) that you get from Problem 81 to a multiple of a power of an integer.

## 2.1.6: A Bit of Asymptotic Combinatorics

Problem 85 gives us an upper bound on \(R(n, n)\). A very clever technique due to Paul Erdős, called the “probabilistic method,” will give a lower bound. Since both bounds are exponential in \(n\), they show that \(R(n, n)\) grows exponentially as \(n\) gets large. An analysis of what happens to a function of \(n\) as \(n\) gets large is usually called an *asymptotic analysis*. The *probabilistic method*, at least in its simpler forms, can be expressed in terms of averages, so one does not need to know the language of probability in order to understand it. We will apply it to Ramsey numbers in the next problem. Combined with the result of Problem 85, this problem will give us that \(\sqrt{2}^{n} < R(n, n) < 2^{2n−2}\), so we know that the Ramsey number \(R(n, n)\) grows exponentially with \(n\).

\(\rightarrow\) Exercise \(86\)

Suppose we have two numbers \(n\) and \(m\). We consider all possible ways to color the edges of the complete graph \(K_{m}\) with two colors, say red and blue. For each coloring, we look at each \(n\)-element subset \(N\) of the vertex set \(M\) of \(K_{m}\). Then \(N\) together with the edges of \(K_{m}\) connecting vertices in \(N\) forms a complete graph on \(n\) vertices. This graph, which we denote by \(K_{N}\), has its edges colored by the original coloring of the edges of \(K_{m}\).

- Why is it that, if there is some coloring of the edges of \(K_{m}\) for which no subset \(N \subseteq M\) has all the edges of \(K_{N}\) colored the same color, then \(R(n, n) > m\)? (Hint).
- To apply the probabilistic method, we are going to compute the average, over all colorings of \(K_{m}\), of the number of sets \(N \subseteq M\) with \(|N| = n\) such that \(K_{N}\) does have all its edges the same color. Explain why it is that if the average is less than \(1\), then for some coloring there is no set \(N\) such that \(K_{N}\) has all its edges colored the same color. Why does this mean that \(R(n, n) > m\)? (Hint).
- We call a \(K_{N}\) *monochromatic* for a coloring \(c\) of \(K_{m}\) if the color \(c(e)\) assigned to edge \(e\) is the same for every edge \(e\) of \(K_{N}\). Let us define \(\mathrm{mono}(c, N)\) to be \(1\) if \(N\) is monochromatic for \(c\) and to be \(0\) otherwise. Find a formula for the average number of monochromatic \(K_{N}\)s over all colorings of \(K_{m}\) that involves a double sum, first over all edge colorings \(c\) of \(K_{m}\) and then over all \(n\)-element subsets \(N \subseteq M\), of \(\mathrm{mono}(c, N)\). (Hint).
- Show that your formula for the average reduces to \(2\binom{m}{n} \cdot 2^{-\binom{n}{2}}\). (Hint).
- Explain why \(R(n, n) > m\) if \(\binom{m}{n} \leq 2^{\binom{n}{2} - 1}\). (Hint).
- *Explain why \(R(n,n) > \sqrt[n]{n!2^{\binom{n}{2} - 1}}\). (Hint).
- By using Stirling’s formula, show that if \(n\) is large enough, then \(R(n,n) > \sqrt{2^{n}} = \sqrt{2}^{n}\). (Here “large enough” means large enough for Stirling’s formula to be reasonably accurate.)
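
The average in this problem can be computed by brute force for small \(m\) and \(n\) and compared with the closed form \(2\binom{m}{n} \cdot 2^{-\binom{n}{2}}\). The Python sketch below (our own; `average_monochromatic` is a hypothetical helper) does exactly that:

```python
from itertools import combinations, product
from math import comb

def average_monochromatic(m, n):
    """Average, over all 2-colorings of the edges of K_m, of the number of
    n-element subsets N for which K_N is monochromatic (brute force)."""
    edges = list(combinations(range(m), 2))
    nsets = list(combinations(range(m), n))
    total = 0
    for coloring in product((0, 1), repeat=len(edges)):
        color = dict(zip(edges, coloring))
        for N in nsets:
            cs = [color[e] for e in combinations(N, 2)]
            total += all(cs) or not any(cs)      # this is mono(c, N)
    return total / 2 ** len(edges)

# compare the brute-force average with the closed form 2 C(m,n) 2^{-C(n,2)}
m, n = 5, 3
assert average_monochromatic(m, n) == 2 * comb(m, n) * 2 ** (-comb(n, 2))
```

For \(m = 5\), \(n = 3\) both sides equal \(2.5\); the double sum over colorings and subsets in the code mirrors the double sum the problem asks you to set up.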