9.3: Proof by induction

Last updated
Save as PDF

Page ID: 97279

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

One of the most powerful methods of proof — and one of the most difficult to wrap your head around — is called mathematical induction, or just “induction" for short. I like to call it “proof by recursion," because this is exactly what it is. Remember that we discussed recursion in the context of rooted trees (see p.5.2). A tree can be thought of as a node with several children — each of which are, in turn, trees. Each of them is the root node of a tree comprised of yet smaller trees, and so on and so forth. If you flip back to the left-hand side of Figure 5.2.2 on p.5.2, you’ll see that A is the root of one tree, and its two children, F and B, are roots of their own smaller trees in turn. If we were to traverse this tree in (say) pre-order, we’d visit the root, then visit the left and right subtrees in turn, treating each of them as their own tree. In this way we’ve broken up a larger problem (traversing the big tree) into smaller problems (traversing the smaller trees F and B). The A node has very little to do: it just visits itself, then defers all the rest of the work onto its children. This idea of pawning off most of the work onto smaller subproblems that you trust will work is key to the idea of inductive proofs.

Mathematical induction is hard to wrap your head around because it feels like cheating. It seems like you never actually prove anything: you defer all the work to someone else, and then declare victory. But the chain of reasoning, though delicate, is strong as iron.

Casting the problem in the right form

Let’s examine that chain. The first thing you have to be able to do is express the thing you’re trying to prove as a predicate about natural numbers. In other words, you need to form a predicate that has one input, which is a natural number. You’re setting yourself up to prove that the predicate is true for all natural numbers. (Or at least, all natural numbers of at least a certain size.)

Suppose I want to prove that in the state of Virginia, all legal drinkers can vote. Then I could say “let Vote(\(n\)) be the proposition that a citizen of age \(n\) can vote."

If I want to prove an algebraic identity, like \(\sum_{i=1}^x{i} = \frac{x(x+1)}{2}\), then I have to figure out which variable is the one that needs to vary across the natural numbers. In this case it’s the \(x\) variable in my equation. So I’ll say “let P(\(n\)) be the proposition that \(\sum_{i=1}^n{i} = \frac{n(n+1)}{2}\)." (The choice of the letter “\(n\)" isn’t important here — it just needs to be a letter that stands for a number. We could have chosen anything, even sticking with \(x\). Later, we’ll use “\(k\)" as a stand-in, so keep your eyes peeled for that.)

If I want to prove that the number of leaves in a perfect binary tree is one more than the number of internal nodes, I’d have to think about which quantity I can parameterize on (i.e., which quantity I can use for my \(n\).) In this case, I’d probably use the height of the tree. I’d say “let P(\(n\)) be the proposition that the number of leaves in a perfect binary tree of height \(n\) is one more than the number of internal nodes."

These are just examples. In any case, you need to cast your proof in a form that allows you to make statements in terms of the natural numbers. Then you’re ready to begin the process of proving by induction that your predicate is true for all the natural numbers.

Proof by induction: weak form

There are actually two forms of induction, the weak form and the strong form. Let’s look at the weak form first. It says:

If a predicate is true for a certain number,
and its being true for some number would reliably mean that it’s also true for the next number (i.e., one number greater),
then it’s true for all numbers.

All you have to do is prove those two things, and you’ve effectively proven it for every case.

The first step is called the base case, and the “certain number" we pick is normally either 0 or 1. The second step, called the inductive step, is where all the trouble lies. You have to look really, really carefully at how it’s worded, above. We are not assuming that the predicate is true for any old number! We are simply considering, if it’s true for any old number, whether that would necessarily imply it’s also true for the next number. In terms of the predicate, we’re asking “does P(\(k\)) imply P(\(k+1\))?" In other words: “we aren’t sure if P(\(k\)) is true. But if it does — a big “if," of course — would that logically demand that P(\(k+1\)) was also true?" If you can prove that it does, then you’re in business.

The whole thing is set up like a row of dominos. If one domino falls, then the one after it will also fall. And if that one falls, then so will the next. All that is needed is a base case to tip over the first domino, and by this trail of causality, all the dominos will fall.

One terminology note: the entire second step is called the inductive step, but the first half of it (the part where we assume that P(\(k\)) is true) is called the inductive hypothesis. We never prove the inductive hypothesis; rather, we assume it, and then see if that allows us to deduce that P(\(k+1\)) would also be true.

Example 1

Let’s work this out for the drinking/voting example. Let Vote(\(n\)) be the proposition that a citizen of age \(n\) can vote. Our proof goes like this:

base case. Vote(21) is true, because a 21-year old is old enough to vote in the state and national elections.
inductive step. Vote(k)\(\Rightarrow\)Vote(k+1). Why? Because nobody’s gettin’ any younger. If you can vote in a particular year, then you’re also old enough to vote next year. Unless the laws change, there will never be a case when someone old enough to vote this year turns out to be too young to vote next year.
Wow. We’re done. Q.E.D. and all that.

The only specific example we showed was true was Vote(21). And yet we managed to prove Vote(\(n\)) for any number \(n\geq21\).

Let’s look back at that inductive step, because that’s where all the action is. It’s crucial to understand what that step does not say. It doesn’t say “Vote(\(k\)) is true for some number \(k\)." If it did, then since \(k\)’s value is arbitrary at that point, we would basically be assuming the very thing we were supposed to prove, which is circular reasoning and extremely unconvincing. But that’s not what we did. Instead, we made the inductive hypothesis and said, “okay then, let’s assume for a second a 40-year-old can vote. We don’t know for sure, but let’s say she can. Now, if that’s indeed true, can a 41-year-old also vote? The answer is yes." We might have said, “okay then, let’s assume for a second a 7-year-old can vote. We don’t know for sure, but let’s say she can. Now, if that’s indeed true, can an 8-year-old also vote? The answer is yes." Note carefully that we did not say that 8-year-olds can vote! We merely said that if 7-year-olds can, why then 8-year-olds must be able to as well. Remember that X\(\Rightarrow\)Y is true if either X is false or Y is true (or both). In the 7/8-year-old example, the premise X turns out to be false, so this doesn’t rule out our implication.

The result is a row of falling dominos, up to whatever number we wish. Say we want to verify that a 25-year-old can vote. Can we be sure? Well:

If a 24-year-old can vote, then that would sure prove it (by the inductive step).
So now we need to verify that a 24-year-old can vote. Can he? Well, if a 23-year-old can vote, then that would sure prove it (by the inductive step).
Now everything hinges on whether a 23-year-old can vote. Can he? Well, if a 22-year-old can vote, then that would sure prove it (by the inductive step).
So it comes down to whether a 22-year-old can vote. Can he? Well, if a 21-year-old can vote, then that would sure prove it (by the inductive step).
And now we need to verify whether a 21-year-old can vote. Can he? Yes (by the base case).

Example 2

A famous story tells of Carl Friedrich Gauss, perhaps the most brilliant mathematician of all time, getting in trouble one day as a schoolboy. As punishment, he was sentenced to tedious work: adding together all the numbers from 1 to 100. To his teacher’s astonishment, he came up with the correct answer in a moment, not because he was quick at adding integers, but because he recognized a trick. The first number on the list (1) and the last (100) add up to 101. So do the second number (2) and the second-to-last (99). So do 3 and 98, and so do 4 and 97, etc., all the way up to 50 and 51. So really what you have here is 50 different sums of 101 each, so the answer is \(50\times 101 = 5050\). In general, if you add the numbers from 1 to \(x\), where \(x\) is any integer at all, you’ll get \(\frac{x}{2}\) sums of \(x+1\) each, so the answer will be \(\frac{x(x+1)}{2}\).

Now, use mathematical induction to prove that Gauss was right (i.e., that \(\sum_{i=1}^x{i} = \frac{x(x+1)}{2}\)) for all numbers \(x\).

First we have to cast our problem as a predicate about natural numbers. This is easy: we say “let P(\(n\)) be the proposition that \(\sum_{i=1}^n{i} = \frac{n(n+1)}{2}\)."

Then, we satisfy the requirements of induction:

base case. We prove that P(1) is true simply by plugging it in. Setting \(n=1\) we have \[\begin{aligned} \sum_{i=1}^1{i} & \stackrel{?}{=} \frac{1(1+1)}{2} \\ 1 & \stackrel{?}{=} \frac{1(2)}{2} \\ 1 &= 1 \quad \checkmark\end{aligned}\]
inductive step. We now must prove that P(\(k\))\(\Rightarrow\)P(\(k+1\)). Put another way, we assume P(\(k\)) is true, and then use that assumption to prove that P(\(k+1\)) is also true.

Let’s be crystal clear where we’re going with this. Assuming that P(\(k\)) is true means we can count on the fact that \[\begin{aligned} 1+2+3+\cdots+k = \frac{k(k+1)}{2}.\end{aligned}\]

What we need to do, then, is prove that P(\(k+1\)) is true, which amounts to proving that \[\begin{aligned} 1+2+3+\cdots+(k+1) = \frac{(k+1)((k+1)+1)}{2}.\end{aligned}\]

Very well. First we make the inductive hypothesis, which allows us to assume: \[\begin{aligned} 1+2+3+\cdots+k = \frac{k(k+1)}{2}.\end{aligned}\] The rest is just algebra. We add \(k+1\) to both sides of the equation, then multiply things out and factor it all together. Watch carefully: \[\begin{aligned} 1+2+3+\cdots+k+(k+1) &= \frac{k(k+1)}{2} + (k+1) \\ &= \frac{1}{2}k^2 + \frac{1}{2}k + k + 1 \\ &= \frac{1}{2}k^2 + \frac{3}{2}k + 1 \\ &= \frac{k^2+3k+2}{2} \\ &= \frac{(k+1)(k+2)}{2} \\ &= \frac{(k+1)((k+1)+1)}{2}. \quad \checkmark\end{aligned}\]

Therefore, \(\forall n\geq1 \ \text{P}(n)\).

Example 3

Another algebra one. You learned in middle school that \((ab)^n=a^n b^n\). Prove this by mathematical induction.

Solution: Let P(\(n\)) be the proposition that \((ab)^n=a^n b^n\).

base case. We prove that P(1) is true simply by plugging it in. Setting \(n=1\) we have \[\begin{aligned} (ab)^1 &= \stackrel{?}{=} a^1 b^1 \\ ab &= ab \quad \checkmark\end{aligned}\]
inductive step. We now must prove that P(\(k\))\(\Rightarrow\)P(\(k+1\)). Put another way, we assume P(\(k\)) is true, and then use that assumption to prove that P(\(k+1\)) is also true.

Let’s be crystal clear where we’re going with this. Assuming that P(\(k\)) is true means we can count on the fact that \[\begin{aligned} (ab)^k = a^k b^k.\end{aligned}\]

What we need to do, then, is prove that P(\(k+1\)) is true, which amounts to proving that \[\begin{aligned} (ab)^{k+1} = a^{k+1} b^{k+1}.\end{aligned}\]

Now we know by the very definition of exponents that: \[\begin{aligned} (ab)^{k+1} &= ab(ab)^k. \\\end{aligned}\]

Adding in our inductive hypothesis then lets us determine: \[\begin{aligned} (ab)^{k+1} &= ab(ab)^k \\ &= ab \cdot a^k b^k \\ &= a \cdot a^k \cdot b \cdot b^k \\ &= a^{k+1} b^{k+1} \quad \checkmark\end{aligned}\]

Therefore, \(\forall n\geq1 \ \text{P}(n)\).

Example 4

Let’s switch gears and talk about structures. Prove that the number of leaves in a perfect binary tree is one more than the number of internal nodes.

Solution: let P(\(n\)) be the proposition that a perfect binary tree of height \(n\) has one more leaf than internal node. That is, if \(l_k\) is the number of leaves in a tree of height \(k\), and \(i_k\) is the number of internal nodes in a tree of height \(k\), let P(\(n\)) be the proposition that \(l_n = i_n + 1\).

base case. We prove that P(0) is true simply by inspection. If we have a tree of height 0, then it has only one node (the root). This sole node is a leaf, and is not an internal node. So this tree has 1 leaf, and 0 internal nodes, and so \(l_0 = i_0 + 1\).
inductive step. We now must prove that P(\(k\))\(\Rightarrow\)P(\(k+1\)). Put another way, we assume P(\(k\)) is true, and then use that assumption to prove that P(\(k+1\)) is also true.

Let’s be crystal clear where we’re going with this. Assuming that P(\(k\)) is true means we can count on the fact that \[\begin{aligned} l_k = i_k + 1.\end{aligned}\]

What we need to do, then, is prove that P(\(k+1\)) is true, which amounts to proving that \[\begin{aligned} l_{k+1} = i_{k+1} + 1.\end{aligned}\]

We begin by noting that the number of nodes on level \(k\) of a perfect binary tree is \(2^k\). This is because the root is only one node, it has two children (giving 2 nodes on level 1), both those children have two children (giving 4 nodes on level 2), all four of those children have two children (giving 8 nodes on level 3), etc. Therefore, \(l_k = 2^k\), and \(l_{k+1} = 2^{k+1}\).

Further, we observe that \(i_{k+1} = i_k + l_k\): this is just how trees work. In words, suppose we have a perfect binary tree of height \(k\), and we add another level of nodes to it, making it a perfect binary tree of height \(k+1\). Then all of the first tree’s nodes (whether internal or leaves) become internal nodes of bigger tree.

Combining these two facts, we have \(i_{k+1} = i_k + 2^k\). By the inductive hypothesis, we assume that \(2^k = i_k + 1\), and we now must prove that \(2^{k+1} = i_{k+1} + 1\). Here goes: \[\begin{aligned} n_{k+1} &= n_k + 2^k &(property\space of\space trees) \\ n_{k+1} &= 2^k - 1 + 2^k &(using\space inductive\space hypothesis) \\ n_{k+1} + 1 &= 2^k + 2^k \\ n_{k+1} + 1 &= 2(2^k) \\ n_{k+1} + 1 &= 2^{k+1}. \quad \checkmark\end{aligned}\]

Therefore, \(\forall n\geq0 \ \text{P}(n)\).

Proof by induction: strong form

Now sometimes we actually need to make a stronger assumption than just “the single proposition P(\(k\)) is true" in order to prove that P(\(k+1\)) is true. In all the examples above, the \(k+1\) case flowed directly from the \(k\) case, and only the \(k\) case. But sometimes, you need to know that all the cases less than \(k+1\) are true in order to prove the \(k+1\) case. In those situations, we use the strong form of mathematical induction. It says:

If a predicate is true for a certain number,
and its being true for all numbers up to and including some number would reliably mean that it’s also true for the next number (i.e., one number greater),
then it’s true for all numbers.

It’s exactly the same as the weak form, except that the inductive hypothesis is stronger. Instead of having to prove

P(\(k\))\(\Rightarrow\)P(\(k+1\)),

we get to prove

(\(\forall i\leq k\) P(\(i\)))\(\Rightarrow\)P(\(k+1\)).

At first glance that might not seem any easier. But if you look carefully, you can see that we’ve added information to the left hand side of the implication. No longer do we need to rely on the single fact that P(5) is true in order to prove P(6). Now we get to take advantage of the fact that P(1), P(2), P(3), P(4), and P(5) are all known to be true when we try to prove P(6). And that can make a world of difference.

Example 1

The Fundamental Theorem of Arithmetic says that every natural number (greater than 2) is expressible as the product of one or more primes. For instance, 6 can be written as “\(2 \cdot 3\)", where 2 and 3 are primes. The number 7 is itself prime, and so can be written as “\(7\)." The number 9,180 can be written as “\(2 \cdot 2 \cdot 3 \cdot 3 \cdot 3 \cdot 5 \cdot 17\)," all of which are primes. How can we prove that this is always possible, no matter what the number?

Let P(\(n\)) be the proposition that the number \(n\) can be expressed as a product of prime numbers. Our proof goes like this:

base case. P(2) is true, since 2 can be written as “2," and 2 is a prime number. (Note we didn’t use 0 or 1 as our base case here, since actually neither of those numbers is expressible as a product of primes. Fun fact.)
inductive step. We now must prove that (\(\forall i \leq k\) P(\(k\)))\(\Rightarrow\)P(\(k+1\)). Put another way, we assume that P(\(i\)) is true for every number up to \(k\), and then use that assumption to prove that P(\(k+1\)) is true as well.

Regarding the number \(k+1\), there are two possibilities: either it’s prime, or it’s not. If it is, then we’re done, because it can obviously be written as just itself, which is the product of one prime. (23 can be written as “23.") But suppose it’s not. Then, it can be broken down as the product of two numbers, each less than itself. (21 can be broken down as \(7 \cdot 3\); 24 can be broken down as \(6 \cdot 4\) or \(12 \cdot 2\) or \(8 \cdot 3\), take your pick.) Now we know nothing special about those two numbers…except the fact that the inductive hypothesis tells us that all numbers less than \(k+1\) are expressible as the product of one or more primes! So these two numbers, whatever they may be, are expressible as the product of primes, and so when you multiply them together to get \(k+1\), you will have a longer string of primes multiplied together. Therefore, (\(\forall i \leq k\) P(\(k\)))\(\Rightarrow\)P(\(k+1\)).

Therefore, by the strong form of mathematical induction, \(\forall n \geq 2\) P(\(n\)).

You can see why we needed the strong form here. If we wanted to prove that 15 is expressible as the product of primes, knowing that 14 is expressible as the product of primes doesn’t do us a lick of good. What we needed to know was that 5 and 3 were expressible in that way. In general, the strong form of induction is useful when you have to break something into smaller parts, but there’s no guarantee that the parts will be “one less" than the original. You only know that they’ll be smaller than the original. A similar example follows.

Example 2

Earlier (p.) we stated that every free tree has one less edge than node. Prove it.

Let P(\(n\)) be the proposition that a free tree with \(n\) nodes has \(n-1\) edges.

base case. P(1) is true, since a free tree with 1 node is just a single lonely node, and has no edges.
inductive step. We now must prove that (\(\forall i \leq k\) P(\(k\)))\(\Rightarrow\)P(\(k+1\)). Put another way, we assume that all trees smaller than the one we’re looking at have one more node than edge, and then use that assumption to prove that the tree we’re looking at also has one more node than edge.

We proceed as follows. Take any free tree with \(k+1\) nodes. Removing any edge gives you two free trees, each with \(k\) nodes or less. (Why? Well, if you remove any edge from a free tree, the nodes will no longer be connected, since a free tree is “minimally connected" as it is. And we can’t break it into more than two trees by removing a single edge, since the edge connects exactly two nodes and each group of nodes on the other side of the removed edge are still connected to each other.)

Now the sum of the nodes in these two smaller trees is still \(k+1\). (This is because we haven’t removed any nodes from the original free tree — we’ve simply removed an edge.) If we let \(k_1\) be the number of nodes in the first tree, and \(k_2\) the number of nodes in the second, we have \(k_1 + k_2 = k + 1\).

Okay, but how many edges does the first tree have? Answer: \(k_1 - 1\). How do we know that? By the inductive hypothesis. We’re assuming that any tree smaller than \(k+1\) nodes has one less edge than node, and so we’re taking advantage of that (legal) assumption here. Similarly, the second tree has \(k_2 - 1\) edges.

The total number of edges in these two trees is thus \(k_1 - 1 + k_2 - 1\), or \(k_1 + k_2 - 2\). Remember that \(k+1 = k_1 + k_2\) (no nodes removed), and so this is a total of \(k+1-2 = k-1\) edges.

Bingo. Removing one edge from our original tree of \(k+1\) nodes gave us a total of \(k-1\) edges. Therefore, that original tree must have had \(k\) edges. We have now proven that a tree of \(k+1\) nodes has \(k\) edges, assuming that all smaller trees also have one less edge than node.

Therefore, by the strong form of mathematical induction, \(\forall n \geq 1\) P(\(n\)).