
5.2: Trees


    A tree is really nothing but a simplification of a graph. There are two kinds of trees in the world: free trees, and rooted trees. 1

    Free trees

    A free tree is just a connected graph with no cycles. Every node is reachable from the others, and there’s only one way to get anywhere. Take a look at Figure  \(\PageIndex{1}\) . It looks just like a graph (and it is) but unlike the WWII France graph, it’s more skeletal. This is because in some sense, a free tree doesn’t contain anything “extra."

    Figure \(\PageIndex{1}\): A free tree.

    If you have a free tree, the following interesting facts are true:

    1. There’s exactly one path between any two nodes. (Check it!)

    2. If you remove any edge, the graph becomes disconnected. (Try it!)

    3. If you add any new edge, you end up adding a cycle. (Try it!)

    4. [onelessedge] If there are \(n\) nodes, there are \(n-1\) edges. (Think about it!)

    So basically, if your goal is connecting all the nodes, and you have a free tree, you’re all set. Adding anything is redundant, and taking away anything breaks it.

    If this reminds you of Prim’s algorithm, it should. Prim’s algorithm produced exactly this: a free tree connecting all the nodes — and specifically the free tree with shortest possible total length. Go back and look at the final frame of Figure 5.1.12 and convince yourself that the darkened edges form a free tree.

    For this reason, the algorithm is often called Prim’s minimal spanning tree algorithm. A “spanning tree" just means “a free tree that spans (connects) all the graph’s nodes."

    Keep in mind that there are many free trees one can make with the same set of vertices. For instance, if you remove the edge from A to F, and add one from anything else to F, you have a different free tree.

    Rooted trees

    Now a rooted tree is the same thing as a free tree, except that we elevate one node to become the root. It turns out this makes all the difference. Suppose we chose A as the root of Figure  \(\PageIndex{1}\) . Then we would have the rooted tree in the left half of Figure  \(\PageIndex{2}\) . The A vertex has been positioned at the top, and everything else is flowing under it. I think of it as reaching into the free tree, carefully grasping a node, and then lifting up your hand so the rest of the free tree dangles from there. Had we chosen (say) C as the root instead, we would have a different rooted tree, depicted in the right half of the figure. Both of these rooted trees have all the same edges as the free tree did: B is connected to both A and C, F is connected only to A, etc. The only difference is which node is designated the root.

    Up to now we’ve said that the spatial positioning on graphs is irrelevant. But this changes a bit with rooted trees. Vertical positioning is our only way of showing which nodes are “above" others, and the word “above" does indeed have meaning here: it means closer to the root. The altitude of a node shows how many steps it is away from the root. In the right rooted tree, nodes B, D, and E are all one step away from the root (C), while node F is three steps away.

    Figure \(\PageIndex{2}\): Two different rooted trees with the same vertices and edges.

    The key aspect to rooted trees — which is both their greatest advantage and greatest limitation — is that every node has one and only one path to the root. This behavior is inherited from free trees: as we noted, every node has only one path to every other.

Trees have a myriad of applications. Think of the files and folders on your hard drive: at the top is the root of the filesystem (perhaps “/" on Linux/Mac or “C:\" on Windows) and underneath that are named folders. Each folder can contain files as well as other named folders, and so on down the hierarchy. The result is that each file has one, and only one, distinct path to it from the top of the filesystem. The file can be stored, and later retrieved, in exactly one way.

    An “org chart" is like this: the CEO is at the top, then underneath her are the VP’s, the Directors, the Managers, and finally the rank-and-file employees. So is a military organization: the Commander in Chief directs generals, who command colonels, who command majors, who command captains, who command lieutenants, who command sergeants, who command privates.

    The human body is even a rooted tree of sorts: it contains skeletal, cardiovascular, digestive, and other systems, each of which is comprised of organs, then tissues, then cells, molecules, and atoms. In fact, anything that has this sort of part-whole containment hierarchy is just asking to be represented as a tree.

    In computer programming, the applications are too numerous to name. Compilers scan code and build a “parse tree" of its underlying meaning. HTML is a way of structuring plain text into a tree-like hierarchy of displayable elements. AI chess programs build trees representing their possible future moves and their opponent’s probable responses, in order to “see many moves ahead" and evaluate their best options. Object-oriented designs involve “inheritance hierarchies" of classes, each one specialized from a specific other. Etc. Other than a simple sequence (like an array), trees are probably the most common data structure in all of computer science.

    Rooted tree terminology

    Rooted trees carry with them a number of terms. I’ll use the tree on the left side of Figure  \(\PageIndex{2}\) as an illustration of each:

    root.

    The node at the top of the tree, which is A in our example. Note that unlike trees in the real world, computer science trees have their root at the top and grow down. Every tree has a root except the empty tree, which is the “tree" that has no nodes at all in it. (It’s kind of weird thinking of “nothing" as a tree, but it’s kind of like the empty set \(\varnothing\), which is still a set.)

    parent.

    Every node except the root has one parent: the node immediately above it. D’s parent is C, C’s parent is B, F’s parent is A, and A has no parent.

    child.

Some nodes have children, which are the nodes connected directly below them. A’s children are F and B, C’s are D and E, B’s only child is C, and E has no children.

    sibling.

    A node with the same parent. E’s sibling is D, B’s is F, and none of the other nodes have siblings.

    ancestor.

    Your parent, grandparent, great-grandparent, etc., all the way back to the root. B’s only ancestor is A, while E’s ancestors are C, B, and A. Note that F is not C’s ancestor, even though it’s above it on the diagram: there’s no connection from C to F, except back through the root (which doesn’t count).

    descendent.

Your children, grandchildren, great-grandchildren, etc., all the way down to the leaves. B’s descendents are C, D, and E, while A’s are F, B, C, D, and E.

    leaf.

    A node with no children. F, D, and E are leaves. Note that in a (very) small tree, the root could itself be a leaf.

    internal node.

    Any node that’s not a leaf. A, B, and C are the internal nodes in our example.

    depth (of a node).

A node’s depth is the distance (in number of steps) from it to the root. The root itself has depth zero. In our example, B is of depth 1, E is of depth 3, and A is of depth 0.

    height (of a tree).

    A rooted tree’s height is the maximum depth of any of its nodes; i.e., the maximum distance from the root to any node. Our example has a height of 3, since the “deepest" nodes are D and E, each with a depth of 3. A tree with just one node is considered to have a height of 0. Bizarrely, but to be consistent, we’ll say that the empty tree has height -1! Strange, but what else could it be? To say it has height 0 seems inconsistent with a one-node tree also having height 0. At any rate, this won’t come up much.

    level.

    All the nodes with the same depth are considered on the same “level." B and F are on level 1, and D and E are on level 3. Nodes on the same level are not necessarily siblings. If F had a child named G in the example diagram, then G and C would be on the same level (2), but would not be siblings because they have different parents. (We might call them “cousins" to continue the family analogy.)

    subtree.

    Finally, much of what gives trees their expressive power is their recursive nature. This means that a tree is made up of other (smaller) trees. Consider our example. It is a tree with a root of A. But the two children of A are each trees in their own right! F itself is a tree with only one node. B and its descendents make another tree with four nodes. We consider these two trees to be subtrees of the original tree. The notion of “root" shifts somewhat as we consider subtrees — A is the root of the original tree, but B is the root of the second subtree. When we consider B’s children, we see that there is yet another subtree, which is rooted at C. And so on. It’s easy to see that any subtree fulfills all the properties of trees, and so everything we’ve said above applies also to it.

    Binary trees (BT’s)

    The nodes in a rooted tree can have any number of children. There’s a special type of rooted tree, though, called a binary tree which we restrict by simply saying that each node can have at most two children. Furthermore, we’ll label each of these two children as the “left child" and “right child." (Note that a particular node might well have only a left child, or only a right child, but it’s still important to know which direction that child is.)

    The left half of Figure \(\PageIndex{2}\) is a binary tree, but the right half is not (C has three children). A larger binary tree (of height 4) is shown in Figure  \(\PageIndex{3}\) .

    Traversing binary trees

    There were two ways of traversing a graph: breadth-first, and depth-first. Curiously, there are three ways of traversing a tree: pre-order, post-order, and in-order. All three begin at the root, and all three consider each of the root’s children as subtrees. The difference is in the order of visitation.

    Figure \(\PageIndex{3}\): A binary tree.
    To traverse a tree pre-order, we:
    1. Visit the root.

    2. Treat the left child and all its descendents as a subtree, and traverse it in its entirety.

    3. Do the same with the right child.

    It’s tricky because you have to remember that each time you “treat a child as a subtree" you do the whole traversal process on that subtree. This involves remembering where you were once you finish.

    Follow this example carefully. For the tree in Figure  \(\PageIndex{3}\) , we begin by visiting G. Then, we traverse the whole “K subtree." This involves visiting K itself, and then traversing its whole left subtree (anchored at D). After we visit the D node, we discover that it actually has no left subtree, so we go ahead and traverse its right subtree. This visits O followed by I (since O has no left subtree either) which finally returns back up the ladder.

    It’s at this point where it’s easy to get lost. We finish visiting I, and then we have to ask “okay, where the heck were we? How did we get here?" The answer is that we had just been at the K node, where we had traversed its left (D) subtree. So now what is it time to do? Traverse the right subtree, of course, which is M. This involves visiting M, C, and E (in that order) before returning to the very top, G.

    Now we’re in the same sort of situation where we could have gotten lost before: we’ve spent a lot of time in the tangled mess of G’s left subtree, and we just have to remember that it’s now time to do G’s right subtree. Follow this same procedure, and the entire order of visitation ends up being: G, K, D, O, I, M, C, E, H, A, B, F, N, L. (See Figure  \(\PageIndex{4}\) for a visual.)

    Figure \(\PageIndex{4}\): The order of node visitation in pre-order traversal.
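If it helps to see this as code, here is a minimal Python sketch (my own illustration; the text itself gives no code). It defines a bare-bones Node class, reconstructs the Figure \(\PageIndex{3}\) tree from the walkthrough above, and reproduces the pre-order visitation order just listed.

class Node:
    """A binary tree node: a value plus optional left and right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

# The tree of Figure 5.2.3, reconstructed from the traversal walkthrough.
figure_tree = Node("G",
    left=Node("K",
        left=Node("D", right=Node("O", right=Node("I"))),
        right=Node("M", left=Node("C"), right=Node("E"))),
    right=Node("H",
        left=Node("A"),
        right=Node("B",
            left=Node("F"),
            right=Node("N", left=Node("L")))))

def preorder(node):
    """Visit the root, then traverse the left subtree, then the right subtree."""
    if node is None:
        return []                 # an empty subtree contributes nothing
    return [node.value] + preorder(node.left) + preorder(node.right)

print(preorder(figure_tree))
# ['G', 'K', 'D', 'O', 'I', 'M', 'C', 'E', 'H', 'A', 'B', 'F', 'N', 'L']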
    To traverse a tree post-order, we:
    1. Treat the left child and all its descendents as a subtree, and traverse it in its entirety.

    2. Do the same with the right child.

    3. Visit the root.

    It’s the same as pre-order, except that we visit the root after the children instead of before. Still, despite its similarity, this has always been the trickiest one for me. Everything seems postponed, and you have to remember what order to do it in later.

    For our sample tree, the first node visited turns out to be I. This is because we have to postpone visiting G until we finish its left (and right) subtree; then we postpone K until we finish its left (and right) subtree; postpone D until we’re done with O’s subtree, and postpone O until we do I. Then finally, the thing begins to unwind...all the way back up to K. But we can’t actually visit K itself yet, because we have to do its right subtree. This results in C, E, and M, in that order. Then we can do K, but we still can’t do G because we have its whole right subtree’s world to contend with. The entire order ends up being: I, O, D, C, E, M, K, A, F, L, N, B, H, and finally G. (See Figure  \(\PageIndex{5}\) for a visual.)

    Note that this is not remotely the reverse of the pre-order visitation, as you might expect. G is last instead of first, but the rest is all jumbled up.

    Figure \(\PageIndex{5}\): The order of node visitation in post-order traversal.
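In the running Python sketch (same Node class and figure_tree as before), post-order is a one-line change: the root gets appended last instead of first.

def postorder(node):
    """Traverse the left subtree, then the right subtree, then visit the root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]

print(postorder(figure_tree))
# ['I', 'O', 'D', 'C', 'E', 'M', 'K', 'A', 'F', 'L', 'N', 'B', 'H', 'G']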
    Finally, to traverse a tree in-order, we:
    1. Treat the left child and all its descendents as a subtree, and traverse it in its entirety.

    2. Visit the root.

    3. Traverse the right subtree in its entirety.

    So instead of visiting the root first (pre-order) or last (post-order) we treat it in between our left and right children. This might seem to be a strange thing to do, but there’s a method to the madness which will become clear in the next section.

    For the sample tree, the first visited node is D. This is because it’s the first node encountered that doesn’t have a left subtree, which means step 1 doesn’t need to do anything. This is followed by O and I, for the same reason. We then visit K before its right subtree, which in turn visits C, M, and E, in that order. The final order is: D, O, I, K, C, M, E, G, A, H, F, B, L, N. (See Figure  \(\PageIndex{6}\) .)

    If your nodes are spaced out evenly, you can read the in-order traversal off the diagram by moving your eyes left to right. Be careful about this, though, because ultimately the spatial position doesn’t matter, but rather the relationships between nodes. For instance, if I had drawn node I further to the right, in order to make the lines between D–O–I less steep, that I node might have been pushed physically to the right of K. But that wouldn’t change the order and have K visited earlier.

    Figure \(\PageIndex{6}\): The order of node visitation in in-order traversal.
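And here is the in-order version of the same sketch, with the root visited in between its two subtrees.

def inorder(node):
    """Traverse the left subtree, visit the root, then traverse the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

print(inorder(figure_tree))
# ['D', 'O', 'I', 'K', 'C', 'M', 'E', 'G', 'A', 'H', 'F', 'B', 'L', 'N']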

    Finally, it’s worth mentioning that all of these traversal methods make elegant use of recursion. Recursion is a way of taking a large problem and breaking it up into similar, but smaller, subproblems. Then, each of those subproblems can be attacked in the same way as you attacked the larger problem: by breaking them up into subproblems. All you need is a rule for eventually stopping the “breaking up" process by actually doing something.

    Every time one of these traversal processes treats a left or right child as a subtree, they are “recursing" by re-initiating the whole traversal process on a smaller tree. Pre-order traversal, for instance, after visiting the root, says, “okay, let’s pretend we started this whole traversal thing with the smaller tree rooted at my left child. Once that’s finished, wake me up so I can similarly start it with my right child." Recursion is a very common and useful way to solve certain complex problems, and trees are rife with opportunities.

    Sizes of binary trees

    Binary trees can be any ragged old shape, like our Figure  \(\PageIndex{3}\) example. Sometimes, though, we want to talk about binary trees with a more regular shape, that satisfy certain conditions. In particular, we’ll talk about three special kinds:

    full binary tree.

    A full binary tree is one in which every node (except the leaves) has two children. Put another way, every node has either two children or none: no stringiness allowed. Figure  \(\PageIndex{3}\) is not full, but it would be if we added the three blank nodes in Figure  \(\PageIndex{7}\) .

    Figure \(\PageIndex{7}\): A full binary tree.

By the way, it isn’t always possible to have a full binary tree with a particular number of nodes. For instance, a binary tree with two nodes can’t be full, since it will inevitably have a root with only one child.

    complete binary tree.

A complete binary tree is one in which every level has all possible nodes present, except perhaps for the deepest level, which is filled all the way from the left. Figure \(\PageIndex{7}\) is not complete, but it would be if we fixed it up as in Figure \(\PageIndex{8}\).

    Figure \(\PageIndex{8}\): A complete binary tree.

    Unlike full binary trees, it is always possible to have a complete binary tree no matter how many nodes it contains. You just keep filling in from left to right, level after level.

    perfect binary tree.

    Our last special type has a rather audacious title, but a “perfect" tree is simply one that is exactly balanced: every level is completely filled. Figure  \(\PageIndex{8}\) is not perfect, but it would be if we either added nodes to fill out level 4, or deleted the unfinished part of level 3 (as in Figure  \(\PageIndex{9}\) .)

    Figure \(\PageIndex{9}\): A “perfect" binary tree.

    Perfect binary trees obviously have the strictest size restrictions. It’s only possible, in fact, to have perfect binary trees with \(2^{h+1}-1\) nodes, if \(h\) is the height of the tree. So there are perfect binary trees with 1, 3, 7, 15, 31, ... nodes, but none in between. In each such tree, \(2^h\) of the nodes (almost exactly half) are leaves.

    Now as we’ll see, binary trees can possess some pretty amazing powers if the nodes within them are organized in certain ways. Specifically, a binary search tree and a heap are two special kinds of binary trees that conform to specific constraints. In both cases, what makes them so powerful is the rate at which a tree grows as nodes are added to it.

    Suppose we have a perfect binary tree. To make it concrete, let’s say it has height 3, which would give it 1+2+4+8=15 nodes, 8 of which are leaves. Now what happens if you increase the height of this tree to 4? If it’s still a “perfect" tree, you will have added 16 more nodes (all leaves). Thus you have doubled the number of leaves by simply adding one more level. This cascades the more levels you add. A tree of height 5 doubles the number of leaves again (to 32), and height 6 doubles it again (to 64).

If this doesn’t seem amazing to you, it’s probably because you don’t fully appreciate how quickly this kind of exponential growth can accumulate. Suppose you had a perfect binary tree of height 30 — certainly not an awe-inspiring figure. One could imagine it fitting on a piece of paper...height-wise, that is. But run the numbers and you’ll discover that such a tree would have over a billion leaves (\(2^{30}\) of them, to be exact), more than one for every person in the United States. Increase the tree’s height to a mere 34 — just 4 additional levels — and suddenly you have over 8 billion leaves, easily greater than the population of planet Earth.

    The power of exponential growth is only fully reached when the binary tree is perfect, since a tree with some “missing" internal nodes does not carry the maximum capacity that it’s capable of. It’s got some holes in it. Still, as long as the tree is fairly bushy (i.e., it’s not horribly lopsided in just a few areas) the enormous growth predicted for perfect trees is still approximately the case.

    The reason this is called “exponential" growth is that the quantity we’re varying — the height — appears as an exponent in the number of leaves, which is \(2^h\). Every time we add just one level, we double the number of leaves.

    So the number of leaves (call it \(l\)) is \(2^h\), if \(h\) is the height of the tree. Flipping this around, we say that \(h = \lg(l)\). The function “lg" is a logarithm, specifically a logarithm with base-2. This is what computer scientists often use, rather than a base of 10 (which is written “log") or a base of \(e\) (which is written “ln"). Since \(2^h\) grows very, very quickly, it follows that \(\lg(l)\) grows very, very slowly. After our tree reaches a few million nodes, we can add more and more nodes without growing the height of the tree significantly at all.
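A quick back-of-the-envelope check of that claim (a small illustration of the arithmetic, not something from the text):

import math

for leaves in [1_000, 1_000_000, 1_000_000_000]:
    # height needed for a perfect binary tree with this many leaves: h = lg(l)
    print(leaves, "leaves need a height of only", math.ceil(math.log2(leaves)))
# 1000 leaves need a height of only 10
# 1000000 leaves need a height of only 20
# 1000000000 leaves need a height of only 30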

    The takeaway message here is simply that an incredibly large number of nodes can be accommodated in a tree with a very modest height. This makes it possible to, among other things, search a huge amount of information astonishingly quickly...provided the tree’s contents are arranged properly.

    Binary search trees (BST’s)

    Okay, then let’s talk about how to arrange those contents. A binary search tree (BST) is any binary tree that satisfies one additional property: every node is “greater than" all of the nodes in its left subtree, and “less than (or equal to)" all of the nodes in its right subtree. We’ll call this the BST property. The phrases “greater than" and “less than" are in quotes here because their meaning is somewhat flexible, depending on what we’re storing in the tree. If we’re storing numbers, we’ll use numerical order. If we’re storing names, we’ll use alphabetical order. Whatever it is we’re storing, we simply need a way to compare two nodes to determine which one “goes before" the other.

    An example of a BST containing people is given in Figure  \(\PageIndex{17}\) . Imagine that each of these nodes contains a good deal of information about a particular person — an employee record, medical history, account information, what have you. The nodes themselves are indexed by the person’s name, and the nodes are organized according to the BST rule. Mitch comes after Ben/Jessica/Jim and before Randi/Owen/Molly/Xander in alphabetical order, and this ordering relationship between parents and children repeats itself all the way down the tree. (Check it!)

Be careful to observe that the ordering rule applies between a node and the entire contents of its subtrees, not merely to its immediate children. This is a rookie mistake that you want to avoid. Your first inclination, when glancing at Figure \(\PageIndex{18}\) , below, is to judge it a BST. It is not a binary search tree, however! Jessica is to the left of Mitch, as she should be, and Nancy is to the right of Jessica, as she should be. It seems to check out. But the problem is that Nancy is a descendent of Mitch’s left subtree, whereas she must properly be placed somewhere in his right subtree. And yes, this matters. So be sure to check your BST’s all the way up and down.

    Figure \(\PageIndex{17}\): A binary search tree.
    Figure \(\PageIndex{18}\): NOT a binary search tree, though it looks like one at first glance. (Notice Nancy and Mitch)
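One way to avoid that rookie mistake is to let code do the checking. Below is a hedged Python sketch (reusing the Node class from the traversal example earlier, and reconstructing the Figure \(\PageIndex{17}\) tree from the text's description). The checker carries down the bounds imposed by all of a node's ancestors, which is exactly what a glance at immediate parent-child pairs misses.

# The BST of Figure 5.2.17, reconstructed from the description in the text.
people = Node("Mitch",
    left=Node("Jessica", left=Node("Ben"), right=Node("Jim")),
    right=Node("Randi",
        left=Node("Owen", left=Node("Molly")),
        right=Node("Xander")))

def is_bst(node, low=None, high=None):
    """Check the BST property against entire subtrees, not just immediate children."""
    if node is None:
        return True
    if low is not None and node.value < low:
        return False      # too small to belong in this ancestor's right subtree
    if high is not None and node.value >= high:
        return False      # too large to belong in this ancestor's left subtree
    return (is_bst(node.left, low, node.value) and
            is_bst(node.right, node.value, high))

print(is_bst(people))     # True for Figure 17; the Nancy tree of Figure 18 would fail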

    The power of BST’s

All right, so what’s all the buzz about BST’s, anyway? The key insight is to realize that if you’re looking for a node, all you have to do is start at the root and work your way down, making one comparison at each level, and there are only as many levels as the tree’s height. Let’s say we’re searching Figure \(\PageIndex{17}\) for Molly. By looking at Mitch (the root), we know right away that Molly must be in the right subtree, not the left, because she comes after Mitch in alphabetical order. So we look at Randi. This time, we find that Molly comes before Randi, so she must be somewhere in Randi’s left branch. Owen sends us left again, at which point we find Molly.

    With a tree this size, it doesn’t seem that amazing. But suppose its height were 10. This would mean about 2000 nodes in the tree — customers, users, friends, whatever. With a BST, you’d only have to examine ten of those 2000 nodes to find whatever you’re looking for, whereas if the nodes were just in an ordinary list, you’d have to compare against 1000 or so of them before you stumbled on the one you were looking for. And as the size of the tree grows, this discrepancy grows (much) larger. If you wanted to find a single person’s records in New York City, would you rather search 7 million names, or 24 names?? Because that’s the difference you’re looking at.

    It seems almost too good to be true. How is such a speedup possible? The trick is to realize that with every node you look at, you effectively eliminate half of the remaining tree from consideration. For instance, if we’re looking for Molly, we can disregard Mitch’s entire left half without even looking at it, then the same for Randi’s entire right half. If you discard half of something, then half of the remaining half, then half again, it doesn’t take you long before you’ve eliminated almost every false lead.
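Here is that lookup as code, continuing the sketch above (again, my own illustration): each comparison sends the search into one subtree and writes off the other entirely.

def bst_search(node, name):
    """Return the node containing name, or None if it isn't in the tree."""
    if node is None:
        return None                          # fell off the bottom: not present
    if name == node.value:
        return node
    elif name < node.value:
        return bst_search(node.left, name)   # discard the right half entirely
    else:
        return bst_search(node.right, name)  # discard the left half entirely

print(bst_search(people, "Molly") is not None)
# True, found by way of Mitch, Randi, Owen, Molly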

There’s a formal way to describe this speedup, called “Big-O notation." The subtleties are a bit complex, but the basic idea is this. When we say that an algorithm is “O(n)" (pronounced “oh–of–n"), it means that the time it takes to execute the algorithm is proportional to the number of nodes. This doesn’t imply any specific number of milliseconds or anything — that is highly dependent on the type of computer hardware you have, the programming language, and a myriad of other things. But what we can say about an O(n) algorithm is that if you double the number of nodes, you’re going to approximately double the running time. If you quadruple the number of nodes, you’re going to quadruple the running time. This is what you’d expect.

    Searching for “Molly" in a simple unsorted list of names is an O(n) prospect. If there’s a thousand nodes in the list, on average you’ll find Molly after scanning through 500 of them. (You might get lucky and find Molly at the beginning, but then of course you might get really unlucky and not find her until the end. This averages out to about half the size of the list in the normal case.) If there’s a million nodes, however, it’ll take you 500,000 traversals on average before finding Molly. Ten times as many nodes means ten times as long to find Molly, and a thousand times as many means a thousand times as long. Bummer.

Looking up Molly in a BST, however, is an O(lg n) process. Recall that “lg" means the logarithm (base-2). This means that doubling the number of nodes gives you a minuscule increase in the running time. Suppose there were a thousand nodes in your tree, as above. You wouldn’t have to look through 500 to find Molly: you’d only have to look through ten (because \(\lg(1000) \approx 10\)). Now increase it to a million nodes. You wouldn’t have to look through 500,000 to find Molly: you’d only have to look through twenty. Suppose you had 6 billion nodes in your tree (approximately the population of the earth). You wouldn’t have to look through 3 billion nodes: you’d only have to look through thirty-three. Absolutely mind-boggling.

    Adding nodes to a BST

Finding things in a BST is lightning fast. Turns out, so is adding things to it. Suppose we acquire a new customer named Jennifer, and we need to add her to our BST so we can retrieve her account information in the future. All we do is follow the same process we would if we were looking for Jennifer, but as soon as we find the spot where she would be, we add her there. In this case, Jennifer comes before Mitch (go left), and before Jessica (go left again), and after Ben (go right). Ben has no right child, so we put Jennifer in the tree right at that point. (See Figure  \(\PageIndex{19}\) .)

    This adding process is also an O(lg n) algorithm, since we only need look at a small number of nodes equal to the height of the tree.

    Figure \(\PageIndex{19}\): The BST after adding Jennifer.

Note that a new entry always becomes a leaf when added. In fact, this allows us to look at the tree and reconstruct some of what came before. For instance, we know that Mitch must have been the first node originally inserted, and that Randi was inserted before Owen, Xander, or Molly. As an exercise, add your own name to this tree (and a few of your friends’ names) to make sure you get the hang of it. When you’re done, the tree must of course still obey the BST property.
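Insertion, in the same Python sketch, is just the search procedure with one extra move at the end: when we fall off the tree, that empty spot is where the new leaf goes.

def bst_insert(node, name):
    """Insert name below node and return the root of the updated subtree."""
    if node is None:
        return Node(name)                           # the empty spot becomes the new leaf
    if name < node.value:
        node.left = bst_insert(node.left, name)
    else:
        node.right = bst_insert(node.right, name)   # ties go right, per the BST property
    return node

bst_insert(people, "Jennifer")   # left of Mitch, left of Jessica, right of Ben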

    Removing nodes from a BST

    Removing nodes is a bit trickier than adding them. How do we delete an entry without messing up the structure of the tree? It’s easy to see how to delete Molly: since she’s just a leaf, just remove her and be done with it. But how to delete Jessica? Or for that matter, Mitch?

    Your first inclination might be to eliminate a node and promote one of its children to go up in its place. For instance, if we delete Jessica, we could just elevate Ben up to where Jessica was, and then move Jennifer up under Ben as well. This doesn’t work, though. The result would look like Figure  \(\PageIndex{20}\) , with Jennifer in the wrong place. The next time we look for Jennifer in the tree, we’ll search to the right of Ben (as we should), completely missing her. Jennifer has effectively been lost.

    One correct way (there are others) to do a node removal is to replace the node with the left-most descendent of its right subtree. (Or, equivalently, the right-most descendent of its left subtree). Figure  \(\PageIndex{21}\) shows the result after removing Jessica. We replaced her with Jim, not because it’s okay to blindly promote the right child, but because Jim had no left descendents. If he had, promoting him would have been just as wrong as promoting Ben. Instead, we would have promoted Jim’s left-most descendent.

    Figure \(\PageIndex{20}\): An incorrect would-be-BST after removing Jessica incorrectly.
    Figure \(\PageIndex{21}\): The BST after removing Jessica correctly.

    As another example, let’s go whole-hog and remove the root node, Mitch. The result is as shown in Figure  \(\PageIndex{22}\) . It’s rags-to-riches for Molly: she got promoted from a leaf all the way to the top. Why Molly? Because she was the left-most descendent of Mitch’s right subtree.

To see why this works, just consider that Molly was immediately after Mitch in alphabetical order. The fact that he was a king and she a peasant was misleading. The two of them were actually very close: consecutive, in fact, in an in-order traversal. So replacing Mitch with Molly avoids shuffling anybody out of alphabetical order, and preserves the all-important BST property.

    Figure \(\PageIndex{22}\): The BST after removing Mitch.
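Here is a sketch of that removal strategy in the running Python example (my own code, not the book's): when the doomed node has two children, promote the value of the left-most descendent of its right subtree, then delete that descendent from its old position.

def leftmost(node):
    """Follow left children as far as possible and return the last node found."""
    while node.left is not None:
        node = node.left
    return node

def bst_remove(node, name):
    """Remove name from the subtree rooted at node and return the new subtree root."""
    if node is None:
        return None                          # name wasn't in the tree
    if name < node.value:
        node.left = bst_remove(node.left, name)
    elif name > node.value:
        node.right = bst_remove(node.right, name)
    elif node.left is None:
        return node.right                    # zero or one child: just splice it out
    elif node.right is None:
        return node.left
    else:
        successor = leftmost(node.right)     # left-most descendent of the right subtree
        node.value = successor.value         # promote its value into the vacated spot...
        node.right = bst_remove(node.right, successor.value)   # ...and remove the original
    return node

people = bst_remove(people, "Jessica")   # Jim takes Jessica's place (Figure 21)
people = bst_remove(people, "Mitch")     # Molly is promoted to the root (Figure 22)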

    Balancedness

    Finally, recall that this amazingly fast lookup is critically dependent on the tree being “bushy." Otherwise, the approximation that \(h=\lg(l)\) breaks down. As a laughably extreme example, consider Figure \(\PageIndex{23}\) , which contains the same nodes we’ve been using. This is a legitimate binary search tree! (Check it!) Yet looking up a node in this monstrosity is obviously not going to be any faster than looking it up in a plain-old list. We’re back to O(n) performance.

    Figure \(\PageIndex{23}\): An incredibly bad, but still technically legit, BST.

In practice, there are three ways of dealing with this. One approach is to simply not worry about it. After all, as long as we’re inserting and removing nodes randomly, with no discernible pattern, the chances of obtaining a tree as lopsided as Figure  \(\PageIndex{23}\) are astronomically small. It’s as likely as throwing a deck of cards up in the air and having it land all in a neat stack. The law of entropy tells us that we’re going to get a mix of short branches and long branches, and that in a large tree, the imbalance will be minimal.

A second approach is to periodically rebalance the tree. If our website goes offline for maintenance every once in a while anyway, we could rebuild our tree from the ground up by inserting the nodes into a fresh tree in a beneficial order. What order should we insert them in? Well, remember that whichever node is inserted first will be the root. This suggests that we’d want to insert the middle node first into our tree, so that Molly becomes the new root. This leaves half the nodes for her left subtree and half for her right. If you follow this process logically (and recursively) you’ll realize that we’d next want to insert the middle nodes of each half. This would equate to Jennifer and Randi (in either order). I think of it like the markings on a ruler: first you insert half an inch, then \(\frac{1}{4}\) and \(\frac{3}{4}\) inches, then \(\frac{1}{8}\), \(\frac{3}{8}\), \(\frac{5}{8}\), and \(\frac{7}{8}\) inches, etc. This restores to us a perfectly balanced tree at regular intervals, making any large imbalances even more improbable (and short-lived).
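Here's a sketch of that rebuilding idea, continuing the Python example. I'm assuming the seven names left in our running tree after the earlier additions and removals, supplied in sorted order; the helper simply inserts the middle name of each range first, just as the ruler analogy suggests.

def rebuild_balanced(sorted_names):
    """Build a fresh BST by inserting middles first: the overall middle name,
    then the middles of each half, and so on (the ruler-marking order)."""
    tree = None
    def insert_range(names):
        nonlocal tree
        if not names:
            return
        mid = len(names) // 2
        tree = bst_insert(tree, names[mid])   # this range's middle goes in first
        insert_range(names[:mid])             # then recurse on the left half...
        insert_range(names[mid + 1:])         # ...and the right half
    insert_range(sorted_names)
    return tree

names = ["Ben", "Jennifer", "Jim", "Molly", "Owen", "Randi", "Xander"]
balanced = rebuild_balanced(names)
print(balanced.value)      # Molly is the new root
print(inorder(balanced))   # still alphabetical, so the BST property survives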

Thirdly, there are specialized data structures you may learn about in future courses, such as AVL trees and red-black trees, which are binary search trees that add extra rules to prevent imbalance. Basically, the idea is that when a node is inserted (or removed), certain metrics are checked to make sure that the change didn’t cause too great an imbalance. If it did, the tree is adjusted so as to minimize the imbalance. This comes at a slight cost every time the tree is changed, but prevents any possibility of a lopsided tree that would cause slow lookups in the long run.


    1. There appears to be no consensus as to which of these concepts is the most basic. Some authors refer to a free tree simply as a “tree” — as though this were the “normal” kind of tree — and use the term rooted tree for the other kind. Other authors do the opposite. To avoid confusion, I’ll try to always use the full term (although I admit I’m one who considers rooted trees to be the more important, default concept).

    This page titled 5.2: Trees is shared under a not declared license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform.
