5.6: Counting Labeled Trees
How many trees are there with vertex set \([n]=\{1,2,…,n\}\)? Let \(T_n\) be this number. For \(n=1\), there is clearly only one tree. For \(n=2\), there is also only one tree, which is isomorphic to \(K_2\). In determining \(T_3\), we finally have some work to do; however, there's not much, since all trees on 3 vertices are isomorphic to \(P_3\). Thus, there are \(T_3=3\) labeled trees on 3 vertices, corresponding to which vertex is the one of degree 2. When \(n=4\), we can begin by counting the number of nonisomorphic trees, considering two cases depending on whether the tree has a vertex of degree 3. If it does, the tree is isomorphic to \(K_{1,3}\). If it does not, it is isomorphic to \(P_4\), since there must be precisely two vertices of degree 2 in such a graph. There are four labelings by \([4]\) for \(K_{1,3}\) (choose the vertex of degree 3). How many labelings by \([4]\) are there for \(P_4\)? There are \(C(4,2)=6\) ways to choose the labels \(i,j\) given to the vertices of degree 2 and two ways to select one of the remaining labels to be made adjacent to \(i\). Thus, there are \(6 \cdot 2 = 12\) ways to label \(P_4\) by \([4]\), and so \(T_4 = 4 + 12 = 16\).
To this point, it looks like maybe there's a pattern forming. Perhaps it is the case that \(T_n=n^{n-2}\) for all \(n \geq 1\). This is in fact the case, but let's see how it works out for \(n=5\) before proving the result in general. What are the nonisomorphic trees on five vertices? Well, there's \(K_{1,4}\) and \(P_5\) for sure, and there's also the third tree shown in Figure 5.38. After thinking for a minute or two, you should be able to convince yourself that these are all of the possibilities. How many labelings by \([5]\) does each of these have? There are 5 for \(K_{1,4}\), since there are 5 ways to choose the vertex of degree 4. For \(P_5\), there are 5 ways to choose the middle vertex of the path, \(C(4,2)=6\) ways to label the two remaining vertices of degree 2 once the middle vertex is labeled, and then 2 ways to label the vertices of degree 1. This gives \(5 \cdot 6 \cdot 2 = 60\) labelings. For the last tree, there are 5 ways to label the vertex of degree 3, \(C(4,2)=6\) ways to label the two leaves adjacent to the vertex of degree 3, and 2 ways to label the remaining two vertices, giving another 60 labelings. Therefore, \(T_5 = 5 + 60 + 60 = 125 = 5^3 = 5^{5-2}\).
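The small cases above can be double-checked by brute force. The following Python sketch is our own illustration, not part of the original argument, and the names `is_tree` and `count_labeled_trees` are hypothetical. It tries every set of \(n-1\) edges of \(K_n\), keeps those that form a tree, and tallies the trees by degree sequence. For \(n \le 5\) each isomorphism type of tree has its own degree sequence, so the tallies should reproduce the labeling counts computed above.

```python
from collections import Counter
from itertools import combinations


def is_tree(n, edges):
    """Return True if `edges` (exactly n - 1 of them) form a tree on 1..n.

    A graph with n vertices and n - 1 edges is a tree iff it has no cycle,
    so a single union-find pass that looks for a cycle is enough.
    """
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # u and v already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True


def count_labeled_trees(n):
    """Count trees on vertex set [n] by checking every (n-1)-subset of E(K_n).

    Returns (total, tally), where tally maps each degree sequence (sorted in
    decreasing order) to the number of labeled trees with that sequence.
    """
    all_edges = list(combinations(range(1, n + 1), 2))
    tally = Counter()
    for edges in combinations(all_edges, n - 1):
        if is_tree(n, edges):
            deg = Counter()
            for u, v in edges:
                deg[u] += 1
                deg[v] += 1
            seq = tuple(sorted((deg[x] for x in range(1, n + 1)), reverse=True))
            tally[seq] += 1
    return sum(tally.values()), tally


if __name__ == "__main__":
    for n in range(1, 7):
        total, tally = count_labeled_trees(n)
        print(n, total, dict(tally))
```

For \(n = 1, \ldots, 6\) the totals should be 1, 1, 3, 16, 125, 1296, and for \(n=4\) and \(n=5\) the tally splits them as \(4 + 12\) and \(5 + 60 + 60\). The search examines \(C(C(n,2), n-1)\) edge sets, so it is only practical for very small \(n\).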
It turns out that we are in fact on the right track, and we will now set out to prove the following:
The number \(T_n\) of labeled trees on \(n\) vertices is \(n^{n-2}\).
This result is usually referred to as Cayley's Formula, although equivalent results were proven earlier by James J. Sylvester (1857) and Carl W. Borchardt (1860). Cayley's name is most often attached to the result because he was the first to state and prove it in graph-theoretic terminology (in 1889). (One could argue, though, that Cayley really proved it only for \(n=6\) and then claimed that the argument extends easily to all other values of \(n\); whether such an extension actually works is open to some debate.) Cayley's Formula has many different proofs, most of which are quite elegant. If you're interested in seeing several of them, we encourage you to read the chapter on Cayley's Formula in Proofs from THE BOOK by Aigner, Ziegler, and Hofmann, which contains four proofs, each using a different technique. Here we give a fifth proof, due to Prüfer and published in 1918. Interestingly, although Prüfer's proof came after much of the terminology of graph theory was established, he seemed unaware of it and worked in the context of permutations and his own terminology, even though his approach clearly includes the ideas of graph theory. We will use a recursive technique to find a bijection between the set of labeled trees on \(n\) vertices and a natural set of size \(n^{n-2}\): the set of strings of length \(n-2\) whose symbols come from \([n]\).
We define a recursive algorithm that takes a tree \(T\) on \(k \geq 2\) vertices, labeled by the elements of a set \(S\) of \(k\) positive integers, and returns a string of length \(k-2\) whose symbols are elements of \(S\). (The set \(S\) will usually be \([k]\), but in order to define a recursive procedure, we need to allow it to be an arbitrary set of \(k\) positive integers.) This string is called the Prüfer code of the tree \(T\). Let prüfer(\(T\)) denote the Prüfer code of the tree \(T\), and if \(v\) is a leaf of \(T\), let \(T-v\) denote the tree obtained from \(T\) by removing \(v\) (i.e., the subgraph induced by all the other vertices). We can then define prüfer(\(T\)) recursively by the following procedure.
- If \(T≅K_2\), return the empty string.
- Else, let \(v\) be the leaf of \(T\) with the smallest label and let \(u\) be its unique neighbor. Let \(i\) be the label of \(u\). Return (\(i\), prüfer(\(T - v\))).
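A direct transcription of this recursive definition into Python might look like the sketch below. It assumes the tree is given as a dictionary mapping each label to the set of its neighbors; the names `prufer` and `adjacency` are ours, not the text's. Because it rescans for the smallest leaf on every call, it takes quadratic time, which is fine for examples of this size. The labels need not be \([k]\), matching the requirement that \(S\) be an arbitrary set of positive integers.

```python
def adjacency(edges):
    """Convert an edge list into a dict: label -> set of neighboring labels."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def prufer(adj):
    """Prüfer code of a tree, following the recursive definition.

    adj maps each vertex label (any set of k >= 2 positive integers)
    to the set of its neighbors. Returns a list of k - 2 labels.
    """
    if len(adj) == 2:  # T is K_2: return the empty string
        return []
    # v = the leaf with the smallest label, u = its unique neighbor
    v = min(x for x, nbrs in adj.items() if len(nbrs) == 1)
    (u,) = adj[v]
    # Build T - v without mutating the caller's dictionary.
    smaller = {x: nbrs - {v} for x, nbrs in adj.items() if x != v}
    return [u] + prufer(smaller)


# The 9-vertex example tree, with edges read off from the deletions
# described in the worked example that follows.
example = adjacency([(2, 6), (5, 6), (4, 6), (3, 7), (1, 8), (1, 4), (3, 4), (3, 9)])
print(prufer(example))  # [6, 6, 4, 3, 1, 4, 3]
```

For instance, a star whose center is labeled 10 and whose leaves are labeled 20, 30, 40 has code `[10, 10]`, illustrating that the label set can be any set of positive integers.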
Before using Prüfer codes to prove Cayley's Formula, let's take a moment to make sure we understand how they are computed given a tree. Consider the 9-vertex tree \(T\) in Figure 5.41 .
How do we compute prüfer(\(T\))? Since \(T\) has more than two vertices, we use the second step and find that \(v\) is the vertex with label 2 and \(u\) is the vertex with label 6, so prüfer(\(T\))=(6,prüfer(\(T−v\))). The graph \(T−v\) is shown in Figure 5.42 .
The recursive call prüfer(\(T−v\)) returns (6, prüfer(\(T−v−v′\))), where \(v′\) is the vertex labeled 5. Continuing recursively, the next vertex deleted is 6, which appends a 4 to the string. Then 7 is deleted, appending 3. Next 8 is deleted, appending 1. This is followed by the deletion of 1, appending 4. Finally 4 is deleted, appending 3, and the last recursive call receives the subtree isomorphic to \(K_2\) with vertices labeled 3 and 9, so it returns the empty string. Thus, prüfer(\(T\)) = 6643143.
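Since the walkthrough deletes leaves one at a time, the recursion can equally be written as a loop. The sketch below is an iterative variant of our own choosing (not the text's formulation) that keeps the current leaves in a min-heap using Python's `heapq` module, so finding the smallest leaf costs \(O(\log k)\) rather than a full scan. It also records each (deleted leaf, recorded neighbor) pair so the trace can be compared with the steps just described. The edge list is reconstructed from those steps rather than copied from Figure 5.41, and `prufer_with_trace` is a hypothetical name.

```python
import heapq


def prufer_with_trace(edges):
    """Iterative Prüfer encoding of a tree given as a list of edges.

    Returns (code, trace), where trace lists the (deleted leaf, neighbor)
    pairs in the order the leaves are removed.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Min-heap of current leaves. While more than two vertices remain, the
    # neighbor of a leaf is never itself a leaf, so leaves stay leaves and
    # no heap entry ever goes stale.
    leaves = [x for x, nbrs in adj.items() if len(nbrs) == 1]
    heapq.heapify(leaves)

    code, trace = [], []
    while len(adj) > 2:              # stop once only K_2 is left
        v = heapq.heappop(leaves)    # smallest leaf
        (u,) = adj.pop(v)            # its unique neighbor; delete v
        adj[u].discard(v)
        code.append(u)
        trace.append((v, u))
        if len(adj[u]) == 1:         # u has just become a leaf
            heapq.heappush(leaves, u)
    return code, trace


edges = [(2, 6), (5, 6), (4, 6), (3, 7), (1, 8), (1, 4), (3, 4), (3, 9)]
code, trace = prufer_with_trace(edges)
print(code)   # [6, 6, 4, 3, 1, 4, 3]
print(trace)  # [(2, 6), (5, 6), (6, 4), (7, 3), (8, 1), (1, 4), (4, 3)]
```

The trace lists exactly the deletions 2, 5, 6, 7, 8, 1, 4 from the walkthrough, leaving the edge joining 3 and 9.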
We're now prepared to give a proof of Cayley's Formula.
Proof.
It is clear that prüfer(\(T\)) takes an \(n\)-vertex tree with labels from \([n]\) and returns a string of length \(n-2\) whose symbols are elements of \([n]\). What remains is to show that every such string is the Prüfer code of exactly one labeled tree, which amounts to finding a way to take such a string and construct the \(n\)-vertex labeled tree it came from. If we can do this, prüfer gives a bijection between the set of labeled trees on \(n\) vertices and the set of strings of length \(n-2\) whose symbols come from \([n]\), which will imply that \(T_n = n^{n-2}\).
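First, let's look at how prüfer(\(T\)) behaves. What numbers actually appear in the Prüfer code? Exactly the labels of the nonleaf vertices of \(T\). The label of a leaf cannot appear, since we always record the label of the neighbor of the leaf we are deleting, and the only way we would delete the neighbor of a leaf is if that neighbor were also a leaf, which can happen only when \(T≅K_2\), in which case prüfer(\(T\)) simply returns the empty string. Conversely, the label of every nonleaf vertex does appear: if \(u\) has degree \(d \geq 2\), then when \(u\) is finally deleted as a leaf, or is left over in the final copy of \(K_2\), exactly one of its neighbors remains, so \(d-1 \geq 1\) of its neighbors were deleted earlier as leaves adjacent to \(u\), and each such deletion recorded the label of \(u\). Thus if \(I \subseteq [n]\) is the set of symbols that appear in prüfer(\(T\)), the labels of the leaves of \(T\) are precisely the elements of \([n] - I\).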
With the knowledge of which labels belong to the leaves of \(T\) in hand, we are ready to use induction to complete the proof. Our goal is to show that if we are given a string \(s=s_1s_2 \cdots s_{n-2}\) whose symbols come from a set \(S\) of \(n\) positive integers, there is a unique tree \(T\) labeled by \(S\) with prüfer(\(T\)) \(=s\). If \(n=2\), the only such string is the empty string, so both elements of \(S\) label leaves and the only tree we can construct is \(K_2\). Now suppose we have the result for some \(m \geq 2\), and we try to prove it for \(m+1\). We have a string \(s=s_1s_2 \cdots s_{m-1}\) with symbols from a set \(S\) of \(m+1\) positive integers. Let \(I\) be the set of symbols appearing in \(s\) and let \(k\) be the least element of \(S-I\). By the previous paragraph, in any tree \(T\) with prüfer(\(T\)) \(=s\), the label \(k\) is the smallest label of a leaf, so the vertex labeled \(k\) is the first leaf deleted and its unique neighbor is the vertex labeled \(s_1\). The string \(s′=s_2s_3 \cdots s_{m-1}\) has length \(m-2\), and since \(k\) does not appear in \(s\), its symbols come from \(S-\{k\}\), which has size \(m\). Thus, by induction, there is a unique tree \(T′\) labeled by \(S-\{k\}\) whose Prüfer code is \(s′\). We form \(T\) from \(T′\) by attaching a leaf with label \(k\) to the vertex of \(T′\) with label \(s_1\). The leaves of \(T\) are then exactly the elements of \(S-I\), so \(k\) is the smallest leaf label of \(T\), deleting that leaf leaves \(T′\), and hence prüfer(\(T\)) \(=(s_1, s′)=s\). Uniqueness follows because any tree with Prüfer code \(s\) arises this way: deleting its leaf labeled \(k\) leaves a tree labeled by \(S-\{k\}\) with Prüfer code \(s′\), which must be \(T′\).
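The inductive step translates directly into a recursive decoder. In the sketch below (the names `from_prufer`, `prufer`, and `count_trees_via_codes` are hypothetical), `from_prufer` builds the tree exactly as in the proof: the smallest label absent from the code is a leaf joined to the first symbol, and the rest of the tree comes from the shorter code on the smaller label set. For small \(n\), `count_trees_via_codes` then decodes every string in \([n]^{n-2}\), checks that re-encoding returns the original string, and counts the distinct trees obtained. Since encoding undoes decoding, distinct strings give distinct trees, so the count should be \(n^{n-2}\).

```python
from itertools import product


def from_prufer(code, labels):
    """Rebuild the unique tree on `labels` whose Prüfer code is `code`.

    Mirrors the inductive step: the smallest label k not occurring in the
    code is a leaf adjacent to code[0]; the rest of the tree is the tree on
    labels - {k} with code code[1:]. Returns a frozenset of 2-element edges.
    """
    labels = frozenset(labels)
    if len(code) != len(labels) - 2 or not set(code) <= labels:
        raise ValueError("need a code of length |S| - 2 with symbols from S")
    if not code:                                # |S| = 2: the tree is K_2
        return frozenset({labels})
    k = min(labels - set(code))                 # leaf with the smallest label
    rest = from_prufer(code[1:], labels - {k})  # T' from the induction hypothesis
    return rest | {frozenset((k, code[0]))}     # attach leaf k to s_1


def prufer(edges, labels):
    """Encode a tree by repeatedly deleting its smallest leaf."""
    adj = {x: set() for x in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    code = []
    while len(adj) > 2:
        v = min(x for x, nbrs in adj.items() if len(nbrs) == 1)
        (u,) = adj.pop(v)
        adj[u].discard(v)
        code.append(u)
    return tuple(code)


def count_trees_via_codes(n):
    """Decode every string in [n]^(n-2), check the round trip, count trees."""
    labels = range(1, n + 1)
    trees = set()
    for code in product(labels, repeat=n - 2):
        tree = from_prufer(code, labels)
        assert prufer(tree, labels) == code     # encode(decode(s)) == s
        trees.add(tree)
    return len(trees)


print([count_trees_via_codes(n) for n in range(2, 7)])  # [1, 3, 16, 125, 1296]
```

The sketch requires \(n \ge 2\), matching the proof's base case; the recursion depth is \(n-2\), so it is meant for illustration rather than large inputs.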
We close this section with an example of how to take a Prüfer code and use it to construct a labeled tree. Consider the string \(s=75531\) as a Prüfer code. Since \(s\) has length 5, the tree \(T\) corresponding to \(s\) has 7 vertices, and since the symbols 1, 3, 5, and 7 appear in \(s\), its leaves are labeled 2, 4, and 6. The inductive step in our proof attaches the vertex labeled 2 (the smallest leaf label) to the vertex labeled 7 in the tree \(T′\) with Prüfer code 5531 and vertex labels \(\{1,3,4,5,6,7\}\); thus the vertex labeled 2 is the last vertex added when we build \(T\). What are the leaves of \(T′\)? The labels 4, 6, and 7 do not appear in 5531, so they must be the labels of leaves, and the construction attaches the vertex labeled 4 to the vertex labeled 5 in the tree we get by induction. In Figure 5.44, we show how this recursive process continues.
We form each row from the row above it by removing from the label set the leaf label used in the edge added in the row above, and by removing the first symbol from the Prüfer code. Once the Prüfer code becomes the empty string, the two remaining labels must be the labels we place on the ends of \(K_2\) to start building \(T\). We then work back up the "edge added" column, adding at each step a new vertex and the edge indicated. The tree we construct in this manner is shown in Figure 5.45.
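The row-by-row procedure just described can be sketched as a loop. The column layout used here (remaining code, label set, edge added) is our guess at the columns of Figure 5.44, not a copy of it, and `decoding_table` is a hypothetical name. Joining the code symbols into a string is only for display and assumes single-digit labels.

```python
def decoding_table(code, labels):
    """Rows (remaining code, label set, edge added) for decoding a Prüfer code.

    Each row joins the smallest label absent from the remaining code to the
    code's first symbol; the next row drops that label and that symbol.
    The last row has an empty code, and its two labels form the starting K_2.
    """
    code, labels = list(code), set(labels)
    rows = []
    while code:
        k = min(labels - set(code))
        rows.append((tuple(code), sorted(labels), (k, code[0])))
        labels.remove(k)
        code.pop(0)
    rows.append(((), sorted(labels), tuple(sorted(labels))))
    return rows


rows = decoding_table([7, 5, 5, 3, 1], range(1, 8))
for code, labels, edge in rows:
    print("".join(map(str, code)) or "(empty)", labels, edge)
# 75531 [1, 2, 3, 4, 5, 6, 7] (2, 7)
# 5531 [1, 3, 4, 5, 6, 7] (4, 5)
# 531 [1, 3, 5, 6, 7] (6, 5)
# 31 [1, 3, 5, 7] (5, 3)
# 1 [1, 3, 7] (3, 1)
# (empty) [1, 7] (1, 7)

# Working back up the "edge added" column rebuilds T, starting from K_2.
build_order = [edge for _, _, edge in reversed(rows)]
```

Reading `build_order` from the start gives the edge 1–7 of the initial \(K_2\), then 3–1, 5–3, 6–5, 4–5, and finally 2–7, so the vertex labeled 2 is indeed the last one added.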