# 3.3: Compactness

- Page ID
- 9695

The Completeness Theorem finishes our link between deducibility and logical implication. The Compactness Theorem is our first use of that link. In some sense, what the Compactness Theorem does is focus our attention on the finiteness of deductions, and then we can begin to use that finiteness to our advantage.

Theorem 3.3.1: Compactness Theorem

*Let \(\Sigma\) be any set of axioms. There is a model of \(\Sigma\) if and only if every finite subset \(\Sigma_0\) of \(\Sigma\) has a model.*

We say that \(\Sigma\) is **satisfiable** if there is a model of \(\Sigma\), and we say that \(\Sigma\) is **finitely satisfiable** if every finite subset of \(\Sigma\) has a model. So the Compactness Theorem says that \(\Sigma\) is satisfiable if and only if \(\Sigma\) is finitely satisfiable.

proof

For the easy direction, suppose that \(\Sigma\) has a model \(\mathfrak{A}\). Then \(\mathfrak{A}\) is also a model of every finite \(\Sigma_0 \subseteq \Sigma\).

For the more difficult direction, assume there is no model of \(\Sigma\). Then \(\Sigma \models \perp\). By the Completeness Theorem, \(\Sigma \vdash \perp\), so there is a deduction \(D\) of \(\perp\) from \(\Sigma\). Since \(D\) is a deduction, it is finite in length and thus can only contain finitely many of the axioms of \(\Sigma\). Let \(\Sigma_0\) be the finite set of axioms from \(\Sigma\) that are used in \(D\). Then \(D\) is a deduction from \(\Sigma_0\), so \(\Sigma_0 \vdash \perp\). But then by the Soundness Theorem, \(\Sigma_0 \models \perp\), so \(\Sigma_0\) cannot have a model.

Corollary 3.3.2.

*Let \(\Sigma\) be a set of \(\mathcal{L}\)-formulas and let \(\theta\) be an \(\mathcal{L}\)-formula. \(\Sigma \models \theta\) if and only if there is a finite \(\Sigma_0 \subseteq \Sigma\) such that \(\Sigma_0 \models \theta\).*

proof

\[\begin{array}{rll} \Sigma \models \theta & \text{iff} \: \Sigma \vdash \theta & \text{Soundness and Completeness} \\ & \text{iff} \: \Sigma_0 \vdash \theta \: \text{for a finite} \: \Sigma_0 \subseteq \Sigma & \text{deductions are finite} \\ & \text{iff} \: \Sigma_0 \models \theta & \text{Soundness and Completeness} \end{array}\]

Now we are in a position where we can use the Compactness Theorem to get a better understanding of the limitations of first-order logic - or, to put a more positive spin on it, a better understanding of the richness of mathematics!

Example 3.3.3:

Suppose that we examine the \(\mathcal{L}_{NT}\)-structure \(\mathfrak{N}\), whose universe is the set of natural numbers \(\mathbb{N}\), endowed with the familiar arithmetic functions of addition, multiplication, and exponentiation and the usual binary relation less than. It would be nice to have a collection of axioms that would characterize the structure \(\mathfrak{N}\). By this we mean a set of sentences \(\Sigma\) such that \(\mathfrak{N} \models \Sigma\), and if \(\mathfrak{A}\) is any \(\mathcal{L}_{NT}\)-structure such that \(\mathfrak{A} \models \Sigma\), then \(\mathfrak{A}\) is "just like" \(\mathfrak{N}\). (\(\mathfrak{A}\) is "just like" \(\mathfrak{N}\) if there \(\mathfrak{A}\) and \(\mathfrak{N}\) are isomorphic - see Exercise 5 in Section 1.6).

Unfortunately, we cannot hope to have such a set of sentences, and the Compactness Theorem shows us why. Suppose we took any set of sentences \(\Sigma\) that seemed like it ought to characterize \(\mathfrak{N}\). Let us add some sentences to \(\Sigma\) and create a new collection of sentences \(\Theta\) in an extended language \(\mathcal{L} = \mathcal{L}_{NT} \cup \{ c \}\), where \(c\) is a new constant symbol:

\[\Theta = \Sigma \cup \{ 0 < c, S0 < c, SS0 < c, \ldots, SSS \cdots S0 \left( nS \text{'s} \right) < c, \ldots \}.\]

Now notice that \(\Theta\) is finitely satisfiable. If \(\Theta_0\) is a finite subset of \(\Theta\), then \(\Theta_0\) is a subset of

\[\Theta_n = \Sigma \cup \{ 0 < c, S0 < c, SS0 < c, \ldots, SSS \cdots S0 \left( nS \text{'s} \right) < c \}\]

For some natural number \(n\). But \(\Theta_n\) has a model \(\mathfrak{N}_n\), whose universe is \(\mathbb{N}\), the functions and relations are interpreted in the usual way, and \(c^{\mathfrak{N}_n} = n + 1\). So every finite subset of \(\Theta\) has a model, and thus \(\Theta\) has a model \(\mathfrak{A}^\prime\). Now forget the interpretation of the constant symbol \(c\) and you are left with an \(\mathcal{L}_{NT}\)-structure \(\mathfrak{A} = \mathfrak{A}^\prime \upharpoonright_{\mathcal{L}_{NT}}\). This model \(\mathfrak{A}\) is interesting, but we cannot claim that \(\mathfrak{A}\) is "just like" \(\mathfrak{N}\), since \(\mathfrak{A}\) has an element (the ting that used to be called \(c^{\mathfrak{A}^\prime}\)) such that there are infinitely many elements \(x\) that stand in the relation \(<\) with that element, while there is no such element of \(\mathfrak{N}\). The element \(c^{\mathfrak{A}^\prime}\) is called a nonstandard model of arithmetic, a model of arithmetic that is not isomorphic to \(\mathfrak{N}\). We first encountered nonstandard models of arithmetic in Exercise 7 of Section 2.8.

So no set of first-order sentences can completely characterize the natural numbers.

*Chaff:* Isn't this neat! Notice how each of the \(\mathfrak{N}_n\)'s in the last example were perfectly ordinary models that looked just like the natural numbers, but the thing that we got at the end looked entirely different!

Definition 3.3.4

If \(\mathfrak{A}\) is an \(\mathcal{L}\)-structure, we define the **theory of \(\mathfrak{A}\)** to be \(Th \left( \mathfrak{A} \right) = \{ \phi | \phi \: \text{is an} \: \mathcal{L} \text{-formula and} \: \mathfrak{A} \models \phi \}\). If \(\mathfrak{A}\) and \(\mathfrak{B}\) are \(\mathcal{L}\)-structures such that \(Th \left( \mathfrak{A} \right) = Th \left( \mathfrak{B} \right)\), then we say that \(\mathfrak{A}\) and \(\mathfrak{B}\) are **elementarily equivalent**, and write \(\mathfrak{A} \equiv \mathfrak{B}\).

If \(\mathfrak{A} \equiv \mathfrak{N}\), we say that \(\mathfrak{A}\) is a **model of arithmetic**.

Notice that the weird structure \(\mathfrak{A}\) that we constructed above can be a model of arithmetic if we just let the \(\Sigma\) of our constructure be \(Th \left( \mathfrak{N} \right)\). Exercise 2 asks you to prove that in this case we have \(\mathfrak{A} \equiv \mathfrak{N}\). Since \(\mathfrak{A}\) certainly is not anything like the usual model of arithmetic on the natural numbers, calling \(\mathfrak{A}\) a nonstandard model of arithmetic makes pretty good sense. The difficult thing to see is that although the universe \(A\) certainly contains nonstandard elements, they don't get in the way of elementary equivalence. The reason for this is that the language \(\mathcal{L}_{NT}\) can't refer to any nonstandard element explicitly, so we can't express a statement that is (for example) true in \(\mathfrak{N}\) but false in \(\mathfrak{A}\). So the lesson to be learned is that it is much easier for two structures to be elementarily equivalent than it is for them to be isomorphic. Our structure \(\mathfrak{A}\) is not isomorphic to \(\mathfrak{N}\), but \(\mathfrak{A}\) is elementarily equivalent to \(\mathfrak{N}\).

Example 3.3.5:

Remember those \(\varepsilon\)'s and \(\delta\)'s from calculus? They were introduced in the nineteenth century in an attempt to firm up the foundations of the subject. When they were developing the calculus, Newton and Leibniz did not worry about limits. They happily used quantities that were infinitely small but not quite zero and they ignored the logical difficulties this presented. These infinitely small quantities live on in today's calculus textbooks as the differentials \(dx\) and \(dy\).

Most people find thinking about differentials much easier than fighting through limit computations, and in 1961 Abraham Robinson developed a logical framework for calculus that allowed the use of these infinitesimals in a coherent, noncontradictory way. Robinson's version of the calculus came to be known as nonstandard analysis. Here is a rough introduction (for a complete treatment, see [Keisler 76]).

Taking as our starting point the real numbers that you know so well, we construct a language \(\mathcal{L}_\mathbb{R}\), the language of the real numbers. For each real number \(r\), the language \(\mathcal{L}_\mathbb{R}\) includes a constant symbol \(\dot{r}\). So the language \(\mathcal{L}_\mathbb{R}\) includes constant symbols \(\dot{0}\), \(\dot{\pi}\), and \(\frac{\dot{2}}{7}\). For each function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\), we toss in a function symbol \(\dot{f}\), and for each \(n\)-ary relation \(R\) on the reals we add an \(n\)-ary relation symbol \(\dot{R}\). So our language includes, for example, the function symbols \(\dot{+}\) and \(\dot{\cos}\) and the relation symbol \(\dot{<}\).

Now we define \(\mathfrak{R}\) to be the \(\mathcal{L}_\mathbb{R}\)-structure \(\left( \mathbb{R}, \{ r \}, \{ f \}, \{ R \} \right)\), where each symbol is interpreted as meaning the number, function, or relation that gave rise to the symbol. So the function symbol \(\dot{+}\) stands for the function addition, and the constant symbol \(\dot{\pi}\) refers to the real number that is equal to the ratio of the circumference of a circle to its diameter.

Given this structure \(\mathfrak{R}\) (notice that \(\mathfrak{R}\) is not anything fancy - it is just the real numbers you have been working with since high school), it generates the set of formulas \(Th \left( \mathfrak{R} \right)\), the collection of first-order \(\mathcal{L}_\mathbb{R}\)-formulas that are true statements about the real numbers. Now it is time to use compactness.

Let \(\mathcal{L}^\prime = \mathcal{L}_\mathbb{R} \cup \{ c \}\), where \(c\) is a new constant symbol, and look at the collection of \(\mathcal{L}^\prime\)-sentences

\[\Theta = Th \left( \mathfrak{R} \right) \cup \{ \dot{0} \dot{<} c \} \cup \{ c \dot{<} \dot{r} | r \in \mathbb{R}, r > 0 \}.\]

(Are you clear about the difference between the dotted and the undotted symbols in this definition?)

By the Compactness Theorem, \(\Theta\) has a model, \(\mathfrak{A}\), and in the model \(\mathfrak{A}\), the element denoted by \(c\) plays the role of an infinitesimal element: It is positive, yet it is smaller than every positive real number. Speaking roughly, in the universe \(A\) of the structure \(\mathfrak{A}\) there are three kinds of elements. There are pure standard elements, which constitute a copy of \(\mathbb{R}\) that lives inside \(A\). Then there are pure nonstandard elements, for example, the element denoted by \(c\). Finally, there are elements such as the object denoted by \(\dot{17} \dot{+} c\), which has a standard part and a nonstandard part. (For more of the details, see Exercise 11 in Section 3.4.)

The nonstandard elements of the structure \(\mathfrak{R}\) provide a method for developing derivatives without using limits. For example, we can define the derivative of a function \(f\) at a standard element \(a\) to be

\[f^\prime \left( a \right) = \: \text{the standard part of} \: \frac{f \left( a + c \right) - f \left( a \right)}{c}.\]

As you can see, there is no limit in the definition. We have traded the limits of calculus for the nonstandard elements of \(\mathfrak{A}\), and the slope of a tangent line is nothing more than a slope of a line connecting two points, one of which is not standard. Nonstandard analysis has been an area of active study for the past forty years, and although it is not exactly mainstream, it has been used to discover some new results in various areas of classical analysis.

Example 3.3.6:

The idea of coloring a map is supposed to be intuitive. When you were in geography class as a child, you were doubtless given a map of a region and asked to color in the various countries, or states, or provinces. And you were missing the point if you used the same color to shade two countries that shared a common border, although it was permitted to use the same color for two countries whose borders met at a single point. (The states of Utah, Colorado, Arizona, and New Mexico do this in the United States, so coloring both Colorado and Arizona with the color red would be permitted.) The question of how many colors are needed to color any map drawn on the plane was first posed in 1852 by Francis Guthrie, and the answer, that four colors suffice for any such map (as long as each political division consists of a single region - Michigan in a map of the United States or pre-1971 Pakistan in a map of Asia would not be permitted), was proven in 1976 by Kenneth Appel and Wolfgang Hakin. We are not going to prove the Four-Color Theorem here; rather, we extend this result by considering maps with infinitely many regions.

Let \(R\) be a set (I'm thinking of the elements of \(R\) as being the regions of a map with infinitely many countries) with a symmetric binary relation \(A\) (adjacency). Let \(k\) be a natural number. We claim that it is possible to assign to each region of \(R\) one of \(k\) possible colors in such a way that adjacent regions receive different colors if and only if it is possible to so color each finite subset of \(R\).

We will prove this using the Compactness Theorem. One of the tricks to using compactness is to choose your language wisely. For this example, let the language \(\mathcal{L}\) consist of a collection of constants \(\{ r \}_{r \in R}\), one for each region, and a collection of unary predicates \(\{ C_i \}_{1 \leq i \leq k}\), one for each color. So the atomic statement \(C_i \left( r \right)\) will be interpreted as meaning that region \(r\) gets colored with color \(i\). We will also need a binary relation symbol \(A\), for adjacency.

Let \(\Sigma\) be the collection of sentences:

\[\Sigma = \begin{cases} \begin{array}{ll} C_1 \left( r \right) \lor C_2 \left( r \right) \lor \cdots \lor C_k \left( r \right) & \text{for each} \: r \in R \\ \neg \left[ C_i \left( r \right) \land C_j \left( r \right) \right] & r \in R, i \neq j \\ A \left( r, r^\prime \right) \rightarrow \left( \neg C_i \left( r \right) \land C_i \left( r^\prime \right) \right) & r, r^\prime \in R, 1 \leq i \leq k \\ A \left( r, r^\prime \right) & r, r^\prime \in R, r \: \text{adjacent to} \: r^\prime \\ \neg A \left( r, r^\prime \right) & r, r^\prime \in R, r \: \text{not adjacent to} \: r^\prime \end{array} \end{cases}\]

*Chaff:* Stop now for a minute and make sure that you understand each of the sentences in \(\Sigma\). You ought to be able to say, in ordinary English, what each sentences asserts. For example, \(C_1 \left( r \right) \lor C_2 \left( r \right) \lor \cdots \lor C_k \left( r \right)\) says that region \(r\) must be given one of the \(k\) colors. In other words, we have to color each region on the map. Take the time now to translate each of the other statement types of \(\Sigma\) into English.

But now our claim that an infinite map is \(k\)-colorable if and only if each finite subset of the map is \(k\)-colorable is clear, as a coloring of (a finite subset of) \(R\) corresponds to a model of (a finite subset of) \(\Sigma\), and the Compactness Theorem says that \(\Sigma\) has a model if and only if every finite subset of \(\Sigma\) has a model.

Notice that no quantifiers are used in this example, so we really only needed compactness for predicate logic, not first-order logic. If you are comfortable with the terms, notice also that the proof works whether there are a countably infinite or an uncountably infinite collection of countries.

If You have really been paying attention, you noticed that we did not use the fact that the maps are drawn on the plane. So if we draw a map on a donut with uncountably many countries, it only takes seven colors to color the map, as it was proven in 1890 by Percy John Heawood that seven colors suffice for finite maps drawn on a donut.

Example 3.3.7:

You may well be familiar with mathematical trees, as they are often discussed in courses in discrete mathematics or introductory computer science courses. For our purposes a **tree** is a set \(T\) partitioned into subsets \(T_i\), \(\left( i = 0, 1, 2, \ldots \right)\), called the levels of the tree, together with a function \(a\) such that:

- \(T_0\) consists of a single element (called the root of the tree).
- \(a : \left( T - T_0 \right) \rightarrow T\) such that if \(t \in T_i, i > 0\), then \(a \left( t \right) \in T_{i - 1}\).

A **path** through \(T\) consists of a subset \(P \subseteq T\) such that \(P \cap T_i\) contains exactly one element for each \(i\) and \(P\) is closed under \(a\). If \(t \in T\), the **immediate predecessor** of \(t\) is \(a \left( t \right)\). And an element \(t_2\) is said to be a **predecessor** of \(t_1\) if \(t_2 = a \left( a \left( \cdots a \left( t_1 \right) \right) \right) \left( k \: a \text{'s} \right)\) for some \(k \geq 1\).

We can now use the Compactness Theorem to prove

lemma 3.3.8: König's Infinity Lemma

*Let \(T\) be a tree all of whose levels are finite and nonempty. Then there is a path through \(T\).*

proof

Suppose that we are given such a tree \(T\). Let \(\mathcal{L}\) be the language consisting of one constant symbol \(\hat{t}\) for each element \(t \in T\), a unary relation symbol \(Q\), which will be true for elements on the path, and one unary function symbol \(p\), where \(p \left( \hat{t}_i \right)\) is intended to be the immediate predecessor of \(t_i\).

Let \(\Sigma\) be the following set of \(\mathcal{L}\)-formulas:

\[\Sigma = \begin{cases} \begin{array}{ll} p \left( \hat{t}_1 \right) = \hat{t}_2 & \text{for each} \: t_1, t_2 \in T \: \text{such that} \: a \left( t_1 \right) = t_2 \\ Q \left( \hat{t}_1 \right) \lor \cdots \lor Q \left( \hat{t}_k \right) & \text{where} \: T_n = \{ t_1, t_2, \ldots, t_k \} \: \text{(for each} \: n \text{)} \\ \neg \left( Q \left( \hat{t}_1 \right) \land Q \left( \hat{t}_2 \right) \right) & \text{for} \: t_1, t_2 \in T_n, t_1 \neq t_2 \\ Q \left( \hat{t} \right) \rightarrow Q \left( p \left( \hat{t} \right) \right) & \text{for each} \: t \in T - T_0 \end{array} \end{cases}\]

We claim that \(\Sigma\) is finitely satisfiable: Let \(\Sigma_0\) be a finite subset of \(\Sigma\), and let \(n\) be so large that if \(\hat{t}\) is mentioned in \(\Sigma_0\), then \(t \in T_0 \cup T_1 \cup \cdots \cup T_n\). Pick any element \(t^* \in T_{n + 1}\), and build an \(\mathcal{L}\)-structure \(\mathfrak{A}\) by letting the universe \(A\) be the tree \(T\), \(\hat{t}^\mathfrak{A}\) be \(t\), letting \(p^\mathfrak{A}\) be the function \(a\), and letting \(Q^\mathfrak{A}\) be the collection of predecessors of \(t^*\). It is easy to check that \(\mathfrak{A}\) is a model of \(\Sigma_0\), and thus by compactness, there is a structure \(\mathfrak{B}\) such that \(\mathfrak{B}\) is a model of \(\Sigma\). If we let \(P = \{ t \in T | \hat{t}^\mathfrak{B} \in Q^\mathfrak{B} \}\), then \(P\) is a path through \(T\), and König's Infinity Lemma is proven.

## Exercises

- A common attempt to try to write a set of axioms that would characterize \(\mathfrak{N}\) (see Example 3.3.3) is to let \(\Sigma\) be the collection of
*all*\(\mathcal{L}_{NT}\)-formulas that are true in \(\mathfrak{N}\), and then to argue that this is an element of \(\Sigma\):

\[\left( \forall x \right) \left( \exists n \right) \left( x = SSS \cdots S0 \left( nS \text{'s} \right) \right).\]

Therefore, there can be no nonstandard elements in any model of \(\Sigma\). Explain why this reasoning fails. - Show that if we let \(\Sigma = Th \left( \mathfrak{N} \right)\) in the construction of Example 3.3.3, then the structure \(\mathfrak{A}\) that is constructed is elementarily equivalent to the structure \(\mathfrak{N}\). Thus \(\mathfrak{A}\) is a model of arithmetic.
- Show that if \(\mathfrak{A}\) and \(\mathfrak{B}\) are \(\mathcal{L}\)-structures such that \(\mathfrak{A} \cong \mathfrak{B}\), then \(\mathfrak{A} \equiv \mathfrak{B}\).
- Suppose that \(\Sigma\) is a set of \(\mathcal{L}\)-sentences such that at least one sentence from \(\Sigma\) is true in each \(\mathcal{L}\)-structure. Show that the disjunction of some finitely many sentences from \(\Sigma\) is logically valid.
- Show that every nonstandard model of arithmetic contains an infinite prime number, that is, an infinite number \(a\) such that if \(a = bc\), then either \(b = 1\) or \(c = 1\).
- Show that if \(\phi \left( x \right)\) is a formula with one free variable in \(\mathcal{L}_{NT}\) such that there are infinitely many natural numbers \(a\) such that \(\mathfrak{N} \models \phi \left( x \right) \left[ s \left[ x | a \right] \right]\), then in every nonstandard model of arithmetic \(\mathfrak{N}^*\) there is an infinite number \(b\) such that \(\mathfrak{N}^* \models \phi \left( x \right) \left[ s \left[ x | b \right] \right]\).
- Verify that we can use the Compactness Theorem in Example 3.3.5 by verifying that every finite subset of \(\Theta\) has a model.
- (a) Using only connectives, quantifiers, variables, and the equality symbol, construct a set of sentences \(\Sigma\) such that every model of \(\Sigma\) is infinite.

(b) Prove that if \(\Gamma\) is a set of sentences with arbitrarily large finite models, then \(\Gamma\) has an infinite model.

(c) Show that there can be no set of sentences in first-order logic that characterizes the finite groups. (See Exercise 3 in Section 2.8.)

(d) Prove that there is no finite set of sentences

\[\Phi = \{ \phi_i, \phi_2, \ldots, \phi_n \}\]

such that \(\mathfrak{A} \models \Phi\) if and only if \(A\) is infinite. [*Suggestion:*Look at \(\neg \left( \phi_1 \land \phi_2 \land \cdots \land \phi_n \right)\).] - Suppose that \(\Sigma_1\) and \(\Sigma_2\) are two sets of sentences such that no structure is a model of both \(\Sigma_1\) and \(\Sigma_2\). Show there is a sentence \(\alpha\) such that every model of \(\Sigma_1\) is also a model of \(\alpha\) and furthermore, every model of \(\Sigma_2\) is a model of \(\neg \alpha\).
- A binary relation \(<\) on a set \(A\) is said to be a
**linear order**if

(a) \(<\) is irreflexive - \(\left( \forall a \in A \right) \left( \neg a < a \right)\).

(b) \(<\) is transitive - \(\left( \forall a, b, c \in A \right) \left( \left[ a < b \land b < c \right] \rightarrow a < c \right)\).

(c) \(<\) satisfies trichotomy - \(\forall a, b \in A\) exactly one of the following is true: \(a < b\), \(b < a\), or \(a = b\).

If a linear order \(<\) has the additional property that there are no infinite descending chains - there do not exist \(a_1, a_2, \ldots \in A\) such that \(a_1 > a_2 > a_3 \cdots\) (where \(a_1 > a_2\) means \(a_2 < a_1\)), then the relation \(<\) is a**well-order**of the set \(A\). Suppose that \(\mathcal{L}\) is a language containing a binary relation symbol \(<\). Show there is not set of \(\mathcal{L}\)-sentences \(\Sigma\) such that \(\Sigma\) has both of the following properties:

(a) \(\Sigma\) has an infinite model \(\mathfrak{A}\) in which \(<^\mathfrak{A}\) is a linear order of \(A\).

(b) If \(\mathfrak{B}\) is any infinite model of \(\Sigma\), then \(<^\mathfrak{B}\) is a well-ordering of \(B\). - Show that \(<\) is not a well-order in any nonstandard model of arithmetic.
- (a) In the structure \(\mathfrak{A}\) that was built in Example 3.3.5, explain how we know that

\[\mathfrak{A} \models \left( \forall x \right) \left[ \left( x \dot{>} \dot{0} \right) \rightarrow \left( x \dot{/} \dot{2} \dot{>} \dot{0} \land x \dot{>} x \dot{/} \dot{2} \right) \right].\]

(b) Show that \(<\) is a linear order of \(A\), the universe of \(\mathfrak{A}\).

(c) Show that \(<\) is not a well-order in this structure.