6.2: Infinite Sets
Infinite sets are mysterious. Many classical paradoxes address historical confusions about the idea of infinity. At the same time, mathematicians from the ancient Greeks on have found it impossible to develop mathematical thinking without the use of infinity. Why is this so? From a metaphysical point of view, the idea of infinity is probably not necessary. From a physical point of view, there is no evidence for infinity. That is, the universe, as we understand it, is finite. Even from a theological point of view, infinity is to some extent the complement of the finite - and correspondingly gives rise to its own paradoxes.
Infinity has troubled some mathematicians and philosophers, and a few have tried to dispense with it. There aren’t many adherents to this school. The idea of infinity is so useful that the mathematics student will have to develop some comfort with the idea - and its logical consequences. At any rate, infinity clearly exists in the mathematical universe, whether or not it exists in the natural world, and using infinity has been crucial to developing a mathematical understanding of the natural world. In this section we begin an investigation of infinite sets.
We shall use injections and surjections to build some analytical machinery for comparing sets.
NOTATION. \(\preceq\) Let \(X\) and \(Y\) be sets. We write \(X \preceq Y\) if there is an injection \[f: X \rightarrow Y \text {. }\] This notation suggests that, under the conditions of the definition, we think of \(X\) as being "no bigger than" \(Y\) . This makes sense, since we are able to associate to each element of \(X\) a distinct element of \(Y\) . If \(f\) in the definition is a surjection, then \(f\) is also a bijection and \(|X|=|Y|\) . Otherwise, \(f^{-1}\) is a function that associates with each element of the range of \(f\) (which is a proper subset of \(Y\) ) a unique element of \(X\) , and \(Y\) still has some elements unaccounted for by \(f\) . So \(Y\) might be "bigger" than \(X\) , but it certainly won’t be "smaller". You might wish to consider this definition in the special case of finite sets \(X\) and \(Y\) . You will observe that \[X \preceq Y \Longleftrightarrow|X| \leq|Y| .\] In Exercise 6.3 you are asked to prove that \(\preceq\) is transitive and reflexive.
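For small finite sets, the equivalence \(X \preceq Y \Longleftrightarrow|X| \leq|Y|\) can be checked by brute force. The following Python sketch is only an illustration; the function names `injections` and `precedes` are hypothetical and do not come from the text.

```python
from itertools import permutations

def injections(X, Y):
    """Yield every injection X -> Y as a dict, for small finite X and Y."""
    X, Y = list(X), list(Y)
    # Each ordered selection of len(X) distinct elements of Y is one injection.
    # If len(X) > len(Y), permutations yields nothing at all.
    for images in permutations(Y, len(X)):
        yield dict(zip(X, images))

def precedes(X, Y):
    """Return True if at least one injection X -> Y exists (X "precedes" Y)."""
    return next(injections(X, Y), None) is not None

# Compare the existence of an injection with the comparison of sizes.
for a in range(5):
    for b in range(5):
        assert precedes(range(a), range(b)) == (a <= b)
```

The search is exponential in the size of the sets, so this is useful only as a sanity check on very small examples.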
REMARK. Are any two sets comparable with respect to \(\preceq\) ? Rather surprisingly, it requires a more advanced assumption, called the Axiom of Choice (see Appendix B), in order to guarantee the comparability of all pairs of sets. Virtually all mathematicians accept the Axiom of Choice. We shall assume the Axiom of Choice in this text.
If \(X \preceq Y\) and \(Y \preceq X\) , we would hope that \(X\) and \(Y\) are the same size. This is indeed true, though the proof is a little tricky. The result is very useful, because it is often much easier to write down two injections than one bijection.
THEOREM 6.4. Schröder-Bernstein Theorem Let \(X\) and \(Y\) be sets. If \(X \preceq Y\) and \(Y \preceq X\) , then \(|X|=|Y|\) .
Discussion. The idea behind this proof is as follows. We show that \(|X|=|Y|\) by constructing a bijection \(F: X \rightarrow Y\) . We are given injections \(f: X \rightarrow Y\) and \(g: Y \rightarrow X\) . We build \(F\) using the injections \(f\) and \(g\) as guides. That is, we wish to define \(F\) so that for each \(x \in X\) , either \(F(x)=f(x)\) or \(F(x)=g^{-1}(x)\) . This cannot be accomplished blindly. For instance, if \(x \in X \backslash g[Y]\) , our hand is forced, and \(F(x)=f(x)\) . Similarly, if \(y \in Y \backslash f[X]\) , our only chance of achieving our objective is for \(F(g(y))=y\) . If we make the wrong choice for \(F(x)\) , we shall lose the use of \(f\) and \(g\) as guides. We might consider \(F\) undecided about \(x\) when \(f\) and \(g^{-1}\) do not agree. The solution is to use \(f\) and \(g\) to move back and forth between \(X\) and \(Y\) until we find that our hand is forced.
Proof. Let \[f: X \rightarrow Y\] and \[g: Y \rightarrow X\] be injections. We may assume that \(X\) and \(Y\) are disjoint.
Discussion. If \(X\) and \(Y\) are not disjoint, we can replace \(X\) with \(X \times\{0\}\) and \(Y\) with \(Y \times\{1\}\) . The existence of a bijection \[G: X \times\{0\} \rightarrow Y \times\{1\}\] clearly implies the existence of a bijection from \(X\) to \(Y\) .
If \(x \in X\) we say \(y \in Y\) is the predecessor of \(x\) if \(g(y)=x\) . Analogously, if \(y \in Y\) we say that \(x \in X\) is the predecessor of \(y\) if \(f(x)=y\) . It is possible for an element not to have a predecessor. For example, if \(x \in X \backslash g[Y]\) , then \(x\) has no predecessor. However, if an element does have a predecessor, that predecessor is unique (since \(f\) and \(g\) are both injections). Given an element \(w\) of \(X \cup Y\) , let \(m(w)\) be 0 if \(w\) does not have a predecessor. Otherwise, let \(m(w)\) be the maximum number \(N \geq 1\) , if the maximum exists, such that there is a finite sequence \(\left\langle z_{n} \mid 0 \leq n \leq N\right\rangle\) satisfying
(1) \(w=z_{N}\)
(2) For \(n<N, z_{n}\) is the predecessor of \(z_{n+1}\) .
If the maximum doesn’t exist (i.e. if one can make arbitrarily long chains of predecessors), let \(m(w)=\infty\) .
Now define \[\begin{aligned} X_{e} &=\{x \in X \mid m(x) \text { is even }\} \\ X_{o} &=\{x \in X \mid m(x) \text { is odd }\} \\ X_{i} &=\{x \in X \mid m(x)=\infty\} \\ Y_{e} &=\{y \in Y \mid m(y) \text { is even }\} \\ Y_{o} &=\{y \in Y \mid m(y) \text { is odd }\} \\ Y_{i} &=\{y \in Y \mid m(y)=\infty\} \end{aligned}\] The collection \[\left\{X_{e}, X_{o}, X_{i}\right\}\] is obviously a partition of \(X\) . Similarly, \[\left\{Y_{o}, Y_{e}, Y_{i}\right\}\] is a partition of \(Y\) .
We are now in a position to define a bijection between \(X\) and \(Y\) . Let \[F(x)=\left\{\begin{array}{ccc} f(x) & \text { if } & x \in X_{i} \\ f(x) & \text { if } & x \in X_{e} \\ g^{-1}(x) & \text { if } & x \in X_{o} . \end{array}\right.\]
Discussion. We have some work left in this proof. We need to verify that \(F\) is a bijection from \(X\) to \(Y\) . The idea behind the definition of \(F\) may not be obvious, so let’s investigate the motivation for the definition. Suppose that \(f\) and \(g\) fail to be surjections (if either of the functions is a surjection there would be nothing to prove, since it would also be a bijection). Let \(x \in X \backslash g[Y]\) and \(y \in Y \backslash f[X]\) . Since \(x \notin g[Y]\) , the only possible choice for \(F(x)\) is \(f(x)\) . Similarly, \(y \notin f[X]\) , and the only possible value of \(F^{-1}(y)\) is \(g(y)\) . But this does not solve all of our problems. The set \(X \backslash g[Y]\) is made up of those members of \(X\) that have no predecessors, and \(Y \backslash f[X]\) is composed of the members of \(Y\) with no predecessors. If we are to define \(F\) by piecing together \(f\) and \(g\) , our hands are forced on these sets. Now suppose that \(x \in X\) has exactly one antecedent (the antecedents of an element being its predecessor, the predecessor of its predecessor, and so on). Then \(g^{-1}(x)\) has no predecessor. As we observed earlier, we need to satisfy \[F^{-1}\left(g^{-1}(x)\right)=g\left(g^{-1}(x)\right)=x\] and therefore we must satisfy \[F(x)=g^{-1}(x) .\] Similarly, if \(y \in Y\) has exactly one antecedent, we must satisfy \[F^{-1}(y)=f^{-1}(y) .\] If an element \(w\) of \(X \cup Y\) has finitely many antecedents, then \(\left.F\right|_{A(w)}\) , where \(A(w)\) denotes the set of antecedents of \(w\) , will be determined by the constraint imposed by the antecedent with no predecessor.
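To make the definition of \(F\) concrete, here is a small Python sketch for one particular pair of injections. The choices \(X=Y=\mathbb{N}\) (kept apart by a tag, in the spirit of \(X \times\{0\}\) and \(Y \times\{1\}\) ), \(f(x)=2 x\) and \(g(y)=2 y+1\) are assumptions made for illustration only; they are not taken from the text. For these maps every chain of predecessors is finite, so \(X_{i}\) is empty and \(m\) can be computed by simply walking backwards.

```python
def f(x):
    return 2 * x        # illustrative injection from X into Y

def g(y):
    return 2 * y + 1    # illustrative injection from Y into X

def predecessor(w, side):
    """Return the predecessor of w (which lives in `side`), or None if there is none."""
    if side == "X":
        # Need y with g(y) == w, which exists exactly when w is odd.
        return (w - 1) // 2 if w % 2 == 1 else None
    # Need x with f(x) == w, which exists exactly when w is even.
    return w // 2 if w % 2 == 0 else None

def m(w, side):
    """Length of the chain of predecessors ending at w (always finite for this f, g)."""
    steps = 0
    while (p := predecessor(w, side)) is not None:
        w, side = p, ("Y" if side == "X" else "X")
        steps += 1
    return steps

def F(x):
    """The map from the proof: f on X_e, the inverse of g on X_o."""
    if m(x, "X") % 2 == 0:
        return f(x)
    return predecessor(x, "X")   # this is g^{-1}(x)

# Check on a finite window: F is injective, and every y < K is hit.
K = 50
images = [F(x) for x in range(2 * K + 2)]
assert len(set(images)) == len(images)
assert set(range(K)) <= set(images)
```

For example, \(3 \in X\) has predecessor \(1 \in Y\) , which has no predecessor, so \(m(3)=1\) and \(F(3)=g^{-1}(3)=1\) . The window check works because the only candidates for a preimage of \(y\) are \(y / 2\) and \(2 y+1\) .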
We claim that \(F\) is a bijection from \(X\) to \(Y\) . It is easily seen that \(F\) is well defined. Since \(X_{o} \subseteq g[Y]\) and \(g\) is an injection, \(\left.F\right|_{X_{o}}=\left.g^{-1}\right|_{X_{o}}\) is well defined. That \(F\) is well defined on \(X_{e}\) and \(X_{i}\) is obvious. Furthermore \[\begin{gathered} F\left[X_{e}\right]=f\left[X_{e}\right]=Y_{o} \\ F\left[X_{o}\right]=g^{-1}\left[X_{o}\right]=Y_{e} \end{gathered}\] and \[F\left[X_{i}\right]=f\left[X_{i}\right]=Y_{i} .\]
Discussion. Although we had no choice in the definition of \(F\) on \(X_{e}\) and \(X_{o}\) , we could have defined \(F\) so that \(\left.F\right|_{X_{i}}=\left.g^{-1}\right|_{X_{i}}\) .
Therefore,
\(F[X]=F\left[X_{e} \cup X_{o} \cup X_{i}\right]=f\left[X_{e}\right] \cup g^{-1}\left[X_{o}\right] \cup f\left[X_{i}\right]=Y_{o} \cup Y_{e} \cup Y_{i}=Y .\) So \(F\) is a surjection. We show that \(F\) is an injection. Let \(x, z \in X\) , and suppose \(F(x)=F(z)\) . If \(x \in X_{e}\) , then \(F(x) \in Y_{o}\) and \(z \in X_{e}\) . Hence \[F(x)=f(x)=f(z)=F(z) .\] Since \(f\) is an injection, so is \(\left.f\right|_{X_{e}}\) . Therefore \(x=z\) .
If \(x \in X_{o}\) , then \(F(x) \in Y_{e}\) and \(z \in X_{o}\) . So \[F(x)=g^{-1}(x)=g^{-1}(z)=F(z) .\] The function \(g\) is an injection, therefore \(\left.g^{-1}\right|_{X_{o}}\) is an injection and so \(x=z\) .
Finally, if \(x \in X_{i}\) , then \(F(x) \in Y_{i}\) and \(z \in X_{i}\) . So \[F(x)=f(x)=f(z)=F(z) .\] Since \(f\) is an injection, \(x=z\) .
Therefore \(F\) is an injection. Hence \(F\) is a bijection from \(X\) to \(Y\) , and \[|X|=|Y| .\]
THEOREM 6.5. \(\mathbb{N}\) is an infinite set.
Discussion. We show that any function from \(\ulcorner n\urcorner\) to \(\mathbb{N}\) , for \(n \in\) \(\mathbb{N}\) , fails to be a surjection. Therefore \(\mathbb{N}\) is not finite.
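The idea can be tried out numerically. In the Python sketch below, a function \(f\) on \(\ulcorner n\urcorner=\{0, \ldots, n-1\}\) is represented by the list of its values, and the witness is \(1+\sum_{i=0}^{n-1} f(i)\) , the same number that appears in the proof. The name `missed_value` is hypothetical, and the sketch assumes the values are natural numbers (non-negative integers).

```python
def missed_value(values):
    """Given values[i] = f(i) for i in range(n), return a natural number
    that is not among the values, namely 1 + f(0) + ... + f(n-1)."""
    return 1 + sum(values)

# A few sample "functions" on finite initial segments of the naturals.
for values in ([], [0], [5, 5, 5], [3, 0, 7, 1], list(range(20))):
    a = missed_value(values)
    assert a not in values   # a is strictly larger than every value
```

The check succeeds for every finite list of naturals because each value is at most the sum, so the witness is strictly larger than all of them.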
Proof. Assume \(n \in \mathbb{N}\) , and \[f:\ulcorner n\urcorner \longrightarrow \mathbb{N} .\] Let \[a=1+\sum_{i=0}^{n-1} f(i) \in \mathbb{N} .\] Since \(a>f(i)\) for every \(i<n\) , we have \(a \notin f[\ulcorner n\urcorner]\) , so \(f\) is not a surjection. Consequently, there is no \(n \in \mathbb{N}\) for which \(\ulcorner n\urcorner\) can be mapped surjectively onto \(\mathbb{N}\) . Therefore \(\mathbb{N}\) is not finite.
Not only is \(\mathbb{N}\) an infinite set, it is in some sense the "smallest" infinite set.
THEOREM 6.6. If \(X\) is infinite, then \(\mathbb{N} \preceq X\) .
Discussion. We shall define an injection \(f: \mathbb{N} \rightarrow X\) inductively, building it up one step at a time.
Proof. As \(X\) is infinite, it is non-empty, so must contain some element \(x_{0}\) . Define \(f(0)=x_{0}\) .
Now, suppose that \(x_{0}=f(0), x_{1}=f(1), \ldots, x_{n}=f(n)\) have all been chosen, so that \[\left.f\right|_{\{0,1, \ldots, n\}}=\left.f\right|_{\ulcorner n+1\urcorner}: k \mapsto x_{k}\] is injective. As \(X\) is infinite, the function \(\left.f\right|_{\ulcorner n+1\urcorner}\) that we have defined cannot be surjective. So there exists some \(x_{n+1}\) in \(X \backslash\left\{x_{0}, \ldots, x_{n}\right\}\) . Define \(f(n+1)=x_{n+1}\) . Continuing in this way, we obtain an injection \(f\) defined on all of \(\mathbb{N}\) .
REMARK. The astute reader may have noticed that in the previous proof, we end up making an infinite number of choices of elements of \(X\) . Making these choices relies on the Axiom of Choice, which we have assumed in this text.
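The inductive construction in the proof of Theorem 6.6 can be imitated in Python when the infinite set comes with an explicit listing of its elements. That is a strong extra assumption: here each "choice" is made by a fixed rule (take the first listed element not yet used), whereas an arbitrary infinite set comes with no such rule. The function name `injective_sequence` and the sample stream are hypothetical.

```python
from itertools import count, islice

def injective_sequence(stream):
    """Yield x_0, x_1, x_2, ... with no repetitions, taking as x_{n+1}
    the first element of `stream` outside {x_0, ..., x_n}."""
    chosen = set()
    for x in stream:
        if x not in chosen:
            chosen.add(x)
            yield x

# A listing of the even naturals in which every element appears three times.
stream = (2 * (k // 3) for k in count())

# The first ten values f(0), ..., f(9) of the resulting injection.
first_ten = list(islice(injective_sequence(stream), 10))
assert len(set(first_ten)) == 10   # no value is repeated
```

Only finitely many values are ever computed, so the sketch illustrates the inductive step rather than the completed function on all of \(\mathbb{N}\) .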
DEFINITION. Cardinality, \(\aleph_{0}\) We use the expression \(\aleph_{0}\) (read "aleph nought"\({ }^{1}\) ) for the size of \(\mathbb{N}\) . That is \[\aleph_{0}:=|\mathbb{N}| \text {. }\] The size of a set is called the cardinality of the set. Any set which is bijective with \(\mathbb{N}\) has cardinality \(\aleph_{0}\) . A finite set has cardinality equal to the unique natural number with which it is bijective.
DEFINITION. Countable A set that is finite or has cardinality \(\aleph_{0}\) is called a countable set.
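As an illustration of the definition, the integers form a standard example of a set of cardinality \(\aleph_{0}\) . The example is not taken from the text; the Python sketch below checks one commonly used bijection on a finite window, with the hypothetical name `nat_to_int`.

```python
def nat_to_int(n):
    """Send 0, 1, 2, 3, 4, ... to 0, -1, 1, -2, 2, ...:
    even n goes to n/2, odd n goes to -(n+1)/2."""
    if n % 2 == 0:
        return n // 2
    return -((n + 1) // 2)

# On the window 0..2K the values are distinct and fill out -K..K exactly.
K = 25
values = [nat_to_int(n) for n in range(2 * K + 1)]
assert len(set(values)) == len(values)
assert set(values) == set(range(-K, K + 1))
```

A finite check of this kind is evidence rather than a proof, but the two cases of the formula make the inverse map easy to write down.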
We are not formally developing the idea of cardinality. This would require working with ordinals, which would distract us from more immediate mathematical interests. However we shall use the language and conventions of cardinals where it is intuitive and does not interfere with our program.
\({ }^{1} \aleph\) is the first letter of the Hebrew alphabet.