1.7: Combinatorial Number Theory


Leo Moser
University of Alberta via The Trilla Group


There are many interesting questions that lie between number theory and combinatorial analysis. We consider first one that goes back to I. Schur (1917) and is related in a surprising way to Fermat’s Last Theorem. Roughly speaking, the theorem of Schur states that if \(n\) is fixed and sufficiently many consecutive integers 1, 2, 3, ... are separated into \(n\) classes, then at least one class will contain elements \(a\), \(b\), \(c\) with \(a + b = c\).

Consider the fact that if we separate the positive integers less than \(2^n\) into \(n\) classes by putting 1 in class 1, the next 2 in class 2, the next 4 in class 3, etc., then no class contains the sum of two of its elements. Alternatively, we could write every integer \(m\) in the form \(2^k \theta\) where \(\theta\) is odd, and place \(m\) in the \(k\)th class. Again the numbers less than \(2^n\) will lie in \(n\) classes, and if \(m_1 = 2^k \theta_1\) and \(m_2 = 2^k \theta_2\) are in class \(k\) then \(m_1 + m_2 = 2^k(\theta_1 + \theta_2)\) lies in a higher numbered class. The more complicated manner of distributing integers outlined below enables us to distribute 1, 2, ..., \(\dfrac{3^n - 1}{2}\) into \(n\) classes in such a way that no class has a solution to \(a + b = c\):

1 2 5
4 3 6
10 11 7
13 12 8
. . .

Here each column is one class.

On the other hand, the theorem of Schur states that if one separates the numbers 1, 2, 3, ..., \([n! e]\) into \(n\) classes in any manner whatsoever then at least one class will contain a solution to \(a + b = c\). The gap between the last two statements reveals an interesting unsolved problem, namely, can one replace the \([n! e]\) in Schur’s result by a considerably smaller number? The first two examples given show that we certainly cannot go as low as \(2^n - 1\), and the last example shows that we cannot go as low as \(\dfrac{3^n - 1}{2}\).
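The three distributions described above are easy to check by machine. The following is a minimal sketch, not part of the original text: all function names are hypothetical, and `ternary_classes` uses one standard recursive construction that reaches \(\dfrac{3^n - 1}{2}\), which is assumed here because the text only displays a table.

```python
from itertools import combinations_with_replacement


def is_sum_free(cls):
    """True if the class contains no a, b, c with a + b = c (a = b is allowed)."""
    members = set(cls)
    return all(a + b not in members
               for a, b in combinations_with_replacement(sorted(members), 2))


def doubling_classes(n):
    """Class k (k = 1..n) holds the 2^(k-1) integers from 2^(k-1) to 2^k - 1."""
    return [list(range(2 ** (k - 1), 2 ** k)) for k in range(1, n + 1)]


def valuation_classes(n):
    """Put m = 2^k * (odd) into class k, for 1 <= m < 2^n."""
    classes = [[] for _ in range(n)]
    for m in range(1, 2 ** n):
        k = (m & -m).bit_length() - 1  # exponent of the largest power of 2 dividing m
        classes[k].append(m)
    return classes


def ternary_classes(n):
    """Assumed recursive construction covering 1 .. (3^n - 1)/2 with n classes.

    Given classes covering 1..m, copy each class shifted by 2m + 1 and
    put the middle block m+1 .. 2m+1 into a new class.
    """
    classes, m = [[1]], 1
    for _ in range(n - 1):
        shift = 2 * m + 1
        classes = [c + [x + shift for x in c] for c in classes]
        classes.append(list(range(m + 1, shift + 1)))
        m = 3 * m + 1
    return classes


if __name__ == "__main__":
    for build in (doubling_classes, valuation_classes, ternary_classes):
        classes = build(3)
        print(build.__name__, classes, all(is_sum_free(c) for c in classes))
```

The check allows \(a = b\), which is the stricter reading of "no class contains the sum of two of its elements".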

We now give a definition and make several remarks to facilitate the proof of Schur’s theorem.

Let \(T_0 = 1\), \(T_n = nT_{n-1} + 1\). It is easily checked that

\(T_n = n! \left(1 + \dfrac{1}{1!} + \dfrac{1}{2!} + \cdots + \dfrac{1}{n!}\right) = [n!e] \quad (n \ge 1).\)
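As a quick numerical check of this identity, here is a small sketch (the function names are hypothetical). It compares the recurrence with the factorial sum exactly, and with \([n!e]\) in floating point for small \(n\) only, since `math.e` has limited precision.

```python
import math


def schur_T(n):
    """T_0 = 1, T_n = n * T_(n-1) + 1."""
    t = 1
    for k in range(1, n + 1):
        t = k * t + 1
    return t


def factorial_sum(n):
    """n! * (1 + 1/1! + 1/2! + ... + 1/n!), computed in exact integers."""
    return sum(math.factorial(n) // math.factorial(k) for k in range(n + 1))


if __name__ == "__main__":
    for n in range(1, 11):
        # The floor comparison is only trusted for small n (float precision).
        floor_value = math.floor(math.factorial(n) * math.e)
        print(n, schur_T(n), factorial_sum(n), floor_value)
```

Note that the floor form starts at \(n = 1\): \(T_0 = 1\) while \([e] = 2\).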

Thus Schur’s theorem can be restated as follows: If 1, 2, ..., \(T_n\) are separated into \(n\) classes in any manner whatever, at least one class will contain a solution of \(a + b = c\). We will prove this by assuming that the numbers 1, 2, ..., \(T_n\) have been separated into \(n\) classes with no class containing a solution of \(a + b = c\) and from this obtain a contradiction. Note that the condition \(a + b \ne c\) means that no class can contain the difference of two of its elements.

Suppose that some class, say \(A\), contains elements \(a_1 < a_2 < \cdot\cdot\cdot\). We form differences of these in the following manner:

\(b_1 = a_2 - a_1, b_2 = a_3 - a_1, b_3 = a_4 - a_1, ...\)
\(c_1 = b_2 - b_1, c_2 = b_3 - b_1, c_3 = b_4 - b_1, ...\)
\(d_1 = c_2 - c_1, d_2 = c_3 - c_1, d_3 = c_4 - c_1, ...\)

and so on. We note that all the \(b\)'s, \(c\)'s, \(d\)'s, etc., are differences of \(a\)'s and hence cannot lie in \(A\).

Now, we start with \(T_n\) elements. At least

\(\lfloor \dfrac{T_n}{n} + 1 \rfloor = T_{n - 1} + 1\)

of these must lie in a single class, say \(A_1\). We then form \(T_{n - 1}\) \(b\)'s. These do not lie in \(A_1\), and hence lie in the remaining \(n - 1\) classes. At least

\(\lfloor \dfrac{T_{n - 1}}{n - 1} + 1 \rfloor = T_{n - 2} + 1\)

of them must lie in a single class, say \(A_2\). Form their \(T_{n - 2}\) differences, the \(c\)'s. These yield \(T_{n - 2}\) numbers neither in \(A_1\) nor \(A_2\). Continuing in this manner yields \(T_{n - 3}\) numbers not in \(A_1, A_2, A_3\). In this manner we eventually obtain \(T_0 = 1\) number not belonging to \(A_1, A_2, ..., A_n\). But all numbers formed are among the numbers \(1, 2, ..., T_n\) so we have a contradiction, which proves the theorem.
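For very small \(n\) the theorem can also be confirmed by exhaustive search. The sketch below is an illustration only and is not the proof in the text: the hypothetical function `max_sum_free_split` backtracks over all ways of placing 1, 2, 3, ... into \(n\) classes and reports the largest \(N\) for which 1, ..., \(N\) can be split with no class containing a solution of \(a + b = c\). It is intended for \(n \le 3\); larger \(n\) may be slow.

```python
def max_sum_free_split(n):
    """Largest N such that 1..N splits into n classes with no a + b = c in a class."""
    best = 0
    classes = [set() for _ in range(n)]

    def place(m):
        nonlocal best
        # Reaching this call means 1..m-1 have been placed successfully.
        best = max(best, m - 1)
        tried_empty = False
        for cls in classes:
            if not cls:
                # Empty classes are interchangeable, so try only one of them.
                if tried_empty:
                    continue
                tried_empty = True
            # m is the largest number so far, so it can only play the role of c.
            if any((m - a) in cls for a in cls):
                continue
            cls.add(m)
            place(m + 1)
            cls.remove(m)

    place(1)
    return best


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, max_sum_free_split(n))
```

For \(n = 1, 2, 3\) the search returns 1, 4, 13, which is consistent with the bound \(T_n = 2, 5, 16\) and shows that the construction reaching \(\dfrac{3^n - 1}{2}\) cannot be improved for these \(n\).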

We state, without proof, the connection with Fermat’s last theorem. A natural approach to Fermat’s theorem would be to try to show that \(x^n + y^n \equiv z^n \pmod{p}\) is insolvable modulo some \(p\), provided \(p\) does not divide \(x \cdot y \cdot z\). However, Schur’s theorem can be used to show that this method must fail: indeed, if \(p > n!\,e\) then \(x^n + y^n \equiv z^n \pmod{p}\) has a solution with \(p\) not a factor of \(xyz\).

Somewhat related to Schur’s theorem is a famous theorem of Van der Waerden, which we briefly investigate. In the early 1920’s the following problem arose in connection with the theory of distribution of quadratic residues. Imagine the set of all integers to be divided in any manner into two classes. Can one assert that arithmetic progressions of arbitrary length can be found in at least one of these classes? The problem remained unsolved for several years in spite of concentrated efforts by many outstanding mathematicians. It was finally solved in 1928 by Van der Waerden. As is not uncommon with such problems, Van der Waerden’s first step was to make the problem more general, and hence easier.

Van der Waerden proved the following: Given integers \(k\) and \(\ell\), there exists an integer \(W = W(k, \ell)\) such that if the numbers 1, 2, 3, ..., \(W\) are separated into \(k\) classes in any manner, then at least one class will contain \(\ell\) terms in arithmetic progression. We will not give Van der Waerden’s proof here. It is extremely tricky, difficult to see through, and leads only to fantastically large bounds for \(W(k, \ell)\). For this reason the reader might consider the very worthwhile unsolved problem of finding an alternative simpler proof that \(W(k, \ell)\) exists and finding reasonable bounds for it. We will have a little more to say about the function \(W(k, \ell)\) a little later.

Our next problem of combinatorial number theory deals with “nonaveraging” sequences. We call a sequence \(A: a_1 < a_2 < a_3 < \cdots\) nonaveraging if it does not contain the average of two of its elements, i.e., \(a_i + a_j \ne 2a_k\) (\(i \ne j\)). Let \(A(n)\) denote the number of elements in \(A\) not exceeding \(n\). The main problem is to estimate how large \(A(n)\) can be if \(A\) is nonaveraging. We can form a nonaveraging sequence by starting with 1, 2, ... and then always taking the smallest number that does not violate the condition for nonaveraging sets. In this way we obtain 1, 2, 4, 5, 10, 11, 13, 14, 28, 29, 31, ... . It is an interesting fact that this sequence is related to the famous Cantor ternary set. Indeed, we leave it as an exercise to prove that this sequence can be obtained by adding 1 to each integer whose representation in base 3 contains only 0’s and 1’s. This sequence is maximal in the sense that no new number can be inserted into the sequence without destroying its nonaveraging character. This, as well as other facts, led Szekeres (about 1930) to conjecture that this set was as dense as any nonaveraging set. For this set, the counting function can easily be estimated to be \(\thicksim n^{\log 2 / \log 3}\). It therefore came as a considerable surprise when Salem and Spencer (1942) proved that one could have a nonaveraging set of integers \(\le n\) containing at least \(n^{1 - c/\sqrt{\log\log n}}\) elements.
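The greedy construction and its base 3 description can be compared directly. The sketch below uses hypothetical function names and only checks an initial segment, so it illustrates the exercise rather than proving it.

```python
def greedy_nonaveraging(count):
    """First `count` terms of the greedy nonaveraging sequence starting from 1."""
    seq, chosen, m = [], set(), 0
    while len(seq) < count:
        m += 1
        # m is larger than every chosen term, so it can only be the largest
        # term of a progression a < b < m, which requires a = 2b - m.
        if any((2 * b - m) in chosen for b in seq):
            continue
        seq.append(m)
        chosen.add(m)
    return seq


def ternary_01_plus_one(count):
    """1 + (integers whose base 3 digits are only 0 and 1), in increasing order."""
    # Reading the binary digits of j as base 3 digits lists those integers in order.
    return [int(format(j, "b"), 3) + 1 for j in range(count)]


if __name__ == "__main__":
    print(greedy_nonaveraging(11))
    print(greedy_nonaveraging(64) == ternary_01_plus_one(64))
```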

We now construct such a dense nonaveraging set, which we call \(R\). Given a number \(x\), written in base ten, we decide whether \(x\) is in \(R\) on the basis of the following rules.

First we enclose \(x\) in a set of brackets, putting the first digit (counting from right to left) in the first bracket, the next two in the second bracket, the next three in the third bracket, and so on. If the last nonempty bracket (the bracket furthest to the left that does not consist entirely of zeros) does not have a maximal number of digits, we fill it with zeros. For instance, the numbers

\(a = 32653200200\), \(b = 100026100150600\), \(c = 100086600290500\)

would be bracketed

\(a = (00003)(2653)(200)(20)(0),\)
\(b = (10002)(6100)(150)(60)(0),\)
\(c = (10008)(6600)(290)(50)(0),\)

respectively. Now suppose the \(r^{\text{th}}\) bracket in \(x\) contains nonzero digits, but all further brackets to the left are 0. Call the number represented by the digits in the \(i^{\text{th}}\) bracket \(x_i\), \(i = 1, 2, ..., r - 2\). Further, denote by \(\bar{x}\) the number represented by the digits in the last two brackets taken together, but excluding the last digit. For \(x\) to belong to \(R\) we require

(1) the last digit of \(x\) must be 1,
(2) \(x_i\) must begin with 0 for \(i = 1, 2, ..., r - 2,\)
(3) \(x_1^2 + \cdots + x_{r - 2}^2 = \bar{x}.\)

In particular, note that \(a\) satisfies (2) but violates (1) and (3) so that \(a\) is not in \(R\); but \(b\) and \(c\) satisfy all three conditions and are in \(R\). To check (3) we note that \(60^2 + 150^2 = 26100\).
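The three conditions can be tested mechanically. The sketch below is one reading of the rules, with hypothetical function names. Two assumptions are made: digits are counted from right to left throughout, so "the last digit" is the leftmost digit of the padded top bracket and a bracket "begins with 0" when its rightmost digit is 0 (this is the reading under which the examples \(a\), \(b\), \(c\) behave as stated), and only numbers with at least three brackets are handled. The values of \(b\) and \(c\) are read off their bracketed forms.

```python
def brackets(x):
    """Split the decimal digits of x, right to left, into brackets of size 1, 2, 3, ...

    The leftmost bracket is padded on the left with zeros.
    Returns the brackets as strings, first (rightmost) bracket first.
    """
    digits = str(x)
    parts, size = [], 1
    while digits:
        parts.append(digits[-size:].rjust(size, "0"))
        digits = digits[:-size]
        size += 1
    return parts


def check_conditions(x):
    """Return (cond1, cond2, cond3) for x under the assumed reading of the rules."""
    parts = brackets(x)
    r = len(parts)
    if r < 3:
        raise ValueError("this sketch only handles numbers with at least 3 brackets")
    lower = parts[: r - 2]            # brackets 1 .. r-2
    top = parts[-1] + parts[-2]       # last two brackets taken together
    cond1 = top[0] == "1"             # the last (leftmost) digit is 1
    cond2 = all(p.endswith("0") for p in lower)  # assumed meaning of "begins with 0"
    cond3 = sum(int(p) ** 2 for p in lower) == int(top[1:])
    return cond1, cond2, cond3


if __name__ == "__main__":
    a = 32653200200
    b = 100026100150600   # (10002)(6100)(150)(60)(0)
    c = 100086600290500   # (10008)(6600)(290)(50)(0)
    for name, value in (("a", a), ("b", b), ("c", c), ("(b+c)/2", (b + c) // 2)):
        print(name, brackets(value)[::-1], check_conditions(value))
```

The last line of the demonstration applies the same test to the average of \(b\) and \(c\), which fails condition (3).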

We next prove that no three integers in \(R\) are in arithmetic progression. First note that if two elements of \(R\) have a different number of nonempty brackets their average cannot satisfy (1). Thus we need only consider averages of elements of \(R\) having the same number of nonempty brackets. From (1) and (3) it follows that the two elements of \(R\) can be averaged bracket by bracket for the first \(r - 2\) brackets and also for the last two brackets taken together. Thus, in our example,

\(\dfrac{1}{2} (60 + 50) = 55, \dfrac{1}{2} (150 + 290) = 220,\)
\(\dfrac{1}{2} (100026100 + 100086600) = 100056350,\)
\(\dfrac{1}{2} (b + c) = (10005)(6350)(220)(55)(0)\)

This violates (3) and so is not in \(R\). In general we will prove that if \(x\) and \(y\) are distinct elements of \(R\) then \(z = \dfrac{1}{2} (x + y)\) violates (3) and so is not in \(R\).

Since \(x\) and \(y\) are in \(R\),

\(\bar{z} = \dfrac{\bar{x} + \bar{y}}{2} = \sum_{i = 1}^{r - 2} \dfrac{x_i^2 + y_i^2}{2}.\)

On the other hand, \(z\) in \(R\) implies

\(\bar{z} = \sum_{i = 1}^{r - 2} z_i^2 = \sum_{i = 1}^{r - 2} \dfrac{(x_i + y_i)^2}{4}.\)

Hence, if \(z\) is in \(R\) then

\(\sum_{i = 1}^{r - 2} \dfrac{x_i^2 + y_i^2}{2} = \sum_{i = 1}^{r - 2} \dfrac{(x_i + y_i)^2}{4}.\)

Thus

\(\sum_{i = 1}^{r - 2} \dfrac{(x_i - y_i)^2}{4} = 0,\)

which implies \(x_i = y_i\) for \(i = 1, 2, ..., r - 2\). This together with (1) and (2) implies that \(x\) and \(y\) are not distinct.

Szekeres’ sequence starts with 1, 2, 4, 5, 10, 11, ... . Our sequence starts with

100000, 1000100100, 1000400200, ....

Nevertheless, the terms of this sequence are eventually much smaller than the corresponding terms of Szekeres’ sequence. We now estimate how many integers in \(R\) contain exactly \(r\) brackets. Given \(r\) brackets we can make the first digit in each of the first \(r - 2\) brackets 0. We can then fill up the first \(r - 2\) brackets in an arbitrary manner. This can be done in

\(10^{0 + 1+ 2 + \cdot\cdot\cdot + (r - 2)} = 10^{\dfrac{1}{2} (r - 1)(r - 2)}\)

ways. The last two brackets can be filled in such a way as to satisfy (1) and (3).

To see this we need only check that the last two brackets will not be overfilled, and that the last digit, which we shall set equal to 1, will not be interfered with. This follows from the inequality

\((10^1)^2 + (10^2)^2 + \cdot\cdot\cdot + (10^{r - 2})^2 < 10^{2(r - 1)}.\)

For a given \(n\) let \(r\) be the integer determined by

\[10^{\dfrac{1}{2}r(r + 1)} \le n < 10^{\dfrac{1}{2}(r + 1)(r + 2)}. \tag{7.1}\]

Since all the integers with at most \(r\) brackets will not exceed \(n\), and since \(r\) brackets can be filled to specification in \(10^{\dfrac{1}{2} (r - 2)(r - 1)}\) ways, we have

\[R(n) \ge 10^{\dfrac{1}{2} (r - 2)(r - 1)} \tag{7.2}\]

From the right hand side of (7.1) we have

\(r + 2 > \sqrt{2 \log n}\)

so that (7.2) implies that

\(R(n) \ge 10^{\dfrac{1}{2} (r - 2)(r - 1)} > 10^{\log n - c\sqrt{\log n}} = 10^{(\log n)(1 - c/\sqrt{\log n})}\)

where all logs are to base 10.
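To see what "eventually" means here, one can compare exponents for \(n = 10^L\). The sketch below is illustrative only, with hypothetical function names: it takes \(r\) from (7.1), uses the exponent \(\dfrac{1}{2}(r - 2)(r - 1)\) guaranteed by (7.2), and sets it against the exponent \(L \log 2 / \log 3\) of the Szekeres count \(n^{\log 2/\log 3}\), ignoring constant factors in both.

```python
import math


def bracket_count(L):
    """Largest r with r(r+1)/2 <= L, i.e. the r of (7.1) for n = 10**L (L >= 3)."""
    r = 0
    while (r + 1) * (r + 2) // 2 <= L:
        r += 1
    return r


def moser_exponent(L):
    """Exponent e such that (7.2) guarantees R(10**L) >= 10**e."""
    r = bracket_count(L)
    return (r - 2) * (r - 1) // 2


def szekeres_exponent(L):
    """Exponent e with (10**L) ** (log 2 / log 3) = 10**e."""
    return L * math.log(2) / math.log(3)


if __name__ == "__main__":
    for L in (10, 55, 65, 5050):
        print(L, moser_exponent(L), round(szekeres_exponent(L), 1))
```

For small \(L\) the Szekeres exponent is larger, and the guaranteed exponent for \(R\) only overtakes it for quite large \(n\).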

An old conjecture was that \(\dfrac{A(n)}{n} \to 0\) for every nonaveraging sequence. This has only been proved quite recently (1954) by K. F. Roth. His proof is not elementary.

L. Moser has used a similar technique to get lower bounds for the Van der Waerden function \(W(k, \ell)\). He proved that \(W(k, \ell) > \ell k^{\log k}\), i.e., he showed how to distribute the numbers 1, 2, ..., \([\ell k^{\log k}]\) into \(k\) classes in such a way that no class contains \(\ell\) terms in arithmetic progression. Using a quite different method Erdős and Rado have shown that \(W(k, \ell) > \sqrt{2 \ell k^{\ell}}\).

Erdős has raised the following question: What is the maximum number of integers \(a_1 < a_2 < \cdots < a_k \le n\) such that the \(2^k\) sums of distinct \(a\)'s are all distinct? The powers of 2 show that one can give \(k + 1\) \(a\)'s not exceeding \(2^k\) and one can in fact give \(k + 2\) \(a\)'s under \(2^k\) satisfying the required condition. On the other hand, all the sums involved are less than \(kn\) so that

\[2^k \le kn,\]

which implies

\[k < \dfrac{\log n}{\log 2} + (1 + o(1)) \dfrac{\log \log n}{\log 2}.\]

We now show how Erdős and Moser improved these estimates (Publisher’s note: The current best lower bound may be found in I. Aliev, “Siegel’s lemma and sum-distinct sets,” Discrete Comput. Geom. 39 (2008), 59–66.) to

\[2^k < 4\sqrt{k} n,\]

which implies

\[k < \dfrac{\log n}{\log 2} + (1 + o(1)) \dfrac{\log \log n}{2\log 2}.\]

The conjecture of Erdős is that

\[k = \dfrac{\log n}{\log 2} + O(1).\]
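The condition and the two upper bounds can be tried on small examples. This is a brute-force sketch with hypothetical names; the sets used are standard small examples chosen for illustration, not taken from the text.

```python
import math
from itertools import combinations


def subset_sums(a):
    """All 2^k sums of distinct elements of a (the empty sum 0 included)."""
    return [sum(c) for size in range(len(a) + 1) for c in combinations(a, size)]


def is_sum_distinct(a):
    """True if the 2^k subset sums of a are pairwise different."""
    sums = subset_sums(a)
    return len(set(sums)) == len(sums)


if __name__ == "__main__":
    for a in ([1, 2, 4, 8], [3, 5, 6, 7], [1, 2, 3]):
        k, n = len(a), max(a)
        print(a, is_sum_distinct(a),
              2 ** k <= k * n,                   # the simple bound
              2 ** k < 4 * math.sqrt(k) * n)     # the Erdős and Moser bound
```

The bounds are only asserted for sets with distinct sums; the last example is included to show the check rejecting a set (1 + 2 = 3).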

Denote the sums of distinct \(a\)'s by \(s_1, s_2, ..., s_{2^k}\) and let \(A = a_1 + a_2 + \cdots + a_k\). Observe that the average sum is \(\dfrac{A}{2}\) since we can pair each sum with the sum of the complementary set. This suggests that we estimate \(\sum_i (s_i - \dfrac{A}{2})^2\).

We have

\(\sum_i (s_i - \dfrac{A}{2})^2 = \dfrac{1}{4} \sum (\pm a_1 \pm a_2 \pm \cdots \pm a_k)^2,\)

where the last sum runs over the \(2^k\) possible distributions of sign. Upon squaring we find that all the cross terms cancel in pairs while each \(a_i^2\) will appear \(2^k\) times. Thus

\(\sum_i (s_i - \dfrac{A}{2})^2 = 2^{k - 2} \sum a_i^2 < 2^{k - 2} n^2 k.\)

Thus the number of sums \(s_i\) for which

\(|s_i - \dfrac{A}{2}| \ge n \sqrt{k}\)

cannot exceed \(2^{k - 1}\). Since all the sums are different, at least \(2^{k - 1}\) distinct numbers lie in a range of length \(2n \sqrt{k}\). This yields \(2^{k - 1} \le 2n \sqrt{k}\) as required.
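The second-moment computation can be checked numerically. In the sketch below (hypothetical names) the identity is verified in the integer form \(\sum_i (2s_i - A)^2 = 2^k \sum a_i^2\), which is four times the identity \(\sum_i (s_i - \dfrac{A}{2})^2 = 2^{k-2} \sum a_i^2\); the identity holds for any set of integers, whether or not its sums are distinct.

```python
from itertools import product


def four_times_second_moment(a):
    """Sum over all subsets of (2s - A)^2, where s is the subset sum and A = sum(a)."""
    total = sum(a)
    result = 0
    for mask in product((0, 1), repeat=len(a)):
        s = sum(x for x, keep in zip(a, mask) if keep)
        result += (2 * s - total) ** 2
    return result


def far_sums(a, n):
    """Number of subset sums s with |s - A/2| >= n * sqrt(k), compared after squaring."""
    total, k = sum(a), len(a)
    count = 0
    for mask in product((0, 1), repeat=k):
        s = sum(x for x, keep in zip(a, mask) if keep)
        if (2 * s - total) ** 2 >= 4 * n * n * k:
            count += 1
    return count


if __name__ == "__main__":
    a = [3, 5, 6, 7]
    print(four_times_second_moment(a), 2 ** len(a) * sum(x * x for x in a))
    print(far_sums(a, max(a)), 2 ** (len(a) - 1))
```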

Let \(a_1 < a_2 < ...\) be an infinite sequence of integers and define \(f(n)\) to be the number of solutions of \(n = a_i + a_j\) where all solutions count once. G. A. Dirac and D. J. Newman gave the following interesting proof that \(f(n)\) cannot be constant from some stage on. If \(f(\ell + 1) = f(\ell + 2) = \cdot\cdot\cdot\) we would have

\(\dfrac{1}{2} \left[ \left(\sum z^{a_k}\right)^2 + \sum z^{2a_k} \right] = \sum f(n) z^n\)
\(= P(z) + a\dfrac{z^{\ell + 1}}{1 - z}, \ \ \ \ \ (f(\ell + 1) = a),\)

where \(P(z)\) is a polynomial of degree \(\le \ell\). If \(z \to -1\) on the real axis the right side remains bounded, but the left side approaches infinity, since both terms on the left side are positive, and the second tends to infinity. This contradiction proves the theorem.
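The generating function identity used in this proof can be checked coefficient by coefficient on a finite set. The sketch below uses hypothetical names and assumes that "all solutions count once" means pairs with \(i \le j\).

```python
from collections import Counter


def f_counts(a):
    """f(n) = number of pairs i <= j with a_i + a_j = n, as a Counter keyed by n."""
    return Counter(a[i] + a[j] for i in range(len(a)) for j in range(i, len(a)))


def half_square_plus_diagonal(a):
    """Coefficients of (1/2) * [(sum z^a_k)^2 + sum z^(2 a_k)]."""
    coeffs = Counter()
    for x in a:
        for y in a:
            coeffs[x + y] += 1      # ordered pairs, from the square
        coeffs[2 * x] += 1          # the diagonal term z^(2 a_k)
    # Every coefficient is even: 2 * (pairs with i < j) + 2 * (pairs with i = j).
    return Counter({n: c // 2 for n, c in coeffs.items()})


if __name__ == "__main__":
    squares = [k * k for k in range(1, 8)]
    print(f_counts(squares) == half_square_plus_diagonal(squares))
    print(sorted(f_counts([1, 2, 4]).items()))
```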

Turán and Erdős conjectured that if \(f(n) > 0\) for all sufficiently large \(n\) then lim sup \(f(n) = \infty\) but this seems very difficult to prove. A still stronger conjecture would be that if \(a_k < ck^2\) then lim sup \(f(n) = \infty\). The best known result in this direction is only lim sup \(f(n) \ge 2\).

Fuchs and Erdős recently proved that

\(\sum_{k = 1}^{n} f(k) = cn + o\left(\dfrac{n^{\frac{1}{4}}}{\log n}\right)\)

is impossible. If \(a_k = k^2\) one comes to the problem of lattice points in a circle of radius \(n\). Here Hardy and Landau proved

\(\sum_{k = 1}^n f(k) = \pi n + o\left((n \log n)^{\frac{1}{4}}\right)\)

does not hold. Though not quite as strong as this, the result of Erdős and Fuchs is applicable to a much more general situation and is much easier (but not very easy) to prove.

Let \(a_1 < a_2 < \cdots\) be an infinite sequence of integers. Erdős conjectured, and G. G. Lorentz proved, that there exists a sequence \(\{b_i\}\) of zero density such that every integer is of the form \(a_i + b_j\).

An interesting unsolved problem along these lines is to find a sequence \(B\): \(b_1 < b_2 < \cdot\cdot\cdot\) with counting function \(B(n) < \dfrac{cn}{\log n}\) such that every integer is of the form \(2^k + b_j\).

Let \(a_1 < a_2 < \cdots < a_{2n}\) be \(2n\) integers in the interval \([1,4n]\) and \(b_1 < b_2 < \cdots < b_{2n}\) the remaining numbers in the interval. Erdős conjectured that there exists an integer \(x\) such that the number of solutions of \(a_i + x = b_j\) is at least \(n\). It is quite easy to show that there exists an \(x\) so that the number of solutions of \(a_i + x = b_j\) is at least \(\dfrac{n}{2}\). We merely observe that the number of solutions of \(a_i + y = b_j\) is \(4n^2\) and that there are \(8n\) possible choices of \(y\), i.e., \(-4n \le y \le 4n\), \(y \ne 0\). Thus for some \(y_0\) there are at least \(\dfrac{n}{2}\) \(b\)’s in \(a_i + y_0\) as stated.
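The averaging argument can be illustrated by counting, for each shift \(y\), the number of solutions of \(a_i + y = b_j\). The function name and the sample set below are hypothetical.

```python
from collections import Counter


def shift_counts(a, n):
    """For a set a of 2n integers in [1, 4n], count solutions of a_i + y = b_j per y.

    The b's are the remaining 2n integers of [1, 4n].
    """
    a_set = set(a)
    b = [m for m in range(1, 4 * n + 1) if m not in a_set]
    return Counter(bj - ai for ai in a for bj in b)


if __name__ == "__main__":
    n = 3
    a = [1, 2, 5, 8, 9, 12]
    counts = shift_counts(a, n)
    best_shift, best_count = counts.most_common(1)[0]
    # The total is 4 n^2, spread over at most 8n nonzero shifts.
    print(sum(counts.values()), best_shift, best_count)
```

Since the \(4n^2\) solutions are spread over at most \(8n\) shifts, the largest count is at least \(\dfrac{n}{2}\) for every choice of the \(a\)'s.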

P. Scherk improved the \(\dfrac{n}{2}\) to \(n(2 - \sqrt{2}) = .586n\). By an entirely different method L. Moser improved this further to \(.712n\). On the other hand Selfridge, Ralston and Motzkin have used S.W.A.C. to disprove the original conjecture and have found examples where no number is representable more than \(.8n\) times as a difference between an \(a\) and a \(b\).

Still another set of interesting problems of combinatorial number theory revolves around the concept of addition chain introduced by A. Scholz. An addition chain for \(n\) is a set of integers \(1 = a_0 < a_1 < \cdots < a_r = n\) such that every element \(a_p\) can be written as a sum \(a_{\sigma} + a_{\tau}\) of preceding elements of the chain. For example for \(n = 666\)

1, 2, 4, 8, 16, 24, 40, 80, 160, 320, 640, 664, 666

form a chain with \(r = 12\); the same holds for

1, 2, 3, 6, 9, 18, 27, 54, 81, 162, 324, 648, 666.

In any case we must have \(a_1 = 2\), and \(a_2 = 3\) or 4. By the length \(\ell = \ell(n)\) Scholz understands the smallest \(\ell\) for which there exists an addition chain \(a_0, a_1, ..., a_{\ell} = n\).
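The definition translates directly into a checker. The sketch below (hypothetical function name) confirms that both chains for 666 are valid and have \(r = 12\); it allows \(a_{\sigma} = a_{\tau}\), which is needed for doubling steps such as \(2 = 1 + 1\).

```python
def is_addition_chain(chain):
    """True if chain is 1 = a_0 < a_1 < ... and each later term is a sum of two earlier terms."""
    if not chain or chain[0] != 1:
        return False
    seen = {1}
    for p in range(1, len(chain)):
        value = chain[p]
        if value <= chain[p - 1]:
            return False
        # Look for earlier terms x and value - x (the two may coincide).
        if not any((value - x) in seen for x in seen):
            return False
        seen.add(value)
    return True


if __name__ == "__main__":
    first = [1, 2, 4, 8, 16, 24, 40, 80, 160, 320, 640, 664, 666]
    second = [1, 2, 3, 6, 9, 18, 27, 54, 81, 162, 324, 648, 666]
    for chain in (first, second):
        print(is_addition_chain(chain), len(chain) - 1)
```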

Scholz stated the following:

(1) \(m + 1 \le \ell(n) \le 2m\) for \(2^m + 1 \le n \le 2^{m + 1}\) (\(m \ge 1\));
(2) \(\ell(ab) \le \ell(a) + \ell(b);\)
(3) \(\ell (2^{m + 1} - 1) \le m + \ell (m + 1).\)

The first two of these are easy to prove. The third we would conjecture to be false. Scholz surmised that the first could be improved and raised the question of whether or not

\(1 \le \text{lim sup}_{n \to \infty}\ \dfrac{\ell(n)}{\log_2 n} \le 2\)

could be improved.

In what follows we prove (1) and outline a proof due to A. Brauer that

\(\ell(n) \thicksim \log_2 n.\)

Suppose integers are written in base 2 and we seek an addition chain for 101101101 say. We might form the chain

1, 10, 100, 101, 1010, 1011, 10110, 101100, 101101, 1011010,
1011011, 10110110, 101101100, 101101101.

In this process, each digit “costs” at most two elements in the chain so that \(\ell < 2 \log_2 n\). Since the left hand side of the inequality of (1) is trivial the method suggested above yields a proof of (1).
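The doubling process just described is short to write down. This sketch (hypothetical function name) reads the binary digits of \(n\) from the left, doubling for each digit and adding 1 when the digit is 1.

```python
def binary_chain(n):
    """Addition chain for n >= 1 built by doubling and adding 1, digit by digit."""
    chain = [1]
    for bit in bin(n)[3:]:          # binary digits of n after the leading 1
        chain.append(2 * chain[-1])
        if bit == "1":
            chain.append(chain[-1] + 1)
    return chain


if __name__ == "__main__":
    chain = binary_chain(0b101101101)
    print([format(v, "b") for v in chain])
    print(len(chain) - 1)
```

Each digit after the first adds one or two elements, which is where the bound \(\ell < 2 \log_2 n\) comes from.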

Brauer’s idea is to build up a large stock of numbers first and use it when the occasion arises. Suppose \(n\) is about \(2^m\). We start out with the chain 1, 2, ..., \(2^r\), where \(r\) will be determined later. We can now break up the digits of \(n\) into \(m/r\) blocks with \(r\) digits in each block. For example, suppose

\(n = (101)(110)(010)(101)(111)\)

Here \(m = 15\), \(r = 3\).

Starting with our stock of all 3 digit numbers we can proceed as follows:

1, 10, 100, \(\underline{101}\), 1010, 10100, 101000, \(\underline{101110}\),
1011100, 10111000, 101110000, \(\underline{101110010}\), ...

where between the underlined stages we double and at the underlined stages we add the appropriate number from our stock to build up \(n\). In this case we would need \(2^3 + 15 + 5\) steps. In general, the number of steps for a number under \(2^m\) would be about \(2^r + m + \dfrac{m}{r}\). By appropriate choice of \(r\) we could make \(2^r + \dfrac{m}{r}\) as small as we please in comparison with \(m\). Indeed, using this idea Brauer proved in general

\(\ell (n) < \log_2 n \left\{1 + \dfrac{1}{\log \log n} + \dfrac{2 \log 2}{(\log n)^{1 - \log 2}} \right\}.\)
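The block idea can be sketched as follows. This is a simplified illustration under stated assumptions, not Brauer's exact procedure: the function names are hypothetical, the stock is taken to be 1, 2, ..., \(2^r - 1\), and the chain starts from the leading block (taken from the stock) rather than rebuilding it by doubling.

```python
def is_chain(chain):
    """True if chain starts at 1, increases, and each later term is a sum of two earlier terms."""
    seen = set()
    for i, value in enumerate(chain):
        if i == 0:
            ok = value == 1
        else:
            ok = value > chain[i - 1] and any((value - x) in seen for x in seen)
        if not ok:
            return False
        seen.add(value)
    return True


def brauer_chain(n, r):
    """Addition chain for n using a stock of all numbers below 2^r and blocks of r binary digits."""
    bits = bin(n)[2:]
    bits = bits.zfill(-(-len(bits) // r) * r)      # pad on the left to a multiple of r
    blocks = [int(bits[i:i + r], 2) for i in range(0, len(bits), r)]
    stock = list(range(1, 2 ** r))                 # each stock element is the previous one plus 1
    current = blocks[0]                            # nonzero, and already in the stock
    tail = []
    for block in blocks[1:]:
        for _ in range(r):                         # shift left by r digits: r doublings
            current *= 2
            tail.append(current)
        if block:                                  # add the next block from the stock
            current += block
            tail.append(current)
    # Every term is a sum of two smaller terms, so sorting gives a valid chain.
    return sorted(set(stock + tail))


if __name__ == "__main__":
    n = 0b101110010101111                          # (101)(110)(010)(101)(111)
    chain = brauer_chain(n, 3)
    print(chain[-1] == n, is_chain(chain), len(chain) - 1)
```

The number of steps is roughly \(2^r\) for the stock, \(m\) for the doublings and \(\dfrac{m}{r}\) for the block additions, in line with the estimate \(2^r + m + \dfrac{m}{r}\).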