Expansion, divisibility and parity: an explanation (Part I)

The following is meant as an exposition of my preprint with Maksym Radziwiłł, Expansion, divisibility and parity. The main result is a statement on how a linear operator defined in terms of divisibility by primes has small norm. In this exposition, we will choose to start from one of its main current applications, namely,

\[ \frac{1}{\log x} \sum_{n\leq x} \frac{\lambda(n) \lambda(n+1)}{n} = O\left( \frac{1}{\sqrt{\log \log x}}\right), \]

which strengthens results by Tao and Tao-Teräväinen. (Here \(\lambda(n)\) is the Liouville function, viz., the completely multiplicative function such that \(\lambda(p)=-1\) for every prime \(p\).) There are other corollaries, some of them subsuming the above statement; it is also true that the proof of the above statement contains qualitative improvements that are not obvious from the statement itself. Nevertheless, it is a concrete statement that is obviously interesting, being a step towards Chowla’s conjecture (“logarithmic Chowla in degree 2”), and so it is a convenient initial goal on which to center.
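As a quick numerical illustration (mine, not from the paper), one can compute the left-hand side for a modest value of \(x\), with \(\lambda\) obtained from a smallest-prime-factor sieve; the cutoff \(x=10^5\) below is arbitrary.

```python
import math

def liouville(M):
    """lam[n] = lambda(n) for 1 <= n <= M, via a smallest-prime-factor sieve."""
    spf = [0] * (M + 1)
    for i in range(2, M + 1):
        if spf[i] == 0:                    # i is prime
            for j in range(i, M + 1, i):
                if spf[j] == 0:
                    spf[j] = i
    lam = [0, 1]                           # lambda(1) = 1; index 0 unused
    for m in range(2, M + 1):
        lam.append(-lam[m // spf[m]])      # lambda(m) = -lambda(m/p), p = spf(m)
    return lam

x = 10**5
lam = liouville(x + 1)
corr = sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1)) / math.log(x)
print(corr)   # small in absolute value, consistent with the bound above
```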


First, some meta comments. We may contrast two possible ways of writing a paper — what may be called the incremental and the retrospective approaches.

These are of course two extremes. Few people write nothing down while solving a problem, and the way one followed to reach the solution generally has some influence on the finished paper. In the case of my paper with Maksym, what we followed was mainly the retrospective approach, with some incremental elements, mainly to handle technical complications that arose after we had an outline of a proof. It is tempting to say that that is still too much incrementality, but, in fact, some of the feedback we have received suggests a drawback of the retrospective approach that I had not thought of before.

When faced with a result with a lengthy proof, readers tend to come up with their own “natural strategy”. So far, so good: active reading is surely a good thing. What then happens, though, is that readers may see necessary divergences from their “natural strategy” as technical complications. They may often be correct; however, they may miss why the “natural strategy” may not work, or how it leads to the main, essential difficulty — the heart of the problem, which they may then overlook amid the complications.

What I will do in this write-up is follow, not an incremental approach, but rather an idealized view of what the path towards the solution was or could have been like; a recreated incrementality with the benefit of hindsight, then, starting from a “natural strategy”, with an emphasis on what turns out to be essential.


Let us set out to reprove Tao’s “logarithmic Chowla” statement, that is,

\[ \frac{1}{\log x} \sum_{n\leq x} \frac{\lambda(n) \lambda(n+1)}{n} \to 0 \]

as \(x\to \infty\). Now, Tao’s method gives a bound of \(O(1/(\log \log \log \log x)^{\alpha})\) on the left side (as explained here, with \(\alpha=1/5\)), while Tao-Teräväinen should yield a bound of \(O(1/(\log \log \log x)^{\alpha})\) for some \(\alpha>0\). Their work is based on depleting entropy, or, more precisely, on depleting mutual information. Our method gives stronger bounds (namely, \(O(1/\sqrt{\log \log x})\)) and is also “stronger” in ways that will later become apparent. Let us focus, however, simply on giving a different proof, and welcome whatever might come from it.

The first step consists of a little manipulation as in Tao, based on the fact that \(\lambda\) is completely multiplicative. Let \(W = \sum_{n\leq x} \lambda(n) \lambda(n+1)/n\). For any prime (or integer!) \(p\),

\[ \begin{aligned} \frac{1}{p} W &= \sum_{n\leq x} \frac{\lambda(p n) \lambda(p n + p)}{p n}\\ &= \sum_{n\leq p x:\; p|n} \frac{\lambda(n) \lambda(n+p)}{n} = \sum_{n\leq x:\; p|n} \frac{\lambda(n) \lambda(n+p)}{n} + O\left(\frac{\log p}{p}\right).\end{aligned} \]
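(The first two equalities here are exact identities, since \(\lambda(pn)\lambda(pn+p) = \lambda(p)^2\lambda(n)\lambda(n+1)\); only the final truncation from \(px\) back to \(x\) costs the \(O((\log p)/p)\). A quick numerical check of the exact identity, with arbitrary choices \(x = 2\cdot 10^4\), \(p = 3\):)

```python
def liouville(M):
    """lam[n] = lambda(n) for 1 <= n <= M, via a smallest-prime-factor sieve."""
    spf = [0] * (M + 1)
    for i in range(2, M + 1):
        if spf[i] == 0:
            for j in range(i, M + 1, i):
                if spf[j] == 0:
                    spf[j] = i
    lam = [0, 1]
    for m in range(2, M + 1):
        lam.append(-lam[m // spf[m]])
    return lam

x, p = 20000, 3
lam = liouville(p * x + p)
W = sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1))
# right-hand side before truncation: sum over n <= p*x with p | n
rhs = sum(lam[n] * lam[n + p] / n for n in range(p, p * x + 1, p))
assert abs(W / p - rhs) < 1e-9   # exact identity, up to floating-point rounding
```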

Hence, for any set of primes \(\mathbf{P}\),

\[ \sum_{p\in \mathbf{P}} \sum_{n\leq x:\; p|n} \frac{\lambda(n) \lambda(n+p)}{n} = W \mathscr{L} + O\left(\sum_{p\in \mathbf{P}} \frac{\log p}{p}\right), \]

where \(\mathscr{L} = \sum_{p\in \mathbf{P}} 1/p\). If \(H\) is such that \(p\leq H\) for all \(p\in \mathbf{P}\), then, by the prime number theorem, \(\sum_{p\in \mathbf{P}} (\log p)/p \ll \log H\). Thus

\[ W = \frac{1}{\mathscr{L}} \sum_{n\leq x} \sum_{p\in \mathbf{P}: p|n} \frac{\lambda(n) \lambda(n+p)}{n} + O\left(\frac{\log H}{\mathscr{L}}\right). \]

Assuming \(H = x^{o(1)}\) (so that \(\log H = o(\log x)\)) and \(\mathscr{L}\geq 1\), and using a little partial summation, we see that it is enough to prove that \(S_0 = o(N\mathscr{L})\), where

\[ S_0 = \sum_{N < n\leq 2 N}\sum_{p\in \mathbf{P}: p|n} \lambda(n) \lambda(n+p). \]
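(The Mertens-type estimate \(\sum_{p\leq H} (\log p)/p \ll \log H\) invoked above is classical; in fact the sum is \(\log H + O(1)\). A quick check with a simple sieve, \(H = 10^5\) arbitrary:)

```python
import math

def primes_upto(M):
    is_p = [True] * (M + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, int(M ** 0.5) + 1):
        if is_p[i]:
            for j in range(i * i, M + 1, i):
                is_p[j] = False
    return [i for i in range(2, M + 1) if is_p[i]]

H = 10**5
s = sum(math.log(p) / p for p in primes_upto(H))
print(s, math.log(H))   # the sum is log H + O(1); here the O(1) is about -1.3
```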

Let us make this sum a little more symmetric. Let \(\mathbf{N} = \{n\in \mathbb{Z}: N < n\leq 2 N\}\), and define

\[ S = \sum_{n\in \mathbf{N}} \sum_{\sigma = \pm 1} \sum_{\substack{p\in \mathbf{P}: p|n\\ n+\sigma p\in \mathbf{N}}} \lambda(n) \lambda(n+\sigma p). \]

Then \(S = 2 S_0 + O(\sum_{p\in \mathbf{P}} 1) = 2 S_0 + O(H)\), and thus it is enough to prove that

\[ S = o(N \mathscr{L}). \]

Objectives. Tao showed that there exists a set \(\mathbf{P}\) of primes (very small compared to \(N\)) such that \(S = o(N \mathscr{L})\). It is our aim to prove that \(S = o(N \mathscr{L})\) for every set \(\mathbf{P}\) of primes satisfying some simple conditions. (As we said, we assume \(p\leq H\), and it is not hard to see that we have to assume \(\mathscr{L}\to\infty\); it will also be helpful to assume that no \(p\in \mathbf{P}\) is tiny compared to \(H\).) We will in fact be able to show that \(S = O(N \sqrt{\mathscr{L}})\), which is essentially optimal.


We now set out on our own. It is a deep-seated instinct for an analytic number theorist to apply Cauchy-Schwarz:

\[ \begin{aligned} S^2 &\leq \left(\sum_{n\in \mathbf{N}} \sum_{\sigma = \pm 1} \sum_{\substack{p\in \mathbf{P}: p|n\\ n+\sigma p\in \mathbf{N}}} \lambda(n) \lambda(n+\sigma p)\right)^2\\ &\leq N \sum_{n\in \mathbf{N}} \sum_{\sigma_1,\sigma_2 = \pm 1} \sum_{\substack{p_1,p_2\in \mathbf{P}\\ p_i|n,\; n+\sigma_i p_i\in \mathbf{N}}} \lambda(n+\sigma_1 p_1) \lambda(n+\sigma_2 p_2)\\ &\leq N \sum_{n\in \mathbf{N}}\; \sum_{\sigma_1,\sigma_2 = \pm 1} \sum_{\substack{p_1,p_2\in \mathbf{P}\\ p_1|n, p_2|n+\sigma_1 p_1\\ n+\sigma_1 p_1, n+\sigma_1 p_1 + \sigma_2 p_2 \in \mathbf{N}}} \lambda(n) \lambda(n+\sigma_1 p_1+\sigma_2 p_2),\end{aligned} \]

where we are changing variables in the last step.

We iterate, applying Cauchy-Schwarz \(\ell\) times:

\[ S^{2^\ell} \leq N^{2^\ell-1} \sum_{n\in \mathbf{N}} \sum_{\substack{\sigma_i= \pm 1,\; p_i\in \mathbf{P}\\ \forall 1\leq i\leq 2^\ell: p_i|n+\sigma_1 p_1 + \dotsc + \sigma_{i-1} p_{i-1}}} \lambda(n) \lambda(n+ \sigma_1 p_1 + \dotsc + \sigma_{2^\ell} p_{2^{\ell}}). \]

We can see \(n+\sigma_1 p_1 + \dotsc + \sigma_{2^\ell} p_{2^\ell}\) as the outcome of a “walk” of length \(2^\ell\).

Suppose for a moment that, for \(k=2^\ell\) large, the number of walks of length \(k\) from \(n\) to \(n+m\) is generally about \(\psi(m)\), where \(\psi\) is a nice continuous function. Then the bound above on \(S^{2^\ell}\) would be close to

\[ N^{2^\ell-1} \sum_{n\in \mathbf{N}} \sum_{m} \lambda(n) \lambda(n+m) \psi(m). \]

Matomäki-Radziwiłł would then give us a bound on that double sum. Let us write that bound in the form

\[ \left|\sum_{n\in \mathbf{N}} \sum_{m} \lambda(n) \lambda(n+m) \psi(m)\right| \leq \text{err}_2 \cdot N \mathscr{L}^{k}, \]

since \(|\psi|_1\) should be about \(\mathscr{L}^k\), and write our statement on convergence to \(\psi\) in the form

\[ \sum_{n\in \mathbf{N}} \sum_m \left|\psi(m) - \sum_{\substack{\sigma_i= \pm 1,\; p_i\in \mathbf{P}\\ \forall 1\leq i\leq 2^\ell: p_i|n+\sigma_1 p_1 + \dotsc + \sigma_{i-1} p_{i-1}\\ \sigma_1 p_1 + \dotsc + \sigma_k p_k = m}} 1\right| = \text{err}_1 \cdot N \mathscr{L}^k. \]

Then

\[ S\leq (\text{err}_1^{1/k} + \text{err}_2^{1/k})\cdot N \mathscr{L}. \]

Here already we would seem to have a problem. The “width” \(M\) of the distribution \(\psi\) (meaning its scale) should be \(\ll \sqrt{k} \cdot \mathbb{E}(p: p\in \mathbf{P})\leq \sqrt{k} H\); the distribution could be something like a Gaussian at that scale, say. Now, the bound from Matomäki-Radziwiłł is roughly of the quality \(\text{err}_2\leq 1/\log M\). One can use intermediate results in the same paper to obtain a bound on \(\text{err}_2\) roughly of the form \(1/M^{\delta}\), \(\delta>0\), if we remove some integers from \(\mathbf{N}\). At any rate, it is clear that we would need, at the very least, \(k\) larger than any constant times \(\log H\). As it turns out, all of that is a non-issue, in that there is a way to avoid taking the \(k\)th root of \(\text{err}_2\) altogether. Let us make a mental note, however.


The question now is how large \(\ell\) has to be for the number of walks of length \(k=2^{\ell}\) from \(n\) to \(n+m\) to approach a continuous distribution \(\psi(m)\). Consider first the walks \(n, n+\sigma_1 p_1,\dotsc,n+\sigma_1 p_1 + \dotsb + \sigma_k p_k\) such that no prime \(p_i\) is repeated. Fix \(\sigma_i\), \(p_i\) and let \(n\) vary. By the Chinese Remainder Theorem, the number of \(n\in \mathbf{N}\) such that

\[ p_1|n,\; p_2|n+\sigma_1 p_1,\; \dotsc,\; p_k|n+\sigma_1 p_1 + \dotsc + \sigma_{k-1} p_{k-1} \]

is almost exactly \(N/p_1 p_2 \dotsb p_k\). In other words, the probability of that walk being allowed is almost exactly \(1/(p_1 p_2 \dotsb p_k)\). We may thus guess that \(\psi\) has the same shape as (\(\mathscr{L}^k\) times) the distribution of the endpoint of a random walk where each edge of length \(p\) is taken with probability \(1/p\) (divided by \(\mathscr{L}\), so that the probabilities add up to \(1\)). That distribution should indeed tend to a continuous distribution — namely, a Gaussian — fairly quickly. Of course, here, we are just talking about the contribution of walks with distinct edges \(p_i\) to

\[ \sum_{n\in \mathbf{N}} \sum_m \left(\psi(m) - \sum_{\substack{\sigma_i= \pm 1,\; p_i\in \mathbf{P}\\ \forall 1\leq i\leq 2^\ell: p_i|n+\sigma_1 p_1 + \dotsc + \sigma_{i-1} p_{i-1}\\ \sigma_1 p_1 + \dotsc + \sigma_k p_k = m}} 1\right), \]

without absolute values, but we can get essentially what we want by looking at the variance

\[ \sum_{n\in \mathbf{N}} \sum_m \left(\psi(m) - \sum_{\substack{\sigma_i= \pm 1,\; p_i\in \mathbf{P}\\ \forall 1\leq i\leq k: p_i|n+\sigma_1 p_1 + \dotsc + \sigma_{i-1} p_{i-1}\\ \sigma_1 p_1 + \dotsc + \sigma_k p_k = m}} 1\right)^2, \]

and considering the contribution to this variance made by closed walks

\[ \begin{aligned}n, &n+\sigma_1 p_1, \dotsc, n+\sigma_1 p_1 + \dotsb + \sigma_k p_k = m,\\ &n+\sigma_1 p_1 + \dotsb + \sigma_k p_k-\sigma_{k+1} p_{k+1},\dotsc ,m-(\sigma_{k+1} p_{k+1} + \dotsc + \sigma_{2 k} p_{2 k})=n\end{aligned} \]

with \(p_1,p_2,\dotsc,p_{2 k}\) distinct.

The contribution of these closed walks is almost exactly what we would obtain from the naïve model we were considering, viz., a random walk where each edge \(p_i\) is taken with probability \(1/(\mathscr{L} p_i)\), and so we should have the same limiting distribution as in that model.
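The naïve model itself is easy to simulate: each signed step \(\pm p\) is taken with probability \(\frac{1}{2}\cdot\frac{1}{\mathscr{L} p}\), so one step has variance \(\frac{1}{\mathscr{L}}\sum_{p\in\mathbf{P}} p\), and after \(k\) steps the endpoint should have width about \(\sqrt{k\cdot\frac{1}{\mathscr{L}}\sum_p p}\). A sketch with an illustrative prime set of my choosing (not from the paper):

```python
import math
import random

random.seed(0)

# Illustrative prime set P and its "logarithm" L = sum of 1/p.
P = [101, 103, 107, 109, 113, 127, 131, 137, 139, 149]
L = sum(1.0 / p for p in P)
weights = [1.0 / (L * p) for p in P]   # step length p chosen with probability 1/(L*p)

k = 20
samples = []
for _ in range(10000):
    pos = 0
    for _ in range(k):
        p = random.choices(P, weights=weights)[0]
        pos += random.choice((-1, 1)) * p
    samples.append(pos)

mean = sum(samples) / len(samples)
emp_std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
# One step has mean 0 and variance sum_p p^2 / (L*p) = (sum_p p)/L,
# so after k steps the width should be sqrt(k * (sum_p p)/L).
theory_std = math.sqrt(k * sum(P) / L)
print(emp_std, theory_std)   # the two should agree to within a few percent
```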

What about walks where some primes \(p_i\) do repeat? At least some of them may make a large contribution that is not there in our naïve model. For instance, consider walks of length \(2 k\) that retrace their steps, so that the \((k+1)\)th step is the \(k\)th step backwards, the \((k+2)\)th step is the \((k-1)\)th step backwards, etc.:

\[ \begin{aligned} n, &n+\sigma_1 p_1, \dotsc, n+\sigma_1 p_1 + \dotsb + \sigma_k p_k,\\ &n+\sigma_1 p_1 + \dotsb + \sigma_{k-1} p_{k-1},\dotsc ,n+\sigma_1 p_1, n,\end{aligned} \]

with

\[ p_1|n,\; p_2|n+\sigma_1 p_1,\; \dotsc,\; p_k|n+\sigma_1 p_1 + \dotsc + \sigma_{k-1} p_{k-1}, \]

\[ p_k|n+\sigma_1 p_1 + \dotsc + \sigma_{k-1} p_{k-1} + \sigma_k p_k,\; \dotsc,\; p_2|n+\sigma_1 p_1 + \sigma_2 p_2,\; p_1|n+\sigma_1 p_1. \]

The second row of divisibility conditions here is obviously implied by the first row. Hence, again by the Chinese Remainder Theorem, the walk is valid for almost exactly \(N/p_1 p_2 \dotsb p_k\) elements \(n\in \mathbf{N}\), rather than for \(N/(p_1 p_2 \dotsb p_k)^2\) elements. The contribution of such walks to

\[ \sum_{n\in \mathbf{N}} \sum_{\substack{\forall 1\leq i\leq 2 k: \sigma_i= \pm 1,\; p_i\in \mathbf{P}\\ \forall 1\leq i\leq 2 k: p_i|n+\sigma_1 p_1 + \dotsc + \sigma_{i-1} p_{i-1}\\ \sigma_1 p_1 + \dotsc + \sigma_{2 k} p_{2 k} = 0}} 1 \]

(which is the interesting part of the variance we wrote down before) is clearly \(N \mathscr{L}^k\). In order for it not to be of greater order than what one expects from the limiting distribution, we should have \(N \mathscr{L}^k \ll N \mathscr{L}^{2 k}/M\), where \(M\), the width of the distribution, is, as we saw before, very roughly \(\sqrt{k} H\). Thus, we need \(k\gg (\log H)/(\log \mathscr{L})\).

There are of course other walks that make similar contributions; take, for instance,

\[ n, n+p_1, n, n-p_3, n-p_3 + p_4, n-p_3, n \]

for \(k=3\). These are what we may call trivial walks, in the sense that a word is trivial when it reduces to the identity. It is tempting to say that their number is \(2^k C_k\), where \(C_k\leq 2^{2 k}\) is the \(k\)th Catalan number (which, among other things, counts the number of expressions containing \(k\) pairs of parentheses correctly matched: for example, \(() (())\) would correspond to the trivial walk above). In fact, the matter becomes more subtle, because some primes may reappear without taking us one step further back towards the origin of the walk; for instance, in the above, we might have \(p_4=p_1\), and that is a possibility that is not recorded by a simple pattern of correctly matched parentheses; yet it must be considered separately. Here again we make a mental note. When we say we reduce a walk, we mean a procedure analogous to that of reducing a word.
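For what it is worth, the parenthesis count is easy to test by brute force; a small illustrative script enumerating the matched patterns with \(k\) pairs and comparing with \(C_k = \binom{2k}{k}/(k+1)\):

```python
from math import comb

def catalan(k):
    # C_k = binom(2k, k)/(k+1); the division is always exact
    return comb(2 * k, k) // (k + 1)

def balanced(k):
    """All well-matched strings of k pairs of parentheses."""
    out = []
    def rec(s, opened, closed):
        if opened == closed == k:
            out.append(s)
            return
        if opened < k:
            rec(s + "(", opened + 1, closed)
        if closed < opened:
            rec(s + ")", opened, closed + 1)
    rec("", 0, 0)
    return out

assert [catalan(k) for k in range(7)] == [1, 1, 2, 5, 14, 42, 132]
assert all(len(balanced(k)) == catalan(k) <= 4 ** k for k in range(7))
```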

It is, incidentally, no coincidence that, when we try to draw the trivial walk above, we produce a tree:

Any trivial walk gives us a tree (traversal) when drawn.

Now let us look at walks that fall into neither of the two classes just discussed; that is, walks where we do have some repeated primes \(p_i=p_{i'}\) even after we reduce the walk. Then, far from being independent, the condition

\[ p_i|n + \sigma_1 p_1 + \dotsc + \sigma_{i-1} p_{i-1} \]

either implies or contradicts the condition

\[ p_i=p_{i'}|n+\sigma_1 p_1 + \dotsc + \sigma_{i'-1} p_{i'-1} \]

for given \(\{(\sigma_i,p_i)\}_i\), depending on whether

\[ p_i|n+\sigma_{i+1} p_{i+1} + \dotsc + \sigma_{i'-1} p_{i'-1}. \]

We may draw another graph, emphasizing the two edges with the same label \(\pm p_i\).

At this point it becomes convenient to introduce the assumption that \(p\geq H_0\) for all \(p\in \mathbf{P}\). Then it is clear that, if \(p_j\ne p_i\) for all \(i < j < i'\), the divisibility condition \(p_i|n+\sigma_{i+1} p_{i+1} + \dotsc + \sigma_{i'-1} p_{i'-1}\) may hold only for a proportion \(\ll 1/H_0\) of all tuples \((p_{i+1},\dotsc,p_{i'-1})\).

So far, so good, except that we are in a similar situation as before: it is not enough to save one factor of \(H_0\); indeed, we should save a factor of at least \(M\), which is roughly on the scale of \(H\), not \(H_0\). Obviously, for \(\mathscr{L}\to \infty\) to hold, we need \(H_0 = H^{o(1)}\), and so we need to save more than any constant number of factors of \(H_0\).

We have seen three rather different cases. In general, we would like to have a division of all walks into three classes:

  1. walks containing enough non-repeated primes \(p_i\) that their contribution is what one would expect from the hoped-for limiting distribution;
  2. rare walks, such as trivial walks;
  3. walks for which there are many independent conditions of the form \(p_i|n+\sigma_{i+1} p_{i+1} + \dotsc + \sigma_{i'-1} p_{i'-1}\) as above.

Some initial thoughts on the third case. We should think a little about what we mean or should mean by “independent”. It is clear that, if we have several conditions \(p|L_j(p_1,\dotsc,p_{2 k})\), where the \(L_j\) are linear forms spanning a space of dimension \(D\), then, in effect, we have only \(D\) distinct conditions. It is also clear that, while having several primes \(p_i\) divide the same quantity \(L(p_1,\dotsc,p_{2 k})\) ought to give us more information than just knowing one prime divides it, that is true only up to a point: if \(L(p_1,\dotsc,p_{2 k})=0\) (something that we expect to happen about \(1/(\sqrt{k} H)\) of the time), then every condition of the form \(p_i|L(p_1,\dotsc,p_{2 k})\) holds trivially.

It is also the case that we should be careful about which primes do the dividing. Say two indices \(i\), \(i'\) are equivalent if \(p_i=p_{i'}\). Choose your equivalence relation \(\sim\), and paint the indices \(i\) in some equivalence classes blue, while painting the indices \(i\) in the other equivalence classes red. It is not hard to show, using a little geometry of numbers, that, if \(p_{i_j}|L_j(p_1,\dotsc,p_{2 k})\) for some blue indices \(i_j\) and linear forms \(L_j\), \(j\in J\), and the space spanned by the forms \(L_j\), considered as formal linear combinations in the variables \(x_i\) for \(i\) blue, has dimension \(D\), then we can gain a factor of at least \(H_0^D\) or so: the primes \(p_i\) for \(i\) red have to lie in a fixed lattice of codimension \(D\) and index \(\geq H_0^D\). A priori, however, it is not clear which primes we should color blue and which ones red.

We have, at any rate, arrived at what may be called the core of the problem – how to classify our walks into three classes as above, and how to estimate their contributions accordingly.


It is now time to step back and take a fresh look at the problem. Matters will become clearer and simpler, but, as we will see, the core of the problem will remain.

We have been talking about walks. Now, walks are taken in a graph. Thinking about it for a moment, we see that we have been considering walks in the graph \(\Gamma\) having \(V=\mathbf{N}\) as its set of vertices and \(E=\{\{n,n+p\}: n,n+p\in \mathbf{N},\, p\in \mathbf{P},\, p|n\}\) as its set of edges. (In other words, we draw an edge between \(n\) and \(n+p\) if and only if \(p\) divides \(n\).) We also considered random walks in what we called the “naïve model”; those are walks in the weighted graph \(\Gamma'\) having \(\mathbf{N}\) as its set of vertices and an edge of weight \(1/p\) between any \(n, n+p\in \mathbf{N}\) with \(p\in \mathbf{P}\), regardless of whether \(p|n\).

Questions about walks in a graph \(\Gamma\) are closely tied to the adjacency operator \(\textrm{Ad}_\Gamma\). This is a linear operator on functions \(f:V\to \mathbb{C}\) taking \(f\) to a function \(\textrm{Ad}_\Gamma f:V\to \mathbb{C}\) defined as follows: for \(v\in V\), \[ (\textrm{Ad}_\Gamma f)(v) = \sum_{w: \{v,w\}\in E} f(w). \] In other words, \(\textrm{Ad}_\Gamma\) replaces the value of \(f\) at a vertex \(v\) by the sum of its values \(f(w)\) at the neighbors \(w\) of \(v\). The connection with walks is not hard to see: for instance, it is very easy to show that, if \(1_v:V\to \mathbb{C}\) is the function taking the value \(1\) at \(v\) and \(0\) elsewhere, then, for any \(w\in V\) and any \(k\geq 0\), \(((\textrm{Ad}_\Gamma)^k 1_v)(w)\) is the number of walks of length \(k\) from \(v\) to \(w\).
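This walk-counting identity is easy to verify on a toy graph, with the adjacency operator realized as a matrix; a pure-Python sketch (the graph below is arbitrary):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, A)
    return R

# A small test graph: a 5-cycle plus the chord {0, 2}.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
Adj = [[0] * n for _ in range(n)]
for u, v in edges:
    Adj[u][v] = Adj[v][u] = 1

def count_walks(u, w, k):
    """Brute-force count of walks of length k from u to w."""
    if k == 0:
        return int(u == w)
    return sum(count_walks(x, w, k - 1) for x in range(n) if Adj[u][x])

P4 = matpow(Adj, 4)
assert all(P4[u][w] == count_walks(u, w, 4)
           for u in range(n) for w in range(n))
```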

The connection between \(\textrm{Ad}_\Gamma\) and our problem is very direct, in that it can be stated without reference to random walks. We want to show that \[ \sum_{n\in \mathbf{N}} \sum_{\sigma = \pm 1} \sum_{\substack{p\in \mathbf{P}: p|n\\ n+\sigma p\in \mathbf{N}}} \lambda(n) \lambda(n+\sigma p) = o(N \mathscr{L}). \] That is exactly the same as showing that \[ \langle \lambda, \textrm{Ad}_{\Gamma} \lambda\rangle = o(\mathscr{L}), \] where \(\langle \cdot, \cdot\rangle\) is the inner product defined by \[ \langle f,g\rangle = \frac{1}{N} \sum_{n\in \mathbf{N}} f(n) \overline{g(n)} \] for \(f,g:V\to \mathbb{C}\).
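The translation between the triple sum over \(n\), \(\sigma\), \(p\) and the inner product \(N \langle \lambda, \textrm{Ad}_\Gamma \lambda\rangle\) can be checked on a small instance; the range \((N, 2N]\) and the prime set below are illustrative choices of mine, not from the paper:

```python
def liouville(M):
    """lam[n] = lambda(n) for 1 <= n <= M, via a smallest-prime-factor sieve."""
    spf = [0] * (M + 1)
    for i in range(2, M + 1):
        if spf[i] == 0:
            for j in range(i, M + 1, i):
                if spf[j] == 0:
                    spf[j] = i
    lam = [0, 1]
    for m in range(2, M + 1):
        lam.append(-lam[m // spf[m]])
    return lam

N = 500
P = [11, 13, 17, 19, 23]
lam = liouville(2 * N)

# The sum S exactly as in the text: over n in (N, 2N], sigma = +-1,
# p in P with p | n and n + sigma*p still in (N, 2N].
S = 0
for nn in range(N + 1, 2 * N + 1):
    for p in P:
        if nn % p == 0:
            for s in (1, -1):
                m = nn + s * p
                if N < m <= 2 * N:
                    S += lam[nn] * lam[m]

# The same quantity via the graph: an edge {n, n+p} whenever p | n, and
# N * <lambda, Ad_Gamma lambda> = sum_n lambda(n) * (Ad_Gamma lambda)(n).
adj = {nn: [] for nn in range(N + 1, 2 * N + 1)}
for nn in range(N + 1, 2 * N + 1):
    for p in P:
        if nn % p == 0 and N < nn + p <= 2 * N:
            adj[nn].append(nn + p)
            adj[nn + p].append(nn)

inner = sum(lam[nn] * sum(lam[w] for w in adj[nn])
            for nn in range(N + 1, 2 * N + 1))
assert S == inner
```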

The behavior of random walks on a graph — in particular, the limit distribution of their endpoints — is closely related to the notion of expansion. A regular graph \(\Gamma\) (that is, a graph where every vertex has the same degree \(d\)) is said to be an expander graph with parameter \(\epsilon>0\) if, for every eigenvalue \(\gamma\) of \(\textrm{Ad}_\Gamma\) corresponding to an eigenfunction orthogonal to constant functions, \[|\gamma|\leq (1-\epsilon) d.\] (A few basic remarks may be in order. Since \(\Gamma\) is regular of degree \(d\), a constant function on \(V\) is automatically an eigenfunction with eigenvalue \(d\). Now, \(\textrm{Ad}_\Gamma\) is a symmetric operator, and thus it has full real spectrum: the space of all functions \(V\to \mathbb{C}\) is spanned by a set of eigenfunctions of \(\textrm{Ad}_\Gamma\), all orthogonal to each other; the corresponding eigenvalues are all real, and it is easy to see that all of them are at most \(d\) in absolute value.)

It is clear that we need something both stronger and weaker than expansion. First of all, our graph \(\Gamma\) is not regular; its average degree is \(\mathscr{L}\). We need a stronger bound than that provided by expansion: we want to show, not just that \(|\langle \lambda, \textrm{Ad}_\Gamma \lambda\rangle|\) is \(\leq (1-\epsilon) \mathscr{L}\), but that it is \(= o(\mathscr{L})\). There is nothing unrealistically strong here — in the strongest kind of expander graph (Ramanujan graphs), the absolute value of every eigenvalue is at most \(2\sqrt{d-1}\).

At the same time, we cannot ask for \(\langle f, \textrm{Ad}_\Gamma f\rangle/|f|_2^2 = o(\mathscr{L})\) to hold for every \(f\) orthogonal to constant functions. Take \(f = 1_{I_1}-1_{I_2}\), where \(I_1\), \(I_2\) are two disjoint intervals of the same length \(\geq 100 H\), say. Then \(f\) is orthogonal to constant functions, but \((\textrm{Ad}_\Gamma f)(n)\) is equal to \(\omega(n) f(n)\), except possibly for those \(n\) that lie within distance \(H\) of the endpoints of \(I_1\) and \(I_2\). Hence, \(\langle f,\textrm{Ad}_{\Gamma} f\rangle/|f|_2^2\) will be close to \(\mathscr{L}\). It follows that \(\textrm{Ad}_\Gamma\) will have at least one eigenfunction orthogonal to constant functions and with eigenvalue close to \(\mathscr{L}\).

(This observation is related to the fact that the endpoint of a short random walk on \(\Gamma\) cannot be approximately equidistributed, as it would be in an expander graph: the edges of \(\Gamma\) are too short for that. The most we could hope for is what we were aiming for, namely, that the distribution of the endpoint converges to a nice distribution, centered at the starting point.)

We could aim to show that \(\langle f,\textrm{Ad}_\Gamma f\rangle/|f|_2^2\) is small whenever \(f\) is approximately orthogonal to approximately locally constant functions, say. Since Matomäki-Radziwiłł can be interpreted as the statement that \(\lambda\) is approximately orthogonal to such functions, we would then obtain what we wanted to prove for \(f=\lambda\).

We will find it cleaner to proceed slightly differently. Recall our weighted graph \(\Gamma'\), which was meant as a naïve model for \(\Gamma\). It has an adjacency operator \(\textrm{Ad}_{\Gamma'}\) as well, defined as before. (Since \(\Gamma'\) has weights \(1/p\) on its edges, \((\textrm{Ad}_{\Gamma'} f)(n) = \sum_{p\in \mathbf{P}} (f(n+p) + f(n-p))\).) It is not hard to show, using the techniques in Matomäki-Radziwiłł, that \[\langle \lambda,\textrm{Ad}_{\Gamma'} \lambda\rangle = o(\mathscr{L}).\] (In fact, what amounts to this statement has already been shown, in Tao (Lemma 3.4-3.5); the main ingredient is Theorem 1.3 in Matomäki-Radziwiłł-Tao, which applies and generalizes the main theorem in Matomäki-Radziwiłł. Their bound is a fair deal smaller than \(o(\mathscr{L})\).) We define the operator \[A = \textrm{Ad}_{\Gamma}-\textrm{Ad}_{\Gamma'}.\] It will then be enough to show that \[\langle \lambda,A\lambda\rangle = o(\mathscr{L}),\] as then it will obviously follow that \[\langle \lambda,\textrm{Ad}_\Gamma \lambda\rangle = \langle \lambda, A\lambda \rangle + \langle \lambda, \textrm{Ad}_{\Gamma'} \lambda \rangle = o(\mathscr{L}).\]

It would be natural to guess, and try to prove, that \(\langle f, Af\rangle = o(\mathscr{L})\) for all \(f:V\to \mathbb{C}\) with \(|f|_2=1\), i.e., that all eigenvalues of \(A\) are \(o(\mathscr{L})\).

We cannot hope for quite that much. The reason is simple. For any vertex \(n\), \(\langle A 1_n, A 1_n\rangle\) equals the sum of the squares of the weights of the edges \(\{n,n'\}\) containing \(n\). That sum equals \[\sum_{\substack{p\in \mathbf{P}\\ p|n}} \left(1-\frac{1}{p}\right)^2 + \sum_{\substack{p\in \mathbf{P}\\ p\nmid n}} \frac{1}{p^2},\] which in turn is greater than \(1/4\) times the number \(\omega_{\mathbf{P}}(n)\) of divisors of \(n\) in \(\mathbf{P}\). Thus, \(A\) has at least one eigenvalue greater than \(\sqrt{\omega_{\mathbf{P}}(n)}/2\). Now, typically, \(n\) has about \(\mathscr{L}\) divisors in \(\mathbf{P}\), but some integers \(n\) have many more; for some rare \(n\), in fact, \(\omega_{\mathbf{P}}(n)\) will be greater than \(\mathscr{L}^2\), and so there have to be eigenvalues of \(A\) greater than \(\mathscr{L}\).
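One can verify the lower bound \(\langle A 1_n, A 1_n\rangle \geq \omega_{\mathbf{P}}(n)/4\) numerically for vertices \(n\) away from the endpoints of \(\mathbf{N}\) (so that no edges are cut off); the range and prime set below are illustrative, not from the paper:

```python
def omega_P(nv, P):
    return sum(1 for p in P if nv % p == 0)

N = 5000
P = [7, 11, 13, 17, 19, 23]

ok = True
max_omega = 0
for nv in range(6000, 9001):   # interior vertices of (N, 2N], away from the ends
    # Squared weights of the edges of A = Ad_Gamma - Ad_Gamma' at nv:
    # weight 1 - 1/p on {nv, nv + s*p} when p | nv, weight -1/p otherwise.
    col_sq = 0.0
    for p in P:
        for s in (1, -1):
            if N < nv + s * p <= 2 * N:
                col_sq += ((1 if nv % p == 0 else 0) - 1.0 / p) ** 2
    w = omega_P(nv, P)
    max_omega = max(max_omega, w)
    # Rayleigh quotient of 1_nv: ||A 1_nv||^2 / ||1_nv||^2 = col_sq >= omega/4
    if col_sq < w / 4.0:
        ok = False

assert ok and max_omega >= 3   # e.g. nv = 7007 = 7^2 * 11 * 13 has omega = 3
```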

It is thus clear that we will have to exclude some integers, i.e., we will define our vertex set to be some subset \(\mathscr{X}\subset \mathbf{N}\) with small complement. We will set ourselves the goal of proving that all of the eigenvalues of the operator \(A|_\mathscr{X}\) defined by \[(A|_\mathscr{X})(f) = (A(f|_\mathscr{X}))|_\mathscr{X}\] are \(o(\mathscr{L})\). (Here \(f|_\mathscr{X}\) is just the function taking the value \(f(n)\) for \(n\in \mathscr{X}\) and \(0\) for \(n\not\in \mathscr{X}\).) Then, for \(f=\lambda\), or for any other \(f\) with \(|f|_\infty\leq 1\), \[\langle f, A f\rangle = \langle f, (A|_\mathscr{X}) f\rangle + O\left(\sum_{n\in \mathbf{N}\setminus \mathscr{X}} (\omega_{\mathbf{P}}(n)+1)\right),\] where, if \(\mathbf{N}\setminus \mathscr{X}\) is small enough (as it will be), it will not be hard to show that the sum within \(O(\cdot)\) is quite small. We will then be done: obviously \(\langle f, (A|_\mathscr{X}) f\rangle\) is bounded by the largest eigenvalue of \(A|_\mathscr{X}\) times \(|f|_2\) (which is \(\leq |f|_\infty\leq 1\)), and so we will indeed have \(\langle f, A f\rangle = o(\mathscr{L})\).

We will in fact be able to prove something stronger: there is a subset \(\mathscr{X}\subset \mathbf{N}\) with small complement such that all eigenvalues of \(A|_\mathscr{X}\) are \[O(\sqrt{\mathscr{L}}).\] (This bound is optimal up to a constant factor.) This is our main theorem.

We hence obtain that \[\langle \lambda, A\lambda\rangle = O(\sqrt{\mathscr{L}}).\] (We also obtain \(\langle f,A f\rangle=O(\sqrt{\mathscr{L}})\) for any \(f\) with \(|f|_\infty\leq 1\), or for that matter for any \(f\) with \(|f|_4\leq e^{100 \mathscr{L}}\) and \(|f|_2\leq 1\).) It is thus that we get the bound \[ \frac{1}{\log x} \sum_{n\leq x} \frac{\lambda(n) \lambda(n+1)}{n} = O\left( \frac{1}{\sqrt{\log \log x}}\right) \] we stated at the beginning, and other consequences besides.


Now that we know what we want to prove, let us come up with a strategy.

There is a completely standard route towards bounds on eigenvalues of operators such as \(A\), relying on the fact that the trace is invariant under conjugation. Because of this invariance, the trace of a power \(A^{2 k}\) is the same whether \(A\) is written taking a full family of orthogonal eigenvectors as a basis, or just taking the characteristic functions \(1_n\) as our basis. Looking at matters the first way, we see that \[\textrm{Tr} A^{2 k} = \sum_{i=1}^N \lambda_i^{2 k},\] where \(\lambda_1,\lambda_2,\dotsc,\lambda_N\) are the eigenvalues corresponding to the basis made out of eigenvectors. Looking at matters the second way, we see that \(\textrm{Tr} A^{2 k} = N_{2 k}\), where \(N_{2 k}\) is the sum over all closed walks of length \(2 k\) of the products of the weights of the edges in each walk: \[N_{2 k} = \sum_{n\in \mathscr{X}} \sum_{\substack{p_1,\dotsc,p_{2 k}\in \mathbf{P}\\ \sigma_1,\dotsc,\sigma_{2 k}\in \{-1,1\}\\ \forall 1\leq i\leq 2 k: n+\sigma_1 p_1 + \dotsc + \sigma_i p_i\in \mathscr{X} }} \prod_{i=1}^{2 k} \left(1_{p_i|n+\sigma_1 p_1+\dotsc+\sigma_{i-1} p_{i-1}} - \frac{1}{p_i}\right)\] where we adopt the convention \(1_\text{true}=1\), \(1_\text{false}=0\).
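The equality of \(\textrm{Tr}\, A^{2k}\) with the weighted closed-walk sum can be checked directly on a toy symmetric matrix; the matrix below is arbitrary, with entries of both signs to mimic \(1_{p_i|\cdot} - 1/p_i\):

```python
import itertools

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    n = len(A)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, A)
    return R

# An arbitrary small symmetric matrix with weights of both signs.
M = [[0.0,   0.5, -0.25, 0.0],
     [0.5,   0.0,  1.0, -0.5],
     [-0.25, 1.0,  0.0,  0.75],
     [0.0,  -0.5,  0.75, 0.0]]
n, two_k = 4, 6

tr = sum(matpow(M, two_k)[i][i] for i in range(n))

# Sum over all closed walks v_0 -> v_1 -> ... -> v_{2k-1} -> v_0
# of the product of the edge weights along the walk.
closed = 0.0
for walk in itertools.product(range(n), repeat=two_k):
    wgt = 1.0
    for i in range(two_k):
        wgt *= M[walk[i]][walk[(i + 1) % two_k]]
    closed += wgt

assert abs(tr - closed) < 1e-9
```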

Since all eigenvalues are real, it is clear that \[\lambda_i^{2 k} \leq N_{2 k}\] for every eigenvalue \(\lambda_i\). Often, and also now, that inequality is not enough in itself for a good bound on \(\lambda_i\). What is then often done is to show that every eigenvalue must have multiplicity \(\geq M\), where \(M\) is some large quantity. Then it follows that, for every eigenvalue \(\gamma\), \[M \gamma^{2 k} \leq N_{2 k},\] and so \(|\gamma|\leq (N_{2 k}/M)^{1/(2 k)}\).

We do not quite have high multiplicity here (why would we?) but we have something that is almost as good: if there is one large eigenvalue, then there are many mutually orthogonal functions \(g_i\) of norm \(1\) with \(\langle g_i, A g_i\rangle\) large. Then we can bound \(\textrm{Tr} A^{2 k}\) from below, using these functions \(g_i\) (and some arbitrary functions orthogonal to them) as our basis, and, since \(\textrm{Tr} A^{2 k}\) also equals \(N_{2 k}\), we can hope to obtain a contradiction with an upper bound on \(N_{2 k}\).

For simplicity, let us start by sketching a proof that, if \(|\langle f, A f\rangle|\) is large (\(\geq \rho \mathscr{L}\), say) for some \(f\) with \(|f|_\infty\leq 1\), then there are many mutually orthogonal functions \(g_i\) of norm \(1\) with \(\langle g_i, A g_i\rangle\) large (meaning \(\geq \rho \mathscr{L}/2\), say). This weaker statement suffices for our original goal, since we may set \(f\) equal to the Liouville function \(\lambda\).

Let \(I_1,I_2,\dotsc \subset \mathbb{N}\) be disjoint intervals of length \(\geq 10 H/\rho\) (say) covering \(\mathbb{N}\). Edges starting at a vertex \(v\) in \(I_i\) end at another vertex in \(I_i\), unless \(v\) lies close to one of the endpoints of \(I_i\). Hence, \(\sum_i |\langle f|_{I_i},A \left(f|_{I_i}\right)\rangle|\) is not much smaller than \(|\langle f,A f\rangle|\), and then it follows easily that \(\langle f|_{I_i}, A \left(f|_{I_i}\right)\rangle/|f|_{I_i}|_2^2\) must be large for many \(i\). Thus, setting \(g_i = f|_{I_i}/|f|_{I_i}|_2\) for these \(i\), we obtain the desired statement.

To prove truly that \(A\) has no large eigenvalues, we should proceed as we just did, but assuming only that \(|f|_2\leq 1\), not that \(|f|_\infty \leq 1\). The basic idea is the same, except that (a) pigeonholing is a little more delicate, (b) if \(f\) is almost entirely concentrated in a small subset of \(\mathbf{N}\), then we can extract only a few mutually orthogonal functions \(g_i\) from it. Recall that we are anyhow restricting to a set \(\mathscr{X}\subset \mathbf{N}\). A brief argument suffices to show that we can avoid the problem posed by (b) simply by making \(\mathscr{X}\) a little smaller (essentially: deleting the support of such \(g_i\), and then running through the entire procedure again), while keeping its complement \(\mathbf{N}\setminus \mathscr{X}\) very small.

In any event: we obtain that, if, for some \(X\subset \mathbf{N}\), \(\textrm{Tr} (A|_X)^{2 k}\) is not too large (smaller than \((\rho \mathscr{L}/2)^{2 k} N/H\) or so) then there is a subset \(\mathscr{X}\subset X\) with \(X\setminus \mathscr{X}\) small such that every eigenvalue of \(A|_\mathscr{X}\) is small (\(\leq \rho \mathscr{L}\)). It thus remains to prove that \(\textrm{Tr} (A|_X)^{2 k}\) is small for some \(X\subset \mathbf{N}\) with small complement \(\mathbf{N}\setminus X\).

Recall that \(\textrm{Tr} (A|_X)^{2 k} = N_{2 k}\) (with \(N_{2 k}\) defined as above, except with \(X\) instead of \(\mathscr{X}\)) and that \(X\) should not include integers \(n\) with many more prime divisors in \(\mathbf{P}\) than average. Our task is to bound \(N_{2 k}\).


We have come full circle, or rather we have arrived twice at the same place. We started with a somewhat ingenuous approach that led us to random walks. Then we took a step back and analysed the situation in a way that turned out to be cleaner; for instance, the problem involving \(\textrm{err}_2^{1/k}\) vanished. As it happens, that cleaner approach took us to random walks again. Surely this is a good sign.

It is also encouraging to see signs that other people have thought in the same direction. The paper by Matomäki-Radziwiłł-Tao on sign patterns of \(\lambda\) and \(\mu\) is based on the examination of a graph equivalent to \(\Gamma\); what they show is, in essence, that \(\Gamma\) is almost everywhere locally connected. Being connected may be a much weaker property than expansion, but it is a step in the same direction. As for expansion itself, Tao (§4) comments that “some sort of expander graph property” may hold for that graph (equivalent to \(\Gamma\)) “or [for] some closely related graph”. He goes on to say:

“Unfortunately we were unable to establish such an expansion property, as the edges in the graph […] do not seem to be either random enough or structured enough for standard methods of establishing expansion to work.”

And so we will set about establishing expansion by our methods (standard or not).

In any event, our initial discussion of random walks is still pertinent. Recall the plan with which we concluded, namely, to divide walks into three kinds: walks with many non-repeated primes, walks imposing many independent divisibility conditions, and rare walks. This plan will shape our approach to bounding \(N_{2 k}\) in the next post.