Chapter 5

Section 5.1, "Sums over square-free numbers"

All computations are embedded in the TeX file, except for one, which has
been separated into its own file so as not to slow down compilation.

Within the section 4.1 directory, type 

sage checkQ_2.sage

if you have sage installed. This is a minor computation that could easily be
coded in C, just like everything else, rather than Python. (Consider it a small
demonstration of how computing times and readability depend on the language.)

Section 5.2, "Sums over primes"

The (free) package primesieve (http://primesieve.org) needs to be installed.
The version used was 6.3. The following programs link to it. (Compilation
instructions are given at the beginning of each file.) As always, we also
need the file "int_double14.0.h" to be in the directory.

-----------------------------
./estpsi LB N M flag

computes the minimum and the maximum of (psi(x)-x)/x for all LB<=x<=N,
using a sieve on segments of length M.

(Here psi(x) = \sum_{n\leq x} \Lambda(n),
where Lambda is the von Mangoldt function.)

If flag is present and not 0,
  estpsi outputs the current state of the computation to stderr
                  every M steps.

Default values: LB=100000, N=1000000, M=10000, flag=0.
If N and M are present but LB and flag are omitted,
the program sets LB to the integer part of 2*sqrt(N),
and also sets flag to 0.

Examples used in the text:

A.
./estpsi 
bounds (psi(x)-x)/x in the range 10^5<=n<=10^6.

B.
./estpsi 10000000000 10686500000000
bounds (psi(x)-x)/x in the range 10^10<=n<=e^30.

Alternatively, we may type
./estpsi 10000000000 10686500000000 10000000 1
to do the same;
having output on stderr can be helpful,
since this computation runs overnight/over the weekend.

The final output on stdout should be:

psi(10686500000000)-10686500000000 lies in [-532644.49144683 -532622.30627288]
min of (psi(x)-x)/x for 10000000000<=x<=10686500000000 is at least -6.7684520269899e-06
max of (psi(x)-x)/x for 10000000000<=x<=10686500000000 is at most  6.382912813058e-06

----------------

./estpsin LB N M flag

computes the minimum and the maximum of S(x)/(x^2/2) for all LB<=x<=N,
 where S(x) = \sum_{n\leq x} \Lambda(n) n
using a sieve on segments of length M.

If flag is present and not 0,
  estpsi outputs the current state of the computation to stderr every M steps.

Default values: LB=100000, N=3000000, M=10000, flag=0.
If N and M are present but LB and flag are omitted,
the program sets LB to the integer part of 2*sqrt(N),
and also sets flag to 0.

Examples used in the text:

A.
./estpsin

Output:
Let S(x) be the sum of Lambda(n) n from 1 to x.
Then S(3000000) lies in [4500023737623.7 4500023737671.1]
min of S(x)/(x^2/2) for 100000<=x<=3000000 is at least 0.9970494965257
max of S(x)/(x^2/2) for 100000<=x<=3000000 is at most  1.0030143812146

B.
 ./estpsin 1 3000000 10000
Output:
Let S(x) be the sum of Lambda(n) n from 1 to x.
Then S(3000000) lies in [4500023737623.7 4500023737671.1]
min of S(x)/(x^2/2) for 1<=x<=3000000 is at least 0
max of S(x)/(x^2/2) for 1<=x<=3000000 is at most  1.2401527609228


---------------

./esttheta LB N M flag

computes the minimum and the maximum of theta(x)/x for all LB<=x<=N,
using a sieve on segments of length M.

(Here theta(x) = \sum_{p\leq x} \log p.)

If flag is present and not 0,
  esttheta outputs the current state of the computation to stderr
                  every M steps.

Default values: LB=100000, N=1000000, M=10000, flag=0.
If N and M are present but LB and flag are omitted,
the program sets LB to the integer part of 2*sqrt(N),
and also sets flag to 0.

Examples used in the text:

./esttheta 100000 1160000
bounds theta(x)/x in the range 10^5<=n<=1160000

----------------

./esttheta2 LB N M flag

computes the minimum and the maximum of (theta(x)-x)/sqrt(x) for all LB<=x<=N,
using a sieve on segments of length M.

The treatment of parameters LB, N, M, flag is as above.

Examples used in the text:

./esttheta2 100000 10000000000
bounds (theta(x)-x)/sqrt(x) in the range 10^5<=n<=10^10

./esttheta2 1 10000000000
bounds (theta(x)-x)/sqrt(x) in the range 1<=n<=10^10

Section 5.3, "Sums of mu"

./checkm N v n0 n1 n2 M flag

computes m_v(x), \check{m}_v(x) and \check{\check{m}}_v(x) for x\leq N,
and computes various bounds on them
for n0<=x<=N, n1<=x<=N and n2<=x<=N (respectively)
Default values: n0 = n1 = n2 = 2, n0 = n1 = n2 = 3

uses a sieve on segments of length M (default value: floor of 2 sqrt(N))

if flag=1 (default value: flag=0), give more output (useful for debugging)

For n<1000000, intervals [n,n+v] are subdivided for greater precision
For n>=1000000, \log(1+v/n) is implemented as a quartic polynomial on v/n

Examples used in the text:
A.
./checkm 1000000 1

Output:
Let T_0(x) = m_1(x), 
    T_1(x) = \check{m}_1(x) - 1,
    T_2(x) = \check{\check{m}}_1(x) - 2\log x + 2 \gamma.
For x\leq 1000000
	|T_1(x)|\leq 1/x + 0.01998230204/sqrt(x)
	|T_2(x)|\leq 1.15443/x + 0.0007491798977/sqrt(x)
For 2\leq x\leq 1000000:	 |T_0(x)|\leq 0.8660254038/sqrt(x) 
For 2\leq x\leq 1000000:	 |T_1(x)|\leq 0.4339556359/sqrt(x) 
For 2\leq x\leq 1000000:	 |T_2(x)|\leq 0.3515595006/sqrt(x) 

B.
./checkm 1000000 2

Output:
Let T_0(x) = m_2(x),
    T_1(x) = \check{m}_2(x) - 2,
    T_2(x) = \check{\check{m}}_2(x) - 4\log(x/2) + 4 \gamma.
For x\leq 1000000
	|T_1(x)|\leq 2.9/x + 0.0038631572/sqrt(x)
	|T_2(x)|\leq 5.77/x + 0.001812332853/sqrt(x)
For 3\leq x\leq 1000000:	 |T_0(x)|\leq 1.490711985/sqrt(x) 
For 3\leq x\leq 1000000:	 |T_1(x)|\leq 1.561250875/sqrt(x) 
For 3\leq x\leq 1000000:	 |T_2(x)|\leq 3.280422965/sqrt(x) 

C.
./checkm 1000000 2 3 2001 3

Output:
Let T_0(x) = m_2(x),
    T_1(x) = \check{m}_2(x) - 2,
    T_2(x) = \check{\check{m}}_2(x) - 4\log(x/2) + 4 \gamma.
For x\leq 1000000
	|T_1(x)|\leq 2.9/x + 0.0038631572/sqrt(x)
	|T_2(x)|\leq 5.77/x + 0.001812332853/sqrt(x)
For 3\leq x\leq 1000000:	 |T_0(x)|\leq 1.490711985/sqrt(x) 
For 2001\leq x\leq 1000000:	 |T_1(x)|\leq 0.06819892262/sqrt(x) 
For 3\leq x\leq 1000000:	 |T_2(x)|\leq 3.280422965/sqrt(x) 


The examples given below for checkmopt will also run, but it is advisable
to give them to checkmopt instead to save on running time.

For instance:

./checkmopt 1000000000000 1 3 11 2
Output:
Let m_1(x) = \check{m}(x) - 1,
    m_2(x) = \check{\check{m}}(x) - 2\log x + 2 \gamma.
For x\leq 1000000000000
	|m_1(x)|\leq 1/x + 0.01998230204/sqrt(x)
	|m_2(x)|\leq 1.15443/x + 0.002323466569/sqrt(x)
For 3\leq x\leq 1000000000000:	 |m_0(x)|\leq 0.5694485495/sqrt(x) 
For 11\leq x\leq 1000000000000:	 |m_1(x)|\leq 0.02341873602/sqrt(x) 
For 2\leq x\leq 1000000000000:	 |m_2(x)|\leq 0.3515595006/sqrt(x) 

./checkm 1000000000000 2 1423 19341 77581
Output:
Let T_0(x) = m_2(x), 
    T_1(x) = \check{m}_2(x) - 2,
    T_2(x) = \check{\check{m}}_2(x) - 4\log(x/2) + 4 \gamma.
For x\leq 1000000000000
	|T_1(x)|\leq 2.9/x + 0.01346398873/sqrt(x)
	|T_2(x)|\leq 5.77/x + 0.001812332853/sqrt(x)
For 1423\leq x\leq 1000000000000:	 |T_0(x)|\leq 0.3900551317/sqrt(x) 
For 19341\leq x\leq 1000000000000:	 |T_1(x)|\leq 0.02535769352/sqrt(x)
For 77581\leq x\leq 1000000000000:	 |T_2(x)|\leq 0.01997968023/sqrt(x) 

------------


./checkmopt N v n0 n1 n2 M flag

does exactly the same as checkm (see above), except that the implementation
of the sieve has been optimized for large N (>=10^9, say). The ideas are
explained in Chapter 4.3 (and in the code).

Some more details on the implementation:
\mu is stored as two bitarrays,
one specifying whether $|\mu(n)|=1$, i.e., whether $n$ is squarefree,
and the other one specifyinh whether $\mu(n) = 1$ or $\mu(n)=-1$.
The ceiling of log_{256} m is an integer between 0 and 15 (in fact,
        between 0 and 7) and can thus be stored in 4 bits.

Possible further optimizations left to the impatient reader:
* Code log(1+v/n) as a quadratic polynomial on v/n for n large
* Approximate log(1+v/n) as a linear polynomial on short segments for n large;
  thus log will be replaced by addition most of the time
* Use parallel programming (see musum, below)

Examples used in the text:
(Warning - either takes a couple of days)

A.
./checkmopt 1000000000000 1 3 11 2

Let T_0(x) = m_1(x),
Let T_1(x) = \check{m}_1(x) - 1,
    T_2(x) = \check{\check{m}}_1(x) - 2\log x + 2 \gamma.
For x\leq 1000000000000
        |T_1(x)|\leq 1/x + 0.01998230204/sqrt(x)
        |T_2(x)|\leq 2 gamma/x + 0.002323466569/sqrt(x)
For 3\leq x\leq 1000000000000:   |T_0(x)|\leq 0.5694485495/sqrt(x) 
For 11\leq x\leq 1000000000000:  |T_1(x)|\leq 0.02341873602/sqrt(x) 
For 2\leq x\leq 1000000000000:   |T_2(x)|\leq 0.3515595006/sqrt(x) 

...

B.

./checkmopt 1000000000000 2 1423 19341 77581

Let T_0(x) = m_2(x),
    T_1(x) = \check{m}_2(x) - 2,
    T_2(x) = \check{\check{m}}_2(x) - 4\log(x/2) + 4 \gamma.
For x\leq 1000000000000
	|T_1(x)|\leq 2.9/x + 0.01346398873/sqrt(x)
	|T_2(x)|\leq 5.77/x + 0.001812332853/sqrt(x)
For 1423\leq x\leq 1000000000000:	 |T_0(x)|\leq 0.3900551317/sqrt(x) 
For 19341\leq x\leq 1000000000000:	 |T_1(x)|\leq 0.02535769352/sqrt(x) 
For 77581\leq x\leq 1000000000000:	 |T_2(x)|\leq 0.01997968023/sqrt(x) 

------------

musum N M

finds the maximum of |\sum_{n\leq x} \mu(n)/n| \sqrt{x}
for all 3<=x<=N

It also computes M(x) = \sum_{n\leq x} \mu(n)  (Mertens' function)
for x = N and some smaller values of x (useful for double-checking)

M is optional; its default value is the floor of 4 \sqrt{x}

Example used in text:

./musum 100000000000000

Output (on a 12-core machine):

There are 24 threads
40180140 180180 30030
Mertens at 1000000000: -222
Mertens at 2000000000: 6556
Mertens at 3000000000: 7452
Mertens at 4000000000: -11741
Mertens at 5000000000: 4625
Mertens at 6000000000: -8485
Mertens at 7000000000: -6355
Mertens at 8000000000: 20001
Mertens at 9000000000: -1892
Mertens at 10000000000: -33722
...
Mertens at 99995000000000: -822297
Mertens at 99996000000000: -831739
Mertens at 99997000000000: -845428
Mertens at 99998000000000: -867455
Mertens at 99999000000000: -857397
Mertens at 100000000000000: -875575
Time spent: 2.17613e+06 seconds
M(100000000000000) lies in [-8.02621015245214e-09 -8.01187575050187e-09]
Final value of Mertens: 100000000000000 -875575
For 3<=x=1000000000000, |m(x)| sqrt(x) is at most
    0.569448504591237

--------------------------------------------
Since that example takes a couple of weeks, it may be interesting to run a small
example first:
         
./musum 1000000000000

The output then is:

There are 4 threads
Mertens at 1000000000: -222
Mertens at 2000000000: 6556
Mertens at 3000000000: 7452
Mertens at 4000000000: -11741
Mertens at 5000000000: 4625
Mertens at 6000000000: -8485
...
Mertens at 994000000000: 115945
Mertens at 995000000000: 62830
Mertens at 996000000000: 81978
Mertens at 997000000000: 91715
Mertens at 998000000000: 97398
Mertens at 999000000000: 53052
Mertens at 1000000000000: 62366
Time spent: 27866.7 seconds
m(1000000000000) lies in [5.42755378390405e-08 5.42767294251743e-08]
For 3<=x=1000000000000, |m(x)| sqrt(x) is at most
    0.569448472609122

Or also:
./musum 10000000000000

Output:

There are 4 threads
Mertens at 1000000000: -222
Mertens at 2000000000: 6556
Mertens at 3000000000: 7452
Mertens at 4000000000: -11741
Mertens at 5000000000: 4625
Mertens at 6000000000: -8485
Mertens at 7000000000: -6355
Mertens at 8000000000: 20001
Mertens at 9000000000: -1892
Mertens at 10000000000: -33722
...
Mertens at 9995000000000: 562936
Mertens at 9996000000000: 561048
Mertens at 9997000000000: 584933
Mertens at 9998000000000: 574029
Mertens at 9999000000000: 562252
Mertens at 10000000000000: 599582
Time spent: 252439 seconds
m(10000000000000) lies in [5.91807491757646e-08 5.91837502953072e-08]
For 3<=x=10000000000000, |m(x)| sqrt(x) is at most
    0.569448484907215
