THE RANK OF THE JACOBIAN OF MODULAR CURVES: ANALYTIC METHODS BY EMMANUEL KOWALSKI


THE RANK OF THE JACOBIAN OF MODULAR CURVES: ANALYTIC METHODS

BY EMMANUEL KOWALSKI


Preface

The interaction of analytic and algebraic methods in number theory is as old as Euler, and assumes many guises. Of course, the basic algebraic structures are ever present in any modern mathematical theory, and analytic number theory is no exception, but to algebraic geometry in particular it is indebted for the tremendous advances in understanding of exponential sums over finite fields, since André Weil’s proof of the Riemann hypothesis for curves and subsequent deduction of the optimal bound for Kloosterman sums to prime moduli.

On the other hand, algebraic number theory has often used input from L-functions; not only as a source of results, although few deep theorems in this area are proved without some appeal to Tchebotarev’s density theorem, but also as a source of inspiration, ideas and problems.

One particular subject in arithmetic algebraic geometry which is now expected to benefit from analytic methods is the study of the rank of the Mordell-Weil group of an elliptic curve, or more generally of an abelian variety, over a number field. The beautiful conjecture of Birch and Swinnerton-Dyer asserts that this deep arithmetic invariant can be recovered from the order of vanishing of the L-function of the abelian variety at the center of the critical strip.

This conjecture naturally opens two lines of investigation: to try to prove it, but here one is, in general, hampered by the necessary prerequisite of proving analytic continuation of the L-function up to this critical point, before any attempt can be made; or to take it for granted and use the information and insight it gives into the nature of the rank as a means of exploring further its behavior. This is justified by the trust put into the truth of the conjecture.

Indeed, the first approach has been quite successful: in some cases, most notably large classes of elliptic curves over Q, analytic continuation is known, and partial results towards the conjecture have been obtained when the rank is 0 or 1.

On the other hand, even when this is so, the second approach has had the aesthetic disadvantage that most studies of the rank, whether based on the assumption of the full statement of the conjecture or on known cases of it, have also assumed other analytic facts about the L-function, most notably that it satisfies the Generalized Riemann Hypothesis. This is somewhat unsatisfactory, inasmuch as this appears to be a much harder problem than even the Birch and Swinnerton-Dyer conjecture, although zeros of the L-function are of course very relevant to the problem.

The contribution of this thesis is to show that analytic methods and techniques can indeed provide sharp, unconditional answers to some of the questions thus raised. This demonstrates that the implicit promise of the conjecture of Birch and Swinnerton-Dyer, of furnishing an effective way of answering questions about the rank through its analytic interpretation, can be kept without additional assumptions.

The main results have been obtained in collaboration with Philippe Michel, and some auxiliary propositions had been proved earlier in the course of other work with William D. Duke.

This volume is organized in six chapters. The first contains an introduction to the theory of abelian varieties and the Birch and Swinnerton-Dyer conjecture which is the motivating problem, and ends with the precise statements of the two principal theorems. The second chapter takes up the analytic side of the story. It recalls the results of Eichler-Shimura and Gross-Zagier which make the link between the algebraic geometry and modular forms, and provides an informal, but quite detailed, sketch of the proofs of the theorems. The extent of this first part, which is not original, stems from the fact that whereas the motivating problem lies in arithmetic algebraic geometry, an almost complete translation to a problem of analytic number theory is made, and this problem has intrinsic interest. Perchance, readers of both backgrounds will want to look at this document, and a goal of the text is to give to all an understanding of the other side of the story.

The preliminaries over, at last, the process of proving is engaged with a stiff upper lip. The third chapter contains a result about the “almost-orthogonality” of the symmetric squares of modular forms which is crucial later for both results, and the fourth deals with another aspect of this kind of orthogonality principle. Then the last two chapters take each theorem in turn. There are a number of similarities in the principles and in some of the steps of both proofs, but since they seem to hold the same virtues of attraction and worth, the ordering is rather arbitrary and, to accommodate the random-minded reader whose interest would lie in only one of the two, a certain amount of redundancy has been introduced, or cross-references sometimes inserted.

A conclusion comes, not surprisingly, to conclude all this with some reflections about the meaning of the results and possible developments.


Table of Contents

Preface
Notations

1. Context and statements
   1.1. Abelian varieties
   1.2. The Jacobian of an algebraic curve
   1.3. The modular curves and their Jacobians

2. The analytic side
   2.1. Reducing to modular forms
        2.1.1. Hecke theory, primitive forms
        2.1.2. Eichler-Shimura theory and corollaries
        2.1.3. The Gross-Zagier formula and consequences
   2.2. Sketch: the upper bound
   2.3. Sketch: the lower bound

3. Mean-value and symmetric square
   3.1. The symmetric square of modular forms
   3.2. The mean-value estimate
   3.3. Proof of the mean-value estimate
   3.4. Notational matters
   3.5. Removing the harmonic weight: the tail
        3.5.1. Sketch of the idea
        3.5.2. The tail of the series

4. The Delta symbol
   4.1. The Delta symbol for primitive forms
   4.2. The Delta symbol for odd primitive forms
   4.3. The Delta-symbol without weight
   Appendix: Multiplicativity

5. The upper bound
   5.1. The explicit formula: reduction to a density theorem
   5.2. The density theorem
   5.3. The harmonic second moment
        5.3.1. The square of the L-function
        5.3.2. Computation of the harmonic second moment
        5.3.3. Diagonalization
        5.3.4. Estimation of the harmonic second moment
   5.4. Removing the harmonic weight: the head, I

6. The lower bound
   6.1. Non-vanishing in harmonic average
        6.1.1. Preliminary: a refined statement
        6.1.2. Computation of the first moment
        6.1.3. Computation of the second moment
        6.1.4. The preferred quadratic form, I
        6.1.5. Harmonic non-vanishing
   6.2. Removing the harmonic weight: the head, II
        6.2.1. Computation of the first moment
        6.2.2. Computation of the second moment
        6.2.3. Mutations of the second moment
        6.2.4. The preferred quadratic form, II
        6.2.5. Optimization of the preferred form
        6.2.6. The second part of the main term
        6.2.7. The residual quadratic forms
        6.2.8. Conclusion
   Appendix: Extending the mollifier

Conclusion
References


Notations

We introduce here some basic notations.

• For any set X, |X| denotes its cardinality, a natural number if X is finite, and +∞ if it is infinite.

• If A is an algebraic variety defined over a field k and K is an extension of k, A(K) is the set of points of A with coordinates in K, also called K-rational points.

• $F_q$ denotes a field with q elements ($q = p^n$ is a power of a prime number p), with $F_p = Z/pZ$ for a prime p.

• If k/Q is a number field, $O_k$ will denote its ring of integers. The norm of an ideal $a \subset O_k$ is written Na.

• In any group G, the subgroup generated by a subset H ⊂ G is denoted by <H>; if H = {h} is a singleton, we also write simply <h>.

• For any ring A, $A^\times$ is the group of units of A.

• For any two objects X and Y of any category, $X \simeq Y$ means that X and Y are isomorphic.

• The notation $\chi_D$ designates the Kronecker symbol of a quadratic field with discriminant D.

• If σ is an automorphism of a field K, the action of σ on x ∈ K is denoted by σ(x) as well as by $x^\sigma$.

The following pertain to analytic number theory.

• The notation n ∼ N, for real numbers n and N, means $N < n \le 2N$.

• For any integer $q \ge 1$, $\varepsilon_q$ is the trivial Dirichlet character modulo q, and $\zeta_q(s) = L(s, \varepsilon_q)$ is the corresponding L-function, the Riemann zeta function with the Euler factors for p | q removed.

• We denote by τ, ϕ, µ, Λ the classical arithmetic functions, namely (respectively) the divisor function, the Euler function, the Möbius function and the von Mangoldt function, so

\[ \sum_{n \ge 1} \tau(n) n^{-s} = \zeta(s)^2, \qquad \sum_{n \ge 1} \varphi(n) n^{-s} = \frac{\zeta(s-1)}{\zeta(s)}, \]

\[ \sum_{n \ge 1} \mu(n) n^{-s} = \zeta(s)^{-1}, \qquad \sum_{n \ge 1} \Lambda(n) n^{-s} = -\frac{\zeta'(s)}{\zeta(s)}. \]


• We also introduce the multiplicative functions N and M defined by

\[ N(r) = \prod_{p \mid r} p, \qquad M(r) = \prod_{p \,\|\, r} p \]

(so N is the squarefree kernel).

• A summation over a family of Dirichlet characters of the form

\[ \sideset{}{^*}\sum_{\chi} \alpha_\chi \]

means that χ runs only through the primitive characters in the family. Similarly, a summation of the form

\[ \sideset{}{^*}\sum_{n \bmod r} \alpha_n \]

means that n runs only through invertible classes modulo r, and when a fixed modulus q is clear from the context

\[ \sideset{}{^*}\sum_{n \le N} \alpha_n \]

means that the summation is restricted to integers coprime with q.

• For z ∈ C, $e(z) = e^{2\pi i z}$.

• For integers $m \ge 0$, $n \ge 0$ and $c \ge 1$, the Kloosterman sum S(m, n; c) is

\[ S(m, n; c) = \sideset{}{^*}\sum_{x \bmod c} e\!\left( \frac{m x + n \bar{x}}{c} \right) \]

where $\bar{x}$ is the inverse of x modulo c. For n = 0, this is the Ramanujan sum, which has another expression as

\[ S(m, 0; c) = \sum_{d \mid (m, c)} d\, \mu\!\left( \frac{c}{d} \right). \]

In particular, for (m, c) = 1, we have S(m, 0; c) = µ(c).

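• For any s ∈ R, $J_s$, $K_s$ and $Y_s$ are the Bessel functions usually denoted this way, except when (as in [G-R], where $N_s$ replaces $Y_s$) some other notation is used. Among many formulae and representations, we recall:

\[ J_n(x) = \left( \frac{x}{2} \right)^{n} \sum_{k \ge 0} \frac{(-1)^k}{k!\,(k+n)!} \left( \frac{x}{2} \right)^{2k} \quad \text{for } n \in \mathbf{Z}, \ \text{[G-R, 8.440]} \tag{1} \]

\[ \phantom{J_n(x)} = \frac{1}{\pi} \int_0^{\pi} \cos(n\theta - x \sin\theta)\, d\theta \quad \text{for } n \ge 0, \ \text{[G-R, 8.411.1]} \tag{2} \]

\[ Y_0(x) = -\frac{2}{\pi} \int_1^{+\infty} \cos\!\left( \frac{x}{2} (u + u^{-1}) \right) \frac{du}{u} \quad \text{[G-R, 3.714.2]} \tag{3} \]

\[ K_0(x) = \int_1^{+\infty} \exp\!\left( -\frac{x}{2} (u + u^{-1}) \right) \frac{du}{u} \quad \text{[G-R, 8.432.1]} \tag{4} \]

$J_1$, $K_0$ and $Y_0$ are the Bessel functions that will actually occur in the text, for real arguments x > 0. Notice how the integral representations of $K_0$ and $Y_0$ reveal a similarity, not coincidental, with Kloosterman sums.

[Figure: plots of the Bessel functions $J_1$ (upper left), $K_0$ (upper right) and $Y_0$.]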

• The notation

\[ \frac{1}{2i\pi} \int_{(\sigma)} f(s)\, ds \]

designates the integral over the line Re(s) = σ in the complex plane. When s is used as a complex variable, the notation σ = Re(s), t = Im(s) is sometimes used without comments or reminder.

• We sometimes write $\log_2 x = \log\log x$.
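As a numerical sanity check on the Bessel function formulas (1) and (2) recalled above, the two expressions for $J_n(x)$ can be compared directly. The following is only a minimal sketch: the function names, the number of series terms and the number of quadrature steps are illustrative assumptions, not part of the text.

```python
import math


def bessel_j_series(n, x, terms=40):
    """Power series (1): (x/2)^n * sum_k (-1)^k / (k! (k+n)!) * (x/2)^(2k), for n >= 0."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.factorial(k + n)) * (x / 2) ** (2 * k + n)
        for k in range(terms)
    )


def bessel_j_integral(n, x, steps=2000):
    """Integral (2): (1/pi) * int_0^pi cos(n*t - x*sin t) dt, by the composite Simpson rule."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    h = math.pi / steps

    def f(t):
        return math.cos(n * t - x * math.sin(t))

    total = f(0.0) + f(math.pi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3 / math.pi


if __name__ == "__main__":
    for n in range(3):
        print(n, bessel_j_series(n, 2.5), bessel_j_integral(n, 2.5))
```

The truncation at 40 terms is ample for moderate x, since the series converges factorially fast; for large x the alternating terms lose precision and an asymptotic expansion would be preferable.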

Finally, here is the standing convention concerning the use of Vinogradov’s symbol ≪ and Landau’s O: the implied constant in both cases is meant to be absolute; this is usually repeated when it appears in the formal statements of theorems, lemmas, propositions, etc. In case there are other parameters involved, say ε, ∆, we (usually) indicate the dependency of the constants by the subscript notations $\ll_{\varepsilon,\Delta}$ or $O_{\varepsilon,\Delta}(\ )$. Sometimes it would be cumbersome to write down explicitly all dependencies, and they are then meant to be understandable according to the context. The following rules are strictly followed when a parameter appears in the right-hand side of an inequality without being present on the left-hand side, and without further specification in the text: $\varepsilon$ and $\epsilon$ denote positive real numbers, with the meaning that for all $\varepsilon > 0$, or all $\epsilon > 0$, the inequality holds (with an implied constant depending on $\varepsilon$, $\epsilon$), while uppercase letters A, B, ... (resp. δ) denote a large enough (resp. small enough) positive constant which is signaled to exist such that the inequality holds.

The reader with a more algebraic turn of mind is encouraged to show good will towards analytic number theorists and interpret such inequalities in the most reasonable way (provided it is correct and proves the result which is sought...).
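The definitions of the Kloosterman and Ramanujan sums in the notation list lend themselves to a direct numerical check. The sketch below (function names are illustrative assumptions; `pow(x, -1, c)` requires Python 3.8 or later) evaluates S(m, n; c) from the definition and compares S(m, 0; c) with the divisor-sum expression.

```python
import cmath
import math


def e(z):
    """The additive character e(z) = exp(2*pi*i*z)."""
    return cmath.exp(2j * math.pi * z)


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def kloosterman(m, n, c):
    """S(m, n; c): sum over invertible x mod c of e((m*x + n*xbar)/c)."""
    total = 0
    for x in range(1, c + 1):
        if math.gcd(x, c) == 1:
            xbar = pow(x, -1, c) if c > 1 else 0  # inverse of x modulo c
            total += e(((m * x + n * xbar) % c) / c)
    return total


def ramanujan(m, c):
    """Divisor-sum expression: sum over d | (m, c) of d * mu(c/d)."""
    g = math.gcd(m, c)
    return sum(d * mobius(c // d) for d in range(1, g + 1) if g % d == 0)


if __name__ == "__main__":
    print(kloosterman(3, 0, 12), ramanujan(3, 12))
    # Weil's bound for a prime modulus p not dividing m*n: |S(m, n; p)| <= 2*sqrt(p).
    print(abs(kloosterman(1, 1, 101)), 2 * math.sqrt(101))
```

The last two lines illustrate the bound for Kloosterman sums to prime moduli mentioned in the Preface; the direct summation costs O(c) operations per sum, which is fine for experiments but not for large moduli.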


Chapter 1

Abelian varieties, Jacobian varieties, modular curves and two theorems

I think I knew it at once: I would leave on the Zeta, it would be my ship Argo, the one that would carry me across the sea to the place I had dreamed of, to Rodrigues, for my quest of a treasure without end.

J.M.G. Le Clézio, “Le chercheur d’or”

The motivating context of this work is the Birch and Swinnerton-Dyer conjecture for abelian varieties over number fields, and therefore belongs to arithmetic algebraic geometry, and similarly the final results, stated in that context, are more likely to interest the experts in this subject. The methods which will be used to prove them, however, are those of analytic number theory, and the proofs may therefore be of more interest to analytic number theorists. This introductory chapter aims to give a simple account of abelian varieties and to state the Birch and Swinnerton-Dyer conjecture in the case which is considered here, for readers with little knowledge of algebraic geometry. It ends with the statements of the main theorems of this thesis. The next chapter will then move to the side of analytic number theory.

1.1 Abelian varieties

There are probably few people interested in number theory nowadays who have not been exposed to some aspect of the theory of elliptic curves by expository articles or books ([Si1] being the standard reference). Abelian varieties are natural generalizations of elliptic curves. The main difference in trying to present their definition and properties is that, whereas elliptic curves can be fairly easily given by explicit polynomial equations, in characteristic different from 2 and 3 by the simple Weierstrass form

\[ E : Y^2 = X^3 + aX + b, \tag{1.1} \]

(where a and b are such that $\Delta(E) = -16(4a^3 + 27b^2)$ is not zero), there does not exist a similar way of writing down concretely all abelian varieties of a given dimension as the solution set of simple explicit polynomial equations. The definition is therefore necessarily more abstract. The starting point is the property (already used, in concrete special cases, by Diophantus) that the set of solutions of such a Weierstrass equation, with the addition of a point at infinity (in other words, the set of points on the elliptic curve E in the projective plane), is a commutative group, the group law being described by the beautiful geometric condition that three points P, Q and R on E add up to zero, P + Q + R = 0, if and only if P, Q and R are collinear. Abelian varieties are obtained by taking the group law as the focus of attention.
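The chord-and-tangent description of the group law can be made concrete. What follows is a minimal sketch over Q using exact rational arithmetic; the class name `WeierstrassCurve`, the use of `None` for the point at infinity, and the sample curve $Y^2 = X^3 - 36X$ are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


class WeierstrassCurve:
    """The curve Y^2 = X^3 + a*X + b over Q; None stands for the point at infinity."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)
        self.discriminant = -16 * (4 * self.a**3 + 27 * self.b**2)
        if self.discriminant == 0:
            raise ValueError("discriminant is zero: the curve is singular")

    def contains(self, P):
        if P is None:
            return True
        x, y = map(Fraction, P)
        return y * y == x**3 + self.a * x + self.b

    def negate(self, P):
        if P is None:
            return None
        x, y = map(Fraction, P)
        return (x, -y)

    def add(self, P, Q):
        if P is None:
            return Q
        if Q is None:
            return P
        x1, y1 = map(Fraction, P)
        x2, y2 = map(Fraction, Q)
        if x1 == x2 and y1 == -y2:
            return None  # vertical line: P + Q is the point at infinity
        if x1 == x2:
            slope = (3 * x1 * x1 + self.a) / (2 * y1)  # tangent line at P = Q
        else:
            slope = (y2 - y1) / (x2 - x1)  # chord through P and Q
        x3 = slope * slope - x1 - x2
        y3 = slope * (x1 - x3) - y1  # reflect the third intersection point
        return (x3, y3)


if __name__ == "__main__":
    E = WeierstrassCurve(-36, 0)
    P, Q = (-3, 9), (12, 36)
    S = E.add(P, Q)
    R = E.negate(S)  # third point of intersection of the line PQ with E
    print("P + Q =", S)
    print("P + Q + R is the identity:", E.add(S, R) is None)
```

Here R is the third intersection point of the line through P and Q with the curve, so P, Q and R are collinear and P + Q + R = 0, which is exactly the geometric condition stated above.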


Definition 1. Let k ⊂ C be a subfield of the complex numbers. An abelian variety A defined over k is a smooth, irreducible, projective variety¹ together with a fixed k-rational point 0 ∈ A(k) and an algebraic group law, also defined over k, namely algebraic morphisms

+ : A × A → A

i : A → A

(both defined over k) satisfying the usual axioms for a group, with i(a) the inverse (opposite) of a ∈ A and 0 as identity element.

Remark. The epithet “abelian” carries a strong suggestion of commutativity, but no such assumption is made in the definition. It is actually the case that the group law of an abelian variety is necessarily commutative (this is essentially a consequence of the projectivity; indeed, there is no lack of non-projective and non-commutative algebraic groups, such as GL(n), SL(n), ...), but the name itself is quite independent of this fact.

Concretely, recall that a projective algebraic variety A defined over k is a subset $A \subset P^N$ for some integer $N \ge 1$, for which there are finitely many homogeneous polynomials $(f_i)$ in $k[X_0, X_1, \ldots, X_N]$ such that A is the set of common zeros $x = (x_0, \ldots, x_N)$ of the $f_i$’s:

\[ x \in A \iff f_i(x) = 0 \ \text{for all } i; \]

and A is furthermore smooth if the Jacobian matrix² $(\partial f_i / \partial X_j)_{i,j}$ is of maximal rank at every point x ∈ A. To be an abelian variety, A must have the additional group structure described in the definition. We only recall that an algebraic map is one given by rational functions, at least “locally” (in the Zariski topology; this means that a given expression for the group law will only hold when some polynomials do not vanish; if one does, then some other formula will be valid, and both will coincide when none does!). The condition of existence of the identity element 0 ∈ A(k) is not completely innocuous when k is not algebraically closed, say when k = Q. The curve (considered by Selmer)

\[ S : 3X^3 + 4Y^3 + 5Z^3 = 0 \]

in $P^2$ satisfies all conditions to be an elliptic curve, except that there is no non-zero solution $(X, Y, Z) \in Q^3$.

With the geometric definition of the group law given above, the opposite of a point P = (x, y) being −P = (x, −y), and the identity element 0 being the point at infinity (which is simply [0 : 1 : 0] in homogeneous coordinates in $P^2$, and therefore is defined over Q, hence a fortiori over any field k ⊂ C), the curve E given by the Weierstrass equation (1.1) made homogeneous, namely

\[ Y^2 Z = X^3 + aXZ^2 + bZ^3, \]

is an elliptic curve, provided $\Delta \ne 0$. It is defined over k if the coefficients a and b are in k (see [Si1, III]). The non-vanishing of the discriminant ∆ is equivalent here to the smoothness.

¹ The “naïve” language of algebraic geometry is used here, so A is identified with its complex points.

² Not to be mistaken with the Jacobian variety to be defined later.

As a concrete example in higher dimension, we quote, as a particular case of equations found on page 3.174 of [Mu2], the following affine equations

\[
\begin{aligned}
4x + 8x^2 + 32x^3 + 160x^4 + 160xy^2 + 8y^2 + 16yt &= 1 + 8z^2 \\
24x^2 + 64x^3 + 288x^4 + 1536x^5 + 16xz + 192xyt + 960x^2y^2 \qquad & \\
{} + 96x^2z + 640x^3z + 64xz^2 + 4z + 8z^2 + 8t^2 &= 1 + 8y^2 + 32y^2z
\end{aligned}
\tag{1.2}
\]

which provide an embedding in $C^4$ of a dense affine subset of an abelian variety of dimension 2.

Other examples are given by observing that a product A × B of abelian varieties is another abelian variety. The next section will explain a general method which associates a certain abelian variety to any algebraic curve (its Jacobian variety), thus creating a very strong link between both theories.

We introduce some further vocabulary in connection with abelian varieties.

Definition 2. A morphism f : A → B of abelian varieties (defined over k) is a morphism (defined over k) of algebraic varieties which is also a group homomorphism. Morphisms can be composed, so there exists a category of abelian varieties over k.

An isogeny f : A → B is a morphism of abelian varieties with finite kernel $\ker(f) = f^{-1}(0)$; one says that A and B are isogenous. The relation “there exists an isogeny from A to B” is an equivalence relation.

To get a feeling about abelian varieties, it is useful to look at them first as complex manifolds. Let A be an abelian variety. By definition $A \subset P^N_C$ is a smooth algebraic subvariety of some projective space, and from the structure of complex manifold of the projective space it acquires one itself. We denote by $A^{an}$ this complex manifold to carefully emphasize that it now carries the “usual” topology of the complex projective space, instead of the algebraic Zariski topology. In particular, $A^{an}$ is compact, since A was closed and $P^N_C$ is compact. The group structure, being algebraic, is also analytic and therefore $A^{an}$ is a compact, connected and commutative Lie group. Lie groups of this type are quite easy to classify: it is a basic result that there exists a lattice $\Lambda \subset C^d$ (that is, a discrete subgroup which is a free Z-module of rank 2d) and an analytic isomorphism of complex Lie groups (a fortiori of complex manifolds)

\[ A^{an} \xrightarrow{\ \sim\ } C^d/\Lambda. \]

The integer d, naturally, is the dimension of the abelian variety A; it is the same as the dimension which is defined algebraically for any algebraic variety.

So the complex points of A are analytically the same as a complex torus. If we are only interested in the group structure of A, this solves the problem: A is isomorphic, as a group, to $S^1 \times \cdots \times S^1$, the product of 2d circles. Indeed, every lattice $\Lambda \subset C^d$ has a basis $(\omega_1, \ldots, \omega_{2d})$, where the $\omega_i$ are vectors in $C^d$ which are R-linearly independent, such that

\[ \Lambda = Z\omega_1 \oplus \cdots \oplus Z\omega_{2d}. \]

All this, of course, generalizes the well-known fact that every elliptic curve E over C “is” a torus, the quotient of C by a lattice $\Lambda = Z\omega_1 \oplus Z\omega_2$, with $\omega_1$ and $\omega_2$ two R-linearly independent complex numbers.

A very important difference occurs, however, when considering the converse assertion. Given a lattice $\Lambda \subset C^d$, it is natural to ask if there is a corresponding abelian variety. This requires in particular that the quotient $C^d/\Lambda$ has an algebraic structure; more precisely it is enough that there exists a holomorphic embedding $f : C^d/\Lambda \to P^n_C$ (of complex manifolds) of the torus inside some projective space.³ The existence of such a morphism f is by no means obvious. It implies that there exist (many) meromorphic functions on $C^d/\Lambda$, by taking the composition of rational functions, which are meromorphic functions on $P^n_C$, with f. This means that there must exist meromorphic functions on $C^d$ which are periodic with a group of periods exactly equal to Λ. This existence is a difficult problem of complex analysis.

When d = 1, this is always true, and indeed it is classically shown that every torus C/Λ is an elliptic curve by constructing the Weierstrass ℘ function of the lattice, which is the meromorphic function on C given by

\[ \wp_\Lambda(z) = \frac{1}{z^2} + \sum_{\substack{\lambda \in \Lambda \\ \lambda \ne 0}} \left( \frac{1}{(z - \lambda)^2} - \frac{1}{\lambda^2} \right) \]

and then showing that the map $z \mapsto (\wp_\Lambda(z), \wp'_\Lambda(z))$ provides an isomorphism of complex manifolds (here, Riemann surfaces, since d = 1) between the torus C/Λ and the elliptic curve E given by the Weierstrass equation

\[ y^2 = 4x^3 - 60 G_4 x - 140 G_6 \]

where we have defined, for k > 1,

\[ G_{2k} = \sum_{\substack{\lambda \in \Lambda \\ \lambda \ne 0}} \lambda^{-2k} \in C. \]
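The statement that $z \mapsto (\wp_\Lambda(z), \wp'_\Lambda(z))$ lands on this curve amounts to the classical differential equation $\wp'(z)^2 = 4\wp(z)^3 - 60G_4\wp(z) - 140G_6$, which can be tested numerically. The sketch below is only an illustration under stated assumptions: the lattice sums are truncated to indices |m|, |n| ≤ N, and the function names, the sample lattice and the sample point z are arbitrary choices.

```python
def lattice_points(w1, w2, N):
    """Non-zero lattice points m*w1 + n*w2 with |m|, |n| <= N (a symmetric truncation)."""
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m != 0 or n != 0:
                yield m * w1 + n * w2


def wp(z, w1, w2, N=50):
    """Truncated Weierstrass function of the lattice Z*w1 + Z*w2."""
    return 1 / z**2 + sum(1 / (z - l) ** 2 - 1 / l**2 for l in lattice_points(w1, w2, N))


def wp_prime(z, w1, w2, N=50):
    """Truncated derivative: -2 * sum over all lattice points of (z - l)^(-3)."""
    return -2 / z**3 - 2 * sum(1 / (z - l) ** 3 for l in lattice_points(w1, w2, N))


def eisenstein(two_k, w1, w2, N=50):
    """Truncated lattice sum G_{2k}: sum of l^(-2k) over non-zero lattice points."""
    return sum(l ** (-two_k) for l in lattice_points(w1, w2, N))


if __name__ == "__main__":
    w1, w2 = 1.0, 0.3 + 1.1j  # sample lattice basis (assumption)
    z = 0.3 + 0.2j            # sample point, not in the lattice
    lhs = wp_prime(z, w1, w2) ** 2
    rhs = (4 * wp(z, w1, w2) ** 3
           - 60 * eisenstein(4, w1, w2) * wp(z, w1, w2)
           - 140 * eisenstein(6, w1, w2))
    print("relative defect:", abs(lhs - rhs) / abs(lhs))
```

Because the truncation is symmetric under λ ↦ −λ, the neglected tails are of order $N^{-2}$, so the two sides should agree only approximately; increasing N shrinks the defect.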

The corresponding assertion, however, fails completely when $d \ge 2$. Indeed, already for d = 2, one can construct complex tori $C^2/\Lambda$ on which the only meromorphic functions are the constants (in a certain sense, this is actually true for almost all higher dimensional tori, see [Mu1, page 36]). The necessary and sufficient conditions for a given torus to be algebraic, and hence to “be” an abelian variety, were already found by Riemann.

Theorem 1. (Riemann) Let $\Lambda \subset C^d$ be a lattice. Then the torus $C^d/\Lambda$ has a structure of a projective variety if and only if there exists a positive definite hermitian form H on $C^d$ such that E = Im H is integral-valued when restricted to Λ.

This condition appears, roughly, as one tries to construct meromorphic functions on the torus as quotients of entire functions f on $C^d$ which are “automorphic” with respect to the action of Λ on $C^d$, i.e. for which f(z + λ) is related to f(z) by some simple transformation rule for λ ∈ Λ. Those are basically theta functions and some kind of integral-valued positive definite quadratic form is necessary to define them by series.

³ The sufficiency of this condition follows from Chow’s theorem, according to which every closed analytic subvariety of a projective space is actually algebraic.

A bilinear form H as in the Theorem is called a polarization of the complex torus, or of the abelian variety.

Remark. This theorem applies to the case d = 1 and proves again that all quotients C/Λ are algebraic: let $\omega_1$ and $\omega_2$ be a basis of Λ, and put

\[ H(z, w) = \frac{|\omega_1|^{-2}}{\mathrm{Im}(\omega_2/\omega_1)}\, z \bar{w}; \]

then it is immediately checked that the bilinear form H is a polarization for the torus C/Λ.
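The check can be written out on the basis. The following worked computation assumes that H is taken conjugate-linear in the second variable, $H(z, w) = |\omega_1|^{-2}\, z\bar{w} / \mathrm{Im}(\omega_2/\omega_1)$, and that the basis is ordered so that $\mathrm{Im}(\omega_2/\omega_1) > 0$ (both conventions are assumptions made here for the illustration).

```latex
\[
H(\omega_1,\omega_1) = \frac{1}{\operatorname{Im}(\omega_2/\omega_1)}, \qquad
H(\omega_2,\omega_2) = \frac{|\omega_2/\omega_1|^{2}}{\operatorname{Im}(\omega_2/\omega_1)}, \qquad
H(\omega_1,\omega_2) = \frac{\overline{\omega_2/\omega_1}}{\operatorname{Im}(\omega_2/\omega_1)},
\]
\[
\operatorname{Im} H(\omega_1,\omega_1) = \operatorname{Im} H(\omega_2,\omega_2) = 0, \qquad
\operatorname{Im} H(\omega_1,\omega_2) = -1, \qquad
\operatorname{Im} H(\omega_2,\omega_1) = +1 .
\]
```

Since E = Im H is R-bilinear, these four values show that E is integral-valued on all of Λ, and $H(z, z) = |z|^2 |\omega_1|^{-2} / \mathrm{Im}(\omega_2/\omega_1) > 0$ for z ≠ 0 gives positive definiteness under the ordering assumption.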

This analytic side of the theory of abelian varieties is very useful for gaining insights into their geometric properties. Translating a given problem into one about complex tori, for which an answer might be more easily derived, can help to guess what the solution is in general. For instance, algebraic morphisms between two abelian varieties $A_1$ and $A_2$ over C, with $A_i^{an} = C^{d_i}/\Lambda_i$, correspond to C-linear maps $f : C^{d_1} \to C^{d_2}$ such that $f(\Lambda_1) \subset \Lambda_2$, etc.⁴

The analytic theory can also be useful for arithmetic. For example, an easy consequence of the group structure of an abelian variety A is the determination of the structure of its torsion points. Let A[n] be the group of n-torsion points of A ($n \ge 1$ an integer),

\[ A[n] = \{ x \in A \mid nx = 0 \}, \]

then by the above “uniformization” of A as a complex torus it follows that $A[n] \simeq (Z/nZ)^{2d}$. Now, if A is defined over a subfield k ⊂ C, the algebraicity of the group law implies that there is an action of the absolute Galois group of k on A[n], by action on the coordinates, and since A[n] is finite, it follows that points in A[n] have only finitely many conjugates, so they are algebraic over k. If k is a number field, this group A[n], with its action of the Galois group of k, is a very important arithmetic object, one of the main tools in the study of the arithmetic of abelian varieties. The analytic theory thus provides its basic structure as a group, which is in no way obvious from the algebraic viewpoint. It can also be proved algebraically (such proofs are of course required when considering similar questions for abelian varieties over fields of non-zero characteristic, such as finite fields).⁵
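The count $|A[n]| = n^{2d}$ can be seen by brute force once the torus is written in real coordinates with respect to a lattice basis, so that $C^d/\Lambda$ becomes $(R/Z)^{2d}$. The sketch below enumerates points with coordinates in $\frac{1}{\text{grid}}Z/Z$ and keeps those killed by n; the function name and the `grid` parameter (which must be a multiple of n to see all of A[n]) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def torsion_points(n, d, grid):
    """Points t of ((1/grid)Z / Z)^(2d) with n*t = 0 in (R/Z)^(2d)."""
    coords = [Fraction(j, grid) for j in range(grid)]
    return [
        t
        for t in product(coords, repeat=2 * d)
        if all((n * c).denominator == 1 for c in t)  # n*c is an integer
    ]


if __name__ == "__main__":
    for n, d, grid in [(2, 1, 12), (3, 1, 12), (2, 2, 4)]:
        print(n, d, len(torsion_points(n, d, grid)), n ** (2 * d))
```

Each coordinate independently ranges over the n classes j/n, which is the isomorphism with $(Z/nZ)^{2d}$ in coordinates.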

Coming back now to the arithmetic side of the theory, let A be an abelian variety, of dimension d, defined over a number field k. As is the case with all algebraic varieties over k, a basic, and often deep and fascinating, problem of diophantine geometry is to understand the set of points of A with coordinates in k, denoted A(k): do points exist, and if yes, how many are there, how are they distributed, and sundry other questions arise, and have arisen since the very beginning of mathematics. For an abelian variety, by the very definition there must be at least one rational point, the identity element 0 ∈ A(k), and moreover, the set A(k) has actually a structure of abelian group with 0 as identity. It could be however that A(k) is reduced to 0, and this certainly is often the case.

⁴ From now on, the use of the superscript an will be relaxed; it is hoped that it will be clear in the context whether an abelian variety is considered in its complex analytic or its algebraic guise.

⁵ We have not really defined what this means.

From the analytic theory, we know that A(k) must be a subgroup of a product of 2d circles, but this is very weak information in itself. The first basic theorem asserts that there is a much stronger restriction.

Theorem 2. (Mordell-Weil) The group A(k) is a finitely generated abelian group.

This was first conjectured (apparently: the statement is quite obscure) by Poincaré, then proved by Mordell, both for d = 1 (elliptic curves) and k = Q, before Weil proved the (essentially) general case in his thesis. The group A(k) is called the Mordell-Weil group of A. For a proof of the theorem when d = 1, see [Si1, VIII]; the general case is very similar in principle, although the algebraic geometry necessary for $d \ge 2$ is considerably harder ([C-S, ch. 5 and 6]).

Using the structure theory of finitely generated abelian groups, we can thus write uniquely

\[ A(k) = A(k)_{tors} \oplus Z^r \]

where $A(k)_{tors}$ is the torsion subgroup of A(k), which is finite, and $r \ge 0$ is an integer, the rank of the Mordell-Weil group. By definition, r is also called simply the rank of the abelian variety, and denoted by rank A. It is the arithmetic invariant with which this thesis is concerned, in a special case. Note that r is zero if and only if A(k) is finite.

Both the rank and the torsion subgroup are very subtle arithmetic invariants. Computing them explicitly, as one can gather by trying to play with some concrete equations, is extremely hard. An instance of this diophantine problem arises from the following classical question: which positive squarefree integers are the area of a right triangle with sides of rational length? Such numbers are called congruent numbers. As the area of the famous right triangle with sides 3, 4 and 5, n = 6 is congruent, but it is very difficult to find out whether a given n is. It is elementary (though not obvious) that n is congruent if and only if the elliptic curve $E_n : y^2 = x^3 - n^2 x$ over Q has infinitely many rational points, namely if its rank is at least one, and ad-hoc solutions for specific values of n can be traced back to Fermat (n = 1, which is not congruent), Euler (n = 7, which is) and beyond to Arab and Greek mathematicians (see [Kob] for a detailed treatment, including Tunnell’s spectacular characterization of congruent numbers, which gives an easy way of finding out whether a given integer is a congruent number⁶).
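One direction of the link between congruent numbers and the curves $E_n : y^2 = x^3 - n^2 x$ is completely explicit. The sketch below uses a standard pair of mutually inverse maps between rational right triangles of area n and points of $E_n$ with y ≠ 0; the function names are illustrative assumptions, and the formulas are one common normalization rather than anything quoted from the text.

```python
from fractions import Fraction


def triangle_to_point(a, b, c):
    """Send a rational right triangle (legs a, b, hypotenuse c) to (n, P) with P on E_n."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a * a + b * b != c * c:
        raise ValueError("not a right triangle")
    n = a * b / 2  # the area
    x = n * (a + c) / b
    y = 2 * n * n * (a + c) / (b * b)
    return n, (x, y)


def point_to_triangle(n, P):
    """Inverse map: a point (x, y) on E_n with y != 0 gives the sides (a, b, c)."""
    n = Fraction(n)
    x, y = map(Fraction, P)
    return ((x * x - n * n) / y, 2 * n * x / y, (x * x + n * n) / y)


if __name__ == "__main__":
    n, P = triangle_to_point(3, 4, 5)
    x, y = P
    print("n =", n, "P =", P, "on E_n:", y * y == x**3 - n * n * x)
    print("back to the triangle:", point_to_triangle(n, P))
```

For the (3, 4, 5) triangle this produces a point on $E_6$. Deciding whether such a point has infinite order, which is what the rank statement requires, is the harder part and is not addressed by this sketch.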

⁶ This solution is “almost” complete: in one direction, it still depends on unsolved cases of the Birch and Swinnerton-Dyer conjecture.

The Mordell-Weil theorem, in a way, provides an upper bound on the size of the group of rational points. The proof, however, yields little more information that can be directly used to obtain any form of tighter control of it. More than seventy years after the proof of this result, the most important, and most natural, questions about the Mordell-Weil group remain unsolved. Let us take some time to digress on some exemplary problems and the progress that has been made in a few millennia of mathematical research.

• The case of the torsion subgroup is (maybe) simpler. How large is the torsion subgroup $A(k)_{tors}$ for given A? Or how large can it be for all A of a given dimension, defined over a given field? In the case of elliptic curves, there is a simple algorithm to compute $E(k)_{tors}$, using integrality properties of the coordinates of torsion points. Over Q, it was moreover conjectured already by Beppo Levi in 1908, and later by Nagell, still later again by Ogg, that only a finite number of torsion subgroups were possible (with an explicit list of all of them). This was proved by Mazur in 1977. For arbitrary number fields, but still for elliptic curves, Merel [Me] finally proved in 1995 that there is a bound N(f) such that for all elliptic curves E/k, where k is a number field of degree at most f over Q, the bound $|E(k)_{tors}| \le N(f)$ holds. In higher dimensions, almost nothing is known, although one expects that similar uniform finiteness holds.

The rank $r$, on the other hand, remains mostly terra incognita.

• Is there an algorithm to compute the rank of an abelian variety given explicitly by equations? Or even for elliptic curves?

The proof of the Mordell-Weil theorem is in large part effective, so that an upper-bound can in principle be deduced for $r$ from other “more accessible” invariants, but it does not yield an effective algorithm to compute, in guaranteed finite time, a basis of the free part of $A(k)$. The problem is that there is no limit known on the “size” of the points in a basis.⁷ Therefore (except if enough independent rational points are obtained to reach the upper-bound, which is extremely unlikely, given its size in comparison with what is expected to be true in general), there is no means of knowing when to stop looking for possible “bigger” solutions, without some other information. This would become available if another invariant of $A$, known as the Tate-Shafarevitch group of $A$, was known to be finite, but this is another major unsolved problem. However, the techniques related to it yield an algorithm which is guaranteed to terminate and return the rank of any (explicitly given) abelian variety which has finite Tate-Shafarevitch group, so presumably works with any abelian variety (for all this, see [Si1, X]).

The size of the solutions can be notoriously very big: as an example, consider the elliptic curve $E : y^2 = x^3 + 877x$. Then Bremner and Cassels proved that $E$ has rank 1, and that a generator $P = (x, y)$ has
\[ x = \left( \frac{612776083187947368101}{78841535860683900210} \right)^2. \]
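Whether a given rational number is the abscissa of a rational point on a curve of this shape can be checked mechanically, since $y^2 = x^3 + ax$ has a rational solution $y$ exactly when $x^3 + ax$ is the square of a rational number. The sketch below is an illustration under that observation; the helper names are hypothetical, and exact arithmetic is done with Python’s `fractions` module.

```python
from fractions import Fraction
from math import isqrt


def is_rational_square(q: Fraction) -> bool:
    """True if q is the square of a rational number.

    A Fraction is stored in lowest terms, so q is a square exactly when
    its numerator and its denominator are both perfect squares.
    """
    if q < 0:
        return False
    num, den = q.numerator, q.denominator
    return isqrt(num) ** 2 == num and isqrt(den) ** 2 == den


def is_x_coordinate(x: Fraction, a: int) -> bool:
    """True if x is the abscissa of a rational point on y^2 = x^3 + a*x."""
    return is_rational_square(x ** 3 + a * x)
```

For instance `is_x_coordinate(Fraction(25, 4), -36)` tests the congruent number curve for $n = 6$, and a candidate abscissa on $y^2 = x^3 + 877x$ can be tested in the same way with `a = 877`; Python integers have arbitrary precision, so the size of the numerator and denominator is not an obstacle.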

• Can the rank $r$ be arbitrarily large for a given $k$, especially for elliptic curves $E$ over $\mathbf{Q}$?

A positive answer in this special case would of course also settle the general case. While for some time it was thought (at least by some mathematicians) that the rank of elliptic curves over $\mathbf{Q}$ should be absolutely bounded, the opposite is now more often expected. Néron, then Mestre and others, have had great success in constructing elliptic curves with rather large rank. The current records are that there exist infinitely many (non-isomorphic) elliptic curves with rank at least 14 (due to Kihara), and one of rank at least 22 (due to Fermigier). Let us quote this example from [Si1, page 234]: the curve
\[ y^2 - 246xy + 36599029y = x^3 - 81199x^2 - 19339780x - 36239244 \]
has rank at least 12 over $\mathbf{Q}$.

• In general, given a family of abelian varieties, how does the rank vary among members of this family?

⁷ Measured, for instance (in the simplest case of $\mathbf{Q}$), by taking the maximum of the numerator and denominator of the affine coordinates of the solutions in some open affine subspace in the projective space containing $A$.

This question is rather imprecise, as we have not defined what is meant by a family. The congruent number curves $y^2 = x^3 - n^2 x$ defined above can be considered this way. Usually, the family will be some infinite set, and a typical setup is to define some positive real parameter, say $q$, such that there are only finitely many elements of the set with parameter less than $q$, and then ask how the average rank among those finitely many varieties, say $r(q)$, varies when $q$ gets large. There is also a very precise notion of an algebraic family of abelian varieties which makes it possible to frame the problem in very precise terms, but it doesn’t cover all the interesting cases.

Those are in a way the most “basic” questions. Of course by making abelian varieties interact with other objects of arithmetic, more will arise. Let $A/k$ be defined over a number field; then in addition to the Mordell-Weil group $A(k)$, there is a group $A(K)$ for any extension field $K/k$. If the extension is finite, this is simply the Mordell-Weil group of $A$ when considered as defined over $K$ only, and as such it remains of finite type. How does the rank of the groups $A(K)$ vary when $K$ runs over some family of number fields? How fast is it increasing?

Considering the difficulty of getting hold of the rank of abelian varieties, it is remarkable that there should exist a completely different way of recovering it, which, if true, would yield tremendous insight into its properties, and into the arithmetic of abelian varieties in general. This is the content of the Birch and Swinnerton-Dyer conjecture, which may be interpreted as a form of local-global principle.

In number theory, and in diophantine geometry especially, the term “global” refers to properties of objects over a number field (fixed), say $\mathbf{Q}$ to simplify the discussion (see [Maz] for a recent survey of this kind of ideas), whereas “local” refers to the fact that many objects and properties of interest can be reduced modulo $p$ for all prime numbers $p$, and then studied in this (usually simpler) context. Considering all primes together may then help in the study of the global object. For instance, an integer solution $x = (x_0, \ldots, x_n)$ of a homogeneous polynomial equation $f(x) = 0$ ($f \in \mathbf{Z}[X_0, \ldots, X_n]$) gives a solution modulo $p$, $\bar{x} = (\bar{x}_0, \ldots, \bar{x}_n)$, of the reduced equation $\bar{f}(\bar{x}) = 0$, $\bar{f} \in \mathbf{F}_p[X_0, \ldots, X_n]$. Thus, a necessary condition for the existence of a global solution $x$ to this equation is the existence of one modulo $p$ for every $p$ (and also of one in $\mathbf{R}$: this corresponds to the infinite place $\infty$ of $\mathbf{Q}$). The converse to this statement is false in general (as an example, again, the Selmer curve $3X^3 + 4Y^3 + 5Z^3 = 0$ has points modulo $p$ for all $p$ and points in $\mathbf{R}$, but doesn’t have rational points). When some form of it holds, however, it is called a local-global principle, or a Hasse principle. The best known example is the Hasse-Minkowski theorem according to which, over any number field $k$, for any quadratic form
\[ Q = \sum_{i,j} a(i,j)\, X_i X_j \]
with coefficients $a(i,j) \in \mathcal{O}_k$, there exists a non-zero $x = (x_i)$ with $Q(x) = 0$ if and only if for every prime ideal $\mathfrak{p} \subset \mathcal{O}_k$ (and for every archimedean completion $k_\infty$ of $k$), there exists a non-zero local solution $x_{\mathfrak{p}} = (x_{\mathfrak{p},i})$ modulo $\mathfrak{p}$ (respectively, a solution $x_\infty \in k_\infty$). This gives easily, for instance, a complete algorithmic solution to the problem of knowing whether a quadric over $k$ has a rational point.
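The local side of the Selmer example can be explored by brute force. The following Python sketch (the function name and the list of primes are illustrative choices, not part of the text) searches for a solution of $3X^3 + 4Y^3 + 5Z^3 \equiv 0$ modulo $p$ other than the trivial one. It only illustrates the necessary condition discussed above: solvability modulo $p$ is weaker than solvability in the $p$-adic numbers, and no finite computation of this kind says anything about the absence of rational points.

```python
from itertools import product


def has_nontrivial_solution_mod_p(p: int) -> bool:
    """True if 3X^3 + 4Y^3 + 5Z^3 = 0 has a solution mod p other than (0, 0, 0)."""
    for x, y, z in product(range(p), repeat=3):
        if (x, y, z) != (0, 0, 0) and (3 * x**3 + 4 * y**3 + 5 * z**3) % p == 0:
            return True
    return False


small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
local_results = {p: has_nontrivial_solution_mod_p(p) for p in small_primes}
```

The search is cubic in $p$, so it is only meant for small primes; for large $p$ the existence of points follows from the Riemann Hypothesis for curves over finite fields discussed later in this chapter.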

The Birch and Swinnerton-Dyer conjecture indicates a much subtler relationship between the local and global arithmetic properties of abelian varieties over number fields. The local properties it involves are those encoded in the Hasse-Weil zeta function (also called simply the $L$-function). Let $A/k$ be an abelian variety. The $L$-function of $A$ is a holomorphic function defined by an Euler product
\[ L(A, s) = \prod_{\mathfrak{p}} L_{\mathfrak{p}}(A, N\mathfrak{p}^{-s})^{-1} \]
where $\mathfrak{p}$ runs over all prime ideals $\mathfrak{p} \subset \mathcal{O}_k$ and $L_{\mathfrak{p}}$ is a polynomial with integer coefficients and constant term 1, of degree equal to $2 \dim A$ (for all but finitely many $\mathfrak{p}$, at most $2 \dim A$ otherwise). This polynomial can be defined from a generating series for the number of points of $A$ in the finite extensions of the residue field $\mathcal{O}_k/\mathfrak{p}$, or more directly from the characteristic polynomial of a Frobenius endomorphism acting on some finite dimensional $\mathbf{Q}_\lambda$-vector space, where $\lambda \ne \mathfrak{p}$ is another prime ideal in $\mathcal{O}_k$. A full description will be given in the next section in the case of the Jacobian of an algebraic curve, which is the case of interest in the rest of the thesis.

Conjecture 1. (Birch and Swinnerton-Dyer) Let $A/k$ be an abelian variety defined over a number field $k$. Then the rank of the Mordell-Weil group $A(k)$ of $A$ is equal to the order of vanishing of the $L$-function of $A$ at $s = 1$:
\[ \operatorname{rank} A = \operatorname{ord}_{s=1} L(A, s). \tag{1.3} \]

This is a beautiful conjecture,⁸ but not one which allows us to indulge very long in contemplating its beauties, because it presents some immediate difficulties. The trouble is that the $L$-function is defined by the infinite Euler product which is only convergent for $\operatorname{Re}(s) > \frac{3}{2}$. The special point $s = 1$ is not in this region, and the only way to make sense of this statement is to assume beforehand that $L(A, s)$ can be analytically continued at least to some open subset of the complex plane containing 1. This is not a major difficulty in trying to understand the meaning of the conjecture, since it is a standard assumption that $L(A, s)$ actually admits an analytic continuation to an entire function, but it is much more troublesome when trying to prove it, because this analytic continuation is known in a few cases only. Those are, basically, the elliptic curves over $\mathbf{Q}$ to which the methods of Wiles [Wil] and Taylor-Wiles apply (notably, all semi-stable elliptic curves, and in any case a significant proportion of all elliptic curves over $\mathbf{Q}$), abelian varieties with complex multiplication (Deuring, Shimura, Taniyama, ...) and the Jacobians of modular curves, by Eichler-Shimura theory (see the next section and the next chapter about the last, which will be the main focus of attention).

Let us now assume that $L(A, s)$ is entire, in the more precise form postulating also the existence of a standard functional equation relating the value of the $L$-function at $s$ and at $2 - s$. This then shows that the critical strip containing all non-trivial zeros⁹ of $L(A, s)$ is the strip $\frac{1}{2} \le \operatorname{Re}(s) \le \frac{3}{2}$, and especially that $s = 1$ is the real point on the critical line $\operatorname{Re}(s) = 1$. According to the Generalized Riemann Hypothesis for $L(A, s)$, all non-trivial zeros of $L(A, s)$ should lie on this line. Of course, it is to be expected that the point involved in an equality such as (1.3) should be on the critical line. Since abelian varieties of positive rank do exist, the corresponding equality with $s = s_0$ in lieu of $s = 1$ would violate the Riemann Hypothesis if $\operatorname{Re}(s_0) \ne 1$. Thus the Birch and Swinnerton-Dyer conjecture is another indication that the rank is a very delicate invariant: whether an $L$-function vanishes at some point of the critical line is no trivial matter.

⁸ As a matter of fact, this is only a weak form of the Birch and Swinnerton-Dyer conjecture, which has been refined to include in particular an exact formula for the leading term of the Taylor expansion of $L(A, s)$ at $s = 1$, involving other arithmetic invariants of $A$ such as the order (presumed to be finite) of the Tate-Shafarevitch group of $A$, etc.

⁹ The trivial zeros are those accounted for by the poles of the Gamma factors which have to be inserted in the functional equation; they are the integers $-n$, with $n \ge 0$.

We now ask a few questions, which could be asked of any mathematical conjecture. First, what is the evidence available that it is true? Of course, there are some obvious compatibilities which show that the conjecture can not be disregarded because of some trivial matter: both sides, for instance (!) are integers greater than or equal to zero, and both are additive under products; also both are invariant under isogeny: if $A$ and $B$ are defined over $k$ and $f : A \to B$ is a $k$-isogeny it is shown quite easily that $\operatorname{rank} A = \operatorname{rank} B$, and without too much difficulty that the $L$-functions are actually identical, $L(A, s) = L(B, s)$ (the converse of this last statement, known as Faltings’s isogeny theorem, is also true, but is much deeper). More to the point, there is actually significant evidence now, at least for the elliptic curves over $\mathbf{Q}$ for which the analytic continuation is true. It started with the Coates-Wiles theorem, according to which an elliptic curve $E$ over $\mathbf{Q}$ with complex multiplication, such that $L(E, 1) \ne 0$, has rank 0 (that is, its Mordell-Weil group $E(\mathbf{Q})$ is finite). Now, the works of Rubin, Gross-Zagier and Kolyvagin, all put together, show that for modular elliptic curves the equality (1.3) is true if the order of vanishing of $L(E, s)$ at 1 is at most 1. But no clue seems to be available concerning the higher ranks $r \ge 2$.

A second natural question: is this conjecture useful, or a mere thing of beauty, to be gazed and wondered at? Obviously, this will depend on the definition of “useful” that is adopted. Let’s see if it sheds some light on the different problems about the rank that we mentioned earlier. Concerning the question of finding an algorithm, it would seem to bring a solution: simply compute the values of the successive derivatives of $L(A, s)$ at $s = 1$ until one is non-zero. However, in practical terms, this solution requires the ability to compute to a high degree of accuracy the value of the $L$-function and its derivatives at $s = 1$. When the curve is given by a Weierstrass equation with large coefficients, this is by no means an easy matter: the numbers involved quickly become too large (for the known curves with high ranks, for instance, it can not be checked today that their $L$-function vanishes to a high order).

Similarly, the conjecture yields no insight, even at the most heuristic or intuitive level, concerning the existence of elliptic curves over $\mathbf{Q}$ with arbitrarily large rank. Experts in the analytic theory of $L$-functions have no reason to believe in either possibility.

Much more promising is the third problem of studying the behavior of the rank in families. Indeed, the study of the vanishing or non-vanishing of families of $L$-functions on average at various points is a standard subject in analytic number theory, and many methods and results have been developed in this direction. The application of some of these to abelian varieties was initiated by Mestre [Mes], but the location of the special point at the center of the critical strip complicates the analysis enormously and Mestre, as Brumer [Br1] and other investigators of this subject after him, had to appeal to the Generalized Riemann Hypothesis, in addition to the other assumptions, to obtain interesting results. Of course, this is considered a very safe assumption, as assumptions go, but it seems it must delay unconditional proofs along those lines until some quite distant future.

At this point, which might be considered not highly satisfactory, it is good to remember that despite sometimes stubborn indications to the contrary, equality is a symmetric relation and $a = b$ might also mean $b = a$. And it turns out that the most spectacular application of the Birch and Swinnerton-Dyer conjecture to date doesn’t rely on it as a way of investigating the rank of an abelian variety, but as one of predicting the existence of $L$-functions with rather large order of vanishing at the center of the critical strip! This is the content of Goldfeld’s work [Gol], which culminated, using the theorem of Gross-Zagier, in the first non-trivial effective lower-bound for the class number $h(-D)$ of the imaginary quadratic field $\mathbf{Q}(\sqrt{-D})$, $D > 1$ a squarefree integer. Roughly speaking, Goldfeld showed, using beautiful analytic techniques, that if there existed one $L$-function, of degree 2, satisfying some standard assumptions, and vanishing at order $r \ge 3$ at the center of its critical strip, then there exists an explicit and absolute constant $C > 0$ (depending only on the specific $L$-function used) such that
\[ h(-D) > C\, (\log D)\, \alpha(D) \tag{1.4} \]
(where $\alpha(D)$ is very small (almost constant); precisely, Oesterlé has shown that one can take
\[ \alpha(D) = \prod_{p \mid D} \left( 1 - \frac{[2\sqrt{p}]}{p + 1} \right), \]
so $\alpha(D) \gg_{\varepsilon} (\log D)^{-\varepsilon}$).
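The factor $\alpha(D)$ is easy to evaluate exactly. The sketch below is a small Python illustration of the product formula above, with $[\,\cdot\,]$ read as the integer part; the function names are hypothetical, trial division is used only because it is the simplest way to list the primes dividing $D$, and nothing is assumed here about the value of the constant $C$.

```python
from fractions import Fraction
from math import isqrt


def prime_divisors(d: int) -> list[int]:
    """Distinct prime divisors of d >= 1, found by trial division."""
    primes, p = [], 2
    while p * p <= d:
        if d % p == 0:
            primes.append(p)
            while d % p == 0:
                d //= p
        p += 1
    if d > 1:
        primes.append(d)
    return primes


def alpha(d: int) -> Fraction:
    """Product over the primes p dividing d of (1 - floor(2*sqrt(p)) / (p + 1))."""
    result = Fraction(1)
    for p in prime_divisors(d):
        # isqrt(4 * p) is the integer part of 2 * sqrt(p), computed without floats
        result *= 1 - Fraction(isqrt(4 * p), p + 1)
    return result
```

For example, $\alpha(7) = 1 - 5/8 = 3/8$ and $\alpha(15) = (1 - 3/4)(1 - 4/6) = 1/12$, so the factor can be noticeably smaller than 1 when $D$ has small prime factors.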

Without the Birch and Swinnerton-Dyer conjecture, it might have been thought that the existence of such $L$-functions was very unlikely, but because of it and because some elliptic curves of rank larger or equal to 3 were already known, Goldfeld only needed it to be checked for any particular one of those high-rank curves to prove this lower bound unconditionally. Gross and Zagier, using their theorem about the case of rank 1 of the Birch and Swinnerton-Dyer conjecture, finally proved the required property of some of the rank 3 curves.

After Oesterlé succeeded in computing a possible value of the constant $C$, the exact bound was found to be sufficient to solve the class number 3 problem of Gauss, namely finding all imaginary quadratic fields with class number equal to 3 (the class number 1 problem had already been solved previously by Heegner, whose proof was mistakenly thought to be in error, then Stark and Baker independently, and the class number 2 problem by Baker and Stark).

In this thesis, we will show that some of the results proved by Mestre and Brumer using the Birch and Swinnerton-Dyer conjecture and the Generalized Riemann Hypothesis are within reach of current methods of analytic number theory, and in particular that the progress towards the Birch and Swinnerton-Dyer conjecture makes it possible to give lower bounds for the rank of some special abelian varieties which are unconditional and sharp. Thus, the implicit promise of this conjecture, that it can be used effectively to study the rank of abelian varieties, can be kept today without additional assumptions.

1.2 The Jacobian of an algebraic curve

The abelian varieties which we will study in detail belong to the special class of the Jacobians of algebraic curves. Historically, it is actually through the Jacobians that abelian varieties were first considered by Abel and Jacobi, and it was not before the twentieth century that the general abstract definition above was formulated.

It is again quite easy to describe the analytic aspects of the theory, over $\mathbf{C}$. Elliptic curves were first implicitly studied when Euler, Fagnano, Legendre, Jacobi, and many others were studying the functions defined by so-called “elliptic integrals”, for instance
\[ K(\lambda) = \int_0^1 \frac{dx}{\sqrt{(1 - x^2)(1 - \lambda^2 x^2)}} \]
which is related to the elliptic curve $y^2 = (1 - x^2)(1 - \lambda^2 x^2)$. Abel was the first to attempt to develop a similar theory for integrals related to more general Riemann surfaces, or algebraic curves of higher genus.
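The integral $K(\lambda)$ has no elementary closed form, but it is easy to evaluate numerically for real $0 \le \lambda < 1$. The following Python sketch compares two standard approaches that are not discussed in the text and are brought in here only as an illustration: the arithmetic-geometric mean identity $K(\lambda) = \pi / (2\,\mathrm{AGM}(1, \sqrt{1 - \lambda^2}))$, and a direct midpoint rule after the substitution $x = \sin\theta$, which removes the singularity of the integrand at $x = 1$. Function names, tolerances and step counts are arbitrary choices.

```python
from math import pi, sin, sqrt


def elliptic_k_agm(lam: float, tol: float = 1e-15) -> float:
    """K(lam) = pi / (2 * AGM(1, sqrt(1 - lam^2))) for 0 <= lam < 1."""
    a, b = 1.0, sqrt(1.0 - lam * lam)
    for _ in range(60):  # the AGM iteration converges quadratically
        if abs(a - b) <= tol:
            break
        a, b = (a + b) / 2.0, sqrt(a * b)
    return pi / (2.0 * a)


def elliptic_k_quadrature(lam: float, steps: int = 20000) -> float:
    """Midpoint rule for the integral of d(theta) / sqrt(1 - lam^2 sin^2(theta)) on [0, pi/2]."""
    h = (pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += 1.0 / sqrt(1.0 - (lam * sin(theta)) ** 2)
    return total * h
```

At $\lambda = 0$ the integrand is $1/\sqrt{1 - x^2}$ and both functions return $\pi/2$ up to rounding; for other values the two methods can be compared against each other as a sanity check.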

Let $C$ be a compact Riemann surface of genus $g$: topologically, $C$ is represented by the usual picture of a doughnut with $g$ holes, see the picture below. We always assume here that $g \ge 1$ (the only compact Riemann surface with $g = 0$ is the projective line $\mathbf{P}^1_{\mathbf{C}}$; its Jacobian, inasmuch as it exists, is reduced to 0). The case $g = 1$ corresponds to a torus, and a curve of genus one over $\mathbf{C}$ is the same as an elliptic curve over $\mathbf{C}$. The Jacobian appears when trying to compute integrals of holomorphic differential forms along paths of $C$. We denote by $\Omega^1(C)$ the vector space of holomorphic 1-forms on $C$. This is a finite dimensional vector space of dimension exactly equal to $g$, as has been known since Riemann.¹⁰ For instance, if $E$ is the Riemann surface associated to an elliptic curve with Weierstrass equation (1.1), $\Omega^1(E)$ is isomorphic to $\mathbf{C}$, and a generator is the holomorphic differential $\omega = \frac{dx}{y}$ (notice the similarity with the integrand in the elliptic integral $K(\lambda)$).

Instead of trying to integrate one form $\omega \in \Omega^1(C)$ at a time, Abel found that it was best to consider all of them simultaneously. Fix a base point $x_0 \in C$, and a basis $(\omega_1, \ldots, \omega_g)$ of the space $\Omega^1(C)$. We want to build a map
\[ C \to \mathbf{C}^g, \qquad x \mapsto \left( \int_{x_0}^{x} \omega_1, \ldots, \int_{x_0}^{x} \omega_g \right) \tag{1.5} \]

but of course this is not well-defined as stated, because the value of any of the integrals will depend on the path of integration chosen between $x_0$ and $x$. The difference between the integrals along two different paths, however, can be written as an integral along a loop based at $x_0$:
\[ \int_{\gamma_1} \omega_j = \int_{\gamma_2} \omega_j + \int_{\gamma_1 \gamma_2^{-1}} \omega_j, \]
where $\gamma_i$ ($i = 1, 2$) is a path $\gamma_i : [0, 1] \to C$ starting at $x_0$ and ending at $x$ ($\gamma_i(0) = x_0$, $\gamma_i(1) = x$), and the composition and inverse in $\gamma_1 \gamma_2^{-1}$ refer to the group law in the fundamental group of $C$, i.e. this loop is the concatenation of $\gamma_1$ (going from $x_0$ to $x$) and the reverse of $\gamma_2$ (going back from $x$ to $x_0$).

Consequently, if we let $\Lambda$ denote the subgroup of $\mathbf{C}^g$ generated by the integrals along all loops $\gamma$ based at $x_0$, $\left( \int_\gamma \omega_1, \ldots, \int_\gamma \omega_g \right)$, then the abortive “map” (1.5) does become well-defined if we push the image to the quotient by $\Lambda$:
\[ C \to \mathbf{C}^g/\Lambda, \qquad x \mapsto \left( \int_{x_0}^{x} \omega_1, \ldots, \int_{x_0}^{x} \omega_g \right) \tag{1.6} \]
(taking the classes modulo $\Lambda$ of the integrals). The Jacobian $J(C)$ of $C$ is this quotient $\mathbf{C}^g/\Lambda$, and the map is called the Abel-Jacobi map of $C$ (based at $x_0$).

¹⁰ This can be taken as a rigorous definition of $g$. It is actually very convenient in the algebraic setting when the topological picture might not be available.

As a matter of fact, this first definition remains unsatisfactory, because it depends on a particular choice of a base-point $x_0 \in C$, and of a basis of $\Omega^1(C)$. A more intrinsic definition is possible, but we need to define the first integral homology group of $C$. Let $C_1(C, \mathbf{Z})$ be the group of 1-chains in $C$: by definition, this is the free abelian group whose generators are the paths in $C$, i.e. the continuous maps $\lambda : [0, 1] \to C$. Elements in $C_1(C, \mathbf{Z})$ are finite formal sums of the type
\[ \sum_{i=1}^{n} n_i \lambda_i \]
where each $\lambda_i$ is a path. Among chains, we distinguish the subgroup $Z_1(C, \mathbf{Z})$ of 1-cycles, which are the chains without boundary. The boundary homomorphism $\partial$ is defined for a path $\lambda$ by $\partial(\lambda) = \lambda(1) - \lambda(0)$ (where the sum lives in $C_0(C, \mathbf{Z})$, the free abelian group generated¹¹ by points of $C$) and extended by additivity, and $Z_1(C, \mathbf{Z})$ is the kernel of $\partial$. If the path $\lambda$ is a loop, so $\lambda(1) = \lambda(0)$, for example, it is a 1-cycle.

Lastly, there is the subgroup of $C_1(C, \mathbf{Z})$ of 1-boundaries. This is the image of the boundary map from 2-chains, which are defined analogously as the elements of the free abelian group generated by maps $\square : [0, 1] \times [0, 1] \to C$; by “going along the boundary” of the square, one obtains a (formal) sum of four paths in $C$, and $B_1(C, \mathbf{Z})$ is the image of all 2-chains by this morphism. Clearly, the boundary of a square is a loop, so there is an inclusion $B_1(C, \mathbf{Z}) \subset Z_1(C, \mathbf{Z})$. The first integral homology group $H_1(C, \mathbf{Z})$ is the quotient group
\[ H_1(C, \mathbf{Z}) = Z_1(C, \mathbf{Z}) / B_1(C, \mathbf{Z}). \]

As an aside, we mention quickly the relation between this group and the (maybe) more familiar fundamental group $\pi_1(C, x_0)$ of loops in $C$ issuing from $x_0 \in C$, with concatenation acting as group law, taken modulo homotopies of loops. Quite obviously there is a surjective map $\pi_1(C, x_0) \to H_1(C, \mathbf{Z})$, and one can show (Hurewicz theorem) that the kernel of this map is the commutator subgroup $[\pi_1(C, x_0), \pi_1(C, x_0)]$: in group theoretic language, $H_1(C, \mathbf{Z})$ is the abelianization of $\pi_1(C, x_0)$; here the base-point is no longer of importance because of the commutativity.

To think of homology classes, consider a simple loop: it defines a homology class. When is it zero? The loop itself is a boundary if one can, in a way, “fill in” continuously the “interior” of the loop, painting something homeomorphic to a square, as it were. For instance, on an annulus $A_{a,b} = \{ z \in \mathbf{C} \mid a < |z| < b \}$, any circle of radius $r$, $a < r < b$, will define a non-trivial homology class. But two circles with different radii define the same (or opposite) class, since one can map a square to cover the annulus between them. Actually, $H_1(A_{a,b}, \mathbf{Z}) \simeq \mathbf{Z}$, with the class of any such circle as generator.

¹¹ This is also known as the group of divisors of $C$, see below.
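The annulus example can be tested numerically: the form $dz/z$ is holomorphic on any annulus centered at 0, so its integral over a circle should depend only on the homology class of the circle and not on its radius. The Python sketch below (the function name and the step count are illustrative) approximates the integral by a midpoint rule on small arcs; it is a numerical illustration, not a proof.

```python
import cmath
from math import pi


def circle_integral_dz_over_z(radius: float, steps: int = 2000) -> complex:
    """Approximate the integral of dz/z over the circle |z| = radius, counterclockwise."""
    total = 0j
    for k in range(steps):
        t0 = 2 * pi * k / steps
        t1 = 2 * pi * (k + 1) / steps
        z0 = radius * cmath.exp(1j * t0)
        z1 = radius * cmath.exp(1j * t1)
        midpoint = radius * cmath.exp(1j * (t0 + t1) / 2)
        total += (z1 - z0) / midpoint  # value of 1/z at the midpoint times the step dz
    return total
```

Both `circle_integral_dz_over_z(1.0)` and `circle_integral_dz_over_z(2.5)` come out close to $2\pi i$; the fact that this common value is non-zero reflects that the circle is not a boundary in the annulus.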

Now, coming back to the Jacobian, let $V = \Omega^1(C)^*$, the dual vector space of $\Omega^1(C)$. The main point of the introduction of $H_1(C, \mathbf{Z})$ is that it is possible to integrate a holomorphic 1-form against a homology class $\lambda \in H_1(C, \mathbf{Z})$, extending the integration of differential forms along a path. Putting aside matters of differentiability which might cause trouble but don’t, this follows easily from Stokes’s theorem: it is enough to show that if a path $\lambda$ is the boundary of a square $\square$, then $\int_\lambda \omega = 0$ for any $\omega \in \Omega^1(C)$. But one has
\[ \int_\lambda \omega = \int_{\partial \square} \omega = \int_{\square} d\omega = 0, \]
since locally $\omega = f(z)\,dz$, with $f$ holomorphic, hence $d\omega = \frac{\partial f}{\partial \bar{z}}\, d\bar{z} \wedge dz = 0$.

Thus we can define a homomorphism
\[ H_1(C, \mathbf{Z}) \to V, \qquad \lambda \mapsto \left( \omega \mapsto \int_\lambda \omega \right) \]
and let $\Lambda \subset V$ be the subgroup of $V$ generated by its image. The Jacobian of $C$ is defined to be $J(C) = V/\Lambda$.

The basic homology theory of surfaces shows that $H_1(C, \mathbf{Z})$ is a free abelian group of rank $2g$. The figure below shows a possible basis $\lambda_1, \ldots, \lambda_{2g}$: around each of the $g$ holes there are two loops.

[Figure: A Riemann surface and a homology basis of cycles]

The map $H_1(C, \mathbf{Z}) \to V$ above is shown to be injective, and this implies that $\Lambda \subset V$ is a lattice, so $J(C)$ is a complex torus of dimension $g$. What is more, it is in fact an abelian variety. The reason for this can be seen using Riemann’s condition of Theorem 1: indeed, there exists a natural choice of a polarization $H : V \times V \to \mathbf{C}$ satisfying the conditions there. This is dual to the bilinear form $\Omega^1(C) \times \Omega^1(C) \to \mathbf{C}$ defined by integration, $(\omega, \omega') \mapsto i \int_C \omega \wedge \bar{\omega}'$. Clearly this form $H$ is hermitian, and positive definite since
\[ i \int_C \omega \wedge \bar{\omega} > 0 \]
for all $\omega \ne 0$ in $\Omega^1(C)$. The necessary integrality property $H(\Lambda \times \Lambda) \subset \mathbf{Z}$ comes from the compatibility of this bilinear form (and of the injective map $H_1(C, \mathbf{Z}) \to V$) with the intersection pairing in homology. This is the form defined, on a basis as in the picture, by simply counting (with orientation) the number of intersection points of two loops, so it is clearly integral valued. In fact, with a suitable enumeration of the basis $(\lambda_1, \ldots, \lambda_{2g})$ of $H_1(C, \mathbf{Z})$, this produces a form $\Lambda \times \Lambda \to \mathbf{Z}$ with matrix
\[ \begin{pmatrix} 0 & -\mathrm{Id}_g \\ \mathrm{Id}_g & 0 \end{pmatrix}. \]

This definition $J(C) = \Omega^1(C)^* / H_1(C, \mathbf{Z})$, in particular, determines immediately the cotangent space $T_0^* J(C)$ of $J(C)$ at the identity element:
\[ T_0^* J(C) \simeq \Omega^1(C). \tag{1.7} \]

As an exercise, take for $C$ an elliptic curve $\mathbf{C}/\Lambda$, and show that there exists a canonical isomorphism $C \simeq J(C)$.

Riemann’s theorem shows that $J(C)$ is an abelian variety, in particular an algebraic variety. However, this description over $\mathbf{C}$ is analytic in nature: although it is somewhat explicit, it works with complex integrals and such apparently transcendental constructs. If $C/k$ is an algebraic curve defined over a subfield $k \subset \mathbf{C}$, we know that $J(C)$ can be embedded in some projective space as the solution set of some polynomial equations, and it is natural to wonder to which field the coefficients of those polynomials belong, and in particular: is $J(C)$, by any chance, also defined over the field $k$? It is of course of particular importance if one wishes to study the arithmetic of the Jacobian of some curve to know that there is such a thing! If the Jacobian could only be defined over a transcendental extension of $\mathbf{Q}$, there would be no notion, for instance, of rational points on it. André Weil was the first to be confronted with this, as he was trying to extend Mordell’s theorem to the Jacobians of algebraic curves. It was felt more acutely even when trying to prove the Riemann Hypothesis for curves over finite fields: it was necessary to have a Jacobian variety associated to those curves, with properties similar to the usual ones over $\mathbf{C}$ (largely unstated yet), even though the analytic construction makes no sense in such a context. It was part of Weil’s achievement that he succeeded in solving this problem by finding an appropriate, purely algebraic construction of the Jacobian variety.¹² We summarize this.

Theorem 3. Let $C/k$ be an algebraic curve defined over a subfield $k$ of $\mathbf{C}$. Then the Jacobian variety $J(C)$ of $C$ is an abelian variety defined over $k$. If $C$ has a $k$-rational point $x$, then the Abel-Jacobi map $C \to J(C)$ sending $x$ to 0 is an algebraic morphism defined over $k$.

If $k$ is a number field, then $J(C)$ has good reduction at every prime ideal $\mathfrak{p} \subset \mathcal{O}_k$ where $C$ has good reduction.

The key to this theorem is to find an algebraic description of the Jacobian, and in fact one had already been discovered by Abel and Jacobi. Again, a few definitions are required. Let $C$ be a smooth projective algebraic curve (over $\mathbf{C}$, or any algebraically closed field). The group $\mathrm{Div}(C)$ of divisors of $C$ is, by definition, the free abelian group generated by points $x \in C$. An element $D \in \mathrm{Div}(C)$ is thus a formal linear combination $\sum_{x \in C} n_x x$ of points $x \in C$, with $n_x \in \mathbf{Z}$ and only finitely many non-zero $n_x$. There is a degree homomorphism $\deg : \mathrm{Div}(C) \to \mathbf{Z}$ defined by $\deg(D) = \sum_x n_x$ for $D$ as above. We let $\mathrm{Div}^0(C)$ be its kernel. It is generated by the divisors of the form $x - x_0$ for any fixed $x_0 \in C$, as a simple computation reveals.

Divisors arise naturally when looking at the zeros and poles of a non-zero rational function and their multiplicity. Let $f \in \mathbf{C}(C)$ be a rational function on $C$, $f \ne 0$. The divisor of $f$, which is denoted $\mathrm{div}(f)$, is defined to be
\[ \mathrm{div}(f) = \sum_{x \in C} \mathrm{ord}_x(f)\, x \]
(zeros of $f$ minus poles, with multiplicity). Divisors of rational functions are called principal. It is well-known that “there are as many poles as zeros”, or in other words $\deg \mathrm{div}(f) = 0$ for all non-zero rational functions. What about the converse? Is it true that if $D \in \mathrm{Div}(C)$ is of degree 0, there exists $f$ with $\mathrm{div}(f) = D$? The answer is “No” in general,¹³ but there is a simple additional condition which ensures that a degree zero divisor is principal, and it is related to the Jacobian.

¹² Weil’s first proof only showed that the Jacobian was defined over a finite extension of the field of definition of the curve; later Chow, then Weil by other methods, showed that it was defined over the same field.
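A worked example may help fix ideas. On the congruent number curve $E_n : y^2 = x^3 - n^2 x$ introduced earlier, write $\infty$ for the point at infinity (this notation is an assumption of the example, not of the text). The coordinate functions $x$ and $y$ have the following divisors, both of degree 0 as they must be:

```latex
\[
\operatorname{div}(x) = 2\,(0,0) - 2\,\infty,
\qquad
\operatorname{div}(y) = (0,0) + (n,0) + (-n,0) - 3\,\infty .
\]
```

By contrast, the degree zero divisor $(0,0) - \infty$ is not principal: a function with a single simple pole would define an isomorphism of $E_n$ with the projective line, which is impossible in genus 1. Its double is principal, being $\mathrm{div}(x)$, which reflects the fact that $(0,0)$ is a point of order 2 on $E_n$.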

Fixing a base point $x \in C$, we have an Abel-Jacobi map $\iota : C \to J(C)$, which sends $x$ to $0 \in J(C)$. Since $J(C)$ is a group, it is tempting to try to “add” points of $C$ to get points in $J(C)$ via this map, and of course $\mathrm{Div}(C)$ is just the group where it makes sense (at least, formal sense) to add points of $C$, so $\iota$ extends to a group homomorphism $\iota : \mathrm{Div}(C) \to J(C)$. Now the Abel-Jacobi theorem states that $\iota$ is surjective, and that a degree zero divisor $D \in \mathrm{Div}^0(C)$ is principal if and only if $\iota(D) = 0$.

In the special case of an elliptic curve $E$, when $J(E) \simeq E$, and $\iota$ is the identity if 0 is chosen as base-point, this recovers one of the first and most classical results of the theory of elliptic functions ([Si1, VI-2.2]): if $f$ is an elliptic function for a lattice $\Lambda \subset \mathbf{C}$, then
\[ \sum_{x \in \mathbf{C}/\Lambda} \mathrm{ord}_x(f)\, x \in \Lambda, \]
where this time the sum is taken in $\mathbf{C}$. The proof goes by integrating the meromorphic function $z \frac{f'(z)}{f(z)}$ along the boundary of a suitable fundamental domain for $\mathbf{C}/\Lambda$; to prove that $\iota(D) = 0$ for $D$ principal on $C$ of higher genus, a similar argument can be used, but it is necessary to use a fundamental domain in the hyperbolic plane, a hyperbolic polygon.

As a consequence of the Abel-Jacobi theorem, there is a bijection
\[ J(C) \simeq \mathrm{Pic}^0(C) \]
where, by definition, the degree zero Picard group $\mathrm{Pic}^0(C)$ of $C$ is the quotient of $\mathrm{Div}^0(C)$ by the group of principal divisors. With this identification, the Abel-Jacobi map $\iota$ is induced by the map which sends $x \in C$ to the class of the divisor $x - x_0$. There is a moral to this: the “formal” way of adding points on $C$ is closely related to that afforded by the group law of the Jacobian variety. In any case, the main point is that the Picard group, as well as the principal divisors, can be defined¹⁴ for any algebraic variety over a field. To define the Jacobian variety without recourse to analysis, it is enough to find a way of putting the structure of an abelian variety on the abstract group $\mathrm{Pic}^0(C)$! This is by no means easy but it is essentially the way that Weil proceeded. As a starting point for the investigation of $J(C)$, when enough machinery of algebraic geometry is available, it can be surprisingly efficient to deduce properties of $J(C)$ by simply assuming that some abelian variety does exist which has this property (see [Har, pages 323–325]).

The following statement summarizes some notable geometric facts, which will not be used but deserve mention.

Proposition 1. Let $k$ be an algebraically closed field $k \subset \mathbf{C}$.

(1) Let $C$ be a smooth projective algebraic curve of genus $g \ge 1$ over $k$. Then the Abel-Jacobi map $C \to J(C)$ is a closed embedding.

(2) Let $A/k$ be any abelian variety. There exists a curve $C/k$ and a surjective map $J(C) \to A$ (this can be used in many cases to reduce problems about abelian varieties to the corresponding problems restricted to Jacobians; it is not true however that all abelian varieties are Jacobians, and it is a very interesting problem to characterize those that are among all abelian varieties).

(3) (Torelli’s theorem) Let $C/k$ be a smooth projective algebraic curve of genus $g \ge 1$, and let $(J(C), H)$ be the pair consisting of the Jacobian variety of $C$ with its canonical polarization (see above). Then $C$ is determined, up to $k$-isomorphism, by $(J(C), H)$: if $(J(C'), H')$ is another such pair and there exists an isomorphism
\[ J(C') \simeq J(C) \]
compatible with the polarizations (in an obvious sense), then $C$ is isomorphic to $C'$ over $k$. However, if one discards the polarizations, there exist non-isomorphic curves $C$ and $C'$ over $\mathbf{C}$ with $J(C) \simeq J(C')$ as abelian varieties.

¹³ It is “Yes” when the genus is 0.

¹⁴ With some care in the definitions.

If $C$ is an algebraic curve defined over a number field $k$, knowing that its Jacobian variety is also defined over $k$ allows us to ask questions about its arithmetic. In particular, the rank of the Mordell-Weil group of $J(C)$ is defined, and the Birch and Swinnerton-Dyer conjecture should apply to it. It is possible to express the $L$-function of a Jacobian variety in terms of the original curve in a simple way. Let $\mathfrak{p} \subset \mathcal{O}_k$ be a prime ideal at which the curve $C$ has good reduction, so the curve reduced modulo $\mathfrak{p}$ is a smooth projective curve $C_{\mathfrak{p}}$ over the residue field $\mathbf{F}_q = \mathcal{O}_k/\mathfrak{p}$ with $q = N\mathfrak{p}$ elements. The congruence zeta function $Z(C_{\mathfrak{p}})$ of this reduced curve is defined by the formal power series
\[ Z(C_{\mathfrak{p}}) = \exp\left( \sum_{n \ge 1} |C_{\mathfrak{p}}(\mathbf{F}_{q^n})| \frac{T^n}{n} \right) \]
(which is a way, eccentric as it might seem at first glance, of encoding the number of points of $C_{\mathfrak{p}}$ in all finite degree extensions of the residue field).

Using the Riemann-Roch theorem, Schmidt proved in 1931 (Artin had done it in special cases) that Z(Cp) is the Taylor expansion of a rational function. Following Weil’s proof of the Riemann Hypothesis for curves over finite fields, this rational function can be written in the form

\[ Z(C_p) = \frac{P_1(T)}{(1 - T)(1 - qT)} \]

where P1 is a polynomial with integral coefficients and constant term 1, of degree 2g, such that the reciprocals αi of its complex roots (1 ≤ i ≤ 2g) satisfy the Riemann Hypothesis (proved by Weil in 1946, using heavily his algebraic theory of the Jacobian, in fact):

\[ |\alpha_i| = (N\mathfrak{p})^{1/2} = q^{1/2}. \]

Comparing this expression with the definition of Z(Cp), one derives easily a formula, and a sharp asymptotic, for the number of points of Cp in finite fields:

\[ |C_p(\mathbf{F}_{q^n})| = q^n + 1 - \sum_{1 \leq i \leq 2g} \alpha_i^n = q^n + O(q^{n/2}). \]
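As a sanity check on this point count, here is a minimal sketch for the genus 1 case, where the numerator of the zeta function is 1 − aT + qT² and the two reciprocal roots have sum a and product q. The curve y² = x³ + x + 1 over F5 and the function names are illustrative choices, not taken from the text; the power sums of the reciprocal roots are obtained from the standard recurrence s(n) = a·s(n−1) − q·s(n−2).

```python
import math


def count_points(p, a, b):
    """Points of y^2 = x^3 + a*x + b over F_p (p an odd prime of good
    reduction), including the single point at infinity."""
    # squares[r] = number of y in F_p with y^2 = r
    squares = [0] * p
    for y in range(p):
        squares[y * y % p] += 1
    affine = sum(squares[(x**3 + a * x + b) % p] for x in range(p))
    return affine + 1


def predicted_count(p, trace, n):
    """p^n + 1 - (alpha^n + beta^n), where alpha + beta = trace and
    alpha * beta = p; the power sums follow a two-term recurrence."""
    s_prev, s_cur = 2, trace
    for _ in range(n - 1):
        s_prev, s_cur = s_cur, trace * s_cur - p * s_prev
    return p**n + 1 - s_cur


p, a, b = 5, 1, 1  # hypothetical example curve
n1 = count_points(p, a, b)
trace = p + 1 - n1
print("points over F_p:", n1, "trace:", trace)
print("Weil bound |trace| <= 2 sqrt(p):", abs(trace) <= 2 * math.sqrt(p))
print("predicted counts over F_{p^n}, n = 1, 2, 3:",
      [predicted_count(p, trace, n) for n in (1, 2, 3)])
```

Only the count over the prime field is done by brute force; the counts over the extensions are read off from the zeta function, which is exactly the information the formula encodes.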

Theorem 4. (Weil) For all but finitely many p, the p-factor of the L-function of the Jacobian J(C) is equal to the polynomial P1 (in other words, the Hasse-Weil zeta function of J(C) is equal to that of the curve C).


Remark 1. It is useful to notice here that, in the statement of the Birch and Swinnerton-Dyer conjecture, it is not necessary to use the complete L-function: if S is any finite set of places of k, and LS(A, s) is the Euler product extended only to primes p not in S, then, since the Euler factors themselves do not vanish in C, the order of vanishing of L(A, s) is the same as that of LS(A, s). This is significant because in many cases it is easier to determine the L-function up to finitely many “bad” primes. Such a change is also usually innocuous from the analytic point of view.

Remark 2. As a rather humorous observation, it is possible to give a very quick and simple definition of the Jacobian, by an abstract universal property; fix a base point x ∈ C, then J(C) is characterized by the following: it is an abelian variety such that there is a morphism ι : C → J(C) sending x to 0 ∈ J(C), which is universal for morphisms of C into abelian varieties sending x to 0: for every abelian variety A/k and every morphism f : C → A with f(x) = 0, there is a unique morphism of abelian varieties ϕ : J(C) → A with ϕ ∘ ι = f, or in other words the diagram

\[ \begin{array}{ccc} C & \xrightarrow{\ \iota\ } & J(C) \\ & {\scriptstyle f}\searrow & \downarrow{\scriptstyle \varphi} \\ & & A \end{array} \]

commutes.

1.3 The modular curves and their Jacobians

The subject of this thesis is the study of the rank of some abelian varieties by analytic means. The only assumption that we are willing to admit at any point is the Birch and Swinnerton-Dyer conjecture itself, because we want to explore the possibilities it offers for such study. Furthermore, because our techniques will be analytic, it is pretty much necessary to consider an infinite family and not only an individual abelian variety. As mentioned in Section 1.1, the restriction to varieties for which the Hasse-Weil zeta function is known today to be entire limits our choice. The most intriguing, certainly, would be the family (or a large subfamily, say the semi-stable ones) of modular elliptic curves: the behavior of the rank of elliptic curves over Q, on average, is a subject of many guesses and wonder, and any unconditional result would be of great interest. Unfortunately, the current state of knowledge of the analytic properties of the L-functions of elliptic curves is not sufficient to approach this case unconditionally. So we turn to one of the other families: the Jacobians of modular curves.

It is now time to define those curves and discuss the properties of their Jacobians which make them more amenable to today’s techniques of analytic number theory.

They are the curves defined over Q which are usually denoted by X0(q). The parameter q, an integer q ≥ 1, is called the level (or conductor). We start as usual by discussing them over C, i.e. by considering them as Riemann surfaces. The theory is then very explicit.

Let H denote the Poincaré upper half-plane,

\[ \mathbf{H} = \{ z \in \mathbf{C} \mid z = x + iy,\ y > 0 \} \]

which is a model of the hyperbolic plane of constant negative curvature when equipped with the Riemannian metric

\[ ds^2 = \frac{dx^2 + dy^2}{y^2}. \]


The group $G = GL(2,\mathbf{R})^0$ of 2 by 2 matrices with real coefficients and positive determinant acts isometrically on H by linear fractional transformations, namely

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z = \frac{az + b}{cz + d}. \]

The discrete subgroup SL(2,Z) < G of integer matrices with determinant 1 acts properly discontinuously on H, and its quotient PSL(2,Z) by the center {±Id} acts faithfully. The Hecke congruence subgroup of level q, q ≥ 1 an integer, is the subgroup of SL(2,Z) defined by

\[ \Gamma_0(q) = \Big\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2,\mathbf{Z}) \ \Big|\ c \equiv 0 \bmod q \Big\} \]

(in what follows, when referring to an element γ ∈ G, we always mean by a, b, c, and d the four coordinates as in this matrix). Hence Γ0(1) = SL(2,Z) and Γ0(q) is a finite index subgroup of SL(2,Z), with

\[ [SL(2,\mathbf{Z}) : \Gamma_0(q)] = q \prod_{p \mid q} (1 + p^{-1}). \tag{1.8} \]
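The index formula can be checked numerically. The sketch below relies on a standard fact that is not stated in the text, namely that the cosets of Γ0(q) in SL(2,Z) are in bijection with the projective line over Z/qZ (a matrix is sent to its bottom row modulo q, up to units). The function names are illustrative.

```python
from math import gcd


def index_formula(q):
    """q * prod_{p | q} (1 + 1/p), computed in exact integer arithmetic."""
    result, m, p = q, q, 2
    while p * p <= m:
        if m % p == 0:
            result = result // p * (p + 1)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:  # one prime factor larger than sqrt(q) may remain
        result = result // m * (m + 1)
    return result


def projective_line_size(q):
    """Brute-force size of P^1(Z/qZ): pairs (c, d) mod q with
    gcd(c, d, q) = 1, counted up to multiplication by units mod q."""
    primitive = sum(1 for c in range(q) for d in range(q)
                    if gcd(gcd(c, d), q) == 1)
    units = sum(1 for u in range(1, q + 1) if gcd(u, q) == 1)
    return primitive // units


for q in (1, 2, 11, 12, 49):
    print(q, index_formula(q), projective_line_size(q))
```

The two columns printed for each q agree, and the brute-force count makes no use of the factorization of q.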

The Riemann surface Y0(q) is by definition the quotient space Y0(q) = Γ0(q)\H, with the complex structure induced by the one on H (except for some subtlety at the fixed points of some γ ∈ Γ0(q), which are however finite in number (modulo Γ0(q)), for which the uniformizer has to be different from the identity function z in order to avoid singularities, see [Sh1]). This quotient is a connected Riemann surface, but it is not compact, because of the presence of cusps. We only define what cusps are geometrically: the best known picture in the theory (see [Se1, ch. 7]) is that of the “standard” fundamental domain F for SL(2,Z) acting on H: let

\[ F = \{ z \in \mathbf{H} \mid -\tfrac12 < \mathrm{Re}(z) \leq \tfrac12,\ |z| \geq 1\ (> 1 \text{ if } \mathrm{Re}(z) < 0) \} \]

then F “represents” the quotient space SL(2,Z)\H. This means that for every z in H there exists a unique γ ∈ SL(2,Z) with γz ∈ F, and if z, w are in F and γ ∈ SL(2,Z) satisfies γz = w, then γ = 1.
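The existence part of this statement is effective. The following sketch uses the classical reduction procedure, which alternates the translation z ↦ z + 1 and the inversion z ↦ −1/z; it works in floating point and returns a point of the closure of the standard domain (|Re z| ≤ 1/2, |z| ≥ 1), without normalizing boundary points. The function name and the step limit are illustrative choices.

```python
import math


def reduce_to_fundamental_domain(z, max_steps=10_000):
    """Move z (with Im z > 0) into the closure of the standard fundamental
    domain using T: z -> z + 1 and S: z -> -1/z."""
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    for _ in range(max_steps):
        # Translate so that the real part lies in [-1/2, 1/2).
        z = complex(z.real - math.floor(z.real + 0.5), z.imag)
        if abs(z) >= 1:
            return z
        # For |z| < 1 the inversion strictly increases the imaginary part.
        z = -1 / z
    raise RuntimeError("reduction did not terminate within max_steps")


w = reduce_to_fundamental_domain(0.3 + 0.01j)
print(w, abs(w))
```

Termination rests on the fact that the imaginary part increases at each inversion while only finitely many points of an orbit lie above a given height.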

It is obvious that F is not compact, because the y coordinate in F can go to infinity: this infinity is interpreted as a point on the boundary $\mathbf{P}^1_{\mathbf{R}} = \mathbf{R} \cup \{\infty\}$ of H, and one says that ∞ is a cusp (the only cusp) of Y0(1). On the other hand, F has finite volume: since it is a hyperbolic triangle (with a vertex at infinity), the Gauss-Bonnet formula implies

\[ \mathrm{Vol}(F) = \frac{\pi}{3} \]

where the volume refers to the (hyperbolic) measure

\[ d\mu(z) = \frac{dx\,dy}{y^2} \]

which is invariant under the action of G. Now for any q ≥ 1, since Γ0(q) is of finite index in SL(2,Z), one sees easily that for any finite set (γj) of representatives of SL(2,Z)/Γ0(q), the set ⋃j γjF has the same properties of a fundamental domain for Γ0(q). Hence it has finitely many cusps, the images of ∞ by the γj, and finite volume

\[ \mathrm{Vol}(\Gamma_0(q) \backslash \mathbf{H}) = \frac{\pi}{3}\, [SL(2,\mathbf{Z}) : \Gamma_0(q)]. \]
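The value π/3 for the volume of the standard fundamental domain of SL(2,Z) can also be checked by direct integration, without the Gauss-Bonnet formula. The computation below is a sketch added for illustration; it uses the domain |Re(z)| ≤ 1/2, |z| ≥ 1 and the measure dx dy / y².

```latex
\begin{align*}
\mathrm{Vol}(F)
  &= \int_{-1/2}^{1/2} \int_{\sqrt{1-x^2}}^{\infty} \frac{dy}{y^2}\, dx
   = \int_{-1/2}^{1/2} \frac{dx}{\sqrt{1-x^2}} \\
  &= \arcsin\tfrac12 - \arcsin\bigl(-\tfrac12\bigr)
   = \frac{\pi}{6} + \frac{\pi}{6} = \frac{\pi}{3}.
\end{align*}
```

Multiplying by the index of Γ0(q) then gives the volume of Γ0(q)\H, since a fundamental domain for Γ0(q) is a union of that many translates of F.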

As a Riemann surface, X0(q) is obtained by compactifying Y0(q) by adjoining the cusps. It is useful to notice that the set of cusps, in this case, can be identified with $\mathbf{P}^1_{\mathbf{Q}} = \mathbf{Q} \cup \{\infty\}$ modulo the natural action of Γ0(q). The uniformizer at a cusp is deduced by conjugation from the one for the cusp at infinity, which is q = e(z) = exp(2πiz).

It is usually possible, as far as analysis goes, to work almost exclusively with Y0(q), looking at the behavior at the cusps when needed, or even with H and the group action.

The interest of number theorists in curves such as Y0(q) or X0(q) came originally from the remarkable properties of automorphic forms with respect to Γ0(q), which can be thought of as objects living on the quotient space. We only require the special case of forms of weight 2, but will give the definition for any (even) integer k ≥ 2. An automorphic form (also called a modular form) f of weight k and level q is a function f : H → C which satisfies the automorphy relation

\[ f(\gamma z) = (cz + d)^k f(z) \]

for all γ ∈ Γ0(q) and is holomorphic on H, and also at all cusps. We recall the meaning of the latter, for the cusp at infinity: from the automorphy relation for the special element (a generator of the stabilizer of ∞ in Γ0(q))

\[ \gamma = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \]

it follows that f(z + 1) = f(z). Hence there is a holomorphic function

\[ g : \{ z \in \mathbf{C} \mid 0 < |z| < 1 \} \longrightarrow \mathbf{C} \]

such that f(z) = g(e(z)) for all z ∈ H. The form f is said to be holomorphic at infinity if and only if g extends to a function holomorphic in the whole unit disc {z ∈ C | 0 ≤ |z| < 1}. Writing its Taylor expansion around 0, we conclude that f has a so-called Fourier expansion at infinity of the form

\[ f(z) = \sum_{n \geq 0} a_f(n) e(nz) \]

for some af(n) ∈ C. Then f is further said to vanish at infinity if af(0) = 0. A similar definition, reducing by conjugacy to this case, holds for all cusps.

Definition 3. The vector space Sk(q) of cusp forms of weight k and level q is the space of all automorphic forms f which vanish at all cusps.

The link with the compactified curve X0(q), and with its Jacobian, is first hinted at in the next proposition.

Proposition 2. Let q ≥ 1 be an integer. There is an isomorphism between the space of weight 2 cusp forms and the space Ω1(X0(q)) of holomorphic differentials on X0(q), given by

\[ S_2(q) \to \Omega^1(X_0(q)), \qquad f \mapsto f(z)\,dz. \]


Proof. For any holomorphic function f on H, the differential form f(z)dz is well-defined on H. But because of the chain-rule formula

\[ d(\gamma z) = (cz + d)^{-2}\,dz \]

for all γ ∈ SL(2,R), it follows that f is automorphic of weight 2 if and only if f(z)dz is invariant by Γ0(q), hence descends to a differential form on the quotient Y0(q):

\[ f(\gamma z)\,d(\gamma z) = f(z)\,dz. \]

Moreover, let

\[ f(z) = \sum_{n \geq 0} a_f(n) e(nz) = \sum_{n \geq 0} a_f(n) q^n \]

be the Fourier expansion of f at infinity. Then since dq/q = 2πi dz one has, in terms of the uniformizer q at infinity

\[ f(z)\,dz = \frac{1}{2\pi i} \Big( \sum_{n \geq 0} a_f(n) q^{n-1} \Big)\, dq \]

so that f(z)dz is holomorphic at infinity if and only if af(0) = 0. A similar reasoning holds at all other cusps. Conversely, starting from a holomorphic differential ω on X0(q), its pull-back to H must be of the form f(z)dz, and the same reasoning shows that f must be in S2(q), thereby finishing the proof. □
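The chain-rule formula used at the start of the proof is a one-line computation. It is spelled out here as a sketch, for γ with entries a, b, c, d and ad − bc = 1, together with the way it cancels the weight 2 automorphy factor.

```latex
\begin{align*}
\frac{d}{dz}\,\frac{az+b}{cz+d}
  &= \frac{a(cz+d) - c(az+b)}{(cz+d)^2}
   = \frac{ad-bc}{(cz+d)^2}
   = (cz+d)^{-2}, \\
f(\gamma z)\,d(\gamma z)
  &= (cz+d)^{2} f(z) \cdot (cz+d)^{-2}\,dz
   = f(z)\,dz .
\end{align*}
```

For weight k in place of 2 the same computation shows that the invariant object is f(z)(dz)^{k/2}, which is why weight 2 is the case attached to differentials.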

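In particular, S2(q) is finite dimensional, of dimension equal to the genus of X0(q), which is also the dimension of its Jacobian variety. This can be computed explicitly, and we quote the result from [Sh1], Propositions 1.40 and 1.43:

\[ \dim \Omega^1(X_0(q)) = \frac{1}{12}\,[SL(2,\mathbf{Z}) : \Gamma_0(q)] + 1 - \frac12 \sum_{d \mid q} \varphi\big((d, q/d)\big) - \frac{\nu_2(q)}{4} - \frac{\nu_3(q)}{3} \tag{1.9} \]

where ν2 and ν3 are defined by

\[ \nu_2(q) = \begin{cases} 0 & \text{if 4 divides } q, \\ \prod_{p \mid q} \Big(1 + \Big(\frac{-1}{p}\Big)\Big) & \text{otherwise,} \end{cases} \qquad \nu_3(q) = \begin{cases} 0 & \text{if 9 divides } q, \\ \prod_{p \mid q} \Big(1 + \Big(\frac{-3}{p}\Big)\Big) & \text{otherwise,} \end{cases} \]

and this simplifies, for q prime, to the simpler formula

\[ \dim \Omega^1(X_0(q)) = \frac{q + 1}{12} - \frac14 \Big(1 + \Big(\frac{-1}{q}\Big)\Big) - \frac13 \Big(1 + \Big(\frac{-3}{q}\Big)\Big). \tag{1.10} \]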

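Notice in particular that the genus tends to infinity as q tends to infinity, more precisely it grows like Vol X0(q):

\[ g = \frac{q}{12} \prod_{p \mid q} (1 + p^{-1}) + O_\varepsilon(q^{1/2 + \varepsilon}) = \frac{1}{4\pi} \mathrm{Vol}\, X_0(q) + O_\varepsilon(q^{1/2 + \varepsilon}) \tag{1.11} \]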


and for q prime the approximation is even better:

\[ g = \frac{q}{12} + O(1). \tag{1.12} \]

So for any fixed integer g ≥ 1, there are only finitely many values of q with X0(q) of genus g. Here is a table listing the genus for q = 1 and for the first few primes:

\[ \begin{array}{c|cccccccccccccc} q & 1 & 2 & 3 & 5 & 7 & 11 & 13 & 17 & 19 & 23 & 29 & 31 & 37 & 41 \\ \hline g & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 2 & 2 & 2 & 2 & 3 \end{array} \tag{1.13} \]
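The genus formula (1.9) is easy to evaluate by machine, which gives a way to reproduce the table. The sketch below is an illustration with hypothetical function names; it assumes the usual convention (as in Shimura's book) that the symbol (−1/p) is taken to be 0 for p = 2 and (−3/p) to be 0 for p = 3.

```python
from fractions import Fraction
from math import gcd


def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps


def euler_phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def genus_X0(q):
    """Genus of X0(q) from the index, the elliptic points and the cusps."""
    ps = prime_factors(q)
    index = Fraction(q)
    for p in ps:
        index *= Fraction(p + 1, p)

    def minus_one_symbol(p):   # (-1/p), set to 0 at p = 2
        return 0 if p == 2 else (1 if p % 4 == 1 else -1)

    def minus_three_symbol(p):  # (-3/p), set to 0 at p = 3
        return 0 if p == 3 else (1 if p % 3 == 1 else -1)

    nu2 = 0 if q % 4 == 0 else 1
    nu3 = 0 if q % 9 == 0 else 1
    for p in ps:
        nu2 *= 1 + minus_one_symbol(p)
        nu3 *= 1 + minus_three_symbol(p)
    cusps = sum(euler_phi(gcd(d, q // d))
                for d in range(1, q + 1) if q % d == 0)
    g = (index / 12 + 1 - Fraction(cusps, 2)
         - Fraction(nu2, 4) - Fraction(nu3, 3))
    assert g.denominator == 1  # the formula always yields an integer
    return int(g)


levels = [1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]
print([genus_X0(q) for q in levels])
print([q for q in range(1, 60) if genus_X0(q) == 1])
```

Exact rational arithmetic is used so that the integrality of the result serves as a built-in consistency check on the three correction terms.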

The smallest value of q for which the genus is 1 (so X0(q) is an elliptic curve) is q = 11, and the largest is q = 49. There are, in principle, algorithms to find explicit polynomial equations for X0(q) (but they rapidly become intractable). All this is quite classical, and was extensively studied in the early twentieth century by Fricke, Klein, Weber and many others. For q = 11, one finds the elliptic curve

\[ X_0(11) : y^2 = 4x^3 - \frac{4 \cdot 31}{3}\, x - \frac{41 \cdot 61}{27}; \]

the coefficients are rational, hence X0(11) is defined over Q. As we shall see a little later, this is a general fact.
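A quick consistency check on this equation: reading the coefficients as g2 = 4·31/3 and g3 = 41·61/27 in the Weierstrass form y² = 4x³ − g2 x − g3, the standard discriminant g2³ − 27 g3² should only involve the prime 11, the only prime of bad reduction. The sketch below uses exact rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

# Coefficients of y^2 = 4x^3 - g2*x - g3, read off from the equation of X0(11).
g2 = Fraction(4 * 31, 3)
g3 = Fraction(41 * 61, 27)

discriminant = g2**3 - 27 * g3**2
j_invariant = 1728 * g2**3 / discriminant

print("discriminant:", discriminant)
print("is -11^5:", discriminant == -11**5)
print("j-invariant:", j_invariant)
```

The denominator of the j-invariant is again a power of 11, in line with the expectation that the curve has good reduction away from the level.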

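The space S2(q) also acquires a Hilbert space structure with the Petersson inner product (f, g) defined by

\[ (f, g) = \int_{\Gamma_0(q) \backslash \mathbf{H}} f(z)\, \overline{g(z)}\, y^2\, d\mu(z) \]

(where the integration can also be performed on any fundamental domain in H for Γ0(q); the factor y² makes the integrand invariant under Γ0(q) for forms of weight 2). This is a very important analytical fact.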

It is possible to write down an explicit generating set of cusp forms. They are the Poincaré series Pm (m ≥ 1 an integer) defined by averaging over the group

\[ P_m(z) = \sum_{\gamma \in \Gamma_\infty \backslash \Gamma_0(q)} e(m \gamma z)\,(cz + d)^{-2} \tag{1.14} \]

where

\[ \Gamma_\infty = \Big\{ \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} \in SL(2,\mathbf{Z}) \ \Big|\ b \in \mathbf{Z} \Big\} \]

is the stabilizer of ∞ in Γ0(q).

That the Pm, m ≥ 1, span S2(q) can be proved rather more easily than one might expect, using decisively the Hilbert space structure (see [Iw2, ch. 3]). Unfortunately, this generating set is infinite and there is no simple way of finding the relations between its elements. Nevertheless, it can be very useful.

In the next chapter, we will see how automorphic forms can be linked more deeply to the arithmetic of X0(q). But before that we must show that this makes sense. It is again not obvious that the Riemann surfaces X0(q), or the automorphic forms of level q, are arithmetic in nature. That this is so is seen by means of their modular interpretation15: it is possible to interpret the points of Y0(q) as parameterizing isomorphism classes of elliptic curves over C (i.e. simply quotients C/Λ of C by a lattice) with some additional “level q structure”. Since it is possible to speak about elliptic curves defined over any subfield k ⊂ C, it is plausible that there is some arithmetic in this definition.

15 The word “modular”, which comes from this source, refers to the terminology used by Cayley for parameters that can be used to classify algebraic varieties, in particular algebraic curves.

Let E = C/Λ be an elliptic curve over C, q ≥ 1 an integer. A level q structure on E is a cyclic subgroup H of E of order q.

Proposition 3. Let q ≥ 1 be an integer. There is a bijection between Y0(q) and the set of isomorphism classes of elliptic curves with a level q structure (E,H), where we say that (E,H) is isomorphic to (E′, H′) if there exists an isomorphism f : E → E′ such that H′ = f(H). This bijection is induced by the map

\[ \tau \mapsto \big( \mathbf{C}/(\mathbf{Z} \oplus \tau\mathbf{Z}),\ \langle q^{-1} \rangle \big). \]

Proof. Consider first the case q = 1: then {0} is the only level 1 structure on any elliptic curve, so the set considered is simply the set, say Ell, of isomorphism classes of elliptic curves over C. Each is isomorphic to C/Λ for some lattice Λ ⊂ C. Moreover, C/Λ1 ≃ C/Λ2 if and only if there exists α ∈ C× such that Λ1 = αΛ2, where the multiplication of a lattice by a complex number is defined in the obvious way. Writing Λ = ω1Z ⊕ ω2Z, we have ω1Z ⊕ ω2Z = ω1(Z ⊕ τZ) for τ = ω2/ω1. Since ω1 and ω2 are R-linearly independent, Im(τ) ≠ 0. Switching ω1 and ω2 if necessary we thus see that every elliptic curve is isomorphic to one of the form Eτ = C/Λτ with Λτ = Z ⊕ τZ and τ ∈ H. This gives a surjective map H → Ell.

Now suppose Eτ ≃ Eτ′: this means there exists α ∈ C× satisfying

\[ \alpha (\mathbf{Z} \oplus \tau\mathbf{Z}) = \mathbf{Z} \oplus \tau'\mathbf{Z}, \]

in particular there exist integers a, b, c and d such that

\[ \begin{cases} \alpha = d + c\tau' \\ \alpha\tau = b + a\tau' \end{cases} \]

so, dividing out

\[ \tau = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tau'. \]

Exchanging the role of τ and τ′, we find that this integral matrix has an integral inverse. It follows that Eτ ≃ Eτ′ if and only if there exists γ ∈ SL(2,Z) with γτ = τ′ (the determinant can not be −1 because τ and τ′ are both in H). Hence the above map yields the bijection of the proposition when q = 1.

For q ≥ 2, the reasoning is the same, but it is necessary to keep track of the level q structure. First one shows that every (E,H) is isomorphic to one of the form (Eτ, ⟨q⁻¹⟩): let Λ be a lattice corresponding to E, take a generator ω of the subgroup H, and extend ω1 = qω, which is in Λ and non-zero by definition of a level q structure, to a basis (ω1, ω2) of Λ; then multiply by the inverse of ω1, so that ω is sent to q⁻¹.

Therefore there is again a surjective map from H to our set of elliptic curves with level q structures. But the condition for the isomorphism (Eτ, ⟨q⁻¹⟩) ≃ (Eτ′, ⟨q⁻¹⟩) is more stringent than before, and becomes that there exists γ in the subgroup Γ0(q) < SL(2,Z) such that γτ = τ′ (γ has to leave the subgroup ⟨q⁻¹⟩ invariant, which means that it has to be upper triangular modulo q). This completes the proof. □


This proposition makes the next one more believable.

Proposition 4. Let q ≥ 1 be an integer. The curve Y0(q) has a structure of a smooth affine algebraic curve defined over Q. The compactification X0(q) has a structure of a smooth projective algebraic curve over Q. Moreover, Y0(q) has the following modular interpretation: for every algebraically closed field k ⊂ C, the set Y0(q)(k) of k-valued points of Y0(q) is naturally isomorphic16 to the set of isomorphism classes of pairs (E,H), where E/k is an elliptic curve defined over k and H is a cyclic subgroup of order q of E which is defined over k. If q > 3, the same holds for any field k ⊂ C. The points of X0(q) have a similar interpretation, the points at the boundary being identified with generalized elliptic curves (certain non-smooth algebraic curves) with level q structures.

Remark. Actually, rather more is true: it is possible to define “integral models” of X0(q) defined over the ring Z[q⁻¹]; this is necessary to show that X0(q) has good reduction at all primes p not dividing q (instead of merely at all but finitely many primes), and is therefore important for the fine arithmetic properties of X0(q). But we have seen that for the Birch and Swinnerton-Dyer conjecture (in the form stated here, at least), this is not crucial.

As an example, take q = 1. The genus of X0(1) is 0 and $X_0(1)_{\mathbf{Q}} \simeq \mathbf{P}^1_{\mathbf{Q}}$; the isomorphism is given by the classical j-invariant map, which analytically is the holomorphic isomorphism induced by the modular function j (see [Se1] for instance) given by the formula (involving Ramanujan’s ∆-function and the Eisenstein series E4 of weight 4 for SL(2,Z))

\[ j : \mathbf{H} \to \mathbf{P}^1_{\mathbf{C}}, \qquad z \mapsto \frac{1728\,(60 E_4(z))^3}{\Delta(z)} = \frac{1}{q} + 744 + 196884\,q + 21493760\,q^2 + \dots \]

while for an elliptic curve E given by a Weierstrass equation (1.1), it is

\[ j(E) = \frac{1728\,(4a)^3}{\Delta(E)}. \]
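The first coefficients of the q-expansion of j can be recomputed with integer power series. The sketch below is an illustration under stated normalizations, which may differ from those of the displayed formula by constant factors: E4 = 1 + 240 Σ σ3(n) qⁿ, ∆ = q ∏ (1 − qⁿ)²⁴ and j = E4³/∆. The function names are hypothetical.

```python
def sigma3(n):
    """Sum of the cubes of the divisors of n."""
    return sum(d**3 for d in range(1, n + 1) if n % d == 0)


def multiply(a, b, N):
    """Product of two power series, truncated to N terms."""
    c = [0] * N
    for i, ai in enumerate(a[:N]):
        if ai:
            for j, bj in enumerate(b[:N - i]):
                c[i + j] += ai * bj
    return c


def j_expansion(N):
    """First N coefficients of j = E4^3 / Delta, the first entry being the
    coefficient of q^(-1)."""
    e4 = [1] + [240 * sigma3(n) for n in range(1, N)]
    e4_cubed = multiply(multiply(e4, e4, N), e4, N)
    # Delta / q = prod_{n >= 1} (1 - q^n)^24, by in-place multiplication.
    delta_over_q = [1] + [0] * (N - 1)
    for n in range(1, N):
        for _ in range(24):
            for k in range(N - 1, n - 1, -1):
                delta_over_q[k] -= delta_over_q[k - n]
    # Invert the power series Delta / q, whose constant term is 1.
    inverse = [1] + [0] * (N - 1)
    for k in range(1, N):
        inverse[k] = -sum(delta_over_q[i] * inverse[k - i]
                          for i in range(1, k + 1))
    return multiply(e4_cubed, inverse, N)


print(j_expansion(4))
```

Everything is done with exact integers, so the coefficients printed can be compared directly with 1, 744, 196884 and 21493760.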

Finally comes the definition we had been aiming at all along.

Definition 4. Let q ≥ 1 be an integer. The abelian variety J0(q) is the Jacobian of the curve X0(q); it is defined over Q, of dimension given by (1.9).

The Abel-Jacobi map X0(q) → J0(q) is always understood to take the cusp ∞, which is shown to be a Q-rational point, to 0. Hence it is also defined over Q.

Remark. Here and hereafter, the interest lies in large values of q; whenever J0(q) occurs, it is implicit that q is such that the genus of X0(q) is not zero (if q is prime, this means q = 11 or q ≥ 17). Otherwise, take simply J0(q) = 0 for the other values, and the statements should not be too much in error.

The main results of this thesis are the following two theorems, proved in collaboration with P. Michel. They will appear in three papers [KM1], [KM2], [KM3].

16 Naturally means that if k ⊂ k′, then the inclusion Y0(q)(k) ⊂ Y0(q)(k′) corresponds to the obvious inclusion of classes of pairs (E,H).


Theorem 5. Assume the Birch and Swinnerton-Dyer conjecture.17 There exists an absolute and effective constant C > 0 such that for any prime number q we have

\[ \mathrm{rank}\, J_0(q) \leq C \dim J_0(q). \]

Theorem 6. Without any hypothesis, for any ε > 0, and for any prime number q large enough in terms of ε, we have

\[ \mathrm{rank}\, J_0(q) \geq \Big( \frac{19}{54} - \varepsilon \Big) \dim J_0(q). \]

(The second result has been proved independently by J. Vanderkam [Vdk], with a smaller constant in place of 19/54.)

We will discuss the results previously known, and make a number of comments in the next chapter, after explaining the link between these statements and results of vanishing or non-vanishing of automorphic L-functions and their derivatives at the central critical point.

As a first remark related to the arithmetic of J0(q), recall that there exists a morphism (defined over Q) X0(q) → J0(q) taking the cusp ∞ to 0. This induces in particular a map from X0(q)(Q) to the Mordell-Weil group of J0(q). This might seem to be a good way of producing points on J0(q), hopefully of infinite order. But this hope is futile: although some rational points do indeed exist, Mazur showed that the only ones are among the cusps, except for very few exceptions that he described. Moreover, Manin and Drinfeld have proved that the images of the cusps are all torsion points of J0(q).

Bibliographical notes

The following references can be read as sources for the material in this chapter:

• [Si1] contains a simple introduction to the basic theory of elliptic curves, over any field, as well as deeper analysis of the arithmetic of elliptic curves.

• [C-S] contains one survey article by M. Rosen about the analytic theory of abelian varieties, and two by J. S. Milne about the algebraic theory and the theory of Jacobian varieties; the historical and bibliographical sketch at the end of Chapter 7 is very interesting. One discovers that the notion of abelian variety, as we know (or learn) it, is quite recent.

• [Mu1] starts with the discussion of the analytic theory and then goes to the algebraic theory.

• [Sh1], [Miy] are two good references for the analytic theory of modular curves; Shimura’s book has then much more about the more algebraic side, including the very important Eichler-Shimura theory described in Chapter 2. Edixhoven’s notes [Edx] are a very good introduction to many aspects of the modular curves X0(q), over Q and even Z[q⁻¹].

• [Iw2] covers automorphic forms, emphasizing the point of view of analytic number theory.

• [Kob] is a nice introduction to modular forms and the Birch and Swinnerton-Dyer conjecture, using the congruent number problem as fil rouge.

• [I-R], especially Chapters 19 and 20 of the second edition, is a very good survey of the Birch and Swinnerton-Dyer conjecture and of the progress made towards it, with special attention to Goldfeld’s work on the class number problem.

• Finally, the theory and language of schemes is now the necessary prerequisite to the deeper studies of algebraic geometry and its arithmetic applications; the standard book for this is [Har].

17 More precisely: that the rank be smaller than the order of vanishing of the L-function.


Chapter 2

The analytic side of the theorems

Deux ou trois choses que je sais d’L. . .
Un film de Jean-Luc Godard.

2.1 Reducing to modular forms

Why is it that the Jacobians of the modular curves X0(q) lend themselves to so precise an analysis of their rank, via the Birch and Swinnerton-Dyer conjecture, while other abelian varieties do not? The answer lies in their intimate relationship with cusp forms of weight 2. It has already been noticed that there is an isomorphism between the space Ω1(X0(q)) of holomorphic 1-forms, which is the analytic “building block” of the Jacobian, and the space S2(q) of weight 2 cusp forms of level q. This is an analytic link, but there is a much deeper arithmetic link which is the crucial starting point for the proof. This is provided by Eichler-Shimura theory, which computes exactly the Hasse-Weil zeta function of J0(q) in terms of some specific modular forms.

2.1.1 Hecke theory, primitive forms

We (very) briefly recall the basic theory of Hecke operators and Hecke forms. For any n ≥ 1, there is a Hecke operator T(n), which is an endomorphism of the Hilbert space S2(q). The subalgebra of End S2(q) generated by all T(n) is commutative, and moreover the operators T(n) with (n, q) = 1 are normal with respect to the Petersson inner product. Hence they are simultaneously diagonalizable. A Hecke form f ∈ S2(q) is defined to be a form which is a simultaneous eigenvector of all those operators T(n), n coprime with q, so there is a basis of S2(q) consisting of Hecke forms. But one would prefer having simultaneous eigenvectors of all the T(n), without exceptions: only the L-functions of those can have an Euler product and a good functional equation, for instance. Unfortunately, as was discovered by Hecke, there does not always exist a basis of S2(q) whose elements have this property. Atkin-Lehner theory, however, provides a canonical subspace (usually large) for which this is true. There is an orthogonal decomposition (for the Petersson inner product)

\[ S_2(q) = S_2(q)^{\mathrm{old}} \oplus S_2(q)^{\mathrm{new}} \]

where the space S2(q)old of oldforms is generated by all the forms of type f(z) = g(dz), where g is a Hecke form of level q/δ for some δ | q, δ > 1, and d is a divisor of δ, and the space S2(q)new is simply its orthogonal complement. It is shown that there is a canonical basis S2(q)* of this space, characterized by the fact that every f ∈ S2(q)* is an eigenvector of all the Hecke operators T(n), and its first Fourier coefficient is equal to 1. Such forms are called primitive forms (to emphasize the close analogy with primitive Dirichlet characters).


We now list notations and facts which will be used extensively in the sequel. Let f ∈ S2(q)* be given. We write λf(n) for its Hecke eigenvalues, which also give the Fourier expansion of f at infinity:

\[ f(z) = \sum_{n \geq 1} n^{1/2} \lambda_f(n) e(nz). \tag{2.1} \]

The λf(n) are real algebraic numbers, and the Fourier coefficients $n^{1/2}\lambda_f(n)$ are even algebraic integers. Extracting this factor $n^{1/2}$ from the Fourier expansion, although unsound from the algebraic point of view (for instance, the λf(n) generate an infinite extension of Q, whereas the Fourier coefficients generate a number field) is convenient for analysis, in particular when considering averages over forms of different weights.

Deligne’s bound (the former Ramanujan-Petersson conjecture) for the coefficients of holomorphic cusp forms takes the form

\[ |\lambda_f(n)| \leq \tau(n) \tag{2.2} \]

for all n ≥ 1, where τ(n) is the number of divisors of n (for weight 2 it is actually a corollary of the Eichler-Shimura formula (2.17) below and Weil’s proof of the Riemann Hypothesis for curves).

The Hecke L-function of f is

\[ L(f, s) = \sum_{n \geq 1} \lambda_f(n) n^{-s}; \tag{2.3} \]

by Deligne’s bound it is absolutely convergent for Re(s) > 1. Hecke proved the analytic continuation and functional equation: let

\[ \Lambda(f, s) = \Big( \frac{\sqrt{q}}{2\pi} \Big)^s\, \Gamma\big(s + \tfrac12\big)\, L(f, s) \tag{2.4} \]

be the completed L-function. Then Λ(f, s) (hence also L(f, s)) has analytic continuation to an entire function and satisfies the functional equation

\[ \Lambda(f, s) = \varepsilon_f\, \Lambda(f, 1 - s) \tag{2.5} \]

for some εf ∈ {±1} which is called the sign of the functional equation. Thus the critical line for L(f, s) is the line Re(s) = 1/2, and the center is at s = 1/2. The opposite −εf of the sign is also the eigenvalue of f for the Atkin-Lehner involution wq:

\[ (f \mid w_q)(z) := f\Big( -\frac{1}{qz} \Big) = -\varepsilon_f f(z). \]

If q is squarefree, εf can be computed from the Fourier coefficients by the following formula

\[ \varepsilon_f = -\mu(q)\, q^{1/2} \lambda_f(q). \tag{2.6} \]

We say that a primitive form f ∈ S2(q)* is odd (resp. even) if εf = −1 (resp. εf = 1). We will write

\[ \varepsilon_f^+ = \frac{1 + \varepsilon_f}{2}, \qquad \varepsilon_f^- = \frac{1 - \varepsilon_f}{2} \tag{2.7} \]

so $f \mapsto \varepsilon_f^+ f$ is, in the basis S2(q)* of S2(q)new, the projection operator of the space of primitive forms onto the space of even forms, and correspondingly with $\varepsilon_f^-$ for the odd ones. In particular, we have

\[ (\varepsilon_f^\pm)^2 = \varepsilon_f^\pm. \tag{2.8} \]


The fact that f is primitive implies that its L-function L(f, s) has an expression as an Euler product, which is of degree 2:

\[ L(f, s) = \prod_p \big( 1 - \lambda_f(p) p^{-s} + \varepsilon_q(p) p^{-2s} \big)^{-1} \tag{2.9} \]

where εq denotes the trivial Dirichlet character modulo q, and the product converges absolutely for Re(s) > 1. This Euler product representation is equivalent with the multiplicativity property of the coefficients λf(n): for any integers n ≥ 1, m ≥ 1,

\[ \lambda_f(n) \lambda_f(m) = \sum_{d \mid (n,m)} \varepsilon_q(d)\, \lambda_f\Big( \frac{nm}{d^2} \Big) \tag{2.10} \]

and in particular λf is a multiplicative arithmetic function: λf(nm) = λf(n)λf(m) for (n,m) = 1. If δ divides q, furthermore, one has for every integer m ≥ 1

\[ \lambda_f(\delta m) = \lambda_f(\delta) \lambda_f(m). \tag{2.11} \]

The formula (2.10), by Möbius inversion, yields another useful formula

\[ \lambda_f(nm) = \sum_{d \mid (n,m)} \varepsilon_q(d) \mu(d)\, \lambda_f\Big( \frac{n}{d} \Big) \lambda_f\Big( \frac{m}{d} \Big). \tag{2.12} \]

If p is a prime not dividing the level q, we factor the polynomial in the p-factor of L(f, s) as follows

\[ 1 - \lambda_f(p) X + X^2 = (1 - \alpha_p X)(1 - \beta_p X) \tag{2.13} \]

(and sometimes use αp(f), βp(f) when the dependence on f is important). The bound (2.2) is equivalent (for n coprime with the level) with the assertion that |αp| = |βp| = 1 for all p not dividing q. For p dividing the level, so the p-factor is of degree at most 1, we let αp = λf(p), which is shown to be of modulus at most 1 (actually, smaller), and βp = 0 (it is also possible that αp = 0), so the polynomial representing the p-factor is still expressed as (1 − αpX)(1 − βpX).
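These relations can be tested on the smallest example. The sketch below relies on a standard fact not stated in the text: for q = 11 the space S2(11) is one-dimensional, spanned by the primitive form with Fourier expansion q ∏ (1 − qⁿ)²(1 − q¹¹ⁿ)². In terms of the integers a(n) = n^{1/2} λf(n), the relation (2.10) reads a(n)a(m) = Σ ε(d) d a(nm/d²), which avoids square roots. Function names are illustrative.

```python
import math


def eta_product_level_11(N):
    """Coefficients a(0), ..., a(N) of q * prod (1 - q^n)^2 (1 - q^(11n))^2."""
    series = [0] * (N + 1)
    series[1] = 1  # the leading factor q
    for n in range(1, N + 1):
        for step in (n, n, 11 * n, 11 * n):
            if step > N:
                continue
            # multiply in place by (1 - q^step)
            for k in range(N, step - 1, -1):
                series[k] -= series[k - step]
    return series


N = 200
a = eta_product_level_11(N)


def trivial_character(d):
    """Trivial Dirichlet character modulo 11."""
    return 0 if d % 11 == 0 else 1


def hecke_relation_holds(n, m):
    """Integer form of the multiplicativity relation, valid for n * m <= N."""
    g = math.gcd(n, m)
    rhs = sum(trivial_character(d) * d * a[n * m // (d * d)]
              for d in range(1, g + 1) if g % d == 0)
    return a[n] * a[m] == rhs


print(a[1:14])
print(all(hecke_relation_holds(n, m)
          for n in range(1, 15) for m in range(1, 15)))
# Deligne's bound at primes p != 11, in the form a(p)^2 <= 4p.
print(all(a[p] ** 2 <= 4 * p for p in (2, 3, 5, 7, 13, 17, 19, 23)))
# Sign of the functional equation for squarefree level: -mu(11) * a(11).
print(-(-1) * a[11])
```

With a(11) = 1 the last line gives the sign +1, so this form is even in the terminology above.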

In addition we require the Dirichlet series expansion for the logarithmic derivative of L(f, s). From the Euler product, using the factorization of the local factors, it follows

\[ -\frac{L'}{L}(f, s) = \sum_{n \geq 1} b_f(n) \Lambda(n) n^{-s} \tag{2.14} \]

with coefficients given by

\[ b_f(n) = \begin{cases} 0, & \text{if } n \text{ is not a power of a prime,} \\ \alpha_p^m + \beta_p^m, & \text{if } n = p^m. \end{cases} \tag{2.15} \]
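The passage from the Euler product to this expansion is short enough to write out. The following is a sketch for Re(s) > 1, where all series converge absolutely, with αp and βp the local roots introduced above and Λ the von Mangoldt function.

```latex
\begin{align*}
\log L(f,s)
  &= -\sum_p \Bigl( \log(1 - \alpha_p p^{-s}) + \log(1 - \beta_p p^{-s}) \Bigr)
   = \sum_p \sum_{m \geq 1} \frac{\alpha_p^m + \beta_p^m}{m}\, p^{-ms}, \\
-\frac{L'}{L}(f,s)
  &= \sum_p \sum_{m \geq 1} (\alpha_p^m + \beta_p^m)\,(\log p)\, p^{-ms}
   = \sum_{n \geq 1} b_f(n)\, \Lambda(n)\, n^{-s}.
\end{align*}
```

The second line follows from the first by differentiating term by term, since the derivative of p^{-ms} with respect to s is −m (log p) p^{-ms} and Λ(p^m) = log p.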

Finally, assume that q is a prime number. Then, since X0(1) is of genus 0 (1.13), we have S2(1) ≃ Ω1(X0(1)) = 0, and hence S2(q)old = 0, S2(q) = S2(q)new. The set of primitive forms S2(q)* is therefore a basis of the whole space S2(q), and

\[ |S_2(q)^*| = \dim S_2(q) = \dim J_0(q). \]

This is the main simplification arising from the assumption that q is prime in Theorems 5 and 6; see also the Remark below.


We introduce the notation

\[ \omega_f = \frac{1}{4\pi (f, f)} \tag{2.16} \]

for any non-zero cusp form f ∈ S2(q) (we call this the “harmonic weight”) and define the summation symbol $\sum^h$ by

\[ \sum^{h}_{f \in S_2(q)^*} \alpha_f = \sum_{f \in S_2(q)^*} \omega_f \alpha_f \]

for any family (αf) of complex numbers. Because (f, f) is of size about Vol X0(q), so about g = dim J0(q), this behaves like a probability measure, i.e. we have

\[ \sum^{h}_{f \in S_2(q)^*} 1 \sim 1 \]

as q tends to infinity (see the Petersson formula (2.29) for m = n = 1).

2.1.2 Eichler-Shimura theory and corollaries

Eichler-Shimura theory computes the L-function of certain quotients of the Jacobians of modular curves by means of the L-functions of modular forms. A particular case applies to J0(q) when q is prime.

Theorem 7. Let q be a prime number. The L-function of the Jacobian variety J0(q) is given by¹

\[ L(J_0(q), s) = \prod_{f \in S_2(q)^*} L\big(f, s - \tfrac12\big). \tag{2.17} \]

Remark. Actually, Eichler and Shimura only proved that for all but finitely many p the p-factors of the Euler products on both sides coincide. Igusa showed that the exceptional p had to be among those dividing the level q; the fact that even those p-factors are the same is due to Carayol. Notice however (as mentioned in the previous chapter) that it would not matter to the applications to the Birch and Swinnerton-Dyer conjecture, in the form we take it, if the equality was known only for all but finitely many primes.

This factorization is analogous to the factorization of the Dedekind zeta function of the cyclotomic field K = Q(ζn) (generated by the n-th roots of unity in C) in terms of Dirichlet L-functions,

\[ \zeta_K(s) = \prod_{\chi \bmod n} L(\chi, s). \]

A word about the proof of (2.17): it proceeds one prime at a time, as is natural. The main step is to find a relation between the Frobenius automorphism at p (and its powers) acting on the curve X0(q) modulo p and the Hecke operator Tp. This requires the geometric interpretation of the Hecke operators, as well as the modular interpretation of X0(q).
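For q = 11 the Jacobian is the elliptic curve X0(11) itself, so the factorization can be tested prime by prime: the p-th Fourier coefficient of the unique primitive form of level 11 should equal p + 1 minus the number of points of the curve over Fp. The sketch below assumes two standard facts not stated in the text: the primitive form is q ∏ (1 − qⁿ)²(1 − q¹¹ⁿ)², and X0(11) has the integral model y² + y = x³ − x² − 10x − 20 with good reduction at every prime other than 11. Function names are illustrative.

```python
def eta_product_level_11(N):
    """Coefficients a(0), ..., a(N) of q * prod (1 - q^n)^2 (1 - q^(11n))^2."""
    series = [0] * (N + 1)
    series[1] = 1  # the leading factor q
    for n in range(1, N + 1):
        for step in (n, n, 11 * n, 11 * n):
            if step > N:
                continue
            # multiply in place by (1 - q^step)
            for k in range(N, step - 1, -1):
                series[k] -= series[k - step]
    return series


def count_points_X0_11(p):
    """Points over F_p (p != 11) of y^2 + y = x^3 - x^2 - 10x - 20,
    including the point at infinity."""
    lhs_counts = [0] * p
    for y in range(p):
        lhs_counts[(y * y + y) % p] += 1
    affine = sum(lhs_counts[(x**3 - x**2 - 10 * x - 20) % p]
                 for x in range(p))
    return affine + 1


primes = [2, 3, 5, 7, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
a = eta_product_level_11(60)
for p in primes:
    print(p, a[p], p + 1 - count_points_X0_11(p))
```

The two columns agree for every prime in the list; the modular form side needs only power series arithmetic, while the geometric side is a brute-force count.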

We deduce from the theorem:

1 The shift of 1/2 occurs because of the normalization chosen for the Hecke L-functions, which have the critical line at Re(s) = 1/2.


Corollary 1. Let q be a prime number. The order of vanishing of the L-function of J0(q) at s = 1 is the sum of the order of vanishing of the Hecke L-functions at s = 1/2:

\[ \mathrm{ord}_{s=1} L(J_0(q), s) = \sum_{f \in S_2(q)^*} \mathrm{ord}_{s=1/2} L(f, s). \]

If the Birch and Swinnerton-Dyer conjecture holds, then

\[ \mathrm{rank}\, J_0(q) = \sum_{f \in S_2(q)^*} \mathrm{ord}_{s=1/2} L(f, s). \]

This is of great consequence: we see that what the Birch and Swinnerton-Dyer conjecture purports to be the rank of J0(q) appears as an average of the order of vanishing of automorphic L-functions at the central critical point. Now, the study of the average order of vanishing of various families of L-functions is an old and esteemed part of analytic number theory: this particular case could well be studied independently of any motivation from diophantine geometry, although it does acquire stronger status, and a definite cachet, from this association.

Brumer was the first to study the rank of J0(q) along those lines [Br1]. He proposed the following conjecture:

Conjecture 2. As q tends to infinity, the rank of J0(q) satisfies the asymptotic

\[ \mathrm{rank}\, J_0(q) \sim \frac12 \dim J_0(q). \]

This was justified by a simple heuristic, based on the signs of the functional equations for the automorphic L-functions: from (2.5) and the non-vanishing of the Gamma factor at $s = \tfrac12$, it follows that the parity of the order of vanishing of $L(f, s)$ is the same as that of $f$. So whenever $\varepsilon_f = -1$, the L-function has a zero of order at least 1, and consequently

$$\sum_{f \in S_2(q)^*} \operatorname{ord}_{s=1/2} L(f, s) \geq |\{f \in S_2(q)^* \mid \varepsilon_f = -1\}|. \tag{2.18}$$

But it is quite easy to prove that there are about as many odd forms as even ones: for instance when $q$ is squarefree, from the formula (2.6) the difference between the number of even forms and that of odd forms is essentially the trace of the Hecke operator $T(q)$, and one can appeal to the Selberg trace formula (or make a direct computation using the modular interpretation of $X_0(q)$ and the Lefschetz trace formula instead; see also the Petersson formula (2.29) for $m = 1$, $n = q$). Hence, on the Birch and Swinnerton-Dyer conjecture, the lower bound

$$\operatorname{rank} J_0(q) \geq \left(\tfrac12 + o(1)\right) \dim J_0(q)$$

is immediate. Now the heuristic alluded to is that an L-function will only vanish with “good reasons”, such as the one imposed by the sign of the functional equation. In their overwhelming majority, the forms $f$ should behave in such a way that the order of vanishing of $L(f, s)$ at $s = \tfrac12$ is the smallest compatible with this symmetry condition: if $f$ is even, $L(f, \tfrac12)$ should be non-zero, and if $f$ is odd, the derivative $L'(f, \tfrac12)$ should be non-zero. If this is true, then the conjecture follows.


Assuming in addition the Generalized Riemann Hypothesis for the $L(f, s)$, Brumer proved, as a partial confirmation of this heuristic, that the order of magnitude of the rank was not larger than predicted: in the simplest case with which we are concerned here, for $q$ prime, he obtained the bound

$$\sum_{f \in S_2(q)^*} \operatorname{ord}_{s=1/2} L(f, s) \leq \left(\tfrac32 + o(1)\right) \dim J_0(q),$$

and even managed to improve this on average over $q$. There has been much progress in this direction recently: the constant $\tfrac32$ was improved to $\tfrac{23}{22}$ in [KM1], then (unpublished) to 1 (assuming moreover the Riemann Hypothesis for Dirichlet L-functions), but Luo-Iwaniec-Sarnak [LIS] have gone much further to obtain a constant $c < 1$.

Going beyond 1 turns out to be quite significant: in a larger picture, it means getting beyond the first discontinuity of the Fourier transform of the density of the pair-correlation measure, and this is very strong evidence in favor of the general conjectures of Katz-Sarnak about the distribution of zeros of families of L-functions [K-S], [Sar].

In view of Theorem 7, and since it allows the use of the Birch and Swinnerton-Dyer conjecture, in order to prove the upper bound for $\operatorname{rank} J_0(q)$ of Theorem 5 it suffices to establish the following theorem where all mention of algebraic geometry has disappeared.

Theorem 8. There exists an absolute and effective constant $C > 0$ such that for any prime number $q$ we have

$$\sum_{f \in S_2(q)^*} \operatorname{ord}_{s=1/2} L(f, s) \leq C \dim J_0(q).$$

In contrast to this, the lower bound claimed in Theorem 6 scorns the assistance of the Birch and Swinnerton-Dyer conjecture (as it had better do, since the simple consideration of the signs of the functional equations of Hecke L-functions would imply it instantly otherwise) and the beautiful work of Gross and Zagier in the rank 1 case is called for to bring such a translation to analytic number theory.

2.1.3 The Gross-Zagier formula and consequences

The following discussion of the remarkable formula of Gross-Zagier [G-Z] is clearly biased towards its application to the rank of $J_0(q)$, which is our primary concern, as it seems to clarify the flow of the argument. Also we sketch only the result required for this special case, which is far from being the most general considered in [G-Z].

We are looking for a lower bound for $\operatorname{rank} J_0(q)$: this means we want to prove the existence of many independent rational points on $J_0(q)$. Using the modular interpretation of $X_0(q)$, and the group law on the Jacobian, a rather large supply of points will be found, one for each $f \in S_2(q)^*$, say $y_f$, but only defined over some auxiliary quadratic extension $K/\mathbf{Q}$, there being infinitely many choices of $K$ leading to different points $y_f$. The $y_f$ are independent in $J_0(q)(K)$ if they are not torsion points, and a criterion is found to ensure this. It is stated, agreeably, in terms of non-vanishing of the derivative of the L-function of $f$ (more precisely, of a lift of $f$ to $K$). Finally for $f$ odd with $L'(f, \tfrac12) \neq 0$, it is shown that $y_f + \overline{y_f}$ is a rational point which is still of infinite order. This gives the reduction to a non-vanishing problem for special values of L-functions.

First we construct the points $y_f$, which are called Heegner points (see [Gro] for more details). We first appeal to the modular interpretation of $X_0(q)$ to construct some points in $X_0(q)$. Let $K$ be an imaginary quadratic field of discriminant $D < 0$ (see footnote 2) such that the prime $q$ splits in $K$. This condition, in terms of the Kronecker symbol $\chi_D$, is simply $\chi_D(q) = 1$. This is a congruence condition on $D$, so there exist infinitely many quadratic fields $K$ to which this is applicable. Then $q$ factorizes in $K$, $q\mathcal{O}_K = \mathfrak{p}\overline{\mathfrak{p}}$, for some prime ideal $\mathfrak{p}$ of $\mathcal{O}_K$. It follows that $\mathcal{O}_K/\mathfrak{p} \simeq \mathbf{Z}/q\mathbf{Z}$.
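To make the splitting condition $\chi_D(q) = 1$ concrete, here is a minimal Python sketch that lists the fundamental discriminants $D < 0$, up to a given bound, in which an odd prime $q$ splits. It assumes that $q$ is an odd prime, so that the Kronecker symbol reduces to a Legendre symbol evaluated by Euler's criterion. The function names (`is_fundamental`, `splits`, `heegner_discriminants`) are illustrative choices and do not come from the text.

```python
# Sketch only: q is assumed to be an odd prime; all helper names are hypothetical.

def is_squarefree(n):
    """True if no square d*d with d >= 2 divides n (n nonzero)."""
    n = abs(n)
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return n != 0


def is_fundamental(D):
    """D = d with d = 1 mod 4 squarefree, or D = 4d with d = 2, 3 mod 4 squarefree."""
    if D % 4 == 1:
        return D != 1 and is_squarefree(D)
    if D % 4 == 0:
        d = D // 4
        return d % 4 in (2, 3) and is_squarefree(d)
    return False


def splits(D, q):
    """chi_D(q) = 1 for an odd prime q: D is a nonzero square mod q (Euler's criterion)."""
    return pow(D % q, (q - 1) // 2, q) == 1


def heegner_discriminants(q, bound):
    """Fundamental discriminants D with -bound <= D < 0 in which q splits."""
    return [D for D in range(-3, -bound - 1, -1)
            if is_fundamental(D) and splits(D, q)]


print(heegner_discriminants(11, 24))
```

For $q = 11$ and $|D| \leq 24$ the sketch prints `[-7, -8, -19, -24]`. The discriminant $D = -11$ is excluded because $q$ ramifies there, which is consistent with the requirement that $q$ split.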

As an abelian group, $\mathcal{O}_K$ is a free $\mathbf{Z}$-module of rank 2, so it can be thought of as a lattice in $\mathbf{C}$, and the quotient group $\mathbf{C}/\mathcal{O}_K$ is therefore an elliptic curve. This is an example of what are called elliptic curves with complex multiplication, or CM elliptic curves: those are the curves $E$ with the property that the ring of endomorphisms of $E$ is larger than the subring $\mathbf{Z}$ which is obtained by considering the endomorphisms $x \mapsto nx$ of multiplication by integers $n \in \mathbf{Z}$. Here, obviously, $\operatorname{End}(E)$ is isomorphic to $\mathcal{O}_K$. All CM curves can be obtained by a process slightly more general than the one considered here.

Now, a priori $\mathbf{C}/\mathcal{O}_K$ is defined over $\mathbf{C}$ only, but from the theory of complex multiplication ([Si2, II], or [Sh1]) we see that $\mathbf{C}/\mathcal{O}_K$ is in fact defined over the Hilbert class field $H_K$ of $K$, which we recall is the largest abelian extension of $K$ which is everywhere unramified over $K$. This field, by Class Field Theory, has degree $h = h(D)$ over $K$, and more precisely there is a canonical isomorphism (the Artin map)

$$\sigma : H(K) \longrightarrow \operatorname{Gal}(H_K/K)$$

between the ideal class-group $H(K)$ of $K$ and the Galois group of $H_K$ over $K$, which is induced from the map on fractional ideals defined by multiplicativity from $\sigma(\mathfrak{p}) = \operatorname{Fr}_{\mathfrak{p}}$ for every prime ideal $\mathfrak{p}$ of $\mathcal{O}_K$, $\operatorname{Fr}_{\mathfrak{p}}$ being the Frobenius automorphism at $\mathfrak{p}$, which is well-defined for every $\mathfrak{p}$ because the extension is abelian and everywhere unramified.

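Let $\mathfrak{p}$ be again one of the two prime ideals dividing the principal ideal $q\mathcal{O}_K$, and let $\mathfrak{p}^{-1}$ be the fractional ideal in $K$ inverse of $\mathfrak{p}$ ($\mathfrak{p}^{-1} = \{x \in K \mid xy \in \mathcal{O}_K \text{ for all } y \in \mathfrak{p}\}$). Then $\mathfrak{p}^{-1}$ is another lattice in $\mathbf{C}$, which obviously contains $\mathcal{O}_K$, and from $\mathcal{O}_K/\mathfrak{p} \simeq \mathbf{Z}/q\mathbf{Z}$ it follows that $\mathfrak{p}^{-1}/\mathcal{O}_K \simeq \mathbf{Z}/q\mathbf{Z}$. Hence, by Proposition 3, the pair $(\mathbf{C}/\mathcal{O}_K, \mathfrak{p}^{-1}/\mathcal{O}_K)$ defines a point $x_K \in X_0(q)$ (actually, $Y_0(q)$). Again, the theory of complex multiplication shows that this point is in $Y_0(q)(H_K)$.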

This gives us algebraic points on the modular curve, but the field $H_K$ has large degree over $\mathbf{Q}$ (by Siegel's theorem, $[H_K : \mathbf{Q}] = 2h(D) \gg_\varepsilon |D|^{1/2 - \varepsilon}$ for any $\varepsilon > 0$, the implied constant being ineffective). From an algebraic number $x \in \overline{\mathbf{Q}}$, a way of getting a rational number is to take the trace: by Galois theory, the sum $\sum_\sigma x^\sigma$ of all distinct conjugates of $x$ in $\overline{\mathbf{Q}}$ is a rational number. We cannot do the same on $Y_0(q)$, where the sum doesn't make sense, but of course we can do it on the Jacobian variety! Let $\iota : X_0(q) \to J_0(q)$ be the Abel-Jacobi map (with $\iota(\infty) = 0$). Then the Heegner point associated to $K$ is

$$y_K = \sum_{\sigma \in \operatorname{Gal}(H_K/K)} \iota(x_K)^\sigma \in J_0(q)(\mathbf{C}) \tag{2.19}$$

Footnote 2: So $K = \mathbf{Q}(\sqrt{d})$ for some squarefree integer $d < 0$, and $D = d$ if $d \equiv 1 \bmod 4$, $D = 4d$ otherwise.


which, by linearity of the action of Galois on points of $J_0(q)$ and Galois theory, is actually a point $y_K \in J_0(q)(K)$. Thus, we have descended to a quadratic extension of $\mathbf{Q}$, but by doing this trace, we might have unfortunately lost everything, if $y_K$ turns out to be of finite order, and this is certainly possible.

For each $K$ (satisfying the conditions described before), we thus obtain a single point $y_K$. But it is possible to “split” it wondrously, producing one point for each $f \in S_2(q)^*$, by decomposing $y_K$ into its components under the action of the Hecke algebra on the Jacobian $J_0(q)$. More precisely, let $\mathbf{T}$ be the subalgebra of $\operatorname{End}(S_2(q))$ generated by all Hecke operators $T(n)$, $(n, q) = 1$. Recall the isomorphisms

$$S_2(q) \simeq \Omega^1(X_0(q))$$

and (see (1.7))

$$\Omega^1(X_0(q)) \simeq T_0^* J_0(q),$$

which show that $\mathbf{T}$ acts on the cotangent space at 0 of the Jacobian variety. In fact, this action is induced (by differentiation) from an action of $\mathbf{T}$ on the Jacobian variety itself, in a way compatible with these isomorphisms: in the description of the points of $J_0(q)$ given by the Abel-Jacobi theorem, as divisor classes, and with the identification of points on $X_0(q)$ as pairs $(E, H)$ of an elliptic curve with a level $q$ structure, the definition is given by

$$T(n)(\iota(E, \langle x \rangle)) = \sum_G \iota(E/G, \langle \bar{x} \rangle)$$

where the sum runs over all subgroups $G$ of order $n$ in $E$ which have trivial intersection with $H = \langle x \rangle$, the cyclic subgroup defining the level $q$ structure (so if $(n, q) = 1$, $G$ runs over all subgroups of order $n$), and $\bar{x}$ is the image in $E/G$ of the generator $x$ of $H$, which is still of order $q$ exactly (for more information on this, see [Edx] for instance).

Moreover (and this is quite clear from this last description) this action of $\mathbf{T}$ is rational over $\mathbf{Q}$; in particular, it induces an action on the finite dimensional $\mathbf{C}$-vector space $V = J_0(q)(K) \otimes \mathbf{C}$ obtained from the Mordell-Weil group of $J_0(q)$ over $K$.

Hence, for any algebra homomorphism $\lambda : \mathbf{T} \longrightarrow \mathbf{C}$, one can define the $\lambda$-isotypical component $V^\lambda$ of $V$:

$$V^\lambda = \{y \in V \mid T(y) = \lambda(T)y \text{ for all } T \in \mathbf{T}\}.$$

For instance, a Hecke form $f$ gives rise to a morphism $\lambda_f$ characterized by

$$\lambda_f(T(n)) = \lambda_f(n)$$

(remember that the $T(n)$ generate $\mathbf{T}$).

It follows from the existence of the height pairing (described below), for which the action of $\mathbf{T}$ is self-adjoint, that there is a direct sum decomposition

$$V = \bigoplus_\lambda V^\lambda \tag{2.20}$$

of $V$ into its isotypical components. In particular, given the point $y_K \in J_0(q)(K)$, and the corresponding element $y_K \otimes 1 \in V$, one can define the point $y_f \in V$, for any primitive form $f \in S_2(q)^*$, as being the $\lambda_f$-isotypical component of $y_K$ in this decomposition.


For us, the Gross-Zagier formula will give a criterion, in terms of L-functions, by which to determine whether a given $y_f$ is non-zero, hence contributes to the rank of $J_0(q)$. This is done by computing the canonical height of $y_f$. In general, for any abelian variety $A/k$ over a number field, Néron and Tate have constructed a bilinear pairing

$$\langle \cdot, \cdot \rangle : A(\bar{k}) \times A(\bar{k}) \longrightarrow \mathbf{C}$$

which is positive, and has the property that the height $h(x) = \langle x, x \rangle$ of an algebraic point $x \in A(\bar{k})$ vanishes if and only if $x$ is a torsion point (see [C-S], or [Si1] for the case of elliptic curves). Hence, for any extension field $k'$ of $k$, this Néron-Tate form induces a positive definite hermitian form on the vector space $A(k') \otimes \mathbf{C}$. Now we can state the Gross-Zagier formula.

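Theorem 9. (Gross-Zagier). With notations as above, let $f \in S_2(q)^*$ be given. Put

$$L_K(f, s) = L(f, s) L(f \otimes \chi_D, s).$$

Then

$$L_K'(f, \tfrac12) = \frac{8\pi^2 (f, f)}{h w^2 |D|^{1/2}}\, h(y_f). \tag{2.21}$$

Here $2w$ is the number of units in $K$, the $h$ in the denominator is the class number $h = h(D)$, and $h(y_f)$ is the height of $y_f$.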

Suppose now that $f \in S_2(q)^*$ is odd. Then, by the functional equation, the special value $L(f, \tfrac12)$ vanishes, and for any discriminant $D$ with $\chi_D(q) = 1$, the Gross-Zagier formula applies and gives

$$L_K'(f, \tfrac12) = L'(f, \tfrac12) L(f \otimes \chi_D, \tfrac12) = \frac{8\pi^2 (f, f)}{h w^2 |D|^{1/2}}\, h(y_f). \tag{2.22}$$

Now, it is known that among the discriminants $D$ with $\chi_D(q) = 1$ there exist infinitely many with

$$L(f \otimes \chi_D, \tfrac12) \neq 0.$$

This was proved first by Waldspurger, as a consequence of his celebrated theorem relating this special value of the quadratic twist $L(f \otimes \chi_D, s)$ to (the square of) a Fourier coefficient of a half-integral weight modular form related to $f$ by the Shimura lift, but it is much simpler to derive it from scratch by averaging techniques, similar in principle to the ones applied in Chapters 5 and 6 (if only the existence of those discriminants is sought, it is possible to give a very clean and short proof, see [Iw1]).

Fix a discriminant $D$ with this property. Then the formula (2.22), and the definiteness of the height pairing $h$, imply that $y_f \in V$ is non-zero if and only if $L'(f, \tfrac12) \neq 0$.

Moreover, a simple argument with Heegner points (see [Gro]) shows that the Atkin-Lehner involution $w_q$, which acts also on $J_0(q)$, acts on the Heegner points like complex conjugation, and it respects the decomposition into isotypical components.

In particular, since an odd form is an eigenfunction of $w_q$ with eigenvalue $-\varepsilon_f = 1$, the $f$-isotypical component of the $\mathbf{Q}$-rational point $y_K + \overline{y_K}$ in the vector space $W = J_0(q)(\mathbf{Q}) \otimes \mathbf{C} \subset V$ is $2y_f$, which means that $y_f$ is actually in the subspace $W$. From the decomposition of $W$ into its isotypical components we see that the Gross-Zagier formula implies the unconditional inequality

$$\begin{aligned}
\operatorname{rank} J_0(q) = \dim_{\mathbf{C}} W &\geq |\{f \in S_2(q)^* \mid W^f \neq 0\}| \quad \text{(obviously)} \\
&\geq |\{f \in S_2(q)^* \mid \varepsilon_f = -1 \text{ and } L'(f, \tfrac12) \neq 0\}| \\
&= |\{f \in S_2(q)^* \mid L(f, \tfrac12) = 0,\ L'(f, \tfrac12) \neq 0\}|
\end{aligned}$$


since $L(f, \tfrac12) = 0$ for any odd form $f$, and the order at $\tfrac12$ of an even form is even.

So Theorem 6 is a consequence of the following non-vanishing theorem for the derivatives of automorphic L-functions, and again all algebraic geometry has silently vanished.

Theorem 10. For any $\varepsilon > 0$, and for any prime $q$ large enough in terms of $\varepsilon$, we have

$$|\{f \in S_2(q)^* \mid L(f, \tfrac12) = 0,\ L'(f, \tfrac12) \neq 0\}| \geq \left(\tfrac{19}{54} - \varepsilon\right) \dim J_0(q).$$

Remark 1. This theorem has been proved independently by J. Vanderkam [Vdk], using a different method, but with a smaller constant. Before, the same non-vanishing problem was considered by Duke in [Du1], who obtained the lower bound

$$|\{f \in S_2(q)^* \mid L(f, \tfrac12) = 0,\ L'(f, \tfrac12) \neq 0\}| \gg \frac{q}{(\log q)^{10}}$$

with some absolute implied constant. Of course, one can also consider the non-vanishing of the values $L(f, \tfrac12)$ themselves: see Remark 3 about this problem.

Remark 2. The proof of this theorem can be extended immediately to apply to higher even weights $k$ such that there are no weight $k$ forms of level 1, namely $k \leq 10$ and $k = 14$. It can also be made to apply for weight 12 or $\geq 16$, by taking into account, by inclusion-exclusion, the non-primitive forms arising from level 1. It is then also possible, for all weights including 2, to go beyond prime levels $q$ (at least to squarefree levels). This becomes however technically very involved. For the non-vanishing of the special values $L(f, \tfrac12)$, this is done in the forthcoming work of Iwaniec and Sarnak [IS1], [IS2]. An extension of the Theorem with respect to the level is actually, in a sense, of more interest than the extension to higher weights, because the latter doesn't have an interaction with algebraic geometry similar to that offered by the Birch and Swinnerton-Dyer conjecture.

Remark 3. Numerically, one has $\tfrac{19}{54} = 0.35\ldots$ Notice that the only forms $f$ to contribute in the theorem are the odd ones, although we state the density among all primitive forms. Therefore we say, with an obvious abuse of language, that “at least 70 percent” of the odd forms have a zero of order exactly 1 at the critical point. This compares quite favorably with the heuristic behind Brumer's conjecture. There doesn't seem to be any significance attached to the value of the constant, and this is in sharp contradistinction with the corresponding problem of non-vanishing of the values of the L-functions themselves: in the recent work of Iwaniec and Sarnak [IS1] (see also [Sar]), it is shown (among other things) that for any $\varepsilon > 0$ and $q$ large enough in terms of $\varepsilon$, one has

$$|\{f \in S_2(q)^* \mid L(f, \tfrac12) \geq (\log q)^{-2}\}| \geq \left(\tfrac14 - \varepsilon\right) \dim J_0(q), \tag{2.23}$$

which says in particular, for the same reason as before, that about half of the L-functions of even forms do not vanish at $\tfrac12$ (and are even as large as $(\log q)^{-2}$; implicit here is the use of the remarkable fact that $L(f, \tfrac12) \geq 0$: this, which of course follows from the Riemann Hypothesis, is true unconditionally by the same theorem of Waldspurger previously mentioned). Furthermore, they show (see footnote 3) that if one could prove (2.23) with any better constant $c > \tfrac14$, then an effective lower bound

$$L(1, \chi_D) \gg (\log D)^{-2}$$

would follow for the value at 1 of the L-function of real primitive Dirichlet characters $\chi_D$, or in other words there is no Landau-Siegel zero for real Dirichlet characters; this (yet unproved) statement is one of the major unsolved problems of analytic number theory. As a consequence, by Dirichlet's class number formula, an effective lower bound

$$h(-D) \gg \sqrt{D}\,(\log D)^{-2}$$

would be obtained for the class number of imaginary quadratic fields $\mathbf{Q}(\sqrt{-D})$ (compare with Goldfeld's bound (1.4), today the best known effective result).

Footnote 3: This was their original motivation for studying this non-vanishing problem.

The absence of a barrier for the derivatives similar to that one-fourth for the L-functions themselves (or, maybe more accurately, the fact that this barrier is located much further than at $\tfrac14$) can be explained, at the heuristic/conjectural level at least, by the prediction of Katz-Sarnak [K-S] that the density of the measure which should regulate the imaginary part of the “lowest lying” zero of $L(f, s)$, $f$ odd, vanishes at the origin (to second order), while the corresponding one for even forms doesn't.
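The percentages quoted in Remark 3 follow from simple arithmetic, which the short Python sketch below spells out with exact fractions. It assumes, as stated earlier in the text, that odd and even forms each make up asymptotically one half of $S_2(q)^*$; the variable names are illustrative only.

```python
from fractions import Fraction

# Constants taken from Theorem 10 and from (2.23); densities are among ALL primitive forms.
c_odd = Fraction(19, 54)    # forms with L(f,1/2) = 0 and L'(f,1/2) != 0
c_even = Fraction(1, 4)     # forms with L(f,1/2) >= (log q)^(-2)

# Assumption: odd forms and even forms each have asymptotic density 1/2.
half = Fraction(1, 2)

share_of_odd = c_odd / half     # proportion among odd forms
share_of_even = c_even / half   # proportion among even forms

print(float(c_odd), share_of_odd, float(share_of_odd), share_of_even)
```

The proportion among odd forms is $19/27 \approx 0.7037$, which is the source of the phrase “at least 70 percent”, and the proportion among even forms is exactly $1/2$.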

Remark 4. And what about non-vanishing problems for higher order derivatives? This is a very different problem, as the heuristic showing that this should be a rare occurrence already makes clear. Analytically, this can be seen from the fact that by computing the number of odd forms with $L'(f, \tfrac12) \neq 0$, we are already computing in fact the number of forms with $L(f, s)$ having a zero of order exactly one at $s = \tfrac12$: that is to say,

$$\{f \in S_2(q)^* \mid \varepsilon_f = -1 \text{ and } L'(f, \tfrac12) \neq 0\} = \{f \in S_2(q)^* \mid L(f, \tfrac12) = 0,\ L'(f, \tfrac12) \neq 0\},$$

a fact which doesn't generalize to higher orders of vanishing. As the sketch of the proof below will show, this means that to estimate from below

$$|\{f \in S_2(q)^* \mid \operatorname{ord}_{s=1/2} L(f, s) = k\}|$$

(for any $k \geq 2$) by the same methods, we would have to consider averages over forms $f$ such that $L^{(k-1)}(f, \tfrac12) = 0$, but this set is very mysterious and there is no good “spectral completeness” to perform the averages, compared to that which exists for all forms.

2.2 Sketch: the upper bound

We will give a fairly detailed sketch of the proof to orient the reader. This section is not completely rigorous but aims for an understanding of the underlying ideas. It provides pointers to the complete proof in Chapter 5.

The principle is the same as the one used by Mestre and Brumer; simply, the analysis has to be done without appealing to the Riemann Hypothesis.

The goal is to bound the sum of the multiplicities of the zeros of the L-functions $L(f, s)$ at $s = \tfrac12$. The standard strategy to do this, exploiting the holomorphy of the L-functions, is to overcount, counting zeros (with multiplicities) in a neighborhood of $\tfrac12$, which is nicely done analytically by means of a contour integration of the logarithmic derivative of $L(f, s)$. Then one tries to evaluate or estimate the latter. The delicate point is to be able to do this with a neighborhood which is not too large, so that even with the overcounting the result will not be ruined, while the uncertainty principle of harmonic analysis makes it impossible in practice to take too small a neighborhood.

This is actually done by using the so-called “explicit formulae” of analytic number theory. This name refers to formulae of the form

$$\sum_{L(f, \rho) = 0} \hat{\psi}(\rho) = \sum_p \lambda_f(p) \psi(p) + \text{other terms} \tag{2.24}$$

relating a sum over primes of the coefficients of the L-function, weighted by some test function $\psi$, with a sum over the non-trivial zeros of $L(f, s)$, weighted by some integral transform of $\psi$, the Mellin transform for instance (there are also other terms which are not important, in this case). The first instance of this appears already in Riemann's memoir on the Zeta function, and special cases were extensively used by analytic number theorists. The general form was presented by Weil for Dirichlet L-functions, and then extended and applied by Mestre for abelian varieties.

One wants to take $\psi$ such that $\hat{\psi}$ “approximates” the characteristic function of a suitable small neighborhood of $\tfrac12$, say a circle of radius $r_0$, while the sum over primes remains manageable. On the Riemann Hypothesis, of course, this is a problem of real variables since the $\rho$ are all on the line $\mathrm{Re}(s) = \tfrac12$. It is then not hard to predict that $r_0 = (\log q)^{-1}$ is the limit of the “localization” that one can achieve without getting embroiled in much harder (and usually unworkable) analysis of the sum over primes, which would become “too long” to be dealt with. This barrier is analytic in nature: it is the expression of the uncertainty principle, and boils down to the simple fact that the Fourier transform of the delta function at 0 is the constant function 1.

On the other hand, the classical methods of analytic number theory show that the number $N(f; T)$ of zeros $\rho$ of $L(f, s)$ such that

$$|\mathrm{Im}(\rho)| \leq T, \qquad 0 \leq \mathrm{Re}(\rho) \leq 1$$

satisfies the asymptotic

$$N(f; T) = \frac{T}{\pi} \log\left(\frac{qT}{2\pi e}\right) + O(\log qT).$$

Hence, intuitively at least, one can expect that the number of non-trivial zeros of $L(f, s)$ with imaginary part less than $(\log q)^{-1}$ is (on average) absolutely bounded. If we can reach the limit of the resolution afforded by harmonic analysis, and justify this intuitive argument, it will be possible to deduce that there exists an absolute constant $C > 0$ such that, indeed,

$$\sum_{f \in S_2(q)^*} \operatorname{ord}_{s=1/2} L(f, s) \leq \sum_{f \in S_2(q)^*} N(f; (\log q)^{-1}) \leq C |S_2(q)^*|,$$

which is the statement we are looking for (see footnote 4).

Footnote 4: While if we took a circle of fixed radius, say 1, the number of zeros would be too large by a factor $\log q$.
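The intuition behind the choice $T = (\log q)^{-1}$ can be checked numerically on the main term of the counting formula alone. The Python sketch below evaluates $\frac{T}{\pi}\log\frac{qT}{2\pi e}$ exactly as displayed in the text, for $T = (\log q)^{-1}$ and for $T = 1$. It deliberately ignores the error term $O(\log qT)$, which is not controlled in this range for an individual form; this is why the text only claims boundedness on average. The function name `main_term` is an illustrative choice.

```python
import math


def main_term(q, T):
    """Main term (T/pi) * log(qT / (2*pi*e)) of N(f; T), as displayed in the text."""
    return (T / math.pi) * math.log(q * T / (2 * math.pi * math.e))


# Compare a window of height 1/log(q) with a window of fixed height 1.
for q in (1e3, 1e6, 1e12, 1e24):
    small_window = main_term(q, 1 / math.log(q))
    fixed_window = main_term(q, 1.0)
    print(f"q={q:.0e}  T=1/log q: {small_window:.3f}   T=1: {fixed_window:.3f}")
```

With $T = (\log q)^{-1}$ the main term equals $\frac{1}{\pi}\bigl(1 - \frac{\log\log q + \log 2\pi e}{\log q}\bigr)$, which stays below $1/\pi$, whereas with $T = 1$ it grows like $\frac{1}{\pi}\log q$, in line with footnote 4.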


This is just a preliminary analysis, which only says that the objective is not utterly hopeless; it could apply, up to now, to any family of automorphic L-functions. But it is not a given either: the corresponding analysis applied to the Selberg trace formula (instead of (2.24)), with the thought of estimating the multiplicity of the eigenvalues of the hyperbolic Laplacian on $\Gamma_0(q) \backslash \mathbf{H}$ (another major problem of analytic number theory), shows that current techniques are doomed to fail, because there are way too many eigenvalues around a fixed $\lambda = \tfrac14 + r^2 > 0$ (about $r$ of them in the segment $[r - 1, r + 1]$) to apply this kind of harmonic analysis.

The challenge is now to prove that the expected behavior of the zeros in this small neighborhood, on average over $f \in S_2(q)^*$, holds. On the Riemann Hypothesis, it is easy to find a test function which localizes as desired on the left-hand side of (2.24); so it is from the right-hand side that we must deduce the validity of this surmise. This is possible because, in a certain precise sense, the Hecke eigenvalues $\lambda_f(n)$ of the forms $f$ are independent enough of each other. This property, a kind of spectral completeness, echoes the fact that the primitive forms are an orthogonal basis of the Hilbert space $S_2(q)$. When summing over $f$ and exchanging the order of summation on the right, one recovers in the inner summation the so-called “Delta symbol” of $S_2(q)^*$:

$$\Delta(m, n) = \sum_{f \in S_2(q)^*} \lambda_f(m) \lambda_f(n)$$

which, in the range of $m$ and $n$ involved here, is sufficiently close to the Kronecker delta-symbol $\delta(m, n)$ for this purpose.

But without the Riemann Hypothesis, there is much additional trouble with the zeros on the left-hand side: when their real part can vary, constructing a suitable test function becomes a problem (see footnote 5). This is solved by using a test function with a positivity property allowing us to drop the contribution to the sum of the zeros with $|\mathrm{Re}(\rho) - \tfrac12| < (\log q)^{-1}$ (their number should be such that this doesn't affect the result), while the possible other zeros with larger real part are handled by means of a density theorem which says, roughly, that on average the $L(f, s)$ have very few zeros $\rho$ with $\mathrm{Re}(\rho) > \tfrac12 + \delta$, where $\delta$ is about $(\log q)^{-1}$ (see Theorem 14 for the exact statement).

Footnote 5: The transform $\hat{\psi}$ is holomorphic; how can it behave like a characteristic function, even smoothed, of an open subset?

Density theorems of various kinds have a long history in analytic number theory, the first one, for the zeros of $\zeta(s)$ when the imaginary part increases, going back to Bohr and Landau. They are extremely useful to avoid the use of the Riemann Hypothesis. The result we prove is among the more delicate ones, because it must be very efficient near the critical line (other contexts, such as Linnik's theorem on the least prime in an arithmetic progression, require on the contrary density theorems near $\mathrm{Re}(s) = 1$ for Dirichlet L-functions; the paper [KM2] also contains a density theorem of this type for automorphic L-functions). Selberg [Sel] proved a result of the kind we need for Dirichlet characters, and we borrow some of his techniques, especially a lemma which reduces the theorem to a bound of the kind

$$\sum_{f \in S_2(q)^*} |M(f, \tfrac12 + \delta + it) L(f, \tfrac12 + \delta + it)|^2 \ll |S_2(q)^*| (1 + |t|)^B \tag{2.25}$$

for some $B > 0$ (where $\delta = (\log q)^{-1}$, $t \in \mathbf{R}$, and $M(f, s)$ is a partial sum of length $q^\Delta$ for some $\Delta > 0$ of the inverse Dirichlet series $L(f, s)^{-1}$). Coincidentally, sums of the same general shape appear prominently in the proof of Theorem 6. In both cases, the “independence” (or lack thereof) of the primitive cusp forms plays again an essential part. We refer to the discussion in the next section, or to the detailed proof in Chapter 5, for further discussion.

2.3 Sketch: the lower bound

Again, this section only provides a non-rigorous overview of the proof of Theorem 10. The statements and formulas written down here should not be taken at face value.

Theorem 10, which estimates the number of odd forms with $L'(f, \tfrac12) \neq 0$, will be proved by comparing the asymptotics of two different moments of the values $L'(f, \tfrac12)$, $f$ running over odd forms.

Such a procedure is quite natural, since the theorem can be interpreted as a statement about the distribution of the values of $L'(f, \tfrac12)$ among odd forms $f$: this strategy is a special case of the principle that a measure is completely determined by its moments, and that furthermore partial knowledge of even a few of those moments already contains significant information about it (a principle most famously expounded by Serre in studying the Sato-Tate conjecture from the known cases of analytic continuation of the symmetric power L-functions). It was used in similar contexts in [Iw1], [Du1], for instance.

In this case, we consider mollified first and second moments

$$M_1 = \sum_{f \in S_2(q)^*} \varepsilon_f^- M(f) L'(f, \tfrac12)$$

$$M_2 = \sum_{f \in S_2(q)^*} \varepsilon_f^- |M(f) L'(f, \tfrac12)|^2$$

where, for the time being, $M(f)$ is simply any complex number.

Remark. The last sum is very similar to the second moment (with the L-function itself) considered for the proof of the upper bound (2.25); this coincidence is rather hard to explain. The analysis below applies equally well to this case, with few modifications and actually a number of simplifications (in the former case the $M(f)$ are actually fixed at the outset).

By Cauchy's inequality, we have immediately

$$M_1^2 \leq M_2 \times |\{f \in S_2(q)^* \mid f \text{ is odd and } L'(f, \tfrac12) \neq 0\}|$$

hence, if $M_2 \neq 0$, the lower bound

$$|\{f \in S_2(q)^* \mid f \text{ is odd and } L'(f, \tfrac12) \neq 0\}| \geq \frac{M_1^2}{M_2} \tag{2.26}$$

for the desired cardinality. The choice of the weights $M(f)$ is entirely at our disposal, subject only to $M_2$ being non-zero, and one can of course try to optimize this choice so as to get the best possible bound. One may also prefer to dispense with it altogether, putting $M(f) = 1$. This seems indeed the most reasonable thing to do, until proved otherwise, but this will turn out to be inefficient: it leads to asymptotics of the type

$$M_1 \sim c_1 (\log q) |S_2(q)^*|, \qquad M_2 \sim c_2 (\log q)^3 |S_2(q)^*| \tag{2.27}$$

for some (explicit) positive constants $c_1$ and $c_2$, so (2.26) falls short of the goal by a factor $\log q$. Of course, this is not simply due to some clumsy dealing in the computations, since we have true asymptotics. Rather, this indicates (in itself, a nice fact to know) that while $L'(f, \tfrac12)$ is of order of magnitude about $\log q$ on average for $f$ odd, it is quite frequently as large as $(\log q)^{3/2}$. Also worthy of some notice is the fact that since $c_1 > 0$, it follows that $L'(f, \tfrac12)$ is positive (still on average). This is of course individually a consequence of the Riemann Hypothesis (a negative derivative at $\tfrac12$ where $L(f, s)$ vanishes would mean that $L(f, s)$ would be negative for $s$ real a little larger than $\tfrac12$, and since it tends to 1 as $s$ tends to $+\infty$, there would have to be a zero not on the critical line), and in this case it is unconditionally known from the Gross-Zagier formula (2.21), but it is also nice to see it come directly from the computation which leads to (2.27).
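The mechanism behind (2.26) is the Cauchy-Schwarz inequality applied to a sequence supported on the forms with non-zero derivative: the number of non-zero terms is at least the square of the first moment divided by the second moment. The Python sketch below checks this on toy data; the numbers are arbitrary stand-ins for the values $M(f)L'(f, \tfrac12)$ and are not computed from any modular form, and the function name `support_lower_bound` is illustrative.

```python
def support_lower_bound(values):
    """Return M1^2 / M2 with M1 = sum(v) and M2 = sum(v^2); requires M2 != 0."""
    m1 = sum(values)
    m2 = sum(v * v for v in values)
    return m1 * m1 / m2


# Toy stand-ins for M(f) L'(f, 1/2): zero entries play the role of vanishing derivatives.
spread_out = [0.0, 2.0, 0.0, 1.0, 3.0, 0.0, 0.5]
nearly_flat = [1.0, 1.0, 1.0, 0.0]

for values in (spread_out, nearly_flat):
    nonzero = sum(1 for v in values if v != 0)
    print(nonzero, support_lower_bound(values))
```

In the first list the bound (about 2.96) falls short of the true count 4 because the values fluctuate; in the second, where the non-zero values are all equal, the bound is attained. This is one way to read the loss of a factor $\log q$ in (2.27): without a mollifier the values $L'(f, \tfrac12)$ fluctuate too much for Cauchy's inequality to be sharp.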

This brings us back to choosing $M(f)$ non-trivially (and non-artificially: the sum must be manageable) to improve, hopefully, on those asymptotics. The heuristic suggests trying to dampen the oscillations of $L'(f, \tfrac12)$ by taking $M(f)$ approximating the inverse $L'(f, \tfrac12)^{-1}$. This practice also carries a long history in analytic number theory, first in Bohr-Landau [B-L], and later most notably in the works of Selberg. We select $M(f)$ as a sum

$$M(f) = \sum_{m \leq M} \frac{x_m}{\sqrt{m}} \lambda_f(m) \tag{2.28}$$

where $M > 1$ is a parameter, and the $x_m$, $m \leq M$, are real numbers (not too large, with $x_1 = 1$). Thus $M_1$ is a linear form in the $x_m$ and $M_2$ a quadratic form, and the vector $(x_m)$ will be chosen, inasmuch as it is possible, in order to optimize the lower bound for this particular kind of mollifier (the role of $M$ is different, but also crucial: it is expected that the longer the mollifier is, the better the resulting bound, since $M(f)$ can better approximate the inverse, in a sense, while if $M$ is too large, the sums become analytically unmanageable; a correct range has to be found, the existence of which is not entirely clear beforehand).

To handle the sums $M_1$ and $M_2$, we use the standard techniques of analytic number theory and the functional equation to express $L'(f, \tfrac12)$, which is defined by analytic continuation, as a sum of two short partial sums of the Dirichlet series, and similarly for the square $L'(f, \tfrac12)^2$. The partial sums must be of length $\sqrt{q}$ so we obtain, roughly speaking

$$\varepsilon_f^- L'(f, \tfrac12) \approx 2 \sum_{n < \sqrt{q}} \frac{\lambda_f(n)}{\sqrt{n}} \log\left(\frac{\sqrt{q}}{n}\right)$$

(the formula we get is valid only for odd forms $f$, otherwise it spells $0 = 0$).

So, summing over $m$ and $f$ and exchanging the order of summation, we arrive at

$$2 \sum_{m \leq M} \sum_{n < \sqrt{q}} \frac{x_m}{\sqrt{mn}} \log\left(\frac{\sqrt{q}}{n}\right) \Delta^-(m, n)$$

where $\Delta^-$ is the Delta-symbol associated to odd cusp forms:

$$\Delta^-(m, n) = \sum_{f \in S_2(q)^*} \varepsilon_f^- \lambda_f(m) \lambda_f(n).$$

At this point the nature of the Jacobian variety, by means of the spectral completeness of the cusp forms, comes into play, as observed before in Section 2.2. The heuristic is that $\Delta^-(m, n)$ should behave like $\frac{|S_2(q)^*|}{2} \delta(m, n)$, at least in some ranges of $m$ and $n$. Consider for instance the case of Dirichlet characters modulo $q$: by orthogonality of characters, the corresponding expression is (for $m$, $n$ coprime with $q$, say)

$$\sum_{\chi \bmod q} \chi(m) \overline{\chi(n)} = \varphi(q) \delta_q(m, n)$$

where $\delta_q(a, b) = 1$ if $a \equiv b \bmod q$ and 0 otherwise. Hence if $m$ and $n$ are both less than $q$, this is exactly the Kronecker symbol (times the number of characters).
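The orthogonality relation for Dirichlet characters is easy to verify numerically, which may help fix ideas before passing to cusp forms. The Python sketch below builds all characters modulo a prime $q$ from a primitive root found by brute force, and evaluates $\sum_\chi \chi(m)\overline{\chi(n)}$. It assumes $q$ is prime and $m$, $n$ are coprime to $q$; the function names are illustrative and the construction is only meant for small $q$.

```python
import cmath


def primitive_root(q):
    """Smallest generator of (Z/qZ)^* for a prime q, found by brute force."""
    for g in range(1, q):
        if len({pow(g, k, q) for k in range(q - 1)}) == q - 1:
            return g
    raise ValueError("q must be prime")


def characters(q):
    """All q-1 Dirichlet characters mod a prime q, each as a dict n -> chi(n)."""
    g = primitive_root(q)
    dlog = {pow(g, k, q): k for k in range(q - 1)}  # discrete logarithm table
    return [
        {n: cmath.exp(2j * cmath.pi * j * dlog[n] / (q - 1)) for n in dlog}
        for j in range(q - 1)
    ]


def delta_sum(q, m, n):
    """Sum over chi mod q of chi(m) * conj(chi(n)), for m and n coprime to q."""
    return sum(chi[m % q] * chi[n % q].conjugate() for chi in characters(q))


print(abs(delta_sum(7, 3, 3)), abs(delta_sum(7, 2, 5)), abs(delta_sum(7, 3, 10)))
```

Up to rounding, the three values are $6 = \varphi(7)$, $0$ and $6$. The last case, with $n = 10 \equiv 3 \bmod 7$, illustrates why the sum is a genuine Kronecker delta only when $m$ and $n$ are both less than $q$.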

The case of cusp forms can be studied in two ways (at least): in one, the Selberg trace formula is used to express this Delta symbol in closed form, and the “error term” is a sum of class numbers of definite binary quadratic forms of various discriminants (Hurwitz class numbers). This is the method in [Vdk]; it has the disadvantage that the class numbers are not very easy to manipulate analytically, which restricts the efficiency of the final result.

Another method requires changing the setting a little: there is a marvelous formula, called the Petersson formula, which expresses the Delta symbol for an orthonormal basis of $S_2(q)$ in a form which is very congenial to further treatment by methods of analytic number theory. Its proof is also much simpler than that of the Selberg trace formula (see [Iw2]); it relies heavily on the Hilbert space structure of $S_2(q)$, and on the Poincaré series $P_m$ which span it.

Since S2(q)∗ is only an orthogonal basis of S2(q), it must be normalized beforeapplying the formula, which gives therefore an expression for a weighted Delta-symbol

(see the end of Section 2.1.1 for the notation∑h

), namely

∑h

f∈S2(q)∗

λf (m)λf (n) = δ(m,n)− 2π∑

c≡0 mod q

1cS(m,n; c)J1

(4π√mn

c

)(2.29)

where $J_1$ is a Bessel function, and $S(m,n;c)$ a Kloosterman sum – a most delightful sight to analytic number theorists. Moreover, the factor $\varepsilon_f^-$ can be inserted using (2.7) and (2.6), keeping it within reach of hand. Hence the proof of Theorem 10 proceeds in two steps:⁶

(1) We first prove the theorem in a weighted version:

\[ \sum^h_{\substack{\varepsilon_f=-1 \\ L'(f,\frac12)\neq 0}} 1 > \Bigl(\frac{19}{54} - \varepsilon\Bigr) \]

for any $\varepsilon > 0$ and $q$ large enough, by using the method outlined above with weighted moments

\[ M_1 = \sum^h_{f\in S_2(q)^*} \varepsilon_f^-\,M(f)\,L'(f,\tfrac12) \]

and

\[ M_2 = \sum^h_{f\in S_2(q)^*} \varepsilon_f^-\,\bigl|M(f)\,L'(f,\tfrac12)\bigr|^2 \]

(recall that $\sum^h$ behaves like a probability measure).

⁶ Similarly for the case of the upper bound.

(2) We show how to “remove” the harmonic weight, getting the same bound for the natural average. This is done by interpreting the Petersson norm $(f,f)$ as a special value at $s = 1$ (the edge of the critical strip) of the symmetric square L-function $L(\mathrm{Sym}^2 f, s)$ of $f$, and by writing the natural sum as a harmonic one with this special value inserted:

\[ \sum_{f} \alpha_f = \sum^h_{f} 4\pi (f,f)\,\alpha_f . \]

The removal of $(f,f)$ goes by replacing it by a partial sum of the special value $L(\mathrm{Sym}^2 f, 1)$, and dividing this into two parts, a head and a tail. The head is short and is treated by refining the arguments leading to the weighted theorem of Step 1; the tail is shown to be small, in effect by applying the Lindelöf Hypothesis, but precisely – since such a tool is not at our disposal – by using a mean-value estimate for the symmetric square, which has the same power as the Lindelöf Hypothesis on average over $f$ (the first step is again used here). For more details on this last part of the argument, see also Section 3.5.

What do we gain by using this roundabout route (apart from the scenery)? Mainly, flexibility on the side of the series of Kloosterman sums, which in the simple-minded heuristic is supposed to be an error term. Actually the easy bound

\[ J_1(x) \ll x \]

and Weil’s bound for Kloosterman sums

\[ |S(m,n;c)| \le \tau(c)\,(m,n,c)^{1/2}\,c^{1/2} \]

suffice to estimate this series (say $\mathcal{J}(m,n)$) quite sharply,

\[ \mathcal{J}(m,n) \ll (m,n,q)\,(\log(m,n))^2\,\frac{(mn)^{1/2}}{q^{3/2}}, \]

and if $m$ and $n$ are not too large, the heuristic of independence is easily justified.

The flexibility will be especially useful when, for the second moment, it will turn out that in the range we are working with (and which is imposed on us by the nature of the problem) this heuristic doesn’t hold for odd forms: besides the diagonal term $\delta(m,n)$, a second term will emerge from the series of Kloosterman sums, of the same order of magnitude as the diagonal. Such phenomena have now been observed and exploited in many different contexts, for instance by Duke, Friedlander and Iwaniec ([DFI], among other papers) in their study of automorphic L-functions by the amplification technique, and also by Iwaniec-Sarnak and Luo-Iwaniec-Sarnak in the articles mentioned earlier. The case involved here is actually somewhat simpler (it does not go beyond the range of the large-sieve), but the treatment will nevertheless be quite involved.
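Weil’s bound is easy to test numerically for small moduli. The sketch below computes $S(m,n;c)$ directly from its definition as a sum over invertible residues and compares it with $\tau(c)(m,n,c)^{1/2}c^{1/2}$; the function names are illustrative, and the modular inverse via `pow(x, -1, c)` assumes Python 3.8 or later.

```python
import cmath
from math import gcd, sqrt


def kloosterman(m, n, c):
    """S(m, n; c) = sum over x mod c, gcd(x, c) = 1, of e((m*x + n*x^{-1}) / c)."""
    total = 0
    for x in range(1, c + 1):
        if gcd(x, c) == 1:
            x_inv = pow(x, -1, c)  # inverse of x modulo c
            total += cmath.exp(2j * cmath.pi * (m * x + n * x_inv) / c)
    return total


def num_divisors(c):
    """tau(c), the number of positive divisors of c."""
    return sum(1 for d in range(1, c + 1) if c % d == 0)


def weil_bound(m, n, c):
    """Right-hand side tau(c) * (m, n, c)^(1/2) * c^(1/2) of Weil's bound."""
    return num_divisors(c) * sqrt(gcd(gcd(m, n), c)) * sqrt(c)


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        worst = max(abs(kloosterman(m, n, p)) / weil_bound(m, n, p)
                    for m in range(1, p) for n in range(1, p))
        print(p, worst)  # ratio of |S| to the Weil bound, at most 1
```

The sum is real because the terms for $x$ and $-x$ are complex conjugates, so the imaginary part of the computed value is only rounding noise.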

After the Petersson formula has been applied, what appears is that if $M$ is small enough ($M < q^{1/4}$ in this case) then, up to smaller contributions, $M_1$ is a linear form in the $x_m$ which looks like

\[ M_1 = \sum_{m\le M} \frac{x_m}{m}\Bigl(\log\frac{\sqrt{q}}{m}\Bigr) \]

and $M_2$ a quadratic form which looks like

\[ M_2 = \sum_{b} \frac{1}{b} \sum_{m_1,m_2} \frac{\tau(m_1m_2)}{m_1m_2}\,x_{bm_1}x_{bm_2}\Bigl(\log\frac{q}{m_1m_2}\Bigr)^3 + \text{other terms}. \]

The problem of performing the choice of $x_m$ to optimize the bound (2.26) is then, theoretically speaking, a “simple” problem of linear algebra. However, it is by no means easy to solve it as explicitly as we want – a closer look at the quadratic form provides some indication. It is necessary to diagonalize $M_2$, as much as possible, and the computations, if left to grow unchecked, can quickly become unwieldy. However, after some fancy footwork, it is done.

To see quickly how this will give a positive proportion (the precise value, in this case, seems very hard to predict beforehand, although it is a fact that it is the same as the proportion of simple zeros of the Riemann zeta function obtained by Conrey, Ghosh and Gonek!), we wave hands, doing as if the divisor function $\tau$ was completely multiplicative, and putting $\log q$ simply in place of $\log\frac{\sqrt{q}}{m}$ (this retains the true order of magnitude, as may be expected). Then

\[ M_1 \approx (\log q)\sum_{m} \frac{x_m}{m} \]

and

\[ M_2 \approx (\log q)^3 \sum_{b}\frac{1}{b}\Bigl|\sum_{m}\frac{\tau(m)}{m}\,x_{bm}\Bigr|^2 = (\log q)^3\sum_{b}\frac{1}{b}\,|y_b|^2 \]

on doing the linear change of variable

\[ y_b = \sum_{m}\frac{\tau(m)}{m}\,x_{bm}, \quad \text{for } b \le M. \]

In terms of these variables, by Möbius inversion, $M_1$ becomes

\[ M_1 \approx (\log q)\sum_{k\le M}\frac{\mu(k)}{k}\,y_k \]

($\mu(k)$ arises as the Dirichlet convolution corresponding to $\zeta(s)\cdot\zeta(s)^{-2} = \zeta(s)^{-1}$). Take now $y_k = \mu(k)$ for $k \le M$ (this is optimal by Cauchy’s inequality; this confirms the intuition that the mollifier is closely related to the inverse, although the change of variable is needed to make it appear so transparently). Then if $M$ is a positive power of $q$, $M_1$ has order of magnitude $(\log q)^2$ and $M_2$ has order of magnitude $(\log q)^4$. Hence the lower bound (2.26) gives a positive proportion. Notice indeed that $M$ of smaller order (a constant, or a power of $\log q$) would not yield this. The value $\frac{19}{54}$ arises as $M$ approaches the limit $q^{1/4}$ allowed in our analysis. In an Appendix to Chapter 6, we sketch a way of lengthening the mollifier up to $q^{1/2}$, following ideas due to Iwaniec and Sarnak [IS1]. The proportion of non-vanishing which comes out of this is $\frac{7}{16}$ instead of $\frac{19}{54}$.
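The Möbius inversion step can be verified exactly on a toy example. The sketch below, with illustrative function names and an arbitrary rational test vector $x_m$ supported on $m \le M$, checks that $\sum_m x_m/m = \sum_k \mu(k) y_k / k$ when $y_b = \sum_m \tau(m) x_{bm}/m$; this is the identity $\mu * \tau = 1$ in disguise. It only illustrates the hand-waving heuristic, not the full optimization.

```python
from fractions import Fraction


def tau(n):
    """Number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    return -result if n > 1 else result


def change_of_variable(x, M):
    """y_b = sum over m with b*m <= M of tau(m)/m * x_{b*m}, for 1 <= b <= M."""
    return {b: sum(Fraction(tau(m), m) * x[b * m] for m in range(1, M // b + 1))
            for b in range(1, M + 1)}


def linear_form_in_x(x, M):
    """sum_{m <= M} x_m / m, the heuristic shape of M_1 / log q."""
    return sum(x[m] / m for m in range(1, M + 1))


def linear_form_in_y(y, M):
    """sum_{k <= M} mu(k)/k * y_k, the same form after the change of variable."""
    return sum(Fraction(mobius(k), k) * y[k] for k in range(1, M + 1))


if __name__ == "__main__":
    M = 30
    x = {m: Fraction(m * m - 3, m + 1) for m in range(1, M + 1)}  # arbitrary test data
    y = change_of_variable(x, M)
    print(linear_form_in_x(x, M) == linear_form_in_y(y, M))
```

Exact rational arithmetic is used so that the comparison is an equality rather than an approximate match.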

Remark. A concrete value of $x_m$ will be chosen to derive Theorem 10. It would be possible to start the computations immediately by plugging those values and ignoring the intermediate step of arriving at them, or even to try to guess what they are (since the heuristic of the inverse is basically correct). However, this would be quite artificial and misleading. The computations involved in the optimization are quite clean, thanks to changes of variables which are not obvious at the start and might be missed in another derivation.

Also,⁷ in cold logic, the “two-step” process of treating the harmonic counterparts of $M_1$ and $M_2$ first and then removing the weight, is redundant: the straight road to the goal would be to study the sums with a short partial sum of the symmetric square (of length $N$) inserted, deduce from this that the head of the weight can be handled, then apply the case $N = 1$ to deduce the harmonic result, and use this with the method of the next Chapter to show that the tail of the weight is likewise removable. But again, entangling the first and second moments with this extra partial sum from the start would complicate matters quite a bit, while the present derivation gives the opportunity to get acquainted with the kind of objects and terms which appear, before getting to the worst part of the argument.

Moreover, the proofs are arranged so as to maximize the amount of work done for the harmonic moments which can be reused when the second step is started.

⁷ This remark will be clearer after reading the next Chapter.

Chapter 3

A mean-value estimate for the symmetric square of modular forms

“The Romans,” Roger and the Reverend Dr. Paul de la Nuit were drunk together one night, or the vicar was, “the ancient Roman priests laid a sieve in the road, and then waited to see which stalks of grass would come up through the holes.”

Thomas Pynchon, “Gravity’s Rainbow”

3.1 The symmetric square of modular forms

The symmetric square L-function of a modular form $f$ was introduced by Shimura in [Sh2]. It has been a very useful tool in many contexts of analytic number theory, and also, in its algebraic form, for algebraic number theory (for instance, it is crucial to Wiles’s proof of Fermat’s Great Theorem).

Roughly, the symmetric square L-function of $f$ is the Dirichlet series associated to the Fourier coefficients $\lambda_f(n^2)$ at the squares. However, to obtain an entire function, some correction has to be made.

Definition 5. Let $q \ge 1$ and $f \in S_2(q)^*$ a primitive form of level $q$. The symmetric square L-function of $f$ is the Dirichlet series $L(\mathrm{Sym}^2 f, s)$ defined by

\[ L(\mathrm{Sym}^2 f, s) = \zeta_q(2s)\sum_{n\ge 1}\lambda_f(n^2)\,n^{-s}. \tag{3.1} \]

We write $\rho_f(n)$ for the coefficients of this Dirichlet series.

The analytic properties of the symmetric square were proved essentially by Shimura (completed by work of Gelbart-Jacquet [G-J], or Coates-Schmidt [CoS] for the functional equation).

Theorem 11. (Shimura) Let $f \in S_2(q)^*$ be a primitive modular form. The symmetric square $L(\mathrm{Sym}^2 f, s)$ admits an analytic continuation to an entire function. If $q$ is squarefree,¹ the completed L-function

\[ \Lambda(\mathrm{Sym}^2 f, s) = \pi^{-3s/2}\,q^{s}\,\Gamma\Bigl(\frac{s+1}{2}\Bigr)^2\,\Gamma\Bigl(\frac{s}{2}+1\Bigr)\,L(\mathrm{Sym}^2 f, s) \]

satisfies the functional equation

\[ \Lambda(\mathrm{Sym}^2 f, s) = \Lambda(\mathrm{Sym}^2 f, 1-s). \]

¹ For $q$ not squarefree, the “correct” symmetric square is not our $L(\mathrm{Sym}^2 f, s)$, as the ramified primes have to be treated more precisely. The functional equation is also more complicated to state.

The following lemma summarizes some properties of the coefficients ρf (n).

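Lemma 1. For any $n \ge 1$ we have

\[ \rho_f(n) = \sum_{\ell m^2 = n} \varepsilon_q(m)\,\lambda_f(\ell^2) \tag{3.2} \]

\[ \lambda_f(n^2) = \sum_{\ell m^2 = n} \mu(m)\,\varepsilon_q(m)\,\rho_f(\ell) \tag{3.3} \]

and in particular $\rho_f(n) = \lambda_f(n^2)$ for $n$ squarefree. Moreover, $L(\mathrm{Sym}^2 f, s)$ has an Euler product expansion of degree 3

\[ L(\mathrm{Sym}^2 f, s) = \prod_{(p,q)=1} (1-\alpha_p^2 p^{-s})^{-1}(1-p^{-s})^{-1}(1-\alpha_p^{-2}p^{-s})^{-1} \prod_{p\mid q}(1-\alpha_p^2 p^{-s})^{-1} \]

where $\alpha_p$ is as in (2.13). Finally, for all $n \ge 1$ we have

\[ |\rho_f(n)| \le \tau(n)^2. \tag{3.4} \]

The last estimate is proved using Deligne’s bound $|\lambda_f(n)| \le \tau(n)$ and the Euler product.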
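Relations (3.2) to (3.4) can be tried out on a concrete form. The sketch below uses, as an illustrative assumption not made in the text, the weight 2 newform of level 11 given by the eta product $x\prod_{n\ge1}(1-x^n)^2(1-x^{11n})^2$, with the normalization $\lambda_f(n) = a(n)/\sqrt{n}$ so that $\lambda_f(n^2) = a(n^2)/n$ is rational, and with $\varepsilon_q$ taken to be the trivial character modulo $q$. All function names are hypothetical.

```python
from fractions import Fraction
from math import gcd, isqrt

LEVEL = 11  # level q of the example form (an assumption of this sketch)


def eta_product_coefficients(limit):
    """Coefficients a(1..limit) of x * prod_{n>=1} (1-x^n)^2 (1-x^(11n))^2."""
    series = [0] * limit  # series[i] = coefficient of x^i in the product
    series[0] = 1
    for n in range(1, limit):
        for step in (n, n, LEVEL * n, LEVEL * n):
            if step < limit:
                # Multiply the truncated series by (1 - x^step) in place.
                for i in range(limit - 1, step - 1, -1):
                    series[i] -= series[i - step]
    return {n: series[n - 1] for n in range(1, limit + 1)}


def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def num_divisors(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def eps(m):
    """Trivial character modulo the level."""
    return 1 if gcd(m, LEVEL) == 1 else 0


def lam_sq(a, n):
    """lambda_f(n^2) = a(n^2) / n for weight 2."""
    return Fraction(a[n * n], n)


def rho(a, n):
    """rho_f(n) computed from (3.2)."""
    return sum(eps(m) * lam_sq(a, n // (m * m))
               for m in range(1, isqrt(n) + 1) if n % (m * m) == 0)


def lam_sq_from_rho(a, n):
    """lambda_f(n^2) recovered from the rho_f via (3.3)."""
    return sum(mobius(m) * eps(m) * rho(a, n // (m * m))
               for m in range(1, isqrt(n) + 1) if n % (m * m) == 0)


if __name__ == "__main__":
    a = eta_product_coefficients(144)
    for n in range(1, 13):
        print(n, rho(a, n), lam_sq_from_rho(a, n) == lam_sq(a, n),
              abs(rho(a, n)) <= num_divisors(n) ** 2)
```

The inversion (3.3) is a formal consequence of (3.2), so it holds for any coefficient sequence; the bound (3.4) is the part that reflects Deligne’s estimate for this particular form.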

We will need also estimates for L(Sym2 f, 1).

Lemma 2. For all $q \ge 1$ and all $f \in S_2(q)^*$, we have

\[ L(\mathrm{Sym}^2 f, 1) \ll (\log q)^3 \tag{3.5} \]

and if $q$ is squarefree

\[ L(\mathrm{Sym}^2 f, 1) \gg (\log q)^{-1} \tag{3.6} \]

where the implied constants are absolute in both cases.

Sketch of the proof. The (deeper) lower bound is the main result of [GHL]; the fact that $q$ is squarefree ensures that $f$ is not a monomial form. The upper bound is much easier and well-known. For $q$ squarefree (the case which will be used), it can be recovered as follows: from the functional equation of $L(\mathrm{Sym}^2 f, s)$, we derive an approximation of the symmetric square at $s = 1$ by a partial sum

\[ L(\mathrm{Sym}^2 f, 1) = \sum_{n\le y}\rho_f(n)\,n^{-1} + O(q^2y^{-1}) \]

and taking $y = q^3$ gives

\[ L(\mathrm{Sym}^2 f, 1) \ll \sum_{n\le q^3}\ \sum_{\ell m^2 = n}\frac{\tau(\ell^2)}{\ell m^2} \ll (\log q)^3 \]

because the Dirichlet series

\[ \sum_{n\ge 1}\tau(n^2)\,n^{-s} = \frac{\zeta(s)^3}{\zeta(2s)} \]

has a pole of order 3 at $s = 1$ (see Lemma 20).

One can also improve this to $L(\mathrm{Sym}^2 f, 1) \ll \log q$, by using the methods of [GHL]. □
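The Dirichlet series identity for $\tau(n^2)$ used in the sketch can be checked coefficient by coefficient. The following minimal sketch (function names are illustrative) expands $\zeta(s)^3/\zeta(2s)$ as the Dirichlet convolution of the triple divisor function with the Möbius function supported on squares, and compares the result with $\tau(n^2)$.

```python
from math import isqrt


def num_divisors(n):
    """tau(n), the number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def tau3(n):
    """Number of ordered triples (a, b, c) with a*b*c = n: coefficient of zeta(s)^3."""
    return sum(num_divisors(n // d) for d in range(1, n + 1) if n % d == 0)


def zeta_cubed_over_zeta_2s_coefficient(n):
    """n-th Dirichlet coefficient of zeta(s)^3 / zeta(2s)."""
    return sum(mobius(m) * tau3(n // (m * m))
               for m in range(1, isqrt(n) + 1) if n % (m * m) == 0)


if __name__ == "__main__":
    print(all(num_divisors(n * n) == zeta_cubed_over_zeta_2s_coefficient(n)
              for n in range(1, 61)))
```

At a prime power $p^k$ both sides equal $2k+1$, which is the local identity $(1+X)/(1-X)^2 = \sum_k (2k+1)X^k$.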


A natural question is to ask when two primitive forms $f$ and $g$ in $S_2(q)^*$ can have the same symmetric square L-function. From the definition of $\rho_f(n)$, $L(\mathrm{Sym}^2 f, s) = L(\mathrm{Sym}^2 g, s)$ is equivalent to $\lambda_f(d^2) = \lambda_g(d^2)$ for all $d \ge 1$ which, in turn, is equivalent by multiplicativity to $\lambda_f(d)^2 = \lambda_g(d)^2$ for all $d \ge 1$. Since the Fourier coefficients of $f$ and $g$ are real, this means that $\lambda_f(n) = \pm\lambda_g(n)$ for all $n \ge 1$. Intuitively, it is natural to expect that those signs $\pm$ correspond to a quadratic Dirichlet character $\chi$, so that $f$ should be a quadratic twist of $g$ (or more precisely, the primitive form associated to a quadratic twist, since the twisted form itself may not be primitive). However, this is quite hard to confirm by a proof, although it seems to have been well-known for some time. The precise result we need is proved by D. Ramakrishnan in the Appendix to [D-K], using properties of the Galois representations associated to holomorphic modular forms (in the case of representations associated to elliptic curves, it is actually proved by Serre, at least for non-integral $j$-invariant, in [Se2, page 324]). We need a special case only.

Proposition 5. (Ramakrishnan). Let $f \in S_2(q)^*$ and $g \in S_2(q')^*$ be primitive forms of level $q$ and $q'$ respectively, with $q$ and $q'$ squarefree. Then $L(\mathrm{Sym}^2 f, s) = L(\mathrm{Sym}^2 g, s)$ if and only if $q = q'$ and $f = g$.

This is contained in the corollary to Theorem A of the appendix to [D-K]. The fact that the levels are squarefree ensures that $f$ cannot be a quadratic twist of $g$, since the conductor of a quadratic twist always has square factors.

3.2 The mean-value estimate

For our purpose, we only require a property of “almost-orthogonality” of the coefficients of the symmetric square L-functions of the forms $f \in S_2(q)^*$. It is implicitly contained in the second part of [D-K], where it was developed for other applications.

Proposition 6. Let $q \ge 1$ be any squarefree integer, and let $N \ge q^9$ be a real number. The inequality

\[ \sum_{f\in S_2(q)^*}\Bigl|\sum_{n\le N} a_n\rho_f(n)\Bigr|^2 \ll N(\log N)^{15}\sum_{n\le N}|a_n|^2 \tag{3.7} \]

holds for any finite family $(a_n)_{1\le n\le N}$ of complex numbers (with an absolute implied constant).

We will spend a few minutes explaining the meaning of this statement. It is a weaker form of what would be a large-sieve inequality for the symmetric square L-functions. The classical large-sieve inequality, in its sharp multiplicative form due to Gallagher, Bombieri and others, is

\[ \sum_{q\le Q}\ \sum^{*}_{\chi \bmod q}\Bigl|\sum_{n\le N} a_n\chi(n)\Bigr|^2 \le (Q^2+N)\sum_{n\le N}|a_n|^2 \tag{3.8} \]

(the optimal form has $Q^2 + N - 1$ in place of $Q^2 + N$, but this has no consequence for applications).

This has proved to be a very powerful and versatile tool in analytic number theory, essential for instance to the proof of the Bombieri-Vinogradov theorem about the equidistribution of primes in arithmetic progressions, which in turn can replace the Riemann Hypothesis for Dirichlet characters in many applications.

To see that this inequality can be as strong as the Riemann Hypothesis on average over characters, we apply it to the special coefficients $a_n = \mu(n)$ for $n \le N$, so that the inner sum is

\[ M(\chi,N) = \sum_{n\le N}\mu(n)\chi(n). \]

There are about $Q^2$ characters on the left-hand side. The large-sieve inequality gives

\[ \frac{1}{Q^2}\sum_{q\le Q}\ \sum^{*}_{\chi \bmod q}\bigl|M(\chi,N)\bigr|^2 \ll \Bigl(1+\frac{N}{Q^2}\Bigr)N \]

and we see that if $N \le Q^2$ then, in mean-square average over primitive characters of conductor at most $Q$, $M(\chi,N)$ is bounded by $\sqrt{N}$. This is even stronger than what the Riemann Hypothesis for $\chi$ would imply, since it is equivalent with the estimate

\[ M(\chi,N) \ll_\varepsilon N^{\frac12+\varepsilon} \]

for any $\varepsilon > 0$ and $N \ge 1$. Another way of saying this is that the large-sieve inequality shows that for bounded coefficients $a_n$, the sums

\[ \sum_{n\le N} a_n\chi(n) \]

are of order of magnitude $\sqrt{N}$ on average, if $N \le Q^2$, so there is as much cancellation as can be expected.

The analogue of (3.8) for the symmetric square would (essentially) be the inequality

\[ \sum_{f\in S_2(q)^*}\Bigl|\sum_{n\le N} a_n\rho_f(n)\Bigr|^2 \ll (q+N)\sum_{n}|a_n|^2 \]

for any real number $N > 0$. However, the statement of Proposition 6 is quite far from this because it requires a much longer inner sum over $n$ than there are modular forms² $f \in S_2(q)^*$. Sharp forms of the large-sieve have been proved for the Fourier coefficients of modular forms by Iwaniec, for level one forms, with respect to the weight, and by Deshouillers and Iwaniec, for forms of fixed level $q$, with respect to the weight or eigenvalue for Maass forms, and also for fixed weight and level going to infinity. The latter result states that

\[ \sum_{f\in\mathcal{F}}\Bigl|\sum_{n\le N} a_n\lambda_f(n)\Bigr|^2 \ll \Bigl(1+\frac{N}{q}\Bigr)\sum_{n}|a_n|^2 \]

where $\mathcal{F}$ is any orthonormal basis of the space $S_k(q)$ of holomorphic cusp forms of weight $k$ and level $q$ (which is of dimension about $q$). Unfortunately, it is not possible today to prove a corresponding statement when averaging over the level $q$, which requires restricting the outer summation to primitive modular forms (as in the case of Dirichlet characters).

² Using a variant of a trick of Viola and Forti, it would be sufficient to prove the proposition with $N \ge q$ replacing $N \ge q^9$ to obtain the last inequality.

Another interpretation of the large-sieve inequality that is sometimes very fruitful is to consider the inner sum

\[ L(\chi) = \sum_{n\le N} a_n\chi(n) \]

as a linear form constructed from the values of a character, and then to employ (3.8) as a way to estimate how many times this can be “large”, say $|L(\chi)| > V$. Indeed, let $N(V)$ be the number of primitive characters $\chi$ modulo $q$ with $q \le Q$ such that $|L(\chi)| > V$. By positivity we derive now

\[ V^2N(V) \le (Q^2+N)\sum_{n\le N}|a_n|^2. \]

In the case of modular forms, this technique is used for instance in [Du2] to give the first non-trivial upper bound on the number of non-dihedral weight 1 forms of prime level: the fact that they have associated Galois representations (by the Deligne-Serre theorem) with finite image is used to construct linear forms which are indeed “large” for such forms.

To conclude this section we state an easy corollary for the coefficients $\lambda_f(n^2)$ instead of the $\rho_f(n)$.

Corollary 2. Let $N \ge q^9$ be a real number and $(a(n))_{n\sim N}$ any complex numbers which satisfy

\[ a(n) \ll \frac{(\tau(n)\log n)^A}{n} \]

for some constant $A > 0$. There exists a constant $D = D(A) > 0$ such that

\[ \sum_{f\in S_2(q)^*}\Bigl|\sum_{n\sim N} a(n)\lambda_f(n^2)\Bigr|^2 \ll (\log N)^D \]

(with an absolute implied constant).

Proof of Corollary 2. The point is, of course, that the assumption on the $a(n)$ means that we are essentially “on the line $\mathrm{Re}(s) = 1$” (or beyond), and in this region the symmetric square behaves as the series

\[ \sum_{n\ge 1}\lambda_f(n^2)\,n^{-s}. \]

In exacting detail, we have from (3.3)

\[ \sum_{f\in S_2(q)^*}\Bigl|\sum_{n\sim N} a(n)\lambda_f(n^2)\Bigr|^2 = \sum_{f\in S_2(q)^*}\Bigl|\sum_{n\sim N}\ \sum_{\ell m^2=n}\mu(m)\varepsilon_q(m)\rho_f(\ell)\,a(n)\Bigr|^2 = \sum_{f\in S_2(q)^*}\Bigl|\sum_{\ell\le 2N}\rho_f(\ell)\,\tilde{a}(\ell)\Bigr|^2 \]

where

\[ \tilde{a}(\ell) = \sum_{\sqrt{N/\ell}\,<\,m\,\le\,\sqrt{2N/\ell}}\mu(m)\,\varepsilon_q(m)\,a(\ell m^2). \]

Now we derive from the assumption a bound

\[ \tilde{a}(\ell) \ll (N\ell)^{-1/2}(\log \ell)^{A}, \]

(for some $D > 0$, with an absolute implied constant), hence the result on applying the mean-value estimate of Proposition 6 to the coefficients $\tilde{a}(\ell)$. □

3.3 Proof of the mean-value estimate

The proof of Proposition 6 requires more detailed information about the symmetric square. This is best expressed in the language of automorphic representations. We therefore recall the translation between classical modular forms and automorphic representations on GL(2); a readable reference for this is [Gel], and a more recent account is contained in [Bum].

There is an injective map $f \mapsto \pi_f$ from $S_2(q)^*$ to a certain subset of the set of cuspidal automorphic representations of GL(2) over $\mathbf{Q}$ (see [Del], or [Gel]), and this map is compatible with L-functions, in the sense that $L(f,s) = L_f(\pi_f,s)$, where $L(\pi_f,s)$ is the Jacquet-Langlands L-function (complete with the Gamma factor at infinity), which is defined in terms of representation theory (here and elsewhere in this section, $L_f$, for automorphic-representation L-functions, denotes the finite part of such an L-function, and $L$ is the complete one).

Moreover, Gelbart and Jacquet [G-J] have constructed a symmetric square map $\pi \mapsto \mathrm{Sym}^2\pi$ associating an automorphic representation of GL(3) to a cuspidal automorphic representation of GL(2) (this is an instance of the Langlands functoriality principle). This representation-theoretic symmetric square is, for $\pi = \pi_f$ associated to a modular form $f$, related to Shimura’s symmetric square L-function in the way that one expects, namely (with the above definition) we have the identity

\[ L_f(\mathrm{Sym}^2\pi_f, s) = L(\mathrm{Sym}^2 f, s) \]

(recall that $q$ is squarefree; Shimura’s original definition had slightly different Euler factors at the ramified primes, and others are also needed for $q$ non-squarefree).

We now begin the proof of the mean-value estimate. Observe first that $q$ being squarefree implies that all the symmetric squares $\mathrm{Sym}^2\pi_f$, $f \in S_2(q)^*$, are actually cuspidal (this generalizes Shimura’s result that $L(\mathrm{Sym}^2 f, s)$ is entire, and is needed further on). Indeed, Gelbart and Jacquet have shown that $\mathrm{Sym}^2\pi$ is cuspidal if and only if $\pi$ is not monomial, i.e. if $\pi$ is not obtained from a Hecke character of a quadratic field by automorphic induction. This is impossible here because the level of a monomial form is never squarefree.

The inequality we claim is equivalent to the estimate

\[ \|T\|^2 \ll N(\log N)^{15} \]

(with an absolute implied constant) for the norm of the linear operator

\[ T : (a_n)_{n\le N} \mapsto \Bigl(\sum_{n\le N} a_n\rho_f(n)\Bigr)_{f\in S_2(q)^*} \]

where both the domain and range are finite dimensional Hilbert spaces (with the canonical hermitian form). We now make use of the duality principle (the norm of an operator is the same as that of its adjoint), but in a somewhat unusual form for variety’s sake. By general Hilbert-space theory, we have

\[ \|T\|^2 = \|TT^*\| \]

and $TT^*$ is the linear operator (endomorphism) defined by

\[ (\alpha_f)_{f\in S_2(q)^*} \mapsto \Bigl(\sum_{g}\alpha_g K(f,g)\Bigr)_{f\in S_2(q)^*} \]

where the “kernel” $K$ is (remember that $\lambda_f(n)$, hence $\rho_f(n)$, is always real)

\[ K(f,g) = \sum_{n\le N}\rho_f(n)\rho_g(n). \]

Now we conclude immediately from this that

\[ \|T\|^2 \le \max_{f\in S_2(q)^*}\ \sum_{g\in S_2(q)^*}|K(f,g)|, \]

and it is enough to estimate the sums $K(f,g)$.

Actually, one can be analytically more efficient and show the alternate bound

\[ \|T\|^2 \le \max_{f\in S_2(q)^*}\ \sum_{g\in S_2(q)^*}|k(f,g)| \tag{3.9} \]

where the modified kernel $k$ is obtained by choosing (once and for all) a smooth test function $\psi : [0,+\infty[\ \longrightarrow [0,1]$, compactly supported in $[0,2]$ and identically equal to 1 between 0 and 1, and taking a smoothed version of $K$:

\[ k(f,g) = \sum_{n\ge 1}\rho_f(n)\rho_g(n)\,\psi(n/N) \]

(we leave the proof of this to the reader, or refer to [D-K] where a more classical derivation is given).
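The duality step, bounding $\|T\|^2 = \|TT^*\|$ by the largest absolute row sum of the kernel $K$, can be illustrated on a small real matrix. In the sketch below the rows stand in for the coefficient vectors $(\rho_f(n))_{n\le N}$; the data are arbitrary placeholders rather than actual symmetric square coefficients, and the power iteration is only one of several ways to approximate the norm.

```python
from math import sqrt


def gram_kernel(rows):
    """K(f, g) = sum_n rows[f][n] * rows[g][n], the matrix of T T^*."""
    return [[sum(a * b for a, b in zip(rf, rg)) for rg in rows] for rf in rows]


def apply_matrix(K, v):
    return [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]


def spectral_norm_squared(rows, iterations=500):
    """Approximate ||T||^2 = largest eigenvalue of K by power iteration."""
    K = gram_kernel(rows)
    v = [1.0] * len(K)
    for _ in range(iterations):
        w = apply_matrix(K, v)
        norm = sqrt(sum(t * t for t in w))
        if norm == 0:
            return 0.0
        v = [t / norm for t in w]
    # Rayleigh quotient of the unit vector v.
    return sum(vi * wi for vi, wi in zip(v, apply_matrix(K, v)))


def max_row_sum(rows):
    """max over f of sum over g of |K(f, g)|."""
    return max(sum(abs(entry) for entry in row) for row in gram_kernel(rows))


if __name__ == "__main__":
    rows = [[1, 2, -1, 0.5], [0, 1, 1, -2], [3, -1, 0, 1]]  # placeholder data
    print(spectral_norm_squared(rows), max_row_sum(rows))
```

The bound is sharp when $K$ is diagonal, which is the “almost-orthogonal” situation that the estimate of the off-diagonal terms $K(f,g)$ aims to approximate.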

We will study the sums $k(f,g)$ by studying the analytic properties of the Dirichlet series

\[ L_b(f\otimes g, s) = \sum_{n\ge 1}\rho_f(n)\rho_g(n)\,n^{-s} \]

(which might be called the “bilinear” convolution of the symmetric squares) and expressing the sums as Mellin transforms.

The necessary properties of $L_b$ are consequences of a result which compares it to the Rankin-Selberg convolution of $\mathrm{Sym}^2\pi_f$ and $\mathrm{Sym}^2\pi_g$. In complete generality, Jacquet, Piatetskii-Shapiro and Shalika have developed a theory of Rankin-Selberg convolutions of automorphic representations on GL($m_1$) and GL($m_2$) ([JPS] and other papers) over arbitrary global fields. In particular, they have defined a corresponding L-function and studied its properties (analytic continuation and functional equation). Some points which they didn’t treat have been established by various other authors (among whom Shahidi, and Moeglin-Waldspurger).

This allows us to consider the L-function $L(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, s)$ of the representation-theoretic convolution of the symmetric squares $\mathrm{Sym}^2\pi_f$ and $\mathrm{Sym}^2\pi_g$.³ We will prove below a lemma comparing it with $L_b$, which we state in greater generality (as in [D-K]), since the proof is not harder.

For any automorphic representation $\pi$ over $\mathbf{Q}$, of conductor $q \ge 1$, we denote by $\lambda_\pi(n)$ the coefficients of the finite part of its L-function. We say that $\pi$ satisfies the Ramanujan-Petersson bound if the bound

\[ \lambda_\pi(n) \ll_\varepsilon n^{\varepsilon} \]

holds for any $\varepsilon > 0$. Because of the Euler product expansion, this will then actually hold uniformly in $\pi$ (it means that the roots of the local factors of $L_f$, at the unramified primes, are all of absolute value 1). For $f \in S_2(q)^*$, this is Deligne’s bound, and by the Euler product for the symmetric square, it is also true for $\mathrm{Sym}^2\pi_f$. The bilinear convolution for automorphic representations $\pi_1$ and $\pi_2$ is defined as before,

\[ L_b(\pi_1\otimes\pi_2, s) = \sum_{n\ge 1}\lambda_{\pi_1}(n)\lambda_{\pi_2}(n)\,n^{-s}. \]

Lemma 3. Let $\pi_1$ and $\pi_2$ be automorphic representations of GL($n_1$) and GL($n_2$) with conductors $q_1$ and $q_2$ respectively, which satisfy the Ramanujan-Petersson bound. There exists an Euler product

\[ H(\pi_1,\pi_2;s) = \prod_{p} H_p(\pi_1,\pi_2;p^{-s}) \]

where $H_p(\pi_1,\pi_2)$ is a rational function for all $p$ and a polynomial (of degree bounded by a constant depending only on $n_1$ and $n_2$) for almost all $p$, such that $H(\pi_1,\pi_2)$ converges absolutely for $\mathrm{Re}(s) > \frac12$ (in particular, has no poles in this region), and

\[ L_b(\pi_1\otimes\pi_2, s) = H(\pi_1,\pi_2;s)\,L_f(\pi_1\otimes\pi_2, s). \]

Moreover, we have for any $\varepsilon > 0$ and uniformly for $\mathrm{Re}(s) = \sigma > \frac12$ a bound

\[ H(\pi_1,\pi_2;s) \ll_\varepsilon [q_1,q_2]^{\varepsilon}\,H(\sigma) \]

where $H$ is a fixed Dirichlet series absolutely convergent for $\mathrm{Re}(s) > \frac12$ satisfying in this region

\[ H(\sigma) \ll (\sigma-\tfrac12)^{-A} \]

for some $A > 0$ depending only on $n_1$ and $n_2$.

This lemma simply reflects the fact that the coefficients of $L_f(\pi_1\otimes\pi_2, s)$ and $L_b(\pi_1\otimes\pi_2, s)$ are the same for squarefree integers $n$, so their analytic behavior is the same up to the critical line.

Coming back to the proposition, we derive from the lemma the analytic continuation of $L_b$ up to the critical line, because $L_f(\pi_1\otimes\pi_2, s)$ has a meromorphic continuation to $\mathbf{C}$ by the work of Jacquet, Piatetskii-Shapiro and Shalika. Now we apply Mellin inversion and derive

\[ k(f,g) = \frac{1}{2i\pi}\int_{(3)} N^{s}\hat{\psi}(s)\,L_b(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, s)\,ds = \frac{1}{2i\pi}\int_{(3)} N^{s}\hat{\psi}(s)\,H(\mathrm{Sym}^2\pi_f,\mathrm{Sym}^2\pi_g;s)\,L_f(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, s)\,ds, \]

where $\hat{\psi}(s) = \int_0^{+\infty}\psi(x)\,x^{s}\,\frac{dx}{x}$ is the Mellin transform of $\psi$.

Move the line of integration to $\mathrm{Re}(s) = \frac12 + c$, where $c < \frac12$ will be chosen later; the Mellin transform $\hat{\psi}$ is holomorphic for $\mathrm{Re}(s) > 0$ and quickly decreasing in any vertical strip $\delta < \mathrm{Re}(s) < b$ with $\delta > 0$; the other terms in the integral being at most of polynomial growth (see [R-S] for the Rankin-Selberg convolution), shifting the contour is possible and the only singularities we can pick up by doing so are those of $L_f(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, s)$. They are known from the Rankin-Selberg theory. Precisely, it is proved in [M-W]:

³ This convolution has already been used in other contexts of analytic number theory by Hoffstein and Lockhart [H-L] and by Luo, Rudnick, Sarnak [LRS] to obtain deep results about GL(2) automorphic forms, especially non-holomorphic Maass forms.

Theorem 12. Let $\pi_1$ and $\pi_2$ be cuspidal automorphic representations of GL($n_1$) and GL($n_2$) respectively (over a number field). If there are no $t \in \mathbf{C}$ such that $\pi_1 = \pi_2\otimes|\cdot|^t$, then $L(\pi_1\otimes\pi_2, s)$ is entire.

If $\pi_1 = \pi_2$, then $L(\pi_1\otimes\pi_2, s)$ has two simple poles at 0 and 1 and is holomorphic outside those points.

In our case, $\mathrm{Sym}^2\pi_f$ and $\mathrm{Sym}^2\pi_g$ both have trivial central character so we have $\mathrm{Sym}^2\pi_f = \mathrm{Sym}^2\pi_g\otimes|\cdot|^t$ for $t = 0$ only, and this theorem says that poles arise only when $\mathrm{Sym}^2\pi_f = \mathrm{Sym}^2\pi_g$.

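Keeping this in mind, we estimate the integral on the other line, namely

\[ \frac{1}{2i\pi}\int_{(\frac12+c)} N^{s}\hat{\psi}(s)\,H(\mathrm{Sym}^2\pi_f,\mathrm{Sym}^2\pi_g;s)\,L_f(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, s)\,ds. \]

We are only interested in the $q$-aspect of the matters. By the bounds for $H$ in Lemma 3, for any $\varepsilon > 0$ we have

\[ H(\pi_1,\pi_2;\tfrac12+c+it) \ll_\varepsilon q^{\varepsilon}c^{-A}. \]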

As for the Rankin-Selberg convolution, after inserting the correct Gamma factors it has a functional equation relating its value at $s$ with that at $1-s$ (because it is self-contragredient; for this, see the references to several articles of Shahidi in [M-W]):

\[ L(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, s) = \tau(f,g)\,q(f,g)^{\frac12-s}\,L(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, 1-s) \]

where $\tau(f,g)$ is a complex number of absolute value 1 and $q(f,g) = q(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g)$ is the conductor of $\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g$. By a theorem of Bushnell and Henniart [B-H], it is bounded by the product of the cubes of the conductors of $\mathrm{Sym}^2\pi_f$ and $\mathrm{Sym}^2\pi_g$, each of which is at most $q^2$, so $q(f,g) \le (q^4)^3 = q^{12}$.

From the functional equation, Stirling’s formula and the convexity principle of Phragmén-Lindelöf, this implies in turn

\[ L_f(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_g, \tfrac12+c+it) \ll q^{12(1/4-c/2)}|t|^{E} = q^{3-6c}|t|^{E} \]

for some (absolute) $E > 0$. Taken together with the previous bound for $H$, and using the fact that $\hat{\psi}$ decreases faster than any polynomial on the line, we see that the integral on $\mathrm{Re}(s) = \frac12 + c$ is $\ll_\varepsilon c^{-A}N^{\frac12+c}q^{3-6c+\varepsilon}$. Since by assumption $N \ge q^9$, we deduce from this by taking $c = (\log q)^{-1}$ (so that $1 \ll q^{c} \ll 1$, $N^{c} \ll 1$ and $c^{-A} = (\log q)^{A}$) that for any $\varepsilon > 0$

\[ k(f,g) = \delta(\mathrm{Sym}^2\pi_f,\mathrm{Sym}^2\pi_g)\,\hat{\psi}(1)\,N R_f + O_\varepsilon(N^{5/6+\varepsilon}) \tag{3.10} \]

where $\delta(\cdot,\cdot)$ is the Kronecker delta, and $R_f$ is the residue at $s = 1$ of the bilinear convolution $L_b(\mathrm{Sym}^2\pi_f\otimes\mathrm{Sym}^2\pi_f, s)$. To determine that only the diagonal $f = g$ contributes to the main term, we appeal to Proposition 5: since the level $q$ is squarefree, $\mathrm{Sym}^2\pi_f = \mathrm{Sym}^2\pi_g$, which implies $L(\mathrm{Sym}^2 f, s) = L(\mathrm{Sym}^2 g, s)$, is equivalent to $f = g$. Then knowing that $f = g$ we can go around $R_f$ easily by coming back to the definition

\[ k(f,f) = \sum_{n\ge 1}|\rho_f(n)|^2\psi(n/N) \le \sum_{n\le 2N}|\rho_f(n)|^2 \le \sum_{n\le 2N}\tau(n)^4 \ll N(\log N)^{15} \]

(where the implied constant is absolute). From this and (3.9) we therefore get

\[ \|T\|^2 \ll_\varepsilon N(\log N)^{15} + qN^{5/6+\varepsilon} \ll N(\log N)^{15} \]

by taking $\varepsilon$ small enough, with an absolute constant. There is actually some margin left in the bound, but it doesn’t seem to matter in the current applications; it is still very far away from a sharp large-sieve type inequality, which would be much more widely applicable.

We now prove Lemma 3. We write

\[ L_f(\pi_i, s) = \sum_{n\ge 1}\lambda_i(n)\,n^{-s} \]

for the finite part of the standard L-functions, so

\[ L_b(\pi_1\otimes\pi_2, s) = \sum_{n\ge 1}\lambda_1(n)\lambda_2(n)\,n^{-s}; \]

we have to compare this Dirichlet series and the Rankin-Selberg convolution $L_f(\pi_1\otimes\pi_2, s)$.

The Rankin-Selberg convolution has an Euler product by the general theory, and the bilinear convolution also has one because it’s a Dirichlet series whose coefficients are multiplicative:

\[ L_b(\pi_1\otimes\pi_2, s) = \prod_{p}\ \sum_{k\ge 0}\lambda_1(p^k)\lambda_2(p^k)\,p^{-ks}. \]

Therefore, since we claim the existence of an Euler product

\[ H(\pi_1,\pi_2) = \prod_{p} H_p(\pi_1,\pi_2) \]

relating the two, we can proceed locally for each prime $p$.

For any automorphic L-function, we denote by $L_p$ its $p$-factor, considered as a polynomial (in $p^{-s}$) with complex coefficients.

Assume first that $p$ is an unramified prime of the Rankin-Selberg convolution. This is true for almost all $p$, and we will prove now the existence of a polynomial $H_p(\pi_1,\pi_2)$ such that

\[ \sum_{k\ge 0}\lambda_1(p^k)\lambda_2(p^k)X^k = H_p(\pi_1,\pi_2)\,L_p(\pi_1\otimes\pi_2). \tag{3.11} \]

We know that $p$ is unramified for both $\pi_1$ and $\pi_2$, so that the $p$-factor of the standard L-function is

\[ L_p(\pi_i)^{-1} = \prod_{1\le j\le n_i}(1-\alpha_{i,j}X) \tag{3.12} \]

where $\alpha_{i,j}$ are the Satake parameters of the local representation for $\pi_i$ at $p$. Again, the general theory gives the $p$-factor of the Rankin-Selberg convolution

\[ L_p(\pi_1\otimes\pi_2)^{-1} = \prod_{\substack{1\le j\le n_1 \\ 1\le k\le n_2}}(1-\alpha_{1,j}\alpha_{2,k}X). \]

Assume, to begin with, that the $\alpha_{i,j}$ are all distinct and the $\alpha_{1,j}\alpha_{2,k}$ also. Coming then to the $p$-factor of the bilinear convolution, we deduce from the Dirichlet series for $L_f(\pi_i)$

\[ \sum_{k\ge 0}\lambda_i(p^k)X^k = \prod_{1\le j\le n_i}(1-\alpha_{i,j}X)^{-1} = \sum_{1\le j\le n_i}\frac{r_{i,j}}{1-\alpha_{i,j}X} \]

for some complex numbers $r_{i,j}$ (partial fraction expansion, since the $\alpha$’s are distinct), whence

\[ \lambda_i(p^k) = \sum_{1\le j\le n_i} r_{i,j}\,\alpha_{i,j}^{k}. \]

This implies

\[ \sum_{k\ge 0}\lambda_1(p^k)\lambda_2(p^k)X^k = \sum_{k\ge 0}\Bigl(\sum_{\substack{1\le i\le n_1 \\ 1\le j\le n_2}} r_{1,i}r_{2,j}\,\alpha_{1,i}^{k}\alpha_{2,j}^{k}\Bigr)X^k = \sum_{i,j}\frac{r_{1,i}r_{2,j}}{1-\alpha_{1,i}\alpha_{2,j}X}. \]

Reducing to a common denominator, which is exactly $L_p(\pi_1\otimes\pi_2)$, we get the required formula (3.11).

Moreover, it is obvious that the coefficients of $H_p(\pi_1,\pi_2)$ are polynomials in the $\alpha$’s and since the Ramanujan bound implies $|\alpha_{i,j}| \le 1$ it follows that those coefficients are bounded by some constants depending only on $n_1$ and $n_2$. Hence the absolute convergence (and the absence of poles) in $\mathrm{Re}(s) > \frac12$ of the product over the unramified primes will follow if we can show that the coefficient of $X$ of $H_p(\pi_1,\pi_2)$ vanishes, since there is no term in $p^{-s}$ then.

But for any rational function

\[ r = \frac{f}{g} \]

with polynomials $f$ and $g$, satisfying $r(0) = 1$, the coefficient of $X$ of the numerator $f$ of $r$ is $f'(0)$, and so equals $g(0)r'(0) + g'(0)$. If $r = \sum_k b_kX^k$ is the power series expansion of $r$, we have therefore

\[ f'(0) = g(0)b_1 + g'(0). \]

Assume moreover that $g = \prod_j(1-\beta_jX)$. Then

\[ f'(0) = b_1 - \sum_{j}\beta_j. \]

Applying this to the local factor of $L_b$, which is of this form, we see that the corresponding coefficient is indeed zero since

\[ \lambda_1(p)\lambda_2(p) = \sum_{i,j}\alpha_{1,i}\alpha_{2,j}, \tag{3.13} \]

(which means simply that the bilinear and the true convolution have the same coefficients for squarefree integers).

We can now use a continuity argument to deduce that the existence of the polynomial $H_p$ satisfying formula (3.11) and the vanishing of the coefficient of $X$ remain valid when some of the roots of the local L-functions are the same.
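The local computation can be reproduced with exact arithmetic on truncated power series. The sketch below takes arbitrary rational stand-ins for the Satake parameters (they need not have absolute value 1, since the statement is a polynomial identity), forms the bilinear local factor, multiplies by $\prod_{i,j}(1-\alpha_{1,i}\alpha_{2,j}X)$, and inspects the coefficients of the resulting $H_p$. The helper names are illustrative.

```python
from fractions import Fraction


def inverse_product_series(roots, order):
    """Coefficients of prod_j (1 - root_j X)^(-1) up to X^order."""
    coeffs = [Fraction(1)] + [Fraction(0)] * order
    for r in roots:
        # Dividing by (1 - r X): new[k] = old[k] + r * new[k-1].
        for k in range(1, order + 1):
            coeffs[k] += r * coeffs[k - 1]
    return coeffs


def multiply_by_linear_factors(coeffs, roots):
    """Multiply a truncated series by prod (1 - root X), keeping the same order."""
    out = list(coeffs)
    for r in roots:
        # Go downwards so that out[k-1] is still the old coefficient.
        for k in range(len(out) - 1, 0, -1):
            out[k] -= r * out[k - 1]
    return out


def local_factor_H(alpha, beta, order):
    """Coefficients of H_p: the bilinear local factor times L_p(pi_1 x pi_2)^(-1)."""
    lam1 = inverse_product_series(alpha, order)  # lambda_1(p^k)
    lam2 = inverse_product_series(beta, order)   # lambda_2(p^k)
    bilinear = [x * y for x, y in zip(lam1, lam2)]
    products = [a * b for a in alpha for b in beta]
    return multiply_by_linear_factors(bilinear, products)


if __name__ == "__main__":
    alpha = [Fraction(1, 2), Fraction(-2, 3)]                  # n_1 = 2, placeholder values
    beta = [Fraction(3, 5), Fraction(1, 7), Fraction(-1, 4)]   # n_2 = 3, placeholder values
    print(local_factor_H(alpha, beta, 10))
```

In this example the constant term is 1, the coefficient of $X$ is 0, and all coefficients from degree $n_1n_2 = 6$ onwards vanish, in line with $H_p$ being a polynomial of degree at most $n_1n_2 - 1$.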

It remains to treat the case of the ramified primes. The local factor at $p$ of the $L$-functions of $\pi_1$ and $\pi_2$ is still of the form
\[ L_p(\pi_i) = \prod_{1 \leq j \leq n_i'} (1 - \alpha_{i,j}X)^{-1} \]
for some $n_i' \leq n_i$. The same proof as the unramified case shows again that the local factor of the bilinear convolution is a rational function which has poles only among the reciprocals of the products $\alpha_{1,j}\alpha_{2,k}$. So we can define $H_p(\pi_1, \pi_2)$ by
\[ H_p(\pi_1, \pi_2) = \Bigl(\sum_{k \geq 0} \lambda_1(p^k)\lambda_2(p^k)X^k\Bigr) L_p(\pi_1 \otimes \pi_2)^{-1} \tag{3.14} \]
and it is also a rational function.

It remains to establish that the finite product over the ramified primes has no pole for $\mathrm{Re}(s) > \tfrac{1}{2}$. But a pole $s_0$ of $H_p(\pi_1, \pi_2, p^{-s})$ must satisfy
\[ \alpha_{1,j}\alpha_{2,k}p^{-s_0} = 1 \]
(for some $j$ and $k$), so by the Ramanujan bound again we get $\mathrm{Re}(s_0) \leq 0$.

As for bounding $H(\pi_1, \pi_2; s)$, clearly by the Ramanujan bound the product over the unramified primes is absolutely convergent for $\mathrm{Re}(s) > \tfrac{1}{2}$. It is dominated (termwise) by the Euler product $H$ whose factors are obtained by taking the corresponding factor of $H_p$ and replacing each coefficient of the polynomial by its absolute value, which in turn, since the coefficient of $X^2$ is absolutely bounded (say by $A$), is dominated by an Euler product which may be written (by factoring by force $\zeta(2s)$) as $\zeta(2s)^A J(s)$ where $J(s)$ is absolutely convergent for $\mathrm{Re}(s) > \tfrac{1}{3}$. The estimate
\[ H(\sigma) \ll (\sigma - \tfrac{1}{2})^{-A} \]
then follows directly.

We now estimate the product over the ramified primes
\[ \prod_{p \mid [q_1, q_2]} H_p(\pi_1, \pi_2; p^{-s}) \]

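using (3.14). For $L_p(\pi_1 \otimes \pi_2)^{-1}$, which is a polynomial of degree at most $n_1 n_2$, we write by the Ramanujan bound:
\[ \prod_{p \mid [q_1, q_2]} L_p(\pi_1 \otimes \pi_2; p^{-s})^{-1} \leq \prod_{p \mid [q_1, q_2]} (1 + p^{-\sigma})^{n_1 n_2} \leq \Bigl(\prod_{p \mid [q_1, q_2]} 2\Bigr)^{n_1 n_2} \ll_\varepsilon [q_1, q_2]^\varepsilon \]
for any $\varepsilon > 0$, since the number of prime divisors of an integer $n$ is $O(\log n / \log\log n)$. On the other hand, still by Ramanujan, for any $\varepsilon > 0$
\[ \sum_{k \geq 0} \lambda_1(p^k)\lambda_2(p^k)p^{-ks} \ll_\varepsilon \sum_{k \geq 0} p^{k(\varepsilon - s)} = \frac{1}{1 - p^{-s+\varepsilon}} \]
so that taking the product over $p \mid [q_1, q_2]$ we obtain by the same reasoning the same bound as above for the product of those terms, and finally
\[ \prod_{p \mid [q_1, q_2]} H_p(\pi_1, \pi_2; p^{-s}) \ll_\varepsilon [q_1, q_2]^\varepsilon. \]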

This concludes the proof of the mean-value estimate. As mentioned, it is very far from what one might expect by analogy with other large-sieve inequalities. We end this section by showing that the proof given, relying on the duality principle and the analytic properties of the Rankin-Selberg convolutions, cannot yield any improvement in the range of $N$ for which the mean-value estimate is valid.

Indeed, for the special case of the symmetric squares of two forms in $S_2(q)^*$, we can make the comparison in Lemma 3 much more explicit. Take a prime $p$, unramified for $f$ and $g$, and consider the $p$-factors of the symmetric squares. They can be written
\[ L_p(\mathrm{Sym}^2 f) = (1 - \alpha^2 X)(1 - X)(1 - \alpha^{-2}X) \]
and
\[ L_p(\mathrm{Sym}^2 g) = (1 - \beta^2 X)(1 - X)(1 - \beta^{-2}X), \]
say, for some $\alpha$ and $\beta$ of absolute value 1. Using patience and cunning, or any symbolic computation software, we find from this (the formula $\lambda_f(p^2) = \alpha^2 + 1 + \alpha^{-2}$ is used repeatedly) the $p$-factor $H_p$ of the lemma, namely
\[ H_p = 1 - \lambda_f(p^2)\lambda_g(p^2)X^2 + \bigl(\lambda_f(p^2)^2 + \lambda_g(p^2)^2 - 2\bigr)X^3 - \lambda_f(p^2)\lambda_g(p^2)X^4 + X^6. \]
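The closed form for $H_p$ can be checked numerically without symbolic software. The sketch below is only an illustration under stated assumptions: it takes the local factors to be the reciprocals of the displayed cubic polynomials, defines $H_p$ as in (3.14) (the series of products $\lambda_{\mathrm{Sym}^2 f}(p^k)\lambda_{\mathrm{Sym}^2 g}(p^k)X^k$ multiplied by the product of the nine factors $1 - xyX$), and picks two arbitrary angles for $\alpha$ and $\beta$. The function names are hypothetical.

```python
import cmath

def inverse_euler_series(roots, deg):
    """Coefficients of prod_r 1/(1 - r X) up to X^deg (the values lambda(p^k))."""
    series = [1 + 0j] + [0j] * deg
    for r in roots:
        # Multiplying by 1/(1 - r X) amounts to new[k] = old[k] + r * new[k-1].
        for k in range(1, deg + 1):
            series[k] += r * series[k - 1]
    return series

def poly_mul(p, q, deg):
    """Product of two coefficient lists, truncated after X^deg."""
    out = [0j] * (deg + 1)
    for i, a in enumerate(p[: deg + 1]):
        for j, b in enumerate(q[: deg + 1 - i]):
            out[i + j] += a * b
    return out

def h_p_numeric(alpha, beta, deg=8):
    """H_p computed from its definition, as a truncated power series."""
    roots_f = [alpha**2, 1, alpha**-2]
    roots_g = [beta**2, 1, beta**-2]
    lam_f = inverse_euler_series(roots_f, deg)
    lam_g = inverse_euler_series(roots_g, deg)
    bilinear = [lam_f[k] * lam_g[k] for k in range(deg + 1)]
    conv_inverse = [1 + 0j]
    for x in roots_f:
        for y in roots_g:
            conv_inverse = poly_mul(conv_inverse, [1, -x * y], deg)
    return poly_mul(bilinear, conv_inverse, deg)

def h_p_claimed(alpha, beta, deg=8):
    """Coefficients of the closed form stated in the text, padded with zeros."""
    a = alpha**2 + 1 + alpha**-2
    b = beta**2 + 1 + beta**-2
    coeffs = [1, 0, -a * b, a * a + b * b - 2, -a * b, 0, 1]
    return coeffs + [0] * (deg + 1 - len(coeffs))

def max_gap(alpha, beta):
    """Largest coefficientwise difference between the two computations."""
    pairs = zip(h_p_numeric(alpha, beta), h_p_claimed(alpha, beta))
    return max(abs(u - v) for u, v in pairs)

# Arbitrary sample angles on the unit circle (assumed, not from the text).
print(max_gap(cmath.exp(0.7j), cmath.exp(1.9j)))
```

The truncation degree 8 is chosen so that the vanishing of the coefficients of $X^7$ and $X^8$ is tested as well; the printed gap should be at the level of floating-point rounding.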

Disregarding the ramified primes, which will not interfere with this discussion, we see that, up to another Euler product which is absolutely convergent (and uniformly bounded) for $\mathrm{Re}(s) > 1/3$, the comparison function $H(\mathrm{Sym}^2 f, \mathrm{Sym}^2 g)$ is equal to the inverse of the Rankin-Selberg convolution $L_f(\mathrm{Sym}^2 f \otimes \mathrm{Sym}^2 g, 2s)^{-1}$ (or of $L_b$, which is the same in that sense). Therefore, we cannot go beyond the critical line without encountering poles from the zeros of this convolution: the Riemann Hypothesis enters the game, and there is no hope of going further in this easy way.

A curious fact perhaps deserves mention here: the polynomial $H_p$ is self-reciprocal and even, so if $x$ is a root of $H_p$, then $-x$, $x^{-1}$ and $-x^{-1}$ are also roots. But there is no reason to expect that $H_p$ satisfies the local Riemann Hypothesis, in other words that all its roots are on the unit circle. However, because of the symmetry, if the roots are distinct, then one being on the unit circle implies that four of them at least are. Actually, if one considers $H_p$ as a polynomial depending on two parameters $\alpha$ and $\beta$ (on the unit circle), the set $R$ of points $(\alpha, \beta)$ such that all the roots of $H_p$ are unitary is a rather large subset of $S^1 \times S^1$.

And in practice, working numerically with the $L$-functions of some concrete elliptic curves (without CM), one finds that a large percentage of primes $p$ are such that $H_p$ does satisfy the local Riemann Hypothesis. This is accounted for by the Sato-Tate conjecture on the distribution of the arguments of the $\alpha_p$, and in fact the proportion of $p$ agrees (experimentally) with the measure of the set $R$ (with respect to the product of the Sato-Tate measure on each component).

If a practical analytic interpretation of the roots of the local factors $H_p$ could be found, it might be possible to provide a rigorous proof that this agreement holds as $p$ tends to infinity, and this would provide some theoretical evidence for the Sato-Tate conjecture.

3.4 Notational matters

We will be dealing quite extensively with sums over $f \in S_2(q)^*$. The following notations are designed to emphasize the underlying structure. We usually suppose given a family $\alpha = (\alpha_f)$ of complex numbers, defined for all forms $f \in S_2(q)^*$, $q$ being any level, or maybe restricted to squarefree or prime levels. We then introduce the “natural” averaging operator
\[ A[\alpha] = \sum_{f \in S_2(q)^*} \alpha_f \]
where we only sum over forms of a fixed level, and consider the behavior of $A[\alpha]$ as a function of the level $q$, asymptotically as $q$ gets large. So the interpretation of an inequality which is written
\[ A[\alpha] \leq f(q) \]
(respectively, $\geq$, $\ll$, \dots), for any function $f$, is the following: for $q$ large enough (possibly, $q$ prime large enough), it holds
\[ \sum_{f \in S_2(q)^*} \alpha_f \leq f(q) \]
(respectively, $\geq$, or there exists an absolute constant $C > 0$ with
\[ \sum_{f \in S_2(q)^*} \alpha_f \leq C f(q) \]
for all $q$ large enough).

Similarly we define the “harmonic” averaging operator
\[ A^h[\alpha] = \sum^{h}_{f \in S_2(q)^*} \alpha_f \]
recalling (see (2.16)) that $\sum^h$ means
\[ \sum^{h}_{f} \alpha_f = \sum_f \omega_f \alpha_f \]
with
\[ \omega_f = \frac{1}{4\pi(f, f)}. \]

As a matter of notational convenience, we often write somewhat loosely $A^h[\alpha_f]$ instead of $A^h[\alpha]$, which avoids losing a letter to denote each family $(\alpha_f)$ we wish to average.

3.5 Removing the harmonic weight: the tail

The relevance of Proposition 6 to this thesis, as we mentioned at the end of Section 2.3, is that the harmonic weight $\omega_f = \frac{1}{4\pi(f,f)}$, which is required to express the $\Delta$-symbol of the modular forms by Petersson’s formula, is related to the special value of the symmetric square $L$-function at $s = 1$, which is the edge of the critical strip (in our “analytic” normalization). This is essentially due to Shimura [Sh3].

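Proposition 7. Let $q$ be squarefree and $f \in S_2(q)^*$ a primitive form. Then
\[ 4\pi(f, f) = \frac{q}{12\zeta(2)} L(\mathrm{Sym}^2 f, 1) + O_\varepsilon(q^{1/2+\varepsilon}) \]
for any $\varepsilon > 0$, and if $q$ is prime
\[ 4\pi(f, f) = \frac{\dim J_0(q)}{\zeta(2)} L(\mathrm{Sym}^2 f, 1) + O((\log q)^3) \tag{3.15} \]
uniformly in $f$ as $q$ tends to infinity. In particular we have
\[ \omega_f \ll \frac{\log q}{q} \tag{3.16} \]
uniformly for $f \in S_2(q)^*$.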

Sketch of the proof. We introduce the Rankin-Selberg convolution of $f$ with itself
\[ L(f \otimes f, s) = \zeta_q(2s) \sum_{n \geq 1} \lambda_f(n)^2 n^{-s}. \]
By a simple formal computation using the multiplicativity of the Fourier coefficients, we have the identity
\[ L(f \otimes f, s) = \zeta_q(s) L(\mathrm{Sym}^2 f, s). \]
On the other hand, $L(f \otimes f, s)$ is given by the Rankin-Selberg integral (see [Iw2, page 245])
\[ (4\pi)^{-1-s}\Gamma(s+1)\zeta_q(2s)^{-1}L(f \otimes f, s) = \int_{\Gamma_0(q)\backslash H} |f(z)|^2 E(z, s)\,dx\,dy \]
where $E(z, s)$ is the non-holomorphic Eisenstein series for $\Gamma_0(q)$.

The right-hand side of this equation is meromorphic with a simple pole at $s = 1$, from the known analytic properties of $E(z, s)$: this already shows that $L(f \otimes f, s)$ is meromorphic, hence $L(\mathrm{Sym}^2 f, s)$ also (the point in Shimura’s theorem is that the symmetric square is actually entire).

The residue of $E(z, s)$ at $s = 1$ is the constant function $\mathrm{Vol}\,X_0(q)^{-1}$, so the computation of the residue at $s = 1$ on both sides yields
\[ \frac{1}{(4\pi)^2\zeta(2)} \prod_{p \mid q} (1 + p^{-1})^{-1} L(\mathrm{Sym}^2 f, 1) = \frac{(f, f)}{\mathrm{Vol}\,X_0(q)} \]
hence
\[ \begin{aligned} 4\pi(f, f) &= \frac{\mathrm{Vol}\,X_0(q)}{4\pi\zeta(2)} \prod_{p \mid q} (1 + p^{-1})^{-1} L(\mathrm{Sym}^2 f, 1) \\ &= \frac{q}{12\zeta(2)} L(\mathrm{Sym}^2 f, 1) + O_\varepsilon(q^{1/2+\varepsilon}) \quad \text{(by (1.11) and (3.5))}. \end{aligned} \]
Moreover, the more precise formula (1.12) gives indeed, for $q$ prime
\[ 4\pi(f, f) = \frac{\dim J_0(q)}{\zeta(2)} L(\mathrm{Sym}^2 f, 1) + O((\log q)^3) \]
and the last statement is a consequence of the previous ones and the lower bound (3.6) by the very definition $\omega_f^{-1} = 4\pi(f, f)$. □

Suppose we have a family $\alpha = (\alpha_f)$ of complex numbers, for all $f \in S_2(q)^*$ with prime level $q$, and that we know the behavior of the weighted sum
\[ A^h[\alpha] = \sum^{h}_{f \in S_2(q)^*} \alpha_f \]
(for instance, we have an asymptotic formula for $q$ going to infinity), but wish to obtain the same information for the natural sum
\[ A[\alpha] = \sum_{f \in S_2(q)^*} \alpha_f. \]
Since, by Petersson’s formula for $m = n = 1$, $A^h[1] = 1 + O(q^{-3/2})$, we expect that when $\alpha$ is well-distributed and not biased against the Petersson inner-product (or, what amounts to the same thing, against the value of the symmetric square at $s = 1$), we should have
\[ A[\alpha] \sim \dim J_0(q)\, A^h[\alpha] \]
meaning that $L(\mathrm{Sym}^2 f, 1)$ and $\alpha_f$ act here as independent random variables would, with the average of $L(\mathrm{Sym}^2 f, 1)$ equal to the obvious constant factor $\zeta_q(2)$ (which is equivalent to $\zeta(2)$ as $q$ tends to infinity).

In this section we build a method to approach this problem, and, using the mean-value estimate established before, prove a result which solves part of the problem for quite general vectors $\alpha$. This reduces to another estimate which has to be supplied independently in each case, as we will do in the next chapters when the time comes to conclude the proofs of theorems 8 and 10.

3.5.1 Sketch of the idea

We sketch the idea first, since the technical details will tend to obscure it. We make the assumption that $\alpha = (\alpha_f)$ satisfies the conditions
\[ A^h[|\alpha_f|] \ll (\log q)^A \quad \text{(for some absolute } A > 0\text{)} \tag{3.17} \]
\[ \max_{f \in S_2(q)^*} |\omega_f \alpha_f| \ll q^{-\delta} \quad \text{(for some } \delta > 0\text{)} \tag{3.18} \]
as the level $q$ (prime) tends to infinity. Neither of these conditions is very restrictive in practice: the first one is interpreted as saying that $|\alpha_f|$ is “almost” bounded, and can often be achieved by some normalization. If this is true, the second condition is fairly reasonable since we have shown in (3.16) that $\omega_f \ll (\log q)q^{-1}$. In other words, by normalizing if necessary, both conditions can be expected to hold whenever the size of $\alpha_f$ doesn’t increase or oscillate wildly.

We write the unweighted average as a weighted one and replace the Petersson inner product by the special value of the symmetric square (3.15):
\[ \begin{aligned} A[\alpha] &= \sum^{h}_{f \in S_2(q)^*} 4\pi(f, f)\alpha_f \\ &= \frac{\dim J_0(q)}{\zeta(2)} \sum^{h}_{f \in S_2(q)^*} L(\mathrm{Sym}^2 f, 1)\alpha_f + O((\log q)^3 A^h[|\alpha_f|]). \end{aligned} \tag{3.19} \]
We wish to replace the value of the symmetric square by a partial sum of the Dirichlet series. This can be done for a long enough sum, say of length $y$. Thus we define
\[ \omega_f(y) = \sum_{n \leq y} \rho_f(n)n^{-1} \tag{3.20} \]
and the sum above is, up to a very small quantity, equal to
\[ \sum^{h}_{f \in S_2(q)^*} \omega_f(y)\alpha_f. \]
This is now a finite sum of averages over the $\alpha_f$, twisted by symmetric square coefficients $\rho_f(n)$:
\[ \sum^{h}_{f \in S_2(q)^*} \omega_f(y)\alpha_f = \sum_{n \leq y} \frac{1}{n} A^h[\rho_f(n)\alpha_f]. \]

If by any chance the methods which give us control over the average $A^h[\alpha_f]$ (corresponding to $n = 1$) also apply to the twisted ones, in the range $n < y$, then we are done. Unfortunately, in applications this will only be the case for very small values of $y$. For instance, in the cases we will consider, we can control the twists only for $y < q^\varepsilon$, where $\varepsilon > 0$ is very small (indeed, arbitrarily so). So, can we recover the $L$-function from such a short sum (3.20)?

On the Riemann hypothesis (more precisely, the Lindelöf hypothesis suffices for this purpose) this is indeed true: then we have, for any real number $\sigma$ and any $s$ with $\mathrm{Re}(s) > \sigma > \tfrac{1}{2}$
\[ L(\mathrm{Sym}^2 f, s) = \sum_{n < q^\varepsilon} \rho_f(n)n^{-s} + O(q^{-\delta}) \tag{3.21} \]
for any $\varepsilon > 0$, and some $\delta(\sigma, \varepsilon) > 0$, with an absolute implied constant. This is an encouraging sign, but of course this statement is out of reach, and individually we can only take $y$ much larger ($y = q^2$ or maybe $y = q$), and indeed too large for our application.

But we can again exploit the average over $f$ that we have to (even want to) perform. As discussed in Section 3.2, a sharp form of the mean-value estimate of Proposition 6 would be as powerful as the Lindelöf hypothesis on average over $f$, and it might be used to implement (3.21) on average. But of course, only the weak form of the proposition is at our disposal. Thankfully, we are still saved because we are looking at the symmetric square at a point on the edge of the critical strip, where the Dirichlet series almost converges absolutely. Then the “extra length” needed to enter the effective range of $n$ for the mean-value estimate will not matter, much as the partial sums
\[ \sum_{n < q^\delta} n^{-1} \]
of the harmonic series are of the same size as $q$ tends to infinity for any fixed $\delta > 0$.

3.5.2 The tail of the series

We come to the implementation of this idea. Let therefore $\alpha = (\alpha_f)_{f \in S_2(q)^*}$ be given for all $q$ prime, satisfying the conditions (3.17). First, since the conductor of $\mathrm{Sym}^2 f$ for $f \in S_2(q)^*$ is $q^2$, the functional equation and the usual estimates give the approximation
\[ L(\mathrm{Sym}^2 f, 1) = \omega_f(y) + O(q^2 y^{-1}) \tag{3.22} \]
(with an absolute implied constant). We assume $\log y = O(\log q)$, say $y < q^{10}$.

Now let $x < y$ be given. The partial sum is further decomposed as
\[ \omega_f(y) = \omega_f(x) + \omega_f(x, y) \]
where
\[ \omega_f(x, y) = \sum_{x < n \leq y} \rho_f(n)n^{-1}. \]
We consider here the weighted average built with the tail, namely
\[ A^h[\omega_f(x, y)\alpha_f] = \sum^{h}_{f \in S_2(q)^*} \omega_f(x, y)\alpha_f. \]
We will use Hölder’s inequality to separate $\omega_f(x, y)$ and $\alpha_f$. The former is handled by the following lemma.


Lemma 4. Let $r \geq 1$ be an integer, such that $x^r > q^{11}$. There exists a positive constant $C = C(r) > 0$ such that
\[ A[\omega_f(x, y)^{2r}] \ll (\log q)^C \]
where the implied constant is absolute.

The proof starts with some other lemmas. We say that an integer $n$ is squarefull if for any prime $p$ dividing $n$, $p^2$ divides $n$; in other words, for all $p$ dividing $n$, the valuation of $p$ in $n$ is at least 2. Notice that
\[ \sum_{n \text{ squarefull}} n^{-s} = \prod_p (1 + p^{-2s} + p^{-3s} + \dots) \]
which converges absolutely for $\mathrm{Re}(s) > \tfrac{1}{2}$, hence we have
\[ \sum_{\substack{n \text{ squarefull} \\ n > z}} n^{-1} \ll z^{-1/2} \tag{3.23} \]
with an absolute implied constant.
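As a quick sanity check of (3.23), one can list squarefull integers and look at the normalized tail $z^{1/2}\sum_{n > z} n^{-1}$, which should stay bounded. The sketch below is an illustration only: it uses the standard fact that every squarefull $n$ is uniquely $a^2 b^3$ with $b$ squarefree, truncates the tail at an arbitrary limit ($10^6$, an assumption of this example), and uses hypothetical function names.

```python
import math

def is_squarefree(b):
    """True when no square d^2 with d >= 2 divides b."""
    d = 2
    while d * d <= b:
        if b % (d * d) == 0:
            return False
        d += 1
    return True

def squarefull_up_to(limit):
    """Sorted squarefull n <= limit, generated as n = a^2 * b^3 with b squarefree."""
    found = []
    b = 1
    while b**3 <= limit:
        if is_squarefree(b):
            a = 1
            while a * a * b**3 <= limit:
                found.append(a * a * b**3)
                a += 1
        b += 1
    return sorted(found)

def normalized_tail(squarefull, z):
    """sqrt(z) times the sum of 1/n over the listed squarefull n > z."""
    return math.sqrt(z) * sum(1.0 / n for n in squarefull if n > z)

LIMIT = 10**6  # truncation point of the tail (assumed for the experiment)
numbers = squarefull_up_to(LIMIT)
for z in (10, 100, 1000, 10000):
    print(z, round(normalized_tail(numbers, z), 3))
```

The printed values remain of moderate size as $z$ grows, in line with the bound $\ll z^{-1/2}$; the truncation at the limit only makes the computed tail slightly smaller than the true one.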

Lemma 5. For any integer $r \geq 1$ and any $f \in S_2(q)^*$, we can write
\[ \omega_f(x, y)^r = \sum_{x^r < mn \leq y^r} \lambda_f(m^2) \frac{c(m, n)}{mn} \tag{3.24} \]
with $c(m, n) = 0$ unless $n$ can be written
\[ n = dn_1, \text{ with } d \mid m,\ n_1 \text{ squarefull}, \tag{3.25} \]
and there exists $\gamma = \gamma(r) > 0$ such that
\[ |c(m, n)| \leq \tau(mn)^\gamma. \]
Moreover, the coefficients $c$ depend on $r$, $x$ and $y$ but not on the form $f$.

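Proof. We proceed by induction on $r$. For $r = 1$, we write by (3.2)
\[ \begin{aligned} \omega_f(x, y) &= \sum_{x < n \leq y} \frac{1}{n} \sum_{\ell m^2 = n} \varepsilon_q(m)\lambda_f(\ell^2) \\ &= \sum_{x < \ell m^2 \leq y} \lambda_f(\ell^2) \frac{\varepsilon_q(m)}{\ell m^2} \end{aligned} \]
so we can take $c(\ell, m) = 0$ unless $m$ is square and $c(\ell, m^2) = \varepsilon_q(m)$.

Assume that (3.24) holds for some $r$ and $s$ as claimed, with coefficients $c$ (for $r$) and $c'$ (for $s$). Then
\[ \begin{aligned} \omega_f(x, y)^{r+s} &= \sum_{\substack{x^r < m_1 n_1 \leq y^r \\ x^s < m_2 n_2 \leq y^s}} \lambda_f(m_1^2)\lambda_f(m_2^2) \frac{c(m_1, n_1)c'(m_2, n_2)}{m_1 n_1 m_2 n_2} \\ &= \sum_{\substack{x^r < m_1 n_1 \leq y^r \\ x^s < m_2 n_2 \leq y^s}} \sum_{\substack{d \mid m_1^2 \\ d \mid m_2^2}} \lambda_f\Bigl(\frac{m_1^2 m_2^2}{d^2}\Bigr) \varepsilon_q(d) \frac{c(m_1, n_1)c'(m_2, n_2)}{m_1 n_1 m_2 n_2} \end{aligned} \]
by multiplicativity for $\lambda_f$.

Now $d$ can be written uniquely as $d = d_1 d_2^2$ with $d_1$ squarefree and then we have $d \mid m^2$ if and only if $d_1 d_2 \mid m$. Therefore we can write
\[ m_1 = d_1 d_2 m_1', \qquad m_2 = d_1 d_2 m_2' \]
and then
\[ \omega_f(x, y)^{r+s} = \sum_{\substack{x^r < d_1 d_2 m_1' n_1 \leq y^r \\ x^s < d_1 d_2 m_2' n_2 \leq y^s}} \lambda_f\bigl((d_1 m_1' m_2')^2\bigr) \frac{\varepsilon_q(d_1 d_2)\, c(d_1 d_2 m_1', n_1)\, c'(d_1 d_2 m_2', n_2)}{(d_1 d_2)^2 m_1' m_2' n_1 n_2}. \]
Now write $m_0 = d_1 m_1' m_2'$, $n_0 = d_1 d_2^2 n_1 n_2$. By the induction hypothesis we see that if $c(m_1, n_1) \neq 0$ and $c'(m_2, n_2) \neq 0$, then $n_0$ can be written as $\delta n_0'$ with $\delta \mid m_0$ and $n_0'$ squarefull (this is not absolutely obvious because $m_1 m_2$ does not divide $m_0$, but the extra prime divisors can be pushed to the squarefull part).

Estimating rather trivially the multiplicity of representation of $m_0$, we find the desired representation. This immediately concludes the induction. □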

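Lemma 6. Let $z > 1$ be given and the coefficients $c(m, n)$ be as in lemma 5 for $r$. Then there exists $A = A(r) > 0$ such that
\[ \sum_{\substack{x^r < mn \leq y^r \\ n > z}} \lambda_f(m^2) \frac{c(m, n)}{mn} = O\bigl(z^{-1/2}(\log qz)^A\bigr). \]

Proof. By Deligne’s bound we have
\[ \sum_{\substack{x^r < mn \leq y^r \\ n > z}} \lambda_f(m^2) \frac{c(m, n)}{mn} \leq \sum_{x^r < m \leq y^r} \frac{\tau(m)}{m} \sum_{\substack{x^r m^{-1} < n \leq y^r m^{-1} \\ n > z}} \frac{|c(m, n)|}{n} \]
but using the condition on the support of $c(m, n)$, the inner sum is
\[ \sum_{\substack{x^r m^{-1} < n \leq y^r m^{-1} \\ n > z}} \frac{|c(m, n)|}{n} \leq \tau(m)^\gamma \sum_{d \mid m} \frac{1}{d} \sum_{\substack{n \text{ squarefull} \\ dn > z}} \frac{\tau(n)^\gamma}{n} \ll \tau(m)^{\gamma+1} z^{-1/2}(\log z)^A \]
(by (3.23)) and the result follows. □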

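Lemma 7. There exists a real number $M$ such that $x^r z^{-1} < M \leq y^r z$, and real numbers $c(m)$ such that we have
\[ \sum_{f \in S_2(q)^*} \omega_f(x, y)^{2r} \ll (\log qz)^B \sum_{f \in S_2(q)^*} \Bigl| \sum_{m \sim M} \lambda_f(m^2) \frac{c(m)}{m} \Bigr|^2 + O(q z^{-1/2}(\log qz)^B) \]
and
\[ |c(m)| \leq \tau(m)^C (\log qm)^C \]
for some $C > 0$.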


Proof. By the previous lemma
\[ \omega_f(x, y)^{2r} = \sum_{n \leq z} \Bigl| \sum_{x^r < mn \leq y^r} \lambda_f(m^2) \frac{c(m, n)}{mn} \Bigr| + O(q z^{-1/2}(\log qz)^A). \]
Write $\xi_n = \mathrm{sign}\bigl(\sum_{x^r < mn \leq y^r} \lambda_f(m^2) \frac{c(m, n)}{mn}\bigr)$, split the summation over dyadic intervals in $m$, then use Cauchy’s inequality and sum over $f$: the result follows for some $M$ with
\[ c(m) = \sum_{x^r m^{-1} < n \leq z} \xi_n \frac{c(m, n)}{n} \ll \tau(m)^C (\log qm)^C \]
for some $C > 0$, as desired. □

This now easily implies Lemma 4: we take $z = q^2$, then the assumption $x^r > q^{11}$ implies that $M > q^9$ and we may appeal to the mean-value estimate of Corollary 2 to bound the first term, with $\log M \ll \log q$.

Proposition 8. Let $(\alpha_f)$ be complex numbers satisfying conditions (3.17), and $x = q^\kappa$ for some $\kappa > 0$. There exists an absolute constant $\gamma = \gamma(\kappa, \delta) > 0$ ($\delta$ the exponent in (3.17)) such that
\[ A^h[\omega_f(x, y)\alpha_f] \ll q^{-\gamma} \]
and
\[ A[\alpha_f] = \frac{\dim J_0(q)}{\zeta(2)} A^h[\omega_f(x)\alpha_f] + O(q^{1-\gamma}). \]

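Proof. Let $r \geq 1$ be any integer. By Hölder’s inequality we have (with $s$ the complementary exponent to $2r$, $(2r)^{-1} + s^{-1} = 1$)
\[ \begin{aligned} A^h[\omega_f(x, y)\alpha_f] &= \sum^{h}_{f \in S_2(q)^*} \omega_f(x, y)\alpha_f \\ &= \sum_{f \in S_2(q)^*} \omega_f\, \omega_f(x, y)\alpha_f \\ &\leq A[\omega_f(x, y)^{2r}]^{\frac{1}{2r}} \Bigl( \sum_{f \in S_2(q)^*} (\omega_f |\alpha_f|)^s \Bigr)^{\frac{1}{s}} \\ &\leq A^{\frac{1}{2r}} A[\omega_f(x, y)^{2r}]^{\frac{1}{2r}} A^h[|\alpha_f|]^{\frac{1}{s}} \end{aligned} \]
where we have denoted
\[ A = \max_{f \in S_2(q)^*} \omega_f |\alpha_f|. \]
Take now $r$ large enough so that $x^r > q^{11}$ ($r = [11\kappa^{-1}] + 1$ suffices). Then Lemma 4 gives
\[ A[\omega_f(x, y)^{2r}]^{\frac{1}{2r}} \ll (\log q)^D \]
for some $D = D(\kappa) > 0$, while we have, from (3.18) and (3.17) respectively,
\[ A^{\frac{1}{2r}} \ll q^{-\gamma_0} \quad \text{for some } \gamma_0 = \gamma_0(\kappa, \delta) > 0, \]
\[ A^h[|\alpha_f|] \ll (\log q)^C \quad \text{for some absolute constant } C > 0. \]
Hence the proposition, the last equality being an immediate corollary of the formula
\[ A[\alpha_f] = \frac{\dim J_0(q)}{\zeta(2)} A^h[L(\mathrm{Sym}^2 f, 1)\alpha_f] + O((\log q)^3 A^h[|\alpha_f|]) \]
and the decomposition
\[ L(\mathrm{Sym}^2 f, 1) = \omega_f(x) + \omega_f(x, y) + O(q^2 y^{-1}) \]
applied with $y = q^3$. □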

When the average $A^h[\alpha_f]$ is of smaller order of magnitude (being part of an error term), it is often possible to use the known individual upper bounds on $\omega_f$ to show that the corresponding $A[\alpha_f]$ is also small enough. Such a result is contained in the following lemma. It will not be used in the sequel, but is mentioned here for completeness.

Lemma 8. Let $(\alpha_f)$ be complex numbers (defined for $f \in S_2(q)^*$ for all $q$ prime) satisfying
\[ A^h[|\alpha_f|] \ll (\log q)^{-(3+\delta)} \]
for some $\delta > 0$. Then
\[ A[|\alpha_f|] \ll q(\log q)^{-\delta} \]
(for all $q$ prime) with an implied constant only depending on the one in the assumption.

Proof. Using (3.19), it is enough to quote again
\[ L(\mathrm{Sym}^2 f, 1) \ll (\log q)^3 \]
from Lemma 2. □


Chapter 4

The Delta symbol for automorphic forms of fixed level

This short chapter studies the fundamental properties of the Delta symbol of primitive forms of prime level in various circumstances, each of which will be used in the course of the last two chapters. For a related treatment of the Delta symbol in weight aspect, see Section 5.5 of [Iw2].

4.1 The Delta symbol for primitive forms

For $q$ prime we define the Delta symbol $\Delta(m, n)$ associated to primitive weight 2 forms of level $q$ by
\[ \Delta(m, n) = \sum^{h}_{f \in S_2(q)^*} \lambda_f(m)\lambda_f(n). \tag{4.1} \]
We first state, and sketch the proof of, the Petersson formula.

Theorem 13. (Petersson). The Delta symbol $\Delta(m, n)$ is given by
\[ \Delta(m, n) = \delta(m, n) - \mathcal{J}(m, n) \]
where
\[ \mathcal{J}(m, n) = 2\pi \sum_{c \equiv 0 \bmod q} \frac{1}{c} S(m, n; c) J_1\Bigl(\frac{4\pi\sqrt{mn}}{c}\Bigr) \tag{4.2} \]
and $S(m, n; c)$ is a Kloosterman sum, $J_1$ is a Bessel function of the first kind.

Sketch of the proof. In abstract principle, this is easily understood. Let $a_f(n)$ denote the Fourier coefficients of a cusp form $f \in S_2(q)$. The map
\[ f \mapsto n^{-1/2} a_f(n) \]
is a linear form on the Hilbert space $S_2(q)$ (in the basis $S_2(q)^*$, it is simply $f \mapsto \lambda_f(n)$, see (2.1)), and the latter is finite dimensional, hence by the Riesz representation theorem there exists a unique cusp form $p_n \in S_2(q)$ such that
\[ (f, p_n) = n^{-1/2} a_f(n) \tag{4.3} \]
for every $f \in S_2(q)$. We can decompose $p_n$ spectrally in terms of the basis $S_2(q)^*$, so
\[ p_n = \sum^{h}_{f \in S_2(q)^*} (p_n, f) f = \sum^{h}_{f \in S_2(q)^*} \lambda_f(n) f \]
since $f$ has real coefficients. Taking the $m$-th Fourier coefficient of both sides (precisely, computing the scalar product with $p_m$) yields
\[ (p_n, p_m) = \sum^{h}_{f \in S_2(q)^*} \lambda_f(n)\lambda_f(m) = \Delta(m, n). \]
This is the abstract form of the Petersson formula. To give it a concrete shape, it is necessary to identify the forms $p_n$ and compute their mutual inner products $(p_m, p_n)$: this is done by a simple computation of the inner product of a form $f$ with the Poincaré series $P_n$ (1.14) which shows that a suitable multiple of $P_n$ has the required property (4.3). For the details, see [Iw2], Sections 3.2 and 3.3. □

The first lemma gives the basic estimate on the complementary term $\mathcal{J}(m, n)$, testing in effect the range of $m$ and $n$ for which the forms $f$ are really “independent of each other”.

Lemma 9. For any $m \geq 1$, $n \geq 1$, we have
\[ \mathcal{J}(m, n) \ll (m, n, q)(mn)^{1/2}(\log(m, n))^2 q^{-3/2} \tag{4.4} \]
with an absolute implied constant, hence for $m$ and $n$ less than $q$,
\[ \Delta(m, n) = \delta(m, n) + O((\log q)^2 q^{-1/2}). \]

Proof. We use Weil’s bound for Kloosterman sums ([Iw2, 4.3] for example)
\[ S(m, n; c) \leq \tau(c)(m, n, c)^{1/2}\sqrt{c} \tag{4.5} \]
and the easy bound (from the series expansion; we do not need any more precise information such as $J_1(x) \ll x^{-1/2}$ for $x$ large; small values of $r$ are not very important here)
\[ J_1(x) \ll x \]
valid for all $x > 0$ with an absolute implied constant. Directly from this we have
\[ \mathcal{J}(m, n) \ll \frac{\sqrt{mn}}{q^{3/2}} \sum_{r \geq 1} (m, n, qr)^{1/2} \tau(qr) \frac{1}{r^{3/2}} \ll (m, n, q)^{1/2}(mn)^{1/2}(\log(m, n))^2 q^{-3/2} \]
using $(m, n, qr) \mid (m, n, q)(m, n, r)$ and a crude estimate
\[ \sum_{r \geq 1} \frac{\tau(r)(m, n, r)^{1/2}}{r^{3/2}} \ll \sum_{d \mid (m, n)} \frac{\tau(d)\varphi(d)^{1/2}}{d^{3/2}} \ll (\log(m, n))^2 \]
which is sufficient to get the equally crude factor $(\log(m, n))^2$. □

This factor $(\log(m, n))^2$ would not appear if the weight were greater than 2, as the Bessel function $J_{k-1}$ satisfies
\[ J_{k-1}(x) \ll x^{k-1} \]
in general, which would make the series over $r$ converge absolutely. However, (4.4) will only be applied in situations where we seek (and obtain) a power saving in $q$ with $m$ and $n$ powers of $q$, so it is only an unimportant annoyance.


4.2 The Delta symbol for odd primitive forms

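Keeping $q$ prime, we now consider the Delta symbol associated to primitive forms $f \in S_2(q)^*$ with a given parity. We define $\Delta^+$ and $\Delta^-$ by
\[ \Delta^{\pm}(m, n) = 2 \sum^{h}_{\substack{f \in S_2(q)^* \\ \varepsilon_f = \pm 1}} \lambda_f(m)\lambda_f(n) \tag{4.6} \]
so $\Delta(m, n) = \tfrac{1}{2}(\Delta^+(m, n) + \Delta^-(m, n))$. From (2.7) we see that
\[ \begin{aligned} \Delta^{\pm}(m, n) &= 2 \sum^{h}_{f \in S_2(q)^*} \varepsilon_f^{\pm} \lambda_f(m)\lambda_f(n) \\ &= \sum^{h}_{f \in S_2(q)^*} (1 \pm \varepsilon_f)\lambda_f(m)\lambda_f(n) \\ &= \Delta(m, n) \pm q^{1/2} \sum^{h}_{f \in S_2(q)^*} \lambda_f(q)\lambda_f(m)\lambda_f(n) \quad \text{(by (2.6))} \\ &= \Delta(m, n) \pm q^{1/2}\Delta(m, nq) \end{aligned} \tag{4.7} \]
(by (2.11)), which shows how the restriction of the summation affects the analytic properties of $\Delta^{\pm}$ through the insertion of a factor $q$ in the sums. It is not possible to get a good bound for $\Delta^{\pm}$ by applying Lemma 9 to this decomposition.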

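Lemma 10. Let $m < q$, $n \geq 1$ and define
\[ \mathcal{J}'(m, n) = -\frac{2\pi}{\sqrt{q}} \sum_{(r, q) = 1} \frac{1}{r} S(m\bar{q}, n; r) J_1\Bigl(\frac{4\pi}{r}\sqrt{\frac{mn}{q}}\Bigr). \tag{4.8} \]
Then
\[ q^{1/2}\mathcal{J}(m, qn) = \mathcal{J}'(m, n) + O\Bigl(\frac{\sqrt{mn}}{q^2}(\log q)^2\Bigr) \]
with an absolute implied constant.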

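Proof. We write
\[ \mathcal{J}(m, qn) = \frac{2\pi}{q} \sum_{r \geq 1} \frac{1}{r} S(m, qn; qr) J_1\Bigl(\frac{4\pi}{r}\sqrt{\frac{mn}{q}}\Bigr) \]
and we estimate first the contribution of those $r$ with $(r, q) > 1$ (so $q \mid r$ since $q$ is prime), by the same method as in the previous lemma:
\[ \frac{2\pi}{q} \sum_{(r, q) > 1} \frac{1}{r} S(m, qn; qr) J_1\Bigl(\frac{4\pi}{r}\sqrt{\frac{mn}{q}}\Bigr) \ll \frac{\sqrt{mn}}{q^{5/2}}(\log q)^2 \]
(using here that $m < q$ hence $(m, q) = 1$). The part of the sum which remains, with $(r, q) = 1$, is equal to
\[ \frac{2\pi}{q} \sum_{(r, q) = 1} \frac{1}{r} S(m, qn; qr) J_1\Bigl(\frac{4\pi}{r}\sqrt{\frac{mn}{q}}\Bigr) \]
but by the twisted multiplicativity of Kloosterman sums ([Iw2, 4.3]) we have, for $r$ coprime with $q$ (with $\bar{q}$ the inverse of $q$ modulo $r$ and $\bar{r}$ the inverse of $r$ modulo $q$)
\[ S(m, qn; qr) = S(m\bar{q}, n; r)\, S(m\bar{r}, 0; q) = -S(m\bar{q}, n; r) \]
since, again using $(m, q) = 1$, $S(m\bar{r}, 0; q) = S(1, 0; q)$ is a Ramanujan sum with $q$ prime, hence $S(1, 0; q) = \mu(q) = -1$. This gives the desired expression on multiplying by $q^{1/2}$. □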
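The identities used in this proof are easy to test on small moduli. The following sketch is an illustration, not part of the argument: it computes Kloosterman sums by brute force and checks the Ramanujan sum value $S(1, 0; q) = -1$, the relation $S(m, qn; qr) = -S(m\bar{q}, n; r)$ for $r$ coprime to $q$, and Weil's bound (4.5) for prime moduli. The sample values $q = 7$, $m = 3$, $n = 5$ and the function names are assumptions of this example; `pow(x, -1, c)` requires Python 3.8 or later.

```python
import cmath
import math

def kloosterman(m, n, c):
    """S(m, n; c): sum of e((m*x + n*x^{-1}) / c) over invertible x modulo c."""
    if c == 1:
        return 1.0
    total = 0j
    for x in range(1, c):
        if math.gcd(x, c) == 1:
            x_inv = pow(x, -1, c)  # modular inverse of x modulo c
            total += cmath.exp(2j * math.pi * ((m * x + n * x_inv) % c) / c)
    # The sum is real, because x -> -x pairs each term with its conjugate.
    return total.real

def twisted_gap(m, n, q, r):
    """|S(m, qn; qr) + S(m * qbar, n; r)|, expected to vanish when (r, q) = 1."""
    q_inv = pow(q, -1, r) if r > 1 else 0
    return abs(kloosterman(m, q * n, q * r) + kloosterman(m * q_inv, n, r))

def weil_holds_at_prime(m, n, p):
    """Check |S(m, n; p)| <= tau(p) * gcd(m, n, p)^(1/2) * sqrt(p) with tau(p) = 2."""
    bound = 2 * math.sqrt(math.gcd(math.gcd(m, n), p)) * math.sqrt(p)
    return abs(kloosterman(m, n, p)) <= bound + 1e-9

q, m, n = 7, 3, 5  # sample values (assumed); note m < q so that (m, q) = 1
print(kloosterman(1, 0, q))
print(max(twisted_gap(m, n, q, r) for r in range(1, 13) if math.gcd(r, q) == 1))
print(all(weil_holds_at_prime(m, n, p) for p in (3, 5, 11, 13, 17, 19)))
```

The brute-force sum costs on the order of $c$ operations per modulus, which is ample for checking identities but not meant for large-scale computation.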

We derive from this another basic decomposition and estimate.

Lemma 11. Let $m < q$ and $n \geq 1$. Then
\[ \Delta^{\pm}(m, n) = \delta(m, n) \mp \mathcal{J}'(m, n) + O\Bigl(\frac{\sqrt{mn}}{q^{3/2}}(\log q)^2\Bigr) \tag{4.9} \]
\[ \phantom{\Delta^{\pm}(m, n)} = \delta(m, n) + O\Bigl(\frac{\sqrt{mn}}{q}(\log q)^2\Bigr). \tag{4.10} \]

Proof. From the Petersson formula and (4.7) we obtain
\[ \Delta^{\pm}(m, n) = \delta(m, n) \pm q^{1/2}\delta(m, qn) - \mathcal{J}(m, n) \mp q^{1/2}\mathcal{J}(m, qn). \]
Since $m < q$, this gives from Lemma 9 and 10
\[ \Delta^{\pm}(m, n) = \delta(m, n) \mp \mathcal{J}'(m, n) + O\Bigl(\frac{\sqrt{mn}}{q^{3/2}}(\log q)^2\Bigr). \]
Yet again we apply the same method as in Lemma 9 to estimate
\[ \mathcal{J}'(m, n) \ll \frac{\sqrt{mn}}{q}(\log q)^2 \]
and therefore the second bound follows. □

In Chapter 5, only $\Delta(m, n)$ appears, in ranges such that Lemma 9 is adequate, but in treating the second moment of mollified derivatives in Chapter 6, the corresponding (4.10) is not, and a precise analysis of the contribution arising from $\mathcal{J}'(m, n)$ will be required.

4.3 The Delta-symbol without weight

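When applying the technique of Chapter 3 to go from an average $A^h[\alpha_f]$ weighted by $\omega_f$ to the natural average $A[\alpha_f]$, we will use variants of the Delta-symbol, twisted by symmetric-square coefficients.

Let $x < \sqrt{q}$ be given. Then we define
\[ \begin{aligned} \Delta^n(m, n) &= \sum_{\ell \leq x} \frac{1}{\ell} A^h[\rho_f(\ell)\lambda_f(m)\lambda_f(n)] \\ &= \sum_{d\ell^2 \leq x} \frac{1}{d\ell^2} \sum^{h}_{f \in S_2(q)^*} \lambda_f(d^2)\lambda_f(m)\lambda_f(n) \end{aligned} \tag{4.11} \]
(and similarly for $\Delta^n_{\pm}$).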


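Lemma 12. For all $m$, $n$, we have
\[ \Delta^n(m, n) = \sum_{d\ell^2 \leq x} \frac{1}{d\ell^2} \sum_{r \mid (d^2, m)} \Delta\Bigl(\frac{md^2}{r^2}, n\Bigr) \]
(and the same holds with $\Delta^n_{\pm}$ in place of $\Delta^n$), and if $m < q$, $n < q$,
\[ \Delta^n(m, n) = \sum_{d\ell^2 \leq x} \frac{1}{d\ell^2} \sum_{r \mid (d^2, m)} \delta\Bigl(\frac{md^2}{r^2}, n\Bigr) + O\Bigl(\frac{(\log q)^3 x\sqrt{mn}}{q^{3/2}}\Bigr) \tag{4.12} \]
\[ \Delta^n_{\pm}(m, n) = \sum_{d\ell^2 \leq x} \frac{1}{d\ell^2} \sum_{r \mid (d^2, m)} \delta\Bigl(\frac{md^2}{r^2}, n\Bigr) + O\Bigl(\frac{(\log q)^3 x\sqrt{mn}}{q}\Bigr). \tag{4.13} \]

Proof. The first statement is an immediate consequence of the definition of $\Delta^n$ and the multiplicativity formula
\[ \lambda_f(d^2)\lambda_f(m) = \sum_{r \mid (d^2, m)} \lambda_f\Bigl(\frac{md^2}{r^2}\Bigr) \]
and the second is a consequence of this, and Lemma 9 for $\Delta^n$, or Lemma 11 for $\Delta^n_{\pm}$. □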
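The multiplicativity formula used in this proof, in the form $\lambda(d^2)\lambda(m) = \sum_{r \mid (d^2, m)} \lambda(md^2/r^2)$, can be illustrated on a model. The sketch below does not use actual modular forms: it assumes a multiplicative function with $\lambda(p^k) = \sin((k+1)\theta_p)/\sin\theta_p$ (the shape of unramified Hecke eigenvalues under the Ramanujan bound), with made-up angles $\theta_p$ for the primes 2, 3, 5, 7, so only integers built from these primes are tested.

```python
import math

# Hypothetical angles theta_p; any values with sin(theta_p) != 0 would do.
ANGLES = {2: 0.9, 3: 1.7, 5: 2.3, 7: 0.4}

def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def hecke_lambda(n):
    """Model eigenvalue: multiplicative, with lambda(p^k) = sin((k+1)t)/sin(t)."""
    value = 1.0
    for p, k in factorize(n).items():
        theta = ANGLES[p]
        value *= math.sin((k + 1) * theta) / math.sin(theta)
    return value

def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def multiplicativity_gap(d, m):
    """|lambda(d^2) lambda(m) - sum over r | (d^2, m) of lambda(m d^2 / r^2)|."""
    left = hecke_lambda(d * d) * hecke_lambda(m)
    common = math.gcd(d * d, m)
    right = sum(hecke_lambda(m * d * d // (r * r)) for r in divisors(common))
    return abs(left - right)

print(max(multiplicativity_gap(d, m) for d in (2, 3, 6, 10) for m in (4, 12, 18, 60)))
```

The printed gap is at rounding level. In the model, the sum over $r \mid (d^2, m)$ is what produces the shifted diagonal conditions $md^2/r^2 = n$ appearing in (4.12) and (4.13).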

We see that, in first approximation, the difference is that the natural Delta symbol $\Delta^n(m, n)$ detects the condition that $n$ and $m$ differ by a square, instead of being equal.

Appendix: Digression on multiplicativity

The natural Delta-symbol $\Delta^n$, when applied, will need further treatment, in particular in Chapter 6. For lack of a better place, we offer here a digression with the intent of clarifying the apparently complicated details which will be involved in the computations. The situation will be the following: we wish to diagonalize a quadratic form $Q$ which is of the form
\[ Q(y) = \sum_{m_1, m_2} \frac{g(m_1 m_2)}{m_1 m_2} y_{m_1} y_{m_2} \]
with some arithmetic function $g$, and the variables $m_1$, $m_2$ are restricted by $m_1, m_2 \leq M$, for some $M > 0$, and $m_1$ and $m_2$ squarefree.

If $g$ were totally multiplicative, the form $Q$ would be diagonalized by making the change of variable
\[ z_k = \sum_m \frac{g(m)}{m} y_m \]
so that
\[ Q(y) = \sum_k |y_k|^2. \]
In practice, $g$ will not satisfy this strong condition, but will retain some multiplicative property. In greater generality, this can be defined as follows, to present a clear picture.


Definition 6. Let $g$ be an arithmetic function. We say that $g$ is mutative (for $k$ factors) if and only$^1$ if the arithmetic function
\[ (m_1, \dots, m_k) \mapsto g(m_1 \cdots m_k) \]
defined for all integers $m_i \geq 1$ coprime in pairs is a finite sum of product functions
\[ (m_1, \dots, m_k) \mapsto h_1(m_1) \cdots h_k(m_k). \]
If this is so for every $k \geq 1$, then $g$ is simply called mutative. If $D \geq 1$ is the least integer $D$ such that one can write
\[ g(m_1 \cdots m_k) = \sum_{1 \leq i \leq D} h_{1,i}(m_1) \cdots h_{k,i}(m_k) \]
(for $m_i$ coprime in pairs) then $g$ is called $D$-mutative (for $k$ factors).

In practice, the functions in the decomposition are explicitly known for 2 factors, and are themselves 2-mutative, so $g$ is automatically mutative for any number of factors.

Here are a few (obvious) examples and properties of this fancy notion:

1. Every multiplicative function is clearly 1-mutative. Actually, every 1-mutative function $g$ is the product of a constant (the value of $g$ at 1) and a multiplicative function.

2. Sums and products of mutative functions are again mutative, and the same applies to Dirichlet convolutions.

3. Additive functions are 2-mutative for 2 factors:
\[ g(m_1 m_2) = g(m_1) + g(m_2) \]
and take $h_{1,1} = h_{2,2} = g$, $h_{1,2} = h_{2,1} = 1$. Hence they are mutative for any number of factors. In particular, $n \mapsto (\log Q/n)^k$ is mutative for any $k \geq 1$ and any $Q > 0$.

4. As a consequence of the two previous remarks, mutative functions arise naturally as coefficients of derivatives of Dirichlet series with multiplicative coefficients.

5. The function $n \mapsto (n - 1)^{-1}$ is not mutative for 2 factors: indeed, if it were, then the vector space generated by the rational functions $(aX - 1)^{-1}$ for all positive integers $a$ would be of finite dimension.
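Examples 2 and 3 can be made concrete with a small experiment. The sketch below is only an illustration with hypothetical names: it takes the additive function $\Omega$ (number of prime factors counted with multiplicity), writes its decomposition for 2 factors as a list of pairs $(h_{1,i}, h_{2,i})$, and forms the termwise products that give a decomposition of $\Omega^2$. The resulting list has four terms, two of which could be merged, so it shows that $\Omega^2$ is mutative for 2 factors with $D \leq 3$ without claiming the minimal $D$.

```python
import math

def big_omega(n):
    """Number of prime factors of n counted with multiplicity (an additive function)."""
    count = 0
    p = 2
    while p * p <= n:
        while n % p == 0:
            count += 1
            n //= p
        p += 1
    return count + (1 if n > 1 else 0)

def one(n):
    """The constant function 1."""
    return 1

# g(m1 m2) = g(m1) * 1 + 1 * g(m2): the pairs (h_{1,i}, h_{2,i}) for i = 1, 2.
ADDITIVE_TERMS = [(big_omega, one), (one, big_omega)]

def evaluate(terms, m1, m2):
    """Sum over i of h_{1,i}(m1) * h_{2,i}(m2)."""
    return sum(h1(m1) * h2(m2) for h1, h2 in terms)

def product_terms(terms_a, terms_b):
    """Termwise products: a decomposition of the product of two mutative functions."""
    return [
        (lambda n, u=a1, v=b1: u(n) * v(n), lambda n, u=a2, v=b2: u(n) * v(n))
        for a1, a2 in terms_a
        for b1, b2 in terms_b
    ]

SQUARE_TERMS = product_terms(ADDITIVE_TERMS, ADDITIVE_TERMS)

COPRIME_PAIRS = [(m1, m2) for m1 in range(1, 40) for m2 in range(1, 40)
                 if math.gcd(m1, m2) == 1]
print(all(evaluate(ADDITIVE_TERMS, a, b) == big_omega(a * b) for a, b in COPRIME_PAIRS))
print(all(evaluate(SQUARE_TERMS, a, b) == big_omega(a * b) ** 2 for a, b in COPRIME_PAIRS))
```

Representing a decomposition as a list of pairs of functions keeps the bookkeeping close to the definition: sums of mutative functions correspond to concatenating lists, products to the termwise construction above.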

Now consider a quadratic form $Q$ as above, with $g$ mutative (for 3 factors would suffice). Then we “diagonalize”$^2$ $Q$ by introducing $a = (m_1, m_2)$, so $m_1 = an_1$, $m_2 = an_2$ for some integers $n_1$ and $n_2$. Since $m_1$ and $m_2$ are squarefree, we have $(a, n_1) = (a, n_2) = 1$, hence by the mutativity of $g$, $Q$ is a finite sum of quadratic forms of the type
\[ \sum_a \frac{h_1(a^2)}{a^2} \sum_{(n_1, n_2) = 1} \frac{h_2(n_1)h_3(n_2)}{n_1 n_2} y_{an_1} y_{an_2} \]
and $h_2$ and $h_3$ are still 2-mutative.

Then we remove the condition $(n_1, n_2) = 1$ by Möbius inversion, exploiting again the fact that $n_1$ and $n_2$ are squarefree and the mutativity to write $Q$ as a sum of quadratic forms of the type
\[ \sum_a \frac{h_1(a^2)}{a^2} \sum_d \frac{\mu(d)h_2(d)^2}{d^2} \sum_{n_1, n_2} \frac{h_3(n_1)h_4(n_2)}{n_1 n_2} y_{adn_1} y_{adn_2} \]
(while $h_1$ is the same as before, $h_2$, $h_3$ might be different), which are “diagonalized” as
\[ \sum_k \nu(k) w_k z_k \tag{4.14} \]
with
\[ \nu(k) = \frac{1}{k^2} \sum_{ad = k} \mu(d)h_2(d)^2 h_1(a^2), \qquad w_k = \sum_n \frac{h_3(n)}{n} y_{kn}, \qquad z_k = \sum_n \frac{h_4(n)}{n} y_{kn}. \]

In practice, the function $g$ is $D$-mutative for small $D$ (often $D = 1$, or $D = 2$), and there are not many terms involved, but it is best to get a feeling for the general process. The ultimate goal will be to estimate the value of $Q$ for a specific choice of $(y_m)$, and most of the terms can be seen very quickly to be less than what the main term is expected to be, so it is not even necessary to write the full decomposition explicitly.

$^1$ Only by the greatest force of will does the author manage to refrain from giving a French-style definition in terms of tensor products.

$^2$ This method will only produce a true diagonalization in special cases, such as $g$ multiplicative, but for convenience the name is retained.


Chapter 5

Proof of the upper bound

A monk asked Joshu: “Does a dog have the nature of Buddha, or not?”
Joshu answered: “µ.”
Mumon, “The Gateless Gate”

We recall the statement to be proved: there exists an absolute and effective constant $C > 0$ such that for any prime number $q$
\[ \sum_{f \in S_2(q)^*} \mathrm{ord}_{s = \frac{1}{2}} L(f, s) \leq C \dim J_0(q). \tag{5.1} \]
In this chapter, $q$ is always a fixed prime number.

5.1 The explicit formula: reduction to a density theorem

The explicit formulae, discovered in essence by Riemann, and later extended and formalized by Weil, have been used first by Mestre in studying abelian varieties. We choose the following variant.

Proposition 9. Let $\psi : \,]0, +\infty[\, \longrightarrow \mathbf{R}$ be a $C^\infty$ function with compact support, satisfying $\psi(x) = \psi(x^{-1})$ for all $x$, and $\hat\psi$ its Mellin transform, which is an entire function. Then for any primitive form $f \in S_2(q)^*$
\[ \sum_\rho \hat\psi(\rho - \tfrac{1}{2}) = \psi(1)\log q - 2\sum_{n \geq 1} \frac{b_f(n)}{\sqrt{n}} \Lambda(n)\psi(n) + \frac{1}{2i\pi} \int_{(1/2)} 2\Bigl(\frac{\Gamma'}{\Gamma}(s + \tfrac{1}{2}) - \log 2\pi\Bigr) \hat\psi(s - \tfrac{1}{2})\,ds \tag{5.2} \]
the summation on the left-hand side being extended over all zeros $\rho$ of $L(f, s)$ in the critical strip, those with $0 \leq \mathrm{Re}(s) \leq 1$, counted with multiplicity. The coefficients $b_f(n)$ are defined in (2.15).

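Proof. We only give a sketch, as this is well-known (see [KM1], [Br1], [P-P]. . . ) in one form or another.

We define $\varphi(x) = x^{-1/2}\psi(x)$ so that $\hat\varphi(s) = \hat\psi(s - \tfrac{1}{2})$ for all $s \in \mathbf{C}$. The condition $\psi(x) = \psi(x^{-1})$ means $\varphi(x) = x^{-1}\varphi(x^{-1})$, or $\hat\varphi(s) = \hat\varphi(1 - s)$.

The logarithmic derivative of the functional equation
\[ \Bigl(\frac{\sqrt{q}}{2\pi}\Bigr)^{s} \Gamma(s + \tfrac{1}{2}) L(f, s) = \varepsilon_f \Bigl(\frac{\sqrt{q}}{2\pi}\Bigr)^{1-s} \Gamma(\tfrac{3}{2} - s) L(f, 1 - s) \]
gives the identity
\[ -\frac{L'}{L}(f, s) - \frac{L'}{L}(f, 1 - s) = \log q + \frac{\Gamma'}{\Gamma}(s + \tfrac{1}{2}) + \frac{\Gamma'}{\Gamma}(\tfrac{3}{2} - s) - 2\log 2\pi. \]
Multiplying by $\hat\varphi(s)$, which is rapidly decreasing in vertical strips, and integrating, we obtain (using the symmetry $\hat\varphi(s) = \hat\varphi(1 - s)$)
\[ \frac{1}{2i\pi} \int_{(\sigma)} -\frac{L'}{L}(f, s)\hat\varphi(s)\,ds + \frac{1}{2i\pi} \int_{(\sigma)} -\frac{L'}{L}(f, 1 - s)\hat\varphi(s)\,ds = \varphi(1)\log q + \frac{1}{2i\pi} \int_{(1/2)} 2\Bigl(\frac{\Gamma'}{\Gamma}(s + \tfrac{1}{2}) - \log 2\pi\Bigr)\hat\varphi(s)\,ds \]
for any $\sigma \in \mathbf{R}$ (provided $L(f, s)$ doesn’t vanish on $\mathrm{Re}(s) = \sigma$). We take $\sigma = -\tfrac{1}{4}$, then $L(f, s)$ has no trivial zeros with $\mathrm{Re}(s) > \sigma$ by the functional equation and the Euler product. In the first integral, shifting the contour to $\mathrm{Re}(s) = \tfrac{5}{4}$ gives
\[ \frac{1}{2i\pi} \int_{(\sigma)} -\frac{L'}{L}(f, s)\hat\varphi(s)\,ds = \sum_\rho \hat\varphi(\rho) + \frac{1}{2i\pi} \int_{(1-\sigma)} -\frac{L'}{L}(f, s)\hat\varphi(s)\,ds \]
and the latter integral is the same as the second one, and from the Dirichlet series expansion (2.14) of the logarithmic derivative of $L(f, s)$, each is equal to
\[ \frac{1}{2i\pi} \int_{(1-\sigma)} -\frac{L'}{L}(f, s)\hat\varphi(s)\,ds = \sum_{n \geq 1} b_f(n)\Lambda(n)\varphi(n). \]
Putting all together, the proposition is proved. □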

In this chapter, $\rho$ will always designate such a “non-trivial” zero of $L(f, s)$, and we always write
\[ \rho = \beta + i\gamma \]
so $\gamma = \mathrm{Im}(\rho)$, $\beta = \mathrm{Re}(\rho)$. For any $\alpha$ with $0 \leq \alpha \leq 1$, and any real numbers $t_1 \leq t_2$, we define $N(f; \alpha, t_1, t_2)$ to be the number of zeros $\rho = \beta + i\gamma$ of $L(f, s)$, counted with multiplicity, such that
\[ \beta \geq \alpha, \qquad t_1 \leq \gamma \leq t_2, \]
and for any $T > 0$ we let
\[ N(f; \alpha, T) = N(f; \alpha, -T, T), \qquad N(f, T) = N(f; 0, T). \]
It is a basic fact of the standard theory of $L$-functions that $N(f, T)$ has an asymptotic
\[ N(f, T) = T\Bigl(\log\frac{qT}{2\pi e}\Bigr) + O(\log qT). \tag{5.3} \]
We need a test function $\psi$ with certain good properties, which we now describe as we assert its existence.


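Proposition 10. There exists a $C^\infty$ function $\psi : \,]0,+\infty[\, \longrightarrow \mathbf{R}^+$ which satisfies:

(1) It has compact support in $[e^{-1}, e]$ and $\psi(1) = 1$, $\hat\psi(0) > 0$.

(2) For all $x \in \,]0,+\infty[$,

$$\psi(x) = \psi(x^{-1}).$$

(3) For all $s \in \mathbf{C}$ with $|\mathrm{Re}(s)| \le 1$,

$$\mathrm{Re}(\hat\psi(s)) \ge 0. \tag{5.4}$$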

Proof. The crucial part is of course (5.4); functions of this type were constructed by Poitou and others for the purpose of obtaining lower bounds for the discriminant of number fields [Poi] (they use the Laplace transform instead of the Mellin transform, so $\psi$ is not exactly as stated, but $\hat\psi$ is the same as their Laplace transform). We sketch the principle of such constructions: define $f(y) = \psi(e^y)$, which is an even, compactly supported, $C^\infty$ function on $\mathbf{R}$, and

$$\hat\psi(s) = \int_{\mathbf{R}} f(y)e^{sy}\,dy$$

for all $s \in \mathbf{C}$, hence

$$\mathrm{Re}(\hat\psi(s)) = \int_{\mathbf{R}} f(y)e^{\sigma y}\cos(ty)\,dy = \int_{\mathbf{R}} f(y)\cosh(\sigma y)\cos(ty)\,dy \quad \text{(by symmetry)}$$

where $\sigma = \mathrm{Re}(s)$ and $t = \mathrm{Im}(s)$. From the maximum modulus principle for harmonic functions, the inequality $\mathrm{Re}(\hat\psi(s)) \ge 0$ will hold for $|\sigma| \le 1$ if and only if the Fourier transform of the even function $g(y) = f(y)\cosh(y)$ is non-negative. Conversely, if a function $g$ (even, smooth and compactly supported) with this property is given, a suitable test function $\psi$ is easily obtained by reversing this procedure. Now if $g_0$ is any smooth compactly supported positive function on $\mathbf{R}$, then the convolution square $g = g_0 \star g_0$ will work, since $\hat g = \hat g_0^{\,2}$.

The support of $\psi$ and its value at 1 can be easily adjusted by homogeneity. □
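The convolution-square construction can be checked numerically. The following Python sketch is only an illustration under assumptions that are not in the text: the bump function `bump`, the grid step `h` and the sampled values of σ and t are arbitrary choices, g0 is taken even so that its Fourier transform is real, and the resulting f is supported in [-2, 2] and is not normalized (support and normalization would still have to be adjusted as the proof indicates).

```python
import numpy as np

def bump(y):
    """Smooth even bump supported in (-1, 1); an arbitrary illustrative g0."""
    out = np.zeros_like(y)
    inside = np.abs(y) < 1
    out[inside] = np.exp(-1.0 / (1.0 - y[inside] ** 2))
    return out

h = 0.01
y0 = np.arange(-100, 101) * h        # symmetric grid on [-1, 1]
g0 = bump(y0)

# Convolution square g = g0 * g0, sampled on the grid [-2, 2].
g = np.convolve(g0, g0) * h
y = np.arange(-200, 201) * h

# Reverse the procedure of the proof: f(y) = g(y) / cosh(y).
f = g / np.cosh(y)

def re_psi_hat(sigma, t):
    """Riemann sum for Re(psi_hat(sigma + it)) = int f(y) cosh(sigma y) cos(t y) dy."""
    return np.sum(f * np.cosh(sigma * y) * np.cos(t * y)) * h

# Sample the strip |sigma| <= 1 and look for negative values.
values = [re_psi_hat(s, t)
          for s in np.linspace(-1.0, 1.0, 9)
          for t in np.linspace(0.0, 40.0, 81)]
min_value = min(values)
print("smallest sampled value of Re(psi_hat):", min_value)
```

On the boundary lines σ = ±1 the sampled quantity is the Fourier transform of g, which is a square; the interior values are then controlled by the maximum principle, as in the proof.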

Remark In [KM1], a specific test function F is used, which had been constructed previously by Perelli and Pomykala [P-P], which is more subtle. However in our situation, it is not actually necessary to use it (this was observed by Pomykala).

Henceforth we fix a test function $\psi$ as given by the proposition, and for any real number $\lambda > 0$ we define the function $\psi_\lambda$ by

$$\psi_\lambda(x) = \psi(x^{\lambda^{-1}})$$

so that its Mellin transform is

$$\hat\psi_\lambda(s) = \lambda\hat\psi(\lambda s).$$

The parameter $\lambda$ will be used to effect a localization in detecting the zeros around $\tfrac12$ in the explicit formula.
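The scaling rule for the Mellin transform of ψ_λ follows from the substitution x = u^λ, and it can be sanity-checked numerically. The Python sketch below is illustrative only: the bump `f` (playing the role of y ↦ ψ(e^y)), the value `lam` and the point `s` are arbitrary assumptions, and the integrals are plain Riemann sums in the variable y = log x.

```python
import numpy as np

def f(y):
    """f(y) = psi(e^y): an even smooth bump on [-1, 1] with f(0) = 1 (illustrative)."""
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    inside = np.abs(y) < 1
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - y[inside] ** 2))
    return out

def mellin(values, y, s):
    """Riemann sum for the integral of values(y) * exp(s*y) dy on a uniform grid."""
    step = y[1] - y[0]
    return np.sum(values * np.exp(s * y)) * step

lam = 3.0
s = 0.2 + 1.5j
y = np.linspace(-1.0, 1.0, 20001)
y_lam = np.linspace(-lam, lam, 20001)     # the same grid stretched by lambda

# psi_lambda(e^y) = psi(e^(y/lambda)) = f(y / lambda)
lhs = mellin(f(y_lam / lam), y_lam, s)    # Mellin transform of psi_lambda at s
rhs = lam * mellin(f(y), y, lam * s)      # lambda * psi_hat(lambda * s)
print(abs(lhs - rhs))
```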

Lemma 13. For any $s \in \mathbf{C}$, and any integer $k \ge 1$, it holds

$$\hat\psi(s) \ll_k \frac{1}{|\mathrm{Im}(s)|^{k}}\,e^{\mathrm{Re}(s)} \tag{5.5}$$

where the implied constant depends only on $k$ (and on the specific choice of $\psi$).


Proof. This is quite clear by successive integrations by parts, since $\psi$ has compact support in $[e^{-1}, e]$: after $k$ integrations by parts,

$$\hat\psi(s) = \frac{(-1)^k}{s(s+1)\cdots(s+k-1)}\int_0^{+\infty}\psi^{(k)}(x)\,x^{s+k}\,\frac{dx}{x}$$

hence the result. □

Let $q$ be a prime number and $f \in S_2(q)^*$ a primitive form of level $q$. We take $\lambda = \theta\log q$, where $\theta$ is a constant,$^1$ a sufficiently small real number which will be chosen later. This choice agrees with the heuristic that we should not lose the result we seek by counting zeros with imaginary part at most $\lambda$. Applying the explicit formula (5.2) to $f$ with the test function $\psi_\lambda$, we obtain

$$\sum_{\rho}\hat\psi_\lambda(\rho - \tfrac12) = \log q - 2\sum_{n\ge 1}\frac{b_f(n)}{\sqrt n}\Lambda(n)\psi_\lambda(n) + O(1)$$

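after having estimated the integral in (5.2) by

$$\frac{1}{2i\pi}\int_{(1/2)} 2\Bigl(\frac{\Gamma'}{\Gamma}(s+\tfrac12) - \log 2\pi\Bigr)\hat\psi_\lambda(s-\tfrac12)\,ds \ll \lambda\int_{-\infty}^{+\infty}(1+|u|)\,\hat\psi(\lambda i u)\,du \ll 1$$

uniformly in $\lambda$ (we have used

$$\frac{\Gamma'}{\Gamma}(s) \ll |s|$$

for $\mathrm{Re}(s) = 1$, and Lemma 13). Let $T \ge 1$ be a parameter which will be fixed later on. Using (5.3), we estimate first the sum over zeros with $\gamma = \mathrm{Im}(\rho) > T$: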

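$$\begin{aligned}
\sum_{\gamma > T}\hat\psi_\lambda(\rho - \tfrac12) &\ll \sum_{t=[T]}^{+\infty} N(f;0,t,t+1)\sup_{\rho\in[0,1]\times[t,t+1]}\bigl|\hat\psi_\lambda(\rho - \tfrac12)\bigr| \\
&= \sum_{t=[T]}^{+\infty}\lambda\,N(f;0,t,t+1)\sup_{\rho\in[0,1]\times[t,t+1]}\bigl|\hat\psi(\lambda(\rho - \tfrac12))\bigr| \\
&\ll_k \sum_{t=[T]}^{+\infty}\lambda(t+1)(\log q(t+1))(\lambda t)^{-k}e^{\lambda/2} \\
&\ll_k q^{\theta/2}(\log qT)(\lambda T)^{-(k-2)}
\end{aligned}$$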

for any integer $k \ge 1$, by Lemma 13. The same holds, of course, for zeros with $\gamma < -T$. Then we isolate the multiplicity of the zero at $\tfrac12$, and further distinguish among the remaining zeros $\rho$ between those which are close to $\tfrac12$, precisely those with $|\beta - \tfrac12| \le \lambda^{-1}$, and the others. On the other side we use the fact that $\Lambda$ is supported on powers of primes, and put the primes apart from the squares and higher powers. This way we rewrite the outcome of the explicit formula:

$$\lambda\hat\psi(0)\,\mathrm{ord}_{s=\frac12}L(f,s) + \Xi_1(f,\lambda) + \Xi_2(f,\lambda) = \log q - 2S_1(f,\lambda) - 2S_2(f,\lambda) + O_k\bigl(q^{\theta/2}(\log qT)(\lambda T)^{-k}\bigr) + O(1) \tag{5.6}$$

$^1$ Also, $q$ is large enough that $\lambda > 1$.


with:

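$$\Xi_1(f,\lambda) = \lambda\sum_{\substack{|\gamma|\le T \\ |\beta-\frac12|\le\lambda^{-1}}}\hat\psi(\lambda(\rho - \tfrac12)) \tag{5.7}$$

$$\Xi_2(f,\lambda) = \lambda\sum_{\substack{|\gamma|\le T \\ |\beta-\frac12|>\lambda^{-1}}}\hat\psi(\lambda(\rho - \tfrac12)) \tag{5.8}$$

$$S_1(f,\lambda) = \sum_{p}\frac{\lambda_f(p)}{\sqrt p}(\log p)\psi_\lambda(p) \tag{5.9}$$

$$S_2(f,\lambda) = \sum_{n\ge 2}\sum_{p}\frac{\lambda_f(p^n)}{p^{n/2}}(\log p)\psi_\lambda(p^n). \tag{5.10}$$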

The various terms will be treated differently.

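Lemma 14. For all $f \in S_2(q)^*$, we have

$$S_2(f,\lambda) \ll \lambda.$$

Proof. Since $\psi$ has compact support in $[e^{-1}, e]$, the sum over $n$-th powers of primes is void as soon as $p^{n/\lambda} > e$, namely as soon as $n\log p > \lambda$. Since $\psi$ is bounded and

$$|b_f(n)| \le 2$$

for all $n$, this yields

$$S_2(f,\lambda) \ll \sum_{p\le\exp(\lambda/2)}\frac{\log p}{p} + \sum_{3\le n\le\lambda}\ \sum_{\log p\le\lambda/n}\frac{\log p}{p^{n/2}} \ll \lambda.$$

□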
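The main term in this bound is the sum of (log p)/p over primes p up to exp(λ/2), which by Mertens' estimate is log x + O(1) with x = exp(λ/2), hence of size λ/2. The following Python sketch illustrates this numerically; the sieve helper `primes_up_to` and the sampled values of λ are illustrative assumptions, not part of the text.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Strike out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def mertens_sum(x):
    """Sum of log(p)/p over primes p <= x."""
    return sum(math.log(p) / p for p in primes_up_to(int(x)))

for lam in (8.0, 16.0, 24.0):
    x = math.exp(lam / 2)
    # Compare the prime sum with lambda / 2 = log x.
    print(lam, round(mertens_sum(x), 4), lam / 2)
```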

Now, in (5.6), we take the real part. For a zero $\rho$ appearing in $\Xi_1(f,\lambda)$, we have $|\mathrm{Re}\,\lambda(\rho - \tfrac12)| \le 1$, hence

$$\mathrm{Re}(\Xi_1(f,\lambda)) \ge 0$$

by the positivity property (5.4) of the test function $\psi$. Therefore we can drop this term by positivity and get

$$\lambda\hat\psi(0)\,\mathrm{ord}_{s=\frac12}L(f,s) \le \log q - 2S_1(f,\lambda) + \mathrm{Re}(\Xi_2(f,\lambda)) + O(\lambda) + O_k\bigl(q^{\theta/2}(\log qT)(\lambda T)^{-k}\bigr).$$

Again, intuitively, this application of positivity should not affect the chances of proving the result being sought, since the number of zeros dropped in the sum $\Xi_1(f,\lambda)$, on average over $f$, should be bounded.

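Performing the average over $f$, we have consequently

$$\lambda\hat\psi(0)\sum_{f\in S_2(q)^*}\mathrm{ord}_{s=\frac12}L(f,s) \le (\log q)\dim J_0(q) - 2\sum_{f\in S_2(q)^*}S_1(f,\lambda) + \sum_{f\in S_2(q)^*}\mathrm{Re}(\Xi_2(f,\lambda)) + O(\lambda q) + O_k\bigl(q^{1+\theta/2}(\log qT)(\lambda T)^{-k}\bigr). \tag{5.11}$$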


Lemma 15. Assume $\theta < \tfrac34$. There exists a constant $\delta = \delta(\theta) > 0$ such that

$$\sum_{f\in S_2(q)^*}S_1(f,\lambda) \ll q^{1-\delta}. \tag{5.12}$$

Proof. We write

$$\sum_{f\in S_2(q)^*}S_1(f,\lambda) = A[S_1(f,\lambda)]$$

and proceed to estimate this by the method of Chapter 3. We need to check the conditions (3.17) to apply Proposition 8. The individual bound (3.18) is easy: since

$$S_1(f,\lambda) = \sum_{p}\frac{\lambda_f(p)}{\sqrt p}(\log p)\psi_\lambda(p)$$

and the support of $\psi$ limits the summation to primes with $\log p \le \lambda$, i.e. $p \le q^{\theta}$, it holds

$$S_1(f,\lambda) \ll \sum_{p\le q^{\theta}}(\log p)\,p^{-1/2} \ll q^{\theta/2}$$

while $\omega_f \ll (\log q)q^{-1}$ by (3.16).

We next estimate the harmonic average $A^h[|S_1(f,\lambda)|]$. Since the sum $S_1(f,\lambda)$ is real, Cauchy's inequality implies

$$A^h[|S_1(f,\lambda)|] \le A^h[S_1(f,\lambda)^2]^{1/2}A^h[1]^{1/2} \ll A^h[S_1(f,\lambda)^2]^{1/2}.$$

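Now we compute

$$\begin{aligned}
S_1(f,\lambda)^2 &= \sum_{p,p'}\frac{\lambda_f(p)\lambda_f(p')}{\sqrt{pp'}}(\log p)(\log p')\psi_\lambda(p)\psi_\lambda(p') \\
&= \sum_{p}\frac{\lambda_f(p)^2}{p}(\log p)^2\psi_\lambda(p)^2 + \sum_{p\ne p'}\frac{\lambda_f(pp')}{\sqrt{pp'}}(\log p)(\log p')\psi_\lambda(p)\psi_\lambda(p')
\end{aligned}$$

so (using $\lambda_f(p)^2 = \lambda_f(p^2) + 1$, which is true for all primes $p$ occurring in the summation since $\psi_\lambda(p) = 0$ for $\log p > \lambda$, i.e. for $p > q^{\theta}$) we get

$$A^h[S_1(f,\lambda)^2] = \sum_{p}\frac{(\log p)^2}{p}\psi_\lambda(p)^2\bigl(\Delta(1,1) + \Delta(1,p^2)\bigr) + \sum_{p\ne p'}\frac{(\log p)(\log p')}{\sqrt{pp'}}\psi_\lambda(p)\psi_\lambda(p')\Delta(1,pp').$$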

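Since evidently $pp' \ne 1$, we obtain from Lemma 9

$$A^h[S_1(f,\lambda)^2] \ll \sum_{p\le q^{\theta}}\frac{(\log p)^2}{p} + \frac{(\log q)^2}{q^{3/2}}\Bigl|\sum_{p\le q^{\theta}}(\log p)\Bigr|^2 \ll (\log q)^2$$

since $2\theta < \tfrac32$.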

From Proposition 8 of Chapter 3, we conclude that there exists a constant $\delta = \delta(\theta)$ such that

$$A[S_1(f,\lambda)] = \frac{\dim J_0(q)}{\zeta(2)}A^h[\omega_f(x)S_1(f,\lambda)] + O(q^{1-\delta})$$

with $x = q^{\kappa}$, $\kappa < \tfrac14$ being a (small) parameter to be chosen below.

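From the definition of $\omega_f(x)$, we derive

$$\begin{aligned}
A^h[\omega_f(x)S_1(f,\lambda)] &= A^h\Bigl[\sum_{d\ell^2\le x}\frac{\lambda_f(d^2)}{d\ell^2}\sum_{p}\frac{\lambda_f(p)}{\sqrt p}(\log p)\psi_\lambda(p)\Bigr] \\
&= \sum_{d\ell^2\le x}\frac{1}{d\ell^2}\sum_{p}\frac{(\log p)}{\sqrt p}\psi_\lambda(p)\Delta(p,d^2) \\
&\ll \frac{(\log q)^2}{q^{3/2}}\sum_{d\ell^2\le x}\frac{1}{\ell^2}\sum_{p\le q^{\theta}}(\log p) \\
&\ll (\log q)^2 q^{\theta+\kappa-\frac32}
\end{aligned}$$

and the lemma follows by taking $\kappa$ small enough that the exponent is negative. □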

Remark In [KM1], the analogue of this Lemma is quoted from [Br1], where it was proved by means of the Selberg trace formula. The present approach is probably somewhat simpler, and at least more self-contained.

Thus it only remains to estimate the contribution of $\Xi_2$, the sum over zeros not too close to $\tfrac12$. Of course, on the Generalized Riemann Hypothesis, those do not exist, and we see, taking $T = q$ and then $k$ large enough, that the upper bound above (5.11) immediately implies a weak form of Brumer's result, namely

$$\sum_{f\in S_2(q)^*}\mathrm{ord}_{s=\frac12}L(f,s) \ll \dim J_0(q)$$

for $q$ prime. Indeed, up to this point, the treatment is basically the same as Brumer's. But handling $\Xi_2$ without appealing to the Riemann Hypothesis is precisely the crux of the matter. It will be possible to show that if there are zeros in the region $|\beta - \tfrac12| > \lambda^{-1}$, then they are very few in number, in a very precise sense, which we now describe.

Theorem 14. Let $q$ be a prime number. There exists an absolute constant $A > 0$ such that for any $T > 0$ and any real numbers $t_1$, $t_2$ with

$$-T \le t_1 < t_2 \le T, \qquad t_2 - t_1 \ge \frac{1}{\log q},$$

for any $\alpha \ge \tfrac12 + (\log q)^{-1}$ and any $c$, $0 < c < \tfrac14$, it holds

$$\sum_{f\in S_2(q)^*}N(f;\alpha,t_1,t_2) \ll (1+T)^{A}q^{1-c(\alpha-\frac12)}(\log q)(t_2 - t_1), \tag{5.13}$$

the implied constant depending only on $c$.

The bulk of this chapter will be devoted to proving this result.

Remark In this density theorem, only the $q$-aspect is taken into consideration, and this statement is indeed trivial with respect to $T$. However, it is important (as the deduction of the upper bound from the density theorem shows) that the bounds obtained be at most polynomial in the imaginary part $T$. Thus, in the rest of this chapter, inequalities of the form

$$f(q,T) \ll (1+|T|)^{B}g(q)$$

will often be encountered; the constant $B > 0$ may appear, or its value may change, from line to line without further comment.

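Assuming Theorem 14, we can now estimate $\Xi_2$. We argue for each quadrant separately. Subdividing the region $[\lambda^{-1}, \tfrac12]\times[0,T]$ into small squares of side $\lambda^{-1}$

$$R(m,n) = \Bigl[\frac{m}{\lambda},\frac{m+1}{\lambda}\Bigr]\times\Bigl[\frac{n}{\lambda},\frac{n+1}{\lambda}\Bigr]$$

with $1 \le m \le \lambda$, $0 \le n \le \lambda T$ (so $m$ indexes the distance of the real part from $\tfrac12$ and $n$ the imaginary part), we estimate the contribution $\Xi_2^1$ of those zeros:

$$\begin{aligned}
\sum_{f\in S_2(q)^*}\mathrm{Re}(\Xi_2^1(f,\lambda)) &\le \lambda\sum_{m=1}^{\lambda}\sum_{n=0}^{\lambda T}\ \sum_{f\in S_2(q)^*}N\bigl(f;\tfrac12+\tfrac{m}{\lambda},\tfrac{n}{\lambda},\tfrac{n+1}{\lambda}\bigr)\sup_{s\in R(m,n)}|\hat\psi(\lambda s)| \\
&\ll \lambda\sum_{m=1}^{\lambda}\sum_{n=0}^{\lambda T}\bigl(1+\tfrac{n+1}{\lambda}\bigr)^{A}q^{1-c\frac{m}{\lambda}}(\log q)\lambda^{-1}\times\sup_{s\in R(m,n)}|\hat\psi(\lambda s)| \\
&\ll_k q(\log q)\sum_{m=1}^{\lambda}\sum_{n=1}^{\lambda T}\bigl(1+\tfrac{n+1}{\lambda}\bigr)^{A}q^{-cm/\lambda}e^{m+1}n^{-k} + q(\log q)\sum_{m=1}^{\lambda}q^{-cm/\lambda}e^{m+1} \\
&\ll q(\log q)
\end{aligned}$$

if we choose $\theta < c$, and $k$ large enough, so that the sum over $m$ is a convergent geometric series of the form

$$\sum_{m}q^{-cm/\lambda}e^{m} = \sum_{m}\exp(m(1 - c\theta^{-1})) \le \frac{1}{1 - \exp(1 - c\theta^{-1})}.$$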

We can deal by similar dissections with the three other quadrants in $\Xi_2$, hence from (5.11), taking $T = q$, and dividing out by $\lambda$, we deduce the desired inequality (5.1), hence Theorem 8.

5.2 The density theorem

Theorem 14 is the analogue of a result proved by Selberg ([Sel], Theorem 4) for Dirichlet characters in 1946, itself the $q$-analogue of one of his previous results on the zeros of $\zeta(s)$ near the critical line. We will borrow the general principle from this paper (with some simplifications also found in [Luo]), starting with a crucial lemma which will reduce the theorem to some estimates of a mollified second moment of values of $L(f,s)$, $f \in S_2(q)^*$.

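Lemma 16. (Selberg, [Sel, Lemma 14]). Let $h$ be a function holomorphic in the region

$$\{s \in \mathbf{C} \mid \mathrm{Re}(s) \ge \alpha,\ t_1 \le \mathrm{Im}(s) \le t_2\}$$

satisfying

$$h(s) = 1 + o\Bigl(\exp\Bigl(-\frac{\pi}{t_2 - t_1}\mathrm{Re}(s)\Bigr)\Bigr) \tag{5.14}$$

in this region, uniformly as $\mathrm{Re}(s) \to +\infty$. Denoting the zeros of $h$ (in the interior of this region) by $\rho = \beta + i\gamma$, we have

$$\begin{aligned}
2(t_2 - t_1)\sum_{\rho}\sin\Bigl(\pi\frac{\gamma - t_1}{t_2 - t_1}\Bigr)\sinh\Bigl(\pi\frac{\beta - \alpha}{t_2 - t_1}\Bigr) &= \int_{t_1}^{t_2}\sin\Bigl(\pi\frac{t - t_1}{t_2 - t_1}\Bigr)\log|h(\alpha + it)|\,dt \\
&\quad + \int_{\alpha}^{+\infty}\sinh\Bigl(\pi\frac{\sigma - \alpha}{t_2 - t_1}\Bigr)\bigl(\log|h(\sigma + it_1)| + \log|h(\sigma + it_2)|\bigr)\,d\sigma
\end{aligned}$$

(where the zeros are also summed with multiplicity).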

We refer to [Sel] for the proof, a rather clever exercise in complex integration.

This lemma will be applied to the functions $1 - (M(f,s)L(f,s) - 1)^2$, where $M(f,s)$ is a suitable mollifier for which (5.14) holds, for $\alpha$ equal to $\tfrac12 + (\log q)^{-1}$. This means that $M(f,s)$ must approximate quite closely the inverse of $L(f,s)$.

Lemma 17. The inverse $L(f,s)^{-1}$ is given by the Dirichlet series

$$L(f,s)^{-1} = \sum_{m,n\ge 1}\varepsilon_q(n)\mu(m)\mu(mn)^2\lambda_f(m)(mn^2)^{-s}$$

which is absolutely convergent for $\mathrm{Re}(s) > 1$.

Proof. This is an immediate consequence of the Euler product expansion

$$L(f,s)^{-1} = \prod_{p}\bigl(1 - \lambda_f(p)p^{-s} + \varepsilon_q(p)p^{-2s}\bigr)$$

by multiplicativity (every integer $\ell \ge 1$ has a unique expression as $\ell = mn^2r$ with $m$, $n$, $r$ coprime in pairs, $m$ and $n$ squarefree and $r$ cubefull). □
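Since the lemma is a formal consequence of the Euler product, it can be tested with any multiplicative sequence satisfying the Hecke recursion at prime powers. The Python sketch below is a hedged illustration: the level `Q = 11` and the integer stand-ins `lam_at_prime(p)` are arbitrary assumptions (they are not Fourier coefficients of an actual form), chosen so that all arithmetic is exact. It checks that the coefficients read off from the lemma are the Dirichlet inverse of the sequence.

```python
def factorize(n):
    """Return the prime factorization of n as a dict {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

Q = 11  # hypothetical prime level

def eps_q(n):
    """Trivial character modulo Q."""
    return 0 if n % Q == 0 else 1

def lam_at_prime(p):
    """Arbitrary integer stand-in for lambda_f(p) (illustration only)."""
    return (p % 5) - 2

def lam(n):
    """Extend to all n via lam(p^(k+1)) = lam(p) lam(p^k) - eps_q(p) lam(p^(k-1))."""
    value = 1
    for p, k in factorize(n).items():
        prev, cur = 1, lam_at_prime(p)
        for _ in range(k - 1):
            prev, cur = cur, lam_at_prime(p) * cur - eps_q(p) * prev
        value *= cur
    return value

def inverse_coefficient(l):
    """Coefficient of l^(-s) in L^(-1): l = m n^2 with m, n squarefree and coprime."""
    value = 1
    for p, k in factorize(l).items():
        if k == 1:
            value *= -lam_at_prime(p)   # p divides m: mu(p) * lam(p)
        elif k == 2:
            value *= eps_q(p)           # p divides n: eps_q(p)
        else:
            return 0                    # a cube divides l: no such term
    return value

def dirichlet_product_at(n):
    """Coefficient of n^(-s) in the product of the two Dirichlet series."""
    return sum(lam(d) * inverse_coefficient(n // d)
               for d in range(1, n + 1) if n % d == 0)

LIMIT = 200
products = [dirichlet_product_at(n) for n in range(1, LIMIT + 1)]
print(products[:10])
```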

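We also define, for every $M > 1$, a function $g_M$ by

$$g_M(x) = \begin{cases} 1, & \text{if } x \le \sqrt M, \\[2pt] \dfrac{\log(M/x)}{\log\sqrt M}, & \text{if } \sqrt M \le x \le M, \\[2pt] 0, & \text{if } x > M. \end{cases} \tag{5.15}$$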
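As a quick illustration of this cutoff, here is a minimal Python sketch of the piecewise definition, read as g_M(x) = 1 for x up to the square root of M, log(M/x)/log(√M) between √M and M, and 0 beyond M. The function name `g` and the sample value M = 10000 are assumptions made for the example.

```python
import math

def g(M, x):
    """Piecewise cutoff: 1 up to sqrt(M), logarithmic decay to 0 at M, then 0."""
    root = math.sqrt(M)
    if x <= root:
        return 1.0
    if x <= M:
        return math.log(M / x) / math.log(root)
    return 0.0

M = 10_000
for x in (50, 100, 1_000, 10_000, 20_000):
    print(x, g(M, x))
```

The two branches agree at x = √M and at x = M, so the weight is continuous.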

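Then for $M$ fixed and any integer $1 \le m \le M$, we let

$$x_m(s) = \frac{\mu(m)}{m^{s-\frac12}}\sum_{n\ge 1}\frac{\varepsilon_q(n)\mu(mn)^2}{n^{2s}}g_M(mn) \tag{5.16}$$

and we define the mollifier

$$M(f,s) = \sum_{m\le M}\frac{x_m(s)}{\sqrt m}\lambda_f(m) \tag{5.17}$$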

(compare (6.4)). We observe that $M(f,s)$ is a Dirichlet polynomial of length at most $M$, with coefficients

$$c_f(\ell) = \sum_{mn^2=\ell}\varepsilon_q(n)\mu(m)\mu(mn)^2\lambda_f(m)g_M(mn) \tag{5.18}$$

and by Deligne's bound, they are bounded by

$$|c_f(\ell)| \le \sum_{m\mid\ell}\tau(m) \le \tau(\ell)^2. \tag{5.19}$$

As in the next chapter, the length $M$ will be a power of $q$ (here any positive power would suffice; this would also be the case in Chapter 6, if we only wanted to prove the lower bound on the rank with some constant in place of the more precise $\tfrac{19}{54}$).

Lemma 18. Let $M = q^{\Delta}$ with $\Delta > 0$. We have

$$M(f,s)L(f,s) = 1 + O\bigl((\log q)^{15}q^{\Delta(1-\sigma)/2}\bigr)$$

uniformly for $\mathrm{Re}(s) = \sigma \to +\infty$.

Proof. By the definition of $g_M$, and the Dirichlet series for $L(f,s)^{-1}$, the Dirichlet polynomial $M(f,s)$ has the same coefficients of $n^{-s}$ as $L(f,s)^{-1}$ for all $n \le \sqrt M$, hence the product $M(f,s)L(f,s)$ is, for $\mathrm{Re}(s) = \sigma > 1$, of the form

$$M(f,s)L(f,s) = 1 + \sum_{n>\sqrt M}d_f(n)n^{-s}$$

with coefficients $d_f(n)$ bounded by

$$|d_f(n)| = \Bigl|\sum_{ab=n}c_f(a)\lambda_f(b)\Bigr| \le \tau(n)^4$$

from Deligne's bound and (5.18). The result follows immediately. □

The density theorem requires a good estimate for the average of the second moment of $M(f,s)L(f,s)$, $\mathrm{Re}(s) \ge \tfrac12 + (\log q)^{-1}$.

Proposition 11. Let $M = q^{\Delta}$ with $\Delta < \tfrac14$, and let $c$ be any positive real number with $c < \Delta$. Then there exists a constant $B > 0$ such that for all $q$ prime large enough

$$\sum_{f\in S_2(q)^*}|M(f,\beta+it)L(f,\beta+it) - 1|^2 \ll (1+|t|)^{B}q^{1-c(\beta-\frac12)} \tag{5.20}$$

uniformly for $\beta \ge \tfrac12 + (\log q)^{-1}$ and $t \in \mathbf{R}$, the implied constant depending only on $\Delta$ and $c$.

This is the decisive ingredient, which will be the subject matter of the next two sections. Assuming this, the proof of Theorem 14 can be completed, again following Selberg's argument.

Thus let $\alpha$, $t_1$, $t_2$ be as in the statement of the theorem. It is obviously enough to consider the case $t_2 - t_1 = (\log q)^{-1}$. As in the proposition, we let $M = q^{\Delta}$ with $\Delta < \tfrac14$, and suppose given $c < \Delta$. Write

$$t_1' = t_1 - \frac{\gamma}{\log q}, \qquad t_2' = t_2 + \frac{\gamma}{\log q}, \qquad \alpha' = \alpha - \frac{1}{2\log q},$$

where $\gamma$ is a positive real number such that

$$c > \frac{\pi}{2\gamma + 1}. \tag{5.21}$$

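Let $f \in S_2(q)^*$, and $\rho = \beta + i\gamma$ one of the zeros of $L(f,s)$ we wish to count, so

$$\beta \ge \alpha, \quad t_1 \le \gamma \le t_2$$

and therefore

$$\beta - \alpha' \ge \frac{1}{2\log q};$$

and (from the inequality $\sinh(\pi x) \ge \pi x$)

$$(\log q)(t_2' - t_1')\sinh\Bigl(\pi\frac{\beta - \alpha'}{t_2' - t_1'}\Bigr) \ge 1.$$

Moreover, since $t_1 \le \gamma \le t_2$, we check (from the definition of $t_1'$, $t_2'$) that $\gamma$ is not too close to $t_1'$ and $t_2'$, so that

$$\sin\Bigl(\pi\frac{\gamma - t_1'}{t_2' - t_1'}\Bigr) \ge \frac12$$

and we obtain the “zero-detecting” inequality: for any zero $\rho$ in the region concerned

$$1 \le 2(\log q)(t_2' - t_1')\sin\Bigl(\pi\frac{\gamma - t_1'}{t_2' - t_1'}\Bigr)\sinh\Bigl(\pi\frac{\beta - \alpha'}{t_2' - t_1'}\Bigr)$$

so by summing over the zeros $\rho$ we derive$^2$

$$N(f;\alpha,t_1,t_2) \le 2(\log q)(t_2' - t_1')\sum_{\rho}\sin\Bigl(\pi\frac{\gamma - t_1'}{t_2' - t_1'}\Bigr)\sinh\Bigl(\pi\frac{\beta - \alpha'}{t_2' - t_1'}\Bigr).$$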

We can now further extend by positivity the sum to include all the zeros in the larger range

$$\sigma \ge \alpha', \quad t_1' \le \gamma \le t_2'$$

so $N(f;\alpha,t_1,t_2)$ is bounded by $\log q$ times the exact expression occurring on the left-hand side of Lemma 16. Since zeros of $L(f,s)$ are zeros (with the same multiplicity or higher) of

$$h(f,s) = 1 - (M(f,s)L(f,s) - 1)^2,$$

$^2$ We extend the summation to include multiplicity in case there are multiple zeros.


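to which Selberg's Lemma is applicable, we obtain

$$\begin{aligned}
N(f;\alpha,t_1,t_2) &\le (\log q)\int_{t_1'}^{t_2'}\sin\Bigl(\pi\frac{t - t_1'}{t_2' - t_1'}\Bigr)\log|h(f,\alpha' + it)|\,dt \\
&\quad + (\log q)\int_{\alpha'}^{+\infty}\sinh\Bigl(\pi\frac{\sigma - \alpha'}{t_2' - t_1'}\Bigr)\bigl(\log|h(f,\sigma + it_1')| + \log|h(f,\sigma + it_2')|\bigr)\,d\sigma
\end{aligned}$$

and now from the inequality

$$\log|1 - x| \le \log(1 + |x|) \le |x|$$

valid for all $x$, this is bounded in turn by

$$\begin{aligned}
N(f;\alpha,t_1,t_2) &\le \log q\int_{t_1'}^{t_2'}\sin\Bigl(\pi\frac{t - t_1'}{t_2' - t_1'}\Bigr)|M(f,\alpha' + it)L(f,\alpha' + it) - 1|^2\,dt \\
&\quad + \log q\int_{\alpha'}^{+\infty}\sinh\Bigl(\pi\frac{\sigma - \alpha'}{t_2' - t_1'}\Bigr)\bigl(|M(f,\sigma + it_1')L(f,\sigma + it_1') - 1|^2 + |M(f,\sigma + it_2')L(f,\sigma + it_2') - 1|^2\bigr)\,d\sigma.
\end{aligned}$$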

We now average over $f$ and exchange the order of summation, obtaining inner sums over $f$ to which the estimate for the second moment applies. By (5.20), the first term is estimated by

$$\ll (\log q)\,q^{1-c(\alpha'-\frac12)}(t_2' - t_1')$$

and the second by

$$\ll (\log q)\,q^{1-c(\alpha'-\frac12)}\int_{0}^{+\infty}q^{\sigma(\pi/(2\gamma+1) - c)}\,d\sigma \ll (\log q)\,q^{1-c(\alpha'-\frac12)}$$

by the assumption (5.21) on $\gamma$.

This concludes the proof of the density theorem, and thereby of the upper bound for the rank of $J_0(q)$, Theorem 5, on the Birch and Swinnerton-Dyer conjecture.

5.3 The harmonic second moment

Proposition 11 will be proved by the method of Chapter 3, going through a corresponding weighted result first.

Proposition 12. Let $M = q^{\Delta}$ with $\Delta < \tfrac14$, and $\beta = \tfrac12 + b(\log q)^{-1}$, where $b > 0$ is any constant. For all $q$ prime large enough we have

$$\sideset{}{^h}\sum_{f\in S_2(q)^*}|M(f,\beta+it)L(f,\beta+it)|^2 \ll (1+|t|)^{B} \tag{5.22}$$

for some absolute constant $B > 0$. The implied constant depends only on $b$ and $\Delta$.

The proof is along lines similar to that followed in Chapter 6 when dealing with the second moment in Section 6.1.3.

We write $\beta = \tfrac12 + \delta$ and assume only $\delta \ge b(\log q)^{-1}$; the actual case in the proposition is $\delta = b(\log q)^{-1}$, but we need not assume this so soon. Then we define for simplicity

$$M_2(\delta) = \sideset{}{^h}\sum_{f\in S_2(q)^*}|M(f,\beta+it)L(f,\beta+it)|^2 \tag{5.23}$$

which we consider as a quadratic form in the coefficients $x_m = x_m(\beta + it)$ of the mollifier. To emphasize this viewpoint, it will be convenient to simply write $x_m$ and $M(f)$ while performing transformations to facilitate the ultimate estimations. This is again in accordance with the principles followed in Chapter 6 for the special value at $\tfrac12$ of the derivatives of the L-functions.

5.3.1 The square of the L-function

This section is very similar to the computation of $L'(f,\tfrac12)^2$ in Chapter 6. Let $f \in S_2(q)^*$ and $\beta = \tfrac12 + \delta$ with $0 < \delta < \tfrac12$ be given.

Choose an integer $N \ge 1$ (which will have to be large enough, $N = 2$ works already) and a real polynomial $G$ satisfying

$$G(-s) = G(s), \quad\text{and}\quad G(0) = 1 \tag{5.24}$$

$$G(-N) = \ldots = G(-1) = 0 \tag{5.25}$$

and having no zeros for $-\tfrac12 \le \mathrm{Re}(s) \le \tfrac12$.

Let $t \in \mathbf{R}$ be a fixed real number. Define the entire function $Z(f,s)$ by

$$Z(f,s) = \Lambda(f,s+\tfrac12+it)\Lambda(f,s+\tfrac12-it)$$

which satisfies the functional equation

$$Z(f,s) = Z(f,-s).$$

Since the Fourier coefficients $\lambda_f(n)$ of $f$ are real, we have (recall $\beta = \tfrac12 + \delta$)

$$|\Lambda(f,\beta+it)|^2 = Z(f,\delta). \tag{5.26}$$

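We now consider the complex integral

$$\begin{aligned}
I_\delta &= \frac{1}{2i\pi}\int_{(2)}Z(f,s)G(s+it)G(s-it)\,\frac{ds}{s-\delta} \\
&= \frac{1}{2i\pi}\int_{(2)}L(f,s+\tfrac12+it)L(f,s+\tfrac12-it)H(s)\Bigl(\frac{\sqrt q}{2\pi}\Bigr)^{2s+1}\frac{ds}{s-\delta}
\end{aligned}$$

(defined, as a function of $\delta$, for all $\delta \in \mathbf{R}$) with

$$H(s) = G(s+it)G(s-it)\Gamma(s+1+it)\Gamma(s+1-it).$$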

From (5.25), zeros of the polynomial $G$ cancel the first poles of the $\Gamma$ function, so $H$ is holomorphic for $\mathrm{Re}(s) > -N-1$. Moreover, the Gamma function has exponential decay in vertical strips, while $G$ has polynomial growth, and more precisely, by Stirling's formula, $H(s)$ satisfies

$$H(s) \ll (1+|t|+|\mathrm{Im}(s)|)^{B}e^{-\pi(|t|+|\mathrm{Im}(s)|)}$$

for some constant $B > 0$. Moreover, uniformly for $0 \le \delta \le \tfrac12$, we have also

$$H(\delta) \gg e^{-\pi|t|} \tag{5.27}$$

which will be useful when dividing by this quantity later.

Since $L(f,s)$ itself has at most polynomial growth, the integral $I_\delta$ is absolutely convergent. To compute it, we can shift the contour of integration to the line $\mathrm{Re}(s) = -2$; only a simple pole at $s = \delta$ appears while shifting, with residue

$$\mathrm{Res}_{s=\delta}\,Z(f,s)G(s+it)G(s-it)\frac{1}{s-\delta} = \Bigl(\frac{q}{4\pi^2}\Bigr)^{\beta}H(\delta)|L(f,\beta+it)|^2$$

by (5.26).

On the line $\mathrm{Re}(s) = -2$, the integral is seen to be

$$\frac{1}{2i\pi}\int_{(-2)}Z(f,s)G(s+it)G(s-it)\,\frac{ds}{s-\delta} = -I_{-\delta}$$

by the change of variable $s \mapsto -s$, using the functional equation of $Z(f,s)$ and the symmetry $G(s) = G(-s)$. Hence we have the formula

$$\Bigl(\frac{q}{4\pi^2}\Bigr)^{\beta}H(\delta)|L(f,\beta+it)|^2 = I_\delta + I_{-\delta}. \tag{5.28}$$

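On the other hand, in the region of absolute convergence we can expand the product $L(f,s+it)L(f,s-it)$ into a Dirichlet series and integrate term by term, which gives

$$I_\delta = \Bigl(\frac{q}{4\pi^2}\Bigr)^{\beta}\sum_{l_1,l_2\ge 1}\lambda_f(l_1)\lambda_f(l_2)(l_1l_2)^{-\beta}\Bigl(\frac{l_1}{l_2}\Bigr)^{it}W_\delta\Bigl(\frac{4\pi^2l_1l_2}{q}\Bigr)$$

where

$$W_\delta(y) = \frac{1}{2i\pi}\int_{(2)}H(s+\delta)y^{-s}\,\frac{ds}{s}. \tag{5.29}$$

If we define

$$V(y) = y^{-\delta}W_\delta(y) + y^{\delta}W_{-\delta}(y) \tag{5.30}$$

then (5.28) gives, after dividing by $(q/4\pi^2)^{1/2}$,

$$\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)|L(f,\beta+it)|^2 = \sum_{l_1,l_2\ge 1}\lambda_f(l_1)\lambda_f(l_2)(l_1l_2)^{-1/2}\Bigl(\frac{l_1}{l_2}\Bigr)^{it}V\Bigl(\frac{4\pi^2l_1l_2}{q}\Bigr)$$

which is further transformed, using the Hecke relations (2.10) and collecting the variable $n = l_1l_2$, to give finally

$$\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)|L(f,\beta+it)|^2 = \sum_{n\ge 1}\frac{\lambda_f(n)}{\sqrt n}\eta_t(n)U\Bigl(\frac{4\pi^2n}{q}\Bigr) \tag{5.31}$$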

where we have defined

$$U(y) = \sum_{d\ge 1}\frac{\varepsilon_q(d)}{d}V(yd^2) \tag{5.32}$$

and

$$\eta_t(n) = \sum_{ab=n}\Bigl(\frac{a}{b}\Bigr)^{it} \tag{5.33}$$

(this notation is slightly different from that used in Chapter 6: what is here $\eta_t$ is there $\eta_{it}$; but there should be no confusion).

We conclude this section by listing the basic properties of the test function $U$ and the arithmetic function $\eta_t$. These should be skipped and consulted when referred to later.

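Lemma 19. For $\delta \ne 0$, we have

$$U(y) = H(\delta)\zeta_q(1+2\delta)y^{-\delta} + H(-\delta)\zeta_q(1-2\delta)y^{\delta} + O\bigl(y^{N}(1+|t|)^{B}e^{-\pi|t|}\bigr) \tag{5.34}$$

for $0 \le y \le 1$ and

$$U(y) \ll_j y^{-j}(1+|t|)^{B}e^{-\pi|t|}, \quad\text{for all } j \ge 1 \tag{5.35}$$

for $y \ge 1$.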

Proof. For the first part, we write

$$\begin{aligned}
U(y) &= \frac{1}{2i\pi}\int_{(2)}H(s+\delta)\zeta_q(1+2s+2\delta)y^{-s-\delta}\,\frac{ds}{s} \\
&\quad + \frac{1}{2i\pi}\int_{(2)}H(s-\delta)\zeta_q(1+2s-2\delta)y^{-s+\delta}\,\frac{ds}{s}
\end{aligned}$$

and we move the contour of integration to the line $\mathrm{Re}(s) = -N-\delta$. In the region thus covered, both $H(s+\delta)$ and $H(s-\delta)$ are holomorphic. We only encounter two simple poles at $s = 0$ and $s = -\delta$ (from $\zeta$) in the first integral, and two simple poles at $s = \delta$ and $s = 0$ in the second. The sum of the residues at $s = 0$ is

$$H(\delta)\zeta_q(1+2\delta)y^{-\delta} + H(-\delta)\zeta_q(1-2\delta)y^{\delta}$$

while the residue at $s = -\delta$ for the first is

$$-\delta^{-1}(1 - q^{-1})H(0)$$

which exactly cancels out with the residue at $s = \delta$ of the second integral. Now the estimate of the integral on $\mathrm{Re}(s) = -N-\delta$ gives (5.34).

The second part is easily obtained by shifting the contour as far to the right as necessary. □

Lemma 20. For all $t \in \mathbf{R}$, the arithmetic function $\eta_t$ is real valued. It satisfies the identities

$$\eta_t(n)\eta_t(m) = \sum_{d\mid(n,m)}\eta_t\Bigl(\frac{nm}{d^2}\Bigr) \tag{5.36}$$

$$\eta_t(nm) = \sum_{d\mid(n,m)}\mu(d)\eta_t\Bigl(\frac{n}{d}\Bigr)\eta_t\Bigl(\frac{m}{d}\Bigr) \tag{5.37}$$

$$\sum_{n\ge 1}\eta_t(n)n^{-s} = \zeta(s-it)\zeta(s+it) \tag{5.38}$$

$$\sum_{n\ge 1}\eta_t(n)^2n^{-s} = \frac{\zeta(s-2it)\zeta(s)^2\zeta(s+2it)}{\zeta(2s)} \tag{5.39}$$

$$\sum_{n\ge 1}\eta_t(n^2)n^{-s} = \frac{\zeta(s-2it)\zeta(s)\zeta(s+2it)}{\zeta(2s)} \tag{5.40}$$

and the estimate

$$|\eta_t(n)| \le \tau(n). \tag{5.41}$$

Proof. Everything can be checked elementarily by direct computations, but it may as well be deduced from the fact that $\eta_t(n)$ is a Hecke eigenvalue for the operator $T(n)$ acting on the derivative at $s = \tfrac12$ of the non-holomorphic Eisenstein series $E(z,s)$ of level 1, see [Iw3, page 68] for example. □

5.3.2 Computation of the harmonic second moment

By multiplicativity of the coefficients $\lambda_f(n)$, once more, we have

$$|M(f)|^2 = \sum_{b}\frac{1}{b}\sum_{m_1,m_2}\frac{\lambda_f(m_1m_2)}{\sqrt{m_1m_2}}\,x_{bm_1}\overline{x_{bm_2}}$$

so that by (5.31), the second moment $M_2(\delta)$ of (5.23) satisfies

$$\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)M_2(\delta) = \sum_{b}\frac{1}{b}\sum_{n\ge 1}\sum_{m_1,m_2}\frac{\eta_t(n)}{\sqrt{m_1m_2n}}\,x_{bm_1}\overline{x_{bm_2}}\,U\Bigl(\frac{4\pi^2n}{q}\Bigr)\Delta(m_1m_2,n)$$

where $\Delta$ is the Delta-symbol of Chapter 4. Recall from Lemma 9 that

$$\Delta(m,n) = \delta(m,n) + O\bigl((mn)^{1/2}(\log q)^2q^{-3/2}\bigr)$$

(for $m$, $n \le q$) where the implied constant is absolute: this is the simplest estimate based on a “trivial” treatment of the remainder term in the Petersson formula, and it will suffice here, in sharp contrast with Chapter 6, where this remainder term will require a non-trivial treatment. The reason is that we are performing the average over all forms $f \in S_2(q)^*$, whereas in the other case only odd forms are involved, the spectral completeness of which, in a sense, is not quite as good.

Using (5.16) to estimate that

$$x_m \ll \zeta(1+2\delta)m^{-\delta}$$

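the contribution to $M_2(\delta)$ of the remainder term of $\Delta(m,n)$ is at most

$$\begin{aligned}
\frac{(\log q)^2}{q^{3/2}}\sum_{b\le M}\frac{1}{b}\Bigl|\sum_{m}\tau(m)x_{bm}\Bigr|^2\Bigl|\sum_{n\ge 1}U\Bigl(\frac{4\pi^2n}{q}\Bigr)\Bigr| &\ll (1+|t|)^{B}e^{-\pi|t|}\zeta(1+2\delta)^2\frac{(\log q)^2}{\sqrt q}\sum_{b\le M}b^{-1-2\delta}\Bigl|\sum_{bm\le M}\tau(m)m^{-\delta}\Bigr|^2 \\
&\ll \zeta(1+2\delta)^2(\log q)^4q^{-1/2}M^{2(1-\delta)}(1+|t|)^{B}e^{-\pi|t|}
\end{aligned} \tag{5.42}$$

where we have used Lemma 19 to get

$$\begin{aligned}
\sum_{n\ge 1}U\Bigl(\frac{4\pi^2n}{q}\Bigr) &= \sum_{n\le q}U\Bigl(\frac{4\pi^2n}{q}\Bigr) + \sum_{n>q}U\Bigl(\frac{4\pi^2n}{q}\Bigr) \\
&\ll H(\delta)q^{\delta}\sum_{n<q}n^{-\delta} + H(-\delta)q^{-\delta}\sum_{n<q}n^{\delta} + (1+|t|)^{B}e^{-\pi|t|}q^2\sum_{n>q}n^{-2} \\
&\ll q(1+|t|)^{B}e^{-\pi|t|}.
\end{aligned}$$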

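We now study the “diagonal contribution” where $n = m_1m_2$, namely the sum $M'(\delta)$ defined by the equality

$$\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)M'(\delta) = \sum_{b}\frac{1}{b}\sum_{m_1,m_2}\frac{\eta_t(m_1m_2)}{m_1m_2}\,x_{bm_1}\overline{x_{bm_2}}\,U\Bigl(\frac{4\pi^2m_1m_2}{q}\Bigr).$$

Inserting (5.34), we have

$$\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)M'(\delta) = \Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)M''(\delta) + O\bigl((1+|t|)^{B}e^{-\pi|t|}\zeta(1+2\delta)^2(\log q)^2q^{-1/2}M^{2(1-\beta)}\bigr) \tag{5.43}$$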

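where the sum $M''(\delta)$ is given by

$$\begin{aligned}
\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)M''(\delta) &= \Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)\zeta_q(1+2\delta)\sum_{b}\frac{1}{b}\sum_{m_1,m_2}\frac{\eta_t(m_1m_2)}{(m_1m_2)^{1+\delta}}\,x_{bm_1}\overline{x_{bm_2}} \\
&\quad + \Bigl(\frac{q}{4\pi^2}\Bigr)^{-\delta}H(-\delta)\zeta_q(1-2\delta)\sum_{b}\frac{1}{b}\sum_{m_1,m_2}\frac{\eta_t(m_1m_2)}{(m_1m_2)^{1-\delta}}\,x_{bm_1}\overline{x_{bm_2}}
\end{aligned} \tag{5.44}$$

and the error term has been estimated by

$$(1+|t|)^{B}e^{-\pi|t|}\frac{1}{\sqrt q}\sum_{b\le M}\frac{1}{b}\Bigl|\sum_{bm\le M}\frac{\tau(m)}{\sqrt m}x_{bm}\Bigr|^2 \ll (1+|t|)^{B}e^{-\pi|t|}\zeta(1+2\delta)^2(\log q)^2q^{-1/2}M^{2(1-\beta)}.$$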

5.3.3 Diagonalization

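The sum $M''(\delta)$ is now ready for diagonalization, again a process similar to the one which will be used in Chapter 6, but here much simpler.

First, $m_1$ and $m_2$ can be separated in (5.44) by means of the Möbius inversion formula (5.37), so

$$\begin{aligned}
\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)M''(\delta) &= \Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)\zeta_q(1+2\delta)\sum_{b}\frac{1}{b}\sum_{a}\frac{\mu(a)}{a^{2(1+\delta)}}\sum_{m_1,m_2}\frac{\eta_t(m_1)\eta_t(m_2)}{(m_1m_2)^{1+\delta}}\,x_{abm_1}\overline{x_{abm_2}} \\
&\quad + \Bigl(\frac{q}{4\pi^2}\Bigr)^{-\delta}H(-\delta)\zeta_q(1-2\delta)\sum_{b}\frac{1}{b}\sum_{a}\frac{\mu(a)}{a^{2(1-\delta)}}\sum_{m_1,m_2}\frac{\eta_t(m_1)\eta_t(m_2)}{(m_1m_2)^{1-\delta}}\,x_{abm_1}\overline{x_{abm_2}}
\end{aligned}$$

and we can collect the single variable $k = ab$, introducing the arithmetic function

$$\nu_\delta(k) = \sum_{ab=k}\frac{\mu(a)}{a^{1+2\delta}}$$

to derive

$$\begin{aligned}
\Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)M''(\delta) &= \Bigl(\frac{q}{4\pi^2}\Bigr)^{\delta}H(\delta)\zeta_q(1+2\delta)\sum_{k}\frac{\nu_\delta(k)}{k}\Bigl|\sum_{m}\frac{\eta_t(m)}{m^{1+\delta}}x_{km}\Bigr|^2 \\
&\quad + \Bigl(\frac{q}{4\pi^2}\Bigr)^{-\delta}H(-\delta)\zeta_q(1-2\delta)\sum_{k}\frac{\nu_{-\delta}(k)}{k}\Bigl|\sum_{m}\frac{\eta_t(m)}{m^{1-\delta}}x_{km}\Bigr|^2.
\end{aligned} \tag{5.45}$$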
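The weight ν_δ that appears when collecting k = ab is a divisor sum over a Möbius-weighted power, so it factors as a product over the primes dividing k. The Python sketch below checks this numerically; it assumes the reading ν_δ(k) = Σ_{ab=k} μ(a) a^{-(1+2δ)}, and the helper names, the value δ = 0.05 and the range of k are illustrative choices only.

```python
import math

def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def prime_divisors(n):
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def nu_sum(delta, k):
    """nu_delta(k) as the divisor sum over ab = k of mu(a) / a^(1 + 2 delta)."""
    return sum(mobius(a) * a ** -(1 + 2 * delta)
               for a in range(1, k + 1) if k % a == 0)

def nu_product(delta, k):
    """The same quantity as a product over the primes dividing k."""
    return math.prod(1 - p ** -(1 + 2 * delta) for p in prime_divisors(k))

delta = 0.05
worst = max(abs(nu_sum(d, k) - nu_product(d, k))
            for d in (delta, -delta) for k in range(1, 301))
print(worst)
```

In particular each factor 1 - p^{-1+2δ} is positive for 0 < δ < 1/2, which is the sign information used in the positivity argument.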

5.3.4 Estimation of the harmonic second moment

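Following Selberg, we notice that for $0 < \delta < \tfrac12$ the inequalities

$$\zeta_q(1-2\delta) \le 0$$

$$H(-\delta) = |\Gamma(1-\delta+it)G(-\delta+it)|^2 \ge 0$$

$$\nu_{-\delta}(k) = \prod_{p\mid k}(1 - p^{-1+2\delta}) \ge 0$$

hold. Hence, by positivity

$$M''(\delta) \le \zeta_q(1+2\delta)\sum_{k}\frac{\nu_\delta(k)}{k}\Bigl|\sum_{m}\frac{\eta_t(m)}{m^{1+\delta}}x_{km}\Bigr|^2 \tag{5.46}$$

after dividing out by $H(\delta)$.

Let

$$y_k = \sum_{m}\frac{\eta_t(m)}{m^{1+\delta}}x_{km} \tag{5.47}$$

(which is supported on squarefree integers $k \le M$).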

Proposition 13. Assume $\delta = b(\log q)^{-1}$ for some (absolute) constant $b > 0$ and $M = q^{\Delta}$ with $\Delta < \tfrac14$. Then for $k$ squarefree, $k \le M$, we have

$$k^{\delta+it}\xi(k)y_k \ll \frac{1}{\log q}$$

where

$$\xi(k) = \prod_{p\mid k}(1 - p^{-1/2}).$$

Remark This saving of a factor $\log q$ is the critical moment. It will come essentially from cancellation due to the oscillations of the Möbius function, or in other words, from the Prime Number Theorem.

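Proof. We proceed as in [Luo]. From the definition (5.16), for $s = \beta + it = \tfrac12 + \delta + it$, we have

$$x_{km} = \frac{\mu(k)}{k^{\delta+it}}\times\frac{\mu(m)}{m^{\delta+it}}\sum_{n}\frac{\mu(kmn)^2}{n^{1+2\delta+2it}}g_M(kmn)$$

(there is no $\varepsilon_q(n)$ since $n \le kmn \le M < q$). Therefore

$$y_k = \frac{\mu(k)}{k^{\delta+it}}\sum_{m,n}\frac{\mu(kmn)^2\mu(m)\eta_t(m)n^{-it}}{(mn)^{1+2\delta+it}}g_M(kmn).$$

Assume first that $1 \le k \le \sqrt M$ (and of course $k$ is squarefree).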

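Claim. For all integers $\ell \ge 1$, we have

$$g_M(k\ell) = \frac{1}{2i\pi}\int_{(2)}\frac{(\sqrt M/k)^{s}(M^{s/2} - 1)}{\log\sqrt M}\,\ell^{-s}\,\frac{ds}{s^2}. \tag{5.48}$$

This follows by a direct computation from the well-known formula

$$\frac{1}{2i\pi}\int_{(2)}y^{s}\,\frac{ds}{s^2} = \begin{cases}\log y, & \text{if } y \ge 1, \\ 0, & \text{if } 0 < y \le 1.\end{cases}$$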
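The "direct computation" amounts to writing the integrand numerator as (M/kℓ)^s - (√M/kℓ)^s and applying the well-known formula to each term, which gives g_M(kℓ) = (log⁺(M/kℓ) - log⁺(√M/kℓ)) / log √M with log⁺(y) = max(log y, 0). The Python sketch below compares this closed form with the piecewise definition of g_M; the function names and the sample value M = 400 are assumptions for illustration, and no contour integral is evaluated.

```python
import math

def log_plus(y):
    """Value of (1 / 2 i pi) * integral of y^s ds / s^2 over Re(s) = 2."""
    return math.log(y) if y >= 1 else 0.0

def g(M, x):
    """Piecewise definition of the cutoff g_M."""
    root = math.sqrt(M)
    if x <= root:
        return 1.0
    if x <= M:
        return math.log(M / x) / math.log(root)
    return 0.0

def g_from_integral_formula(M, k, l):
    """Closed form obtained by applying the formula to both terms of the numerator."""
    root = math.sqrt(M)
    return (log_plus(M / (k * l)) - log_plus(root / (k * l))) / math.log(root)

M = 400
worst = max(abs(g_from_integral_formula(M, k, l) - g(M, k * l))
            for k in range(1, 21)        # the case k <= sqrt(M)
            for l in range(1, 51))
print(worst)
```

For k at least √M the second term vanishes identically, which corresponds to the simpler formula used later in the proof for that range.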

From this, by complex integration, it holds

$$k^{\delta+it}y_k = \frac{1}{2i\pi}\int_{(2)}L_k(s+1+2\delta+it)\frac{(\sqrt M/k)^{s}(M^{s/2} - 1)}{\log\sqrt M}\,\frac{ds}{s^2} \tag{5.49}$$

with the ad-hoc Dirichlet series

$$L_k(s) = \sum_{\ell\ge 1}\mu(k\ell)^2\Bigl(\sum_{mn=\ell}\mu(m)\eta_t(m)n^{-it}\Bigr)\ell^{-s}$$

which is easily computed. Indeed, the inner sum is the coefficient of $\ell^{-s}$ in the product

$$\zeta(s+it)\sum_{m\ge 1}\mu(m)\eta_t(m)m^{-s} = \prod_{p}(1 - p^{-s-it})^{-1}\bigl(1 - p^{-s}(p^{it} + p^{-it})\bigr)$$

(by multiplicativity and the definition of $\eta_t$)

$$= \prod_{p}\Bigl(1 - p^{-s+it}(1 - p^{-s-it})^{-1}\Bigr) = \prod_{p}\Bigl(1 - p^{-s+it}\sum_{j\ge 0}p^{-j(s+it)}\Bigr)$$

and $L_k(s)$ is obtained from this Dirichlet series by taking the subseries restricted to integers prime to $k$ and squarefree (this is the effect of inserting $\mu(k\ell)^2$ in a Dirichlet series). This gives the very simple answer

$$L_k(s) = \zeta_k(s-it)^{-1}.$$

From the theorems of Hadamard and de la Vallée-Poussin, $\zeta(s)$ has no zeros on the line $\mathrm{Re}(s) = 1$ and more precisely the estimate

$$\zeta(s)^{-1} \ll \log(2 + |\mathrm{Im}(s)|) \tag{5.50}$$

holds with an absolute implied constant (see [Tit, ch. 3]) uniformly for

$$\mathrm{Re}(s) \ge 1 - \frac{D}{\log(2 + |\mathrm{Im}(s)|)}$$

($D > 0$ being another absolute constant).

Let $r$ be small enough so that the circle $|s| \le r$ is included in this zero-free region, and $0 < r < \tfrac12$ (of course, any $r < \tfrac12$ will do, the Riemann Hypothesis being numerically valid in such a range!). In (5.49), we shift the integration to the contour $C$ consisting of the vertical line $\mathrm{Re}(s) = 0$ from $-i\infty$ to $-ir$, followed by the half-circle $s = re(x)$ for $-\tfrac{\pi}{2} \le x \le \tfrac{\pi}{2}$, and then again the line $\mathrm{Re}(s) = 0$ from $ir$ to $i\infty$. By the choice of $r$, this is permissible; the contour shift passes through a unique simple pole at $s = 0$ (simple because of the zero of $s \mapsto M^{s/2} - 1$), and from the residue and the formula for $L_k(s)$ we get

$$k^{\delta+it}y_k = \zeta_k(1+2\delta)^{-1} + \frac{1}{2i\pi}\int_{C}\frac{\zeta(s+1+2\delta)^{-1}}{\prod_{p\mid k}(1 - p^{-(s+1+2\delta)})}\,\frac{(\sqrt M/k)^{s}(M^{s/2} - 1)}{\log\sqrt M}\,\frac{ds}{s^2}.$$

The integral over $C$ is now estimated. Using (5.50), the part from $ir$ to $i\infty$ is dominated by

$$\frac{1}{\log M}\Bigl|\int_{r}^{+\infty}\frac{\zeta(1+2\delta+iu)^{-1}}{\prod_{p\mid k}(1 - p^{-(1+2\delta+iu)})}(\sqrt M/k)^{iu}(M^{iu/2} - 1)\,\frac{du}{u^2}\Bigr| \ll \frac{1}{\log q}\,\frac{1}{\xi(k)}$$

since clearly

$$\prod_{p\mid k}(1 - p^{-1-2\delta})^{-1} \le \xi(k)^{-1}.$$

The same holds without change for the other vertical half-line. For the semi-circle, we use the fact that $k \le \sqrt M$ so that

$$(\sqrt M/k)^{s}(M^{s/2} - 1) \ll 1$$

on this semi-circle where $\mathrm{Re}(s) < 0$, and similarly the product over primes dividing $k$ is dominated by its value at $s = -r$ which is

$$\prod_{p\mid k}(1 - p^{r-1-2\delta})^{-1} \le \xi(k)^{-1}$$

since $r < \tfrac12$. Hence the same bound holds again.

In the case $\sqrt M \le k \le M$, we use a similar reasoning, replacing (5.48) by the other formula

$$g_M(k\ell) = \frac{1}{2i\pi}\int_{(2)}\frac{(M/k)^{s}}{\log\sqrt M}\,\ell^{-s}\,\frac{ds}{s^2}$$

and using the same contour shift. The corresponding integral over $C$ is estimated exactly as before, but a double pole is now present at $s = 0$, with residue

$$\frac{1}{\log\sqrt M}\,\mathrm{Res}_{s=0}\,\frac{\zeta_k(s+1+2\delta)^{-1}(M/k)^{s}}{s^2}$$

which is equal to

$$\frac{1}{\log\sqrt M}\Bigl(\zeta_k(1+2\delta)^{-1}\Bigl(\log\frac{M}{k}\Bigr) + \zeta_k'(1+2\delta)\Bigr).$$

An easy computation gives

$$\zeta_k'(1+2\delta) = \frac{1}{\nu_\delta(k)}\,\frac{\zeta'(1+2\delta)}{\zeta(1+2\delta)^2} + \frac{1}{\zeta(1+2\delta)\nu_\delta(k)}\sum_{p\mid k}\frac{\log p}{p^{1+2\delta} - 1}.$$

Now we collect the results and we use the assumption that $\delta = b(\log q)^{-1}$, which implies

$$\zeta(1+2\delta)^{-1} \ll (\log q)^{-1}.$$

Hence for $k \le \sqrt M$ we obtain immediately the desired bound

$$\xi(k)k^{\delta+it}y_k \ll \frac{1}{\log q}.$$

This is also true for $\sqrt M \le k \le M$: in the residue, the saving of $\log q$ comes from $\zeta(1+2\delta)^{-1}$ for the first term, and from $(\log\sqrt M)^{-1}$ in the second, while the last is actually smaller, since

$$\sum_{p\mid k}\frac{\log p}{p^{1+2\delta} - 1} \ll \log\log k$$

and both $\zeta(1+2\delta)^{-1}$ and $(\log\sqrt M)^{-1}$ are present (we use again $\nu_\delta(k)^{-1} \le \xi(k)^{-1}$).

This finishes the proof. □

Corollary 3. Proposition 12 is true: if $M = q^{\Delta}$ with $\Delta < \tfrac14$ and $\delta = b(\log q)^{-1}$, then

$$M_2(\delta) \ll (1+|t|)^{B}$$

for some absolute constant $B > 0$, the implied constant depending only on $b$ and $\Delta$.

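Proof. From the previous proposition and (5.46) we have

$$\begin{aligned}
M''(\delta) &\le \zeta_q(1+2\delta)\sum_{k\le M}\frac{\nu_\delta(k)}{k}|y_k|^2 \\
&\ll \frac{\zeta_q(1+2\delta)}{(\log q)^2}\sum_{k\le M}\frac{\mu(k)^2\nu_\delta(k)}{\xi(k)^2}k^{-(1+2\delta)} \\
&\ll \frac{1}{\log q}\sum_{k\le M}\frac{\mu(k)^2\nu_\delta(k)}{\xi(k)^2}k^{-(1+2\delta)}
\end{aligned}$$

since

$$\zeta_q(1+2\delta) \ll \log q$$

for $\delta = b(\log q)^{-1}$. Now we have

$$\nu_\delta(k) = \prod_{p\mid k}(1 - p^{-(1+2\delta)}) \le 1$$

for all $k$, so

$$M''(\delta) \ll \frac{1}{\log q}\sum_{k\le M}\frac{\mu(k)^2}{\xi(k)^2}k^{-(1+2\delta)}.$$

The Dirichlet series

$$\sum_{k\ge 1}\frac{\mu(k)^2}{\xi(k)^2}k^{-s} = \prod_{p}\Bigl(1 + \frac{p^{-s}}{(1 - p^{-1/2})^2}\Bigr)$$

is absolutely convergent for $\mathrm{Re}(s) > 1$. Clearly it has analytic continuation to the region $\mathrm{Re}(s) > \tfrac12$ with a simple pole at $s = 1$, and therefore

$$\sum_{k\le M}\frac{\mu(k)^2}{\xi(k)^2}k^{-(1+2\delta)} \ll \log q.$$

To conclude, we look back to the error terms (5.42) and (5.43) introduced in going from the original second moment $M_2(\delta)$ to $M'(\delta)$ and then to $M''(\delta)$, and we see that they bring a total contribution which is

$$\ll q^{-\gamma}(1+|t|)^{B}$$

for some $\gamma = \gamma(\Delta)$ if $M = q^{\Delta}$ with $\Delta < \tfrac14$. The polynomial bound in $t$ is correct: when dividing by $H(\delta)$ (as must be done), we have the bound $H(\delta)^{-1} \ll e^{\pi|t|}$ previously observed, and the factor $e^{-\pi|t|}$ in the error terms cancels it. □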

5.4 Removing the harmonic weight: the head, I

Having estimated $A^h[|M(f,\beta+it)L(f,\beta+it)|^2]$, we now apply Proposition 8 to study $A[|M(f,\beta+it)L(f,\beta+it)|^2]$. The notations and assumptions are the same as at the beginning of the previous Section: recall that $\beta = \tfrac12 + \delta$, and $M = q^{\Delta}$ with $\Delta < \tfrac14$.

First we check the conditions; (3.17) is contained in Proposition 12, while for (3.18), we have

Lemma 21. For all $f \in S_2(q)^*$, it holds

$$\omega_f|M(f,\beta+it)L(f,\beta+it)|^2 \ll q^{-\frac14}(1+|t|)^2$$

for all $\beta$ with $\beta \ge \tfrac12$, the implied constant being absolute.

Proof. Using (5.19), the trivial bound for $M(f,\beta+it)$ is

$$M(f,\beta+it) \ll \sqrt M(\log q)^3$$

while the convexity bound for $L(f,s)$ on the critical line gives

$$L(f,\beta+it) \ll_{\varepsilon} q^{\frac14+\varepsilon}(1+|t|)^{\frac12+\varepsilon}$$

for $\beta \ge \tfrac12$. Since on the other hand we have $\omega_f \ll (\log q)q^{-1}$ from (3.16), the result follows. □

Hence Proposition 8 with $x = q^{\kappa}$, for any $\kappa > 0$, gives the equality

$$A[|M(f,\beta+it)L(f,\beta+it)|^2] = \frac{\dim J_0(q)}{\zeta(2)}A^h[\omega_f(x)|M(f,\beta+it)L(f,\beta+it)|^2] + O\bigl((1+|t|)^{B}q^{1-\gamma}\bigr) \tag{5.51}$$

for some $\gamma = \gamma(\Delta,\kappa) > 0$ (the dependence in $t$ of the error term has to be checked by looking back at the proof of the proposition).

We let

$$\begin{aligned}
M_2(\delta) &= A^h[\omega_f(x)|M(f,\beta+it)L(f,\beta+it)|^2] \\
&= \sum_{d\ell^2\le x}\frac{1}{d\ell^2}\ \sideset{}{^h}\sum_{f\in S_2(q)^*}\lambda_f(d^2)|M(f,\beta+it)L(f,\beta+it)|^2.
\end{aligned}$$

Computing as in Section 5.3.2, we get( q

4π2

)δH(δ)M2(δ) =

∑b

1b

∑n>1

∑m1,m2

ηt(n)√m1m2n

xbm1xbm2U(4π2n

q

)∆n(m1m2, n).

From Lemma 12 we have

∆n(m,n) =∑d`26x

1d`2

∑r|(d2,m)

δ(md2

r2, n)

+O(

(log q)3x√mn

q3/2

)and the error term yields a contribution which, by the same computation as in (5.42),is at most

ζ(1 + 2δ)2(log q)7xM2(1−δ)√q

(1 + |t|)Be−π|t| (1 + |t|)Be−π|t|q−γ (5.52)

for some γ > 0, if κ is taken small enough.The diagonal contribution n = md2r−2 is∑

b

1b

∑m1,m2

xbm1xbm2

m1m2

∑d`26x

(d`)−2∑

r|(m1m2,d2)

rηt

(m1m2d2

r2

)U(4π2m1m2d

2

qr2

)and we use (5.34) to get( q

4π2

)δH(δ)M2(δ) =

( q

4π2

)δH(δ)ζq(1 + 2δ)M(δ) (5.53)

+( q

4π2

)−δH(−δ)ζq(1− 2δ)M(−δ) +O((1 + |t|)Beπ|t|q−γ)

for some γ > 0, where the sum M(δ) is

M(δ) =∑b

1b

∑m1,m2

xbm1xbm2

(m1m2)1+δ

∑d`26x

1d2+2δ`2

∑r|(m1m2,d2)

r1+2δηt

(m1m2d2

r2

). (5.54)

We will first compute the inner sum, showing in particular that we can now again extend the summation over $d$, $\ell$ to all integers, and then we compute this complete series.


We define a function $u(s,r)$ for $s \in \mathbf{C}$ and $r \ge 1$ an integer by

$$u(s,r) = \sum_{ab=r}\mu(a)b^{s} = \prod_{p\mid r}(p^{s}-1)$$

and a function $v_x(s,r)$, supported on cubefree integers $r$, by

$$v_x(s,r) = \sum_{\substack{d\ell^2\le x\\ r\mid d^2}}\ell^{-2}d^{-2s}\,\eta_t\Bigl(\frac{d^2}{r}\Bigr) \tag{5.55}$$

we also denote by $v(s,r)$ the function obtained by removing the constraint $d\ell^2 \le x$ in the definition of $v_x(s,r)$.
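As a sanity check on the definition of $u(s,r)$, the following Python sketch compares the convolution sum $\sum_{ab=r}\mu(a)b^s$ with an Euler product, using exact integer arithmetic. The restriction to integer exponents $s$ and all function names are assumptions made for illustration only. The two expressions displayed above agree whenever $r$ is squarefree; for a prime power $p^k$ exactly dividing $r$ the convolution sum contributes $p^{(k-1)s}(p^s-1)$, which is what `u_product` encodes.

```python
from math import prod

def prime_factorization(n):
    """Return {p: k} with n = prod p**k, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    """Moebius function: 0 if n has a square factor, else (-1)^(number of primes)."""
    f = prime_factorization(n)
    return 0 if any(k > 1 for k in f.values()) else (-1) ** len(f)

def u_sum(s, r):
    """u(s, r) as the sum over factorizations ab = r of mu(a) * b**s."""
    return sum(mobius(a) * (r // a) ** s for a in range(1, r + 1) if r % a == 0)

def u_product(s, r):
    """Euler product: p^k exactly dividing r contributes p^((k-1)s) * (p^s - 1)."""
    return prod(p ** ((k - 1) * s) * (p ** s - 1)
                for p, k in prime_factorization(r).items())
```

For $s = 1$ both functions return Euler's totient $\varphi(r)$, which gives a quick independent cross-check.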

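From the formula (5.37), we have for all integers $m$ and $n$

$$\sum_{r\mid(m,n)} r^{s}\,\eta_t\Bigl(\frac{mn}{r^2}\Bigr) = \sum_{r\mid(m,n)} u(s,r)\,\eta_t\Bigl(\frac{m}{r}\Bigr)\eta_t\Bigl(\frac{n}{r}\Bigr)$$

hence

$$\sum_{d\ell^2\le x}\frac{1}{d^{2+2\delta}\ell^2}\sum_{r\mid(m_1m_2,d^2)} r^{1+2\delta}\,\eta_t\Bigl(\frac{m_1m_2d^2}{r^2}\Bigr) = \sum_{r\mid m_1m_2}\eta_t\Bigl(\frac{m_1m_2}{r}\Bigr)u(1+2\delta,r)\,v_x(1+\delta,r). \tag{5.56}$$

We recall that the multiplicative functions $N$ and $M$ are defined by

$$N(r) = \prod_{p\mid r}p,\qquad M(r) = \prod_{p\,\|\,r}p.$$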

Lemma 22. For all cubefree integers $r \ge 1$, and $s$ with $\mathrm{Re}(s) = \sigma > \frac12$, we have

$$v_x(s,r) = v(s,r) + O\Bigl(\frac{(\log x)^3\tau(r)}{N(r)^{2\sigma-\frac12}\sqrt{x}}\Bigr). \tag{5.57}$$

Moreover

$$v(s,1) = \frac{\zeta(2)\zeta(2s)\zeta(2s+2it)\zeta(2s-2it)}{\zeta(4s)}$$

and for all $r \ge 1$

$$v(s,r) = v(s,1)\,N(r)^{-2s}\prod_{p\,\|\,r}\frac{\eta_t(p)}{1+p^{-2s}}$$

(in the language of Section 4.3, we see that for $s$ fixed the arithmetic function $v(s,r)$ is 1-mutative).

Proof. The point is that for a cubefree integer $r$ and any $d \ge 1$, we have $r \mid d^2$ if and only if $N(r) \mid d$. Since

$$r = M(r)\,\frac{N(r)^2}{M(r)^2} = \frac{N(r)^2}{M(r)}$$


we can write

$$v_x(s,r) = \sum_{\substack{d\ell^2\le x\\ N(r)\mid d}}\ell^{-2}d^{-2s}\,\eta_t\Bigl(\frac{d^2}{r}\Bigr) = N(r)^{-2s}\sum_{d\ell^2\le x/N(r)}\ell^{-2}d^{-2s}\,\eta_t(M(r)d^2),$$

and similarly without constraint for $v(s,r)$. Now, putting $y = x/N(r)$,

$$\sum_{d\ell^2> x/N(r)}\ell^{-2}d^{-2s}\,\eta_t(M(r)d^2) \ll \tau(M(r))\Bigl(\sum_{\ell^2<y}\ell^{-2}\sum_{d>y/\ell^2}\tau(d^2)d^{-2\sigma} + \sum_{\ell^2\ge y}\ell^{-2}\Bigr) \ll \tau(r)(\log x)^3y^{-1/2}$$

and this gives the first formula.

To compute $v(s,r)$ (which is a kind of “non-primitive” symmetric square for $\eta_t$), we define

$$v'(s,r) = \sum_{d\ge 1}\eta_t(M(r)d^2)\,d^{-2s}$$

so that $v(s,r) = \zeta(2)N(r)^{-2s}v'(s,r)$. We denote by $Z(s)$ the full symmetric square given by (5.40), and by $Z_p$ its $p$-factor.

Every integer $d$ has a unique expression $d = d_1d_2$ with $d_1 \mid M(r)^{\infty}$ and $(d_2, M(r)) = 1$ so by multiplicativity we get

$$v'(s,r) = \Bigl(\sum_{(d,M(r))=1}\eta_t(d^2)d^{-2s}\Bigr)\Bigl(\sum_{d\mid M(r)^{\infty}}\eta_t(M(r)d^2)d^{-2s}\Bigr) = Z(2s)\prod_{p\,\|\,r}Z_p(2s)^{-1}\times\prod_{p\,\|\,r}\sum_{k\ge 0}\eta_t(p^{2k+1})p^{-2ks}.$$

Again by multiplicativity,

$$\eta_t(p^{2k+1}) = \eta_t(p)\eta_t(p^{2k}) - \eta_t(p^{2k-1})$$

for $k \ge 1$, so that

$$(1+p^{-2s})\sum_{k\ge 0}\eta_t(p^{2k+1})p^{-2ks} = \eta_t(p)Z_p(2s)$$

which yields

$$v'(s,r) = Z(2s)\prod_{p\,\|\,r}\frac{\eta_t(p)}{1+p^{-2s}}.$$

This gives the lemma, since

$$v(s,1) = Z(2s) = \frac{\zeta(2)\zeta(2s)\zeta(2s+2it)\zeta(2s-2it)}{\zeta(4s)}$$

from (5.40). □
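The proof of Lemma 22 rests on two elementary facts about cubefree $r$: the equivalence $r \mid d^2 \Leftrightarrow N(r) \mid d$, and the identity $r = N(r)^2/M(r)$. The short Python sketch below checks both by brute force on a small range. The function names and the ranges are illustrative assumptions, not part of the original argument.

```python
def radical_parts(r):
    """Return (N(r), M(r)): the product of all primes dividing r,
    and the product of the primes dividing r exactly once."""
    N = M = 1
    n, p = r, 2
    while p * p <= n:
        if n % p == 0:
            k = 0
            while n % p == 0:
                n //= p
                k += 1
            N *= p
            if k == 1:
                M *= p
        p += 1
    if n > 1:  # leftover prime factor, appearing exactly once
        N *= n
        M *= n
    return N, M

def is_cubefree(r):
    """True if no cube p^3 with p >= 2 divides r."""
    p = 2
    while p ** 3 <= r:
        if r % p ** 3 == 0:
            return False
        p += 1
    return True

def divisibility_agrees(r, d):
    """Compare the two conditions r | d^2 and N(r) | d."""
    N, _ = radical_parts(r)
    return ((d * d) % r == 0) == (d % N == 0)
```

The cubefree hypothesis matters: for $r = 8$ and $d = 2$ one has $N(r) \mid d$ but $r \nmid d^2$, so `divisibility_agrees(8, 2)` is `False`.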


Let now $w_x(s,m)$ be the function defined for $s \in \mathbf{C}$ and $m \ge 1$ by

$$w_x(s,m) = \sum_{r\mid m}\eta_t\Bigl(\frac{m}{r}\Bigr)u(2s-1,r)\,v_x(s,r), \tag{5.58}$$

and let $w(s,m)$ be the same with $v(s,r)$ replacing $v_x(s,r)$. Then from (5.56) and (5.54) comes the formula

$$M(\delta) = \sum_b\frac1b\sum_{m_1,m_2}\frac{w_x(1+\delta,m_1m_2)}{(m_1m_2)^{1+\delta}}\,x_{bm_1}x_{bm_2}. \tag{5.59}$$

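Lemma 23. Assume that $\delta = b(\log q)^{-1}$ for any constant $b > 0$. Then

$$M(\delta) = \sum_b\frac1b\sum_{m_1,m_2}\frac{w(1+\delta,m_1m_2)}{(m_1m_2)^{1+\delta}}\,x_{bm_1}x_{bm_2} + O(q^{-\gamma})$$

for some $\gamma = \gamma(\kappa,\Delta) > 0$.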

Proof. Since $m_1$ and $m_2$ are squarefree, the product $m_1m_2$, and any divisor thereof, is always cubefree. So we use (5.57) to replace $v_x(1+\delta,r)$ by $v(1+\delta,r)$. This gives first

$$w(1+\delta,m) = w_x(1+\delta,m) + O\Bigl(\frac{\tau(m)^3(\log x)^3}{\sqrt{x}}\Bigr)$$

because the error term is bounded by

$$\sum_{r\mid m}\tau\Bigl(\frac{m}{r}\Bigr)|u(1+2\delta,r)|\,\frac{(\log x)^3\tau(r)}{N(r)^{\frac32+2\delta}\sqrt{x}} \ll \frac{\tau(m)^3(\log x)^3}{\sqrt{x}}$$

by the estimate

$$|u(1+2\delta,r)| = \Bigl|\prod_{p\mid r}(p^{1+2\delta}-1)\Bigr| \le N(r)^{1+2\delta}.$$

Then inserting this inside $M(\delta)$ gives the result. □

After all those transformations, we are with $M(\delta)$ (more precisely, its main term) in the situation highlighted in the Appendix to Section 4.3 (except that we have a sum over $b$ of quadratic forms as described there). Indeed, we have seen in Lemma 22 that $v(1+\delta,r)$ is 1-mutative (i.e. the product of a constant and a multiplicative function), and by Dirichlet convolution it follows that $w(1+\delta,m)$ also has this property:

$$w(1+\delta,m) = v(1+\delta,1)\,w(m)$$

with $w$ multiplicative.

Of course, this is a very trivial case, and the diagonalization of $M(\delta)$ can be performed simply by doing the transformations, but the corresponding moment in Section 6.2 has more sophisticated coefficients. We recall quickly the yoga of extracting


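the common divisor of $m_1$ and $m_2$ and removing the coprimality condition by Möbius inversion:

$$\begin{aligned}
\sum_b\frac1b\sum_{m_1,m_2}\frac{w(1+\delta,m_1m_2)}{(m_1m_2)^{1+\delta}}\,x_{bm_1}x_{bm_2}
&= v(1+\delta,1)\sum_b\frac1b\sum_{m_1,m_2}\frac{w(m_1m_2)}{(m_1m_2)^{1+\delta}}\,x_{bm_1}x_{bm_2}\\
&= v(1+\delta,1)\sum_b\frac1b\sum_a\frac{w(a^2)}{a^{2(1+\delta)}}\sum_{(m_1,m_2)=1}\frac{w(m_1)w(m_2)}{(m_1m_2)^{1+\delta}}\,x_{abm_1}x_{abm_2}\\
&= v(1+\delta,1)\sum_b\frac1b\sum_a\frac{w(a^2)}{a^{2(1+\delta)}}\sum_d\frac{\mu(d)w(d)^2}{d^{2(1+\delta)}}\sum_{m_1,m_2}\frac{w(m_1)w(m_2)}{(m_1m_2)^{1+\delta}}\,x_{adbm_1}x_{adbm_2}\\
&= v(1+\delta,1)\sum_k\frac{\nu_\delta(k)}{k}\Bigl|\sum_m\frac{w(m)}{m^{1+\delta}}\,x_{km}\Bigr|^2
\end{aligned} \tag{5.60}$$

with

$$\nu_\delta(k) = \sum_{abd=k}\frac{\mu(d)w(d)^2w(a^2)}{(ad)^{1+2\delta}}.$$

Remember that $\nu_\delta(k)$ also depends on $t$ (through $\eta_t$ involved in $w$).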

Lemma 24. There exists an absolute constant b > 0 such that if |δ| 6 b(log q)−1 then

νδ(k) > 0

for all t ∈ R and all k < q, and

v(1 + δ, 1) 1

for all t ∈ R.

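Proof. By multiplicativity it is enough to consider $k = p$ prime, $p < q$. Then

$$e^{-2b} \le p^{2\delta} \le e^{2b}.$$

But

$$\nu_\delta(p) = 1 + \frac{1}{p^{1+2\delta}}\bigl(w(p^2) - w(p)^2\bigr)$$

and by direct computation, from Lemma 22 and the definition of $w(1+\delta,m)$, we have

$$w(p) = \eta_t(p) + (p^{1+2\delta}-1)\,p^{-2(1+\delta)}\,\frac{\eta_t(p)}{1+p^{-2(1+\delta)}} = \eta_t(p)\,\frac{p^{2+2\delta}+p^{1+2\delta}}{p^{2+2\delta}+1}$$

and similarly

$$w(p^2) = \eta_t(p^2) + \eta_t(p)^2\,\frac{p^{1+2\delta}-1}{p^{2+2\delta}+1} + \frac{p^{1+2\delta}-1}{p^{2+2\delta}} = \eta_t(p)^2\,\frac{p^{2+2\delta}+p^{1+2\delta}}{p^{2+2\delta}+1} - 1 + \frac{p^{1+2\delta}-1}{p^{2+2\delta}}$$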


hence

$$w(p^2) - w(p)^2 = -\eta_t(p)^2\,\frac{p^{2+2\delta}+p^{1+2\delta}}{p^{2+2\delta}+1}\cdot\frac{p^{1+2\delta}-1}{p^{2+2\delta}+1} - 1 + \frac{p^{1+2\delta}-1}{p^{2+2\delta}}.$$

For given $p$ we can estimate from below using $0 \le \eta_t(p)^2 \le 4$

$$\nu_\delta(p) \ge 1 - 4\,\frac{p+1}{p^2e^{-2b}+1}\cdot\frac{pe^{-2b}-1}{p^2e^{-2b}+1} - \frac{e^{2b}}{p} + \frac{pe^{-2b}-1}{p^3e^{-4b}}$$

(if $b$ is small enough). As a function of $p$, for $b$ fixed, this is continuous. For $b = 0$, it is increasing as a function of $p$ and is positive for $p = 2$ (where its value is $29/200$). Hence the existence of $b$ follows.
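The numerical claim at $b = 0$ can be reproduced with exact rational arithmetic. The sketch below evaluates the lower bound for $\nu_\delta(p)$ at $b = 0$, namely $1 - 4\frac{(p+1)(p-1)}{(p^2+1)^2} - \frac1p + \frac{p-1}{p^3}$; the function name is a hypothetical helper introduced only for this check.

```python
from fractions import Fraction

def lower_bound_b0(p):
    """Lower bound for nu_delta(p) at b = 0, using eta_t(p)^2 <= 4."""
    p = Fraction(p)
    return (1
            - 4 * (p + 1) * (p - 1) / (p * p + 1) ** 2
            - 1 / p
            + (p - 1) / p ** 3)

# Values at the first few primes; the first one is the 29/200 quoted in the proof.
values = [lower_bound_b0(p) for p in (2, 3, 5, 7, 11, 13)]
```

On this short list of primes the values are positive and increasing, in line with the monotonicity used in the proof.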

As for v(1 + δ, 1), we have by Lemma 22

v(1 + δ, 1) =ζ(2)ζ(2 + 2δ)|ζ(2 + 2δ + it)|2

ζ(4 + 4δ) 1

uniformly in t for all δ > −14 (for instance). 2

Remark. Of course, we do not depend on the positivity of $\nu_\delta(k)$ for small primes for the validity of the argument. It is clear from the beginning that for $\delta$ of size $(\log q)^{-1}$, we have $\nu_\delta(p) > 0$ for $p$ large enough, independently of $q$. The remaining small primes can then be sifted out from the start. But it is best to avoid such technical complications.

We can now conclude this part of the argument.

Proposition 14. Assume that $\delta = b(\log q)^{-1}$ with $b > 0$ a fixed constant such that the previous lemma applies. Then

$$\sum_{f\in S_2(q)^*}|M(f,\beta+it)L(f,\beta+it)|^2 \ll q(1+|t|)^B$$

for some absolute constant $B > 0$. The implied constant depends now only on $\Delta$.

Proof. From Lemma 24, and the computation of M(δ) and the subsequent diagonal-ization of the main term, we see that for q large enough we have

M(−δ) > 0

hence, using the same trick as before that ζq(1 − 2δ) 6 0, we get by positivity theinequality

M2(δ) 6 ζ(1 + 2δ)M(δ) v(1 + δ, 1)ζ(1 + 2δ)∑k

νδ(k)k

∣∣∣∑m

w(m)m1+δ

xkm

∣∣∣2.Now, in terms of the linear forms yk introduced in (5.47), we can write∑

m

w(m)m1+δ

xkm =∑a,b

ηt(a)ηt(b)u(1 + δ, b)(ab)1+δ(b2(1+δ) + 1)

xabk =∑b

u(1 + δ, b)ηt(b)b1+δ(b2(1+δ) + 1)

ybk

since for squarefree n we have N(n) = n. But u(1 + δ, b) 6 b1+2δ for b squarefree, andProposition 13 gives immediately∑

m

w(m)m1+δ

xkm 1

k(1+δ)ξ(k)1

log q

and then the proof is completed as for Corollary 3, going back to (5.51) to conclude. 2


Proposition 5.20 is an easy consequence of this estimate near the critical line. Indeed, it first immediately provides the bound

$$\sum_{f\in S_2(q)^*}|M(f,\beta+it)L(f,\beta+it)-1|^2 \ll q(1+|t|)^B \tag{5.61}$$

for $\beta = \frac12 + b(\log q)^{-1}$. On the other hand, for $\mathrm{Re}(s) = \sigma > 1$, we have

$$\sum_{f\in S_2(q)^*}|M(f,s)L(f,s)-1|^2 \ll q^{1-\Delta(\sigma-1)}(\log q)^{30} \tag{5.62}$$

as a consequence of the trivial individual bound of Lemma 18.

Consider any $\sigma_0 > 2$. We will interpolate by convexity between the bound (5.61) near $\sigma = \frac12$ and this bound for $\sigma = \sigma_0$, by means of a simple (and well-known) extension of the classical convexity principle of Phragmén-Lindelöf.

Recall first that a function $f$ is called subharmonic if its Laplacian $\Delta f$ is non-negative. For example, if $f$ is holomorphic, then $|f|^2$ is subharmonic since

$$\Delta|f|^2 = 2(u_x^2+v_x^2+u_y^2+v_y^2) + 2u\Delta u + 2v\Delta v = 2(u_x^2+v_x^2+u_y^2+v_y^2) \ge 0$$

(where $u = \mathrm{Re}(f)$, $v = \mathrm{Im}(f)$, both harmonic). The fundamental property of subharmonic functions is that they satisfy the maximum modulus principle.

Lemma 25. Let $f_1,\ldots,f_j$ be finitely many functions holomorphic inside the strip

$$0 < a-\vartheta < \mathrm{Re}(s) = \sigma < b+\vartheta$$

(for some $\vartheta > 0$), such that the function

$$h = |f_1|^2 + \cdots + |f_j|^2$$

has at most polynomial growth in the strip and satisfies

$$|h(s)| \le Cq^{c}|s|^{B}\quad\text{for } \mathrm{Re}(s) = a,\qquad |h(s)| \le Cq^{d}|s|^{B}\quad\text{for } \mathrm{Re}(s) = b,$$

where $c$, $d$, $B$, and $C$ are real numbers with $B > 0$, $C > 0$. Then for any $s$ in the strip $a \le \mathrm{Re}(s) \le b$, we have

$$|h(s)| \le Cq^{\alpha(\sigma)}|s|^{B}$$

where $\alpha$ is the linear function with $\alpha(a) = c$, $\alpha(b) = d$.

Proof. For any $\varepsilon > 0$, let $g_\varepsilon(s)$ be the function

$$g_\varepsilon(s) = q^{-\alpha(\sigma)}|s|^{-B}|\Gamma(1+\varepsilon s)|^2h(s)$$

defined in the strip. The point is that

$$g_\varepsilon(s) = \sum_i\bigl|q^{-\alpha(s)/2}s^{-B/2}\Gamma(1+\varepsilon s)f_i(s)\bigr|^2$$

so $g_\varepsilon$, as a sum of subharmonic functions, is a subharmonic function. We have moreover

$$|g_\varepsilon(s)| \le C\,\Gamma(1+\varepsilon b)^2$$


for $\mathrm{Re}(s) = a$ or $\mathrm{Re}(s) = b$ since $|\Gamma(s)| \le \Gamma(\mathrm{Re}(s))$ for $\mathrm{Re}(s) > 0$. The Gamma function decreases exponentially, and $h$ has polynomial growth, so $g_\varepsilon(s)$ tends to zero for $|\mathrm{Im}(s)|$ tending to infinity, uniformly for $a \le \mathrm{Re}(s) \le b$. Hence by the maximum modulus principle applied in a sufficiently large rectangle $[a,b]\times[-T,T]$, it must hold

$$|g_\varepsilon(s)| \le C\,\Gamma(1+\varepsilon b)^2$$

for $a \le \mathrm{Re}(s) \le b$, or

$$|h(s)| \le C|s|^{B}q^{\alpha(\sigma)}\,\frac{\Gamma(1+\varepsilon b)^2}{|\Gamma(1+\varepsilon s)|^2},$$

and letting $\varepsilon$ tend to zero gives the required inequality. □

We deduce from this lemma and (5.61), (5.62) that for

$$\frac12 + \frac{1}{\log q} \le \sigma \le \sigma_0$$

we have

$$\sum_{f\in S_2(q)^*}|M(f,s)L(f,s)-1|^2 \ll_\varepsilon q^{1-c(\varepsilon)(\sigma-\frac12)}(1+|t|)^B$$

with $c(\varepsilon)$ given by

$$c(\varepsilon) = \frac{\Delta(\sigma_0-1)-\varepsilon}{\sigma_0-\frac12}.$$

This inequality remains valid for $\mathrm{Re}(s) > \sigma_0$, because it is weaker than the trivial bound in this region. Now taking $\sigma_0$ large enough and $\varepsilon$ small enough, we will obtain

$$\sum_{f\in S_2(q)^*}|M(f,s)L(f,s)-1|^2 \ll q^{1-c(\sigma-\frac12)}(1+|t|)^B$$

uniformly for $\mathrm{Re}(s) \ge \frac12 + b(\log q)^{-1}$, for any $c < \Delta$. This proves Proposition 11, since we can obviously assume that $b < 1$.
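To see concretely why any $c < \Delta$ is attainable, the following Python sketch evaluates the interpolation slope $c(\varepsilon) = (\Delta(\sigma_0-1)-\varepsilon)/(\sigma_0-\frac12)$ and the resulting linear exponent $\alpha(\sigma) = 1 - c(\varepsilon)(\sigma-\frac12)$. The function names and the sample values of $\Delta$, $\sigma_0$ and $\varepsilon$ are illustrative assumptions; the shift $1/\log q$ in the left endpoint is ignored, as in the displayed formula.

```python
def interpolation_slope(Delta, sigma0, eps):
    """c(eps) = (Delta * (sigma0 - 1) - eps) / (sigma0 - 1/2)."""
    return (Delta * (sigma0 - 1.0) - eps) / (sigma0 - 0.5)

def exponent(Delta, sigma0, eps, sigma):
    """Linear exponent alpha(sigma) = 1 - c(eps) * (sigma - 1/2):
    equals 1 at sigma = 1/2 and 1 - Delta*(sigma0 - 1) + eps at sigma = sigma0."""
    return 1.0 - interpolation_slope(Delta, sigma0, eps) * (sigma - 0.5)

# The slope stays below Delta and approaches it as sigma0 grows and eps shrinks.
slopes = [interpolation_slope(0.2, s0, 1e-6) for s0 in (2.0, 10.0, 1e3, 1e6)]
```

Since $\Delta(\sigma_0-1)-\varepsilon < \Delta(\sigma_0-\frac12)$, the slope is always strictly less than $\Delta$, which matches the restriction $c < \Delta$ in the final statement.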


Chapter 6

Proof of the lower bound

“Oh, in that case I guess it’s okay,”Colonel Korn said, mollified.

Joseph Heller, “Catch-22”

We recall the statement of Theorem 10 to be proved: for any $\varepsilon > 0$, and for any prime number $q$ large enough in terms of $\varepsilon$, we have

$$\bigl|\{f\in S_2(q)^* \mid L(f,\tfrac12) = 0,\ L'(f,\tfrac12)\ne 0\}\bigr| \ge \Bigl(\frac{19}{54}-\varepsilon\Bigr)\dim J_0(q). \tag{6.1}$$

and this quantity is the same as

$$\bigl|\{f\in S_2(q)^* \mid f \text{ is odd and } L'(f,\tfrac12)\ne 0\}\bigr|.$$

In this chapter again, q is a fixed (large) prime number.

6.1 Non-vanishing in harmonic average

As described in Section 2.3, we will first prove the “harmonic” version of (6.1).

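Theorem 15. For any $\varepsilon > 0$, and for any prime number $q$ large enough in terms of $\varepsilon$

$$\sideset{}{^h}\sum_{\substack{\varepsilon_f=-1\\ L'(f,\frac12)\ne 0}} 1 \ge \Bigl(\frac{19}{54}-\varepsilon\Bigr). \tag{6.2}$$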

6.1.1 Preliminary: a refined statement

We consider a mollified first moment

$$M_1 = \sideset{}{^h}\sum_{f\in S_2(q)^*}\varepsilon_f^-\,M(f)L'(f,\tfrac12)$$

and a mollified second moment

$$M_2 = \sideset{}{^h}\sum_{f\in S_2(q)^*}\varepsilon_f^-\,|M(f)L'(f,\tfrac12)|^2$$

where the complex numbers $M(f)$ (the “mollifier”) are at our disposal. Comparison of an upper bound for $M_2$ and a lower bound for $M_1$ yields a lower bound for the quantity in (6.2): by Cauchy’s inequality

$$M_1 \le \Bigl(\ \sideset{}{^h}\sum_{\substack{\varepsilon_f=-1\\ L'(f,\frac12)\ne 0}} 1\Bigr)^{1/2}M_2^{1/2}$$


so that if the mollifier is such that $M_2 \ne 0$ we have

$$\sideset{}{^h}\sum_{\substack{\varepsilon_f=-1\\ L'(f,\frac12)\ne 0}} 1 \ge \frac{M_1^2}{M_2}. \tag{6.3}$$

In order to achieve the best possible estimate, we will seek asymptotics for $M_1$ and $M_2$. If the mollifier is ignored (take $M(f) = 1$), a factor $\log q$ is lost in the final estimate.

To make the sums manageable, and in accordance with the principle that the mollifier should dampen the effect of large values of $L'(f,\frac12)$, we choose $M(f)$ of the shape (compare (5.17))

$$M(f) = \sum_{m\le M}\frac{x_m}{\sqrt{m}}\,\lambda_f(m) \tag{6.4}$$

for real numbers $(x_m)$ (and a parameter $M > 0$) which we will try to choose to optimize the resulting bound (6.3). If $m > M$, it is understood that $x_m = 0$. Now we only impose that the $x_m$ be supported on squarefree integers¹ and satisfy

$$x_m \ll \bigl(\tau(m)(\log qm)\bigr)^{A} \tag{6.5}$$

for some absolute constant $A > 0$. Henceforth we write $M = q^{\Delta}$, and will assume $0 \le \Delta < \frac12$, so any $m$ appearing in the mollifier, or any product $m_1m_2$, is less than $q$, hence coprime with $q$ since $q$ is prime. These assumptions imply the growth estimate

$$\sum_{m\le M}|x_m| \ll (\log q)^{A+2^{A}-1}M. \tag{6.6}$$

The precise form of Theorem 15 that we will prove is

Theorem 16. Let m1, m21, m22 and m2 be the real polynomials in the indeterminate∆ given by

m1 = ∆(∆2

2+

∆2

+14

)(6.7)

m21 = m1 + 6∆2((1

2+ ∆

)2 −∆(1

2+ ∆

)+

∆2

4

)+ 6∆3

(43(1

2+ ∆

)2 −∆(1

2+ ∆

)+

∆2

5

)+ 2∆3

((12

+ ∆)2 −∆

(12

+ ∆)

+∆2

5

)+ 2∆4

(32(1

2+ ∆

)2 −∆(1

2+ ∆

)+

∆2

6

)(6.8)

m22 = −23

∆3((1

2+ ∆

)2 −∆(1

2+ ∆

)+

∆2

5

)− 2

3∆4(3

2(1

2+ ∆

)2 −∆(1

2+ ∆

)+

∆2

6

) (6.9)

m2 =112

m21 −14m22. (6.10)

1This is convenient, although the assumption could be dispensed with, but only to be recovered asa consequence of the choice of xm.


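Then the function

$$\Delta \mapsto \frac{m_1(\Delta)^2}{m_2(\Delta)}$$

is defined and increasing on the real segment $[0,\frac12[$ and for all $\Delta\in[0,\frac14[$ it holds

$$\sideset{}{^h}\sum_{\substack{\varepsilon_f=-1\\ L'(f,\frac12)\ne 0}} 1 \ge \frac{m_1(\Delta)^2}{m_2(\Delta)} + O\Bigl(\frac{(\log\log q)^4}{\log q}\Bigr).$$

Moreover

$$\frac{m_1(\frac14)^2}{m_2(\frac14)} = \frac{19}{54}.$$

Clearly, this implies Theorem 15. We now start proving this result.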

6.1.2 Computation of the first moment

Recall that

$$M_1 = \sideset{}{^h}\sum_{f}\varepsilon_f^-\,M(f)L'(f,\tfrac12).$$

As in the previous chapter we express $L'(f,\frac12)$ as a rapidly convergent series using contour integration and the functional equation. Choose an integer $N \ge 1$ (which will have to be large enough, $N = 2$ works already) and a real polynomial $G$ satisfying

$$G(-s) = G(s),\quad\text{and}\quad G(0) = 1 \tag{6.11}$$

$$G(-N) = \ldots = G(-1) = 0. \tag{6.12}$$

Notice that from the first of these, we obtain also

$$G'(0) = 0,\qquad G^{(3)}(0) = 0. \tag{6.13}$$

Then consider the integral

$$I = \frac{1}{2i\pi}\int_{(2)}\Lambda(f,s+\tfrac12)G(s)\frac{ds}{s^2}.$$

In $I$, the Gamma factor $\Gamma(s+1)$ involved in $\Lambda(f,s+\frac12)$ decreases exponentially in vertical strips, whereas both the $L$-function and the polynomial $G$ have at most polynomial growth, making it possible to shift the contour of integration to the left, to the line $\mathrm{Re}(s) = -2$. A double pole at $s = 0$ is picked up, hence

$$I = \mathop{\mathrm{Res}}_{s=0}\frac{\Lambda(f,s+\frac12)G(s)}{s^2} + \frac{1}{2i\pi}\int_{(-2)}\Lambda(f,s+\tfrac12)G(s)\frac{ds}{s^2},$$

but, by applying the functional equation and (6.11), the integral on $\mathrm{Re}(s) = -2$ is equal to $\varepsilon_fI$, so this yields

$$2\varepsilon_f^-I = \mathop{\mathrm{Res}}_{s=0}\frac{\Lambda(f,s+\frac12)G(s)}{s^2}.$$


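The residue is computed by writing the Taylor expansions

$$\Lambda(f,s+\tfrac12) = \Lambda(f,\tfrac12) + s\Lambda'(f,\tfrac12) + O(s^2)$$

and (from (6.13), (6.11))

$$G(s) = 1 + O(s^2)$$

so that we derive

$$2\varepsilon_f^-I = \Lambda'(f,\tfrac12)$$

whence, multiplying through by $\varepsilon_f^-$

$$2\varepsilon_f^-I = \varepsilon_f^-\Bigl(\frac{\sqrt{q}}{2\pi}\Bigr)^{1/2}L'(f,\tfrac12).$$

On the other hand we can compute $I$ by expanding the $L$-function in an absolutely convergent Dirichlet series on the line $\mathrm{Re}(s) = 2$,

$$\begin{aligned}
I &= \frac{1}{2i\pi}\int_{(2)}L(f,s+\tfrac12)\Bigl(\frac{\sqrt{q}}{2\pi}\Bigr)^{s+1/2}\Gamma(s+1)G(s)\frac{ds}{s^2}\\
&= \Bigl(\frac{\sqrt{q}}{2\pi}\Bigr)^{1/2}\sum_{l\ge 1}\lambda_f(l)l^{-1/2}\,\frac{1}{2i\pi}\int_{(2)}l^{-s}\Bigl(\frac{\sqrt{q}}{2\pi}\Bigr)^{s}\Gamma(s+1)G(s)\frac{ds}{s^2}\\
&= \Bigl(\frac{\sqrt{q}}{2\pi}\Bigr)^{1/2}\sum_{l\ge 1}\lambda_f(l)l^{-1/2}\,V\Bigl(\frac{2\pi l}{\sqrt{q}}\Bigr)
\end{aligned}$$

where the function $V$ is defined by

$$V(y) = \frac{1}{2i\pi}\int_{(3/2)}\Gamma(s+1)G(s)y^{-s}\frac{ds}{s^2}. \tag{6.14}$$

Putting both computations together we get the desired expression

$$\varepsilon_f^-L'(f,\tfrac12) = 2\varepsilon_f^-\sum_{l\ge 1}\lambda_f(l)l^{-1/2}\,V\Bigl(\frac{2\pi l}{\sqrt{q}}\Bigr). \tag{6.15}$$

We estimate $V$ easily by shifting the contour to the left, or right.

Lemma 26. The function $V$ satisfies

$$V(y) = -\log y - \gamma + O(y^{N}) \tag{6.16}$$

($\gamma = -\Gamma'(1)$ is Euler’s constant) and

$$V(y) \ll_j y^{-j}\quad\text{for all } j \ge 1 \tag{6.17}$$

(which are all valid for $y > 0$).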


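Using the definition of $M_1$ and the expression (6.4) for $M(f)$, we obtain at once from (6.15)

$$M_1 = \sum_{l,m}x_m(lm)^{-1/2}\,V\Bigl(\frac{2\pi l}{\sqrt{q}}\Bigr)\times\Delta^-(l,m) \tag{6.18}$$

where $\Delta^-$ is the Delta symbol for odd forms considered in Section 4.2, namely

$$\Delta^-(l,m) = 2\ \sideset{}{^h}\sum_{f}\varepsilon_f^-\,\lambda_f(l)\lambda_f(m).$$

By (4.10) of Lemma 11, $\Delta^-$ approximates the Kronecker delta

$$\Delta^-(l,m) = \delta(l,m) + O\Bigl(\frac{\sqrt{lm}}{q}(\log q)^2\Bigr)$$

and thus we have

$$\begin{aligned}
M_1 &= \sum_m\frac{x_m}{m}\,V\Bigl(\frac{2\pi m}{\sqrt{q}}\Bigr) + O\Bigl(\frac{(\log q)^2}{q}\sum_{l,m}|x_m|\,\Bigl|V\Bigl(\frac{2\pi l}{\sqrt{q}}\Bigr)\Bigr|\Bigr)\\
&= \sum_m\frac{x_m}{m}\,V\Bigl(\frac{2\pi m}{\sqrt{q}}\Bigr) + O\Bigl(\frac{M}{\sqrt{q}}(\log q)^{B}\Bigr)
\end{aligned}$$

for some $B > 0$, depending only on the constant $A$ of (6.5), since

$$\frac{(\log q)^2}{q}\sum_{l,m}|x_m|\,\Bigl|V\Bigl(\frac{2\pi l}{\sqrt{q}}\Bigr)\Bigr| \ll \frac{M}{q}(\log q)^{B}\Bigl(\sum_{l<\sqrt{q}}\Bigl|V\Bigl(\frac{2\pi l}{\sqrt{q}}\Bigr)\Bigr| + \sum_{l\ge\sqrt{q}}\Bigl|V\Bigl(\frac{2\pi l}{\sqrt{q}}\Bigr)\Bigr|\Bigr) \ll \frac{M}{q}(\log q)^{B}\times\bigl(\sqrt{q}(\log q)+\sqrt{q}\bigr)$$

by (6.6) and Lemma 26, in particular, (6.17) with $j = 2$ for $l \ge \sqrt{q}$. Using now (6.16), with $N = 1$, and again (6.6), we have

$$\begin{aligned}
\sum_m\frac{x_m}{m}\,V\Bigl(\frac{2\pi m}{\sqrt{q}}\Bigr) &= \sum_{m\le M}\frac{x_m}{m}\Bigl(\log\frac{\sqrt{q}}{2\pi m}-\gamma\Bigr) + O\Bigl(\frac{1}{\sqrt{q}}\sum_{m\le M}|x_m|\Bigr)\\
&= \sum_{m\le M}\frac{x_m}{m}\Bigl(\log\frac{\sqrt{q}}{2\pi m}-\gamma\Bigr) + O\Bigl(\frac{M}{\sqrt{q}}(\log q)^{B}\Bigr)
\end{aligned}$$

which establishes the next proposition.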

Proposition 15. Let $M = q^{\Delta}$ with $\Delta < \frac12$. Define the real number $\hat{q}$ by

$$\log\hat{q} = -\log\frac{2\pi}{\sqrt{q}} - \gamma,$$

then, for some positive constant $\delta = \delta(\Delta) > 0$, we have

$$M_1 = \sum_{m\le M}\frac{x_m}{m}\Bigl(\log\frac{\hat{q}}{m}\Bigr) + O(q^{-\delta}). \tag{6.19}$$

Remark. Here one can take, by the above,

$$\delta = \tfrac12 - \Delta - \varepsilon$$

for any $\varepsilon > 0$ small enough so that this is positive. In the following, when we write an error term of the form $O(q^{-\delta})$, it is implied that $\delta > 0$, $\delta$ only depends on $\Delta$, and the value of $\delta$ may change from line to line.


6.1.3 Computation of the second moment

We now wish to get an expression for $M_2$ as a quadratic form in the $x_m$. A new phenomenon appears, however, at the point where we would like to appeal to Lemma 11, as the remainder term in the Petersson formula (the series of Kloosterman sums $\mathcal{J}(m,n)$) cannot be ignored, and has to be analyzed to yield a contribution to the main term.

The square of the special value

The procedure is similar to that followed to express $L'(f,\frac12)$. We consider this time the integral

$$J = \frac{1}{2i\pi}\int_{(2)}\Lambda(f,s+\tfrac12)^2G(s)\frac{ds}{s^3}$$

and proceed to evaluate it as before. By shifting the contour to $\mathrm{Re}(s) = -2$ and applying the functional equation for the square of the $L$-function

$$\Lambda(f,s)^2 = \Lambda(f,1-s)^2$$

(notice the sign is always $+1$ in this case), we have

$$2J = \mathop{\mathrm{Res}}_{s=0}\frac{\Lambda(f,s+\frac12)^2G(s)}{s^3}.$$

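Further, from the multiplicativity of the Hecke eigenvalues (2.10), we derive the Dirichlet series expansion

$$L(f,s)^2 = \zeta_q(2s)\sum_{n\ge 1}\tau(n)\lambda_f(n)n^{-s}$$

so the term by term integration yields

$$J = \frac{\sqrt{q}}{2\pi}\sum_{n\ge 1}\frac{\lambda_f(n)}{\sqrt{n}}\,\tau(n)\,W\Bigl(\frac{4\pi^2n}{q}\Bigr)$$

where

$$W(y) = \frac{1}{2i\pi}\int_{(1/2)}\zeta_q(1+2s)\Gamma(s)^2G(s)y^{-s}\frac{ds}{s} \tag{6.20}$$

(the integration on $\mathrm{Re}(s) = 2$ can be shifted to $\mathrm{Re}(s) = \frac12$ in defining the function $W$ since no poles appear between those two lines and the integrand is exponentially decreasing in vertical strips). Comparing we deduce the equality

$$2\times\frac{\sqrt{q}}{2\pi}\sum_{n\ge 1}\frac{\lambda_f(n)}{\sqrt{n}}\,\tau(n)\,W\Bigl(\frac{4\pi^2n}{q}\Bigr) = \mathop{\mathrm{Res}}_{s=0}\frac{\Lambda(f,s+\frac12)^2G(s)}{s^3}. \tag{6.21}$$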

Now if $f$ is odd, we have $L(f,\frac12) = \Lambda(f,\frac12) = 0$ and then we compute the residue

$$\mathop{\mathrm{Res}}_{s=0}\frac{\Lambda(f,s+\frac12)^2G(s)}{s^3} = \Lambda'(f,\tfrac12)^2 = \frac{\sqrt{q}}{2\pi}\,L'(f,\tfrac12)^2$$


from the Taylor expansions

$$\Lambda(f,s+\tfrac12)^2 = s^2\Lambda'(f,\tfrac12)^2 + O(s^3)$$

$$G(s) = 1 + \frac{s^2}{2}G''(0) + O(s^4),$$

so (6.21) furnishes a formula for $L'(f,\frac12)^2$, valid for odd forms:

$$L'(f,\tfrac12)^2 = 2\sum_{n\ge 1}\frac{\lambda_f(n)}{\sqrt{n}}\,\tau(n)\,W\Bigl(\frac{4\pi^2n}{q}\Bigr). \tag{6.22}$$

For our purpose, $W$ is basically a “cut-off” function, which restricts the summation to $n \le q$. Indeed, we have the following

Lemma 27. The function $W$ satisfies

$$y^{i}W^{(j)}(y) \ll_{i,j} \log(y+1/y)^3,\quad\text{for all } i \ge j \ge 0 \tag{6.23}$$

$$y^{i}W^{(i)}(y) \ll_{j} y^{-j},\quad\text{for all } i \ge 0,\ j \ge 1. \tag{6.24}$$

Moreover, there exists a polynomial $P$, independent of $q$, of degree at most 2, such that

$$W(y) = -\frac{1}{12}(\log y)^3 + P(\log y) + O\bigl(q^{-1}(\log y)^2 + y^{N}\bigr). \tag{6.25}$$

Proof. The first two inequalities are obtained by the usual contour shifts and differentiating under the integral sign. As for the last, we have

$$W(y) = \mathop{\mathrm{Res}}_{s=0}\frac{G(s)\Gamma(s)^2\zeta_q(1+2s)y^{-s}}{s} + O(y^{N})$$

again by shifting, and the residue is computed by using Taylor expansions, writing

$$\zeta_q(1+2s) = (1-q^{-1-2s})\,\zeta(1+2s)$$

where the first factor contributes to the error term in (6.25), hence we get the polynomial $P$ independent of $q$. □

As for $V$ before, we will use this lemma in the following form:

$$\sum_{n\ge 1}a(n)\,W\Bigl(\frac{4\pi^2n}{q}\Bigr) \ll (\log q)^3\sum_{n<q}|a(n)| + q^2\sum_{n\ge q}\frac{|a(n)|}{n^2} \tag{6.26}$$

so, roughly speaking, if the complex numbers $a(n)$ are “almost bounded”, the sum will be of size about $q$, as if the summation had extended only to $n \le q$.

Applying Petersson’s formula

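The goal now is to compute $M_2$. From the definition (6.4) of $M(f)$, and by multiplicativity, the second moment $M_2$ can be written

$$\begin{aligned}
M_2 &= \sideset{}{^h}\sum_{\varepsilon_f=-1}L'(f,\tfrac12)^2\sum_{m_1,m_2}\frac{x_{m_1}x_{m_2}}{\sqrt{m_1m_2}}\,\lambda_f(m_1)\lambda_f(m_2)\\
&= \sum_b\frac1b\sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{\sqrt{m_1m_2}}\,A^h[\varepsilon_f^-\lambda_f(m_1m_2)L'(f,\tfrac12)^2]
\end{aligned} \tag{6.27}$$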


(there is no εq(b) because m < q by assumption).We start by investigating the inner sum. Therefore fix some 0 6 ∆ < 1 and an

integer m, 1 6 m 6 q∆ < q. We consider the average

Ah[ε−f λf (m)L′(f, 12)2] =

∑h

f∈S2(q)∗

ε−f λf (m)L′(f, 12)2. (6.28)

From (6.22) we obtain

Ah[ε−f λf (m)L′(f, 12)2] =

∑n>1

τ(n)√nW(4π2n

q

)Ah[2ε−f λf (m)λf (n)]

=∑n>1

τ(n)√nW(4π2n

q

)∆−(m,n).

From Lemma 11, we have

∆−(m,n) = δ(m,n) + J ′(m,n) +O(√mnq3/2

(log q)2)

(6.29)

hence

Ah[ε−f λf (m)L′(f, 12)2] =

τ(m)√mW(4π2m

q

)+∑n>1

τ(n)√nJ ′(m,n)W

(4π2n

q

)+O

(√m

q(log q)6

) (6.30)

where we have estimated that the error term is

√m

q3/2(log q)2

∑n>1

τ(n)∣∣∣W(4π2n

q

)∣∣∣and the inner sum is estimated by the method of (6.26), as follows

∑n>1

τ(n)∣∣∣W(4π2n

q

)∣∣∣ (log q)3∑n<q

τ(n) + q2∑n>q

τ(n)n2

q(log q)4

whence (6.30). Notice already that if M = q∆ with ∆ < 14 , this error term is good

enough to put back into M2 where – as in M1 – we expect a main term which is roughlya power of log q: by (6.27) it yields a contribution to M2 which is

(log q)6

√q

∑b6M

1b

∣∣∣∑m

xm

∣∣∣2 M2

√q

(log q)C

for some C > 0 by (6.6), and this is qδ for some δ > 0 if ∆ < 14 .

The first term of (6.27) is further decomposed by means of (6.25) (with N = 1):define Q by

log Q = logq

4π2,


then

τ(m)√mW(4π2m

q

)=

112τ(m)√m

(log

Q

m

)3+τ(m)√mP(

logQ

m

)+O

(mq

). (6.31)

The second term in (6.30) is now the focus of our attention. We call it X(m). Fromthe definition (4.8) of J ′(m,n), this is a sum over integers r > 1 coprime with q, whichwe write

X(m) =2π√q

∑(r,q)=1

1rXr (6.32)

where the term Xr is a weighted sum of Kloosterman sums twisted by the divisorfunction

Xr = −∑n>1

τ(n)√nS(mq, n; r)J1

(4πr

√mn

q

)W(4π2n

q

)ξ(n). (6.33)

For technical reasons we have chosen a fixed test function ξ : R+ −→ [0, 1], whichis C∞ and satisfies

ξ(x) = 0, 0 6 x 6 12 , ξ(x) = 1, x > 1,

and we have inserted in the summation the weight ξ(n): this obviously doesn’t affectthe sum (all positive integers are at least 1!), but will be useful to gain convergence insome series appearing later (this is required only because the weight is 2).

We separate the sum in r in (6.32) in two parts, r 6 R and r > R, R > 0 being aparameter to be fixed later, but assumed to satisfy logR log q with some absoluteimplied constant (see the choice below). The part with r large is handled directly.

Lemma 28. In this situation, it holds

2π√q

∑r>R

(r,q)=1

1rXr

√m

R(log q)11. (6.34)

Proof. The computations are similar to those previously done. For Kloosterman sumswe use Weil’s bound (4.5) and for the Bessel function J1(x) x, and the sum over nis subdivided into n < q and n > q, and accordingly (6.23) or (6.24) is used. Quickly,for instance the part with n 6 q is

√m

q(log q)3

∑n<q

τ(n)∑r>R

τ(r)(m,n, r)1/2

r3/2

and the common divisor is handled by writing

(m,n, r)1/2 6 (m,n, r) =∑

d|(m,n)d|r

ϕ(d)

so ∑r>R

τ(r)(m,n, r)1/2

r3/26

∑d|(m,n)

τ(d)ϕ(d)d3/2

∑rd>R

τ(r)r3/2

1√R

(log q)∑

d|(m,n)

τ(d)ϕ(d)d

1√R

(log q)τ(n)2.


The part with $n > q$ is handled similarly, and both together produce the error term announced in (6.34). □
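The proof above handles the common divisor through the classical identity $(m,n,r) = \sum_{d\mid(m,n),\,d\mid r}\varphi(d)$, which is Gauss's formula $g = \sum_{d\mid g}\varphi(d)$ applied to $g = (m,n,r)$. A brute-force Python sketch of this identity follows; the helper names are hypothetical and the totient is computed naively, which is adequate only for small arguments.

```python
from math import gcd

def phi(n):
    """Euler's totient, computed naively by counting units modulo n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def gcd_via_totients(m, n, r):
    """Sum of phi(d) over the common divisors d of m, n and r."""
    g = gcd(gcd(m, n), r)
    return sum(phi(d) for d in range(1, g + 1)
               if m % d == 0 and n % d == 0 and r % d == 0)
```

The sum returned by `gcd_via_totients(m, n, r)` coincides with `gcd(gcd(m, n), r)`.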

We denote now X ′(m) the remaining part of the sum in X(m):

X ′(m) =2π√q

∑r6R

(r,q)=1

1rXr.

Extraction of the main contribution

Let now r < R. In Xr (see (6.33)), we now open the Kloosterman sum

S(mq, n; r) =∑∗

d mod r

e(mqd+ nd

r

)and take the summation over d outside, so

Xr = −∑∗

d mod r

e(mqd

r

)∑n>1

τ(n)e(ndr

)t(n)

where the weight function t : R+ −→ R is

t(x) = J1

(4πr

√mx

q

)W(4π2x

q

)ξ(x)√x. (6.35)

For each d, the summation formula for the divisor function twisted by additivecharacters (an extension of Voronoı’s formula, see [Jut, th. 1.7] for instance) can beapplied.

Proposition 16. Let t : R+ → C be a C∞ function which vanishes in the neighborhoodof 0 and is rapidly decreasing at infinity. Then for c > 1 and d coprime with c, we have∑

m>1

τ(m)e(dmc

)t(m) =

2c

∫ +∞

0(log√x

c+ γ)t(x)dx

− 2πc

∑h>1

τ(h)e(−dhc

)∫ +∞

0Y0

(4π√hx

c

)t(x)dx

+4c

∑h>1

τ(h)e(dhc

)∫ +∞

0K0

(4π√hx

c

)t(x)dx.

Hence, after exchanging again the order of summation, this yields

Xr = −2rS(m, 0; r)

∫ ∞0

(log√x

r+ γ)t(x)dx (6.36)

+2πr

∑h>1

τ(h)S(hq −m, 0; r)∫ +∞

0Y0

(4π√hx

r

)t(x)dx

− 4r

∑h>1

τ(h)S(hq +m, 0; r)∫ +∞

0K0

(4π√hx

r

)t(x)dx.


Let L(r) be the first integral without the test function ξ(x):

L(r) =∫ ∞

0(log√x

r+ γ)J1

(4πr

√mx

q

)W(4π2x

q

) dx√x

and for any integer h > 1 let

y(h) =∫ +∞

0Y0

(4π√hx

r

)t(x)dx (6.37)

k(h) =∫ +∞

0K0

(4π√hx

r

)t(x)dx (6.38)

With these definitions, and by removing the weight ξ in the first term of the sum-mation formula, we get

X ′(m) =4π√q

∑r6R

(r,q)=1

1r2S(m, 0; r)L(r) (6.39)

+4π2

√q

∑r6R

(r,q)=1

1r2

∑h>1

τ(h)S(hq −m, 0; r)y(h)

− 8π√q

∑r6R

(r,q)=1

1r2

∑h>1

τ(h)S(hq +m, 0; r)k(h) +O((log q)5

√q

)

because the difference arising from this removal is at most

1√q

∑r<R

1r2|S(m, 0; r)|

∫ 1

0| log

√x

r+ γ||t(x)|dx (log q)5

√q

since t(x) (log q)3x−1/2 for 0 6 x 6 1 by (6.23), and crudely |S(m, 0; r)| 6 r.We reserve for later consideration the last two sums (see Section 6.1.3), and evaluate

exactly the first one, which we call X ′′(m). We have

X ′′(m) = − 4π√q

∑r6R

(r,q)=1

1r2S(m, 0; r)

∫ ∞0

(log√x

r+ γ)J1

(4πr

√mx

q

)W(4π2x

q

) dx√x

= −2∑r6R

(r,q)=1

1rS(m, 0; r)

∫ ∞0

(log√qx

2π+ γ)J1(2

√mx)W (r2x)

dx√x

by the change of variable x 7→ r2

4π2qy. By the definition of W , see (6.20), this is equal

to the complex integral

X ′′(m) =1

2iπ

∫(1/2)

(−2)ZRm(1 + 2s)ζq(1 + 2s)s−1Γ(s)2G(s)L(s)ds, (6.40)


withZRm(s) =

∑r6R

(r,q)=1

S(m, 0; r)r−s,

L(s) =∫ +∞

0(log√qx

2π+ γ)J1(2

√mx)x−s−1/2dx.

Both ZRm and L can be computed, the former by extending again the sum over r toinfinity.

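Lemma 29. We have for $\mathrm{Re}(s) = \sigma > 1$

$$Z_m^R(s) = \zeta_q(s)^{-1}\sum_{d\mid m}d^{1-s} + O_\sigma\bigl(\tau(m)R^{1-\sigma}\bigr).$$

Proof. By the formula giving the Ramanujan sum (the $\sum^*$ refers of course to integers coprime with $q$)

$$\begin{aligned}
Z_m^R(s) &= \sideset{}{^*}\sum_{r\le R}r^{-s}\sum_{d\mid(m,r)}d\,\mu\Bigl(\frac{r}{d}\Bigr) = \sum_{d\mid m}d\ \sideset{}{^*}\sum_{fd\le R}\mu(f)(fd)^{-s}\\
&= \sum_{d\mid m}d^{1-s}\Bigl(\zeta_q(s)^{-1} + O\Bigl(\Bigl(\frac{R}{d}\Bigr)^{1-\sigma}\Bigr)\Bigr) = \zeta_q(s)^{-1}\sum_{d\mid m}d^{1-s} + O\bigl(\tau(m)R^{1-\sigma}\bigr).
\end{aligned}$$

□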
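The proof of Lemma 29 uses the evaluation of the Ramanujan sum, $S(m,0;r) = \sum_{d\mid(m,r)}d\,\mu(r/d)$. The Python sketch below compares this formula with the defining exponential sum over residues coprime to $r$. The function names are illustrative assumptions, and the direct sum is rounded to the nearest integer because it is computed in floating point.

```python
import cmath
from math import gcd

def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divided the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def ramanujan_direct(m, r):
    """S(m, 0; r): sum of e(m d / r) over d mod r with (d, r) = 1."""
    total = sum(cmath.exp(2j * cmath.pi * m * d / r)
                for d in range(1, r + 1) if gcd(d, r) == 1)
    return round(total.real)

def ramanujan_formula(m, r):
    """Sum over d | (m, r) of d * mu(r / d)."""
    g = gcd(m, r)
    return sum(d * mobius(r // d) for d in range(1, g + 1) if g % d == 0)
```

In particular $|S(m,0;r)| \le (m,r)\,\tau((m,r))$, and the cruder bound $|S(m,0;r)| \le r$ quoted in the text is visible directly from the definition.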

Lemma 30. Recall that log Q = log q4π2 . For all s with 1

4 < Re (s) < 1, we have

L(s) = −12ms−1/2Γ(−s)Γ(s)−1

(log

Q

m+ 2γ + ψ(1 + s) + ψ(1− s)

)where ψ = Γ′/Γ.

Proof. The following formula is valid for −2 < Re (s) < −12 (see [G-R, 6.561.14]):

`(s) :=∫ +∞

0J1(x)xsdx = 2sΓ

(1 +

s

2

)Γ(

1− s

2

)−1(6.41)

and putting y = 2√mx in L(s) gives

L(s) = 4sms−1/2((1

2log

Q

m+ γ)`(−2s) + `′(−2s)

).

From (6.41) we deduce

`′(s) = 2sΓ(

1 +s

2

)Γ(

1− s

2

)−1(log 2 +

12ψ(

1 +s

2

)+

12ψ(

1− s

2

))and the result follows. 2


From Lemma 29 we obtain

X ′′(m) =1

2iπ

∫(1/2)

(−2)σ−2s(m)s−1Γ(s)2G(s)L(s)ds+O(τ(m)

R(log q)

)

=1

2iπ

∫(1/2)

F (s)ds+O(τ(m)

R(log q)

), say (6.42)

since 1 + 2s is on the line Re (s) = 2, and

ζq(1 + 2s)Γ(s)2G(s)L(s) |Γ(s)Γ(−s)s−1|(log q + |ψ(1 + s)|+ |ψ(1− s)|)

on Re (s) = 12 , and this decreases exponentially on the line.

The lemmas show that the integrand F (s) in (6.42) is

F (s) = m−1/2s−1G(s)ηs(m)Γ(s)Γ(−s)(

logQ

m+ 2γ + ψ(1 + s) + ψ(1− s)

)where ηs is the arithmetic function defined by

ηs(m) =∑ab=m

(ab

)s.

Thus, F (s) is seen to be an odd function of s, which is moreover holomorphic inthe strip |Re (s)| < 1, except for a triple pole at s = 0, and decreases exponentially invertical strips. Shifting the contour to Re (s) = −1

2 and changing then s into −s allowsus to conclude that

X ′′(m) =12

Ress=0F (s) +O(τ(m)

R(log q)

). (6.43)
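The residue in (6.43) comes from a triple pole, so it is governed by the first two Laurent coefficients of $s^{-1}\Gamma(s)\Gamma(-s)$ at $s = 0$. Using the standard facts $\Gamma(s)\Gamma(-s) = -\pi/(s\sin\pi s)$ and $\Gamma''(1) = \gamma^2 + \pi^2/6$, the coefficient of $s^{-1}$ is $\gamma^2 - \Gamma''(1) = -\pi^2/6$. The Python sketch below checks this numerically; the function name and the sample point $s = 0.01$ are assumptions chosen for illustration.

```python
import math

def laurent_remainder(s):
    """Return s * (Gamma(s) * Gamma(-s) / s + 1 / s**3).

    If s^{-1} Gamma(s) Gamma(-s) = -1/s^3 + c/s + O(s), this tends to c
    as s -> 0, with an error of order s^2.
    """
    value = math.gamma(s) * math.gamma(-s) / s
    return s * (value + 1.0 / s ** 3)

approx_c = laurent_remainder(0.01)   # close to -pi^2 / 6
exact_c = -math.pi ** 2 / 6
```

The agreement is limited by the $O(s^2)$ correction rather than by floating-point cancellation at this value of $s$.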

Around $s = 0$, the following expansions hold:

$$s^{-1}\Gamma(s)\Gamma(-s) = -\frac{1}{s^3} + \frac{\gamma^2-\Gamma''(1)}{s} + O(s)$$

$$2\gamma + \psi(1+s) + \psi(1-s) = \psi''(1)s^2 + O(s^4)$$

$$G(s) = 1 + \frac12G''(0)s^2 + O(s^3)$$

$$\eta_s(m) = \tau(m) + \frac12T(m)s^2 + O(s^3)$$

where $T$ is the arithmetic function defined by

$$T(m) = \sum_{ab=m}\Bigl(\log\frac{a}{b}\Bigr)^2.$$

Combining those, we obtain

$$\frac12\mathop{\mathrm{Res}}_{s=0}F(s) = -\frac14\,\frac{T(m)}{\sqrt{m}}\Bigl(\log\frac{Q}{m}\Bigr) + \alpha\,\frac{\tau(m)}{\sqrt{m}}\Bigl(\log\frac{Q}{m}\Bigr), \tag{6.44}$$

where we have set

$$\alpha = \frac12\Bigl(\gamma^2 - \Gamma''(1) - \frac{G''(0)}{2} - \psi''(1)\Bigr)$$

(a constant).

Coming back to the two other integrals contributing to $X_r$, hence to $X'(m)$, the following lemmas will be proved in Section 6.1.3, showing that they are of smaller order of magnitude than the term just evaluated. We denote those contributions by $Y(m)$ and $K(m)$:

Y (m) =4π2

√q

∑∗

r6R

1r2

∑h>1

τ(h)S(hq −m, 0; r)y(h) (6.45)

and

K(m) = − 8π√q

∑∗

r6R

1r2

∑h>1

τ(h)S(hq +m, 0; r)k(h). (6.46)

Lemma 31. For all m < q we have

K(m)ε

√m

qqε

for all ε > 0.

Lemma 32. Assume R 6 q2. For all m < q we have

Y (m)ε

(√mq

+1√q

)qε

for all ε > 0.

We choose R = q2, and put together all the information gathered, thereby provingan approximate formula for X(m): starting from (6.33), we have in turn

X(m) = X ′(m) +O(√m

R(log q)11

)by (6.34)

= X ′′(m) +K(m) + Y (m) +O((log q)5

√q

)+O

(√m

R(log q)11

)by (6.39)

= −14T (m)√m

(log

Q

m

)+ α

τ(m)√m

(log

Q

m

)+O

(τ(m)R

(log q))

by (6.43), (6.44)

+Oε

(√mqqε)

+Oε

((√mq

+1√q

)qε)

by Lemmas 32 and 31

+O((log q)5

√q

)+O

(√m

R(log q)11

).

Cleaning up, we state this formally.

Proposition 17. Let 0 6 ∆ < 1 and 1 6 m 6 q∆. For any ε > 0

X(m) = −14T (m)√m

(log

Q

m

)+ α

τ(m)√m

(log

Q

m

)+O∆,ε

((√mq

+1√q

)qε).

Together with (6.31), this gives an approximate formula for Ah[ε−f λf (m)L′(f, 12)].


Proposition 18. Set P1 = P + αX. Then for 0 6 ∆ < 12 , and 1 6 m 6 q∆, we have

for any ε > 0,

Ah[ε−f λf (m)L′(f, 12)] =

112τ(m)√m

(log

Q

m

)3− 1

4T (m)√m

(log

Q

m

)+τ(m)√mP1

(log

Q

m

)+O∆,ε

(√m

qqε).

For later use, we record a few properties of the function T .

Lemma 33. Let $\tau^{(i)}$ be defined for $i \ge 0$ by

$$\tau^{(i)}(m) = \sum_{d\mid m}(\log d)^{i}.$$

Then we have

$$T(m) = 4\tau^{(2)}(m) - 2(\log m)\tau^{(1)}(m). \tag{6.47}$$

Moreover, $T$ satisfies

$$T(m_1m_2) = \tau(m_1)T(m_2) + \tau(m_2)T(m_1) \tag{6.48}$$

for $(m_1,m_2) = 1$ and more generally

$$T(m_1m_2) = \sum_{d\mid(m_1,m_2)}\mu(d)\Bigl(\tau\Bigl(\frac{m_1}{d}\Bigr)T\Bigl(\frac{m_2}{d}\Bigr) + \tau\Bigl(\frac{m_2}{d}\Bigr)T\Bigl(\frac{m_1}{d}\Bigr)\Bigr) \tag{6.49}$$

for all integers $m_1$, $m_2 \ge 1$.

Proof. The first formula is immediate, and the second follows from the third, which is obtained by differentiating the corresponding identity (see (5.37)) for $\eta_s$, remarking that

$$\sum_{ab=m}\Bigl(\log\frac{a}{b}\Bigr) = 0$$

for any integer $m$. □
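The identities of Lemma 33 lend themselves to a direct numerical check. The Python sketch below computes $T(m) = \sum_{ab=m}(\log(a/b))^2$ from its definition and compares it with the right-hand sides of (6.47) and (6.49), the latter containing (6.48) as the coprime case. All helper names are illustrative assumptions, and comparisons are made in floating point with a small tolerance.

```python
import math

def divisors(m):
    return [d for d in range(1, m + 1) if m % d == 0]

def tau(m):
    return len(divisors(m))

def tau_log(i, m):
    """tau^{(i)}(m): sum over divisors d of m of (log d)^i."""
    return sum(math.log(d) ** i for d in divisors(m))

def T(m):
    """T(m): sum over factorizations ab = m of (log(a/b))^2."""
    return sum(math.log(a / (m // a)) ** 2 for a in divisors(m))

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def T_rhs_647(m):
    """Right-hand side of (6.47)."""
    return 4 * tau_log(2, m) - 2 * math.log(m) * tau_log(1, m)

def T_rhs_649(m1, m2):
    """Right-hand side of (6.49)."""
    return sum(mobius(d) * (tau(m1 // d) * T(m2 // d) + tau(m2 // d) * T(m1 // d))
               for d in divisors(math.gcd(m1, m2)))
```

For example, with $m_1 = p^2$ and $m_2 = p$ both sides of (6.49) equal $20(\log p)^2$.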

Estimation of the integrals

We now proceed to prove Lemmas 31 and 32, vindicating our contention that K(m)and Y (m) are of smaller order of magnitude (in our situation) than the main termisolated in the previous section.

We need some facts about the Bessel functions Y0 (and Yn, n > 0 an integer), Jn,and K0, which we now quote:

K0(y) y−1/2e−y, for all y > 0 (6.50)Y0(y) log y for all y > 0 (6.51)Yν(y)ν 1 for all ν > 1, y > 0. (6.52)


Proof of Lemma 31. Because K0 has exponential decay at infinity and ξ cuts off thesmall values of x, this is easy.

We start with k(h), which is given by (6.38), so using (6.50) and J1(x) 1, weestimate

k(h) =∫ +∞

0K0

(4π√hx

r

)J1

(4πr

√mx

q

)W(4π2x

q

)ξ(x)

dx√x

=r√h

∫ +∞

0K0(y)J1

(√m

hqy)W(r2y2

4qh

)ξ( r2y2

16π2h

)dy

r

h

√m

q(log q)3

∫ +∞

√hr−1

y1/2e−ydy

r

h

√m

q(log q)3 exp

(−√h

2r

).

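Furthermore this implies

\[
\begin{aligned}
K(m) &= -\frac{8\pi}{\sqrt{q}}\sum^{*}_{r\le R}\frac{1}{r^2}\sum_{h\ge1}\tau(h)S(hq+m,0;r)k(h)\\
&\ll \frac{\sqrt{m}}{q}(\log q)^3\sum_{h\ge1}\frac{\tau(h)}{h}\exp\Bigl(-\frac{\sqrt{h}}{2R}\Bigr)\sum^{*}_{r\le R}\frac{(r,hq+m)}{r}\\
&\ll \frac{\sqrt{m}}{q}(\log q)^4\sum_{h\ge1}\frac{\tau(h)\tau(hq+m)}{h}\exp\Bigl(-\frac{\sqrt{h}}{2R}\Bigr) \ll q^{\varepsilon}\frac{\sqrt{m}}{q}.
\end{aligned}
\]

□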

The estimate involving $Y_0$ is slightly more complicated because $Y_0$ is not decreasing very fast at infinity, but instead is oscillating: indeed, it satisfies the asymptotic ([G-R, 8.451.2])

\[ Y_0(x) \sim \sqrt{\frac{2}{\pi x}}\,\sin\Bigl(x-\frac{\pi}{4}\Bigr) \]

as $x \to +\infty$. Hence, if $Y(m)$ is small, this is because of cancellation in the oscillatory integral $y(h)$. This is similar to the Riemann-Lebesgue Theorem, according to which the Fourier coefficients of a $C^\infty$ periodic function, which are integrals against the oscillatory exponential functions $n \mapsto e(nx)$, tend to zero faster than the inverse of any polynomial, and the mainspring of the proof is successive integration by parts, exploiting the recurrence relations satisfied by Bessel functions (instead of using directly the asymptotic expansions).

We first prove a general lemma, which is quite standard.

Lemma 34. Let $\nu \ge 0$ be a real number, $J > 0$ an integer, $f$ a $C^\infty$ test function compactly supported on the interval $[Y, 2Y]$ and let $\beta > 0$, $\vartheta > 0$ be real numbers such that the bounds

\[ y^jf^{(j)}(y) \ll_j \vartheta(1+\beta Y)^j \tag{6.53} \]

hold for $0 \le j \le J$, the implied constants depending on $j$ alone. Then for any $\alpha > 0$ such that $\alpha Y > 1$, we have

\[ \int_0^{+\infty} Y_\nu(\alpha y)f(y)dy \ll_J \vartheta\Bigl(\frac{1+\beta Y}{1+\alpha Y}\Bigr)^J Y \tag{6.54} \]

where the implied constant depends only on $J$ and on the implied constants in (6.53).

Proof. One could write the asymptotic development of $Y_0$ to show the oscillating behavior and integrate by parts, but it is cleaner (and amounts to the same thing) to make use of the recurrence formula

\[ (y^\nu Y_\nu(y))' = y^\nu Y_{\nu-1}(y) \]

to get (also by integration by parts)

\[ \int_0^{+\infty} Y_\nu(\alpha y)f(y)dy = \int_0^{\infty} Y_{\nu+1}(\alpha y)\Bigl(f'(y)+\frac{f(y)}{y}\Bigr)dy. \]

Let $g(y) = f'(y) + f(y)/y$; by Leibniz's rule and the assumption, $g$ satisfies

\[ y^{j+1}g^{(j)}(y) = y^{j+1}f^{(j+1)}(y) + \sum_{k=0}^{j}\binom{j}{k}(-1)^{j-k}(j-k)!\,y^kf^{(k)}(y) \ll_j \vartheta(1+\beta Y)^{j+1} \]

for $0 \le j \le J-1$. Hence, iterating this procedure, we obtain

\[ \int_0^{+\infty} Y_\nu(\alpha y)f(y)dy = \frac{1}{\alpha^J}\int_0^{+\infty} Y_{\nu+J}(\alpha y)h(y)dy, \]

where the function $h$ is such that

\[ y^Jh(y) \ll_J \vartheta(1+\beta Y)^J \]

and therefore the result follows by using $Y_{\nu+J}(y) \ll_{J+\nu} 1$. □

Proof of Lemma 32. We have

\[ Y(m) = \frac{4\pi^2}{\sqrt{q}}\sum^{*}_{r\le R}\frac{1}{r^2}\sum_h\tau(h)S(hq-m,0;r)y(h). \]

We make a smooth dyadic partition of unity, so

\[ \xi = \sum_{k\ge1}\xi_k \]

where each $\xi_k$ is a $C^\infty$ function with compact support in a dyadic interval $[X_k, 2X_k]$ that satisfies

\[ x^j\xi_k^{(j)}(x) \ll_j 1, \quad\text{for all } j \ge 0, \tag{6.55} \]

the implied constants depending on $j$ alone; in particular, they are uniform in $k$. We study each $\xi_k$ individually, but we keep writing $\xi$ instead of $\xi_k$, and accordingly we use $X$ rather than $X_k$.

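By the change of variable $2r^{-1}\sqrt{x} = y$, the integral is

\[ y(h) = r\int_0^{+\infty} Y_0(2\pi\sqrt{h}\,x)J_1\Bigl(2\pi\sqrt{\frac{m}{q}}\,x\Bigr)W\Bigl(\frac{\pi^2r^2x^2}{q}\Bigr)\xi\Bigl(\frac{r^2x^2}{4}\Bigr)dx, \tag{6.56} \]

so we define the test function $f$ by

\[ f(x) = J_1\Bigl(2\pi\sqrt{\frac{m}{q}}\,x\Bigr)W\Bigl(\frac{\pi^2r^2x^2}{q}\Bigr)\xi\Bigl(\frac{r^2x^2}{4}\Bigr). \]

This is a $C^\infty$ function compactly supported in the dyadic interval $[\rho, 2\rho]$, with

\[ \rho = \frac{2\sqrt{X}}{r}. \tag{6.57} \]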

We first treat the case $\tfrac12 \le X \le q^2$ (which involves $\log q$ terms).

Claim: $f$ satisfies the hypothesis of Lemma 34 with

\[ Y = \rho, \quad \alpha = 2\pi\sqrt{h}, \quad \beta = 2\pi\sqrt{\frac{m}{q}}, \quad \vartheta = (\log q)^3 \]

and any (fixed) positive integer $J \ge 1$.

This follows from the bound

\[ x^jW^{(j)}(x) \ll_j (\log q)^3, \quad\text{for all } j \ge 0, \]

of (6.25), valid for $1/q \ll x \ll q^2$, the analogous bound for $\xi$ in (6.55), the recurrence relation

\[ (x^\nu J_\nu(x))' = x^\nu J_{\nu-1}(x) \]

and some elementary, although somewhat lengthy, induction arguments and manipulations with inequalities. The intuitive reason is that the functions $W$ and $\xi$ are “flat”, while $J_1(\alpha x)$ oscillates somewhat like $e(\alpha x)$. For a completely detailed proof, proceed step by step with

\[ f_1(x) = \xi\Bigl(\frac{r^2x^2}{4}\Bigr), \quad f_2(x) = W\Bigl(\frac{\pi^2r^2x^2}{q}\Bigr), \]
\[ f_3(x) = f_1(x)f_2(x), \quad f_4(x) = J_1(\alpha x), \quad f_5(x) = f_3(x)f_4(x) = f(x). \]

Thus, we are in a position to apply the preceding lemma to f , provided that

2πρ√h > 1. (6.58)

This restriction will complicate things, unfortunately, and it will be necessary to split into different cases. If (6.58) holds, we obtain

\[ y(h) \ll_J r\rho\,\frac{\bigl(1+\rho\sqrt{m/q}\bigr)^J}{\bigl(1+\rho\sqrt{h}\bigr)^J}(\log q)^3. \tag{6.59} \]

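Consider first the case $\rho \ge 1$, or $r < \sqrt{X}$, in which case, since $h \ge 1$, the condition is satisfied: applying (6.59) with $J \ge 3$ (to win convergence in $h$) yields a contribution in (6.37) which is therefore

\[
\begin{aligned}
&\ll_J \frac{(\log q)^3}{\sqrt{q}}\sum^{*}_{r<\sqrt{X}}\frac{1}{r^2}\,r\rho^{-(J-1)}\Bigl(1+\rho\sqrt{\frac{m}{q}}\Bigr)^J\tau(r)\\
&\ll_J \frac{(\log q)^3}{\sqrt{q}}\Bigl(\sum^{*}_{r<\sqrt{mX/q}}\frac{\rho\,\tau(r)}{r}\Bigl(\sqrt{\frac{m}{q}}\Bigr)^J + \sum^{*}_{\sqrt{mX/q}\le r<\sqrt{X}}\frac{\tau(r)}{r}\Bigr)\\
&\ll_J \frac{(\log q)^{5+J}}{\sqrt{q}}\,q^{1+J(\Delta-1)/2}, \quad\text{since } m/q \le q^{\Delta-1},
\end{aligned}
\]

at which point, since $\Delta < 1$, we can choose $J$ large enough so that $1 + J(\Delta-1)/2 \le 0$ to conclude that this part is

\[ \ll_{\Delta,\varepsilon} \frac{q^{\varepsilon}}{\sqrt{q}}. \tag{6.60} \]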

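On the other hand, for $\rho \le 1$, we split the summation in $h$ in the following way

\[ \sum_{h\ge1} = \sum_{h\le\rho^{-2(1+\kappa)}} + \sum_{h>\rho^{-2(1+\kappa)}} \]

where $\kappa > 0$ will be chosen (sufficiently small) a little later.

For the first sum, where the condition (6.58) is not valid, we come back to (6.37), using again $J_1(x) \ll x$, $Y_0(x) \ll (\log x)$ and $S(hq-m,0;r) \le (hq-m,r)$ to derive

\[
\begin{aligned}
\frac{4\pi^2}{\sqrt{q}}&\sum^{*}_{\sqrt{X}\le r\le R}\frac{1}{r^2}\sum_{h\le\rho^{-2(1+\kappa)}}\tau(h)S(hq-m,0;r)y(h)\\
&\ll \frac{(\log q)^3}{\sqrt{q}}\sum^{*}_{\sqrt{X}\le r\le R}\frac{1}{r^2}\sum_{h\le\rho^{-2(1+\kappa)}}\tau(h)(hq-m,r)\frac{X}{r}\sqrt{\frac{m}{q}}\\
&\ll \frac{\sqrt{m}}{q}(\log q)^3X\sum_{h\le(R^2/X)^{1+\kappa}}\tau(h)\sum^{*}_{\sqrt{X}h^{\theta}\le r\le R}\frac{(hq-m,r)}{r^3}, \quad\text{where } \theta = (2+2\kappa)^{-1}\\
&\ll \frac{\sqrt{m}}{q}(\log q)^3X\sum_{h\le(R^2/X)^{1+\kappa}}\tau(h)\sum_{d\mid hq-m}\frac{\varphi(d)}{d^3}\sum^{*}_{\sqrt{X}h^{\theta}\le dr\le R}\frac{1}{r^3}\\
&\ll \frac{\sqrt{m}}{q}(\log q)^3X\sum_{h\le(R^2/X)^{1+\kappa}}\tau(h)\tau(hq-m)X^{-1}h^{-1+\kappa/(1+\kappa)}\\
&\ll_\varepsilon \frac{\sqrt{m}}{q}q^{\varepsilon}R^{2\kappa/(1+\kappa)} \ll_\varepsilon \frac{\sqrt{m}}{q}q^{\varepsilon}R^{2\kappa}, \quad\text{for all } \varepsilon > 0.
\end{aligned} \tag{6.61}
\]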

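For the second sum, we have $\sqrt{h} > \rho^{-2}$ hence $\rho\sqrt{h} > \rho^{-1} \ge 1$, so applying (6.59) again for $J \ge 3$ entails (recall $\rho \le 1$)

\[ y(h) \ll_J \sqrt{X}\rho^{-J}h^{-J/2}(\log q)^3 \]

and

\[
\begin{aligned}
\frac{4\pi^2}{\sqrt{q}}&\sum^{*}_{\sqrt{X}\le r\le R}\frac{1}{r^2}\sum_{h>\rho^{-2(1+\kappa)}}\tau(h)S(hq-m,0;r)y(h)\\
&\ll_J \frac{(\log q)^3}{\sqrt{q}}\sum^{*}_{\sqrt{X}\le r\le R}\frac{1}{r^2}\sqrt{X}\rho^{-J}\rho^{-2(1+\kappa)(1-J/2)}\tau(r)\\
&\ll \frac{(\log q)^3}{\sqrt{q}}\sum^{*}_{\sqrt{X}\le r\le R}\frac{\sqrt{X}}{r^2}\rho^{-2+\kappa(J-2)}\tau(r).
\end{aligned} \tag{6.62}
\]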

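We choose $\kappa = \varepsilon/4$, then $J$ large enough so that $-2 + \kappa(J-2) \ge 0$ (in addition to the previous conditions that $J \ge 3$, $1 + J(\Delta-1)/2 \le 0$); then (6.61) and (6.62) together are

\[ \ll_{\Delta,\varepsilon} q^{\varepsilon}\Bigl(\frac{m^{1/2}}{q}+\frac{1}{\sqrt{q}}\Bigr) \]

(using $\rho^{-2+\kappa(J-2)} \le 1$; recall also that $R \le q^2$ was assumed in the statement of Lemma 32).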

Finally, we return to the case $X > q^2$ which remains. We appeal to (6.24) (for $j = 2$), and again use elementary estimations to prove that for $X > q$ the function $f$ satisfies the better bound

\[ x^jf^{(j)}(x) \ll \Bigl(1+x\sqrt{\frac{m}{q}}\Bigr)^jq^2(rx)^{-4} \]

namely, Lemma 34 can be applied now with $\vartheta = q^2(r\rho)^{-4} = q^2X^{-2}/16$. Hence, when applicable, we get

\[ y(h) \ll r\rho\,\frac{\bigl(1+\rho\sqrt{m/q}\bigr)^J}{\bigl(1+\rho\sqrt{h}\bigr)^J}\,\frac{q^2}{X^2}(\log q)^3 \]

in addition to the bound (6.59).

Since $X > q^2$, the quantity saved is

\[ \frac{q^2}{X^2} \ll X^{-1} \]

which is more than sufficient to allow for the sum over the dyadic values of $X$ involved to converge, and proves that all the previous bounds where (6.59) was used remain valid. The only place where this is not the case is the inequality (6.61), but this part of the sum is void for $\sqrt{X} > R$ and the former estimate works in the larger interval $X \le R^2$. □

A formula for the second moment

We can at last reap the fruits of those efforts.

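Proposition 19. Assume $M = q^{\Delta}$ with $\Delta < \tfrac14$. Then there exists $\delta > 0$ such that

\[ M_2 = \frac{1}{12}M_{21} - \frac{1}{4}M_{22} + M_3 + O(q^{-\delta}) \tag{6.63} \]

where $M_{21}$, $M_{22}$ and $M_3$ are quadratic forms in the variables $x_m$ given by

\[ M_{21} = \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{\tau(m_1m_2)}{m_1m_2}x_{bm_1}x_{bm_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)^3 \tag{6.64} \]
\[ M_{22} = \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{T(m_1m_2)}{m_1m_2}x_{bm_1}x_{bm_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr) \tag{6.65} \]
\[ M_3 = \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{\tau(m_1m_2)}{m_1m_2}x_{bm_1}x_{bm_2}P_1\Bigl(\log\frac{Q}{m_1m_2}\Bigr). \tag{6.66} \]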

Proof. We appeal to (6.27) and then apply Proposition 18. The first three terms give exactly the three quadratic forms $M_{21}$, $M_{22}$ and $M_3$. Moreover, using (6.6), the error term is, for any $\varepsilon > 0$,

\[ \ll q^{-1/2+\varepsilon}\Bigl|\sum_{m\le M}x_m\Bigr|^2 \ll M^2q^{-1/2+2\varepsilon}. \]

If $\Delta < \tfrac14$, we can take $\varepsilon$ small enough so that this is $O(q^{-\delta})$ for some $\delta > 0$. □

Now, with the expressions of $M_1$ (Proposition 15) and $M_2$ (Proposition 19) as rather simple linear and quadratic forms in the coefficients $x_m$ of the mollifier, it remains to optimize the bound (6.3), namely to maximize the quadratic form $M_2$ under the linear constraint given by $M_1$.

Such a problem is solved in principle, for a diagonalized quadratic form, by the following well-known lemma.

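Lemma 35. Let

\[ L = \sum_{1\le k\le n}j(k)X_k, \qquad Q = \sum_{1\le k\le n}\nu(k)X_k^2 \]

be a linear form and a positive definite quadratic form on $\mathbf{R}^n$ (so $\nu(k) > 0$). Then

\[ \sup_{\substack{x\in\mathbf{R}^n\\ x\ne0}}\frac{L(x)^2}{Q(x)} = J \]

with

\[ J = \sum_{1\le k\le n}\frac{j(k)^2}{\nu(k)}. \]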

Proof. By Cauchy's inequality

\[ L(x)^2 = \Bigl(\sum_{1\le k\le n}j(k)x_k\Bigr)^2 = \Bigl(\sum_{1\le k\le n}\frac{j(k)}{\sqrt{\nu(k)}}\sqrt{\nu(k)}\,x_k\Bigr)^2 \le J\sum_{1\le k\le n}\nu(k)x_k^2 = JQ(x) \]

for all $x = (x_k) \in \mathbf{R}^n$, so the supremum is at most $J$, but the case of equality in Cauchy's inequality shows that the bound is achieved for

\[ x_k = \frac{j(k)}{\nu(k)}, \]

hence the result. □
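As an illustration of Lemma 35, here is a small numerical sketch (the function names `ratio` and `optimum` and the sample coefficients are hypothetical, chosen only for the example). It evaluates $L(x)^2/Q(x)$, checks that the vector $x_k = j(k)/\nu(k)$ attains the value $J$, and that random vectors never exceed it.

```python
import random


def ratio(j, nu, x):
    """L(x)^2 / Q(x) for L = sum j(k) x_k and Q = sum nu(k) x_k^2."""
    linear = sum(jk * xk for jk, xk in zip(j, x))
    quadratic = sum(nk * xk * xk for nk, xk in zip(nu, x))
    return linear * linear / quadratic


def optimum(j, nu):
    """Return (J, x*) with J = sum j(k)^2 / nu(k) and x*_k = j(k) / nu(k)."""
    big_j = sum(jk * jk / nk for jk, nk in zip(j, nu))
    x_star = [jk / nk for jk, nk in zip(j, nu)]
    return big_j, x_star


if __name__ == "__main__":
    j = [1.0, -2.0, 3.0]   # sample linear coefficients (arbitrary)
    nu = [1.0, 2.0, 4.0]   # sample positive weights (arbitrary)
    big_j, x_star = optimum(j, nu)
    print(big_j, ratio(j, nu, x_star))
    rng = random.Random(0)
    trials = [[rng.uniform(-1, 1) for _ in j] for _ in range(1000)]
    print(max(ratio(j, nu, x) for x in trials) <= big_j + 1e-12)
```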

Unfortunately, the quadratic form $M_2$ as given by the Proposition is not diagonalized. The strategy of the proof of Theorem 16 is now to write $M_{21}$ as a linear combination of easily diagonalized quadratic forms; the simplest in shape, say $\Pi$, is chosen and we are able to select $(x_m)$ to optimize the value of $\Pi$ with respect to $M_1$. Then the remaining terms in $M_{21}$ are evaluated, and so is $M_{22}$. Both are of the same order of magnitude, so our choice may not be perfectly optimal. On the other hand, with our specific choice of $x_m$, we finally prove that $M_3$ gives a smaller contribution, namely that

\[ M_3 = O\Bigl(M_{21}\frac{(\log\log q)^4}{\log q}\Bigr). \tag{6.67} \]

6.1.4 The preferred quadratic form, I

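Separating $m_1$ and $m_2$ in (6.64) by means of the formula (compare (5.37))

\[ \tau(m_1m_2) = \sum_{a\mid(m_1,m_2)}\mu(a)\tau\Bigl(\frac{m_1}{a}\Bigr)\tau\Bigl(\frac{m_2}{a}\Bigr) \tag{6.68} \]

we get

\[ M_{21} = \sum_b\frac{1}{b}\sum_a\frac{\mu(a)}{a^2}\sum_{m_1,m_2}\frac{\tau(m_1)\tau(m_2)}{m_1m_2}x_{abm_1}x_{abm_2}\Bigl(\log\frac{Q}{a^2m_1m_2}\Bigr)^3. \tag{6.69} \]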

We define the following arithmetic functions

\[ \nu_t(k) = \frac{1}{k}\sum_{ab=k}\frac{\mu(a)(\log a)^t}{a}, \quad\text{for } t = 1, 2, 3. \tag{6.70} \]

Then expanding the logarithm in (6.69) and rearranging, we see that $M_{21}$ is a linear combination of the quadratic forms $\Pi(t,u,v,w)$ in the $x_m$'s defined by

\[ \Pi(t,u,v,w) = (\log Q)^u\sum_k\nu_t(k)y_k^{(v)}y_k^{(w)} \tag{6.71} \]

where the new variables $y_k^{(i)}$, for $i \ge 1$, are defined by

\[ y_k^{(i)} = \sum_m\frac{\tau(m)}{m}(\log m)^ix_{km} \tag{6.72} \]

and $t$, $u$, $v$ and $w$ are non-negative integers such that $t+u+v+w = 3$.²

² Actually, $M_3$ is also such a linear combination with the difference that $t+u+v+w \le 2$. This will explain (6.67).

We further restrict our attention to $\Pi(u,v,w) := \Pi(0,u,v,w)$; again it will be seen that for the chosen $(x_m)$

\[ \Pi(t,u,v,w) = O\Bigl(\Pi(0,u,v,w)\frac{(\log\log q)^{t+2}}{\log q}\Bigr) \tag{6.73} \]

which justifies this restriction. Accordingly we write $\nu$ for $\nu_0$, for which we have the formula

\[ \nu(k) = \frac{\varphi(k)}{k^2}, \quad\text{for } k \le M. \tag{6.74} \]

The part of the expansion of $M_{21}$ involving those $\Pi(u,v,w)$ is then (using the obvious symmetry $\Pi(u,v,w) = \Pi(u,w,v)$) denoted by $m_{21}$:

\[ m_{21} = \Pi(3,0,0) - 6\Pi(2,1,0) + 6\Pi(1,1,1) + 6\Pi(1,2,0) - 6\Pi(0,1,2) - 2\Pi(0,0,3). \tag{6.75} \]

Finally, we select the one quadratic form $\Pi := \Pi(3,0,0)$ as reference: we will choose $(x_m)$ to optimize $\Pi$ and evaluate afterwards the other $\Pi(u,v,w)$, for this choice, before doing the same with $M_{22}$.

Optimization of the preferred form

By definition, $\Pi$ is in the desired diagonalized form

\[ \Pi = (\log Q)^3\sum_k\nu(k)y_k^2. \tag{6.76} \]

Conversely, let $g = \mu\star\mu$ be the Dirichlet convolution inverse of $\tau$, then

\[ x_m = \sum_k\frac{g(k)}{k}y_{km}. \tag{6.77} \]

From this we express the linear form³ in (6.19) in terms of $y_k$

\[ M_1 = \sum_m\frac{x_m}{m}\log\frac{\hat{q}}{m} = \sum_kj(k)y_k \tag{6.78} \]

where

\[ j(k) = \frac{1}{k}\sum_{ab=k}g(a)\Bigl(\log\frac{\hat{q}}{b}\Bigr). \]

Lemma 36. For any integer $k \ge 1$ we have

\[ j(k) = \frac{\mu(k)}{k}(\log\hat{q}k). \]

Proof. We have

\[ \sum_{k\ge1}g(k)k^{-s} = \zeta(s)^{-2} \]

and therefore

\[ \sum_{k\ge1}j(k)k^{-s} = \zeta(s+1)^{-2}\times\bigl((\log\hat{q})\zeta(s+1)+\zeta'(s+1)\bigr) = (\log\hat{q})\zeta(s+1)^{-1} - (\zeta^{-1})'(s+1) \]

whence the result. □

³ Strictly speaking, the main term of the linear form, but we will keep the same notation.
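Lemma 36 can also be checked directly from the definition of $j(k)$, without Dirichlet series. The sketch below is only an illustration; the helper names and the sample value of the parameter `q_hat` are assumptions made for the example.

```python
from math import log


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def g(n):
    """g = mu * mu, the Dirichlet inverse of the divisor function tau."""
    return sum(mobius(a) * mobius(n // a) for a in divisors(n))


def j_from_definition(k, q_hat):
    """j(k) = (1/k) * sum over ab = k of g(a) * log(q_hat / b)."""
    return sum(g(a) * log(q_hat / (k // a)) for a in divisors(k)) / k


def j_closed_form(k, q_hat):
    """The closed form mu(k) * log(q_hat * k) / k of Lemma 36."""
    return mobius(k) * log(q_hat * k) / k
```

For instance, comparing `j_from_definition(k, 101.0)` with `j_closed_form(k, 101.0)` for the first few hundred $k$ should give agreement up to rounding, including the vanishing at non-squarefree $k$.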

By Lemma 35, the best choice to optimize $\Pi$ with respect to $M_1$ is

\[ y_k = \begin{cases}\dfrac{j(k)}{\nu(k)} = \dfrac{k\mu(k)}{\varphi(k)}(\log\hat{q}k), & \text{if } k \le M\\[1ex] 0, & \text{if } k > M\end{cases} \tag{6.79} \]

and $x_m$ is given by (6.77), from which (and the lemma) the conditions required in section 6.1.2 are immediately verified: obviously, $y_k$ is supported on squarefree integers $k \le M$, hence so is $x_m$. Moreover

\[ g(k) = \sum_{ab=k}\mu(a)\mu(b) \ll \tau(k) \]

and

\[ y_k \ll (\log q)(\log\log k) \]

so (very crudely)

\[ |x_m| = \Bigl|\sum_k\frac{g(k)}{k}y_{km}\Bigr| \ll (\log q)^4. \]

We now compute the various terms in (6.75) to apply the estimate (6.3).⁴

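Lemma 37. With the previous notations and hypothesis, with $M = q^{\Delta}$, we have

\[ M_1 = (\log q)^3\times\Delta\Bigl(\frac{\Delta^2}{3}+\frac{\Delta}{2}+\frac{1}{4}\Bigr) + O\bigl((\log q)^2\bigr) = m_1(\Delta)(\log q)^3 + O((\log q)^2), \]

where $m_1$ is the polynomial (6.7), and

\[ \Pi = (\log q)^6\times\Delta\Bigl(\frac{\Delta^2}{3}+\frac{\Delta}{2}+\frac{1}{4}\Bigr) + O\bigl((\log q)^5\bigr). \]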

Proof. By the choice of $(y_k)$ and Lemma 36, we have

\[ (\log Q)^{-3}\Pi = M_1 = \sum_k\frac{j(k)^2}{\nu(k)} = \sum_k\frac{\mu(k)^2}{\varphi(k)}(\log\hat{q}k)^2 \]

and therefore the result follows by partial summation from

\[ \sum_{k\le K}\frac{\mu(k)^2}{\varphi(k)} = \log K + O(1) \]

which is well-known, and immediately derived from the residue at $s = 0$ of the Dirichlet series

\[ \sum_{n\ge1}\frac{\mu(n)^2}{\varphi(n)}n^{-s} = \prod_p\bigl(1+p^{-s-1}(1-p^{-1})^{-1}\bigr) = \frac{\zeta(s+1)}{\zeta(2(s+1))}\prod_p\Bigl(1+\frac{p^{-s-2}}{(1-p^{-1})(1+p^{-s-1})}\Bigr). \]

□

⁴ Since $j(k)$ is about $(\log k)/k$ and $\nu$ is about $k^{-1}$, it is already quite clear that we will get a positive (harmonic) proportion if $M = q^{\Delta}$ for any $\Delta > 0$.
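The elementary estimate $\sum_{k\le K}\mu(k)^2/\varphi(k) = \log K + O(1)$ used in this proof can be observed numerically. The following sketch (function names are illustrative, not from the text) sieves $\varphi$ and $\mu$ and returns the difference between the partial sum and $\log K$, which should stay bounded as $K$ grows.

```python
from math import log


def phi_and_mu(limit):
    """Sieve Euler's phi and the Moebius function up to limit."""
    phi = list(range(limit + 1))
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for n in range(p, limit + 1, p):
                if n > p:
                    is_prime[n] = False
                phi[n] -= phi[n] // p   # multiply by (1 - 1/p)
                mu[n] = -mu[n]
            for n in range(p * p, limit + 1, p * p):
                mu[n] = 0               # not squarefree
    return phi, mu


def defect(K, phi, mu):
    """sum_{k <= K} mu(k)^2 / phi(k)  minus  log K."""
    total = sum(1.0 / phi[k] for k in range(1, K + 1) if mu[k] != 0)
    return total - log(K)


if __name__ == "__main__":
    phi, mu = phi_and_mu(5000)
    for K in (50, 500, 5000):
        print(K, defect(K, phi, mu))
```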

Estimation of the other quadratic forms

For the other quadratic forms, we have

\[ \Pi(u,v,w) = (\log Q)^u\sum_k\nu(k)y_k^{(v)}y_k^{(w)} \]

where $y_k^{(i)}$ is given by (6.72),

\[ y_k^{(i)} = \sum_m\frac{\tau(m)}{m}(\log m)^ix_{km}. \]

We can express $y_k^{(i)}$ in terms of $(y_k)$ using the higher von Mangoldt function $\Lambda_i$, which is defined by the Dirichlet convolution

\[ \Lambda_i = \mu\star(\log)^i, \]

so that

\[ (\log m)^i = \sum_{ab=m}\Lambda_i(a). \]

From this, and the fact that the $x_m$'s are supported on squarefree integers, we derive

\[ y_k^{(i)} = \sum_{\ell\le M/k}\frac{\tau(\ell)}{\ell}\Lambda_i(\ell)y_{k\ell}. \tag{6.80} \]

We state the properties of $\Lambda_i$ which we will use.

1. $\Lambda_1 = \Lambda$, the usual von Mangoldt function.

2. $\Lambda_i$ is supported on integers having at most $i$ distinct prime factors.

3. If $m = p_1\cdots p_i$, for distinct primes $p_1, \ldots, p_i$, then

\[ \Lambda_i(m) = i!\,(\log p_1)\cdots(\log p_i). \]

4. If $p_1$ and $p_2$ are distinct primes, then

\[ \Lambda_i(p_1) = (\log p_1)^i, \qquad \Lambda_3(p_1p_2) = 3(\log p_1)(\log p_2)(\log p_1p_2). \]

All of these are well known and (or) easy to prove from the recurrence relation

\[ \Lambda_{i+1} = (\log)\Lambda_i + \Lambda\star\Lambda_i. \]
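The listed properties of $\Lambda_i$ can be confirmed on small integers straight from the definition $\Lambda_i = \mu\star(\log)^i$. The code below is a minimal sketch with hypothetical helper names; it computes $\Lambda_i(n)$ by brute force and also evaluates the right-hand side of the recurrence.

```python
from math import log


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def higher_von_mangoldt(i, n):
    """Lambda_i(n) = sum over ab = n of mu(a) * (log b)^i."""
    return sum(mobius(a) * log(n // a) ** i for a in divisors(n))


def recurrence_rhs(i, n):
    """(log n) * Lambda_i(n) + (Lambda * Lambda_i)(n)."""
    lam = higher_von_mangoldt
    convolution = sum(lam(1, a) * lam(i, n // a) for a in divisors(n))
    return log(n) * lam(i, n) + convolution
```

For example, `higher_von_mangoldt(3, 6)` should agree with $3(\log 2)(\log 3)(\log 6)$, and `higher_von_mangoldt(2, 30)` should vanish up to rounding since 30 has three prime factors.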

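In (6.80) we are thus actually dealing with a sum over squarefree $\ell$ having at most $i$ prime factors, and $i \le 3$. We separate the sum into the parts with a fixed number of prime factors, which produces multiple (at most triple) sums over primes (of Mertens type since $\tau(\ell)\ell^{-1} = 2^j\ell^{-1}$ for such $\ell$ with $\omega(\ell) = j$ prime factors).

The subsum with $i$ distinct prime factors is, by the above

\[
\begin{aligned}
2^i&\sum_{\substack{\ell\le M/k\\ \omega(\ell)=i}}\frac{\Lambda_i(\ell)}{\ell}\mu(k\ell)(\log\hat{q}k\ell)\frac{k\ell}{\varphi(k\ell)}\\
&= (-2)^ii!\,\frac{k\mu(k)}{\varphi(k)}\sum_{\substack{p_1<\ldots<p_i\\ p_1\cdots p_i\le M/k\\ (p_1\cdots p_i,k)=1}}\frac{(\log p_1)\cdots(\log p_i)}{p_1\cdots p_i}(\log\hat{q}kp_1\cdots p_i) + O\Bigl((\log q)^i\frac{k}{\varphi(k)}\Bigr)\\
&= (-2)^ii!\,\frac{k\mu(k)}{\varphi(k)}\sum_{\substack{p_1<\ldots<p_i\\ p_1\cdots p_i\le M/k}}\frac{(\log p_1)\cdots(\log p_i)}{p_1\cdots p_i}(\log\hat{q}kp_1\cdots p_i) + O\Bigl((\log q)^i(\log_2q)\frac{k}{\varphi(k)}\Bigr)\\
&= (-2)^i\frac{k\mu(k)}{\varphi(k)}\sum_{\substack{p_1,\ldots,p_i\\ p_1\cdots p_i\le M/k}}\frac{(\log p_1)\cdots(\log p_i)}{p_1\cdots p_i}(\log\hat{q}kp_1\cdots p_i) + O\Bigl((\log q)^i(\log_2q)\frac{k}{\varphi(k)}\Bigr)
\end{aligned}
\]

the error term arising from neglecting the smaller contribution from the primes dividing $k$ and replacing $\varphi(p)^{-1}$ by $p^{-1}$ using the fact that

\[ \sum_p\frac{(\log p)^A}{p(p-1)} < +\infty. \]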

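From Mertens's formula, the last sum is equal, up to $O((\log q)^i)$, to the integral

\[
\begin{aligned}
\int_{\substack{y_1\ge0,\ldots,y_i\ge0\\ y_1+\ldots+y_i\le\log(M/k)}}&(\log\hat{q}k+y_1+\ldots+y_i)\,dy\\
&= (\log\hat{q}k)\Bigl(\log\frac{M}{k}\Bigr)^i\int_{S_i}dx + i\Bigl(\log\frac{M}{k}\Bigr)^{i+1}\int_{S_i}x_1\,dx.
\end{aligned}
\]

Here $S_i = \{(x_1,\ldots,x_i) \mid x_j \ge 0,\ x_1+\ldots+x_i \le 1\}$ is the standard $i$-simplex. By induction, one gets immediately

\[ \int_{S_i}dx = \frac{1}{i!}, \qquad \int_{S_i}x_1\,dx = \frac{1}{(i+1)!} \]

so this contribution to the sum (6.80) can be written as

\[ \frac{(-2)^i}{(i+1)!}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^i(\log\hat{q}^{i+1}M^ik) + O\Bigl((\log q)^i(\log_2q)\frac{k}{\varphi(k)}\Bigr). \tag{6.81} \]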

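This is enough to give $y_k^{(1)}$; for $y_k^{(2)}$ there is an additional sum over primes which, by similar computations, is

\[
\begin{aligned}
-2\frac{k\mu(k)}{\varphi(k)}&\sum_{p\le M/k}\frac{(\log p)^2}{p}(\log\hat{q}kp) + O\Bigl((\log q)(\log_2q)^2\frac{k}{\varphi(k)}\Bigr)\\
&= -\frac{1}{3}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^2(\log\hat{q}^3M^2k) + O\Bigl((\log q)^2\frac{k}{\varphi(k)}\Bigr);
\end{aligned}
\]

and for $y_k^{(3)}$ there are two other sums, first

\[
\begin{aligned}
-2\frac{k\mu(k)}{\varphi(k)}&\sum_{p\le M/k}\frac{(\log p)^3}{p}(\log\hat{q}kp) + O\Bigl((\log q)(\log_2q)^3\frac{k}{\varphi(k)}\Bigr)\\
&= -\frac{1}{6}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^3(\log\hat{q}^4M^3k) + O\Bigl((\log q)^3\frac{k}{\varphi(k)}\Bigr);
\end{aligned}
\]

and finally

\[
\begin{aligned}
12\frac{k\mu(k)}{\varphi(k)}&\sum_{\substack{p_1<p_2\\ p_1p_2\le M/k}}\frac{(\log p_1p_2)(\log p_1)(\log p_2)}{p_1p_2}(\log\hat{q}kp_1p_2) + O\Bigl((\log q)^2(\log_2q)^2\frac{k}{\varphi(k)}\Bigr)\\
&= 12\frac{k\mu(k)}{\varphi(k)}\sum_{p_1p_2\le M/k}\frac{(\log p_1)^2(\log p_2)}{p_1p_2}(\log\hat{q}kp_1p_2) + O\Bigl((\log q)^2(\log_2q)^2\frac{k}{\varphi(k)}\Bigr)\\
&= \frac{1}{2}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^3(\log\hat{q}^4M^3k) + O\Bigl((\log q)^3\frac{k}{\varphi(k)}\Bigr).
\end{aligned}
\]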

From all this we conclude:

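Lemma 38. For $i = 1, 2, 3$, we have

\[ y_k^{(i)} = c_i\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^i(\log\hat{q}^{i+1}M^ik) + O\Bigl((\log q)^i(\log_2q)\frac{k}{\varphi(k)}\Bigr) \tag{6.82} \]

with

\[ c_1 = -1, \quad c_2 = \frac{1}{3}, \quad c_3 = 0. \tag{6.83} \]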
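The constants in (6.83) are obtained by adding the generic contribution $(-2)^i/(i+1)!$ coming from (6.81) to the additional prime sums computed for $i = 2$ and $i = 3$. The short exact-arithmetic sketch below reproduces this bookkeeping; the names `generic`, `EXTRA` and `c` are illustrative, and the extra coefficients $-\tfrac13$, $-\tfrac16$, $\tfrac12$ are those read off from the three displayed evaluations preceding the lemma.

```python
from fractions import Fraction
from math import factorial


def generic(i):
    """Coefficient (-2)^i / (i+1)! of the term with i distinct primes."""
    return Fraction((-2) ** i, factorial(i + 1))


# Coefficients of the additional sums over primes (none for i = 1).
EXTRA = {
    1: [],
    2: [Fraction(-1, 3)],
    3: [Fraction(-1, 6), Fraction(1, 2)],
}


def c(i):
    """Total leading coefficient c_i of Lemma 38."""
    return generic(i) + sum(EXTRA[i], Fraction(0))


if __name__ == "__main__":
    print([c(i) for i in (1, 2, 3)])
```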

It is now easy to finish the computation of the quadratic form $m_{21}$ for our choice of $y_k$.

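Lemma 39. With notations as in Lemma 37

\[
\begin{aligned}
\Pi(2,1,0) &= -(\log q)^6\times\Delta^2\Bigl(\bigl(\tfrac12+\Delta\bigr)^2 - \Delta\bigl(\tfrac12+\Delta\bigr) + \frac{\Delta^2}{4}\Bigr) + O\bigl((\log q)^5\log_2q\bigr)\\
\Pi(1,1,1) &= (\log q)^6\times\Delta^3\Bigl(\frac{4}{3}\bigl(\tfrac12+\Delta\bigr)^2 - \Delta\bigl(\tfrac12+\Delta\bigr) + \frac{\Delta^2}{5}\Bigr) + O\bigl((\log q)^5\log_2q\bigr)\\
\Pi(1,2,0) &= \frac{1}{3}(\log q)^6\times\Delta^3\Bigl(\bigl(\tfrac12+\Delta\bigr)^2 - \Delta\bigl(\tfrac12+\Delta\bigr) + \frac{\Delta^2}{5}\Bigr) + O\bigl((\log q)^5\log_2q\bigr)\\
\Pi(0,1,2) &= -\frac{1}{3}(\log q)^6\times\Delta^4\Bigl(\frac{3}{2}\bigl(\tfrac12+\Delta\bigr)^2 - \Delta\bigl(\tfrac12+\Delta\bigr) + \frac{\Delta^2}{6}\Bigr) + O\bigl((\log q)^5\log_2^3q\bigr)\\
\Pi(0,0,3) &= O\bigl((\log q)^5\log_2^3q\bigr).
\end{aligned}
\]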


Proof. All are similar, so take for instance $\Pi(0,1,2)$; from the previous lemma

\[ \Pi(0,1,2) = -\frac{1}{3}\sum_{k\le M}\frac{\mu(k)^2}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^3(\log\hat{q}^3M^2k)(\log\hat{q}^2Mk) + O\bigl((\log q)^5(\log_2q)^3\bigr) \]

and the sum, by summation by parts again, is (up to $O((\log q)^5)$) the same as the integral

\[ \int_1^M\Bigl(\log\frac{M}{x}\Bigr)^3(\log\hat{q}^3M^2x)(\log\hat{q}^2Mx)\frac{dx}{x} = \int_0^{\log M}y^3(3\log\hat{q}M-y)(2\log\hat{q}M-y)\,dy \]

from which the result follows, since moreover $\log\hat{q} = \log\sqrt{q} + O(1)$. □

Therefore, from the definition (6.75), we get:

Corollary 4. We have

\[ m_{21} = m_{21}(\Delta)(\log q)^6 + O\bigl((\log q)^5(\log_2q)^3\bigr) \]

where m21 is the polynomial (6.8).

The other contribution to the second moment

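Recall that

\[ M_{22} = \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{T(m_1m_2)}{m_1m_2}x_{bm_1}x_{bm_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr). \]

The formula (6.49) again separates $m_1$ and $m_2$, so

\[
\begin{aligned}
M_{22} &= 2\sum_b\frac{1}{b}\sum_a\frac{\mu(a)}{a^2}\sum_{m_1,m_2}\frac{\tau(m_1)T(m_2)}{m_1m_2}x_{abm_1}x_{abm_2}\Bigl(\log\frac{Q}{a^2m_1m_2}\Bigr)\\
&= 2\sum_k\nu(k)\sum_{m_1,m_2}\frac{\tau(m_1)T(m_2)}{m_1m_2}x_{km_1}x_{km_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr) - 4\sum_k\nu_1(k)\sum_{m_1,m_2}\frac{\tau(m_1)T(m_2)}{m_1m_2}x_{km_1}x_{km_2}.
\end{aligned}
\]

Let $m_{22}$ denote the first term; this will be the main contribution. The treatment is now similar to that of $m_{21}$: define

\[ z_k = z_k^{(0)} = \sum_m\frac{T(m)}{m}x_{km}, \qquad z_k^{(1)} = \sum_m\frac{T(m)}{m}(\log m)x_{km} \]

and

\[ \Pi(a,b,c) = (\log Q)^a\sum_k\nu(k)y_k^{(b)}z_k^{(c)}; \]

then

\[ m_{22} = 2\bigl(\Pi(1,0,0) - \Pi(0,1,0) - \Pi(0,0,1)\bigr) \tag{6.84} \]
\[ M_{22} = m_{22} - 4\sum_k\nu_1(k)y_kz_k. \tag{6.85} \]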

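Lemma 40. We have

\[ z_k = 2\sum_{\ell\le M/k}\frac{(\log\ell)\Lambda(\ell)}{\ell}y_{k\ell} \]
\[ z_k^{(1)} = \sum_{\ell\le M/k}\frac{\tau(\ell)\Lambda(\ell)}{\ell}z_{k\ell} + \sum_{\ell\le M/k}\frac{T(\ell)\Lambda(\ell)}{\ell}y_{k\ell}. \]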

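Proof. For the first one, (6.77) implies

\[ z_k = \sum_\ell\Bigl(\sum_{mn=\ell}\frac{T(m)}{m}\frac{g(n)}{n}\Bigr)y_{k\ell} \]

and the Dirichlet generating series for the coefficient of $\ell$ is $L(s+1)$ where

\[ L(s) = \zeta(s)^{-2}\sum_nT(n)n^{-s}. \]

From the first part of Lemma 33, we get

\[ \sum_nT(n)n^{-s} = 4\zeta\zeta'' - 2(\zeta\zeta')' = 2(\zeta\zeta''-(\zeta')^2) \]

so $L(s) = 2(\zeta'\zeta^{-1})'$. As to $z_k^{(1)}$, write

\[ \log m = \sum_{\ell b=m}\Lambda(\ell), \]

and then

\[
\begin{aligned}
z_k^{(1)} &= \sum_m\frac{T(m)}{m}\sum_{\ell b=m}\Lambda(\ell)x_{km} = \sum_\ell\frac{\Lambda(\ell)}{\ell}\sum_{m\le M/\ell}\frac{T(\ell m)}{m}x_{k\ell m}\\
&= \sum_{\ell\le M/k}\frac{\tau(\ell)\Lambda(\ell)}{\ell}z_{k\ell} + \sum_{\ell\le M/k}\frac{T(\ell)\Lambda(\ell)}{\ell}y_{k\ell}
\end{aligned}
\]

by the multiplicative property of $T$. □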
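The first formula of Lemma 40 amounts to the arithmetic identity $(T\star g)(\ell) = 2\Lambda(\ell)\log\ell$, which is what the Dirichlet series computation $L(s) = 2(\zeta'\zeta^{-1})'$ encodes. A brute-force check on small $\ell$ is sketched below, with $T$ taken from (6.47) and $g = \mu\star\mu$; all helper names are illustrative.

```python
from math import log


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def von_mangoldt(n):
    """log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # n itself is prime


def T(m):
    """T(m) = 4 tau^(2)(m) - 2 (log m) tau^(1)(m), as in (6.47)."""
    logs = [log(d) for d in divisors(m)]
    return 4 * sum(x * x for x in logs) - 2 * log(m) * sum(logs)


def g(n):
    """g = mu * mu."""
    return sum(mobius(a) * mobius(n // a) for a in divisors(n))


def coefficient(ell):
    """(T * g)(ell), the coefficient of y_{k ell} in z_k times ell."""
    return sum(T(m) * g(ell // m) for m in divisors(ell))
```

Comparing `coefficient(ell)` with `2 * von_mangoldt(ell) * log(ell)` for $\ell$ up to a few hundred should show agreement up to rounding.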

We can now evaluate the quadratic form m22. The mollifier was defined by (6.79).

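Lemma 41. We have

\[
\begin{aligned}
z_k &= -\frac{1}{3}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^2(\log\hat{q}^3M^2k) + O\Bigl((\log q)^2\frac{k}{\varphi(k)}\Bigr)\\
&= -y_k^{(2)} + O\Bigl(\frac{k}{\varphi(k)}(\log q)^2\log_2q\Bigr)
\end{aligned}
\]

and

\[ z_k^{(1)} = O\Bigl(\frac{k}{\varphi(k)}(\log q)^3\Bigr). \]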

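Proof. We will be brief: on the one hand

\[
\begin{aligned}
z_k &= -2\frac{k\mu(k)}{\varphi(k)}\sum_{p\le M/k}\frac{(\log p)^2}{p}\log\hat{q}kp + O\Bigl(\frac{k}{\varphi(k)}(\log_2q)^3\Bigr)\\
&= -2\frac{k\mu(k)}{\varphi(k)}\int_0^{\log M/k}y(y+\log\hat{q}k)\,dy + O\Bigl(\frac{k}{\varphi(k)}(\log q)^2\Bigr)\\
&= -\frac{1}{3}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^2(\log\hat{q}^3M^2k) + O\Bigl(\frac{k}{\varphi(k)}(\log q)^2\Bigr)
\end{aligned}
\]

and on the other hand the two contributions to $z_k^{(1)}$ are respectively (using the previous computation)

\[ \frac{1}{3}\frac{k\mu(k)}{\varphi(k)}\sum_{p\le M/k}\frac{2\log p}{p}\Bigl(\log\frac{M}{kp}\Bigr)^2(\log\hat{q}^3M^2kp) = \frac{1}{6}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^3(\log\hat{q}^4M^3k) + O\Bigl(\frac{k}{\varphi(k)}(\log q)^3\Bigr) \]

and (this is the same as one of the sums considered in $y_k^{(3)}$)

\[ -\frac{k\mu(k)}{\varphi(k)}\sum_{p\le M/k}\frac{2(\log p)^3}{p}(\log\hat{q}kp) = -\frac{1}{6}\frac{k\mu(k)}{\varphi(k)}\Bigl(\log\frac{M}{k}\Bigr)^3(\log\hat{q}^4M^3k) + O\Bigl(\frac{k}{\varphi(k)}(\log q)^3\Bigr). \]

□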

From this (referring to Lemma 39), we obtain

\[ \Pi(1,0,0) = -(\log Q)\sum_k\nu(k)y_ky_k^{(2)} + O\bigl((\log q)^5\bigr) = -\Pi(1,2,0) + O\bigl((\log q)^5\bigr) \tag{6.86} \]
\[ \Pi(0,1,0) = -\sum_k\nu(k)y_k^{(1)}y_k^{(2)} + O\bigl((\log q)^5\bigr) = -\Pi(0,1,2) + O\bigl((\log q)^5\bigr) \tag{6.87} \]
\[ \Pi(0,0,1) = O\bigl((\log q)^5\bigr). \]

Hence from (6.84) we derive the final estimate for $m_{22}$.

Corollary 5. We have

\[ m_{22} = m_{22}(\Delta)(\log q)^6 + O((\log q)^5) \]

where $m_{22}$ is the polynomial (6.9).

6.1.5 Harmonic non-vanishing

Let us now dispose of the residual quadratic forms. Those are the forms $\Pi(t,u,v,w)$ with $t+u+v+w \le 2$, which enter into the quadratic forms $M_{21}$ and $M_3$, and the form without name in (6.85) which enters in $M_{22}$.

Lemma 42. Let $t$, $u$, $v$, $w$ be non-negative integers with

\[ t+u+v+w \le 2. \]

Then

\[ \Pi(t,u,v,w) \ll (\log q)^{u+v+w+3}(\log\log q)^{t+2} \ll (\log q)^5(\log\log q)^4 \]

(where $\Pi(t,u,v,w)$ refers to the value of the quadratic form for the vector $(x_m)$ previously chosen). Moreover

\[ \sum_k\nu_1(k)y_kz_k \ll (\log q)^5(\log\log q)^3. \]

Proof. By Lemma 38 we have

\[ y_k^{(i)} \ll (\log\log k)(\log k)^{i+1} \]

and therefore, directly from the definition

\[
\begin{aligned}
\Pi(t,u,v,w) &= (\log Q)^u\sum_k\nu_t(k)y_k^{(v)}y_k^{(w)}\\
&\ll (\log q)^{u+v+w+2}\sum_{k\le M}\frac{(\log\log k)^{t+2}}{k} \ll (\log q)^{u+v+w+3}(\log\log q)^{t+2}
\end{aligned}
\]

and similarly from Lemma 41

\[ \sum_k\nu_1(k)y_kz_k \ll (\log q)^5(\log\log q)^3. \]

□

We can now summarize our computations, referring to the definition of the polynomials $m_1$ and $m_2$ in Theorem 16, by saying that it follows from the decomposition (6.84) and the results of Lemma 37, Corollaries 4 and 5 (together with Lemma 42 which confirms (6.67) and (6.73)), that for $\Delta < \tfrac14$ the asymptotic formula

\[ M_2 = m_2(\Delta)(\log q)^6 + O\bigl((\log q)^5(\log\log q)^4\bigr) \]

holds, where $m_2$ is defined in (6.10).

Thus, applying (6.3), we obtain the estimate claimed in Theorem 16, valid for $\Delta < \tfrac14$:

\[ \sum^{h}_{\substack{\varepsilon_f=-1\\ L'(f,\frac12)\ne0}}1 \ge \frac{m_1(\Delta)^2}{m_2(\Delta)} + O\Bigl(\frac{(\log\log q)^4}{\log q}\Bigr). \]

An explicit calculation shows that

\[ \frac{m_1(\tfrac14)^2}{m_2(\tfrac14)} = \frac{19}{54} \]

and this establishes Theorem 15, the analogue of Theorem 10 for the harmonic average.

Remark. A point worth noticing, in comparison with Chapter 5, is that we have not used any deeper knowledge of the primes than Tchebychef's estimate

\[ \sum_{p\le x}\frac{\log p}{p} = \log x + O(1) \tag{6.88} \]

and in particular we did not need the Prime Number Theorem. This is an advantage of working with an “abstract” mollifier, and optimizing it after changing variables. Unfortunately, this could not be imitated in the context of Chapter 5, because the mollifier obtained would not be adequate for the application of Selberg's lemma 16; compare with [Vdk], where a zero-free region of the Riemann zeta function is also required.
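The value $19/54$ quoted in this section can be re-derived in exact arithmetic from the leading coefficients in Lemmas 37 and 39. The sketch below rests on assumptions that should be read as such: the polynomials (6.7) to (6.10) are not reproduced in this part of the text, so $m_2$ is taken here to be $\tfrac{1}{12}m_{21} - \tfrac14m_{22}$ (from (6.63), with $M_3$ of lower order by (6.67)), $m_{21}$ is assembled from (6.75), and $m_{22}$ from (6.84), (6.86) and (6.87). The function name `leading_coefficients` is illustrative.

```python
from fractions import Fraction as F


def leading_coefficients(delta):
    """Return (m1, m2): coefficients of (log q)^3 in M1 and (log q)^6 in M2.

    Assumes m2 = m21/12 - m22/4, with the Pi(u, v, w) values of Lemma 39.
    """
    d = delta
    a = F(1, 2) + d  # log(q_hat * M) / log q to leading order
    m1 = d * (d**2 / 3 + d / 2 + F(1, 4))
    pi = {
        (3, 0, 0): m1,
        (2, 1, 0): -d**2 * (a**2 - d * a + d**2 / 4),
        (1, 1, 1): d**3 * (F(4, 3) * a**2 - d * a + d**2 / 5),
        (1, 2, 0): F(1, 3) * d**3 * (a**2 - d * a + d**2 / 5),
        (0, 1, 2): -F(1, 3) * d**4 * (F(3, 2) * a**2 - d * a + d**2 / 6),
        (0, 0, 3): F(0),
    }
    # (6.75)
    m21 = (pi[3, 0, 0] - 6 * pi[2, 1, 0] + 6 * pi[1, 1, 1]
           + 6 * pi[1, 2, 0] - 6 * pi[0, 1, 2] - 2 * pi[0, 0, 3])
    # (6.84) combined with (6.86) and (6.87)
    m22 = 2 * (-pi[1, 2, 0] + pi[0, 1, 2])
    m2 = m21 / 12 - m22 / 4
    return m1, m2


if __name__ == "__main__":
    m1, m2 = leading_coefficients(F(1, 4))
    print(m1, m2, m1**2 / m2)
```

Under these assumptions the ratio at $\Delta = \tfrac14$ comes out as the fraction stated in the text, which also serves as a consistency check on the coefficients in Lemma 39.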

6.2 Removing the harmonic weight: the head, II

We now proceed to prove Theorem 10, using the technique of Chapter 3 to go from a harmonic average to the natural one.

Let again $M = q^{\Delta}$ with $\Delta < \tfrac14$. We consider the first and second moments for the natural average

\[ M_1^n = \sum_{f\in S_2(q)^*}\varepsilon_f^-M(f)L'(f,\tfrac12) = A[\varepsilon_f^-M(f)L'(f,\tfrac12)], \]
\[ M_2^n = \sum_{f\in S_2(q)^*}\varepsilon_f^-|M(f)L'(f,\tfrac12)|^2 = A[\varepsilon_f^-|M(f)L'(f,\tfrac12)|^2], \]

the mollifier $M(f)$ being again of the form (6.4), for some real numbers $(x_m)$ supported on squarefree integers $m$ (this time this will be a necessary assumption from the start) with the same growth condition

\[ x_m \ll (\tau(m)(\log qm))^A \]

(for some absolute constant $A > 0$).

Let $\kappa > 0$ be such that $\Delta + 2\kappa < \tfrac14$. We wish to apply Proposition 8 to $M_1^n$ and $M_2^n$.

Lemma 43. For any odd primitive form $f \in S_2(q)^*$, we have the bounds

\[ \omega_fM(f)L'(f,\tfrac12) \ll q^{-5/8}, \qquad \omega_f|M(f)L'(f,\tfrac12)|^2 \ll q^{-1/4}. \]

Moreover

\[ A^h[\varepsilon_f^-|M(f)L'(f,\tfrac12)|] \ll (\log q)^{C_1}, \tag{6.89} \]
\[ A^h[\varepsilon_f^-|M(f)L'(f,\tfrac12)|^2] \ll (\log q)^{C_2}. \tag{6.90} \]

The constants $C_1$ and $C_2$ depend only on the constant $A > 0$.

Proof. Let $f$ be an odd primitive form. From the definition of $M(f)$ and the growth condition on $x_m$ we derive

\[ M(f) \ll (\log q)^BM^{1/2} \]

for some constant $B = B(A)$. Moreover, for the special value of the derivative of the L-function, we have the classical convexity bound (in $q$-aspect)

\[ L'(f,\tfrac12) \ll q^{1/4}(\log q)^2 \]

which we can reprove from (6.15), estimating by means of Lemma 26 and Deligne's bound:

\[ L'(f,\tfrac12) \ll \log q\sum_{l\le\sqrt{q}}\frac{\tau(l)}{\sqrt{l}} + \sqrt{q}\sum_{l>\sqrt{q}}\frac{\tau(l)}{l^{3/2}} \ll q^{1/4}(\log q)^2. \]

On the other hand, we use again $\omega_f \ll (\log q)q^{-1}$ from (3.16), and so

\[ \omega_f|M(f)L'(f,\tfrac12)|^2 \ll (\log q)^{2B+5}q^{\Delta-1/2} \ll q^{-1/4} \]

and similarly for $\omega_fM(f)L'(f,\tfrac12)$.

As for the averages, the second one is the same as the moment $M_2$ previously considered,⁵ and we immediately obtain (6.90) from Proposition 19 using the growth condition. Then (6.89) is deduced from (6.90) by Cauchy's inequality. □

Remark. The bound for the derivative of the L-function is of course far from the conjectured truth

\[ L'(f,\tfrac12) \ll_\varepsilon q^{\varepsilon} \]

(the Lindelöf Hypothesis), but it will suffice here. However, it is important to mention that this convexity bound has been improved by Duke, Friedlander, Iwaniec [DFI], who showed that

\[ L'(f,\tfrac12) \ll_\varepsilon q^{47/192+\varepsilon}. \]

The proof is based on the so-called “amplification technique”, and involves also very deep treatments of the complementary term in the Petersson formula.

The Lemma verifies the conditions (3.17) and (3.18) and with $x = q^{\kappa}$, we conclude from Proposition 8 that there exists $\delta = \delta(\kappa,\Delta) > 0$ such that

\[ M_1^n = \frac{\dim J_0(q)}{\zeta(2)}\sum^{h}_{f\in S_2(q)^*}\omega_f(x)\varepsilon_f^-M(f)L'(f,\tfrac12) + O(q^{1-\delta}), \tag{6.91} \]
\[ M_2^n = \frac{\dim J_0(q)}{\zeta(2)}\sum^{h}_{f\in S_2(q)^*}\omega_f(x)\varepsilon_f^-|M(f)L'(f,\tfrac12)|^2 + O(q^{1-\delta}), \tag{6.92} \]

where $\omega_f(x)$ is the partial sum of length $x$ of the symmetric square, see (3.20)

\[ \omega_f(x) = \sum_{n\le x}\frac{\rho_f(n)}{n} = \sum_{d\ell^2\le x}\frac{\lambda_f(d^2)}{d\ell^2}. \]

⁵ We avoid the notation, because it must be emphasized that the $x_m$ now considered are different from those used at the end of the previous section.

We denote the moments with $\omega_f(x)$ by $M_i$:

\[ M_1 = \sum^{h}_{f\in S_2(q)^*}\omega_f(x)\varepsilon_f^-M(f)L'(f,\tfrac12) \]
\[ M_2 = \sum^{h}_{f\in S_2(q)^*}\omega_f(x)\varepsilon_f^-|M(f)L'(f,\tfrac12)|^2. \]

This last section will be extremely technical: the combinatorics of the various forms into which the second moment is decomposed, pretty much kept under control in the former Section, are now much harder to follow because of the loss of multiplicativity in the coefficients. The reader should probably confine her interest to the main term, since it is much the same (of course) as in the case of $M_2$. Justification for this is that, if something was to go wrong, we would discover a correlation between the values of $L(\mathrm{Sym}^2f,1)$ and $L'(f,\tfrac12)^2$, and this would be even more interesting than the theorem claimed! It could be that we can not prove the theorem, but without establishing this connection: such would be the case, for instance, if some residual estimate was seen to depend on the existence of an exceptional (Landau-Siegel) zero for Dirichlet characters. Indeed, a correlation of this kind is one of the many strange effects of this unlikely event. But the reader should be able to convince himself easily that the computations required to take care of all details involve no deeper facts than before, namely Tchebychef's estimate (6.88) suffices to close the books.

6.2.1 Computation of the first moment

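From (6.15), incorporating directly the mollifier $M(f)$, we get

\[ M_1 = \sum_{n,m}\frac{x_m}{\sqrt{nm}}V\Bigl(\frac{2\pi n}{\sqrt{q}}\Bigr)\times\Delta^n_-(m,n) \]

where $\Delta^n_-$ is the Delta-symbol without weight for odd forms of Section 4.3. As in the computation of $M_1$ in Section 6.1.2, the basic estimate (4.13) is sufficient to get a good approximation, namely

\[ M_1 = \sum_m\frac{x_m}{m}\sum_{d\ell^2\le x}\frac{1}{(d\ell)^2}\sum_{r\mid(d^2,m)}rV\Bigl(\frac{2\pi md^2}{r^2\sqrt{q}}\Bigr) + O\Bigl(\frac{xM}{\sqrt{q}}(\log q)^B\Bigr) \]

and similarly (6.16) gives now

\[
\begin{aligned}
M_1 &= \sum_m\frac{x_m}{m}\sum_{d\ell^2\le x}\frac{1}{(d\ell)^2}\sum_{r\mid(d^2,m)}r\Bigl(\log\frac{\hat{q}r^2}{md^2}\Bigr) + O(q^{-\delta})\\
&= \sum_m\frac{x_m}{m}\sum_{r\mid m}r\sum_{\substack{d\ell^2\le x\\ r\mid d^2}}\frac{1}{(d\ell)^2}\Bigl(\log\frac{\hat{q}r^2}{md^2}\Bigr) + O(q^{-\delta})
\end{aligned}
\]

for some $\delta > 0$ (since $xM = q^{\kappa+\Delta}$ and $\kappa+\Delta < \tfrac14$).

The summation over $m$ is supported on squarefree integers, hence that over the divisors $r$ also. But if $r$ is squarefree, then $r$ divides $d^2$ if and only if $r$ divides $d$.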

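We use this information in the first place, as in the previous Chapter, to remove the constraint $d\ell^2 \le x$ in the inner summation: the difference is at most

\[ (\log q)\sum_m\frac{|x_m|}{m}\sum_{r\mid m}r\sum_{\substack{d\ell^2>x\\ r\mid d}}(d\ell)^{-2} \ll \frac{\log q}{\sqrt{x}}\sum_m\frac{|x_m|}{m}\sum_{r\mid m}\frac{1}{\sqrt{r}} \ll \frac{(\log q)^B}{\sqrt{x}} \]

by a calculation similar to that of Lemma 22 and the growth assumption (6.6).

Hence, with the summation over $d$ and $\ell$ now free, we further have

\[
\begin{aligned}
M_1 &= \zeta(2)\sum_m\frac{x_m}{m}\sum_{r\mid m}r\sum_{r\mid d}\frac{1}{d^2}\Bigl(\log\frac{\hat{q}r^2}{md^2}\Bigr) + O(q^{-\delta})\\
&= \zeta(2)^2\sum_md_{-1}(m)\frac{x_m}{m}\Bigl(\log\frac{\hat{q}}{m}\Bigr) + 2\zeta(2)\zeta'(2)\sum_md_{-1}(m)\frac{x_m}{m} + O(q^{-\delta})
\end{aligned} \tag{6.93}
\]

for some $\delta = \delta(\Delta) > 0$.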

6.2.2 Computation of the second moment

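By the definition of $\omega_f(x)$ we have

\[
\begin{aligned}
M_2 &= \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{\sqrt{m_1m_2}}\sum_{d\ell^2\le x}\frac{1}{d\ell^2}A^h[\varepsilon_f^-\lambda_f(d^2)\lambda_f(m_1m_2)L'(f,\tfrac12)^2]\\
&= \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{\sqrt{m_1m_2}}\sum_{d\ell^2\le x}\frac{1}{d\ell^2}\sum_{r\mid(m_1m_2,d^2)}A^h\Bigl[\varepsilon_f^-\lambda_f\Bigl(\frac{m_1m_2d^2}{r^2}\Bigr)L'(f,\tfrac12)^2\Bigr].
\end{aligned}
\]

Since

\[ \frac{m_1m_2d^2}{r^2} \le M^2x^2 \le q^{2\kappa+2\Delta} < q^{1/2}, \]

Proposition 18 is applicable, and we get a decomposition of $M_2$ which is exactly similar to the one given by Proposition 19 for the harmonic second moment: for some $\delta > 0$,

\[ M_2 = \frac{1}{12}M_{21} - \frac{1}{4}M_{22} + M_3 + O(q^{-\delta}) \tag{6.94} \]

where

\[ M_{21} = \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{m_1m_2}\sum_{d\ell^2\le x}\frac{1}{(d\ell)^2}\sum_{r\mid(d^2,m_1m_2)}r\,\tau\Bigl(\frac{m_1m_2d^2}{r^2}\Bigr)\Bigl(\log\frac{Qr^2}{m_1m_2d^2}\Bigr)^3 \tag{6.95} \]
\[ M_{22} = \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{m_1m_2}\sum_{d\ell^2\le x}\frac{1}{(d\ell)^2}\sum_{r\mid(d^2,m_1m_2)}r\,T\Bigl(\frac{m_1m_2d^2}{r^2}\Bigr)\Bigl(\log\frac{Qr^2}{m_1m_2d^2}\Bigr) \tag{6.96} \]
\[ M_3 = \sum_b\frac{1}{b}\sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{m_1m_2}\sum_{d\ell^2\le x}\frac{1}{(d\ell)^2}\sum_{r\mid(d^2,m_1m_2)}r\,\tau\Bigl(\frac{m_1m_2d^2}{r^2}\Bigr)P_1\Bigl(\log\frac{Qr^2}{m_1m_2d^2}\Bigr). \tag{6.97} \]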

6.2.3 Mutations of the second moment

Those quadratic forms $M_{21}$, $M_{22}$, $M_3$ should be compared to $M(\delta)$ of (5.54). The strategy to deal with them will be the same: after preliminary transformations, we remove the condition $d\ell^2 \le x$ which brings a decomposition in terms of quadratic forms with mutative coefficients, as discussed in the Appendix to Section 4.3. After that, the path is the same as for $M_2$: a preferred term is selected, and optimization is performed for this term and $M_1$, before final estimations produce the no less final result.

The alphabet has only 26 letters: lest notations become unwieldy, we recycle some of those of the previous sections, and enlist the fraktur letters $\mathfrak{i}$, $\mathfrak{j}$, $\mathfrak{k}$, $\mathfrak{l}$ as indices (the letters $u$, $v$, $w$ will appear as coefficients, as in Chapter 5).

We define for convenience $t_0 = \tau$, $t_2 = T$. First, observe that each of $M_{21}$, $M_{22}$ and $M_3$, after expanding the powers of logarithms, can be expressed as a sum over $b$ of quadratic forms

\[ \sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{m_1m_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)^{\mathfrak{i}}\sum_{d\ell^2\le x}\frac{(\log d^2)^{\mathfrak{k}}}{(d\ell)^2}\sum_{r\mid(m_1m_2,d^2)}r(\log r)^{\mathfrak{l}}\,t_{\mathfrak{j}}\Bigl(\frac{m_1m_2d^2}{r^2}\Bigr) \tag{6.98} \]

which we denote by $M_x(\mathfrak{i},\mathfrak{j},\mathfrak{k},\mathfrak{l})$. The parameters satisfy $0 \le \mathfrak{i}+\mathfrak{j}+\mathfrak{k}+\mathfrak{l} \le 3$, and $\mathfrak{j}$ is 0 or 2.

Although this looks forbidding enough, the main contribution will come from the case $\mathfrak{i}+\mathfrak{j} = 3$, $\mathfrak{k} = \mathfrak{l} = 0$, in which case the inner sums involve only multiplicative functions (except for $T$ when $\mathfrak{j} = 2$), and for the others, mutativity will create sundry quadratic forms during the process of “diagonalization”, as in (4.14), with very complicated coefficients $\nu$, but yet small, and the loss of logarithms in the variables $w_k$, $y_k$ will make all these terms smaller, as happened for $M_2$ before.

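First we remove the constraint $d\ell^2 \le x$. To do this, we separate $m_1m_2/r$ and $d^2/r$ using the identities

\[ \sum_{r\mid(m,n)}r(\log r)^{\mathfrak{l}}\,\tau\Bigl(\frac{mn}{r^2}\Bigr) = \sum_{r\mid(m,n)}u_{\mathfrak{l}}(r)\tau\Bigl(\frac{m}{r}\Bigr)\tau\Bigl(\frac{n}{r}\Bigr) \]
\[ \sum_{r\mid(m,n)}r(\log r)^{\mathfrak{l}}\,T\Bigl(\frac{mn}{r^2}\Bigr) = \sum_{r\mid(m,n)}u_{\mathfrak{l}}(r)\Bigl(\tau\Bigl(\frac{m}{r}\Bigr)T\Bigl(\frac{n}{r}\Bigr) + T\Bigl(\frac{m}{r}\Bigr)\tau\Bigl(\frac{n}{r}\Bigr)\Bigr) \]

where (compare with the function $u(s,r)$ of Chapter 5) we have put

\[ u_{\mathfrak{l}}(r) = \sum_{ab=r}\mu(a)b(\log b)^{\mathfrak{l}}; \]

the first of these is a consequence of (6.68), and the second is one of (6.49).

Therefore $M_x(\mathfrak{i},0,\mathfrak{k},\mathfrak{l})$ is equal to

\[ \sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{m_1m_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)^{\mathfrak{i}}\sum_{r\mid m_1m_2}u_{\mathfrak{l}}(r)\tau\Bigl(\frac{m_1m_2}{r}\Bigr)\sum_{\substack{d\ell^2\le x\\ r\mid d^2}}\frac{(\log d^2)^{\mathfrak{k}}}{(d\ell)^2}\tau\Bigl(\frac{d^2}{r}\Bigr) \]

and $M_x(\mathfrak{i},2,\mathfrak{k},\mathfrak{l})$ is a sum of two terms, one of which is

\[ \sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{m_1m_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)^{\mathfrak{i}}\sum_{r\mid m_1m_2}u_{\mathfrak{l}}(r)T\Bigl(\frac{m_1m_2}{r}\Bigr)\sum_{\substack{d\ell^2\le x\\ r\mid d^2}}\frac{(\log d^2)^{\mathfrak{k}}}{(d\ell)^2}\tau\Bigl(\frac{d^2}{r}\Bigr) \tag{6.99} \]

and the other the same with $\tau$ and $T$ interchanged.

It is now clear that for each of the quadratic forms we can extend the summation in $d$ and $\ell$ to infinity: indeed, we recognize

\[ \sum_{\substack{d\ell^2\le x\\ r\mid d^2}}\frac{(\log d^2)^{\mathfrak{k}}}{(d\ell)^2}\tau\Bigl(\frac{d^2}{r}\Bigr) = (-1)^{\mathfrak{k}}v_x^{(\mathfrak{k})}(1,r)\Big|_{t=0} \]

where $v_x(s,r)$ is defined in (5.55) and the estimation (5.57) of Lemma 22 extends to the derivatives (up to some logarithms) to put this removal into effect. Similarly,

\[ \sum_{\substack{d\ell^2\le x\\ r\mid d^2}}\frac{(\log d^2)^{\mathfrak{k}}}{(d\ell)^2}T\Bigl(\frac{d^2}{r}\Bigr) = (-1)^{\mathfrak{k}}\frac{d^2}{dt^2}v_x^{(\mathfrak{k})}(1,r)\Big|_{t=0} \]

(the dependence on $t$ of $v_x(s,r)$ is not displayed in the notation), and the same remark applies.

Hence we remove $x$ and get quadratic forms $M(\mathfrak{i},\mathfrak{j},\mathfrak{k},\mathfrak{l})$. We write simply

\[ v^{(\mathfrak{k})}(r) = v^{(\mathfrak{k})}(1,r), \qquad v^{[\mathfrak{k}]}(r) = \frac{d^2}{dt^2}v^{(\mathfrak{k})}(1,r)\Big|_{t=0} \]

(the latter can only occur with $\mathfrak{k} = 1$ when $\mathfrak{i} = 0$, $\mathfrak{j} = 2$, $\mathfrak{l} = 0$, and contributes to the error term at the end; it can be safely ignored by the reader). With these notations, it holds

\[ M(\mathfrak{i},0,\mathfrak{k},\mathfrak{l}) = \sum_{m_1,m_2}\frac{x_{bm_1}x_{bm_2}}{m_1m_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)^{\mathfrak{i}}\sum_{r\mid m_1m_2}\tau\Bigl(\frac{m_1m_2}{r}\Bigr)u_{\mathfrak{l}}(r)v^{(\mathfrak{k})}(r) \]

(and a cognate expression for $M(\mathfrak{i},2,\mathfrak{k},\mathfrak{l})$).

Since $v(s,r)$ was computed in Lemma 22, and found to be the product of $v(s,1)$ and a multiplicative function, it follows from Leibniz's rule that the derivatives $v^{(\mathfrak{k})}(1,r)$ are mutative in the sense of the Appendix to Section 4.3, and the yoga described there can be used to diagonalize the $M(\mathfrak{i},\mathfrak{j},\mathfrak{k},\mathfrak{l})$.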

These mutativity properties of ul and v(k) take the following simple form. For ul,from the definition we have

ul(mn) =∑i+j=l

(l

i

)ui(n)uj(n).

for (m,n) = 1. On the other hand, writing

v(s, r) = v(s, 1)v(s, r)

with v multiplicative, we see that v(k) is first a linear combination of the functions

v(j)(r) = v(j)(1, r)

for j 6 k, and each v(k)(r) satisfies in turn

v(k)(mn) =∑i+j=k

(k

i

)v(i)(m)v(j)(n)

142

for (m,n) = 1, by differentiating at s = 1 the formula

v(s,mn) = v(s,m)v(s, n).

Similar mutations are observed for v[k]. The following lemma will be useful.

Lemma 44. For all $k \ge 0$, $l \ge 0$, we have
\[
u_l(r)\,v^{(k)}(r) \ll \frac{r\,\tau(r)^2(\log r)^{k+l}}{N(r)^2}, \qquad
u_l(r)\,v^{[k]}(r) \ll \frac{r\,\tau(r)^2(\log r)^{k+l}}{N(r)^2},
\]
the implied constants depending only on $k$ and $l$.

Proof. This follows very quickly from the computation of $v$ in Lemma 22 and the definition of $u_l$. □
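The mutativity relation for $u_l$ used above, namely $u_l(mn) = \sum_{i+j=l}\binom{l}{i}u_i(m)u_j(n)$ for coprime $m$, $n$, can be sanity-checked numerically. The following is only a minimal sketch: the helper names `mobius`, `u` and `mutativity_rhs` are illustrative and do not come from the text, and the check uses floating-point logarithms.

```python
from math import comb, log, isclose

def mobius(n):
    """Moebius function computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # square factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def u(l, r):
    """u_l(r) = sum over ab = r of mu(a) * b * (log b)^l."""
    total = 0.0
    for a in range(1, r + 1):
        if r % a == 0:
            b = r // a
            total += mobius(a) * b * log(b) ** l
    return total

def mutativity_rhs(l, m, n):
    """Right-hand side of the mutativity relation (m, n assumed coprime)."""
    return sum(comb(l, i) * u(i, m) * u(l - i, n) for i in range(l + 1))

if __name__ == "__main__":
    for l in range(4):
        print(l, u(l, 4 * 9), mutativity_rhs(l, 4, 9))
```

For $l = 0$ the function reduces to Euler's function, $u_0 = \varphi$, which gives a quick independent check of the definition.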

This mutativity enables us to further decompose $M(i, 0, k, l)$ as a linear combination of forms of the type
\[
\sum_{m_1,m_2} \frac{x_{bm_1}x_{bm_2}}{m_1m_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)^i w(k, l; m_1m_2) \tag{6.100}
\]
where we have defined
\[
w(k, l; m) = \sum_{ab=m} \tau(a)\,u_l(b)\,v^{(k)}(b) \tag{6.101}
\]
which has the multiplicativity property that $w(k, l; mn)$, for $(m, n) = 1$, is a linear combination of the functions $w(k_1, l_1; m)\,w(k_2, l_2; n)$ with $k_1 + k_2 = k$, $l_1 + l_2 = l$.

Thus, in (6.100), extracting the common divisor of $m_1$ and $m_2$, then removing the resulting coprimality condition by Möbius inversion, and collecting the variables, all as in the Appendix to Section 4.3 (except that we keep the logarithm attached to the variables $x_{bm_1}$, $x_{bm_2}$ until the very end, and then expand the power again), will yield for $M(i, 0, k, l)$ a linear combination of expressions of the type
\[
(\log Q)^i \sum_k \nu(k)\,w_k\,y_k
\]
with
\[
\nu(k) = \frac{1}{k} \sum_{ad=k} \frac{\mu(d)\,h_1(d)^2\,h_2(a^2)}{ad} (\log a)^{i_1}(\log d)^{i_2},
\]
\[
w_k = \sum_m h_3(m)(\log m)^{i_3} x_{bkm}, \qquad y_k = \sum_m h_4(m)(\log m)^{i_4} x_{bkm}
\]
for some $i_1, i_2, i_3, i_4$ with $0 \le i_1 + i_2 \le i$, and $h_i = w(k_i, l_i)$ for some $k_i \le k$, $l_i \le l$. The total “weight” in logarithms must be at most 3,
\[
i + i_1 + i_2 + i_3 + i_4 + k_1 + l_1 + k_2 + l_2 + k_3 + l_3 + k_4 + l_4 \le 3.
\]

In practice, every logarithm from the store of 3 available which is diverted into one of the coefficient functions $h_i$ (so $k_i + l_i > 0$) or $\nu$ (so $i_1 + i_2 > 0$) has the effect of making the given form of lower order of magnitude than the main term $\Pi$ corresponding to $i + i_3 + i_4 = 3$, when evaluated for the chosen mollifier $(x_m)$: the estimate
\[
(\log Q)^i \sum_k \nu(k)\,w_k\,y_k = O\Bigl(\Pi\,\frac{(\log\log q)^B}{\log q}\Bigr) \tag{6.102}
\]
will hold, for some absolute constant $B > 0$, except in this case.

6.2.4 The preferred quadratic form, II

In this section, we will incorporate back the sum over $b$ into the quadratic forms; since this is everywhere present and propagates into all computations, the adaptation of the formulae and principles of the previous section is immediate.

We now start again by selecting a preferred quadratic form: in $M_{21}$, we select $M(3, 0, 0, 0)$. This only involves the multiplicative function $u = u_0$ and the function $v = v^{(0)}$, which by Lemma 22 is given by
\[
v(r) = \frac{\zeta(2)^4}{\zeta(4)}\,\tilde{v}(r), \qquad \tilde{v}(r) = N(r)^{-2} \prod_{p\,\|\,r} \frac{\tau(p)}{1 + p^{-2}}.
\]

Thus, only the constant factor comes out when performing the steps described previously, and we have the expression
\[
\Pi = M(3, 0, 0, 0) = \frac{\zeta(2)^4}{\zeta(4)} \sum_b \frac{1}{b} \sum_{m_1,m_2} \frac{x_{bm_1}x_{bm_2}}{m_1m_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)^3 w(m_1m_2)
\]
where $w(m) = w(0, 0; m)$ (compare (5.58)). Recall that
\[
w(m) = \sum_{ab=m} \tau(a)\,u(b)\,v(b).
\]
We expand the logarithm $\bigl(\log\frac{Q}{m_1m_2}\bigr)^3$ in $\Pi$ and thus, as in the case of $M_2$, we obtain the decomposition
\[
\Pi = \Pi(3, 0, 0) - 6\Pi(2, 1, 0) + 6\Pi(1, 1, 1) + 6\Pi(1, 2, 0) - 6\Pi(0, 1, 2) - 2\Pi(0, 0, 3) \tag{6.103}
\]
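The coefficients in (6.103) come from expanding $(L - a - b)^3$ with $L = \log Q$, $a = \log m_1$, $b = \log m_2$, and then grouping monomials that are exchanged by the symmetry $m_1 \leftrightarrow m_2$ of the quadratic form. A minimal sketch of this bookkeeping is given below; the function names are illustrative only, and the symmetry is modelled by summing each monomial over both orderings of $(a, b)$.

```python
def monomial(L, a, b, i, j, k):
    """The monomial L^i a^j b^k attached to Pi(i, j, k)."""
    return L**i * a**j * b**k

def sym(L, a, b, i, j, k):
    """Monomial summed over both orderings of (m1, m2), i.e. of (a, b)."""
    return monomial(L, a, b, i, j, k) + monomial(L, b, a, i, j, k)

def decomposition(L, a, b):
    """Right-hand side of (6.103), applied to symmetrised monomials."""
    return (sym(L, a, b, 3, 0, 0) - 6 * sym(L, a, b, 2, 1, 0)
            + 6 * sym(L, a, b, 1, 1, 1) + 6 * sym(L, a, b, 1, 2, 0)
            - 6 * sym(L, a, b, 0, 1, 2) - 2 * sym(L, a, b, 0, 0, 3))

def cube_symmetrised(L, a, b):
    """(L - a - b)^3 summed over both orderings of (a, b)."""
    return (L - a - b) ** 3 + (L - b - a) ** 3

if __name__ == "__main__":
    # Exact integer arithmetic, so equality can be tested directly.
    print(decomposition(7, 2, 3), cube_symmetrised(7, 2, 3))
```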

with
\[
\Pi(i, j, k) = \frac{\zeta(2)^4}{\zeta(4)} (\log Q)^i \sum_b \frac{1}{b} \sum_{m_1,m_2} \frac{w(m_1m_2)}{m_1m_2} (\log m_1)^j (\log m_2)^k\, x_{bm_1}x_{bm_2}.
\]

We state the output of the diagonalization procedure.


Lemma 45. For all $i, j, k$, we have
\[
\Pi(i, j, k) = \frac{\zeta(2)^4}{\zeta(4)} (\log Q)^i \sum_k \nu(k)\,y_k^{(j)}\,y_k^{(k)}
\]
with
\[
\nu(k) = \frac{1}{k} \sum_{abd=k} \frac{\mu(d)\,w(d)^2\,w(a^2)}{ad}, \qquad
y_k^{(j)} = \sum_m \frac{w(m)}{m} (\log m)^j x_{km}
\]
(and we let as usual $y_k = y_k^{(0)}$).

Proof. This is what has been described before. □

Notice that $(x_m)$ is supported on squarefree integers if and only if $(y_k)$ is also.

We will choose the mollifier by optimizing $\Pi(3, 0, 0)$, which we simply denote by $\Pi$, with respect to the linear form $M_1$.

6.2.5 Optimization of the preferred form

We will first prove some preliminary lemmas, and some more notation is needed. Let $g$ be the Dirichlet convolution inverse of $w$. The change of variable from $x_m$ to $y_k$ is inverted by
\[
x_m = \sum_k \frac{g(k)}{k}\,y_{km}. \tag{6.104}
\]
Furthermore, let $j$ and $j_1$ be the arithmetic functions
\[
j(k) = \frac{1}{k} \sum_{ab=k} g(a)\,d_{-1}(b), \qquad
j_1(k) = \frac{1}{k} \sum_{ab=k} g(a)\,d_{-1}(b)\Bigl(\log\frac{q}{b}\Bigr) \tag{6.105}
\]
so that the (main term of the) linear form $M_1$ is expressed by
\[
M_1 = \zeta(2)^2 \sum_k j_1(k)\,y_k + 2\zeta(2)\zeta'(2) \sum_k j(k)\,y_k. \tag{6.106}
\]

Lemma 46. The multiplicative function w satisfies

\[
w(k) = \tau(k) \prod_{p\mid k} \frac{1 + p^{-1}}{1 + p^{-2}} \tag{6.107}
\]

for all squarefree integers k and

w(p2) = 3 +1p− 1p2

+ 4p− 1p2 − 1

(6.108)

for all primes p.


The multiplicative function $\nu$ satisfies
\[
\nu(k) = \frac{1}{k} \prod_{p\mid k} A(p^{-1}) \tag{6.109}
\]
for all squarefree integers $k$, where $A$ is the rational function
\[
A = \frac{(1 - X^2)^3}{(1 + X^2)^2}.
\]

Proof. Multiplicativity is obvious in all cases, and the rest is a matter of computing without mistakes. Note that the first two statements are also derived in the proof of Lemma 24.

The last computation gives a mysteriously clean result; recall that
\[
\nu(p) = \frac{1}{p}\Bigl(1 + \frac{w(p^2)}{p} - \frac{w(p)^2}{p}\Bigr)
\]
and it doesn't appear at first sight that (6.107) and (6.108) will give such a simplification. □

Lemma 47. The arithmetic function $j$ is multiplicative with
\[
j(k) = \frac{\mu(k)}{k} \prod_{p\mid k} B(p^{-1}) \tag{6.110}
\]
for all squarefree integers $k$, where $B$ is the rational function
\[
B = \frac{(1 - X^2)(1 + X)}{1 + X^2}.
\]
The function $j_1$ satisfies
\[
j_1(k) = j(k)(\log qk + O(1)). \tag{6.111}
\]

Proof. The first statement is another direct computation, using the fact that
\[
g(p) = -w(p) = -2\,\frac{1 + p^{-1}}{1 + p^{-2}}.
\]
We have $j_1(k) = (\log q)\,j(k) - j_2(k)$ where
\[
j_2(k) = \frac{1}{k} \sum_{ab=k} g(a)\,d_{-1}(b)(\log b)
\]
and from the multiplicativity of $j$, we see that $j_2$ satisfies the mutativity property
\[
j_2(mn) = j(m)\,j_2(n) + j_2(m)\,j(n)
\]
for $(m, n) = 1$; hence by induction, for $k$ squarefree,
\[
j_2(k) = \sum_{p\mid k} j_2(p)\,j\Bigl(\frac{k}{p}\Bigr)
= j(k) \sum_{p\mid k} p^{-1}(1 + p^{-1})(\log p)\,j(p)^{-1}
= -j(k) \sum_{p\mid k} (\log p)\,\frac{1 + p^{-2}}{1 - p^{-2}}
= -j(k)(\log k + O(1))
\]
using (6.110), which was just proved, along the way. □
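The “direct computation” behind (6.110) can be checked at a prime with exact rational arithmetic. The sketch below is hedged: it assumes that $d_{-1}(p) = 1 + p^{-1}$ (the notation $d_{-1}$ is defined elsewhere in the text, so this value is an assumption here) and that $g(1) = d_{-1}(1) = 1$; the function names are illustrative.

```python
from fractions import Fraction

def B(X):
    """The rational function B of Lemma 47."""
    return (1 - X**2) * (1 + X) / (1 + X**2)

def j_from_definition(p):
    """j(p) = (1/p) * sum over ab = p of g(a) d_{-1}(b)."""
    X = Fraction(1, p)
    g_p = -2 * (1 + X) / (1 + X**2)   # g(p) = -w(p), using (6.107)
    d_p = 1 + X                        # ASSUMPTION: d_{-1}(p) = 1 + 1/p
    return (g_p + d_p) / p             # terms (a, b) = (p, 1) and (1, p)

def j_from_formula(p):
    """Right-hand side of (6.110) at k = p, where mu(p) = -1."""
    return Fraction(-1, p) * B(Fraction(1, p))

def log_coefficient(p):
    """Coefficient of log p in j_2(p) / j(p), under the same assumption."""
    X = Fraction(1, p)
    return (X * (1 + X)) / j_from_formula(p)

if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        print(p, j_from_definition(p), j_from_formula(p), log_coefficient(p))
```

Under these assumptions the coefficient of $\log p$ equals $-(1 + p^{-2})/(1 - p^{-2})$, in agreement with the last step of the proof.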


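The last lemma will be used for summation by parts.

Lemma 48. The Dirichlet series
\[
\sum_{k\ge 1} \frac{\mu(k)^2 j(k)^2}{\nu(k)}\,k^{-s}
\]
is absolutely convergent for $\mathrm{Re}(s) > 0$ and extends to a meromorphic function for $\mathrm{Re}(s) > -1$ with a simple pole at $s = 0$ with residue
\[
R = \frac{\zeta(2)}{\zeta(4)}.
\]

Proof. The previous efforts give the Euler product
\begin{align*}
\sum_{k\ge 1} \frac{\mu(k)^2 j(k)^2}{\nu(k)}\,k^{-s}
&= \prod_p \Bigl(1 + p^{-(s+1)}\,\frac{(1 + p^{-2})^2}{(1 - p^{-2})^3} \cdot \frac{(1 - p^{-2})^2(1 + p^{-1})^2}{(1 + p^{-2})^2}\Bigr) \\
&= \prod_p \Bigl(1 + p^{-(s+1)}\,\frac{1 + p^{-1}}{1 - p^{-1}}\Bigr) \\
&= \frac{\zeta(s+1)}{\zeta(2(s+1))} \prod_p \Bigl(1 + \frac{2p^{-(s+1)}}{(1 + p^{-(s+1)})(p - 1)}\Bigr)
\end{align*}
and the second Euler product is indeed absolutely convergent for $\mathrm{Re}(s) > -1$. Hence the lemma holds, with
\[
R = \frac{1}{\zeta(2)} \prod_p \Bigl(1 + \frac{2}{p^2 - 1}\Bigr) = \frac{1}{\zeta(2)} \prod_p \frac{p^2 + 1}{p^2 - 1} = \frac{1}{\zeta(2)} \cdot \frac{\zeta(2)^2}{\zeta(4)}
\]
as announced. □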
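The value of the residue can be confirmed numerically: since $\zeta(2) = \pi^2/6$ and $\zeta(4) = \pi^4/90$, the Euler product $\prod_p (p^2+1)/(p^2-1)$ should equal $\zeta(2)^2/\zeta(4) = 5/2$, so that $R = 15/\pi^2$. The following sketch truncates the product at a finite bound; the function names and the truncation point are illustrative choices, not part of the text.

```python
from math import pi

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def truncated_product(limit):
    """Product of (p^2 + 1) / (p^2 - 1) over primes p <= limit."""
    prod = 1.0
    for p in primes_up_to(limit):
        prod *= (p * p + 1) / (p * p - 1)
    return prod

def residue_estimate(limit):
    """Approximation of R = (1 / zeta(2)) * prod_p (p^2 + 1) / (p^2 - 1)."""
    zeta2 = pi**2 / 6
    return truncated_product(limit) / zeta2

if __name__ == "__main__":
    print(truncated_product(10**5), residue_estimate(10**5), 15 / pi**2)
```

The truncated product increases towards its limit, so the printed values approach $5/2$ and $15/\pi^2$ from below.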

We can see from these computations that
\[
w(p) = \tau(p) + O(p^{-1}), \qquad j_1(p) = \mu(p)(\log qp) + O(1), \qquad \nu(p) = \frac{1}{p} + O(p^{-2})
\]
so those coefficients are very close to the coefficients used in proving the harmonic case of the theorem. This is of course not much of a surprise.

We now choose $(y_k)$ to optimize $\Pi$ with respect to $M_1$: let
\[
y_k = \begin{cases} \mu(k)^2\,\dfrac{j_1(k)}{\nu(k)}, & \text{for } k \le M, \\[1ex] 0, & \text{for } k > M. \end{cases} \tag{6.112}
\]
This is supported on squarefree numbers and we immediately see that the corresponding $(x_m)$ satisfies condition (6.5).


Proposition 20. For this choice of mollifier, we have
\[
M_1 = \frac{\zeta(2)^3}{\zeta(4)}\,m_1(\Delta)(\log q)^3 + O((\log q)^2) \tag{6.113}
\]
where $m_1$ is the polynomial defined in (6.7), and
\[
\Pi = \frac{\zeta(2)^5}{\zeta(4)^2}\,m_{21}(\Delta)(\log q)^6 + O((\log q)^5(\log\log q)) \tag{6.114}
\]
where $m_{21}$ is the polynomial defined in (6.10).

In other words, apart from constant factors, the first moment and the chosen main term of the second moment are the same as the corresponding harmonic ones: compare this with Lemma 37 and Corollary 4.

Proof. First step. From (6.106), we compute
\begin{align*}
M_1 &= \zeta(2)^2 \sum_{k\le M} j_1(k)\,y_k + 2\zeta(2)\zeta'(2) \sum_{k\le M} j(k)\,y_k \\
&= \zeta(2)^2 \sum_{k\le M} \frac{\mu(k)^2 j_1(k)^2}{\nu(k)} + 2\zeta(2)\zeta'(2) \sum_{k\le M} \frac{\mu(k)^2 j(k)\,j_1(k)}{\nu(k)} \\
&= \zeta(2)^2 \sum_{k\le M} \frac{\mu(k)^2 j(k)^2}{\nu(k)}\bigl((\log qk)^2 + O(\log qk)\bigr) + 2\zeta(2)\zeta'(2) \sum_{k\le M} \frac{\mu(k)^2 j(k)^2}{\nu(k)}(\log qk + O(1)) \quad \text{(by Lemma 47)} \\
&= \frac{\zeta(2)^3}{\zeta(4)}\,m_1(\Delta)(\log q)^3 + O((\log q)^2)
\end{align*}
(by partial summation from Lemma 48, see Lemma 37). This proves the first statement.

Second step. We claim that for $i = 1, 2, 3$, we have
\[
y_k^{(i)} = c_i\,\frac{\mu(k)\,j(k)}{\nu(k)}\Bigl(\log\frac{M}{k}\Bigr)^i (\log q^{i+1}M^ik) + O\Bigl(\frac{j(k)}{\nu(k)}(\log q)^i(\log\log q)^B\Bigr) \tag{6.115}
\]
(for some absolute constant $B$), where $c_1 = -1$, $c_2 = \frac{1}{3}$, $c_3 = 0$, namely the variables $y_k^{(i)}$ behave just like their harmonic analogues, see Lemma 38.

We write again
\[
y_k^{(i)} = \sum_{\ell\le M/k} \frac{w(\ell)}{\ell}\,\Lambda_i(\ell)\,y_{k\ell}
\]
and perform the same calculations leading to Lemma 38, involving multiple sums over primes. But since
\[
w(\ell) = \tau(\ell) + O(\ell^{-1}), \qquad j_1(\ell) = \mu(\ell)(\log q\ell), \qquad \nu(\ell) = \ell^{-1} + O(\ell^{-2})
\]
we see that applying those approximations in a sum over numbers with exactly $j \le i$ prime factors provides the same first term as there, while in the others the sum over at least one of the prime variables involved becomes convergent, so it can be summed trivially, or one logarithm disappears, and in any case, those other terms are at least one logarithm smaller. For instance, with one prime ($i = 1$)
\[
y_k^{(1)} = \sum_{\ell\le M/k} \frac{\tau(\ell)}{\ell}(\log\ell)\,\frac{\mu(k\ell)\,j_1(k\ell)}{\nu(k\ell)} + O\Bigl(\sum_{\ell\le M/k} \frac{\tau(\ell)(\log\ell)}{\ell^2}\,\frac{\mu(k\ell)\,j_1(k\ell)}{\nu(k\ell)}\Bigr);
\]
the estimated term is clearly
\[
O\Bigl(\frac{j_1(k)}{\nu(k)}\Bigr) = O\Bigl(\frac{j(k)}{\nu(k)}(\log q)\Bigr),
\]
now continue with $j_1$, etc.

Third step. The formula (6.114) holds. Using the result of the second step (and the first step for $\Pi$ itself), this is the same as the proof of Lemma 39, using Lemma 48 to perform the summation by parts, and the decomposition (6.103) to assemble the various parts. □

6.2.6 The second part of the main term

We now turn our attention to the quadratic form $M_{22}$ given by (6.96); it gives rise, by the transformations described in Section 6.2.3, to various terms, only one of which will contribute to the main term, namely that derived from (6.99) with $i = 1$ and $k = l = 0$, which we denote by $M'_{22}$:
\[
M'_{22} = \frac{\zeta(2)^4}{\zeta(4)} \sum_b \frac{1}{b} \sum_{m_1,m_2} \frac{x_{bm_1}x_{bm_2}}{m_1m_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr)\varpi(m_1m_2)
\]
where the arithmetic function $\varpi$ is
\[
\varpi(m) = \sum_{ab=m} T(a)\,u(b)\,v(b).
\]
Using the mutativity of $\varpi$,
\[
\varpi(m_1m_2) = \varpi(m_1)\,w(m_2) + w(m_1)\,\varpi(m_2),
\]
which follows from (6.48), the diagonalization yields
\[
M'_{22} = 2\,\frac{\zeta(2)^4}{\zeta(4)} \sum_k \nu(k) \sum_{m_1,m_2} \frac{w(m_1)\,\varpi(m_2)}{m_1m_2}\,x_{km_1}x_{km_2}\Bigl(\log\frac{Q}{m_1m_2}\Bigr) + (\text{other terms})
\]
where the other contributions will be of smaller order (the function $T$, of “weight” 2, appears only in the coefficient function $\nu(k)$ attached to $k$). We keep the notation $M'_{22}$ for the first part.

This decomposes, exactly as in the harmonic context, as
\[
M'_{22} = 2\bigl(\Pi(1, 0, 0) - \Pi(0, 1, 0) - \Pi(0, 0, 1)\bigr)
\]


where
\[
\Pi(i, j, k) = \frac{\zeta(2)^4}{\zeta(4)} (\log Q)^i \sum_k \nu(k)\,y_k^{(j)}\,z_k^{(k)}
\]
and
\[
z_k^{(k)} = \sum_m \frac{\varpi(m)}{m} (\log m)^k x_{km}.
\]

Proposition 21. With the mollifier given by (6.112), we have
\[
M'_{22} = \frac{\zeta(2)^5}{\zeta(4)^2}\,m_{22}(\Delta)(\log q)^6 + O((\log q)^5(\log\log q)^B)
\]
for some (absolute) $B > 0$.

So this term behaves again as its harmonic counterpart (counterpoint?).

Proof. First step. We express $z_k = z_k^{(0)}$ and $z_k^{(1)}$ in terms of $y_k$:
\[
z_k = 2 \sum_{\ell\le M/k} \frac{(\log\ell)\,\Lambda(\ell)}{\ell}\,y_{k\ell},
\]
\[
z_k^{(1)} = \sum_{\ell\le M/k} \frac{w(\ell)\,\Lambda(\ell)}{\ell}\,z_{k\ell} + \sum_{\ell\le M/k} \frac{\varpi(\ell)\,\Lambda(\ell)}{\ell}\,y_{k\ell}
\]
(compare Lemma 40).

The first result, exactly the same as the corresponding one in Lemma 40, derives from the Dirichlet series identities
\[
\sum_m \varpi(m)\,m^{-s} = \Bigl(\sum_m T(m)\,m^{-s}\Bigr)\Bigl(\sum_m u(m)\,v(m)\,m^{-s}\Bigr),
\]
\[
\sum_m w(m)\,m^{-s} = \Bigl(\sum_m \tau(m)\,m^{-s}\Bigr)\Bigl(\sum_m u(m)\,v(m)\,m^{-s}\Bigr),
\]
since to express $z_k$ in terms of $y_k$ we need only compute the Dirichlet convolution $\varpi \star g$ corresponding to dividing those two series. But then the Dirichlet series for $uv$ simplifies, which means the coefficients for $z_k$ are indeed as in Lemma 40.

The computation of $z_k^{(1)}$ is also the same as there.

Second step. We have
\[
z_k = -y_k^{(2)} + O\Bigl(\frac{j(k)}{\nu(k)}(\log q)^2\log\log q\Bigr),
\]
\[
z_k^{(1)} = O\Bigl(\frac{j(k)}{\nu(k)}(\log q)^3\log\log q\Bigr)
\]
(compare Lemma 41).

Observe that for $\ell$ prime
\[
\varpi(\ell) = T(\ell), \qquad w(\ell) = \tau(\ell) + O(\ell^{-1})
\]
and proceed as in the proof of that lemma: the claim follows for the same reason explained in the proof of Proposition 20.

Third step. The proposition holds: this is a consequence of the previous step, and the method of proof in the harmonic case, with Lemma 48 for the summations by parts. □


6.2.7 The residual quadratic forms

It remains to prove that the (many) quadratic forms that were disregarded during the transformation process are, when evaluated for the chosen mollifier $(x_m)$, of smaller order of magnitude.

They are the quadratic forms described in (6.100), with the sum over $b$ added, plus some other similar ones coming from $M_{22}$, which we have not written down explicitly.

To estimate their value, one has to express in terms of $y_k$ the linear forms
\[
\sum_m h(m)(\log m)^i x_{km}
\]
described below (6.100), with the sum over $b$ inserted. This amounts to computing the Dirichlet convolutions $g \star h(\log)^i$, and the claim that those residual terms are smaller is proved by showing that the Dirichlet generating series has a pole of order at most $i$ at $s = 0$. Since $g$, being the inverse of $w$, has a zero of order 2 at $s = 0$ (see Lemma 46), and the product $h(\log)^i$ corresponds to the $i$-th derivative of $h = w(k, l)$, only the latter needs to be computed, and it will be found to have a pole of order 2 at $s = 0$, just as $w$ does, because in the definition (6.101) of $w(k, l)$, the factor $u_l(b)\,v^{(k)}(b)$ is quite small (by Lemma 44), and only perturbs the behavior of $w(k, l)$, compared to that of $w$, beyond the line $\mathrm{Re}(s) = 0$.

Then the coefficients $\nu(k)$ are bounded by
\[
\nu(k) \ll \frac{(\log\log k)^B}{k}
\]
for some absolute $B > 0$, using again Lemma 44, and from this, the definition of the mollifier and Lemma 48, the required estimate will follow by summation by parts. The quadratic forms coming from $M_{22}$ are treated in exactly the same way.

The task of giving any supplementary details is left to the reader, if more is felt to be needed at this point: as already mentioned, getting a supplementary contribution would be quite remarkable, and this sketch shows that everything can be checked.

6.2.8 Conclusion

From (6.93) and Proposition 20, for the first moment, and (6.94), Propositions 20 and 21 for the second moment, we now conclude that for $\Delta + 2\kappa < \frac{1}{4}$ we have
\[
M_1 = \frac{\zeta(2)^3}{\zeta(4)}\,m_1(\Delta)(\log q)^3 + O((\log q)^2(\log\log q)^B)
\]
\[
M_2 = \frac{\zeta(2)^5}{\zeta(4)^2}\,m_2(\Delta)(\log q)^6 + O((\log q)^5(\log\log q)^B)
\]
for some $B > 0$. Therefore, by (6.91) and (6.92),
\[
M_1^n = \frac{\zeta(2)^2}{\zeta(4)}\,\dim J_0(q)\,m_1(\Delta)(\log q)^3 + O(q(\log q)^2(\log\log q)^B)
\]
\[
M_2^n = \frac{\zeta(2)^4}{\zeta(4)^2}\,\dim J_0(q)\,m_2(\Delta)(\log q)^6 + O(q(\log q)^5(\log\log q)^B)
\]


and finally, for all $q$ prime (large enough), we obtain
\[
\bigl|\{f \in S_2(q)^* \mid \varepsilon_f = -1,\ L'(f, \tfrac{1}{2}) \ne 0\}\bigr| \ge \frac{(M_1^n)^2}{M_2^n} = \frac{m_1(\Delta)^2}{m_2(\Delta)}\,\dim J_0(q) + O\Bigl(q\,\frac{(\log\log q)^B}{\log q}\Bigr).
\]
We know already that
\[
\frac{m_1(\frac{1}{4})^2}{m_2(\frac{1}{4})} = \frac{19}{54}
\]
and so by letting $\Delta$ go to $\frac{1}{4}$, this inequality is the goal (6.1) of this chapter (in a stronger version), and thus the proof of Theorem 6 is completed, through the non-vanishing Theorem 10.

This probably calls for a celebratory drink.


Appendix: Extending the mollifier

In this brief appendix, we sketch how some of the arguments of the previous chapter can be extended, using techniques of Iwaniec-Sarnak [IS2]. The effect of this is that a mollifier of length $q^\Delta$ with $\Delta < \frac{1}{2}$ can be used, instead of merely $\Delta < \frac{1}{4}$; this yields a further improvement of the constant in the non-vanishing theorem. Precisely, we have

Proposition 22. The approximate formula of Proposition 19 holds for all $M = q^\Delta$ with $\Delta < \frac{1}{2}$, possibly with a different polynomial $P_1$ in $M_3$ (still of degree at most 2).

From this we deduce immediately:

Corollary 6. For any $\varepsilon > 0$, and any prime number $q$ large enough in terms of $\varepsilon$, we have
\[
\bigl|\{f \in S_2(q)^* \mid L(f, \tfrac{1}{2}) = 0,\ L'(f, \tfrac{1}{2}) \ne 0\}\bigr| \ge \Bigl(\frac{7}{16} - \varepsilon\Bigr)\dim J_0(q),
\]
and
\[
\operatorname{rank} J_0(q) \ge \Bigl(\frac{7}{16} - \varepsilon\Bigr)\dim J_0(q).
\]

Indeed, the approximate formula for $M_1$ was already valid in the larger range $\Delta < \frac{1}{2}$, and the deduction of (6.1) from Proposition 19 did not use the hypothesis $\Delta < \frac{1}{4}$ in any other way (except in Lemma 43, which checks the applicability of the weight lifting technique of Chapter 3, and which can be immediately extended to this situation). Now we simply compute that
\[
\frac{m_1(\frac{1}{2})^2}{m_2(\frac{1}{2})} = \frac{7}{16} = 0.4375.
\]

Sketch of the proof. Only the use of Lemma 11 in (6.29) (when considering $A^h[\varepsilon_f^-\lambda_f(m)L'(f, \frac{1}{2})^2]$), which leads to (6.30), has to be revised, since the approximate formula (17) for $X(m)$ is meaningful for $\Delta < \frac{1}{2}$ (recall that it is applied for $m = m_1m_2$, $m_i \le M$).

This means we must reconsider the decomposition of the Delta symbol for odd forms: for $m < q$, from the Petersson formula and Lemma 10 we have
\[
\Delta^-(m, n) = \delta(m, n) - \mathcal{J}(m, n) + \mathcal{J}'(m, n) + O\Bigl(\frac{\sqrt{mn}}{q^2}(\log q)^2\Bigr)
\]
and the error term is now good enough for $\Delta < \frac{1}{2}$. Of course, the part involving $\mathcal{J}'(m, n)$ is the one leading to $X(m)$, and this doesn't require any modification. So we investigate more carefully the remainder term with $\mathcal{J}(m, n)$.

Let us denote this by $\mathcal{X}(m)$, so
\begin{align*}
\mathcal{X}(m) &= \sum_{n\ge 1} \frac{\tau(n)}{\sqrt{n}}\,W\Bigl(\frac{4\pi^2 n}{q}\Bigr)\mathcal{J}(n, m) \\
&= \frac{2\pi}{q} \sum_{r\ge 1} \frac{1}{r} \sum_{n\ge 1} \tau(n)\,S(m, n; qr)\,t(n)
\end{align*}
with
\[
t(x) = \frac{1}{\sqrt{x}}\,J_1\Bigl(\frac{4\pi\sqrt{mx}}{qr}\Bigr) W\Bigl(\frac{4\pi^2 x}{q}\Bigr).
\]


Opening the Kloosterman sum $S(m, n; qr)$, we get
\[
\mathcal{X}(m) = \frac{2\pi}{q} \sum_{r\ge 1} \frac{1}{r}\ \sum^{*}_{x \bmod qr} e\Bigl(\frac{m\bar{x}}{qr}\Bigr) \sum_{n\ge 1} \tau(n)\,e\Bigl(\frac{nx}{qr}\Bigr) t(n)
\]
and to the inner sum over $n$ we apply Proposition 16 and get
\begin{align*}
\mathcal{X}(m) = \frac{2\pi}{q} \sum_{r\ge 1} \frac{1}{r} \times \Bigl\{ &\frac{2}{qr}\,S(m, 0; qr) \int_0^{+\infty} \Bigl(\log\frac{\sqrt{x}}{qr} + \gamma\Bigr) t(x)\,dx \\
&- \frac{2\pi}{qr} \sum_{h\ge 1} \tau(h)\,S(m - h, 0; qr) \int_0^{+\infty} Y_0\Bigl(\frac{4\pi\sqrt{hx}}{qr}\Bigr) t(x)\,dx \\
&+ \frac{4}{qr} \sum_{h\ge 1} \tau(h)\,S(m + h, 0; qr) \int_0^{+\infty} K_0\Bigl(\frac{4\pi\sqrt{hx}}{qr}\Bigr) t(x)\,dx \Bigr\}.
\end{align*}

This is reminiscent of the treatment of $\mathcal{X}'(m)$ in Section 6.1.3, but now the main contribution will not come from the first integral, but from the frequency $h = m$ in the second sum, where the Ramanujan sum degenerates completely
\[
S(0, 0; qr) = \varphi(qr)
\]
and thus no cancellation occurs. We will compute exactly this contribution, but will not justify completely here that the remaining terms are smaller. This is done (in more precise form) in the forthcoming paper [IS2], for the case of the second moment of the special values $L(f, \frac{1}{2})$, but there is no important difference. Actually, there is again a technical difficulty of convergence due to the fact that the weight is 2 and $J_1$ doesn't go to zero quickly enough near 0; it can be treated by inserting in $t(x)$ a test function $\xi(x)$ vanishing near 0 before performing the Voronoï summation, exactly as before.

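However we can show quickly that the first term is small: using simply
\[
|S(m, 0; qr)| \le qr, \qquad J_1(x) \ll x
\]
and Lemma 27 we find
\[
\frac{4\pi}{q^2} \sum_{r\ge 1} \frac{1}{r^2}\,S(m, 0; qr) \int_0^{+\infty} \Bigl(\log\frac{\sqrt{x}}{qr} + \gamma\Bigr) t(x)\,dx
\ll \frac{(\log q)^4}{q} \sum_{r\ge 1} \frac{1}{r} \int_0^q \frac{\sqrt{mx}}{qr}\,\frac{dx}{\sqrt{x}}
\ll \frac{\sqrt{m}}{q}(\log q)^4
\]
which (although very crude) is good enough to extend the mollifier to $\Delta = \frac{1}{2}$.

The frequency $h = m$ that we mentioned is given by
\begin{align*}
\mathcal{Y}(m) &= -\frac{4\pi^2}{q^2}\,\tau(m) \sum_{r\ge 1} \frac{\varphi(qr)}{r^2} \int_0^{+\infty} Y_0\Bigl(\frac{4\pi\sqrt{mx}}{qr}\Bigr) J_1\Bigl(\frac{4\pi\sqrt{mx}}{qr}\Bigr) W\Bigl(\frac{4\pi^2 x}{q}\Bigr) \frac{dx}{\sqrt{x}} \\
&= -\frac{2\pi}{q}\,\frac{\tau(m)}{\sqrt{m}} \sum_{r\ge 1} \frac{\varphi(qr)}{r} \int_0^{+\infty} Y_0(x)\,J_1(x)\,W\Bigl(\frac{qr^2x^2}{4m}\Bigr) dx
\end{align*}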


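which is rewritten as a complex integral using the definition (6.20) of the function $W$
\[
\mathcal{Y}(m) = -\frac{2\pi}{q}\,\frac{\tau(m)}{\sqrt{m}} \times \frac{1}{2i\pi} \int_{(1/2)} \zeta_q(1 + 2s)\,Z(s)\Bigl(\frac{4m}{q}\Bigr)^s \Gamma(s + 1)^2\,G(s)\,H(s)\,\frac{ds}{s^3}
\]
where
\[
Z(s) = \sum_{r\ge 1} \varphi(qr)\,r^{-(1+2s)}, \qquad
H(s) = \int_0^{+\infty} Y_0(x)\,J_1(x)\,x^{-2s}\,dx.
\]

Lemma 49. We have for all $q$ prime
\[
Z(s) = \varphi(q)\,\frac{\zeta(2s)}{\zeta_q(1 + 2s)}, \qquad
H(s) = \frac{1}{2\sqrt{\pi}}\,\frac{\Gamma(s)}{\Gamma(\frac{1}{2} - s)}\,\frac{\Gamma(1 - s)^2}{\Gamma(1 + s)^2}.
\]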

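Proof. For the first:
\begin{align*}
Z(s) &= \sum_{n\ge 0} \sum_{\substack{r\ge 1\\ (r,q)=1}} \varphi(q^{n+1}r)\,(q^n r)^{-(1+2s)} \\
&= \sum_{n\ge 0} \varphi(q^{n+1})\,q^{-n(1+2s)} \sum_{(r,q)=1} \varphi(r)\,r^{-(1+2s)} \\
&= \frac{\zeta_q(2s)}{\zeta_q(1 + 2s)}\,(q - 1)(1 - q^{-2s})^{-1} \\
&= \varphi(q)\,\frac{\zeta(2s)}{\zeta_q(1 + 2s)}
\end{align*}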

while for the second we have by [G-R, 6.576.5, 6.576.6]

H(s) =1π

2−2s cos(πs)Γ(12 − s)

2F (12 − s,

12 − s; 2; 1)

where F is the Gauss hypergeometric function, which is here an elementary function([G-R, 9.122.1])

F (12 − s,

12 − s; 2; 1) =

Γ(2s)Γ(1

2 + s)2

and the formula announced follows from this and

Γ(s)Γ(s+ 12) = π1/221−2sΓ(2s), Γ(1

2 − s)Γ(12 + s) cos(πs) = π.

2
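The formula for $Z(s)$ in Lemma 49 can be tested numerically at a real point where all the series converge absolutely, for instance $q = 5$ and $s = 3/2$. This is only a sketch: it assumes that $\zeta_q$ denotes the zeta function with the Euler factor at $q$ removed (as the third line of the computation of $Z(s)$ suggests), and the function names and truncation parameters are illustrative.

```python
from math import isclose

def totients(n):
    """Table of Euler's function phi(0..n) computed by a sieve."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                      # p is prime
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p
    return phi

def zeta(w, terms=200000):
    """Truncated zeta(w) for real w > 1, with an integral tail correction."""
    return sum(k ** -w for k in range(1, terms + 1)) + terms ** (1 - w) / (w - 1)

def Z_direct(q, s, R=20000):
    """Truncation of Z(s) = sum over r >= 1 of phi(q r) r^(-(1+2s))."""
    phi = totients(q * R)
    return sum(phi[q * r] * r ** -(1 + 2 * s) for r in range(1, R + 1))

def Z_formula(q, s):
    """phi(q) zeta(2s) / zeta_q(1+2s) for q prime."""
    w = 1 + 2 * s
    zeta_q = zeta(w) * (1 - q ** -w)         # ASSUMPTION: Euler factor at q removed
    return (q - 1) * zeta(2 * s) / zeta_q

if __name__ == "__main__":
    print(Z_direct(5, 1.5), Z_formula(5, 1.5))
```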

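Hence we arrive at
\[
\mathcal{Y}(m) = -2\pi\,\frac{\varphi(q)}{q}\,\frac{\tau(m)}{\sqrt{m}} \times \frac{1}{2i\pi} \int_{(1/2)} \frac{1}{2\sqrt{\pi}}\,\zeta(2s)\,\frac{\Gamma(s)}{\Gamma(\frac{1}{2} - s)}\Bigl(\frac{4m}{q}\Bigr)^s G(s)\,\Gamma(1 - s)^2\,\frac{ds}{s^3}.
\]

The functional equation of the Riemann zeta function yields
\[
\zeta(2s)\,\Gamma(s) = \pi^{2s - \frac{1}{2}}\,\zeta(1 - 2s)\,\Gamma(\tfrac{1}{2} - s)
\]
so that
\[
\mathcal{Y}(m) = -\frac{\varphi(q)}{q}\,\frac{\tau(m)}{\sqrt{m}} \times \frac{1}{2i\pi} \int_{(1/2)} \zeta(1 - 2s)\Bigl(\frac{4\pi^2 m}{q}\Bigr)^s G(s)\,\Gamma(1 - s)^2\,\frac{ds}{s^3}.
\]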
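The functional equation of the Riemann zeta function, in the standard form $\zeta(2s)\Gamma(s) = \pi^{2s-1/2}\zeta(1-2s)\Gamma(\frac{1}{2}-s)$, can be sanity-checked at $s = 1$ and $s = 2$ using the classical special values $\zeta(2) = \pi^2/6$, $\zeta(4) = \pi^4/90$, $\zeta(-1) = -1/12$ and $\zeta(-3) = 1/120$. The sketch below hard-codes these values in a small table; the names `ZETA`, `lhs` and `rhs` are illustrative.

```python
from math import gamma, pi, isclose

# Classical special values of the Riemann zeta function.
ZETA = {2: pi**2 / 6, 4: pi**4 / 90, -1: -1 / 12, -3: 1 / 120}

def lhs(s):
    """zeta(2s) * Gamma(s) for s in {1, 2}."""
    return ZETA[2 * s] * gamma(s)

def rhs(s):
    """pi^(2s - 1/2) * zeta(1 - 2s) * Gamma(1/2 - s) for s in {1, 2}."""
    return pi ** (2 * s - 0.5) * ZETA[1 - 2 * s] * gamma(0.5 - s)

if __name__ == "__main__":
    for s in (1, 2):
        print(s, lhs(s), rhs(s))
```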

Moving the integration to the line $\mathrm{Re}(s) = 1$ (this is cosmetic; remember that $G(1) = 0$), the integral can be estimated directly
\[
\frac{1}{2i\pi} \int_{(1)} \zeta(1 - 2s)\Bigl(\frac{4\pi^2 m}{q}\Bigr)^s G(s)\,\Gamma(1 - s)^2\,\frac{ds}{s^3} \ll \frac{m}{q}
\]
which gives the bound
\[
\mathcal{Y}(m) \ll \frac{\sqrt{m}}{q}\,\tau(m).
\]

This shows that this degenerate frequency is also small enough for a mollifier of length up to $q^{1/2}$. However, instead of estimating, a much better treatment comes from remarking that by shifting the contour of integration to $\mathrm{Re}(s) = -\frac{1}{2}$, and changing $s$ into $-s$, the resulting integral is (up to the change of $\zeta$ into $\zeta_q$) exactly the same as the one giving $W(4\pi^2 m/q)$, with the result that this will essentially cancel the diagonal term arising from the Kronecker symbol $\delta(m, n)$ in the Petersson formula: the “true” main term is then the residue at $s = 0$ of the function being integrated. This approach is further developed and polished in [IS2].

To finish the proof of Proposition 22, we would have to estimate the contributions of the other frequencies, as well as that involving the $K_0$ function. But for this we refer the reader to [IS2] again, or urge him to exercise his own talents...


Conclusion

Lear: First let me talk with this philosopher.
What is the cause of thunder?

William Shakespeare, “King Lear”

What have we learned? Concerning the lower bound, the result is quite satisfying. Even if the Birch and Swinnerton-Dyer conjecture was proved for $J_0(q)$, making a good lower bound obvious by the simple argument of the sign of the functional equation, the non-vanishing theorem for the special values of the derivative of the L-functions would retain its value, and its interpretation as progress towards Brumer's conjecture.

On the other hand, the upper bound for the rank of $J_0(q)$ still assumes that the Birch and Swinnerton-Dyer conjecture holds. Of course, this seems inevitable if analytic methods are to be applied. But how far are algebraic methods from proving the unconditional inequality
\[
\operatorname{rank} J_0(q) \ll \dim J_0(q) \tag{6.116}
\]
with an absolute implied constant? Answering, or exploring, this question is important for two reasons: to gauge precisely what is the price paid by assuming the truth of the conjecture; and because if it turned out that (6.116) was within reach, then together with the analytic upper bound on the order of vanishing of the L-function of $J_0(q)$ at $s = \frac{1}{2}$, it would also bring fresh evidence to the Birch and Swinnerton-Dyer conjecture by showing how, in the case of $J_0(q)$, the rank and the order of vanishing are of the same order of magnitude.

The exploration of this matter is naturally more algebraic and, because of my limited knowledge, more naïve; I wish to thank here B. Edixhoven and J.L. Colliot-Thélène for discussing this with me.

The current algebraic methods to estimate the rank of abelian varieties start from the proof of the Mordell-Weil theorem. Let $A/k$ be an abelian variety of dimension $d$ defined over a number field $k$, and let $\bar{k}$ be an algebraic closure of $k$. The first step of this proof is to show that for any integer $n \ge 2$, the quotient $A(k)/nA(k)$ is finite (as it has to be if $A(k)$ is to be of finite rank), using either Galois or étale cohomology to reduce to the fundamental finiteness statements in algebraic number theory.

Precisely, let $\ell$ be any prime number. It is then shown quite easily (essentially by Kummer theory, see [Si1, X-4]) that there is a short exact sequence
\[
0 \longrightarrow A(k)/\ell A(k) \longrightarrow \mathrm{Sel}_\ell(A) \longrightarrow Ш_\ell(A) \longrightarrow 0.
\]
$\mathrm{Sel}_\ell(A)$ is the $\ell$-Selmer group of $A$ and $Ш_\ell(A)$ the $\ell$-part of the Tate-Shafarevitch group of $A$. Those are defined by
\[
\mathrm{Sel}_\ell(A) = \{x \in H^1(G_k, A[\ell]) \mid \mathrm{Res}_v\,x = 0 \in H^1(G_v, A(\bar{k})), \text{ for all places } v \text{ of } k\} \tag{6.117}
\]
\[
Ш_\ell(A) = \{x \in H^1(G_k, A(\bar{k}))[\ell] \mid \mathrm{Res}_v\,x = 0 \in H^1(G_v, A(\bar{k})), \text{ for all places } v \text{ of } k\}
\]


where we have denoted by $G_k$ the absolute Galois group of $k$ and by $G_v$ the decomposition group at $v$ for a place $v$ of $k$, which is isomorphic to the Galois group of the local field $k_v$, and the $H^1$ refer to Galois cohomology groups.

Writing the Mordell-Weil group of $A$ as a direct sum of the finite torsion subgroup and the free part
\[
A(k) \simeq A(k)_{\mathrm{tors}} \oplus \mathbf{Z}^r
\]
it follows that for almost all prime numbers $\ell$ we have
\[
A(k)/\ell A(k) \simeq (\mathbf{Z}/\ell\mathbf{Z})^r
\]
hence
\[
\operatorname{rank} A = \dim_{\mathbf{F}_\ell} A(k)/\ell A(k) \le \dim \mathrm{Sel}_\ell(A) \tag{6.118}
\]
(and this is even an equality, for almost all $\ell$, if the conjecture according to which the full Tate-Shafarevitch group $Ш(A)$ is finite is true).
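The phrase “for almost all prime numbers $\ell$” reflects the fact that the torsion subgroup only contributes to $A(k)/\ell A(k)$ when $\ell$ divides the order of one of its cyclic factors. A toy illustration for an abstract finitely generated abelian group $\mathbf{Z}/n_1 \times \cdots \times \mathbf{Z}/n_t \times \mathbf{Z}^r$ is sketched below; the function name and the sample group are hypothetical and are not attached to any particular abelian variety.

```python
def dim_mod_ell(torsion_orders, rank, ell):
    """F_ell-dimension of A / ell*A for A = Z/n_1 x ... x Z/n_t x Z^rank.

    Each free factor contributes one dimension.  A cyclic factor Z/n
    contributes one dimension exactly when the prime ell divides n,
    because (Z/n) / ell(Z/n) is cyclic of order gcd(n, ell).
    """
    return rank + sum(1 for n in torsion_orders if n % ell == 0)

if __name__ == "__main__":
    # Hypothetical group Z/2 x Z/4 x Z^3: only ell = 2 sees the torsion.
    for ell in (2, 3, 5, 7):
        print(ell, dim_mod_ell([2, 4], 3, ell))
```

For every prime $\ell$ not dividing the order of the torsion subgroup, the dimension equals the rank, which is the situation used in (6.118).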

The usual way of showing that the Selmer group is finite is by observing that one can pass to any extension field (this can only increase the rank, and the Selmer group). In particular one can replace $k$ by a field $K$ containing the coordinates of all the $\ell$-torsion points of $A$, in which case $A[\ell]$ is a trivial $G_K$-module. Then the Galois cohomology group is simply
\[
H^1(G_K, A[\ell]) = \mathrm{Hom}(G_K, A[\ell]) = \mathrm{Hom}(G_K, (\mathbf{Z}/\ell\mathbf{Z})^{2d})
\]
since, by the analytic uniformization of $A$ as a complex torus, the group structure of $A[\ell]$ is known.

Now for each homomorphism $\rho : G_K \longrightarrow (\mathbf{Z}/\ell\mathbf{Z})^{2d}$, the fixed field of the kernel of $\rho$ is an abelian Galois extension of $K$ with Galois group isomorphic to the image of $\rho$, a subgroup of $(\mathbf{Z}/\ell\mathbf{Z})^{2d}$, and conversely to each such extension field there can correspond only finitely many morphisms $\rho$. Moreover, and here arithmetic enters the fray, one shows that there exists a fixed set $\Sigma$ of places of $K$ such that all such extensions are unramified outside $\Sigma$ (this is because $A$ itself is unramified at all but finitely many places). But there are only finitely many extensions of $K$ which are unramified outside $\Sigma$ and of degree bounded by $\ell^{2d}$. Therefore, we find that the Selmer group is finite, and recover the weak Mordell-Weil theorem.

Consider again the case $A = J_0(q)$ over $\mathbf{Q}$, and let's see what this gives. The problem with the above scheme of proof is the moment when the extension to the field $K$ is performed. It means that in (6.118), it is really the rank of $A$ over $K$ which is estimated. But for a high-dimensional variety, as ours are, the field $K$ will usually be of very large degree over $\mathbf{Q}$, because the lowest-degree possible $K$ must be the fixed field of the kernel of the representation $\rho_\ell$ of the Galois group of $\mathbf{Q}$ on $\ell$-torsion points
\[
\rho_\ell : G_{\mathbf{Q}} \longrightarrow GL(2d, \mathbf{F}_\ell),
\]
which is expected to have a large image. Indeed, it is proved in [Se2] that, in the case of elliptic curves without complex multiplication, the corresponding representation is surjective except for finitely many $\ell$. Then the rank of $J_0(q)$ over $K$ will most likely be of larger order of magnitude than $\dim J_0(q)$, so the bound (6.118) cannot get near (6.116).

Heuristically, this can be confirmed as follows. Say the degree of the field $K$ is $d = d(q)$. One can guess that the exceptional set $\Sigma$ should consist of primes above $\ell$ and primes above $q$ (since $J_0(q)$ has good reduction outside $q$), but in any case the abelian extensions of $K$ unramified outside $\Sigma$ include at least those which are actually unramified everywhere. Those are classified by the ideal class group of $K$ (by class field theory); we need the $\ell$-rank of the Selmer group, and thus the first guess, which cannot be obviously improved, is at least the $\ell$-rank of the class group. Again in a worst case scenario that is hard to reject out of hand, this could be as large as the logarithm of the discriminant of $K$. But then the (most optimistic this time!) lower bound on the discriminant of a number field of degree $d$ over $\mathbf{Q}$ is already exponential in $d$ as $d$ tends to infinity, and we obtain something (which is neither lower nor upper bound, but the optimistic worst case, as it were) of the order of the degree $d$ of $K$. If $\rho_\ell$ happens to be surjective, this is $|GL(2d, \mathbf{F}_\ell)| \approx \ell^{(2d)^2}$...

Whatever the value of this kind of argument, to get an unconditional upper bound, it seems necessary to find a way of estimating the dimension over $\mathbf{F}_\ell$ of the Selmer group $\mathrm{Sel}_\ell(J_0(q))$ without making any extension of the ground field $\mathbf{Q}$. Moreover, the particular geometric properties of $J_0(q)$ should be exploited, short of proving the Birch and Swinnerton-Dyer conjecture in general. And this has been the case as far as the analytic upper bound proved in Chapter 5 is concerned: without the very special link between $J_0(q)$ and weight 2 forms, the results would be much poorer, or nonexistent, as they are today for modular elliptic curves for instance. This offers some hope, because the geometry of the modular curves and their Jacobians has been extensively studied, with considerable success, by Mazur, Wiles, Ribet and many others. Moreover, in the work of Wiles leading to Fermat's theorem, the Selmer group of the symmetric square of the Galois representations associated to modular forms is of paramount importance (see [Wil], [DDT]), and its cardinality is computed, at least in many cases. From the analytic point of view, this is related ([DDT, page 96]) to the special value at the edge of the critical strip of the symmetric square L-function, which we have also encountered in Chapter 3. Thus it is again clear that the Selmer group occurring in the study of the rank is a more difficult invariant.

There is a possible starting point, exploiting the fact that $J_0(q)$ is a Jacobian, and not simply any abelian variety, in the cohomological nature of the Jacobian, which gives an isomorphism ([Mil, page 126])
\[
J_0(q)[\ell] \simeq H^1_{\mathrm{et}}(X_0(q)_{\bar{\mathbf{Q}}}, \mu_\ell)
\]
of $G_{\mathbf{Q}}$-modules. Then, using the Hochschild-Serre spectral sequence
\[
H^p(G_{\mathbf{Q}}, H^q_{\mathrm{et}}(X_0(q)_{\bar{\mathbf{Q}}}, \mu_\ell)) \Rightarrow H^{p+q}_{\mathrm{et}}(X_0(q), \mu_\ell)
\]
of the covering $X_0(q)_{\bar{\mathbf{Q}}} \to X_0(q)$ (the latter designates here the natural model over $\mathbf{Q}$), and more precisely the exact sequence of small order terms ([Mil, page 308]), it transpires that the group $H^1(G_{\mathbf{Q}}, J_0(q)[\ell])$ which contains the Selmer group is essentially (up to other groups which do not depend on $q$) isomorphic to $H^2_{\mathrm{et}}(X_0(q), \mu_\ell)$. Then the local conditions defining the Selmer group (6.117) must be reinterpreted in this setting (or, according to Colliot-Thélène, it should be possible to start working directly with an integral model of $X_0(q)$, and the same kind of arguments would yield directly a finite group, possibly larger than the Selmer group however, as the behavior at the ramified primes would be under less control). But of course the difficulty starts after that: what to do next with this? If the problem is reduced in this way to one about the curve $X_0(q)$, it remains unclear whether it is simpler or more amenable to further treatment. Indeed, over $\mathbf{Q}$ we must see $X_0(q)$ more as an arithmetic surface, of dimension 2 (the other dimension being the primes, the spectrum of $\mathbf{Z}$), and the second cohomology group of a projective surface is very much the most interesting and most subtle in general (see [Mil, chapter 5] for instance).

We can now see quite clearly the depth of the upper bound (6.116) that has been proved on the Birch and Swinnerton-Dyer conjecture. Yet, to the optimistic eye, there seems to be a context to work in here, in which progress might not be impossible to contemplate.

But this is another story, although one which it would be nice to write, or read, some day.


References

[B-H] Bushnell, C. J. and Henniart, G.: An upper bound on conductors for pairs, J.Number Theory 65 no. 2 (1997), 183–196.

[B-L] Bohr, H. and Landau, E.: Sur les zeros de la fonction ζ(s) de Riemann, CompteRendus de l’Acad. des Sciences (Paris) 158 (1914), 106–110.

[Br1] Brumer, A.: The rank of J0(N), Asterisque 228, SMF (1995), 41–68.

[Bum] Bump, D.: Automorphic forms and representations, Cambridge Studies in Ad-vanced Mathematics, 55, Cambridge University Press, 1997.

[CoS] Coates, J. and Schmidt, C-G.: Iwasawa theory for the symmetric square of anelliptic curve, J. Reine Angew. Math. 375/376 (1987), 104–156.

[C-S] Cornell, G. and Silverman, J. (editors): Arithmetic Geometry, Springer Verlag(1986).

[DDT] Darmon, H., Diamond, F. and Taylor, M.: Fermat’s Last Theorem, CurrentDevelopments in Math., International Press (1995), 1–154.

[Del] Deligne, P.: Formes modulaires et representations de GL(2), Modular Forms inOne Variable IV, Springer Lecture Notes 749 (1972), 55–105.

[DFI] Duke, W., Friedlander, J. and Iwaniec, H.: Bounds for automorphic L-functions, II, Invent. Math. 115 (1994), 219–239.

[Du1] Duke, W.: The critical order of vanishing of automorphic L-functions with highlevel, Invent. Math. 119 (1995), 165–174.

[Du2] Duke, W.: The dimension of the space of cusp forms of weight one, Internat.Math. Res. Notices 2 (1995), 99–109.

[D-K] Duke, W. and Kowalski, E.: A problem of Linnik for elliptic curves and mean-value estimates for automorphic representations, to appear in Invent. Math.

[Edx] Edixhoven, B.: The modular curves X0(N), notes for the ICTP Summer School, August 1997.

[Gel] Gelbart, S.: Automorphic forms on adele groups, Annals of Math. Studies 83, Princeton Univ. Press (1975).

[GHL] Goldfeld, D., Hoffstein, J. and Lieman, D.: An effective zero-free region, Ann. of Math. 140 (1994), 177–181.

[G-J] Gelbart, S. and Jacquet, H.: A relation between automorphic representations of GL(2) and GL(3), Ann. Sci. E.N.S. 4eme serie 11 (1978), 471–552.

[Gol] Goldfeld, D.: The class number of quadratic fields and the conjectures of Birch and Swinnerton-Dyer, Ann. Scuola Norm. Sup. Pisa 3, 4 (1976), 623–663.

[Gro] Gross, B.: Heegner points on X0(N), in Modular Forms (R.A. Rankin, editor), Ellis Horwood (1984), 87–106.

[G-R] Gradshteyn, I. S. and Ryzhik, I. M.: Table of integrals, series, and products, Fifth edition, Academic Press, 1994.

[G-Z] Gross, B. and Zagier, D.: Heegner points and derivatives of L-series, Invent. Math. 84 (1986), 225–320.

[Har] Hartshorne, R.: Algebraic geometry, Grad. Texts in Math. 52, Springer Verlag (1977).

[H-L] Hoffstein, J. and Lockhart, P.: Coefficients of Maass forms and the Siegel zero (with an appendix by D. Goldfeld, J. Hoffstein and D. Lieman), Ann. of Math. (2) 140 (1994), 161–181.

[I-R] Ireland, K. and Rosen, M.: A classical introduction to modern number theory, Second edition, Grad. Texts in Math. 84, Springer Verlag (1990).

[IS1] Iwaniec, H. and Sarnak, P.: The non-vanishing of central values of automorphic L-functions and Siegel’s zeros, preprint (1997).

[IS2] Iwaniec, H. and Sarnak, P.: The non-vanishing of central values of automorphic L-functions and Siegel’s zeros, in preparation.

[Iw1] Iwaniec, H.: On the order of vanishing of modular L-functions at the critical point, Sem. Theor. Nombres Bordeaux 2 (1990), 365–376.

[Iw2] Iwaniec, H.: Topics in classical modular forms, Grad. Studies in Math. 17, A.M.S. (1997).

[Iw3] Iwaniec, H.: Introduction to the Spectral Theory of Automorphic Forms, Biblioteca de la Revista Matematica Iberoamericana (1995).

[JPS] Jacquet, H., Piatetskii-Shapiro, I. I. and Shalika, J. A.: Rankin-Selberg convolutions, Amer. Jour. of Math. 105 (1983), 367–464.

[Jut] Jutila, M.: Lectures on a method in the theory of exponential sums, Tata Lectures on Mathematics and Physics 80, Springer-Verlag, 1987.

[K-L] Kolyvagin, V. and Logachev, D.: Finiteness of the Shafarevich-Tate group and the group of rational points for some modular abelian varieties, Leningrad Math. J. 1, No. 5 (1990), 1229–1253.

[KM1] Kowalski, E. and Michel, P.: Sur le rang de J0(q), Preprint de l’Universite d’Orsay, 53 (1997).

[KM2] Kowalski, E. and Michel, P.: Sur les zeros des fonctions L automorphes de grand niveau, Preprint de l’Universite d’Orsay, 54 (1997).

[KM3] Kowalski, E. and Michel, P.: A lower bound for the rank of J0(q), Preprint de l’Universite d’Orsay, 69 (1997).

[Kob] Koblitz, N.: Introduction to elliptic curves and modular forms, Second edition, Grad. Texts in Math. 97, Springer Verlag (1993).

[K-S] Katz, N. and Sarnak, P.: Random matrices, Frobenius eigenvalues, and monodromy, to appear.

[LIS] Luo, W., Iwaniec, H. and Sarnak, P.: Low lying zeros for families of L-functions, preprint (1998).

[LRS] Luo, W., Rudnick, Z. and Sarnak, P.: On Selberg’s eigenvalue conjecture, Geom. Funct. Anal. 5 (1995), 387–401.

[Luo] Luo, W.: Zeros of Hecke L-functions associated with cusp forms, Acta Arith. 71, No. 2 (1995), 139–158.

[Maz] Mazur, B.: On the passage from local to global in number theory, Bull. Amer. Math. Soc. (N.S.) 29 (1993), no. 1, 14–50.

[Me] Merel, L.: Bornes pour la torsion des courbes elliptiques sur les corps de nombres, Invent. Math. 124 (1996), 437–450.

[Mes] Mestre, J.-F.: Formules explicites et minorations de conducteurs de varietes algebriques, Comp. Math. 58 (1986), 209–232.

[Mil] Milne, J.S.: Etale cohomology, Princeton Mathematical Series 33, Princeton Univ. Press, 1980.

[Miy] Miyake, T.: Modular Forms, Springer Verlag, 1989.

[M-W] Moeglin, C. and Waldspurger, J.L.: Poles des fonctions L de paires pour GL(N), appendix to Le spectre residuel de GL(n), Ann. Sci. ENS (4eme serie) 22 (1989), 605–674.

[Mu1] Mumford, D.: Abelian varieties, Oxford University Press, 1970.

[Mu2] Mumford, D.: Tata lectures on Theta, II, Progress in Math. 43, Birkhauser, 1984.

[Poi] Poitou, G.: Sur les petits discriminants, Seminaire Delange-Pisot-Poitou, 18e annee (1976/77), Theorie des nombres, Fasc. 1, Exp. No. 6, 18 pp., Secretariat Math., Paris, 1977.

[P-P] Perelli, A. and Pomykala, J.: Averages over twisted elliptic L-functions, Acta Arith. 80, No. 2 (1997), 149–163.

[R-S] Rudnick, Z. and Sarnak, P.: Zeros of principal L-functions and random matrix theory, A celebration of John F. Nash, Jr., Duke Math. J. 81, no. 2 (1996), 269–322.

[Sar] Sarnak, P.: Quantum Chaos, Symmetry and Zeta Functions, Current Developments in Math., International Press 1997.

[Sel] Selberg, A.: Contributions to the theory of Dirichlet’s L-functions, Skr. Norske Vid. Akad. Oslo. I. (1946), 1–62, or Collected Papers, vol. 1, Springer Verlag, Berlin (1989), 281–340.

[Se1] Serre, J.-P.: Cours d’Arithmetique, 3eme edition, P.U.F. (1988).

[Se2] Serre, J.-P.: Proprietes galoisiennes des points d’ordre fini des courbes elliptiques, Invent. Math. 15 (1972), 259–331.

[Sh1] Shimura, G.: Introduction to the arithmetic theory of automorphic functions, Iwanami Shoten and Princeton Univ. Press (1971).

[Sh2] Shimura, G.: On the holomorphy of certain Dirichlet series, Proc. of the London Math. Soc. (3) 31 (1975), 79–95.

[Sh3] Shimura, G.: The special values of zeta functions associated with cusp forms, Comm. Pure and Appl. Math. 29 (1976), 783–804.

[Si1] Silverman, J.: The arithmetic of elliptic curves, Grad. Texts in Math. 106, Springer Verlag (1986).

[Si2] Silverman, J.: Advanced topics in the arithmetic of elliptic curves, Grad. Texts in Math. 151, Springer Verlag (1994).

[Tit] Titchmarsh, E.C.: The theory of the Riemann Zeta-function, Second edition (revised by D. R. Heath-Brown), Oxford University Press, 1986.

[Vdk] Vanderkam, J.: The rank of quotients of J0(N), preprint 1997.

[Wil] Wiles, A.: Modular elliptic curves and Fermat’s last theorem, Ann. of Math. (2) 141 (1995), 443–551.