Invariant Theory with Applications

Jan Draisma and Dion Gijswijt

October 8, 2009



Contents

1 Lecture 1. Introducing invariant theory
  1.1 Polynomial functions
  1.2 Representations
  1.3 Invariant functions
  1.4 Conjugacy classes of matrices
  1.5 Exercises

2 Lecture 2. Symmetric polynomials
  2.1 Symmetric polynomials
  2.2 Counting real roots
  2.3 Exercises

3 Lecture 3. Multilinear algebra
  3.1 Exercises

4 Lecture 4. Representations
  4.1 Schur's lemma and isotypic decomposition
  4.2 Exercises

5 Finite generation of the invariant ring
  5.1 Noether's degree bound
  5.2 Exercises

6 Affine varieties and the quotient map
  6.1 Affine varieties
  6.2 Regular functions and maps
  6.3 The quotient map

7 The null-cone

8 Molien's theorem and self-dual codes
  8.1 Molien's theorem
  8.2 Linear codes

9 Algebraic groups
  9.1 Definition and examples
  9.2 The algebra of regular functions as a representation

10 Reductiveness

11 First Fundamental Theorems
  11.1 Schur-Weyl Duality
  11.2 The First Fundamental Theorem for GLn
  11.3 First Fundamental Theorem for On

12 Phylogenetic tree models
  12.1 Introduction
  12.2 The statistical model
  12.3 A tensor description
  12.4 Equations of the model


Chapter 1

Lecture 1. Introducing invariant theory

The first lecture gives some flavor of the theory of invariants. Basic notions such as (linear) group representation, the ring of regular functions on a vector space and the ring of invariant functions are defined, and some instructive examples are given.

1.1 Polynomial functions

Let V be a complex vector space. We denote by V* := {f : V → C linear map} the dual vector space. Viewing the elements of V* as functions on V, and taking the usual pointwise product of functions, we can consider the algebra of all C-linear combinations of products of elements from V*.

Definition 1.1.1. The coordinate ring O(V) of the vector space V is the algebra of functions F : V → C generated by the elements of V*. The elements of O(V) are called polynomial or regular functions on V.

If we fix a basis e1, . . . , en of V, then a dual basis of V* is given by the coordinate functions x1, . . . , xn defined by xi(c1e1 + · · · + cnen) := ci. For the coordinate ring we obtain O(V) = C[x1, . . . , xn]. This is a polynomial ring in the xi, since our base field C is infinite.

Exercise 1.1.2. Show that indeed C[x1, . . . , xn] is a polynomial ring. In other words, show that the xi are algebraically independent over C: there is no nonzero polynomial p ∈ C[X1, . . . , Xn] in n variables X1, . . . , Xn such that p(x1, . . . , xn) = 0. Hint: this is easy for the case n = 1. Now use induction on n.

We call a regular function f ∈ O(V) homogeneous of degree d if f(tv) = t^d f(v) for all v ∈ V and t ∈ C. Clearly, the elements of V* are homogeneous of degree 1, and the product of polynomials f, g homogeneous of degrees d, d′ is homogeneous of degree d + d′. It follows that every regular function f can be written as a sum f = f0 + f1 + · · · + fk of regular functions fi homogeneous of degree i. This decomposition is unique (disregarding the terms with zero coefficient). Hence we have a direct sum decomposition O(V) = ⊕_{d∈N} O(V)d, where O(V)d := {f ∈ O(V) | f homogeneous of degree d}, making O(V) into a graded algebra.

Exercise 1.1.3. Show that indeed the decomposition of a regular function f into its homogeneous parts is unique.

In terms of the basis x1, . . . , xn, we have O(V)d = C[x1, . . . , xn]d, where C[x1, . . . , xn]d consists of all homogeneous polynomials of total degree d and has as basis the monomials x1^{d1} x2^{d2} · · · xn^{dn} with d1 + d2 + · · · + dn = d.

1.2 Representations

Central objects in this course are linear representations of groups. For any vector space V we write GL(V) for the group of all invertible linear maps from V to itself. When we have a fixed basis of V, we may identify V with Cn and GL(V) with the group of invertible n × n matrices GL(Cn) ⊂ Matn(C).

Definition 1.2.1. Let G be a group and let X be a set. An action of G on X is a map α : G × X → X such that α(1, x) = x and α(g, α(h, x)) = α(gh, x) for all g, h ∈ G and x ∈ X.

If α is clear from the context, we will usually write gx instead of α(g, x). What we have just defined is sometimes called a left action of G on X; right actions are defined similarly.

Definition 1.2.2. If G acts on two sets X and Y, then a map φ : X → Y is called G-equivariant if φ(gx) = gφ(x) for all x ∈ X and g ∈ G. As a particular case of this, if X is a subset of Y satisfying gx ∈ X for all x ∈ X and g ∈ G, then X is called G-stable, and the inclusion map is G-equivariant.

Example 1.2.3. The symmetric group S4 acts on the set \binom{[4]}{2} of unordered pairs of distinct numbers in [4] := {1, 2, 3, 4} by g{i, j} = {g(i), g(j)}. Think of the edges of a tetrahedron to visualise this action. The group S4 also acts on the set X := {(i, j) | i, j ∈ [4] distinct} of all ordered pairs by g(i, j) = (g(i), g(j)) (think of directed edges), and the map X → \binom{[4]}{2} sending (i, j) to {i, j} is S4-equivariant.

Definition 1.2.4. Let G be a group and let V be a vector space. A (linear)representation of G on V is a group homomorphism ρ : G→ GL(V ).

If ρ is a representation of G, then the map (g, v) ↦ ρ(g)v is an action of G on V. Conversely, if we have an action α of G on V such that α(g, ·) : V → V is a linear map for all g ∈ G, then the map g ↦ α(g, ·) is a linear representation.


As with actions, instead of ρ(g)v we will often write gv. A vector space with an action of G by linear maps is also called a G-module.

Given a linear representation ρ : G → GL(V), we obtain a linear representation ρ* : G → GL(V*) on the dual space V*, called the dual representation or contragredient representation and defined by

    (ρ*(g)x)(v) := x(ρ(g)^{-1} v) for all g ∈ G, x ∈ V* and v ∈ V.    (1.1)

Exercise 1.2.5. Let ρ : G → GLn(C) be a representation of G on Cn. Show that with respect to the dual basis, ρ* is given by ρ*(g) = (ρ(g)^{-1})^T, where A^T denotes the transpose of the matrix A.

1.3 Invariant functions

Definition 1.3.1. Given a representation of a group G on a vector space V, a regular function f ∈ O(V) is called G-invariant or simply invariant if f(v) = f(gv) for all g ∈ G, v ∈ V. We denote by O(V)^G ⊆ O(V) the subalgebra of invariant functions. The actual representation of G is assumed to be clear from the context.

Observe that f ∈ O(V) is invariant precisely when it is constant on the orbits of V under the action of G. In particular, the constant functions are invariant.

The representation of G on V induces an action on the (regular) functions on V by defining (gf)(v) := f(g^{-1}v) for all g ∈ G, v ∈ V. This way the invariant ring can be described as the set of regular functions fixed by the action of G: O(V)^G = {f ∈ O(V) | gf = f for all g ∈ G}. Observe that when restricted to V* ⊂ O(V), this action coincides with the action corresponding to the dual representation. In terms of a basis x1, . . . , xn of V*, the regular functions are polynomials in the xi and the action of G is given by gp(x1, . . . , xn) = p(gx1, . . . , gxn) for any polynomial p. Since for every d, G maps the set of polynomials homogeneous of degree d to itself, it follows that the homogeneous parts of an invariant are invariant as well. This shows that O(V)^G = ⊕_d O(V)^G_d, where O(V)^G_d := O(V)d ∩ O(V)^G.

Example 1.3.2. Consider the representation ρ : Z/3Z → GL2(C) defined by mapping 1 to the matrix

    [ 0  −1 ]
    [ 1  −1 ]

(and mapping 2 to

    [ −1  1 ]
    [ −1  0 ]

and 0 to the identity matrix). With respect to the dual basis x1, x2, the dual representation is given by:

    ρ*(0) = [ 1  0 ]    ρ*(1) = [ −1  −1 ]    ρ*(2) = [  0   1 ]
            [ 0  1 ]            [  1   0 ]            [ −1  −1 ]    (1.2)

The polynomial f = x1^2 − x1x2 + x2^2 is an invariant:

    ρ*(1)f = (−x1 + x2)^2 − (−x1 + x2)(−x1) + (−x1)^2 = x1^2 − x1x2 + x2^2 = f,    (1.3)


and since 1 is a generator of the group, f is invariant under all elements of the group. Other invariants are x1^2 x2 − x1 x2^2 and x1^3 − 3 x1 x2^2 + x2^3. These three invariants generate the ring of invariants, although it requires some work to show that.

A simpler example, in which the complete ring of invariants can be computed, is the following.

Example 1.3.3. Let D4 be the symmetry group of the square, generated by a rotation r and a reflection s with the relations r^4 = e, s^2 = e and srs = r^3, where e is the identity. The representation ρ of D4 on C2 is given by

    ρ(r) = [  0  1 ]    ρ(s) = [ −1  0 ]
           [ −1  0 ]           [  0  1 ]    (1.4)

and the dual representation is given by the same matrices:

    ρ*(r) = [  0  1 ]    ρ*(s) = [ −1  0 ]
            [ −1  0 ]            [  0  1 ]    (1.5)

It is easy to check that x1^2 + x2^2 and x1^2 x2^2 are invariants, and so are all polynomial expressions in these two invariants. We will show that in fact O(C2)^{D4} = C[x1^2 + x2^2, x1^2 x2^2] =: R. It suffices to show that all homogeneous invariants belong to R.

Let p ∈ C[x1, x2] be a homogeneous invariant. Since sp = p, only monomials having even exponents for x1 can occur in p. Since rs exchanges x1 and x2, for every monomial x1^a x2^b in p, the monomial x1^b x2^a must occur with the same coefficient. This proves the claim, since every polynomial of the form q := x1^{2n} x2^{2m} + x1^{2m} x2^{2n} is an element of R. Indeed, we may assume that n ≤ m and proceed by induction on n + m, the case n + m = 0 being trivial. If n > 0 we have q = (x1^2 x2^2)^n (x1^{2m−2n} + x2^{2m−2n}) and we are done by induction. If n = 0 we have

    2q = 2(x1^{2m} + x2^{2m}) = 2(x1^2 + x2^2)^m − Σ_{i=1}^{m−1} \binom{m}{i} (x1^{2i} x2^{2m−2i} + x1^{2m−2i} x2^{2i})

and we are done by induction again.
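The invariance of the two generators under the defining representation can be spot-checked in the same spirit. The snippet below is our own illustration; it verifies f1 = x1^2 + x2^2 and f2 = x1^2 x2^2 at sample points under the generators ρ(r) and ρ(s):

```python
# Spot-check that x1^2 + x2^2 and x1^2 * x2^2 are D4-invariant under the
# generators of Example 1.3.3 (a numerical illustration, not a proof).

def f1(v):
    return v[0] ** 2 + v[1] ** 2       # x1^2 + x2^2

def f2(v):
    return v[0] ** 2 * v[1] ** 2       # x1^2 * x2^2

def apply(g, v):
    # multiply the 2x2 matrix g with the column vector v
    return (g[0][0] * v[0] + g[0][1] * v[1],
            g[1][0] * v[0] + g[1][1] * v[1])

r = ((0, 1), (-1, 0))    # rho(r), the rotation
s = ((-1, 0), (0, 1))    # rho(s), the reflection

samples = [(1, 0), (2, 3), (-4, 5)]
invariant = all(f(apply(g, v)) == f(v)
                for f in (f1, f2) for g in (r, s) for v in samples)
```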

1.4 Conjugacy classes of matrices

In this section we discuss the polynomial functions on the square matrices that are invariant under conjugation of the matrix variable by elements of GLn(C). This example shows some tricks that are useful when proving that certain invariants are equal. Denote by Matn(C) the vector space of complex n × n matrices. We consider the action of G = GLn(C) on Matn(C) by conjugation: (g, A) ↦ gAg^{-1} for g ∈ GLn(C) and A ∈ Matn(C). We are interested in finding all polynomials in the entries of n × n matrices that are invariant under G. Two invariants are given by the functions A ↦ detA and A ↦ trA.

Let

χA(t) := det(tI − A) = t^n − s1(A) t^{n−1} + s2(A) t^{n−2} − · · · + (−1)^n sn(A)    (1.6)

Page 9: Invariant Theory with Applicationsjdraisma/teaching/invtheory0910/lecture...Chapter 1 Lecture 1. Introducing invariant theory The rst lecture gives some avor of the theory of invariants

1.4. CONJUGACY CLASSES OF MATRICES 9

be the characteristic polynomial of A. Here the si are polynomials in the entries of A. Clearly,

    χ_{gAg^{-1}}(t) = det(g(tI − A)g^{-1}) = det(tI − A) = χA(t)    (1.7)

holds for all t ∈ C. It follows that the functions s1, . . . , sn are G-invariant. Observe that s1(A) = trA and sn(A) = detA.

Proposition 1.4.1. The functions s1, . . . , sn generate O(Matn(C))^{GLn(C)} and are algebraically independent.

Proof. To each c = (c1, . . . , cn) ∈ Cn we associate the so-called companion matrix

    Ac := [ 0  0  ···  0  −c_n     ]
          [ 1  0  ···  0  −c_{n−1} ]
          [ 0  1  ···  0  −c_{n−2} ]
          [ ⋮  ⋮   ⋱   ⋮     ⋮     ]
          [ 0  0  ···  1  −c_1     ]   ∈ Matn(C).    (1.8)

A simple calculation shows that χ_{Ac}(t) = t^n + c1 t^{n−1} + · · · + c_{n−1} t + c_n.

Exercise 1.4.2. Verify that χ_{Ac}(t) = t^n + c1 t^{n−1} + · · · + c_{n−1} t + c_n.

This implies that si(Ac) = (−1)^i ci and therefore

    {(s1(Ac), s2(Ac), . . . , sn(Ac)) | c ∈ Cn} = Cn.    (1.9)

It follows that the si are algebraically independent over C. Indeed, suppose that p(s1, . . . , sn) = 0 for some polynomial p in n variables. Then

    0 = p(s1, . . . , sn)(A) = p(s1(A), . . . , sn(A))    (1.10)

for all A, and hence p(c1, . . . , cn) = 0 for all c ∈ Cn. But this implies that p itself is the zero polynomial.

Now let f ∈ O(Matn(C))^G be an invariant function. Define the polynomial p in n variables by p(c1, . . . , cn) := f(Ac), and P ∈ O(Matn(C))^G by P(A) := p(−s1(A), s2(A), . . . , (−1)^n sn(A)). By definition, P and f agree on all companion matrices, and since they are both G-invariant they agree on W := {gAcg^{-1} | g ∈ G, c ∈ Cn}. To finish the proof, it suffices to show that W is dense in Matn(C), since f − P is continuous and zero on W. To show that W is dense in Matn(C), it suffices to show that the set of matrices with n distinct nonzero eigenvalues is a subset of W and is itself dense in Matn(C). This we leave as an exercise.
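The characteristic polynomial of a companion matrix can be machine-checked for small n. The sketch below is our own illustration, with the indexing chosen so that χ_{Ac}(t) = t^n + c1 t^{n−1} + · · · + cn (equivalently si(Ac) = (−1)^i ci); it uses the Faddeev-LeVerrier recursion and exact rational arithmetic:

```python
from fractions import Fraction

def companion(c):
    # Companion matrix A_c of t^n + c1 t^(n-1) + ... + cn:
    # ones on the subdiagonal, last column (-c_n, ..., -c_1) top to bottom.
    n = len(c)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = Fraction(1)
    for i in range(n):
        A[i][n - 1] = Fraction(-c[n - 1 - i])
    return A

def charpoly(A):
    # Faddeev-LeVerrier recursion: returns [1, a1, ..., an] with
    # det(tI - A) = t^n + a1 t^(n-1) + ... + an.
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        for i in range(n):
            M[i][i] += coeffs[-1]          # M <- M + a_{k-1} I
        M = [[sum(A[i][l] * M[l][j] for l in range(n)) for j in range(n)]
             for i in range(n)]            # M <- A M
        coeffs.append(-sum(M[i][i] for i in range(n)) / k)
    return coeffs

# chi_{A_c} recovers the coefficients c1, ..., cn exactly:
assert charpoly(companion([2, -5, 7])) == [1, 2, -5, 7]
```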

Exercise 1.4.3. Let A ∈ Matn(C) have n distinct nonzero eigenvalues. Show that A is conjugate to Ac for some c ∈ Cn. Hint: find v ∈ Cn such that v, Av, A^2 v, . . . , A^{n−1} v is a basis for Cn. You might want to use the fact that the Vandermonde determinant

    det [ 1          1          ···  1          ]
        [ c1         c2         ···  cn         ]
        [ c1^2       c2^2       ···  cn^2       ]
        [ ⋮          ⋮           ⋱    ⋮          ]
        [ c1^{n−1}   c2^{n−1}   ···  cn^{n−1}   ]    (1.11)

is nonzero if c1, . . . , cn are distinct and nonzero.

Exercise 1.4.4. Show that the set of matrices with n distinct nonzero eigenvalues is dense in the set of all complex n × n matrices. Hint: every matrix is conjugate to an upper triangular matrix.

1.5 Exercises

Exercise 1.5.1. Let G be a finite group acting on V = Cn, n ≥ 1. Show that O(V)^G contains a nontrivial invariant. That is, O(V)^G ≠ C. Give an example of an action of an infinite group G on V with the property that only the constant functions are invariant.

Exercise 1.5.2. Let ρ : Z/2Z → GL2(C) be the representation given by

    ρ(1) := [ −1   0 ]
            [  0  −1 ]

Compute the invariant ring. That is, give a minimal set of generators for O(C2)^{Z/2Z}.

Exercise 1.5.3. Let U := {( 1 a ; 0 1 ) | a ∈ C} act on C2 in the obvious way. Denote the coordinate functions by x1, x2. Show that O(C2)^U = C[x2].

Exercise 1.5.4. Let ρ : C* → GL3(C) be the representation given by ρ(t) = diag(t^{−2}, t^{−3}, t^4). Find a minimal system of generators for the invariant ring.

Exercise 1.5.5. Let π : Matn(C) → Cn be given by π(A) := (s1(A), . . . , sn(A)). Show that for every c ∈ Cn the fiber {A | π(A) = c} contains a unique conjugacy class {gAg^{-1} | g ∈ GLn(C)} of a diagonalizable (semisimple) matrix A.


Chapter 2

Lecture 2. Symmetric polynomials

In this chapter, we consider the natural action of the symmetric group Sn on the ring of polynomials in the variables x1, . . . , xn. The fundamental theorem of symmetric polynomials states that the elementary symmetric polynomials generate the ring of invariants. As an application we prove a theorem of Sylvester that characterizes when a univariate polynomial with real coefficients has only real roots.

2.1 Symmetric polynomials

Let the group Sn act on the polynomial ring C[x1, . . . , xn] by permuting the variables:

    σp(x1, . . . , xn) := p(xσ(1), . . . , xσ(n)) for all σ ∈ Sn.    (2.1)

The polynomials invariant under this action of Sn are called symmetric polynomials. As an example, for n = 3 the polynomial x1^2 x2 + x1^2 x3 + x1 x2^2 + x1 x3^2 + x2^2 x3 + x2 x3^2 + 7x1 + 7x2 + 7x3 is symmetric, but x1^2 x2 + x1 x3^2 + x2^2 x3 is not symmetric (although it is invariant under the alternating group).

In terms of linear representations of a group, we have a linear representation ρ : Sn → GLn(C) given by ρ(σ)ei := eσ(i), where e1, . . . , en is the standard basis of Cn. On the dual basis x1, . . . , xn the dual representation is given by ρ*(σ)xi = xσ(i), as can be easily checked. The invariant polynomial functions on Cn are precisely the symmetric polynomials.

Some obvious examples of symmetric polynomials are

    s1 := x1 + x2 + · · · + xn and    (2.2)
    s2 := x1x2 + x1x3 + · · · + x1xn + · · · + x_{n−1}xn.    (2.3)

More generally, for every k = 1, . . . , n, the k-th elementary symmetric polynomial

    s_k := Σ_{i1 < ··· < ik} x_{i1} · · · x_{ik}    (2.4)

is invariant. Recall that these polynomials express the coefficients of a univariate polynomial in terms of its roots:

    ∏_{i=1}^{n} (t − xi) = t^n + Σ_{k=1}^{n} (−1)^k s_k t^{n−k}.    (2.5)

Moreover, if g is any polynomial in n variables y1, . . . , yn, then g(s1, . . . , sn) is again a polynomial in the xi which is invariant under all coordinate permutations. A natural question is: which symmetric polynomials are expressible as a polynomial in the elementary symmetric polynomials? For example, x1^2 + · · · + xn^2 is clearly symmetric, and it can be expressed in terms of the si:

    x1^2 + · · · + xn^2 = s1^2 − 2s2.    (2.6)
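Identities (2.5) and (2.6) are easy to confirm numerically. In the sketch below (our own helper names, not from the notes) we expand ∏(t − xi) for sample values and compare its coefficients with the elementary symmetric polynomials:

```python
from itertools import combinations
from math import prod

def s(xs, k):
    # k-th elementary symmetric polynomial evaluated at the numbers xs
    return sum(prod(c) for c in combinations(xs, k))

xs = [2, -1, 3, 5]
n = len(xs)

# Expand prod_i (t - x_i); poly[k] holds the coefficient of t^(n-k).
poly = [1]
for x in xs:
    poly.append(0)
    for j in range(len(poly) - 1, 0, -1):
        poly[j] -= x * poly[j - 1]     # multiply the partial product by (t - x)

# (2.5): the coefficient of t^(n-k) equals (-1)^k s_k
coeffs_match = all(poly[k] == (-1) ** k * s(xs, k) for k in range(n + 1))

# (2.6): x1^2 + ... + xn^2 = s1^2 - 2 s2
power_sum_match = sum(x * x for x in xs) == s(xs, 1) ** 2 - 2 * s(xs, 2)
```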

It is a beautiful fact that the elementary symmetric polynomials generate all symmetric polynomials.

Theorem 2.1.1 (Fundamental theorem of symmetric polynomials). Every Sn-invariant polynomial f(x1, . . . , xn) in the xi can be written as g(s1, . . . , sn), where g = g(y1, . . . , yn) is a polynomial in n variables. Moreover, given f, the polynomial g is unique.

The proof of this result uses the lexicographic order on monomials in the variables x = (x1, . . . , xn). We say that x^α := x1^{α1} · · · xn^{αn} is (lexicographically) larger than x^β if there is a k such that αk > βk and αi = βi for all i < k. So for instance x1^2 > x1 x2^4 > x1 x2^3 > x1 x2 x3^5, etc. The leading monomial lm(f) of a non-zero polynomial f in the xi is the largest monomial (with respect to this ordering) that has non-zero coefficient in f.

Exercise 2.1.2. Check that lm(fg) = lm(f)lm(g) and that lm(sk) = x1 · · ·xk.

Exercise 2.1.3. Show that there are no infinite lexicographically strictly decreasing chains of monomials.

Since every decreasing chain of monomials is finite, we can use this order to do induction on monomials, as we do in the following proof.

Proof of Theorem 2.1.1. Let f be any Sn-invariant polynomial in the xi. Let x^α be the leading monomial of f. Then α1 ≥ · · · ≥ αn, because otherwise a suitable permutation applied to x^α would yield a lexicographically larger monomial, which has the same non-zero coefficient in f as x^α by invariance of f. Now consider the expression

    s_n^{αn} s_{n−1}^{α_{n−1} − αn} · · · s_1^{α1 − α2}.    (2.7)


The leading monomial of this polynomial equals

    (x1 · · · xn)^{αn} (x1 · · · x_{n−1})^{α_{n−1} − αn} · · · x1^{α1 − α2},    (2.8)

which is just x^α. Subtracting a suitable scalar multiple of the expression from f therefore cancels the term with monomial x^α, and leaves an Sn-invariant polynomial with a strictly smaller leading monomial. After repeating this step finitely many times, we have expressed f as a polynomial in the sk.

This shows existence of g in the theorem. For uniqueness, let g ∈ C[y1, . . . , yn] be a nonzero polynomial in n variables. It suffices to show that g(s1, . . . , sn) ∈ C[x1, . . . , xn] is not the zero polynomial. Observe that

    lm(s1^{α1} · · · sn^{αn}) = x1^{α1 + ··· + αn} x2^{α2 + ··· + αn} · · · xn^{αn}.    (2.9)

It follows that the leading monomials of the terms s1^{α1} · · · sn^{αn}, corresponding to the monomials y^α occurring with nonzero coefficient in g, are pairwise distinct. In particular, the largest such leading monomial will not be cancelled in the sum and is the leading monomial of g(s1, . . . , sn).

Remark 2.1.4. The proof shows that in fact the coefficients of the polynomial g lie in the ring generated by the coefficients of f. In particular, when f has real coefficients, g has real coefficients as well.

Exercise 2.1.5. Let π : Cn → Cn be given by

π(x1, . . . , xn) = (s1(x1, . . . , xn), . . . , sn(x1, . . . , xn)). (2.10)

Use the fact that every univariate polynomial over the complex numbers can be factorised into linear factors to show that π is surjective. Use this to show that s1, . . . , sn are algebraically independent over C. Describe for b ∈ Cn the fiber π^{-1}(b).

Remark 2.1.6. The above proof of the fundamental theorem of symmetric polynomials gives an algorithm to write a given symmetric polynomial as a polynomial in the elementary symmetric polynomials. In each step the leading monomial of the residual symmetric polynomial is decreased, ending with the zero polynomial after a finite number of steps. Instead of using the described lexicographic order on the monomials, other linear orders can be used. An example is the degree lexicographic order, where we set x^α > x^β if either α1 + · · · + αn > β1 + · · · + βn, or equality holds and there is a k such that αk > βk and αi = βi for all i < k.

Example 2.1.7. We write x1^3 + x2^3 + x3^3 as a polynomial in the si. Since the leading monomial is x1^3 x2^0 x3^0, we subtract s3^0 s2^0 s1^3 and are left with −3(x1^2 x2 + x1^2 x3 + x1 x2^2 + x1 x3^2 + x2^2 x3 + x2 x3^2) − 6 x1 x2 x3. The leading monomial is now x1^2 x2, so we add 3 s3^0 s2^1 s1^{2−1}. This leaves 3 x1 x2 x3 = 3 s3^1 s2^{1−1} s1^{1−1}, which is reduced to zero in the next step.

This way we obtain x1^3 + x2^3 + x3^3 = s1^3 − 3 s1 s2 + 3 s3.

Exercise 2.1.8. Give an upper bound on the number of steps of the algorithm in terms of the number of variables n and the (total) degree of the input polynomial f.


2.2 Counting real roots

Given a (monic) polynomial f(t) = t^n − a1 t^{n−1} + · · · + (−1)^n an, the coefficients are elementary symmetric functions in the roots of f. Therefore, any property that can be expressed as a symmetric polynomial in the roots of f can also be expressed as a polynomial in the coefficients of f. This way we can determine properties of the roots by just looking at the coefficients of f. For example: when are all roots of f distinct?

Definition 2.2.1. For a (monic) polynomial f = (t − x1) · · · (t − xn), define the discriminant ∆(f) of f by ∆(f) := ∏_{1≤i<j≤n} (xi − xj)^2.

Clearly, ∆(f) ≠ 0 if and only if all roots of f are distinct. It is not hard to see that ∆(f) is a symmetric polynomial in the roots of f. We will see later how ∆(f) can be expressed in terms of the coefficients of f.

Exercise 2.2.2. Let f(t) = t^2 − at + b. Write ∆(f) as a polynomial in a and b.

Definition 2.2.3. Given n complex numbers x1, . . . , xn, the Vandermonde matrix A for these numbers is given by

    A := [ 1  x1  ···  x1^{n−1} ]
         [ 1  x2  ···  x2^{n−1} ]
         [ ⋮  ⋮          ⋮      ]
         [ 1  xn  ···  xn^{n−1} ]    (2.11)

Lemma 2.2.4. Given numbers x1, . . . , xn, the Vandermonde matrix A has nonzero determinant if and only if x1, . . . , xn are distinct.

Proof. View the determinant of the Vandermonde matrix (called the Vandermonde determinant) as a polynomial p in the variables x1, . . . , xn. For any i < j, p(x1, . . . , xn) = 0 when xi = xj, and hence p is divisible by (xj − xi). Expanding the determinant, we see that p is homogeneous of degree \binom{n}{2}, with lowest monomial x1^0 x2^1 · · · xn^{n−1} having coefficient 1. It follows that

    p = ∏_{1≤i<j≤n} (xj − xi),    (2.12)

since the right-hand side divides p, and the two polynomials have the same degree and the same nonzero coefficient for x1^0 x2^1 · · · xn^{n−1}.

Exercise 2.2.5. Show that the Vandermonde matrix A of numbers x1, . . . , xn satisfies detA = ∏_{1≤i<j≤n} (xj − xi) by doing row and column operations on A and applying induction on n.
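For concrete numbers the lemma is easy to test. The sketch below is our own illustration, with a naive Leibniz-expansion determinant that is fine for the tiny matrices used here; it compares det A with the product formula:

```python
from itertools import permutations
from math import prod

def det(M):
    # Leibniz expansion of the determinant; only suitable for small matrices.
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    sign = -sign           # one factor -1 per inversion
        total += sign * prod(M[i][p[i]] for i in range(n))
    return total

def vandermonde(xs):
    # rows (1, x_i, ..., x_i^(n-1)) as in Definition 2.2.3
    n = len(xs)
    return [[x ** j for j in range(n)] for x in xs]

xs = [2, -1, 3, 5]
lhs = det(vandermonde(xs))
rhs = prod(xs[j] - xs[i]
           for i in range(len(xs)) for j in range(i + 1, len(xs)))
```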

Definition 2.2.6. Let f = (t − α1)(t − α2) · · · (t − αn) ∈ C[t] be a monic polynomial of degree n in the variable t. We define the Bezoutiant matrix Bez(f) of f by

    Bez(f) = ( p_{i+j−2}(α1, . . . , αn) )_{i,j=1,...,n},    (2.13)

where pk(x1, . . . , xn) := x1^k + · · · + xn^k for k = 0, 1, . . . is the k-th Newton polynomial.


Since the entries of Bez(f) are symmetric polynomials in the roots of f, it follows by the fundamental theorem of symmetric polynomials that the entries are polynomials (with integer coefficients) in the elementary symmetric functions, and hence in the coefficients of f. In particular, when f has real coefficients, Bez(f) is a real matrix. Another useful fact is that Bez(f) = A^T A, where A is the Vandermonde matrix for the roots α1, . . . , αn of f.

Exercise 2.2.7. Show that the discriminant of f satisfies: ∆(f) = det Bez(f).

Example 2.2.8. Let f = t^2 − at + b have roots α and β, so a = α + β and b = αβ. We compute Bez(f). We have p0 = 2, p1 = a, p2 = a^2 − 2b, so

    Bez(f) = [ 2  a        ]
             [ a  a^2 − 2b ]

The determinant equals a^2 − 4b and the trace equals a^2 − 2b + 2. There are three cases for the eigenvalues λ1 ≥ λ2 of Bez(f):

• If a^2 − 4b > 0, we have λ1, λ2 > 0 and α, β are distinct real roots.

• If a^2 − 4b = 0, we have λ1 > 0, λ2 = 0 and α = β.

• If a^2 − 4b < 0, we have λ1 > 0, λ2 < 0 and α and β are complex conjugate (nonreal) roots.
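These formulas can be spot-checked with concrete roots. The sketch below (our own illustration) picks α = 7, β = 3 and confirms the matrix entries together with det Bez(f) = ∆(f):

```python
# Spot-check Example 2.2.8 for f = t^2 - a t + b with chosen sample roots.
alpha, beta = 7, 3                 # sample real roots
a, b = alpha + beta, alpha * beta  # a = 10, b = 21

def p(k):
    # k-th Newton polynomial evaluated at the two roots
    return alpha ** k + beta ** k

# Bez(f) = (p_{i+j-2})_{i,j=1,2}; with 0-based i, j this is p(i + j).
bez = [[p(i + j) for j in range(2)] for i in range(2)]

det_bez = bez[0][0] * bez[1][1] - bez[0][1] * bez[1][0]
disc = (alpha - beta) ** 2         # Delta(f) from Definition 2.2.1
```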

The determinant of Bez(f) determines whether f has double roots. But the matrix Bez(f) can give us much more information about the roots of f. In particular, it describes when a polynomial with real coefficients has only real roots!

Theorem 2.2.9 (Sylvester). Let f ∈ R[t] be a polynomial in the variable t with real coefficients. Let r be the number of distinct roots in R and 2k the number of distinct roots in C \ R. Then the Bezoutiant matrix Bez(f) has rank r + 2k, with r + k positive eigenvalues and k negative eigenvalues.

Proof of Theorem 2.2.9. Number the roots α1, . . . , αn of f in such a way that α1, . . . , α_{2k+r} are distinct. We write mi for the multiplicity of the root αi, i = 1, . . . , 2k + r. Let A be the Vandermonde matrix for the numbers α1, . . . , αn, so that Bez(f) = A^T A. We start by computing the rank of Bez(f).

Denote by Ā the (2k + r) × n submatrix of A consisting of the first 2k + r rows of A. An easy computation shows that

    Bez(f) = A^T A = Ā^T diag(m1, . . . , m_{2k+r}) Ā,    (2.14)

where diag(m1, . . . , m_{2k+r}) is the diagonal matrix with the multiplicities of the roots on the diagonal. Since Ā contains a submatrix equal to the Vandermonde matrix for the distinct roots α1, . . . , α_{2k+r}, it follows by Lemma 2.2.4 that the rows of Ā are linearly independent. Since the diagonal matrix has full rank, it follows that Bez(f) has rank 2k + r.

To finish the proof, we write A = B + iC, where B and C are real matrices and i denotes a square root of −1. Since f has real coefficients, Bez(f) is a real matrix and hence

    Bez(f) = B^T B − C^T C + i(C^T B + B^T C) = B^T B − C^T C.    (2.15)


We have

    rank(B) ≤ r + k,    rank(C) ≤ k.    (2.16)

Indeed, for any pair α, ᾱ of complex conjugate numbers, the real parts of α^j and ᾱ^j are equal and the imaginary parts are opposite. Hence B has at most r + k different rows and C has (up to a factor −1) at most k different nonzero rows.

Denote the kernels of Bez(f), B and C by N, NB and NC respectively. Clearly NB ∩ NC ⊆ N. This gives

    dim N ≥ dim(NB ∩ NC) ≥ dim NB + dim NC − n
          ≥ (n − r − k) + (n − k) − n
          = n − r − 2k = dim N.    (2.17)

Hence we have equality throughout, showing that dim NB = n − r − k, dim NC = n − k and NB ∩ NC = N.

Write NB = N ⊕ N′B and NC = N ⊕ N′C as direct sums of vector spaces. For all nonzero u ∈ N′C, we have u^T C^T C u = 0 and u^T B^T B u > 0, and so u^T Bez(f) u > 0. This shows that Bez(f) has at least dim N′C = r + k positive eigenvalues (see the exercises). Similarly, u^T Bez(f) u < 0 for all nonzero u ∈ N′B, so that Bez(f) has at least dim N′B = k negative eigenvalues. Since Bez(f) has n − r − 2k zero eigenvalues, it has exactly r + k positive eigenvalues and exactly k negative eigenvalues.

Exercise 2.2.10. Let B be a real n × n matrix and x ∈ Rn. Show that x^T B^T B x ≥ 0, and that equality holds if and only if Bx = 0.

Exercise 2.2.11. Let A be a real symmetric n × n matrix. Show that the following are equivalent:

• there exists a linear subspace V ⊆ Rn of dimension k such that x^T A x > 0 for all nonzero x ∈ V;

• A has at least k positive eigenvalues.

Exercise 2.2.12. Use the previous exercise to show Sylvester's law of inertia: given a real symmetric n × n matrix A and an invertible real matrix S, the two matrices A and S^T A S have the same number of positive, negative and zero eigenvalues. This implies that the signature of A can be easily determined by bringing it into diagonal form using simultaneous row and column operations.

2.3 Exercises

Exercise 2.3.1. Let f(t) := t^3 + at + b, where a, b are real numbers.

• Compute Bez(f).

• Show that ∆(f) = −4a3 − 27b2.


• Determine, in terms of a and b, when f has only real roots.
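As a check, a computer algebra system reproduces this discriminant. The sketch below uses sympy's `discriminant`; the concrete cubic t^3 − 3t + 1 is an illustrative choice, not taken from the text.

```python
import sympy as sp

t, a, b = sp.symbols('t a b')
f = t**3 + a*t + b

disc = sp.discriminant(f, t)               # the discriminant of the depressed cubic
print(sp.expand(disc + 4*a**3 + 27*b**2))  # 0, i.e. disc = -4*a**3 - 27*b**2

# for a = -3, b = 1 the discriminant is 108 - 27 = 81 > 0,
# and indeed t^3 - 3t + 1 has three real roots
print(disc.subs({a: -3, b: 1}))            # 81
```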

Exercise 2.3.2. Prove the following formulas due to Newton:

p_k − s_1 p_{k−1} + · · · + (−1)^{k−1} s_{k−1} p_1 + (−1)^k k s_k = 0  (2.18)

for all k = 1, . . . , n. Show that for k > n the following similar relation holds:

p_k − s_1 p_{k−1} + · · · + (−1)^n s_n p_{k−n} = 0.  (2.19)

Hint: Let f(t) = (1 − t x_1) · · · (1 − t x_n) and compute f′(t)/f(t) in two ways.
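Both relations can be tested numerically for a concrete choice of x_1, . . . , x_n. The sketch below uses exact rational arithmetic; the sample values are arbitrary.

```python
from itertools import combinations
from math import prod
from fractions import Fraction

x = [Fraction(2), Fraction(-1), Fraction(3)]   # sample values x_1, x_2, x_3
n = len(x)

def p(k):   # power sum p_k = x_1^k + ... + x_n^k
    return sum(xi**k for xi in x)

def s(k):   # elementary symmetric polynomial s_k, evaluated at x
    return sum(prod(c) for c in combinations(x, k))

# relation (2.18) for k = 1, ..., n
for k in range(1, n + 1):
    lhs = p(k) + sum((-1)**i * s(i) * p(k - i) for i in range(1, k)) + (-1)**k * k * s(k)
    assert lhs == 0

# relation (2.19) for k > n
for k in range(n + 1, n + 4):
    assert p(k) + sum((-1)**i * s(i) * p(k - i) for i in range(1, n + 1)) == 0

print("Newton's identities verified")
```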


Chapter 3

Lecture 3. Multilinear algebra

We review some constructions from linear algebra, in particular the tensor product of vector spaces. Unless explicitly stated otherwise, all our vector spaces are over the field C of complex numbers.

Definition 3.0.3. Let V_1, . . . , V_k, W be vector spaces. A map φ : V_1 × · · · × V_k → W is called multilinear (or k-linear, or bilinear if k = 2, or trilinear if k = 3) if for each i and all v_1, . . . , v_{i−1}, v_{i+1}, . . . , v_k the map V_i → W, v_i ↦ φ(v_1, . . . , v_k) is linear.

Let U, V and T be vector spaces and let ⊗ : U × V → T be a bilinear map. The map ⊗ is said to have the universal property if for every bilinear map φ : U × V → W there exists a unique linear map f : T → W such that φ = f ◦ ⊗.

U × V ──φ──> W
  │          ↗
  ⊗        f
  ↓
  T

We will usually write u ⊗ v := ⊗(u, v) for (u, v) ∈ U × V. Although ⊗ will in general not be surjective, the image linearly spans T.

Exercise 3.0.4. Show that if ⊗ : U × V → T has the universal property, the vectors u ⊗ v, u ∈ U, v ∈ V, span T.

Given U and V, the pair (T, ⊗) is unique up to a unique isomorphism. That is, given two bilinear maps ⊗ : U × V → T and ⊗′ : U × V → T′ that both have the universal property, there is a unique linear isomorphism f : T → T′ such that f(u ⊗ v) = u ⊗′ v for all u ∈ U, v ∈ V. This can be seen as follows. Since ⊗′ is bilinear, there exists, by the universal property of ⊗, a unique linear map f : T → T′ such that ⊗′ = f ◦ ⊗. It suffices to show that f is a bijection. By the universal property of ⊗′ there is a linear map f′ : T′ → T such that ⊗ = f′ ◦ ⊗′. Now f′ ◦ f ◦ ⊗ = f′ ◦ ⊗′ = ⊗, which implies that f′ ◦ f : T → T is the identity since the image of ⊗ spans T (or alternatively, by using the universal property of ⊗ applied to the bilinear map ⊗ itself). Hence f is injective. Similarly, f ◦ f′ is the identity on T′ and hence f is surjective.

Definition 3.0.5. Let U, V be vector spaces. The tensor product of U and V is a vector space T together with a bilinear map ⊗ : U × V → T having the universal property. The space T, which is uniquely determined by U and V up to isomorphism, is denoted by U ⊗ V.

Often we will refer to U ⊗ V as the tensor product of U and V, implicitly assuming the map ⊗ : U × V → U ⊗ V.

So far, we have not shown that the tensor product U ⊗ V exists at all, nor did we gain insight into the dimension of this space in terms of the dimensions of U and V. One possible construction of U ⊗ V is as follows.

Start with the vector space F (for free or formal) formally spanned by the pairs (u, v) as u, v run through U, V, respectively. Now take the subspace R (for relations) of F spanned by all elements of the form

(c_1 u + u′, c_2 v + v′) − c_1 c_2 (u, v) − c_1 (u, v′) − c_2 (u′, v) − (u′, v′)  (3.1)

with c_1, c_2 ∈ C, v, v′ ∈ V, u, u′ ∈ U. Now any map φ : U × V → W factors through the injection i : U × V → F and a unique linear map g : F → W. The kernel of g contains R if and only if φ is bilinear, and in that case the map g factors through the quotient map π : F → F/R and a unique linear map f : F/R → W. Taking for ⊗ the bilinear map π ◦ i : (u, v) ↦ u ⊗ v, the space F/R together with the map ⊗ is the tensor product of U and V.

As for the dimension of U ⊗ V, let (u_i)_{i∈I} be a basis of U. Then, using bilinearity of the tensor product, every element t ∈ U ⊗ V can be written as t = ∑_{i∈I} u_i ⊗ w_i with w_i non-zero for only finitely many i. We claim that the w_i in such an expression are unique. Indeed, for k ∈ I let ξ_k be the linear function on U determined by u_i ↦ δ_{ik}, i ∈ I. The bilinear map U × V → V, (u, v) ↦ ξ_k(u)v factors, by the universal property, through a unique linear map f_k : U ⊗ V → V. This map sends all terms in the expression ∑_{i∈I} u_i ⊗ w_i for t to zero except the term with i = k, which is mapped to w_k. Hence w_k = f_k(t), and this shows the uniqueness of the w_k.

Exercise 3.0.6. Use a similar argument to show that if (v_j)_{j∈J} is a basis for V, then the elements u_i ⊗ v_j, i ∈ I, j ∈ J, form a basis of U ⊗ V.

This exercise may remind you of matrices. Indeed, there is a natural map φ from U ⊗ V^*, where V^* is the dual of V, into the space Hom(V, U) of linear maps V → U, defined as follows. Given a pair u ∈ U and f ∈ V^*, φ(u ⊗ f) is the linear map sending v to f(v)u. Here we are implicitly using the universal property: the linear map v ↦ f(v)u depends bilinearly on f and u, hence there is a unique linear map U ⊗ V^* → Hom(V, U) that sends u ⊗ f to v ↦ f(v)u.


Note that if f and u are both non-zero, then the image of u ⊗ f is a linear map of rank one.

Exercise 3.0.7. 1. Show that φ is injective. Hint: after choosing a basis (u_i)_i, use that a general element of U ⊗ V^* can be written in a unique way as ∑_i u_i ⊗ f_i.

2. Show that φ is surjective onto the subspace of Hom(V,U) of linear mapsof finite rank, that is, having finite-dimensional image.

Making things more concrete, if U = C^m and V = C^n, and u_1, . . . , u_m is the standard basis of U and v_1, . . . , v_n is the standard basis of V with dual basis x_1, . . . , x_n, then the tensor u_i ⊗ x_j corresponds to the linear map with matrix E_ij, the matrix having zeroes everywhere except for a 1 in position (i, j).

Remark 3.0.8. A common mistake is to assume that all elements of U ⊗ V are of the form u ⊗ v. The above shows that the latter elements correspond to rank-one linear maps from V^* to U, or to rank-one matrices, while U ⊗ V consists of all finite-rank linear maps from V^* to U, a much larger set.
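The correspondence between u ⊗ f and rank-one matrices can be made concrete with numpy's outer product; the vectors below are arbitrary sample data (real scalars for simplicity).

```python
import numpy as np

m, n = 3, 4
u = np.array([1.0, 2.0, 0.0])         # u in U = R^3
f = np.array([0.0, 1.0, -1.0, 2.0])   # f in V*, a linear function on V = R^4

M = np.outer(u, f)   # matrix of the rank-one map v -> f(v) u
print(M.shape, np.linalg.matrix_rank(M))   # (3, 4) 1

# the basis tensor u_i (x) x_j corresponds to the matrix unit E_ij
i, j = 1, 2
E = np.outer(np.eye(m)[i], np.eye(n)[j])
assert E[i, j] == 1 and E.sum() == 1
```

A generic 3 × 4 matrix has rank 3, which illustrates why not every element of U ⊗ V^* is a pure tensor u ⊗ f.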

Next we discuss tensor products of linear maps. If A : U → U′ and B : V → V′ are linear maps, then the map U × V → U′ ⊗ V′, (u, v) ↦ Au ⊗ Bv is bilinear. Hence, by the universal property of U ⊗ V there exists a unique linear map U ⊗ V → U′ ⊗ V′ that sends u ⊗ v to Au ⊗ Bv. This map is denoted A ⊗ B.

Example 3.0.9. If dim U = m, dim U′ = m′, dim V = n, dim V′ = n′ and if A, B are represented by an m′ × m matrix (a_ij) and an n′ × n matrix (b_kl), respectively, then A ⊗ B can be represented by an m′n′ × mn matrix, with rows labelled by pairs (i, k), i ∈ [m′], k ∈ [n′], and columns labelled by pairs (j, l), j ∈ [m], l ∈ [n], whose entry at position ((i, k), (j, l)) is a_ij b_kl. This matrix is called the Kronecker product of the matrices (a_ij) and (b_kl).
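numpy implements exactly this construction as `np.kron`. A small sketch, with arbitrary sample matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])        # a 2 x 2 matrix (m' = m = 2)
B = np.array([[0, 1, 0],
              [2, 0, 1]])     # a 2 x 3 matrix (n' = 2, n = 3)

K = np.kron(A, B)             # the (m'n') x (mn) Kronecker product
print(K.shape)                # (4, 6)

# entry at row (i, k), column (j, l) is a_ij * b_kl
for i in range(2):
    for k in range(2):
        for j in range(2):
            for l in range(3):
                assert K[i * 2 + k, j * 3 + l] == A[i, j] * B[k, l]
```

The row index i·n′ + k and column index j·n + l encode the pair labelling described above.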

Exercise 3.0.10. Assume, in the setting above, that U = U′, m′ = m and V = V′, n′ = n, and that A, B are diagonalisable with eigenvalues λ_1, . . . , λ_m and µ_1, . . . , µ_n, respectively. Determine the eigenvalues of A ⊗ B.

Most of what we said about the tensor product of two vector spaces carries over verbatim to the tensor product V_1 ⊗ · · · ⊗ V_k of k vector spaces. This tensor product can again be defined by a universal property involving k-linear maps, and its dimension is ∏_i dim V_i. Its elements are called k-tensors. We skip the boring details, but do point out that for larger k there is no longer a close relationship of k-tensors with linear maps; in particular, the rank of a k-tensor T, usually defined as the minimal number of terms in any expression of T as a sum of pure tensors v_1 ⊗ · · · ⊗ v_k, is only poorly understood. For instance, computing the rank, which for k = 2 can be done using Gaussian elimination, is very hard in general. If all V_i are the same, say V, then we also write V^{⊗k} for V ⊗ · · · ⊗ V (k factors).

Given three vector spaces U, V, W, we now have several ways to take their tensor product: (U ⊗ V) ⊗ W, U ⊗ (V ⊗ W) and U ⊗ V ⊗ W. Fortunately, these tensor products can be identified. For example, there is a unique linear isomorphism f : U ⊗ V ⊗ W → (U ⊗ V) ⊗ W such that f(u ⊗ v ⊗ w) = (u ⊗ v) ⊗ w for all u ∈ U, v ∈ V, w ∈ W.

Indeed, consider the trilinear map U × V × W → (U ⊗ V) ⊗ W defined by (u, v, w) ↦ (u ⊗ v) ⊗ w. By the universal property, there is a unique linear map f : U ⊗ V ⊗ W → (U ⊗ V) ⊗ W such that f(u ⊗ v ⊗ w) = (u ⊗ v) ⊗ w for all u, v, w.

Now for fixed w ∈ W, the bilinear map φ_w : U × V → U ⊗ V ⊗ W defined by φ_w(u, v) := u ⊗ v ⊗ w induces a linear map g_w : U ⊗ V → U ⊗ V ⊗ W such that u ⊗ v is mapped to u ⊗ v ⊗ w. Hence the bilinear map φ : (U ⊗ V) × W → U ⊗ V ⊗ W given by φ(x, w) := g_w(x) induces a linear map g : (U ⊗ V) ⊗ W → U ⊗ V ⊗ W sending (u ⊗ v) ⊗ w to u ⊗ v ⊗ w. It follows that f ◦ g and g ◦ f are the identity on (U ⊗ V) ⊗ W and U ⊗ V ⊗ W respectively. This shows that f is an isomorphism.

Exercise 3.0.11. Let V be a vector space. Show that for all p, q there is a unique linear isomorphism V^{⊗p} ⊗ V^{⊗q} → V^{⊗(p+q)} sending (v_1 ⊗ · · · ⊗ v_p) ⊗ (v_{p+1} ⊗ · · · ⊗ v_{p+q}) to v_1 ⊗ · · · ⊗ v_{p+q}.

The direct sum TV := ⊕_{k=0}^∞ V^{⊗k} is called the tensor algebra of V, where the natural bilinear map V^{⊗k} × V^{⊗l} → V^{⊗k} ⊗ V^{⊗l} = V^{⊗(k+l)} plays the role of (non-commutative but associative) multiplication. We move on to other types of tensors.

Definition 3.0.12. Let V be a vector space. A k-linear map ω : V^k → W is called symmetric if ω(v_1, . . . , v_k) = ω(v_{π(1)}, . . . , v_{π(k)}) for all permutations π ∈ Sym(k).

The k-th symmetric power of V is a vector space S^k V together with a symmetric k-linear map V^k → S^k V, (v_1, . . . , v_k) ↦ v_1 · · · v_k, such that for all vector spaces W and symmetric k-linear maps ψ : V^k → W there is a unique linear map φ : S^k V → W such that ψ(u_1, . . . , u_k) = φ(u_1 · · · u_k).

Uniqueness of the k-th symmetric power of V can be proved in exactly the same manner as uniqueness of the tensor product. For existence, let R be the subspace of V^{⊗k} := V ⊗ · · · ⊗ V spanned by all elements of the form

v_1 ⊗ · · · ⊗ v_k − v_{π(1)} ⊗ · · · ⊗ v_{π(k)},  π ∈ Sym(k).

Then the composition of the maps V^k → V^{⊗k} → V^{⊗k}/R is a symmetric k-linear map, and if ψ : V^k → W is any such map, then ψ factors through a linear map V^{⊗k} → W since it is k-linear, which in turn factors through a unique linear map V^{⊗k}/R → W since ψ is symmetric. This shows existence of symmetric powers and, perhaps more importantly, the fact that they are quotients of tensor powers of V. This observation will be very important in proving the first fundamental theorem for GL(V).

There is also an analogue of the tensor product of maps: if A is a linear map U → V, then the map U^k → S^k V, (u_1, . . . , u_k) ↦ Au_1 · · · Au_k is multilinear and symmetric. Hence, by the universal property of symmetric powers, it factors through the map U^k → S^k U and a linear map S^k U → S^k V. This map, which sends u_1 · · · u_k to Au_1 · · · Au_k, is the k-th symmetric power S^k A of A.

If (v_i)_{i∈I} is a basis of V, then using multilinearity and symmetry every element t of S^k V can be written as a linear combination ∑_α c_α v^α of the elements v^α := ∏_{i∈I} v_i^{α_i} (the product order is immaterial), where α ∈ N^I satisfies |α| := ∑_{i∈I} α_i = k and only finitely many coefficients c_α are non-zero. We claim that the c_α are unique, so that the v^α, |α| = k, form a basis of S^k V. Indeed, let α ∈ N^I with |α| = k. Then there is a unique k-linear map ψ_α : V^k → C which on a tuple (v_{i_1}, . . . , v_{i_k}) takes the value 1 if |{j | i_j = i}| = α_i for all i ∈ I and zero otherwise. Moreover, ψ_α is symmetric and therefore induces a linear map φ_α : S^k V → C. We find that c_α = φ_α(t), which proves the claim.

This may remind you of polynomials. Indeed, if V = C^n and x_1, . . . , x_n is the basis of V^* dual to the standard basis of V, then S^k V^* is just the space of homogeneous polynomials in the x_i of degree k. The algebra of all polynomial functions on V therefore is canonically isomorphic to SV^* := ⊕_{k=0}^∞ S^k V^*. The product of a homogeneous polynomial of degree k and a homogeneous polynomial of degree l corresponds to the unique bilinear map S^k V^* × S^l V^* → S^{k+l} V^* making the diagram

(V^*)^{⊗k} × (V^*)^{⊗l} ──> (V^*)^{⊗(k+l)}
        │                        │
        ↓                        ↓
 S^k V^* × S^l V^*    ──>    S^{k+l} V^*

commute, and this corresponds to multiplying polynomials of degrees k and l. Thus SV^* is a quotient of the tensor algebra TV^* (in fact, the maximal commutative quotient).

Above we have introduced S^k V as a quotient of V^{⊗k}. This should not be confused with the subspace of V^{⊗k} spanned by all symmetric tensors, defined as follows. For every permutation π ∈ S_k there is a natural map V^k → V^k sending (v_1, . . . , v_k) to (v_{π^{-1}(1)}, . . . , v_{π^{-1}(k)}). Composing this map with the natural k-linear map V^k → V^{⊗k} yields another k-linear map V^k → V^{⊗k}, and hence a linear map V^{⊗k} → V^{⊗k}, also denoted π. A tensor ω in V^{⊗k} is called symmetric if πω = ω for all π ∈ S_k. The restriction of the map V^{⊗k} → S^k V to the subspace of symmetric tensors is an isomorphism, with inverse determined by v_1 · · · v_k ↦ (1/k!) ∑_{π∈S_k} π(v_1 ⊗ · · · ⊗ v_k). (Note that this inverse would not be defined in positive characteristic at most k.)
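For concrete tensors the averaging map (1/k!) ∑_π π can be sketched with numpy arrays, where a permutation acts by permuting the axes of a k-dimensional array (an illustrative model, not the text's abstract construction):

```python
import numpy as np
from itertools import permutations

def symmetrize(T):
    """Project a k-tensor (k-dimensional array) onto the symmetric tensors:
    the average of T over all permutations of its k axes."""
    perms = list(permutations(range(T.ndim)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

T = np.arange(27.0).reshape(3, 3, 3)   # an arbitrary 3-tensor on V = R^3
S = symmetrize(T)

assert np.allclose(S, np.transpose(S, (1, 0, 2)))   # S is a symmetric tensor
assert np.allclose(symmetrize(S), S)                # symmetrize is a projection
```

Division by k! = 6 is what fails in small positive characteristic, matching the remark above.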

Exercise 3.0.13. Show that the subspace of symmetric tensors in V^{⊗k} is spanned by the tensors v ⊗ v ⊗ · · · ⊗ v, where v ∈ V.

3.1 Exercises

Exercise 3.1.1. Let U ⊗ V be the tensor product of the vector spaces U and V. Let u_1, . . . , u_s and u′_1, . . . , u′_t be two systems of linearly independent vectors in U, and let v_1, . . . , v_s and v′_1, . . . , v′_t be two systems of linearly independent vectors in V. Suppose that

u_1 ⊗ v_1 + · · · + u_s ⊗ v_s = u′_1 ⊗ v′_1 + · · · + u′_t ⊗ v′_t.  (3.2)

Show that s = t.

Exercise 3.1.2. a) Let T ∈ V_1 ⊗ V_2 ⊗ V_3 be an element of the tensor product of V_1, V_2 and V_3. Suppose that there exist v_1 ∈ V_1, v_3 ∈ V_3, T_23 ∈ V_2 ⊗ V_3 and T_12 ∈ V_1 ⊗ V_2 such that

T = v_1 ⊗ T_23 = T_12 ⊗ v_3.  (3.3)

Show that there exists a v_2 ∈ V_2 such that T = v_1 ⊗ v_2 ⊗ v_3.

b) Suppose that T ∈ V_1 ⊗ V_2 ⊗ V_3 can be written as a sum of at most d_1 tensors of the form v_1 ⊗ T_23, where v_1 ∈ V_1, T_23 ∈ V_2 ⊗ V_3, and also as a sum of at most d_3 tensors of the form T_12 ⊗ v_3, where v_3 ∈ V_3, T_12 ∈ V_1 ⊗ V_2. Show that T can be written as the sum of at most d_1 d_3 tensors of the form v_1 ⊗ v_2 ⊗ v_3, where v_i ∈ V_i.

Exercise 3.1.3. Let U, V, W be vector spaces. Denote by B(U × V, W) the linear space of bilinear maps from U × V to W. Show that the map f ↦ f ◦ ⊗ is a linear isomorphism between Hom(U ⊗ V, W) and B(U × V, W).

Exercise 3.1.4. Let U, V be vector spaces. Show that the linear map φ : U^* ⊗ V^* → (U ⊗ V)^* given by φ(f ⊗ g)(u ⊗ v) := f(u)g(v) is an isomorphism.


Chapter 4

Lecture 4. Representations

Central objects in this course are linear representations of groups. We will only consider representations on complex vector spaces. Recall the following definition.

Definition 4.0.5. Let G be a group and let V be a vector space. A (linear) representation of G on V is a group homomorphism ρ : G → GL(V).

If ρ is a representation of G, then the map (g, v) ↦ ρ(g)v is an action of G on V. A vector space with an action of G by linear maps is also called a G-module. Instead of ρ(g)v we will often write gv.

Definition 4.0.6. Let U and V be G-modules. A linear map φ : U → V is called a G-module morphism or a G-linear map if φ(gu) = gφ(u) for all u ∈ U and g ∈ G. If φ is invertible, then it is called an isomorphism of G-modules. The linear space of all G-linear maps from U to V is denoted Hom(U, V)^G.

The multilinear algebra constructions from Chapter 3 carry over to representations. For instance, if ρ : G → GL(U) and σ : G → GL(V) are representations, then the map ρ ⊗ σ : G → GL(U ⊗ V), g ↦ ρ(g) ⊗ σ(g) is also a representation. Similarly, for any natural number k the map g ↦ S^k ρ(g) is a representation of G on S^k V. Also, the dual space V^* of all linear functions on V carries a natural G-module structure: for f ∈ V^* and g ∈ G we let gf be the linear function defined by (gf)(v) = f(g^{-1}v). The inverse ensures that the action is a left action rather than a right action: for g, h ∈ G, f ∈ V^* and v ∈ V we have

(g(hf))(v) = (hf)(g^{-1}v) = f(h^{-1}g^{-1}v) = f((gh)^{-1}v) = ((gh)f)(v),

so that g(hf) = (gh)f.

Exercise 4.0.7. Show that the set of fixed points in Hom(U, V) under the action of G is precisely Hom(U, V)^G.

Example 4.0.8. Let V, U be G-modules. Then the space Hom(V, U) of linear maps V → U is a G-module with the action (gφ)(v) := gφ(g^{-1}v). The space U ⊗ V^* is also a G-module with action determined by g(u ⊗ f) = (gu) ⊗ (gf). The natural map Ψ : U ⊗ V^* → Hom(V, U) determined by Ψ(u ⊗ f)v = f(v)u is a morphism of G-modules. To check this it suffices to observe that

Ψ(g(u ⊗ f))v = Ψ((gu) ⊗ (gf))v = (gf)(v) · gu = f(g^{-1}v) · gu

and

(gΨ(u ⊗ f))v = g(Ψ(u ⊗ f)(g^{-1}v)) = g(f(g^{-1}v)u) = f(g^{-1}v) · gu.

The map Ψ is a G-module isomorphism of U ⊗ V^* with the space of finite-rank linear maps from V to U. In particular, if U or V is finite-dimensional, then Ψ is an isomorphism.

Example 4.0.9. Let G be a group acting on a set X. Consider the vector space

CX := {∑_{x∈X} λ_x x | λ_x ∈ C for all x ∈ X and λ_x = 0 for almost all x}  (4.1)

consisting of all formal finite linear combinations of elements from X. The natural action of G given by g(∑_x λ_x x) := ∑_x λ_x (gx) makes CX into a G-module. In the special case where X = G and G acts on itself by multiplication on the left, the module CG is called the regular representation of G.
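For a small concrete case the regular representation can be written down as permutation matrices. The sketch below takes G = Z/3Z (an illustrative choice, written additively) and checks the homomorphism property:

```python
import numpy as np

n = 3                    # the cyclic group Z/3Z = {0, 1, 2} under addition mod n
G = list(range(n))

def R(g):
    """Matrix of left multiplication by g on CG, in the basis of group elements."""
    M = np.zeros((n, n))
    for x in G:
        M[(g + x) % n, x] = 1   # g sends the basis vector x to the basis vector g + x
    return M

# R is a group homomorphism into GL(CG): R(g) R(h) = R(g + h)
for g in G:
    for h in G:
        assert np.array_equal(R(g) @ R(h), R((g + h) % n))
print("regular representation of Z/3Z verified")
```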

Definition 4.0.10. A G-submodule of a G-module V is a G-stable subspace, that is, a subspace U such that gU ⊆ U for all g ∈ G. The quotient V/U then carries a natural structure of G-module as well, given by g(v + U) := (gv) + U.

Definition 4.0.11. A G-module V is called irreducible if it has precisely two G-submodules (namely, 0 and V).

Exercise 4.0.12. Show that for finite groups G, every irreducible G-module has finite dimension.

Note that, just like 1 is not a prime number and the empty graph is not connected, the zero module is not irreducible. In this course we will be concerned only with G-modules that are either finite-dimensional or locally finite.

Definition 4.0.13. A G-module V is called locally finite if every v ∈ V is contained in a finite-dimensional G-submodule of V.

Proposition 4.0.14. For a locally finite G-module V the following statements are equivalent.

1. for every G-submodule U of V there is a G-submodule W of V such that U ⊕ W = V;

2. V is a (potentially infinite) direct sum of finite-dimensional irreducible G-submodules.

In this case we call V completely reducible; note that we include the condition that V be locally finite in this notion.


Proof. First assume (1). Let M be the collection of all finite-dimensional irreducible G-submodules of V. The collection of subsets S of M for which the sum ∑_{U∈S} U is direct satisfies the condition of Zorn's Lemma: the union of any chain of such subsets S is again a subset of M whose sum is direct. Hence by Zorn's Lemma there exists a maximal subset S of M whose sum is direct. Let U be its (direct) sum, which is a G-submodule of V. By (1), U has a direct complement W, which is also a G-submodule. If W is non-zero, then it contains a non-zero finite-dimensional submodule (since V is locally finite), and for dimension reasons the latter contains an irreducible G-submodule W′. But then S ∪ {W′} is a subset of M whose sum is direct, contradicting maximality of S. Hence W = 0 and V = U = ⊕_{M∈S} M, which proves (2).

For the converse, assume (2) and write V as the direct sum ⊕_{M∈S} M of irreducible finite-dimensional G-modules. Let U be any submodule of V. Then the collection of subsets of S whose sum intersects U only in 0 satisfies the condition of Zorn's Lemma. Hence there is a maximal such subset S′. Let W be its sum. We claim that U + W = V (and the sum is direct by construction). Indeed, if not, then some element M of S is not contained in U + W. But then M ∩ (U + W) = {0} by irreducibility of M, and therefore the sum of S′ ∪ {M} still intersects U trivially, contradicting the maximality of S′. This proves (1).

Remark 4.0.15. It is not hard to prove that direct sums, submodules, and quotients of locally finite G-modules are again locally finite, and that they are also completely reducible if the original modules were.

Example 4.0.16. A typical example of a module which is not completely reducible is the following. Let G be the group of invertible upper triangular 2 × 2 matrices, and let V = C^2. Then the subspace spanned by the first standard basis vector e_1 is a G-submodule, but it does not have a direct complement that is G-stable.

Note that the group in this example is infinite. This is not a coincidence, as the following fundamental results show.

Proposition 4.0.17. Let G be a finite group and let V be a finite-dimensional G-module. Then there exists a Hermitian inner product (.|.) on V such that (gu|gv) = (u|v) for all g ∈ G and u, v ∈ V.

Proof. Let (.|.)′ be any Hermitian inner product on V and take

(u|v) := ∑_{g∈G} (gu|gv)′.

A straightforward computation shows that (.|.) is G-invariant, linear in its first argument, and semilinear in its second argument. For positive definiteness, note that for v ≠ 0 the inner product (v|v) = ∑_{g∈G} (gv|gv)′ is positive since every term is positive.
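The averaging trick in this proof is easy to carry out numerically. The sketch below uses S_3 acting on R^3 by permutation matrices, and a real symmetric positive definite form in place of a Hermitian one; all concrete data are illustrative choices, not from the text.

```python
import numpy as np
from itertools import permutations

# S_3 acting on R^3 by permutation matrices
group = [np.eye(3)[list(p)] for p in permutations(range(3))]

# an arbitrary inner product (u|v)' = u^T M v, with M symmetric positive definite
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])

# average over the group: (u|v) := sum_g (gu|gv)'
M_avg = sum(g.T @ M @ g for g in group)

for g in group:
    assert np.allclose(g.T @ M_avg @ g, M_avg)   # the averaged form is G-invariant
print("averaged inner product is G-invariant")
```

The invariance follows exactly as in the proof: replacing g by gg_0 merely permutes the summands.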

Theorem 4.0.18. For a finite group G any G-module is completely reducible.


Proof. Let V be a G-module. Then every v ∈ V lies in the finite-dimensional subspace spanned by its orbit Gv = {gv | g ∈ G}, which moreover is G-stable. Hence V is locally finite. By Zorn's lemma there exists a submodule U of V which is maximal among all direct sums of finite-dimensional irreducible submodules of V. If U is not all of V, then let W be a finite-dimensional submodule of V not contained in U, and let (.|.) be a G-invariant Hermitian form on W. Then U ∩ W is a G-submodule of W, and therefore so is the orthogonal complement (U ∩ W)^⊥ of U ∩ W in W: indeed, one has (gw|U ∩ W) = (w|g^{-1}(U ∩ W)) ⊆ (w|U ∩ W) = {0} for g ∈ G and w ∈ (U ∩ W)^⊥, so that gw ∈ (U ∩ W)^⊥. Let W′ be an irreducible submodule of (U ∩ W)^⊥. Then U ⊕ W′ is a larger submodule of V which is the direct sum of irreducible submodules of V, a contradiction. Hence V = U is completely reducible.

4.1 Schur’s lemma and isotypic decomposition

The following easy observation due to the German mathematician Issai Schur (1875-1941) is fundamental to representation and invariant theory.

Lemma 4.1.1 (Schur's Lemma). Let V and U be irreducible finite-dimensional G-modules for some group G. Then either V and U are isomorphic and Hom(V, U)^G is one-dimensional, or they are not isomorphic and Hom(V, U)^G = {0}.

Proof. Suppose that Hom(V, U)^G contains a non-zero element φ. Then ker φ is a G-submodule of V unequal to all of V and hence equal to {0}. Also, im φ is a G-submodule of U unequal to {0}, hence equal to U. It follows that φ is an isomorphism of G-modules. Now suppose that φ′ is a second element of Hom(V, U)^G. Then ψ := φ′ ◦ φ^{-1} is a G-morphism from U to itself; let λ ∈ C be an eigenvalue of it. Then ψ − λI is a G-morphism from U to itself as well, and its kernel is a non-zero submodule, hence all of U. This shows that ψ = λI and therefore φ′ = λφ. Hence Hom(V, U)^G is one-dimensional, as claimed.

If G is a group and V is a completely reducible G-module, then the decomposition of V as a direct sum of irreducible G-modules need not be unique. For instance, if V is the direct sum U_1 ⊕ U_2 ⊕ U_3 where the first two are isomorphic irreducible modules and the third is an irreducible module not isomorphic to the other two, then V can also be written as U_1 ⊕ ∆ ⊕ U_3, where ∆ = {u_1 + φ(u_1) | u_1 ∈ U_1} is the diagonal subspace of U_1 ⊕ U_2 corresponding to an isomorphism φ from U_1 to U_2.

However, there is always a coarser decomposition of V into G-modules which is unique. For this, let {U_i}_{i∈I} be a set of representatives of the isomorphism classes of irreducible finite-dimensional G-modules, so that every irreducible finite-dimensional G-module is isomorphic to U_i for a unique i ∈ I. For every i ∈ I let V_i be the (non-direct) sum of all G-submodules of V that are isomorphic to U_i. Clearly each V_i is a G-submodule of V and, since V is a direct sum of irreducible G-modules, ∑_{i∈I} V_i = V. Using Zorn's lemma one sees that V_i can also be written as ⊕_{j∈J_i} V_{ij} for irreducible submodules V_{ij}, j ∈ J_i, that are all isomorphic to U_i.


We claim that V = ⊕_{i∈I} V_i. To see this, suppose that V_{i_0} ∩ ∑_{i≠i_0} V_i ≠ {0} and let U be an irreducible submodule of this module. Then the projection of U onto some V_{i_0,j}, along the direct sum of the remaining direct summands of V_{i_0}, is non-zero, and similarly the projection of U onto V_{i_1,j} for some i_1 ≠ i_0 and j, along the remaining summands of V_{i_1}, is non-zero. By Schur's lemma U is then isomorphic both to V_{i_0,j} and to V_{i_1,j}, a contradiction. Hence V_{i_0} ∩ ∑_{i≠i_0} V_i is zero, as claimed.

The space V_i is called the isotypic component of V of type U_i, and it has the following pretty description. The map Hom(U_i, V)^G × U_i → V, (φ, u) ↦ φ(u) is bilinear, and therefore gives rise to a linear map Ψ : Hom(U_i, V)^G ⊗ U_i → V. This linear map is a linear isomorphism onto V_i.

Exercise 4.1.2. Let U, V, W be G-modules. Show that Hom(U ⊕ V, W)^G ≅ Hom(U, W)^G ⊕ Hom(V, W)^G and Hom(W, U ⊕ V)^G ≅ Hom(W, U)^G ⊕ Hom(W, V)^G.

4.2 Exercises

Exercise 4.2.1. • Let V be a G-module and 〈·, ·〉 a G-invariant inner product on V. Show that for any two non-isomorphic irreducible submodules V_1, V_2 ⊆ V we have V_1 ⊥ V_2, that is, 〈v_1, v_2〉 = 0 for all v_1 ∈ V_1, v_2 ∈ V_2.

• Give an example where V_1 ⊥ V_2 fails for (isomorphic) irreducible G-modules V_1 and V_2.

Exercise 4.2.2. Let the symmetric group on 3 letters, S_3, act on C[x_1, x_2, x_3]_2 by permuting the variables. This action makes C[x_1, x_2, x_3]_2 into an S_3-module. Give a decomposition of this module into irreducible submodules.

Exercise 4.2.3. Let G be an abelian group. Show that every irreducible G-module has dimension 1. A representation ρ is called faithful if it is injective; show that G has a faithful irreducible representation if and only if G is cyclic.

Exercise 4.2.4. Let G be a finite group and V an irreducible G-module. Show that there is a G-invariant inner product on V, unique up to multiplication by scalars.

Exercise 4.2.5. Let G be a finite group, let CG be the regular representation of G, and let CG = W_1^{m_1} ⊕ · · · ⊕ W_k^{m_k} be the isotypic decomposition of CG. Show that for every irreducible G-module W there is an i such that W is isomorphic to W_i, and show that m_i = dim W_i. Hint: for all w ∈ W the map CG → W given by ∑_g λ_g g ↦ ∑_g λ_g gw is a G-linear map.


Chapter 5

Finite generation of the invariant ring

In all examples we have met so far, the invariant ring was generated by a finite number of invariants. In this section we prove Hilbert's theorem that, under reasonable conditions, this is always the case. For the proof we will need another theorem by Hilbert.

Recall that for a ring R and a subset S ⊆ R, the ideal generated by S is defined as

(S) := {r1s1 + · · ·+ rksk | k ∈ N, r1, . . . , rk ∈ R, s1, . . . , sk ∈ S}. (5.1)

You may want to check that this indeed defines an ideal in R. An ideal I ⊆ R is called finitely generated if there is a finite set S such that I = (S).

Definition 5.0.6. A ring R is called Noetherian if every ideal I in R is finitely generated.

Exercise 5.0.7. Show that a ring R is Noetherian if and only if there is no infinite ascending chain of ideals I_1 ⊊ I_2 ⊊ I_3 ⊊ · · · .

We will be mostly interested in polynomial rings over C in finitely many indeterminates, for which the following theorem is essential.

Theorem 5.0.8 (Hilbert's Basis Theorem). The polynomial ring C[x_1, . . . , x_n] is Noetherian.

We will deduce this statement from the following result.

Lemma 5.0.9 (Dickson's Lemma). If m_1, m_2, m_3, . . . is an infinite sequence of monomials in the variables x_1, . . . , x_n, then there exist indices i < j such that m_i | m_j.

Proof. We proceed by induction on n. For n = 0 all monomials are 1, so we can take any i < j. Suppose that the statement is true for n − 1 ≥ 0. Define the infinite sequences e_1 ≤ e_2 ≤ . . . and i_1 < i_2 < . . . as follows: e_1 is the smallest exponent of x_n in any of the monomials m_i, and i_1 is the smallest index i for which the exponent of x_n in m_i equals e_1. For k > 1 the exponent e_k is the smallest exponent of x_n in any of the m_i with i > i_{k−1}, and i_k is the smallest index i > i_{k−1} for which the exponent of x_n in m_i equals e_k. Now the monomials in the sequence m_{i_1}/x_n^{e_1}, m_{i_2}/x_n^{e_2}, . . . do not contain x_n. Hence by induction there exist j < l such that m_{i_j}/x_n^{e_j} divides m_{i_l}/x_n^{e_l}. As e_j ≤ e_l we then also have m_{i_j} | m_{i_l}, and of course i_j < i_l, as claimed.
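For any finite list of monomials one can search for such a dividing pair by brute force. The helper below is a hypothetical illustration (monomials encoded as exponent vectors, divisibility as componentwise comparison):

```python
def dividing_pair(monomials):
    """First pair (i, j) with i < j such that monomials[i] divides monomials[j]
    (componentwise <= on exponent vectors), or None if there is no such pair."""
    for j in range(len(monomials)):
        for i in range(j):
            if all(a <= b for a, b in zip(monomials[i], monomials[j])):
                return (i, j)
    return None

# exponent vectors in x1, x2:   x1^2,  x2^3,  x1*x2^2,  x1^2*x2^3
ms = [(2, 0), (0, 3), (1, 2), (2, 3)]
print(dividing_pair(ms))   # (0, 3): x1^2 divides x1^2 * x2^3
```

Dickson's lemma guarantees that every infinite list contains such a pair; a finite list, of course, may contain none.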

Proof of Hilbert’s Basis Theorem. Let I ⊆ C[x1, . . . , xn] be an ideal. For anypolynomial f in C[x1, . . . , xn] we denote by lm(f) the leading monomial of f : thelexicographically largest monomial having non-zero coefficient in f . By Dixon’slemma, the set of |-minimal monomials in {lm(f) | f ∈ I} is finite. Hence thereexist finitely many polynomials f1, . . . , fk ∈ I such that for all f ∈ I there existsan i with lm(fi)|lm(f). We claim that the ideal J := (f1, . . . , fk) generated bythe fi equals I. If not, then take an f ∈ I \J with the lexicographically smallestleading monomial among all counter examples. Take i such that lm(fi)|lm(f),say lm(f) = mlm(fi). Subtracting a suitable scalar multiple of mfi, whichlies in J , from f gives a polynomial with a lexicographically smaller leadingmonomial, and which is still in I \ J . But this contradicts the minimality oflm(f).

Remark 5.0.10. More generally, Hilbert showed that for R Noetherian, also R[x] is Noetherian (which you may want to prove yourself!). Since clearly any field is a Noetherian ring, this implies the previous theorem by induction on the number of indeterminates.

With this tool in hand, we can now return to our main theorem of this section.

Theorem 5.0.11 (Hilbert's Finiteness Theorem). Let G be a group and let W be a finite-dimensional G-module with the property that C[W] is completely reducible. Then C[W]G := {f ∈ C[W] | gf = f for all g ∈ G} is a finitely generated subalgebra of C[W]. That is, there exist f1, . . . , fk ∈ C[W]G such that every G-invariant polynomial on W is a polynomial in the fi.

The proof uses the so-called Reynolds operator ρ, which is defined as follows. We assume that the vector space C[W] is completely reducible. Consider its isotypic decomposition C[W] = ⊕_{i∈I} Vi and let 1 ∈ I correspond to the trivial 1-dimensional G-module, so that C[W]G = V1. Now let ρ be the projection from C[W] onto V1 along the direct sum of all Vi with i ≠ 1. This is a G-equivariant linear map. Moreover, we claim that

ρ(f · h) = f · ρ(h) for all f ∈ V1, (5.2)

where the multiplication is multiplication in C[W]. Indeed, consider the map C[W] → C[W], h ↦ fh. This is a G-module morphism, since g(f · h) = (gf) · (gh) = f · (gh), where the first equality reflects that G acts by automorphisms


on C[W] and the second equality follows from the invariance of f. Hence if we write h as ∑_i hi with hi ∈ Vi, then fhi ∈ Vi by Schur's lemma, and therefore the component of fh = ∑_i (fhi) in V1 is just fh1. In other words, ρ(fh) = fρ(h), as claimed.

Exercise 5.0.12. Show that for a finite group G, the Reynolds operator is just f ↦ (1/|G|) ∑_{g∈G} gf.
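For a concrete finite case, here is a minimal sketch (our own illustration) of averaging over G = {1, −1} acting on C^2 by scalar multiplication: a polynomial is a dict sending the exponent pair (a, b) of the monomial x^a y^b to its coefficient, and g acts on that monomial by the scalar g^(a+b), so averaging kills exactly the odd-degree terms.

```python
from fractions import Fraction

def act(g, f):
    """Action of the scalar g on a polynomial in two variables."""
    return {(a, b): c * g ** (a + b) for (a, b), c in f.items()}

def reynolds(f, group=(1, -1)):
    """f |-> (1/|G|) * sum over g in G of g.f"""
    avg = {}
    for g in group:
        for e, c in act(g, f).items():
            avg[e] = avg.get(e, 0) + Fraction(c, len(group))
    return {e: c for e, c in avg.items() if c != 0}

f = {(2, 0): 1, (1, 0): 3, (1, 1): 5}   # x^2 + 3x + 5xy
h = reynolds(f)                          # only the even-degree part survives
assert h == {(2, 0): 1, (1, 1): 5}

# Projection property, and (5.2) checked on a monomial: rho fixes invariants,
# and multiplying by the invariant xy (shifting exponents by (1, 1)) commutes
# with rho.
assert reynolds(h) == h
xy_f = {(a + 1, b + 1): c for (a, b), c in f.items()}
xy_h = {(a + 1, b + 1): c for (a, b), c in h.items()}
assert reynolds(xy_f) == xy_h
print("Reynolds operator checks passed")
```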

Proof of Hilbert's finiteness theorem. Let I′ := ⊕_{d>0} C[W]^G_d be the ideal in C[W]G consisting of all invariants with zero constant term. Denote by I := C[W]I′ the ideal in C[W] generated by I′. Since W is finite dimensional, it follows from Hilbert's basis theorem that there exist f1, . . . , fk ∈ I that generate the ideal I. We may assume that the fi belong to I′. Indeed, if fi ∉ I′, we can write fi = ∑_j f_{ij} g_{ij} for certain f_{ij} ∈ I′ and g_{ij} ∈ C[W] and replace fi with the f_{ij} to obtain a finite generating set of I with fewer elements in I \ I′.

We observe that the ideal I′ is generated by the fi. Indeed, let h ∈ I′ ⊆ I and write h = ∑_i gi fi for some gi ∈ C[W]. Using the Reynolds operator ρ we find: h = ρ(h) = ∑_i ρ(fi gi) = ∑_i fi ρ(gi).

The proof is now completed by exercise 5.0.13.

Exercise 5.0.13. Let A ⊆ C[W] be a subalgebra, and let A⁺ := ⊕_{d≥1} A ∩ C[W]_d be the ideal of polynomials in A with zero constant term. Suppose that the ideal A⁺ is finitely generated. Show that A is finitely generated as an algebra over C.

5.1 Noether's degree bound

For a finite group G, any G-module V is completely reducible, as we have seen in the previous lecture. By Hilbert's theorem this implies that for finite groups, the invariant ring is always finitely generated. In this section, we prove a result of Noether stating that for a finite group G, the invariant ring is already generated by the invariants of degree at most |G|, which implies a bound on the number of generators needed.

Theorem 5.1.1 (Noether's degree bound). Let G be a finite group, and let W be a (finite dimensional) G-module. Then the invariant ring C[W]G is generated by the homogeneous invariants of degree at most |G|.

Proof. We choose a basis x1, . . . , xn of W* so that C[W] = C[x1, . . . , xn]. For any n-tuple α = (α1, . . . , αn) of nonnegative integers we have an invariant

jα := ∑_{g∈G} g(x_1^{α_1} · · · x_n^{α_n}) (5.3)

homogeneous of degree |α| := α1 + · · · + αn. Clearly, the invariants jα span the vector space C[W]G, since for any invariant f = ∑_α c_α x_1^{α_1} · · · x_n^{α_n}, we have

f = (1/|G|) ∑_{g∈G} gf = (1/|G|) ∑_α c_α jα. (5.4)
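A small sketch of (5.3) and (5.4) for G = S2 permuting x1, x2 (our own illustration, not from the notes): jα is the orbit-sum of the monomial x^α, and an invariant f is recovered as (1/|G|) ∑_α c_α jα.

```python
from itertools import permutations
from fractions import Fraction

def j(alpha, n=2):
    """j_alpha = sum over g in S_n of g(x^alpha), as a dict of monomials."""
    out = {}
    for p in permutations(range(n)):
        e = tuple(alpha[p[i]] for i in range(n))  # g permutes the exponents
        out[e] = out.get(e, 0) + 1
    return out

def combine(f, group_order=2):
    """(1/|G|) * sum over alpha of c_alpha * j_alpha, as in (5.4)."""
    out = {}
    for alpha, c in f.items():
        for e, m in j(alpha).items():
            out[e] = out.get(e, 0) + Fraction(c * m, group_order)
    return out

# f = x1^2 + x2^2 + 3*x1*x2 is S_2-invariant and is recovered exactly:
f = {(2, 0): 1, (0, 2): 1, (1, 1): 3}
assert combine(f) == f
print(j((2, 0)))   # the orbit-sum x1^2 + x2^2
```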


It will therefore suffice to prove that every jα is a polynomial in the jβ with |β| ≤ |G|.

Let z1, . . . , zn be n new variables and define for j ∈ N the polynomials

pj(x1, . . . , xn, z1, . . . , zn) := ∑_{g∈G} (gx1 · z1 + · · · + gxn · zn)^j. (5.5)

So these are the Newton polynomials (see Lecture 2), where we have substituted the expressions (gx1 · z1 + · · · + gxn · zn) for the |G| variables. Expanding pj and sorting terms with respect to the variables zi, we see that pj = ∑_{|α|=j} fα z_1^{α_1} · · · z_n^{α_n}, where

fα = (j!/(α1! · · · αn!)) jα. (5.6)

Now let j > |G|. Recall that pj is a polynomial in p1, . . . , p_{|G|}. This implies that also the coefficients fα, |α| = j, of pj are polynomials in the coefficients fβ, |β| ≤ |G|, of p1, . . . , p_{|G|}. This finishes the proof, since the multinomial coefficient j!/(α1! · · · αn!) ≠ 0 when α1 + · · · + αn = j.

Exercise 5.1.2. Show that for all cyclic groups, the bound in the theorem is met in some representation.

5.2 Exercises

For a finite group G, define β(G) to be the minimal number m such that for every (finite dimensional) G-module W, the invariant ring C[W]G is generated by the invariants of degree at most m. By Noether's theorem, we always have β(G) ≤ |G|.

Exercise 5.2.1. Let G be a finite abelian group. We use additive notation. Define the Davenport constant δ(G) to be the maximum length m of a non-shortable expression 0 = g1 + · · · + gm, g1, . . . , gm ∈ G. Non-shortable means that there is no strict non-empty subset I of {1, . . . , m} such that ∑_{i∈I} gi = 0. Show that δ(G) = β(G). Compute δ((Z/2Z)^n).


Chapter 6

Affine varieties and the quotient map

6.1 Affine varieties

Definition 6.1.1. An affine variety is a subset of some Cn which is the common zero set of a collection of polynomials in the coordinates x1, . . . , xn on Cn.

Suppose that S is a subset of C[x] := C[x1, . . . , xn] and let p ∈ Cn be a common zero of the elements of S. Then any finite combination ∑_i ai fi, where the fi are in S and the ai are in C[x], also vanishes at p. The collection of all such polynomials is the ideal generated by S. So the study of affine varieties leads naturally to the study of ideals in the polynomial ring C[x1, . . . , xn]. We have seen in Week 5 that such ideals are always finitely generated.

Exercise 6.1.2. Show that the collection of affine varieties in Cn satisfies the following three properties:

1. Cn and ∅ are affine varieties;

2. the union of two affine varieties is an affine variety; and

3. the intersection of arbitrarily many affine varieties is an affine variety.

These conditions say that the affine varieties in Cn form the closed subsets in a topology on Cn. This topology is called the Zariski topology, after the Russian-American mathematician Oscar Zariski (1899–1986). We will interchangeably use the terms affine (sub)variety in Cn and Zariski-closed subset of Cn. Moreover, in the latter case we will often just say closed subset; when we mean closed subset in the Euclidean sense rather than in the Zariski sense, we will explicitly mention that.

Exercise 6.1.3. The Zariski topology on Cn is very different from the Euclidean topology on Cn, as the answers to the following problems show:


1. determine the Zariski-closed subsets of C;

2. prove that Rn is Zariski-dense in Cn (that is, the smallest Zariski-closed subset of Cn containing Rn is Cn itself); and

3. show that every non-empty Zariski-open subset of Cn (that is, the complement of a Zariski-closed set) is dense in Cn.

On the other hand, in some other respects the Zariski topology resembles the Euclidean topology:

1. show that Zariski-open subsets of Cn are also open in the Euclidean topology;

2. determine the image of the map φ : C2 → C3, (x1, x2) ↦ (x1, x1x2, x1(1 + x2)), and show that its Zariski closure coincides with its Euclidean closure.

If you solved the last exercise correctly, then you found that the image is some Zariski-closed subset, minus some Zariski-closed subset, plus some other Zariski-closed subset. In general, the subsets of Cn that are generated by the Zariski-closed sets under (finitely many of) the operations ∪, ∩, and complement are called constructible sets. An important result due to the French mathematician Claude Chevalley (1909–1984) says that the image of a constructible set under a polynomial map Cn → Cm is again a constructible set. Another important fact is that the Euclidean closure of a constructible set equals its Zariski closure.

From undergraduate courses we know that C is an algebraically closed field, that is, every non-constant univariate polynomial f ∈ C[x] has a root. The following multivariate analogue of this statement is the second major theorem of Hilbert's that we will need.

Theorem 6.1.4 (Hilbert's weak Nullstellensatz). Let I be an ideal in C[x] that is not equal to all of C[x]. Then there exists a point ξ = (ξ1, . . . , ξn) ∈ Cn such that f(ξ) = 0 for all f ∈ I.

The theorem is also true with C replaced by any other algebraically closed field. But we will give a self-contained proof that uses the fact that C is not countable.

Lemma 6.1.5. Let U, V be vector spaces over C of countably infinite dimension, let A(x) : U ⊗ C[x] → V ⊗ C[x] be a C[x]-linear map, and let v(x) ∈ V ⊗ C[x] be a target vector. Suppose that for all ξ ∈ C there is a u ∈ U such that A(ξ)u = v(ξ). Then there exists a u(x) ∈ U ⊗ C(x) such that A(x)u(x) = v(x).

Proof. Suppose, on the contrary, that no such u(x) exists. This means that the image under A(x) of the C(x)-vector space U ⊗ C(x) does not contain v(x). Let F(x) be a C(x)-linear function on V ⊗ C(x) taking the value 0 on A(x)(U ⊗ C(x)) and 1 on v(x); such a function exists and is determined by its values f1(x), f2(x), f3(x), . . . ∈ C(x) on a C-basis v1, v2, v3, . . . of V. Since C is uncountable there is a value ξ ∈ C where all fi are defined, so that F(ξ)


is a well-defined linear function on V. Now we have F(ξ)A(ξ)u = 0 for all u ∈ U but F(ξ)v(ξ) = 1, contradicting the assumption that A(ξ)u = v(ξ) has a solution.

Proof of the weak Nullstellensatz. We proceed by induction on n. For n = 0 the statement is just that any proper ideal of C is 0. Now suppose that n > 0 and that the statement is true for n − 1. By Hilbert's basis theorem, the ideal I is generated by finitely many polynomials f1, . . . , fk. If there exists a value ξ ∈ C for xn such that the ideal in C[x1, . . . , xn−1] generated by f_{1,ξ} := f1(x1, . . . , xn−1, ξ), . . . , f_{k,ξ} := fk(x1, . . . , xn−1, ξ) does not contain 1, then we can use the induction hypothesis and we are done. Suppose therefore that no such ξ exists, that is, that 1 can be written as a C[x1, . . . , xn−1]-linear combination

1 = ∑_j c_{j,ξ} f_{j,ξ}

for every ξ ∈ C. We will use this fact in two ways. First, note that this means that

∑_j c_{j,ξ} fj = 1 + (xn − ξ)g_ξ

for some polynomial g_ξ ∈ C[x1, . . . , xn]. Put differently, (xn − ξ) has a multiplicative inverse modulo the ideal I for each ξ ∈ C. But then every univariate polynomial in xn, being a product of linear ones since C is algebraically closed, has such a multiplicative inverse. Since 1 does not lie in I, this implies that I ∩ C[xn] = {0}.

Second, by Lemma 6.1.5 applied to U = C[x1, . . . , xn−1]^k, V = C[x1, . . . , xn−1], x = xn, and A(c1, . . . , ck) = ∑_{j=1}^k cj fj, we can write

1 = ∑_{j=1}^k cj(xn) fj,

where each cj(xn) lies in C[x1, . . . , xn−1] ⊗ C(xn). Letting D(xn) ∈ C[xn] \ {0} be a common denominator of the cj and setting c′j := Dcj ∈ C[x1, . . . , xn], we find that

D(xn) = ∑_{j=1}^k c′j fj ∈ I.

But this contradicts our earlier conclusion that I does not contain non-zero polynomials in xn only.

The Nullstellensatz has many applications to combinatorial problems.

Exercise 6.1.6. Let G = (V, E) be a finite, undirected graph with vertex set V and edge set E consisting of two-element subsets of V. A proper k-colouring of G is a map c : V → [k] with the property that c(i) ≠ c(j) whenever {i, j} ∈ E. To G we associate the polynomial ring C[xi | i ∈ V] and its ideal I generated by the following polynomials:

x_i^k − 1 for all i ∈ V; and x_i^{k−1} + x_i^{k−2} x_j + . . . + x_j^{k−1} for all {i, j} ∈ E.

Prove that G has a proper k-colouring if and only if 1 ∉ I.
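A pure-Python illustration of why the ideal encodes colourability (our own code, not part of the exercise): a common zero of the generators assigns a k-th root of unity to each vertex, and the edge polynomial x_i^{k−1} + x_i^{k−2} x_j + . . . + x_j^{k−1} vanishes at two k-th roots of unity exactly when they differ. So common zeros are proper k-colourings, and when none exist (as for K4 with k = 3), the weak Nullstellensatz puts 1 in the ideal.

```python
from itertools import product
import cmath

def common_zeros(num_vertices, edges, k, tol=1e-9):
    """All vertex assignments of k-th roots of unity killing every edge polynomial."""
    roots = [cmath.exp(2j * cmath.pi * c / k) for c in range(k)]
    zeros = []
    for assign in product(roots, repeat=num_vertices):
        # assignments are roots of unity by construction, so x_i^k - 1 = 0;
        # it remains to test the edge polynomials numerically
        if all(abs(sum(assign[i] ** (k - 1 - d) * assign[j] ** d
                       for d in range(k))) < tol
               for i, j in edges):
            zeros.append(assign)
    return zeros

triangle = (3, [(0, 1), (1, 2), (0, 2)])
k4 = (4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(len(common_zeros(*triangle, k=3)))   # 6: the proper 3-colourings of a triangle
print(len(common_zeros(*k4, k=3)))         # 0: K4 is not 3-colourable
```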

Two important maps set up a beautiful duality between geometry and algebra. First, we have the map V that sends a subset S ⊆ C[x] to the variety V(S) that it defines; and second, the map I that sends a subset X ⊆ Cn to the ideal I(X) ⊆ C[x] of all polynomials that vanish on all points in X. The following properties are straightforward:

1. if S ⊆ S′ then V(S) ⊇ V(S′);

2. if X ⊆ X ′ then I(X) ⊇ I(X ′);

3. X ⊆ V(I(X));

4. S ⊆ I(V(S));

5. V(I(V(S))) = V(S); and

6. I(V(I(X))) = I(X).

For instance, in (5) the containment ⊇ follows from (3) applied to X = V(S), and the containment ⊆ follows from (4) applied to S and then (1) applied to S ⊆ S′ := I(V(S)).

This shows that V and I set up an inclusion-reversing bijection between sets of the form V(S) ⊆ Cn (that is, affine varieties in Cn) and sets of the form I(X) ⊆ C[x]. Sets of the latter form are always ideals, but not all ideals are of this form, as the following example shows.

Example 6.1.7. Suppose that n = 1, fix a natural number k, and let Ik be the ideal in C[x1] generated by x_1^k. Then V(Ik) = {0} and I(V(Ik)) is the ideal generated by x1. So for k > 1 the ideal Ik is not of the form I(X) for any subset X of C1.

This example exhibits a necessary condition for an ideal to be of the form I(X) for some set X: it must be radical.

Definition 6.1.8. The radical of an ideal I ⊆ C[x] is the set of all polynomials f of which some positive power lies in I; it is denoted √I. The ideal I is called radical if I = √I.

Indeed, suppose that I = I(X) and suppose that f ∈ C[x] has f^k ∈ I for some k > 0. Then f^k vanishes on X and hence so does f, so that f ∈ I(X) = I. This shows that I is radical.

Exercise 6.1.9. Show that, for general ideals I,√I is an ideal containing I.


Another important result of Hilbert's that we will need is that the condition that I be radical is also sufficient for I to be the vanishing ideal of some set X.

Theorem 6.1.10 (Hilbert's Nullstellensatz). Suppose that I ⊆ C[x] is a radical ideal. Then I(V(I)) = I.

Proof. This follows from the weak Nullstellensatz using Rabinowitsch's trick from 1929. Let g be a polynomial vanishing on all common roots of the polynomials in I. Introducing an auxiliary variable t, we have that the ideal in C[x, t] generated by I and tg − 1 does not have any common zeroes. Hence by the weak Nullstellensatz 1 can be written as

1 = a(tg − 1) + ∑_{j=1}^k cj(x, t) fj,   with a, cj ∈ C[x, t], fj ∈ I.

Replacing t on both sides by 1/g we have

1 = ∑_j cj(x, 1/g) fj.

Multiplying both sides with a suitable power g^d eliminates g from the denominators and hence expresses g^d as a C[x]-linear combination of the fj. Hence g^d ∈ I and therefore g ∈ I since I is radical.
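A concrete instance of Rabinowitsch's trick can be verified by hand (our own verification code, not from the notes): for I = (x^2) and g = x, the identity 1 = t^2·x^2 − (tx + 1)(tx − 1) exhibits 1 in the ideal generated by x^2 and tg − 1 in C[x, t], certifying g ∈ √I. Polynomials in x, t are dicts mapping exponent pairs (a, b) of x^a t^b to coefficients.

```python
def mul(f, g):
    """Product of two polynomials represented as exponent-dicts."""
    h = {}
    for (a, b), c in f.items():
        for (d, e), k in g.items():
            key = (a + d, b + e)
            h[key] = h.get(key, 0) + c * k
    return {k: v for k, v in h.items() if v != 0}

def add(f, g):
    """Sum of two polynomials, dropping zero coefficients."""
    h = dict(f)
    for k, v in g.items():
        h[k] = h.get(k, 0) + v
        if h[k] == 0:
            del h[k]
    return h

x2 = {(2, 0): 1}                    # x^2
tg1 = {(1, 1): 1, (0, 0): -1}       # t*x - 1
coeff_a = {(0, 2): 1}               # multiplier t^2 for x^2
coeff_b = {(1, 1): -1, (0, 0): -1}  # multiplier -(t*x + 1) for t*x - 1
one = add(mul(coeff_a, x2), mul(coeff_b, tg1))
print(one)   # {(0, 0): 1}: the combination equals the constant 1
```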

We have thus set up an inclusion-reversing bijection between closed subsets of Cn and radical ideals in C[x]. It is instructive to see what this bijection does with the smallest closed subsets, those consisting of a single point p = (p1, . . . , pn) ∈ Cn. The ideal I(p) := I({p}) of polynomials vanishing at p is generated by x1 − p1, . . . , xn − pn (check this). This is a maximal ideal (that is, an ideal which is maximal among the proper ideals of C[x1, . . . , xn]), since the quotient by it is the field C. This follows from the fact that, by definition, I(p) is the kernel of the homomorphism of C-algebras C[x1, . . . , xn] → C, f ↦ f(p), and that this homomorphism is surjective. Conversely, suppose that I is a maximal ideal. Then it is radical: indeed, if the radical were strictly larger than I, it would contain 1 by maximality, but then some power of 1 would be in I, a contradiction. Hence by the Nullstellensatz there exists a non-empty subset X of Cn such that I = I(X). But then for any point p in X we have that I(p) is a radical ideal containing I, hence equal to I by maximality. We have thus proved the following corollary of the Nullstellensatz.

Corollary 6.1.11. The map sending p to I(p) is a bijection between points in Cn and maximal ideals in C[x].

6.2 Regular functions and maps

Definition 6.2.1. Let X be an affine variety in Cn. Then a regular function on X is by definition a C-valued function of the form f|X where f ∈ C[x].


Regular functions form a commutative C-algebra with 1, denoted C[X] (or sometimes O(X)) and sometimes called the coordinate ring of X. By definition, C[X] is the image of the restriction map C[x] → {C-valued functions on X}, f ↦ f|X. Hence it is isomorphic to the quotient algebra C[x]/I(X).

Example 6.2.2. 1. If X is a d-dimensional subspace of Cn, then I(X) is generated by the space X0 ⊆ (Cn)* of linear functions vanishing on X. If y1, . . . , yd ∈ (Cn)* span a vector space complement of X0, then modulo I(X) every polynomial in the xi is equal to a unique polynomial in the yj. This shows that C[X] = C[y1, . . . , yd] is a polynomial ring in d variables. In terminology to be introduced below, X is isomorphic to the variety Cd.

2. Consider the variety X of (m + 1) × (m + 1)-matrices of the shape [x 0; 0 y] with x an m × m-matrix and y a complex number satisfying det(x)y = 1. Then C[X] = C[(x_ij)_ij, y]/(det(x)y − 1). The map y ↦ 1/det(x) sets up an isomorphism of this algebra with the algebra of rational functions in the variables x_ij generated by the x_ij and 1/det(x). We therefore also write C[X] = C[(x_ij)_ij, 1/det(x)]. Note that X is a group with respect to matrix multiplication, isomorphic to GL_m. This is the fundamental example of an algebraic group; here algebraic refers to the variety structure of X.

3. Consider the variety X = M_{k,m}^{≤l} of all k × m-matrices all of whose (l + 1) × (l + 1)-minors (that is, determinants of (l + 1) × (l + 1)-submatrices) vanish. Elementary linear algebra shows that X consists of all matrices of rank at most l, and that such matrices can always be written as AB with A ∈ M_{k,l}, B ∈ M_{l,m}.

Remark 6.2.3. In these notes a C-algebra is always a vector space A over C together with an associative, bilinear multiplication A × A → A, such that A contains an element 1 for which 1a = a = a1 for all a ∈ A. A homomorphism from A to a C-algebra B is a C-linear map φ : A → B satisfying φ(1) = 1 and φ(a1a2) = φ(a1)φ(a2) for all a1, a2 ∈ A. Most algebras that we will encounter are commutative.

Just like group homomorphisms are the natural maps between groups and continuous maps are the natural maps between topological spaces, regular maps are the natural maps between affine varieties.

Definition 6.2.4. A regular map from an affine variety X to Cm is a map φ : X → Cm of the form φ : x ↦ (f1(x), . . . , fm(x)) with f1, . . . , fm regular functions on X. If Y ⊆ Cm is an affine variety containing the image of φ, then we also call φ a regular map from X to Y.

Exercise 6.2.5. Show that if ψ is a regular map from Y to a third affine variety Z, then ψ ◦ φ is a regular map from X to Z.


Lemma 6.2.6. If X ⊆ Cn and Y ⊆ Cm are affine varieties, and if φ : X → Y is a regular map, then the map φ* : f ↦ f ◦ φ is a homomorphism of C-algebras from C[Y] to C[X].

Proof. Suppose that φ is given by regular functions (f1, . . . , fm) on X. Then φ* sends the regular function h|Y ∈ C[Y], where h is a polynomial in the coordinates y1, . . . , ym on Cm, to the function h(f1, . . . , fm), which is clearly a regular function on X. This shows that φ* maps C[Y] to C[X]. One readily verifies that φ* is an algebra homomorphism.

Note that if ψ : Y → Z is a second regular map, then φ∗ ◦ ψ∗ = (ψ ◦ φ)∗.

Definition 6.2.7. If X ⊆ Cn and Y ⊆ Cm are affine varieties, then an isomorphism from X to Y is a bijective regular map whose inverse is also a regular map. The varieties X and Y are called isomorphic if there is an isomorphism from X to Y.

Lemma 6.2.8. If X ⊆ Cn and Y ⊆ Cm are isomorphic varieties, then C[X] and C[Y] are isomorphic C-algebras.

Proof. If φ : X → Y, ψ : Y → X are regular maps such that ψ ◦ φ = idX and φ ◦ ψ = idY, then φ* ◦ ψ* = idC[X] and ψ* ◦ φ* = idC[Y], hence these two algebras are isomorphic.

Example 6.2.9. The affine variety X = C1 and the affine variety Y = {(x, y) ∈ C2 | y − x^2 = 0} are isomorphic, as the regular maps φ : X → Y, t ↦ (t, t^2) and ψ : Y → X, (x, y) ↦ x show.

Exercise 6.2.10. Prove that X = C1 is not isomorphic to the variety Z = {(x, y) ∈ C2 | xy − 1 = 0}.

6.3 The quotient map

Let G be a group and let W be a finite-dimensional G-module such that C[W] = ⊕_{d≥0} S^d W* is a completely reducible G-module. By Hilbert's finiteness theorem, we know that the algebra C[W]G of G-invariant polynomials is a finitely generated algebra. Let f1, . . . , fk be a generating set of this algebra. Then we have a polynomial map

π : W → Ck, w ↦ (f1(w), . . . , fk(w)).

This map is called the quotient map because, in a sense which will become clear below, the image of this map parameterises G-orbits in W.

Example 6.3.1. Take G := {−1, 1} with its action on W := C2 where −1 sends (x, y) to (−x, −y). The invariant ring C[W]G is generated by the polynomials f1 := x^2, f2 := xy, f3 := y^2 (check this). Thus the quotient map is π : C2 → C3, (x, y) ↦ (x^2, xy, y^2). Let u, v, w be the standard coordinates on C3, and note that the image of π is contained in the affine variety Z ⊆ C3 with equation v^2 − uw = 0. Indeed, π is surjective onto Z: let u, v, w be complex numbers such that v^2 = uw. Let x ∈ C be a square root of u. If x ≠ 0, then set y := v/x so that v = xy and w = v^2/u = x^2y^2/x^2 = y^2. If x = 0, then let y be a square root of w. In both cases π(x, y) = (u, v, w). Note that the fibres of π over every point (u, v, w) ∈ Z are orbits of G.
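The two directions of this argument can be checked numerically; a sketch with hypothetical helper names (our own code, not from the notes):

```python
import cmath

def pi(x, y):
    """The quotient map of Example 6.3.1."""
    return (x * x, x * y, y * y)

def preimage(u, v, w, tol=1e-9):
    """Reconstruct (x, y) with pi(x, y) = (u, v, w), assuming v^2 = u*w."""
    x = cmath.sqrt(u)
    y = v / x if abs(x) > tol else cmath.sqrt(w)
    return x, y

u, v, w = pi(2 + 1j, -3j)
assert abs(v * v - u * w) < 1e-9      # the image satisfies the cone equation
x, y = preimage(u, v, w)
assert all(abs(a - b) < 1e-9 for a, b in zip(pi(x, y), (u, v, w)))
print("fibre point recovered:", x, y)
```

The recovered point may be the other element (−x, −y) of the G-orbit; both map to the same point of Z, which is exactly the statement that fibres are orbits.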

Example 6.3.2. Take G := Sn with its standard action on W := Cn. Recall that the invariants are generated by the elementary symmetric polynomials σ1(x) = ∑_i xi, . . . , σn(x) = ∏_i xi. This gives the quotient map

π : Cn → Cn, (x1, . . . , xn) ↦ (σ1(x), . . . , σn(x)).

We claim that the image of this map is all of Cn. Indeed, for any n-tuple (c1, . . . , cn) ∈ Cn consider the polynomial

f(t) := t^n − c1 t^{n−1} + . . . + (−1)^n cn.

As C is algebraically closed, this polynomial has n roots x1, . . . , xn (counted with multiplicities), so that we can also write

f(t) = (t − x1) · · · (t − xn);

expanding this expression for f and comparing with the above gives π(x1, . . . , xn) = (c1, . . . , cn). Note furthermore that any other n-tuple (x′1, . . . , x′n) with this property is necessarily some permutation of the xi, since the roots of f are determined by (c1, . . . , cn).

This means that we can think of π : Cn → Cn as follows: every Sn-orbit on the first copy of Cn is "collapsed" by π to a single point on the second copy of Cn, and conversely, the fibre of π above any point in the second copy of Cn consists of a single Sn-orbit. Thus the second copy of Cn parameterises Sn-orbits on Cn.
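Vieta's correspondence between points and coefficients is easy to check for n = 3; a small sketch (our own code):

```python
from itertools import combinations
from math import prod

def sigma(xs):
    """Elementary symmetric polynomials sigma_1, ..., sigma_n at the point xs."""
    n = len(xs)
    return [sum(prod(c) for c in combinations(xs, k)) for k in range(1, n + 1)]

def f(t, cs):
    """f(t) = t^n - c_1 t^(n-1) + ... + (-1)^n c_n."""
    n = len(cs)
    return t ** n + sum((-1) ** (k + 1) * cs[k] * t ** (n - 1 - k)
                        for k in range(n))

xs = [1, 2, 3]
cs = sigma(xs)                          # the quotient point pi(1, 2, 3)
assert cs == [6, 11, 6]
assert all(f(x, cs) == 0 for x in xs)   # the x_i are exactly the roots of f
assert sigma([3, 1, 2]) == cs           # pi collapses the whole S_3-orbit
print("quotient point:", cs)
```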

Example 6.3.3. Let the group G := SL2 act on the space W := M2(C) of 2 × 2-matrices by left multiplication. If A has rank 2, then left-multiplying by a suitable g ∈ SL2 gives

gA = [1 0; 0 det(A)].

Now det is a polynomial on M2(C) which is SL2-invariant. We claim that it generates the invariant ring. Indeed, suppose f ∈ C[a11, a12, a21, a22] is any invariant. Then for A of rank two we find that

f(A) = f(gA) = f(1, 0, 0, det(A)) =: h(det A),

where h is a polynomial in one variable. Since both f and h are continuous, and since the rank-two matrices are dense in M2, we find that this equality actually holds for all matrices. Hence f(A) = h(det A) for all A, so f is in the algebra generated by det.


In this case the quotient map π : M2(C) → C is just the map A ↦ det(A). The fibre above a point d ∈ C* is the set of 2 × 2-matrices of determinant d, which is a Zariski-closed set and a single SL2-orbit. The fibre above 0 ∈ C is the set of all matrices of rank ≤ 1. This is, of course, also a Zariski-closed set, but not a single orbit: it consists of the closed orbit consisting of the zero matrix and the non-closed orbit consisting of all matrices of rank 1. Note that the latter orbit has 0 in its closure.

These examples illustrate the general situation: for finite G the fibres of the quotient map are precisely the orbits of G, while for infinite G they are certain G-stable closed sets.

Theorem 6.3.4. Let Z denote the Zariski closure of π(W), that is, the set of all points in Ck that satisfy all polynomial relations that are satisfied by the invariants f1, . . . , fk. The quotient map π has the following properties:

1. π(gw) = π(w) for all g ∈ G, w ∈W ;

2. the fibres of π are G-stable, Zariski-closed subsets of W ;

3. for any regular (polynomial) map ψ : W → Cm that satisfies ψ(gw) = ψ(w) for all g ∈ G and w ∈ W, there exists a unique regular map φ : Z → Cm such that φ ◦ π = ψ;

4. π is surjective onto Z.

Proof. 1. π(gw) = (f1(gw), . . . , fk(gw)) = (f1(w), . . . , fk(w)) because the fi are invariant.

2. If w ∈ π^{−1}(z), then π(gw) = π(w) = z, so gw ∈ π^{−1}(z).

3. Let y1, . . . , ym be the standard coordinates on Cm. Then yi ◦ ψ is a G-invariant polynomial on W for all i. As the fj generate the algebra of invariants, we may write yi ◦ ψ as gi(f1, . . . , fk) for some k-variate polynomial gi. Now the regular map φ : Z → Cm, z ↦ (g1(z), . . . , gm(z)) has the required property. Notice that the gi need not be unique. However, the map Z → Cm with the required property is unique: if φ1, φ2 both have the property, then necessarily φ1(π(w)) = φ2(π(w)) = ψ(w) for all w ∈ W, so that φ1 and φ2 agree on the subset im π of Z. Since Z is the Zariski closure of this set, φ1 and φ2 need to agree everywhere. (In fact Z = im π, as we will see shortly.)

4. Let z ∈ Z. This means that the coordinates of z satisfy all polynomial relations satisfied by f1, . . . , fk, hence there exists a homomorphism φ : C[W]G = C[f1, . . . , fk] → C of C-algebras sending fi to zi. The kernel of this homomorphism is a maximal ideal Mz in C[W]G. We claim that there exists a maximal ideal M′ in C[W] whose intersection with C[W]G is Mz. Indeed, let I be the ideal in C[W] generated by Mz. We only need to show that I ≠ C[W]; then the axiom of choice implies the existence of


a maximal ideal containing I. Suppose, on the contrary, that 1 ∈ I, and write

1 = ∑_{i=1}^l ai hi with all ai ∈ C[W], hi ∈ Mz.

Since C[W] is completely reducible as a G-module, there exists a Reynolds operator ρ. Applying ρ to both sides of the equality yields

1 = ∑_{i=1}^l ρ(ai) hi,

where the ρ(ai) are in C[W]G. But this means that 1 lies in Mz, a contradiction to the maximality of the latter ideal. This proves the claim that such an M′ exists. The maximal ideal M′ is the kernel of evaluation at some point w ∈ W by the discussion of maximal ideals after the Nullstellensatz. Thus we have found a point w ∈ W with the property that evaluating fi at w gives zi. Thus π(w) = z and we are done.

Remark 6.3.5. By (3), Z is independent of the choice of generators of C[W]G in the following sense: any other choice of generators of C[W]G yields a variety Z′ with a G-invariant map π′ : W → Z′, and (3) shows that there exist regular maps φ : Z → Z′ and φ′ : Z′ → Z such that φ′ ◦ φ = idZ and φ ◦ φ′ = idZ′.

Exercise 6.3.6. Let G = C* act on W = C4 by t · (x1, x2, y1, y2) = (tx1, tx2, t^{−1}y1, t^{−1}y2).

1. Find generators f1, . . . , fk of the invariant ring C[W]G.

2. Determine the image Z of the quotient map π : W → Ck, w ↦ (f1(w), . . . , fk(w)).

3. For every z ∈ Z determine the fibre π^{−1}(z).


Chapter 7

The null-cone

Let G be a group and let W be a finite-dimensional G-module. We have seen in Example 6.3.3 that G-orbits on W cannot always be separated by invariant polynomials on W. Here is another example of this phenomenon.

Example 7.0.7. Let G = GLn(C) act on W = Mn(C) by conjugation. We have seen in week 1 that the invariant ring is generated by the coefficients of the characteristic polynomial χ. This means that the map π sending A to χ_A is the quotient map. By Exercise 1.5.5 each fibre π^{−1}(p) of π, where p is a monic univariate polynomial of degree n, contains a unique conjugacy class of diagonalisable matrices. This conjugacy class is in fact Zariski closed, since it is given by the additional set of equations

(A − λ1) · · · (A − λk) = 0,

where λ1, . . . , λk are the distinct roots of p. We claim that all other (non-diagonalisable) conjugacy classes are not Zariski closed. Indeed, if A is not diagonalisable, then after a conjugation we may assume that A is of the form D + N with D diagonal and N strictly upper triangular (e.g., move A to Jordan normal form). Conjugating A = D + N with a diagonal matrix of the form diag(t^{n−1}, . . . , t, 1), t ∈ C*, multiplies the (i, j)-entry of A with t^{j−i}. Hence for i ≥ j the entries of A do not change (below the diagonal they are zero anyway), while for i < j they are multiplied by a positive power of t. Letting t tend to 0 we find that the result tends to D. Hence D lies in the Euclidean closure of the conjugacy class of A, hence also in the Zariski closure.

Note in particular the case where D = 0, i.e., where A is nilpotent. Then the characteristic polynomial of A is just x^n, i.e., all invariants with zero constant term vanish on A, and the argument above shows that 0 lies in the closure of the orbit of A. This turns out to be a general phenomenon.
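The scaling degeneration in Example 7.0.7 is easy to see numerically; a sketch with our own helper names (not from the notes) for a 3 × 3 Jordan-type matrix:

```python
def conjugate_by_scaling(A, t):
    """Conjugate A by g = diag(t^(n-1), ..., t, 1); the (i, j)-entry of the
    result is t^(j-i) times the original entry."""
    n = len(A)
    d = [t ** (n - 1 - i) for i in range(n)]
    return [[d[i] * A[i][j] / d[j] for j in range(n)] for i in range(n)]

A = [[5.0, 1.0, 0.0],   # A = D + N with D = 5*I and N strictly upper triangular
     [0.0, 5.0, 1.0],
     [0.0, 0.0, 5.0]]

B = conjugate_by_scaling(A, 1e-4)
off_diagonal = max(abs(B[i][j]) for i in range(3) for j in range(3) if i != j)
assert off_diagonal < 1e-3                     # superdiagonal scaled by t
assert all(abs(B[i][i] - 5.0) < 1e-9 for i in range(3))
print("max off-diagonal entry after conjugation:", off_diagonal)
```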

Definition 7.0.8. The null-cone NW of the G-module W is the set of all vectors w ∈ W on which all G-invariant polynomials with zero constant term vanish.


Thus, the null-cone of the module M_n(C) with the conjugation action of GL_n(C) on M_n(C) consists of all nilpotent matrices, and the null-cone of the module M_n(C) with the action of SL_n(C) by left multiplication consists of all singular matrices.
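
For 2 × 2 matrices the first statement can be made concrete by machine: the invariant ring is generated by the coefficients of the characteristic polynomial, i.e. by trace and determinant, so membership in the null-cone reduces to two equations. A minimal illustrative sketch (not from the text):

```python
# For the conjugation action of GL_2(C) on M_2(C), the invariant ring is
# generated by the coefficients of the characteristic polynomial, i.e. by
# trace and determinant.  So A lies in the null-cone iff tr(A) = det(A) = 0,
# which for 2x2 matrices is exactly nilpotency (the char poly is then x^2).
def in_null_cone_2x2(A):
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return tr == 0 and det == 0

assert in_null_cone_2x2([[0, 1], [0, 0]])        # nilpotent Jordan block
assert not in_null_cone_2x2([[1, 0], [0, 0]])    # idempotent, not nilpotent
```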

Exercise 7.0.9. Check the second statement by verifying that the invariant ring is generated by det.

Remark 7.0.10. Suppose that the invariant ring C[W]^G is generated by finitely many invariant functions f_1, ..., f_k, each with zero constant term. Let π be the corresponding quotient map W → C^k. Then the null-cone N_W is exactly the fibre π^{-1}(0) above 0 ∈ C^k (check this).

Exercise 7.0.11. Show that if G is finite, then the null-cone consists of 0 alone.

We want to describe the structure of the null-cone for one class of groups, namely, tori. The resulting structure theorem actually carries over mutatis mutandis to general semisimple algebraic groups, and we will see some examples of that fact later.

Definition 7.0.12. The n-dimensional torus T_n is the group (C^*)^n.

For any n-tuple α = (a_1, ..., a_n) ∈ Z^n we have a homomorphism

ρ_α : T_n → C^*, t = (t_1, ..., t_n) ↦ t^α := t_1^{a_1} · · · t_n^{a_n},

and hence a one-dimensional representation of T_n. Let W be a finite direct sum of m such one-dimensional representations of T_n, so W is determined by a sequence A = (α_1, ..., α_m) of lattice points in Z^n (possibly with multiplicities). Relative to a basis consisting of one vector for each α_i, the representation T_n → GL(W) is just the matrix representation

t ↦ diag(t^{α_1}, ..., t^{α_m}).

We think of α_i = (a_{i1}, ..., a_{in}) as the i-th row of the m × n-matrix A. Let x = (x_1, ..., x_m) denote the corresponding coordinate functions on W. Then (t_1, ..., t_n) acts on the variable x_i by ∏_{j=1}^n t_j^{−a_{ij}} (recall that the action on functions involves taking an inverse of the group element), and hence on a monomial x^u, u ∈ N^m, by

∏_{i=1}^m ∏_{j=1}^n t_j^{−u_i a_{ij}} = ∏_{j=1}^n t_j^{(−uA)_j},

where uA is the row vector obtained by left-multiplying A by u. This implies two things: first, all monomials appearing in any T_n-invariant polynomial on W are themselves invariant, so that C[W]^{T_n} is spanned by invariant monomials in the x_i, and second, the monomial x^u is invariant if and only if uA = 0.
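
The criterion uA = 0 is easy to test by machine. The following sketch, with a hypothetical weight matrix A chosen for illustration, searches small exponent vectors u ∈ N^m for invariant monomials x^u:

```python
from itertools import product

# Hypothetical example: a 1-dimensional torus acting on W = C^2 with
# weights alpha_1 = 2 and alpha_2 = -3, so A is the 2x1 matrix [[2], [-3]].
# x^u = x1^{u_1} x2^{u_2} is invariant iff uA = 0, i.e. 2u_1 - 3u_2 = 0.
A = [[2], [-3]]
m, n = len(A), len(A[0])

invariant = [u for u in product(range(7), repeat=m)
             if any(u)
             and all(sum(u[i] * A[i][j] for i in range(m)) == 0 for j in range(n))]

# the invariant monomials with exponents below 7: x1^3 x2^2 and its square
assert invariant == [(3, 2), (6, 4)]
```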


Definition 7.0.13. For w ∈ W let supp(w), the support of w, be the set of α_i ∈ Z^n for which x_i(w) ≠ 0.

Theorem 7.0.14. The null-cone of the T_n-module W consists of all vectors w such that 0 does not lie in the convex hull of supp(w) ⊆ Z^n ⊆ R^n.

Proof. Suppose first that 0 does not lie in that convex hull. Then there exists a vector β = (b_1, ..., b_n) ∈ Z^n such that β · α > 0 for all α ∈ supp(w); here · is the dot product. This means that the vector

λ(t) = (t^{b_1}, ..., t^{b_n}), t ∈ C^*

acts by a strictly positive power of t on all non-zero components of w. Hence for t → 0 the vector λ(t)w tends to 0. Hence each T_n-invariant polynomial f on W satisfies

f(w) = f(λ(t)w) → f(0), t → 0;

here the equality follows from the fact that f is invariant and the limit follows from the fact that f is continuous. If moreover f has zero constant term, then f(w) = f(0) = 0. Hence w is in the null-cone N_W.

Conversely, suppose that 0 lies in the convex hull of the support of w. Since the α_i are lattice points, 0 is then a nonnegative rational combination of them, and after clearing denominators we may write 0 as u_1α_1 + ... + u_mα_m where the u_i are natural numbers, not all zero, and where u_i > 0 implies that α_i lies in the support of w. Then uA = 0, so x^u is a non-constant invariant monomial, which moreover does not vanish on w since the only variables x_i appearing in it have x_i(w) ≠ 0. Hence w does not lie in the null-cone of T_n on W.
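
The proof yields a finite test, sketched below in Python: a vector lies in the null-cone unless some nonzero u ∈ N^m supported on its support satisfies uA = 0 (equivalently, 0 lies in the convex hull of the support). The exponent bound makes this a brute-force illustration only; a genuine test would use linear programming.

```python
from itertools import product

def in_null_cone(A, support, bound=6):
    """Brute-force test whether a vector with the given support (a set of
    row indices of the weight matrix A) lies in the null-cone of the torus
    module.  By the theorem above it does, unless some nonzero u in N^m with
    u_i > 0 only for i in the support satisfies uA = 0; we search exponents
    up to `bound` only, so this is an illustrative sketch."""
    m, n = len(A), len(A[0])
    for u in product(range(bound + 1), repeat=m):
        if not any(u):
            continue
        if any(u[i] > 0 and i not in support for i in range(m)):
            continue
        if all(sum(u[i] * A[i][j] for i in range(m)) == 0 for j in range(n)):
            return False   # x^u is a non-constant invariant not vanishing here
    return True

A = [[1], [-1], [2]]                  # weights of a 1-dimensional torus
assert not in_null_cone(A, {0, 1})    # 0 in hull of {1, -1}: x1*x2 is invariant
assert in_null_cone(A, {0, 2})        # all weights positive: 0 not in the hull
```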

Exercise 7.0.15. Let T be the group of invertible diagonal n × n-matrices, and let T act on M_n(C) by conjugation, that is,

t · A := tAt^{-1}, t ∈ T, A ∈ M_n(C).

Prove that A lies in the null-cone of T on M_n(C) if and only if there exists a permutation matrix P such that PAP^{-1} is strictly upper triangular.

Exercise 7.0.16. Let T be the group of diagonal 3 × 3-matrices with determinant 1. Let U = C^3 be the standard T-module with action

diag(t_1, t_2, t_3)(x_1, x_2, x_3) := (t_1x_1, t_2x_2, t_3x_3),

and consider the 9-dimensional T-module W = U^{⊗2}.

1. Show that T ≅ T_2.

2. Determine the irreducible T-submodules of W.

3. Draw the vectors α_i for W in the plane, and determine all possible supports of vectors in the null-cone of T on W.


Exercise 7.0.17. Let G be the group SL_2(C) acting on the space U = C^2 and let W be the space S^2U with the standard action of G given by

[a b; c d] (x_{11}e_1^2 + x_{12}e_1e_2 + x_{22}e_2^2) = x_{11}(ae_1 + ce_2)^2 + x_{12}(ae_1 + ce_2)(be_1 + de_2) + x_{22}(be_1 + de_2)^2,

where e_1, e_2 is the standard basis of C^2.

1. Determine the invariant ring C[W]^G.

2. Determine the fibres of the quotient map.

3. Determine the null-cone.


Chapter 8

Molien's theorem and self-dual codes

Let W = ⊕_{d=0}^∞ W_d be a direct sum of finite-dimensional (complex) vector spaces W_d. The Hilbert series (or Poincaré series) H(W, t) is the formal power series in t defined by

H(W, t) := Σ_{d=0}^∞ dim(W_d) t^d,   (8.1)

and it encodes in a convenient way the dimensions of the vector spaces W_d. In this lecture, W will usually be the vector space C[V]^G of polynomial invariants with respect to the action of a group G, where W_d is the subspace of invariants homogeneous of degree d.

Example 8.0.18. Taking the polynomial ring in one variable, the Hilbert series is given by H(C[x], t) = 1 + t + t^2 + · · · = 1/(1 − t). Similarly, H(C[x_1, ..., x_n], t) = 1/(1 − t)^n.

Exercise 8.0.19. Let f_1, ..., f_k ∈ C[x_1, ..., x_n] be algebraically independent homogeneous polynomials, where f_i has degree d_i. Show that the Hilbert series of the subalgebra generated by the f_i is given by

H(C[f_1, ..., f_k], t) = 1 / ∏_{i=1}^k (1 − t^{d_i}).   (8.2)

Example 8.0.20. Consider the action of the group G of order 3 on C[x, y] induced by the linear map x ↦ ζ_3x, y ↦ ζ_3^{-1}y, where ζ_3 is a primitive third root of unity. Clearly, x^3, y^3 and xy are invariants and C[x, y]^G = C[x^3, y^3, xy]. In fact, x^3 and y^3 are algebraically independent, and

C[x, y]^G = C[x^3, y^3] ⊕ C[x^3, y^3]xy ⊕ C[x^3, y^3](xy)^2.   (8.3)

Since H(C[x^3, y^3], t) = 1/(1 − t^3)^2, we obtain H(C[x, y]^G, t) = (1 + t^2 + t^4)/(1 − t^3)^2.

Exercise 8.0.21. Compute the Hilbert series of C[x^2, y^2, xy].


8.1 Molien’s theorem

For finite groups G, it is possible to compute the Hilbert series directly, without prior knowledge about the generators. This is captured in the following beautiful theorem of Molien.

Theorem 8.1.1 (Molien's Theorem). Let ρ : G → GL(V) be a representation of a finite group on a finite-dimensional vector space V. Then the Hilbert series is given by

H(C[V]^G, t) = (1/|G|) Σ_{g∈G} 1/det(I − ρ(g)t).   (8.4)

Proof. Consider the action of G on C[V] induced by the representation ρ. Denote for g ∈ G and d ∈ N by L_d(g) ∈ GL(C[V]_d) the linear map corresponding to the action of g ∈ G on the homogeneous polynomials of degree d. So L_1(g) = ρ^*(g).

The linear map π_d := (1/|G|) Σ_{g∈G} L_d(g) is a projection onto C[V]^G_d. That is, π_d(p) ∈ C[V]^G_d for all p ∈ C[V]_d and π_d is the identity on C[V]^G_d. It follows that tr(π_d) = dim(C[V]^G_d). This gives:

H(C[V]^G, t) = (1/|G|) Σ_{g∈G} Σ_{d=0}^∞ t^d tr(L_d(g)).   (8.5)

Now let's fix an element g ∈ G and compute the inner sum Σ_{d=0}^∞ t^d tr(L_d(g)). Pick a basis x_1, ..., x_n of V^* that is a system of eigenvectors for L_1(g), say L_1(g)x_i = λ_ix_i. Then the monomials in x_1, ..., x_n of degree d form a system of eigenvectors of L_d(g), with eigenvalues given by:

L_d(g) · x_1^{d_1} · · · x_n^{d_n} = λ_1^{d_1} · · · λ_n^{d_n} · x_1^{d_1} · · · x_n^{d_n}   (8.6)

for all d_1 + · · · + d_n = d. It follows that

Σ_{d=0}^∞ t^d tr(L_d(g)) = (1 + λ_1t + λ_1^2t^2 + · · · ) · · · (1 + λ_nt + λ_n^2t^2 + · · · )
= 1/(1 − λ_1t) · · · 1/(1 − λ_nt) = 1/det(I − L_1(g)t).   (8.7)

Using the fact that for every g the equality det(I − L_1(g)t) = det(I − ρ(g^{-1})t) holds and combining equations (8.5) and (8.7), we arrive at

H(C[V]^G, t) = (1/|G|) Σ_{g∈G} Σ_{d=0}^∞ t^d tr(L_d(g)) = (1/|G|) Σ_{g∈G} 1/det(I − ρ(g^{-1})t) = (1/|G|) Σ_{g∈G} 1/det(I − ρ(g)t),   (8.8)


where the last equality follows by replacing the summation variable g by g^{-1}, which merely changes the order in which we sum over G. This completes the proof.

Exercise 8.1.2. Let U ⊂ V be finite-dimensional vector spaces and let π : V → U be a linear map that is the identity on U. Show that tr(π) = dim(U). Hint: write π as a matrix with respect to a convenient basis.

Example 8.1.3. Consider again the action of the group G = Z/3Z on C[x, y] induced by the linear map x ↦ ζx, y ↦ ζ^{-1}y, where ζ is a primitive third root of unity. Using Molien's theorem, we find

H(C[x, y]^G, t) = (1/3) ( 1/((1 − t)(1 − t)) + 1/((1 − ζt)(1 − ζ^2t)) + 1/((1 − ζ^2t)(1 − ζt)) ).   (8.9)

A little algebraic manipulation and the fact that (1 − ζt)(1 − ζ^2t) = 1 + t + t^2 shows this to be equal to

(1 − t + t^2)(1 + t + t^2) / ((1 − t)^2(1 − ζt)^2(1 − ζ^2t)^2) = (1 + t^2 + t^4)/(1 − t^3)^2.   (8.10)

Since this is equal to the Hilbert series of C[x^3, y^3, xy], we obtain as a by-product that the invariant ring is indeed generated by the three invariants x^3, y^3 and xy.
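
Molien's formula is also easy to evaluate numerically. A sketch for this order-3 group, summing the geometric series 1/((1 − λ_1t)(1 − λ_2t)) degree by degree:

```python
import cmath

# eigenvalue pairs of rho(g) on C^2 for the three elements of G = Z/3Z
# acting by x -> zeta*x, y -> zeta^{-1}*y
zeta = cmath.exp(2j * cmath.pi / 3)
eigenvalues = [(1, 1), (zeta, zeta**2), (zeta**2, zeta)]

def molien_coeff(d):
    """dim C[x, y]^G_d via Molien: the degree-d coefficient of
    1/((1 - l1*t)(1 - l2*t)) is sum_{a+b=d} l1^a * l2^b; average over G."""
    total = sum(sum(l1**a * l2**(d - a) for a in range(d + 1))
                for l1, l2 in eigenvalues)
    return round((total / 3).real)

# matches (1 + t^2 + t^4)/(1 - t^3)^2 = 1 + t^2 + 2t^3 + t^4 + 2t^5 + 3t^6 + ...
assert [molien_coeff(d) for d in range(7)] == [1, 0, 1, 2, 1, 2, 3]
```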

Exercise 8.1.4. Let G be the matrix group generated by A, B ∈ GL_2(C) given by

A := [i 0; 0 −i],   B := [0 1; −1 0].   (8.11)

• Use Molien's theorem to prove that the Hilbert series of C[x, y]^G is given by H(C[x, y]^G, t) = (1 + t^6)/(1 − t^4)^2.

• Find algebraically independent invariants f_1, f_2 of degree 4 and a third invariant f_3 of degree 6, such that C[x, y]^G = C[f_1, f_2] ⊕ C[f_1, f_2]f_3.

8.2 Linear codes

A linear code is a linear subspace C ⊆ F_q^n, where F_q is the field of q elements. The number n is called the length of the code. In the following, we will only consider binary codes, that is, q = 2. The weight w(u) of a word u ∈ F_2^n is the number of nonzero positions in u, that is, w(u) := |{i | u_i = 1}|. The Hamming distance d(u, v) between two words is defined as the number of positions in which u and v differ: d(u, v) = w(u − v).

A code C ⊆ F_2^n is called an [n, k, d]-code if the dimension of C is equal to k and the smallest Hamming distance between two distinct codewords is equal to d. In the setting of error-correcting codes, messages are transmitted using words from the set of 2^k codewords. If at most (d − 1)/2 errors are introduced


(by noise) into a codeword, the original can still be recovered by finding the word in C at minimum distance from the distorted word. The higher d, the more errors can be corrected; and the higher k, the higher the information rate.

Much information about a code, including the parameters d and k, can be read off from its weight enumerator W_C. This is the polynomial in x, y, homogeneous of degree n, defined by

W_C(x, y) := Σ_{i=0}^n A_i y^i x^{n−i},   A_i := |{u ∈ C | w(u) = i}|.   (8.12)

Observe that the coefficient of x^n in W_C is always equal to 1, since C contains the zero word. The number 2^k of codewords equals the sum of the coefficients A_0, ..., A_n, and d is the smallest positive index i for which A_i > 0.

For a code C ⊆ F_2^n, the dual code C^⊥ is defined by

C^⊥ := {u ∈ F_2^n | u · c = 0 for all c ∈ C}, where u · c := u_1c_1 + · · · + u_nc_n.   (8.13)

Exercise 8.2.1. Check that the dimensions of a code C ⊆ F_2^n and its dual C^⊥ sum to n.

The MacWilliams identity relates the weight enumerator of a code C to that of its dual C^⊥.

Proposition 8.2.2. Let C ⊆ F_2^n be a code. The weight enumerator of C^⊥ satisfies

W_{C^⊥}(x, y) = (1/|C|) W_C(x + y, x − y).   (8.14)

Exercise 8.2.3. Prove the MacWilliams identity. Hint: let

f(u) := Σ_{v∈F_2^n} x^{n−w(v)} y^{w(v)} (−1)^{u·v},   (8.15)

and compute Σ_{c∈C} f(c) in two ways.

A code is called self-dual if C = C^⊥. This implies that n is even and the dimension of C equals n/2. Furthermore, we have for every c ∈ C that c · c = 0, so that w(c) is even. If every word in C has weight divisible by 4, the code is called even.

Exercise 8.2.4. An example of an even self-dual code is the extended Hamming code spanned by the rows of the matrix

0 1 0 1 0 1 0 1
0 0 1 1 0 0 1 1
0 0 0 0 1 1 1 1
1 1 1 1 1 1 1 1   (8.16)


That this code is self-dual follows from the fact that it has dimension 4 and any two rows of the given matrix have dot product equal to 0. To see that it is an even code, observe that the rows have weights divisible by 4 and that for any two words u, v with weights divisible by four and u · v = 0, also u + v has weight divisible by four.
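
These facts are quick to verify by machine. A sketch that enumerates the 16 codewords from the generator matrix above, checks self-duality directly, and confirms the weight enumerator x^8 + 14x^4y^4 + y^8 together with the MacWilliams identity at a few sample points:

```python
from itertools import product

G = [[0, 1, 0, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [1, 1, 1, 1, 1, 1, 1, 1]]

# all F_2-linear combinations of the rows
code = {tuple(sum(c * g for c, g in zip(coeffs, col)) % 2 for col in zip(*G))
        for coeffs in product([0, 1], repeat=4)}
assert len(code) == 16

# the dual code, by brute force over all of F_2^8; self-duality: C = C-perp
dual = {u for u in product([0, 1], repeat=8)
        if all(sum(a * b for a, b in zip(u, c)) % 2 == 0 for c in code)}
assert dual == code

# weight distribution A_i and weight enumerator W_C(x, y)
A = [0] * 9
for w in code:
    A[sum(w)] += 1
assert A == [1, 0, 0, 0, 14, 0, 0, 0, 1]

def W(x, y):
    return sum(A[i] * y**i * x**(8 - i) for i in range(9))

# MacWilliams for a self-dual code: W(x, y) = W(x+y, x-y)/|C|
for x, y in [(1.0, 0.5), (2.0, 3.0), (1.3, -0.7)]:
    assert abs(W(x, y) - W(x + y, x - y) / 16) < 1e-6
```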

Consider an even, self-dual code C. Then its weight enumerator must satisfy

W_C(x, y) = W_C((x + y)/√2, (x − y)/√2),   W_C(x, y) = W_C(x, iy).   (8.17)

Here the first equality follows from Proposition 8.2.2 and the fact that |C| = (√2)^n. The second equality follows from the fact that all weights are divisible by 4. But this means that W_C is invariant under the group G generated by the matrices

A := (1/√2) [1 1; 1 −1],   B := [1 0; 0 i],   (8.18)

a group of 192 elements!
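
The order 192 can be checked by brute-force closure, multiplying out words in A and B numerically; rounded entries serve as dictionary keys. This is an illustrative numerical sketch, not an exact computation:

```python
s2 = 2 ** -0.5
A = [[complex(s2), complex(s2)], [complex(s2), complex(-s2)]]
B = [[1 + 0j, 0j], [0j, 1j]]
I = [[1 + 0j, 0j], [0j, 1 + 0j]]

def mul(X, Y):
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)]
            for i in range(2)]

def key(X):
    # round entries so that numerically equal matrices get equal keys
    return tuple((round(X[i][j].real, 6), round(X[i][j].imag, 6))
                 for i in range(2) for j in range(2))

# closure of {I} under right multiplication by the generators A and B
seen = {key(I): I}
frontier = [I]
while frontier:
    X = frontier.pop()
    for g in (A, B):
        Y = mul(X, g)
        if key(Y) not in seen:
            seen[key(Y)] = Y
            frontier.append(Y)

assert len(seen) == 192
```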

Exercise 8.2.5. Let ζ = e^{2πi/8} be a primitive 8-th root of unity. Show that the group G defined above is equal to the set of matrices

ζ^k [1 0; 0 α],   ζ^k [0 1; α 0],   ζ^k (1/√2) [1 β; α αβ],   (8.19)

where α, β ∈ {1, i, −1, −i} and k = 0, ..., 7.

What can we say about the invariant ring C[x, y]^G? Using Molien's theorem, we can find the Hilbert series. A (slightly tedious) computation gives

H(C[x, y]^G, t) = 1/((1 − t^8)(1 − t^{24})).   (8.20)

This suggests that the invariant ring is generated by two algebraically independent polynomials f_1, f_2, homogeneous of degrees 8 and 24 respectively. This is indeed the case: just take f_1 := x^8 + 14x^4y^4 + y^8 and f_2 := x^4y^4(x^4 − y^4)^4. So the invariant ring is generated by f_1 and f_2, which implies the following powerful theorem on the weight enumerators of even self-dual codes.

Theorem 8.2.6 (Gleason). The weight enumerator of an even self-dual code is a polynomial in x^8 + 14x^4y^4 + y^8 and x^4y^4(x^4 − y^4)^4.

Exercise 8.2.7. The Golay code is an even self-dual [24, 12, 8]-code. Use Theorem 8.2.6 to show that the weight enumerator of the Golay code equals

x^{24} + 759x^{16}y^8 + 2576x^{12}y^{12} + 759x^8y^{16} + y^{24}.   (8.21)
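
This exercise can be carried out mechanically with Gleason's theorem. The only degree-24 products of the generators are f_1^3 and f_2, so since both are homogeneous in x, y we can track a degree-24 polynomial by its y-exponents alone; the sketch below finds the unique combination f_1^3 + b·f_2 with A_0 = 1 and A_4 = 0 (the code has minimum distance 8):

```python
def mul(p, q):
    """Multiply homogeneous polynomials in x, y of known total degree,
    represented as dicts {exponent of y: coefficient}."""
    r = {}
    for a, ca in p.items():
        for b, cb in q.items():
            r[a + b] = r.get(a + b, 0) + ca * cb
    return r

f1 = {0: 1, 4: 14, 8: 1}            # x^8 + 14 x^4 y^4 + y^8
quartic = {0: 1}
for _ in range(4):
    quartic = mul(quartic, {0: 1, 4: -1})
f2 = mul({4: 1}, quartic)           # x^4 y^4 (x^4 - y^4)^4, degree 24

f1cubed = mul(mul(f1, f1), f1)      # also degree 24

# W = f1^3 + b*f2; A_0 = 1 holds automatically, and A_4 = 0 forces b:
b = -f1cubed[4] // f2[4]            # = -42
W = {k: f1cubed.get(k, 0) + b * f2.get(k, 0)
     for k in set(f1cubed) | set(f2)}

# the nonzero coefficients reproduce (8.21)
assert {k: v for k, v in W.items() if v} == {0: 1, 8: 759, 12: 2576, 16: 759, 24: 1}
```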

Exercise 8.2.8. There exists an even self-dual code C ⊆ F_2^{40} that contains no words of weight 4. How many words of weight 8 does C have?

Exercise 8.2.9. Let G be the group generated by the matrices (1/√2)[1 1; 1 −1] and [1 0; 0 −1]. Use Molien's theorem to compute the Hilbert series of C[x, y]^G and find a set of algebraically independent generators.


Chapter 9

Algebraic groups

9.1 Definition and examples

Definition 9.1.1. A linear algebraic group is a Zariski closed subgroup of some GL_n.

In more down-to-earth terms, a linear algebraic group is a subgroup G of GL_n (with matrix product as group operation) which moreover is the zero set of some polynomial equations in the matrix entries. The algebra of regular functions C[G] on G is therefore C[x_{11}, x_{12}, ..., x_{nn}, 1/det(x)]/I(G), where I(G) is the ideal of all regular functions on GL_n that vanish on G. We will usually drop the adjective linear and just say algebraic group. (Their theory, however, is very different from the theory of elliptic curves and other Abelian varieties, which are algebraic groups in other contexts.)

Example 9.1.2. 1. GL_n itself is a linear algebraic group, with zero defining ideal.

2. Often we will find it convenient not to specify a basis, and work with GL(V) (and its subgroups), where V is an n-dimensional complex vector space, rather than with GL_n. There is a basis-independent description of the algebra of regular functions on GL(V) as well: it is the algebra of functions on GL(V) generated by End(V)^* and 1/det.

3. O_n := {g ∈ GL_n | g^Tg = 1} is a linear algebraic group, called the orthogonal group. Its ideal turns out to be generated by the quadratic polynomials of the form Σ_i x_{ij}x_{il} − δ_{jl} for j, l = 1, ..., n (it is obvious that these polynomials are contained in the ideal of O_n, but not that they generate a radical ideal).

4. The following is a basis-free version of the orthogonal group: let β be a non-degenerate symmetric bilinear form on an n-dimensional vector space V. Then O(β) := {g ∈ GL(V) | β(gv, gw) = β(v, w) for all v, w ∈ V} is isomorphic, in the sense defined below, to O_n.


5. SL_n := {g ∈ GL_n | det(g) = 1} is a linear algebraic group, whose ideal is generated by det(x) − 1.

6. G_a := { [1 b; 0 1] | b ∈ C } is an algebraic group, called the additive group. Its ideal is generated by the polynomials x_{11} − 1, x_{22} − 1, x_{21}, and the quotient by this ideal is isomorphic to C[x_{12}].

7. Every finite group G "is" a linear algebraic group, because it can be realised as a finite (and hence Zariski-closed) subgroup of GL_{|G|} using the left regular representation.

8. Let A be a finite-dimensional C-algebra. Then the set of all automorphisms of A is a linear algebraic subgroup of GL(A). Indeed, choosing a basis 1 = a_1, ..., a_n of A, the condition that a linear map g ∈ GL(A) is an automorphism of A is that ga_1 = a_1 and (ga_i)(ga_j) = g(a_ia_j); this gives (at most) quadratic equations for the entries of the matrix of g relative to the basis a_1, ..., a_n.

9. The group of nonsingular diagonal n × n-matrices is a linear algebraic group, called the n-dimensional torus T_n. The torus T_1 is also called the multiplicative group G_m or denoted GL_1.

The name torus stems from the following relation with the real n-dimensional torus.

Exercise 9.1.3. Let G be the subset of T_n of all matrices whose diagonal entries lie on the unit circle in C. Show that G (an n-dimensional real torus) is a subgroup which is dense in T_n in the Zariski topology.

Exercise 9.1.4. Prove directly that the polynomials Σ_j x_{ij}x_{kj} − δ_{ik} with i, k = 1, ..., n are contained in the ideal generated by the polynomials Σ_i x_{ij}x_{il} − δ_{jl} with j, l = 1, ..., n.

Remark 9.1.5. The map (C, +) → G_a sending b to the matrix in the definition of G_a is an isomorphism of abstract groups, which moreover is given by a regular map whose inverse is also regular. We will therefore also consider (C, +) an algebraic group. This is consistent with a more abstract definition of (affine) algebraic groups as affine varieties with a compatible group structure.

Exercise 9.1.6. Prove that det(x) − 1 is an irreducible polynomial in the entries of x. (This shows that the ideal of SL_n is, indeed, generated by det(x) − 1.)

Definition 9.1.7. Let G ⊆ GL_n and H ⊆ GL_m be algebraic groups. An algebraic group homomorphism from G to H is a group homomorphism G → H which moreover is a regular map.

The latter condition means, very explicitly, that the homomorphism is given by m^2 functions φ_{ij} : G → C which are restrictions to G of polynomials in the matrix entries x_{ij} of GL_n and 1/det(x).


Example 9.1.8. Consider the map φ : GL_n → GL(M_n) that sends g to the linear map φ(g) : A ↦ gAg^{-1}. This is an algebraic group homomorphism. Indeed, it is clearly a group homomorphism, so we need only verify that it is a regular map. This means that the n^2 × n^2-matrix of φ(g) relative to the basis of elementary matrices E_{ij} (with zeroes everywhere but for a 1 in position (i, j)) of M_n depends polynomially on the entries and the inverse of the determinant of g. But this follows from

(gE_{pq}g^{-1})_{ij} = Σ_k Σ_l g_{ik} δ_{kp} δ_{ql} (g^{-1})_{lj} = g_{ip}(g^{-1})_{qj}

and Cramer's rule, which expresses the entries of g^{-1} in those of g.

We claim that the image of φ is itself an algebraic group, that is, that it is a Zariski-closed subgroup of GL(M_n). Note that it is certainly a group, isomorphic to GL_n/ker φ, where ker φ consists of the scalar matrices; this group is called the projective (general) linear group and denoted PGL_n. Note also that im φ is contained in the group Aut(M_n) of automorphisms of the n^2-dimensional algebra M_n. We prove that im φ = Aut(M_n). For this, note that the matrices E_{ij}, i, j = 1, ..., n satisfy

Σ_i E_{ii} = I   and   E_{ij}E_{kl} = δ_{jk}E_{il}.

Now if α is any automorphism of M_n, then the matrices F_{ij} := α(E_{ij}), i, j = 1, ..., n will satisfy the same relations. Let V_i be the image of F_{ii}. Then V_i ≠ 0 since F_{ii} ≠ 0, and the relations Σ_i F_{ii} = I and F_{ii}F_{ii} = F_{ii} and F_{ii}F_{jj} = 0, i ≠ j, imply that

C^n = ⊕_{i=1}^n V_i.

Hence each V_i must be one-dimensional. Let v_1 be a basis vector of V_1, and set v_i := F_{i1}v_1, i = 2, ..., n. Then v_i ≠ 0 since 0 ≠ F_{i1} = F_{i1}F_{11}, and v_i spans V_i since F_{ii}v_i = F_{ii}F_{i1}v_1 = F_{i1}v_1 = v_i. Now there is a unique g ∈ GL_n such that ge_i = v_i, and we claim that φ(g) = α. Indeed, on the one hand we have gE_{ij}e_k = δ_{jk}v_i, and on the other hand we have

F_{ij}ge_k = F_{ij}v_k = F_{ij}F_{k1}v_1 = δ_{jk}F_{i1}v_1 = δ_{jk}v_i,

which shows that gE_{ij}g^{-1} = F_{ij} for all i, j = 1, ..., n, so that α = φ(g), as claimed.
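
For n = 2 the regularity of φ can be made explicit by machine: computing g^{-1} via the adjugate (Cramer's rule) shows that every entry of the 4 × 4 matrix of φ(g) is a polynomial in the entries of g divided by det(g). An illustrative sketch:

```python
def conj_matrix(g):
    """4x4 matrix of phi(g): A -> g A g^{-1} on the basis E11, E12, E21, E22
    of M_2, with g^{-1} computed via the adjugate (Cramer's rule), so every
    entry is a polynomial in the entries of g divided by det(g)."""
    (a, b), (c, d) = g
    det = a * d - b * c
    adj = [[d, -b], [-c, a]]                      # g^{-1} = adj / det
    basis = [(0, 0), (0, 1), (1, 0), (1, 1)]
    cols = []
    for p, q in basis:
        E = [[0, 0], [0, 0]]
        E[p][q] = 1
        gE = [[sum(g[i][k] * E[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]
        gEginv = [[sum(gE[i][k] * adj[k][j] for k in range(2)) / det
                   for j in range(2)] for i in range(2)]
        cols.append([gEginv[i][j] for i, j in basis])
    # column r of the matrix is the coordinate vector of phi(g) applied
    # to the r-th basis element
    return [[cols[r][s] for r in range(4)] for s in range(4)]

# scalar matrices lie in the kernel of phi: they act as the identity on M_2
identity4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
assert conj_matrix([[3, 0], [0, 3]]) == identity4
```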

Remark 9.1.9. In fact, the image of any homomorphism of algebraic groups is Zariski-closed, and hence an algebraic group itself. This is a fundamental property of algebraic groups, which for instance Lie groups do not share. To prove this property, however, one needs slightly more algebraic geometry than we have at our disposal here.

Definition 9.1.10. An algebraic group isomorphism is an algebraic group homomorphism from G to H that has an inverse which is also an algebraic group homomorphism.


Exercise 9.1.11. Determine all algebraic group automorphisms of G_a and of T_n.

Remark 9.1.12. If there exists an algebraic group isomorphism between algebraic groups G ⊆ GL_n and H ⊆ GL_m, then one can use this isomorphism to prove that C[G] = C[GL_n]/I(G) and C[H] = C[GL_m]/I(H) are isomorphic as well. Just like with the affine varieties discussed earlier, this allows one to think of an algebraic group abstractly, without thinking of one particular closed embedding into some matrix group. We will sometimes implicitly adopt this more abstract point of view.

Definition 9.1.13. A finite-dimensional rational representation of an algebraic group G is an algebraic group homomorphism ρ : G → GL(V) for some finite-dimensional vector space V, which is then called a rational G-module.

A locally finite rational representation of G is a group homomorphism ρ from G into the group GL(V) of bijective linear maps from some potentially infinite-dimensional vector space V into itself with the following additional property: for each v ∈ V there is a finite-dimensional subspace U of V containing v which is ρ(G)-stable and for which the induced homomorphism ρ : G → GL(U) is an algebraic group homomorphism, that is, regular.

In both cases we will write gv instead of ρ(g)v when ρ is clear from the context. We will also use the word module for the space V equipped with its linear G-action.

Note that we insist that a rational representation ρ be defined by regular functions defined everywhere on G. We nevertheless follow the tradition of using the adjective rational, which refers to the fact that the defining functions may involve 1/det(x) in addition to the matrix entries x_{ij}.

Example 9.1.14. 1. The identity map GL_n → GL_n = GL(C^n) makes V = C^n into a rational GL_n-module. The second tensor power V^{⊗2} is also a rational GL_n-module. To verify this we note that the matrix entries of g^{⊗2} relative to the basis e_j ⊗ e_l, j, l = 1, ..., n of V^{⊗2}, where the e_i are the standard basis of C^n, depend polynomially (in fact, quadratically) on the matrix entries of g: (g^{⊗2})(e_j ⊗ e_l) = (Σ_i g_{ij}e_i) ⊗ (Σ_k g_{kl}e_k) = Σ_{i,k} g_{ij}g_{kl} e_i ⊗ e_k.

2. For any integer k the map GL_n → G_m, g ↦ det(g)^k is a one-dimensional rational representation. It restricts to a rational representation of any Zariski closed subgroup G of GL_n. For k = 0 this one-dimensional representation is called the trivial representation of G.

3. The group homomorphism G_a → GL_1 sending [1 b; 0 1] to exp(b) is not given by regular functions, hence not a rational representation. This reflects a difference between the theories of algebraic groups and of Lie groups.


4. For any algebraic group G given as a Zariski closed subgroup of GL(V), the space V is a rational G-module, which we will call the defining module of G.

Exercise 9.1.15. Determine all the one-dimensional rational representations of the torus T_n. Hint: C[T_n] ≅ C[x_{11}^{±1}, ..., x_{nn}^{±1}], and a one-dimensional rational representation ρ : T_n → GL_1 = T_1 is an element ρ of C[T_n] that, apart from being multiplicative, does not vanish anywhere on T_n.

Exercise 9.1.16. Show that G_a does not have any non-trivial one-dimensional rational representations. Show also that the G_a-submodule Ce_1 of the defining module C^2 does not have a G_a-stable vector space complement in C^2.

Remark 9.1.17. Like the second tensor power above, (multi-)linear algebra constructions transform locally finite rational G-modules into others. In particular, if ρ : G → GL(V) is a locally finite rational representation, then

1. if U is a ρ(G)-stable subspace of V, then the induced maps G → GL(U) and G → GL(V/U) are locally finite rational representations;

2. if W is a second locally finite rational G-module, then V ⊕ W and V ⊗ W are also locally finite rational G-modules;

3. if k is a natural number, then S^kV is a locally finite rational G-module; indeed, it is a quotient of V^{⊗k}, and hence locally finite and rational by the previous two constructions; etc.

We will never consider other representations of algebraic groups than locally finite rational ones. We will often drop these adjectives.

9.2 The algebra of regular functions as a representation

To any algebraic group G ⊆ GL_n we have associated its algebra C[G] of regular functions. Consider the map λ : G → GL(C[x_{ij}, 1/det(x)]) defined by

(λ(g)f)(h) := f(g^{-1}h) for all g ∈ G, h ∈ GL_n.

As f is a polynomial function in the x_{ij} and 1/det(x), the expression f(g^{-1}h) is a polynomial in the matrix entries and the inverse determinant of g^{-1}h, and hence in the matrix entries g_{ij}, h_{ij}, i, j = 1, ..., n and the inverse determinants det(g)^{-1}, det(h)^{-1}. In particular, for fixed g, the function λ(g)f is a regular function on GL_n.

The map λ satisfies λ(1)f = f and λ(g_1g_2)f = λ(g_1)λ(g_2)f and λ(g)(f_1 + f_2) = λ(g)f_1 + λ(g)f_2 and λ(g)(cf) = cλ(g)f and λ(g)(f_1f_2) = (λ(g)f_1)(λ(g)f_2). In other words, λ furnishes a representation of G by means of automorphisms of C[GL_n]. We claim that it is locally finite and rational. To see this let f ∈ C[GL_n]. Then the expression f(g^{-1}h) can be expanded as a finite sum


Σ_i f_i(g)f'_i(h), where the f_i, f'_i are polynomial functions in the entries and inverse determinant of g, h, respectively. This shows that for all g ∈ G the function λ(g)f lies in the finite-dimensional subspace of C[GL_n] spanned by the f'_i. Now let U ⊆ C[GL_n] be the linear span of all λ(g)f, g ∈ G. This is a G-stable space, and finite-dimensional since it is contained in the span of the f'_i. This proves that C[GL_n] is a locally finite G-module. Finally, let g_1, ..., g_k be such that the f''_j := λ(g_j)f form a basis of U. Choose any projection π from the span of the f'_i onto U. Then we have, for all g ∈ G,

λ(g)f''_j = π(Σ_i f_i(gg_j)f'_i).

The right-hand side is a polynomial expression in g and its inverse determinant, hence G → GL(U) is a regular map, so that C[GL_n] is a rational G-module, as claimed.

Exercise 9.2.1. Show that the ideal I(G) ⊆ C[GL_n] of the algebraic group G is stable under λ(G).

As a consequence of the above, and of the exercise, the algebra C[G] = C[GL_n]/I(G) of regular functions on G is a locally finite, rational G-module.

Remark 9.2.2. The above concerns the action of G by left translation on GL_n and on itself. A similar construction can, of course, be carried out for its action by right translation.

Proposition 9.2.3. Let V be any finite-dimensional rational module for the algebraic group G whose dual V^* is generated, as a G-module, by a single element. Then there exists a G-equivariant embedding of V into C[G].

Proof. Suppose that f generates V^*. Take the map ψ : V → C[G], v ↦ f_v, where f_v ∈ C[G] is defined by f_v(h) = f(h^{-1}v), h ∈ G; this is a regular map since V is a rational G-module. The map ψ is G-equivariant since

f_{gv}(h) = f(h^{-1}gv) = f_v(g^{-1}h) = (gf_v)(h)

for all g, h ∈ G and v ∈ V. Moreover, ψ is linear and has trivial kernel; indeed, f_v = 0 means that (hf)(v) = f(h^{-1}v) = f_v(h) = 0 for all h ∈ G so that, since the G-orbit of f spans V^*, v must be zero.

Exercise 9.2.4. 1. Use the argument of the proof to show that every finite-dimensional rational G-module can be embedded, as a G-module, into a direct sum of finitely many copies of C[G].

2. Prove that every finite-dimensional rational T_n-module is a direct sum of one-dimensional T_n-modules. Hint: analyse the T_n-module C[T_n].

We already knew that finite-dimensional representations of finite groups are completely reducible, and the preceding exercise shows that the same is true for finite-dimensional rational representations of T_n. Next week we will prove that this is true for a larger class of groups.


Exercise 9.2.5. Let A be the algebra of (complexified) quaternions, that is, A = CI ⊕ CJ ⊕ CK ⊕ CL, where I is the identity element and the (associative) multiplication is determined by J^2 = K^2 = L^2 = −I and JK = L. Prove that the automorphism group of A is isomorphic, as an algebraic group, to the subgroup SO_3 of O_3 consisting of the matrices with determinant 1.


Chapter 10

Reductiveness

We want to extend techniques and theory that we first derived for finite groups, such as the Reynolds operator and Hilbert's finiteness results, to algebraic groups. In general this is not so easy, but for the class of linearly reductive groups the theory is very rich and deep.

Recall the following definition from the lecture on representations.

Definition 10.0.6. A locally finite representation of a group G is called completely reducible if it is the direct sum of finite-dimensional irreducible representations of G.

Exercise 10.0.7. Show that a locally finite G-module V is completely reducible if every finite-dimensional submodule of V is completely reducible.

Definition 10.0.8. An algebraic group G is called linearly reductive if each finite-dimensional rational representation of G is completely reducible.

It turns out that linearly reductive groups G can be characterised in a more intrinsic manner, namely, by the property that the unipotent radical of G is trivial. Groups with this property can be classified by combinatorial data involving objects such as root systems and lattices. However, we are not going to do so. In this course it will suffice to have a collection of examples of linearly reductive groups, such as the classical groups GL_n, (S)O_n, SL_n, Sp_{2n} and the tori T_n, in addition to finite groups, which are always linearly reductive.

In the case of finite groups linear reductiveness follows from the fact that the group leaves invariant a Hermitian inner product on any finite-dimensional representation. This is not true for the classical groups.

Exercise 10.0.9. Show that there is no Hermitian inner product on C^n that satisfies (gv|gw) = (v|w) for all g ∈ O_n (we take non-degeneracy as part of the definition of a Hermitian inner product).

However, a weaker statement is true, which still suffices to prove linear reductiveness of the classical groups.


Theorem 10.0.10. Suppose that G ⊆ GL_n has the property that with each element g also the Hermitian conjugate g^* = ḡ^T is in G. Then G is linearly reductive.

For the proof it will be convenient to give a name to the property that the defining representation in this theorem has. Given a group G and a map g ↦ g^* from G into itself, a ∗-representation is by definition a finite-dimensional representation ρ : G → GL(V) together with a Hermitian inner product (.|.) on V such that for each g ∈ G we have ρ(g^*) = ρ(g)^*, where the latter map is the Hermitian conjugate determined by (ρ(g)v|w) = (v|ρ(g)^*w) for all v, w ∈ V. We also call V a ∗-module for G.

From the ∗-module V for G we can construct other ∗-modules. First, if U is a G-submodule of V, then U^⊥ is also a G-submodule of V. Indeed, for g ∈ G and v ∈ U^⊥ we have

(ρ(g)v|U) = (v|ρ(g)^*U) = (v|ρ(g^*)U) = (v|U) = {0}. (10.1)

Now both U and U^⊥ inherit Hermitian inner products from V, and one readily checks that (ρ(g)|_U)^* = ρ(g)^*|_U = ρ(g^*)|_U. This shows that U is a ∗-module for G, and so is U^⊥. Furthermore, the natural map U^⊥ → V/U is an isomorphism of G-modules, by means of which the Hermitian inner product on U^⊥ can be transferred to a Hermitian inner product on V/U, making the latter into a ∗-module for G.

Second, we have an inner product on End(V) defined by (A|B) = tr(AB^*) for A, B ∈ End(V). Now for g ∈ G and all A, B ∈ End(V) we have

(ρ(g)Aρ(g)^{-1}|B) = tr(ρ(g)Aρ(g)^{-1}B^*) = tr(A(ρ(g)^*Bρ(g)^{*-1})^*) = (A|ρ(g^*)Bρ((g^*)^{-1})). (10.2)

This shows that End(V) is a ∗-module with the action of G on it by conjugation.
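As a quick sanity check (not part of the notes), the trace identity (10.2) can be verified numerically for random complex matrices; here g stands in for ρ(g) and * denotes the conjugate transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Random invertible g and arbitrary A, B in End(C^n).
g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

gs = g.conj().T  # g*, the Hermitian conjugate
lhs = np.trace(g @ A @ np.linalg.inv(g) @ B.conj().T)
# (A | g* B (g*)^{-1}) = tr(A (g* B (g*)^{-1})*)
rhs = np.trace(A @ (gs @ B @ np.linalg.inv(gs)).conj().T)
assert np.isclose(lhs, rhs)
```

The key step, as in the text, is that (g* B (g*)^{-1})* = g^{-1} B* g, so both sides equal tr(g A g^{-1} B*).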

Third, if U is another ∗-module for G, then U ⊕ V is a ∗-module for G with respect to the direct sum inner product; and U ⊗ V is a ∗-module for G with respect to the inner product determined by (u ⊗ v | u′ ⊗ v′) = (u|u′)·(v|v′) for u, u′ ∈ U and v, v′ ∈ V.

Exercise 10.0.11. Let V be a ∗-module for G with respect to the inner product (·|·) on V.

• Let φ : V → V^* be the antilinear map given by φ(v) = (u ↦ (u|v)). Check that the G-module V^* with the inner product (φ(v)|φ(w)) := (w|v) is a ∗-module for G.

• Consider the ∗-module End(V). By the previous item, the dual module End(V)^* is also a ∗-module for G. Check that the inner product and G-action on End(V)^* coincide with those induced by the natural linear bijection End(V) → End(V)^* given by A ↦ (B ↦ tr(AB)).


Finally, a symmetric power S^k(End(V)^*) ≅ S^k(End(V)) is a quotient of the tensor power End(V)^{⊗k} by a G-submodule, and hence, by the constructions above, a ∗-module for G. This shows that every finite-dimensional G-submodule of C[End(V)] = C[(x_ij)_ij] is a ∗-module for G. Now we are in a position to prove the theorem.

Proof of Theorem 10.0.10. Let V := C^n be the standard G-module. By assumption it is a ∗-module for G with respect to the standard Hermitian inner product (v|w) = Σ_i v_i w̄_i on C^n and the map g ↦ g^* that sends a matrix to its standard Hermitian conjugate. Hence, by the above construction, all finite-dimensional G-submodules of C[End(V)] are ∗-modules for G. In particular, every finite-dimensional G-submodule of C[End(V)] = C[(x_ij)_ij] is completely reducible; this follows by induction from the argument with U and U^⊥ above. But then (every finite-dimensional G-submodule of) the slightly larger algebra C[GL(V)] = C[(x_ij)_ij, 1/det(x)] is also completely reducible; see the exercise below. As a consequence, also the quotient C[GL(V)]/I(G) = C[G] and direct sums of finitely many copies of C[G] are completely reducible into finite-dimensional irreducible rational G-modules. By Exercise 9.2.4 every finite-dimensional rational G-module W is a submodule of some direct sum of copies of C[G], so W is completely reducible, as well.

Exercise 10.0.12. Let G be a closed subgroup of GL(V) and suppose that W is a finite-dimensional G-submodule of C[GL(V)]. Show that for the integer k large enough, W′ := det(x)^k W ⊆ C[(x_ij)_ij], and that W is completely reducible if and only if W′ is.

As an application of the ideas presented above, we classify the (finite-dimensional) irreducible rational representations of SL_2(C).

Proposition 10.0.13. Let V = C^2 be the standard SL_2(C)-module. Then for each nonnegative integer k, the rational SL_2(C)-module S^k(V) is irreducible, and every irreducible rational SL_2(C)-module is isomorphic to exactly one S^k(V).

Proof. We will first show that the S^k(V) are irreducible. Denote by x, y the standard basis of C^2 and by (x^i y^{k-i})_{i=0}^k the induced basis of S^k(V). Hence we have

(a b; c d) x^i y^{k-i} = (ax + cy)^i (bx + dy)^{k-i}. (10.3)

Let U ⊆ S^k(V) be an irreducible submodule. Take u = Σ_{i=0}^k c_i x^i y^{k-i} ∈ U with maximal support (i.e., such that the set of indices i with c_i ≠ 0 is maximal). We claim that all coefficients c_i are nonzero.

Indeed, c_0 ≠ 0, since otherwise we can replace u by

(1 0; λ 1) u = Σ_i c_i (x + λy)^i y^{k-i} = Σ_i d_i(λ) x^i y^{k-i}.

Since d_0 and every d_i with c_i ≠ 0 are nonzero polynomials in λ, we can take any λ not a root of these nonzero polynomials and obtain a vector in U with larger support. Now since c_0 ≠ 0, the vector (1 λ; 0 1) u = Σ_i c_i x^i (y + λx)^{k-i} has full support for all but a finite number of λ.

Next, take μ_0, …, μ_k ∈ C such that the numbers λ_j := μ_j^2 are distinct and nonzero. Then for every j the vector

μ_j^k (μ_j 0; 0 μ_j^{-1}) u = Σ_i λ_j^i c_i x^i y^{k-i}

belongs to U, and these k+1 vectors are linearly independent because the Vandermonde matrix (λ_j^i)_{i,j=0}^k has nonzero determinant. This shows that U = S^k(V).
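The last step can be checked numerically: for distinct nonzero λ_j and nonzero coefficients c_i, the matrix of rescaled vectors is a Vandermonde matrix with columns scaled by the c_i, hence of full rank. A minimal sketch (all numbers chosen for illustration only):

```python
import numpy as np

k = 5
c = np.arange(1, k + 2, dtype=float)            # all coefficients nonzero
lam = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # k+1 distinct nonzero lambdas
# Row j is the coefficient vector of mu_j^k * diag(mu_j, mu_j^{-1}) u,
# i.e. (lambda_j^i * c_i) for i = 0, ..., k.
vecs = np.array([[l**i * c[i] for i in range(k + 1)] for l in lam])
assert np.linalg.matrix_rank(vecs) == k + 1     # the vectors span S^k(V)
```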

To prove that we have found all irreducible rational SL_2(C)-modules, it suffices by Exercise 9.2.4 to show that C[SL_2(C)] = C[End(V)]/(det - 1) decomposes into copies of the S^k(V). Since C[End(V)] = ⊕_d S^d(End(V)) is completely reducible, it suffices to consider each S^d(End(V)). Because End(C^2) ≅ S^2(C^2) ⊕ C as an SL_2(C)-module, we have

S^d(End(V)) = S^d(S^2(V) ⊕ C) = ⊕_{i=0}^d S^i(S^2(V)). (10.4)

By Exercise 10.0.14, we can decompose each S^i(S^2(V)) as required, finishing the proof.

Exercise 10.0.14. Let V = C^2 be the standard SL_2(C)-module. Show that for every positive integer d, we have the following decomposition of the SL_2(C)-module S^d(S^2(V)):

S^d(S^2(V)) ≅ ⊕_{k=0}^{⌊d/2⌋} S^{2d-4k}(V). (10.5)

Hint: Let x, y be a basis of V and X = x^2, Y = y^2, Z = xy a basis of S^2(V). Show that the kernel of the homomorphism φ : S^d(S^2(V)) → S^{2d}(V) defined by φ(X^a Y^b Z^c) := (x^2)^a (y^2)^b (xy)^c is isomorphic to S^{d-2}(S^2(V)).
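The decomposition (10.5) can at least be checked at the level of dimensions: dim S^d(C^m) = C(d+m-1, d), dim S^2(C^2) = 3, and dim S^j(C^2) = j + 1. A short dimension count:

```python
from math import comb

def dim_S(d, m):
    """Dimension of S^d(C^m): number of multisets of size d from m symbols."""
    return comb(d + m - 1, d)

for d in range(1, 20):
    lhs = dim_S(d, 3)  # dim S^d(S^2(C^2)), since dim S^2(C^2) = 3
    # dim S^{2d-4k}(C^2) = 2d - 4k + 1, summed over k = 0, ..., floor(d/2)
    rhs = sum(2 * d - 4 * k + 1 for k in range(d // 2 + 1))
    assert lhs == rhs
```

For d = 2, for instance, both sides equal 6 = 5 + 1, matching S^2(S^2 V) ≅ S^4(V) ⊕ S^0(V).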


Chapter 11

First Fundamental Theorems

11.1 Schur-Weyl Duality

Let V := C^n, G := GL(V), and ρ : G → GL(V^{⊗k}) the diagonal representation of G. The latter space also furnishes a representation λ : S_k → GL(V^{⊗k}) of the symmetric group, given by

λ(π)(v_1 ⊗ ⋯ ⊗ v_k) = v_{π^{-1}(1)} ⊗ ⋯ ⊗ v_{π^{-1}(k)}, π ∈ S_k, v_1, …, v_k ∈ V.

Clearly ρ(g) and λ(π) commute for all g ∈ G and π ∈ S_k. Let R be the associative algebra generated by all ρ(g), g ∈ G, and let S be the associative algebra generated by all λ(π), π ∈ S_k. Then every element of R commutes with every element of S. The following theorem, due to Schur, shows that this also characterises R in terms of S and vice versa.

Theorem 11.1.1 (Schur-Weyl Duality). We have

R = {a ∈ End(V^{⊗k}) | as = sa for all s ∈ S}

and

S = {a ∈ End(V^{⊗k}) | ar = ra for all r ∈ R}.

Proof. We first prove the inclusion

R ⊇ {a ∈ End(V^{⊗k}) | as = sa for all s ∈ S} (11.1)

(the other inclusion being obvious), and then deduce the second statement from the Double Commutant Theorem. Note that the right-hand side is just End(V^{⊗k})^{S_k}, where S_k acts by conjugation via λ on End(V^{⊗k}). The map

End(V)^k → End(V^{⊗k}), (a_1, …, a_k) ↦ a_1 ⊗ ⋯ ⊗ a_k


is k-linear and hence induces a linear map

End(V)^{⊗k} → End(V^{⊗k}).

It is not hard to check that this latter map is bijective and S_k-equivariant. Hence it identifies the space End(V^{⊗k})^{S_k} with the space (End(V)^{⊗k})^{S_k} of symmetric tensors in E^{⊗k}, where E := End(V).

On the other hand, the image ρ(g) of g ∈ G in E^{⊗k} is just g^{⊗k}, and the left-hand side, R, of (11.1) contains all these elements. Since R is closed, G is dense in E, and taking k-th tensor powers is continuous, R contains all tensors of the form a^{⊗k} with a ∈ E. Hence we have to prove that the symmetric tensors in E^{⊗k} are spanned by the tensors of the form a^{⊗k}.

For this we forget that E is an algebra of linear maps, and claim that for any finite-dimensional vector space E the subspace of E^{⊗k} consisting of symmetric tensors is spanned by all elements of the form a^{⊗k} := a ⊗ ⋯ ⊗ a. We prove this indirectly: suppose that f is a linear function on the space of symmetric tensors in E^{⊗k} that vanishes identically on a^{⊗k} for all a ∈ E. Expanding a as Σ_{i=1}^m t_i a_i, where a_1, …, a_m is a basis of E, we find that

Σ_{α ∈ N^m, Σ_i α_i = k} t^α f(Σ_{i_1,…,i_k} a_{i_1} ⊗ ⋯ ⊗ a_{i_k}) = 0,

where the second sum is over all k-tuples i_1, …, i_k ∈ [m] in which exactly α_i copies of i occur for all i, and where the vanishing is identical in t_1, …, t_m. But then f must vanish on each such sum Σ_{i_1,…,i_k} a_{i_1} ⊗ ⋯ ⊗ a_{i_k}, and as α runs through all multi-indices with Σ_i α_i = k these sums run through a basis of the space of symmetric tensors in E^{⊗k}. This proves the claim, and hence the first statement. Applying the following theorem to H = S_k and R and S as above, we obtain the second statement.
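The claim that the symmetric tensors in E^{⊗k} are spanned by the powers a^{⊗k} can be probed numerically: the span of sufficiently many random powers should reach the full dimension C(m+k-1, k) of the space of symmetric tensors. A sketch, with small dimensions chosen for illustration:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
m, k = 3, 3        # dim E = m, tensor power k
N = 60             # number of random sample tensors
powers = []
for _ in range(N):
    a = rng.standard_normal(m)
    t = a
    for _ in range(k - 1):
        t = np.tensordot(t, a, axes=0)  # build a (x) a (x) ... (x) a
    powers.append(t.ravel())
rank = np.linalg.matrix_rank(np.array(powers))
# The symmetric tensors in E^{(x)k} have dimension C(m+k-1, k).
assert rank == comb(m + k - 1, k)
```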

Theorem 11.1.2 (Double Commutant Theorem). Let H be a group and let λ : H → GL(W) be a completely reducible representation of H. Let S be the linear span of λ(H); this is a subalgebra of End(W). Set

R := {a ∈ End(W) | as = sa for all s ∈ S}.

Then, conversely, we have

S = {a ∈ End(W) | ar = ra for all r ∈ R}.

Proof. Let w_1, …, w_p be a basis of W. Since W is a completely reducible H-module, so is W^p, the Cartesian product of p copies of W. Let M be the submodule of W^p generated by (w_1, …, w_p); it consists of all elements of the form (sw_1, …, sw_p) where s runs through S. As W^p is completely reducible, M has a direct complement U that is stable under H, and the map φ : W^p → W^p, m + u ↦ m, m ∈ M, u ∈ U, is H-equivariant. This means that each component φ_ij, obtained by precomposing φ with the inclusion of the j-th copy


of W and postcomposing with the projection onto the i-th copy of W, is an element of R.

Now let a ∈ End(W) be an element that commutes with all elements of R. Then, since all φ_ij are elements of R, the p-tuple (a, …, a) : W^p → W^p commutes with φ. Hence

φ(aw_1, …, aw_p) = (a, …, a)φ(w_1, …, w_p) = (aw_1, …, aw_p),

where the last equality follows from (w_1, …, w_p) ∈ M, so that φ leaves it invariant. The left-hand side is an element of im φ = M, hence equal to (sw_1, …, sw_p) for some s ∈ S. As s and a agree on the basis w_1, …, w_p, we have s = a.

11.2 The First Fundamental Theorem for GL_n

One of the fundamental contributions of the German mathematician Hermann Weyl to the field of invariant theory concerns the invariants of GL_n on "p covectors and q vectors". To set the stage, set G := GL_n and let V = C^n denote the standard G-module. Consider the G-module U := (V^*)^p ⊕ V^q, i.e., the direct sum of p copies of the dual V^* and q copies of V. We want to determine C[U]^G, that is, the algebra of G-invariant regular functions on U. Now for every i = 1, …, p and j = 1, …, q the function

φ_ij : U → C, (f_1, …, f_p, v_1, …, v_q) ↦ f_i(v_j)

is G-invariant by definition of the G-action on V^*: (gf_i)(gv_j) = f_i(v_j).

Theorem 11.2.1 (Weyl's First Fundamental Theorem (FFT) for GL_n). The algebra C[U]^G is generated by the functions φ_ij.

Before proving this theorem we will give a geometric interpretation. In terms of matrices U is isomorphic to M_{p,n} × M_{n,q}, where the action of g ∈ G is given by g(A, B) = (Ag^{-1}, gB); the rows of A are the p covectors and the columns of B the q vectors. Clearly the product A·B equals the product Ag^{-1}·gB, and therefore the compositions of the p·q entry functions on M_{p,q} with the product map M_{p,n} × M_{n,q} → M_{p,q} are invariant regular functions. The theorem implies that they generate the algebra of invariants on M_{p,n} × M_{n,q}, so that the product map M_{p,n} × M_{n,q} → M_{p,q} is a quotient map in the sense of Chapter 6.
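The invariance of the product map is a one-line matrix computation; a numerical illustration (sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, q = 3, 2, 4
A = rng.standard_normal((p, n))
B = rng.standard_normal((n, q))
g = rng.standard_normal((n, n))
assert abs(np.linalg.det(g)) > 1e-9  # g lies in GL_n (almost surely)
# The action g.(A, B) = (A g^{-1}, g B) leaves the product A B unchanged.
assert np.allclose((A @ np.linalg.inv(g)) @ (g @ B), A @ B)
```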

Proof. The theorem is a relatively easy consequence of Schur-Weyl duality and the complete reducibility of rational representations of GL_n. However, you need a "tensor flexible mind" to follow the proof; here it goes. We want to determine the G-invariant homogeneous polynomials of degree d on U. Those are the elements of (S^d U^*)^G. By complete reducibility, the map ((U^*)^{⊗d})^G → (S^d U^*)^G is surjective, so we concentrate on the invariant tensors in (U^*)^{⊗d} = (V^p ⊕ (V^*)^q)^{⊗d}. Denote the p copies of V by U_1, …, U_p and the q copies of V^* by U_{1′}, …, U_{q′}. The tensor product (V^p ⊕ (V^*)^q)^{⊗d} decomposes, as a G-module, into the direct sum of tensor products U_{i_1} ⊗ ⋯ ⊗ U_{i_d} over all d-tuples (i_1, …, i_d) ∈


{1, …, p, 1′, …, q′}^d. Each such product is isomorphic to V^{⊗k} ⊗ (V^*)^{⊗(d-k)} for some k. Now by Exercise 11.2.2 below the space of G-invariants in this space is zero if k ≠ d - k. If d = 2k, on the other hand, then this space is just End(V)^{⊗k}, and Schur-Weyl duality tells us that the space of G-invariant elements is spanned by the image of the symmetric group.

Finally, we need to reinterpret these invariant tensors in terms of the φ_ij. This is best done by an example: suppose that p = 2, q = 3, d = 4, i_1 = i_2 = 1, i_3 = 1′, and i_4 = 3′. Then U_1 ⊗ U_1 ⊗ U_{1′} ⊗ U_{3′} = V ⊗ V ⊗ V^* ⊗ V^*. Now the elements of S_2 correspond to the two ways of pairing the copies of V with the copies of V^*, and both invariant tensors project to the polynomial function φ_{11}φ_{13}. On the other hand, if we take i_1 = 1, i_2 = 2, and leave the remaining values unchanged, then we obtain exactly the two polynomial functions φ_{11}φ_{23} and φ_{13}φ_{21}.

Exercise 11.2.2. Show that V^{⊗k} ⊗ (V^*)^{⊗l} has no non-zero G-invariant elements unless k = l.

Exercise 11.2.3. By the proof above, the space [V^{⊗k} ⊗ (V^*)^{⊗k}]^{GL(V)} has dimension at most k!. Show that for k fixed and n sufficiently large this dimension is, indeed, k!.

Remark 11.2.4. Weyl's Second Fundamental Theorem for GL_n describes the ideal of relations among the functions φ_ij. The image of the quotient map consists of all p × q-matrices of the form AB with A ∈ M_{p,n} and B ∈ M_{n,q}. If n ≥ min{p, q}, then all p × q-matrices are of this form, and the φ_ij are algebraically independent. If n < p, q, then only the matrices of rank at most n are of the form AB. In the latter case the (n+1) × (n+1)-subdeterminants (or minors) of the matrix (φ_ij)_ij are elements of the ideal. Weyl's theorem says that these minors generate the ideal.
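That the minors lie in the ideal can be illustrated numerically: for n < p, q every product AB has rank at most n, so all of its (n+1) × (n+1)-minors vanish. A sketch:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n, p, q = 2, 4, 4
A = rng.standard_normal((p, n))
B = rng.standard_normal((n, q))
M = A @ B  # a p x q matrix of rank at most n = 2
for rows in combinations(range(p), n + 1):
    for cols in combinations(range(q), n + 1):
        minor = np.linalg.det(M[np.ix_(rows, cols)])
        assert abs(minor) < 1e-9  # every (n+1) x (n+1) minor vanishes
```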

11.3 First Fundamental Theorem for O_n

For applications to graph invariants we will need an analogue of Theorem 11.2.1 for the orthogonal group. So consider the defining O_n-module V = C^n, and let (.|.) be the standard symmetric bilinear form on V defined by (v|w) = Σ_{i=1}^n v_i w_i. By definition we have (gv|gw) = (v|w) for all v, w ∈ V and g ∈ G := O_n. Now it is unnecessary to distinguish between vectors and covectors, because the map v ↦ (.|v) is an isomorphism of G-modules from V to V^*. Hence consider the G-module U := V^p. We want to know the G-invariant polynomials on U. For all i, j = 1, …, p define ψ_ij : U → C by

ψ_ij(v_1, …, v_p) = (v_i|v_j).

This is a G-invariant function.
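The invariance of the ψ_ij amounts to g^T g = I; a quick numerical check with a random real orthogonal matrix obtained from a QR factorisation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 3, 4
# A random real orthogonal matrix via QR factorisation.
g, _ = np.linalg.qr(rng.standard_normal((n, n)))
assert np.allclose(g.T @ g, np.eye(n))
vs = rng.standard_normal((p, n))    # p vectors in R^n, one per row
psi = vs @ vs.T                     # psi[i, j] = (v_i | v_j)
psi_g = (vs @ g.T) @ (vs @ g.T).T   # the same after replacing each v_i by g v_i
assert np.allclose(psi, psi_g)
```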

Theorem 11.3.1 (FFT for O_n, polynomial version). The functions ψ_ij, 1 ≤ i ≤ j ≤ p, generate the invariant ring C[U]^{O_n}.


Remark 11.3.2. Geometrically, this means that the map sending a matrix A ∈ M_{n,p} to A^T A is the quotient map for the action of O_n on M_{n,p} by multiplication from the left. The image consists of all symmetric p × p-matrices of rank at most n. Weyl's Second Fundamental Theorem for O_n states that the ideal of this image is generated by the (n+1) × (n+1)-minors, in addition to the equations expressing that the matrix be symmetric.

As in the case of GL_n, this first fundamental theorem follows from the following tensor variant. Let β ∈ V ⊗ V be the element corresponding to the identity under the O_n-isomorphism V ⊗ V → V ⊗ V^* given by the bilinear form (.|.); equivalently, β equals Σ_{i=1}^n e_i ⊗ e_i for any orthonormal basis e_1, …, e_n of V. Then β^{⊗k} ∈ V^{⊗2k} is an O_n-invariant tensor. Moreover, for any permutation π ∈ S_{2k} acting on the 2k factors in V^{⊗2k}, the tensor πβ^{⊗k} is O_n-invariant, as well.

Theorem 11.3.3 (FFT for O_n, tensor version). Let d be a natural number. The space (V^{⊗d})^{O_n} is trivial if d is odd, and equal to the span of all tensors of the form πβ^{⊗k}, π ∈ S_{2k}, if d = 2k.

Exercise 11.3.4. Prove the polynomial version of the FFT for O_n, assuming the tensor version.

The tensor version, in turn, follows from a double commutant argument, where the so-called Brauer algebra plays the role of the algebra of linear maps from V^{⊗k} to itself commuting with all elements of O_n. The arguments are slightly technical, and we do not treat them here.

Exercise 11.3.5. Prove directly, without using the FFT, that C[V]^{O_n} does not contain non-zero polynomials of odd degree.

Exercise 11.3.6. Let m, n be natural numbers. Prove that the dimension of ((S^2(C^n))^{⊗m})^{O_n} is at most the number of undirected graphs on the vertices 1, …, m in which every vertex has valency 2, i.e., is incident to exactly 2 edges (here a loop on a vertex is allowed and counts as valency 2), and that equality holds for n sufficiently large.

Exercise 11.3.7. In the context of this exercise, a graph is a pair G = (V, E) where V ("vertices") is a set and E ("edges") is a multi-set of unordered pairs {u, v} with u, v ∈ V. Here u is allowed to equal v ("loops are allowed"), and two vertices may be connected by multiple edges (whence the term "multi-set"). Isomorphisms of such graphs are defined in the obvious manner. The graph G is said to be k-regular if for every vertex v the number of edges containing v is k; here the loops from v to itself count twice. See Figure 11.1 for an example.

Let m, n be any natural numbers. Prove that the number of isomorphism classes of k-regular graphs on m vertices is an upper bound for the dimension of the space (S^m(S^k C^n))^{O_n} of O_n-invariants in S^m(S^k C^n).


Figure 11.1: A 3-regular graph on 4 vertices.


Chapter 12

Phylogenetic tree models

12.1 Introduction

In phylogenetics one tries to reconstruct a phylogenetic tree from genetic data of n species alive today. These data take the form of strings over the alphabet {A, C, G, T} of nucleotides. So the input may look like the one in Table 12.1, and on the basis of this input one wants to decide between the three trees in Figure 12.1.

Remark 12.1.1. In fact, the DNA-strings of the n species may have different lengths due to insertions and deletions. We assume that the data have been preprocessed (aligned is the technical term) so that we can compare the nucleotides at individual sites, i.e., so that it makes sense to compare the entries within each column.

There are many approaches to this reconstruction problem. We will treat one of them, which involves a statistical model and its analysis using invariant theory. In Section 12.2 we describe the statistical model, in Section 12.3 we translate the model into tensor calculus, and in Section 12.4 we describe the equations of the model.

12.2 The statistical model

Definition 12.2.1. A tree is a finite, undirected graph without loops, without multiple edges, and without cycles. A vertex incident with exactly one edge

Human   A A C G A C G A T T A
Chimp   A A T G A G G A C T A
Gorilla A G T G A A G A C T G

Table 12.1: Fictional strings of DNA for three species.


Figure 12.1: Possible trees for three species (leaves labelled H, C, G).

of T is called a leaf. Other vertices are called internal vertices. A rooted tree is a pair (T, r) of a tree and a specified vertex r of T, called the root. We will sometimes consider a rooted tree (T, r) as a directed graph, in which we direct all edges away from the root.

Fix a natural number n and a rooted tree T with n leaves. Fix a finite set B, called the alphabet; B will typically consist of the letters A, C, G, T, or, in simplified binary models, of the digits 0 and 1. For every directed edge p → q in T fix a stochastic transition matrix A_qp with rows and columns labelled by B; stochastic means that A_qp has non-negative real entries and column sums equal to 1. The tuple A := (A_qp)_{p→q} gives rise to a probability distribution on B^{leaf(T)} as follows: the probability of f ∈ B^{leaf(T)} equals

P_{T,A}(f) := (1/|B|) Σ_{h ∈ B^{vertex(T)} : h|_{leaf(T)} = f} Π_{p→q} A_qp(h(q), h(p)).

Here the sum is over all possible histories, i.e., assignments of letters of the alphabet to the vertices of T that coincide with f on the leaves.

The bio-statistical interpretation is as follows. Every vertex of T corresponds to a species, but only the leaves correspond to species alive today. The root corresponds to a distant common ancestor, in whose DNA all elements of the alphabet B are equally likely, whence the factor 1/|B|.

Remark 12.2.2. There are variations of this set-up in which the root distribution is allowed to vary, as well.

From the root the nucleotides start to mutate, and the fundamental assumption is that they do so independently and according to the same statistical process. This process is modelled as follows: if a nucleotide equals b ∈ B in the species corresponding to vertex p, and if p → q is an arrow in T, then the nucleotide mutates to b′ ∈ B in species q with probability A_qp(b′, b). This process yields an n-tuple of elements of B, corresponding to a column in the genetic data. Repeating the process N times leads to a leaf(T) × N = n × N-array of elements of B, of which the columns are independent draws from the probability distribution P_{T,A}.

Conversely, given a leaf(T) × N-array of elements of B, we want to test whether it is likely that the array corresponds to the tree T. To this end we summarise the array in a so-called empirical distribution on B^{leaf(T)}.


Example 12.2.3. In Table 12.1 we would have P(AAA) = 3/11, P(AAG) = 2/11, P(CTT) = 1/11, etc.
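The empirical distribution of Example 12.2.3 can be recomputed directly from Table 12.1 by counting columns:

```python
from collections import Counter
from fractions import Fraction

# The three aligned strings from Table 12.1.
human   = "AACGACGATTA"
chimp   = "AATGAGGACTA"
gorilla = "AGTGAAGACTG"

columns = list(zip(human, chimp, gorilla))  # one triple per site
counts = Counter(columns)
N = len(columns)                            # 11 sites
P = {col: Fraction(c, N) for col, c in counts.items()}
assert P[("A", "A", "A")] == Fraction(3, 11)
assert P[("A", "A", "G")] == Fraction(2, 11)
assert P[("C", "T", "T")] == Fraction(1, 11)
```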

The central question is: given the empirical distribution and the rooted tree (T, r), is this empirical distribution "equal to" P_{T,A} for some tuple A of transition matrices? Here the rooted tree T is assumed to be fixed, but the parameter A is allowed to vary. Equality should not be taken literally: "close to" in some suitable sense suffices.

There are two fundamental approaches to this question. First, one can try to find a tuple A that maximises the likelihood of the observed data and compare the empirical distribution with P_{T,A}. Second, and for this lecture more importantly, one can try to find equations that the |B|^n coordinates of the probability distribution P_{T,A} must satisfy regardless of the parameter A, and test these equations on the empirical distribution. This leads to the following fundamental problem: describe all polynomials in |B|^n variables that vanish on all distributions P_{T,A}; here T is fixed but A varies. It is this problem that we address in Section 12.4.

12.3 A tensor description

We change the discussion above into convenient tensor language. Let V be the vector space V := C^B consisting of formal linear combinations of the alphabet B. We equip the space V with the symmetric bilinear form given by (u|v) := Σ_b u_b v_b, where u_b, v_b are the coefficients of b in u, v, respectively. This bilinear form allows us to identify V with V^* and V ⊗ V with V ⊗ V^* = End(V), etc.

Let T be a tree; in the description here the root is irrelevant. Associate to every vertex p a copy V_p of V. A representation of T is by definition a tuple A = (A_qp)_{q∼p} consisting of a tensor A_qp ∈ V_q ⊗ V_p for every ordered pair (q, p) of vertices that are connected in T, subject to the condition that A_pq = A_qp^T, where the transpose denotes the isomorphism V_q ⊗ V_p → V_p ⊗ V_q. The set of all representations of T is denoted rep(T).

Next we define a polynomial map Ψ_T : rep(T) → ⊗_{p∈leaf(T)} V_p as follows: A ∈ rep(T) is first mapped to the tensor product of all its components A_pq, where we take only one tensor per edge (so not both A_pq and A_qp). The product is a tensor that lives in the space

⊗_{p∈vertex(T)} V_p^{⊗{q : q∼p}}.

Next we pair this tensor with the tensor

⊗_{p∈internal(T)} Σ_{b∈B} b^{⊗{q : q∼p}}

using the bilinear form. The result is a tensor Ψ_T(A) that lives in the space ⊗_{p∈leaf(T)} V_p.


Ψ_T(P, Q, R, S, U, W) = Σ_{a,b,c,d∈B} (Σ_{e,f,g∈B} P_ef Q_ae R_be S_gf W_cg U_dg) a ⊗ b ⊗ c ⊗ d

Figure 12.2: A tree T with a representation (edge tensors P, Q, R, S, W, U; internal indices e, f, g; leaf indices a, b, c, d) and its image.
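The coefficient formula in Figure 12.2 is a single tensor contraction over the internal indices e, f, g, which can be written as an einsum (the index conventions follow the displayed formula; the random tensors are placeholders for an actual representation):

```python
import numpy as np

rng = np.random.default_rng(6)
b = 2  # |B|
P, Q, R, S, W, U = (rng.standard_normal((b, b)) for _ in range(6))

# Psi_T as one contraction: sum over internal indices e, f, g,
# leaving the leaf indices a, b, c, d free.
psi = np.einsum("ef,ae,be,gf,cg,dg->abcd", P, Q, R, S, W, U)

# Spot-check one coefficient against the explicit triple sum.
a_, b_, c_, d_ = 0, 1, 0, 1
coeff = sum(P[e, f] * Q[a_, e] * R[b_, e] * S[g, f] * W[c_, g] * U[d_, g]
            for e in range(b) for f in range(b) for g in range(b))
assert np.isclose(psi[a_, b_, c_, d_], coeff)
```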

Remark 12.3.1. The pairing that we use is defined as follows. If U_i, i ∈ I, is a finite collection of vector spaces, and if J ⊆ I, then there is a natural bilinear map

(⊗_{i∈I} U_i) × (⊗_{j∈J} U_j^*) → ⊗_{i∈I∖J} U_i

which on pure tensors maps (⊗_{i∈I} u_i, ⊗_{j∈J} f_j) to Π_{j∈J} f_j(u_j) · ⊗_{i∈I∖J} u_i. Above we in addition identify V_p with V_p^* by the bilinear form (.|.).

Example 12.3.2. Figure 12.2 depicts a tree T with a representation. Here the tensor P equals Σ_{e,f∈B} P_ef e ⊗ f, where e is thought of as lying in the space corresponding to the lower vertex and f as lying in the space corresponding to the upper vertex.

Note the resemblance with the statistical model: if we restrict ourselves to certain real (and "stochastic") tensors A_qp, then the coefficient of ⊗_{p∈leaf(T)} f(p) in Ψ_T(A) is exactly |B| times P_{T,A}(f) for any f ∈ B^{leaf(T)}. It can be shown that if one is interested in finding all polynomial equations satisfied by these probability distributions, one may forget the realness conditions and, to a large extent, also the stochastic conditions. We are therefore led to the following central question.

What is the ideal of the variety X_T ⊆ ⊗_{p∈leaf(T)} V_p, where X_T denotes the Zariski closure of {Ψ_T(A) | A ∈ rep(T)}?

Exercise 12.3.3. Prove that X_T is stable under Π_{p∈leaf(T)} GL(V_p).

To answer this question, we prove two fundamental lemmas.

Lemma 12.3.4. Let T be the tree obtained from T′ by subdividing an edge q ∼ p into two edges q ∼ r and r ∼ p. Then X_T = X_{T′}.


Figure 12.3: Cutting a tree T into two parts T_q and T_p.

Proof. A representation A of T′ can be made into a representation A′ of T by setting

A′_st := A_st if at least one of s, t is not in {p, q, r}; A′_st := A_qp if s = q, t = r; and A′_st := Σ_{b∈B} b ⊗ b (= I) if s = r, t = p,

and one readily verifies that Ψ_T(A′) = Ψ_{T′}(A). Conversely, a representation A′ of T can be made into a representation A of T′ by setting A_qp := A′_qr A′_rp, where we have identified A′_qr with a map from V_r to V_q and A′_rp with a map from V_p to V_r. Again, we have Ψ_{T′}(A) = Ψ_T(A′).
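For the simplest tree, a single edge q ∼ p with both endpoints leaves, Ψ_T(A) is just the tensor A_qp, and the two constructions in the proof reduce to matrix multiplication: subdividing inserts the identity tensor Σ_b b ⊗ b on the new edge, and contracting over the new vertex r multiplies the two edge tensors. A minimal numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
b = 3
Aqp = rng.standard_normal((b, b))  # tensor on the edge q ~ p

# Subdivide q ~ p into q ~ r ~ p: put A_qp on q ~ r and the identity
# tensor sum_b b (x) b (i.e. I) on r ~ p; contracting over r gives A_qp back.
Aqr, Arp = Aqp, np.eye(b)
assert np.allclose(Aqr @ Arp, Aqp)

# Conversely, any representation of the subdivided tree contracts back
# to a single tensor on q ~ p via the matrix product A_qr A_rp.
Aqr2, Arp2 = rng.standard_normal((b, b)), rng.standard_normal((b, b))
Aqp2 = Aqr2 @ Arp2
assert Aqp2.shape == (b, b)
```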

The next lemma expresses X_T in terms of certain sub-trees. Let r be a vertex of T with two neighbours q and p. Deleting r and its edges from T yields two connected components; see Figure 12.3. Re-attaching r to q and to p in these components yields trees T_q and T_p, respectively, both having r as a leaf. Identify X_{T_q} ⊆ ⊗_{s∈leaf(T_q)} V_s with a variety of linear maps V_r → ⊗_{s∈leaf(T_q)∖{r}} V_s by means of the bilinear form, and X_{T_p} with a variety of linear maps ⊗_{s∈leaf(T_p)∖{r}} V_s → V_r.

Lemma 12.3.5. Under this identification we have X_T = X_{T_q} · X_{T_p}.

Proof. It is most instructive to verify this in Example 12.3.2.

Now by Exercise 12.3.3 the variety X_{T_q} of linear maps V_r → ⊗_{s∈leaf(T_q)∖{r}} V_s is closed under precomposing with elements from GL(V_r), and similarly, the variety X_{T_p} of linear maps ⊗_{s∈leaf(T_p)∖{r}} V_s → V_r is closed under postcomposing with elements from GL(V_r). We are therefore in the situation of the following section.


78 CHAPTER 12. PHYLOGENETIC TREE MODELS

12.4 Equations of the model

We start with a fundamental proposition. Let k, l, m be natural numbers, and for subvarieties V ⊆ Mkl and W ⊆ Mlm define

V · W := {CD ∈ Mkm | C ∈ V, D ∈ W}‾,

where the bar denotes Zariski closure.

Proposition 12.4.1. Assume that V is stable under the action of GLl by multiplication from the right and that W is stable under the action of GLl by multiplication from the left. Then

I(V · W) = I(V · Mlm) + I(Mkl · W).

Proof. The inclusion ⊇ is clear: if a polynomial vanishes on V · Mlm, then it certainly vanishes on the subset V · W, and similarly for I(Mkl · W). For the inclusion ⊆ let f ∈ C[Mkm] be a function that vanishes on V · W, and define h : Mkl × Mlm → C by

h(C, D) := f(CD).

This polynomial has the following properties:

1. h vanishes on V × W ⊆ Mkl × Mlm; and

2. h(Cg⁻¹, gD) = h(C, D) for all g ∈ GLl.

By the first property h can be written as h1 + h2, where h1 ∈ I(V × Mlm) and h2 ∈ I(Mkl × W) (see Exercise ?? below). The second property expresses that h is GLl-invariant. Since Mkl × W and V × Mlm are GLl-stable subvarieties of Mkl × Mlm, their ideals are also GLl-stable, and therefore stable under the Reynolds operator for GLl. (For the existence of a Reynolds operator we use that GLl acts completely reducibly in rational representations; see Chapter ??.) Hence when applying the Reynolds operator to both sides of h = h1 + h2 the polynomial h remains unchanged, and h1, h2 are projected onto GLl-invariant elements of I(V × Mlm) and of I(Mkl × W), respectively; we may therefore assume from the beginning that h1 and h2 are GLl-invariant. Then, by the FFT for GLl, h1 and h2 factor through the product map Mkl × Mlm → Mkm, that is, there exist h̃1, h̃2 ∈ C[Mkm] such that hi(C, D) = h̃i(CD). Since h1 vanishes on V × Mlm, we have h̃1 ∈ I(V · Mlm); and since h2 vanishes on Mkl × W, we have h̃2 ∈ I(Mkl · W). Moreover, we have

f(CD) = h(C, D) = h1(C, D) + h2(C, D) = h̃1(CD) + h̃2(CD)

for all C ∈ Mkl and D ∈ Mlm. Therefore

f′ := f − h̃1 − h̃2

vanishes on all matrices that factorise as CD with C ∈ Mkl, D ∈ Mlm. In particular, f′ lies in I(V · Mlm) (and, for that matter, also in I(Mkl · W)). Hence f = h̃1 + h̃2 + f′ ∈ I(V · Mlm) + I(Mkl · W), as claimed.



Lemma 12.4.2. Let V ⊆ Cs, W ⊆ Ct be affine varieties. Then the Cartesian product V × W ⊆ Cs+t is an affine variety, and its ideal satisfies

I(V ×W ) = I(V × Ct) + I(Cs ×W ).

Proof. Denote by x = (x1, . . . , xs) the coordinates on Cs and by y = (y1, . . . , yt) the coordinates on Ct. Then the bilinear product map C[x] × C[y] → C[x, y], (f, g) ↦ fg gives rise to a linear isomorphism C[x] ⊗ C[y] → C[x, y]. Now I(V × Ct) is just the image of I(V) ⊗ C[y] under this isomorphism, and similarly for I(Cs × W) (check this). We will identify C[x] ⊗ C[y] with C[x, y] in what follows.

Every element of the right-hand side vanishes on V × W. Conversely, let (a, b) ∈ Cs+t be a point that does not lie on V × W; without loss of generality, assume that a ∉ V. Then there exists a polynomial f ∈ I(V) that does not vanish at a. Then f ⊗ 1 ∈ C[x, y] does not vanish at (a, b). This proves that the set of points in Cs+t on which all elements of the right-hand side vanish is exactly V × W; hence this set is an affine variety.

Now let P be a vector space complement of I(V) in C[x] and let Q be a vector space complement of I(W) in C[y]. Then the restriction map C[x] → {functions on V}, f ↦ f|V is injective on P, and similarly for Q and W. The ideal I(V × W) is by definition the kernel of the restriction map

π : C[x]⊗ C[y]→ {functions on V ×W}.

The left-hand side splits as a direct sum

(I(V )⊗ I(W ))⊕ (I(V )⊗Q)⊕ (P ⊗ I(W ))⊕ (P ⊗Q),

and the first three summands lie in the kernel of π. To prove that they span the kernel it suffices to prove that π is injective on P ⊗ Q. Consider, therefore, an element f of the latter space such that π(f) = 0. We can write f as ∑_{i=1}^k pi ⊗ qi, where q1, . . . , qk are linearly independent elements of Q. For all v ∈ V the function

∑_{i=1}^k pi(v) qi ∈ Q

vanishes identically on W, and is therefore the zero element of C[y] by definition of Q. As the qi are linearly independent, this means that all the pi ∈ P vanish identically on V. Hence they are zero in C[x], and f = 0, as claimed.
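The equality of Lemma 12.4.2 can be sanity-checked with sympy in a tiny example of my own choosing (the varieties V = {x² = 1} ⊂ C and W = {y³ = 1} ⊂ C are hypothetical, not from the text): a polynomial vanishing on V × W should reduce to zero modulo generators of I(V × C) + I(C × W).

```python
from sympy import symbols, reduced

x, y = symbols('x y')

# Hypothetical example: V = {x^2 = 1} in C and W = {y^3 = 1} in C, so that
# I(V x C) = (x^2 - 1) and I(C x W) = (y^3 - 1) inside C[x, y].
gens = [x**2 - 1, y**3 - 1]

# f vanishes on V x W: at points with x^2 = 1 and y^3 = 1 we get 1*1 - 1 = 0.
f = x**2 * y**3 - 1

# The generators have coprime leading monomials in disjoint variables, so they
# form a Groebner basis; remainder 0 certifies membership in the ideal sum.
quotients, remainder = reduced(f, gens, x, y)
assert remainder == 0
```

Here the division is exact: f = y³·(x² − 1) + (y³ − 1), exhibiting the decomposition h = h1 + h2 from the proof above in miniature.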

Finally, we need to describe I(V ·Mlm), and, similarly, I(Mkl ·W ).

Exercise 12.4.3. Let V be a variety in Mkl which is stable under GLl. For a function f ∈ I(V) and an arbitrary matrix Q ∈ Mml let φf,Q ∈ C[Mkm] denote the function defined by

φf,Q(P) = f(PQ).

1. Prove that φf,Q ∈ I(V ·Mlm) for all f ∈ I(V ) and Q ∈Mml.



Figure 12.4: A star with three leaves, labelled 1, 2, 3.

2. Let J ⊆ C[Mkm] be the ideal generated by all φf,Q for f ∈ I(V) and Q ∈ Mml. Given generators f1, . . . , fk of I(V), how would you obtain a finite set of polynomials that generate J?

Proposition 12.4.4. If V ⊆ Mkl is a GLl-stable variety, then the ideal I(V · Mlm) ⊆ C[Mkm] is generated by the ideal J of the preceding exercise together with all (l + 1) × (l + 1)-minors of the k × m-matrix of coordinates on Mkm.

We will not prove this proposition. In fact, it follows from a slight generalisation of Proposition 12.4.1 to schemes rather than varieties, together with the Second Fundamental Theorem for GLl. We conclude with the following.

Theorem 12.4.5. If one knows the ideals of XT for all T that are stars (that is, trees having at most one vertex with valency strictly larger than 1), then one can compute the ideals of XT for all T.

The computation roughly goes as follows: if T is a star, we are done. Otherwise, if necessary, subdivide an internal edge of T to create a degree-2 vertex r; by Lemma 12.3.4 this does not change XT. Cut the tree open at r into Tq and Tp. Compute, recursively, the ideals of XTq and XTp, and then use XT = XTq · XTp together with Propositions 12.4.1 and 12.4.4 to find equations for XT.
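The combinatorial part of this recursion can be sketched in Python. This is a hypothetical illustration only: the function names `cut_into_stars`, `is_star`, `component` and the adjacency-dict encoding of trees are my own, and the actual ideal computations are omitted.

```python
import itertools

def is_star(adj):
    # a tree is a star iff at most one vertex has valency strictly larger than 1
    return sum(1 for nbs in adj.values() if len(nbs) > 1) <= 1

def component(adj, start, banned):
    # vertices reachable from `start` without passing through `banned`
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w != banned and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def cut_into_stars(adj, fresh=None):
    # recursively cut the tree `adj` into stars, as described above
    fresh = fresh if fresh is not None else itertools.count()
    if is_star(adj):
        return [adj]
    # find an internal edge q ~ p (both endpoints of valency > 1) and subdivide
    # it with a new degree-2 vertex r (this does not change X_T, Lemma 12.3.4)
    q, p = next((q, p) for q in adj for p in adj[q]
                if len(adj[q]) > 1 and len(adj[p]) > 1)
    r = ('r', next(fresh))
    pieces = []
    # cut at r: each side keeps r as a new leaf (the trees T_q and T_p)
    for s, t in ((q, p), (p, q)):
        side = component(adj, s, banned=t)
        sub = {v: [w for w in adj[v] if w in side] for v in side}
        sub[s] = sub[s] + [r]
        sub[r] = [s]
        pieces += cut_into_stars(sub, fresh)
    return pieces

# a tree with four leaves and a single internal edge a ~ b
tree = {'a': ['1', '2', 'b'], 'b': ['a', '3', '4'],
        '1': ['a'], '2': ['a'], '3': ['b'], '4': ['b']}
pieces = cut_into_stars(tree)
assert len(pieces) == 2 and all(is_star(s) for s in pieces)
```

On the four-leaf example the recursion performs one cut and returns the two three-leaf stars Tq and Tp; on larger binary trees it keeps cutting until only stars remain.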

Exercise 12.4.6 (The binary case). In this exercise B = {0, 1}. Consider a star S with 3 leaves labelled 1, 2, 3; see Figure 12.4.

1. Verify that

XS = {u1 ⊗ u2 ⊗ u3 + v1 ⊗ v2 ⊗ v3 | u1, . . . , v3 ∈ C2} ⊆ C2 ⊗ C2 ⊗ C2.

2. Prove that, in fact, all ω ∈ (C2)⊗3 in a dense open subset are sums of two pure tensors u1 ⊗ u2 ⊗ u3. Hint: think of ω as a linear map from C2 to the space M2 of 2 × 2-matrices over C. Distinguish two cases: the case where the image of this map is 1-dimensional, and the case where the image of this map is 2-dimensional. To settle this latter case, give a non-zero polynomial function f on the space of linear maps C2 → M2(C) such that f(ω) ≠ 0 implies that ω is the sum of two pure tensors (the opposite implication will not hold).
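The generic case of part 2 can be explored numerically. The numpy sketch below is my own construction following the hint: it decomposes a random 2 × 2 × 2 tensor as a sum of two pure tensors (possibly with complex factors) via an eigendecomposition of its two slices, assuming the generic situation where the second slice is invertible and the pencil is diagonalizable.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 2x2x2 tensor; generically it is a sum of two pure tensors over C.
T = rng.standard_normal((2, 2, 2))

# Following the hint: view T as a linear map C^2 -> M_2 via its two slices.
M0, M1 = T[:, :, 0], T[:, :, 1]

# Generic case: M1 invertible. If T = sum_i a_i (x) b_i (x) c_i with
# c_i = (lambda_i, 1), then M_k = A diag(c_{.,k}) B^T, so the columns of A
# are eigenvectors of M0 M1^{-1}, and B^T = A^{-1} M1.
lam, A = np.linalg.eig(M0 @ np.linalg.inv(M1))
Bt = np.linalg.inv(A) @ M1
C = np.stack([lam, np.ones(2)], axis=1)  # row i is the vector c_i

# Reconstruct sum_i a_i (x) b_i (x) c_i and compare with T.
T_hat = np.einsum('ai,ib,ic->abc', A, Bt, C)
assert np.allclose(T_hat, T)
```

When the eigenvalues come out as a complex-conjugate pair, the two pure tensors are complex; this matches the fact that the dense open subset in part 2 lives in the complex tensor space.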



3. The preceding part shows that I(XS) = 0. Fix a general tree T in which all non-leaves have exactly three neighbours. Every edge of T gives rise to a bipartition leaf(T) = L1 ∪ L2 into leaves on one side of the edge and leaves on the other side of the edge. A tensor ω ∈ ⊗_{p∈leaf(T)} Vp can be regarded as a linear map ⊗_{p∈L1} Vp → ⊗_{p∈L2} Vp. The 3 × 3-minors of this linear map are elements of I(XT).

Use the preceding part, Proposition 12.4.4, and induction to prove that these minors generate I(XT).
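The vanishing of these minors can be checked numerically in a small case of my own making: for a sum of two pure tensors in (C2)⊗4, every flattening along a bipartition of the leaves is a 4 × 4-matrix of rank at most 2, so all of its 3 × 3-minors vanish.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# A sum of two pure tensors in (C^2)(x)(C^2)(x)(C^2)(x)(C^2), mimicking a
# point of X_T for the binary model on a hypothetical tree with four leaves.
u = [rng.standard_normal(2) for _ in range(4)]
v = [rng.standard_normal(2) for _ in range(4)]
omega = np.einsum('a,b,c,d->abcd', *u) + np.einsum('a,b,c,d->abcd', *v)

# Each bipartition leaf(T) = L1 u L2 flattens omega into a 4x4-matrix; for a
# sum of two pure tensors this matrix has rank <= 2, so its 3x3-minors vanish.
for L1 in combinations(range(4), 2):
    L2 = tuple(i for i in range(4) if i not in L1)
    flat = np.transpose(omega, L1 + L2).reshape(4, 4)
    assert np.linalg.matrix_rank(flat) <= 2
```

Only the bipartitions coming from edges of T are needed in the exercise, but the rank bound above holds for every bipartition of a rank-two tensor, which is why all these flattening minors lie in I(XT).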