
Page 1: Graph Neural Networks

Graph Neural Networks

Graph Deep Learning 2021 - Lecture 2

Daniele Grattarola

March 1, 2021

Page 2: Roadmap

Roadmap

Things we are going to cover:

• Practical introduction to GNNs

• Message passing

• Advanced GNNs (attention, edge attributes)

• Demo

After the break:

• Spectral graph theory and GNNs

1

Page 3: Recall: what are we doing?

Recall: what are we doing?

Page 4: From CNNs to GNNs

From CNNs to GNNs

[Figure: two 1D signals sampled on a regular grid]

• The receptive field of a CNN reflects the underlying grid structure.

• The CNN has an inductive bias on how to process the individual pixels/timesteps/nodes.

2

Page 5: From CNNs to GNNs

From CNNs to GNNs

• Drop assumptions about underlying structure: it is now an input of the problem.

• The only thing we know: the representation of a node depends on its neighbors.

3

Page 6: Discrete Convolution

Discrete Convolution

[1 2 3 4 5 6] * [1 0 1] = [4 6 8 10]

Discrete convolution:

$(f \star g)[n] = \sum_{m=-M}^{M} f[n-m]\, g[m]$

Problems:

• Variable degree of nodes

• Orientation

4
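The toy example above can be checked numerically. A minimal NumPy sketch (the kernel [1, 0, 1] is symmetric, so flipping it for the convolution makes no difference):

    import numpy as np

    f = np.array([1, 2, 3, 4, 5, 6])   # signal
    g = np.array([1, 0, 1])            # kernel
    # "valid" keeps only positions where the kernel fully overlaps the signal.
    print(np.convolve(f, g, mode="valid"))  # [ 4  6  8 10]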

Page 7: Notation recap

Notation recap

• Graph: nodes connected by edges;

• $X = [x_1, \dots, x_N]$, $x_i \in \mathbb{R}^F$, node attributes or “graph signal”;

• $e_{ij} \in \mathbb{R}^S$, edge attribute for edge $i \to j$;

• $A$, $N \times N$ adjacency matrix;

• $D = \mathrm{diag}([d_1, \dots, d_N])$, diagonal degree matrix;

• $L = D - A$, Laplacian;

• $A_n = D^{-1/2} A D^{-1/2}$, normalized adjacency matrix;

• Reference operator $R$: $r_{ij} \neq 0$ if $\exists\, i \to j$.

NOTE: for undirected graphs, all of these matrices are symmetric.

[Figure: sparsity pattern of a reference operator (left); example graph with nodes $x_1, \dots, x_4$ and edge attributes $e_{12}, e_{13}, e_{14}$ (right)]

5
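As a concrete illustration, here is a minimal NumPy sketch that builds these matrices for a small star graph (the 4-node graph with node 1 connected to nodes 2, 3, 4 is an assumption matching the figure):

    import numpy as np

    # Undirected star graph: node 0 connected to nodes 1, 2, 3.
    A = np.array([[0, 1, 1, 1],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0]], dtype=float)

    D = np.diag(A.sum(axis=1))                       # diagonal degree matrix
    L = D - A                                        # Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
    A_n = D_inv_sqrt @ A @ D_inv_sqrt                # normalized adjacency

    X = np.random.randn(4, 3)                        # node attributes, F = 3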

Page 8: A quick recipe for a local learnable filter

A quick recipe for a local learnable filter

Applying R to graph signal X is a local action:

$(RX)_i = \sum_{j=1}^{N} r_{ij}\, x_j = \sum_{j \in \mathcal{N}(i)} r_{ij}\, x_j$

Instead of having a different weight for each neighbor, we share weights among nodes in the same neighborhood:

$X' = R X \Theta$

where $\Theta \in \mathbb{R}^{F \times F'}$.

[Figure: example graph with nodes $x_1, \dots, x_4$ and reference weights $r_{12}, r_{13}, r_{14}$]

6

Page 9: Powers of R

Powers of R

Let's consider the effect of applying $R^2$ to X:

$(RRX)_i = \sum_{j \in \mathcal{N}(i)} r_{ij}\, (RX)_j = \sum_{j \in \mathcal{N}(i)} \sum_{k \in \mathcal{N}(j)} r_{ij}\, r_{jk}\, x_k$

Key idea: by applying $R^K$ we read from the $K$-th order neighborhood of a node.

[Figures: sparsity patterns of $R^K$ for $K = 1, \dots, 4$, and the neighborhoods of a node for $K = 0, 1, 2$]

7

Page 10: Polynomials of R

Polynomials of R

To cover all neighbors of order 0 to $K$, we can just take a polynomial with weights $\Theta^{(k)}$:

$X' = \sigma\left(\sum_{k=0}^{K} R^k X \Theta^{(k)}\right)$

[Figure: neighborhoods of order 0, 1, 2 weighted by $\Theta^{(0)}, \Theta^{(1)}, \Theta^{(2)}$]

8
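A minimal NumPy sketch of this layer, with random weights $\Theta^{(k)}$ and an arbitrary symmetric matrix standing in for the reference operator R (forward pass only, no training):

    import numpy as np

    def poly_filter_layer(R, X, thetas, activation=np.tanh):
        """X' = activation(sum_k R^k X Theta^(k)) for k = 0, ..., K."""
        out = np.zeros((X.shape[0], thetas[0].shape[1]))
        RkX = X                            # R^0 X
        for theta in thetas:
            out += RkX @ theta
            RkX = R @ RkX                  # move to the next power of R
        return activation(out)

    N, F, F_out, K = 10, 4, 8, 2
    R = np.random.rand(N, N); R = (R + R.T) / 2    # placeholder reference operator
    X = np.random.randn(N, F)
    thetas = [np.random.randn(F, F_out) * 0.1 for _ in range(K + 1)]
    X_new = poly_filter_layer(R, X, thetas)        # shape (N, F_out)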

Page 11: Chebyshev Polynomials [1]

Chebyshev Polynomials [1]

A recursive definition using Chebyshev polynomials:

$T^{(0)} = I$

$T^{(1)} = \tilde{L}$

$T^{(k)} = 2\, \tilde{L}\, T^{(k-1)} - T^{(k-2)}$

where $\tilde{L} = \frac{2 L_n}{\lambda_{max}} - I$ and $L_n = I - D^{-1/2} A D^{-1/2}$.

Layer: $X' = \sigma\left(\sum_{k=0}^{K} T^{(k)} X \Theta^{(k)}\right)$

[Figure: Chebyshev polynomials $T^{(1)}, \dots, T^{(6)}$ on $[-1, 1]$]

[1] M. Defferrard et al., “Convolutional neural networks on graphs with fast localized spectral filtering,” 2016.

9
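A NumPy sketch of the layer, with the Chebyshev recursion applied directly to $T^{(k)}X$ so that dense powers of $\tilde{L}$ are never formed; $\lambda_{max} = 2$ is assumed for simplicity and the weights are random placeholders:

    import numpy as np

    def cheb_layer(L_n, X, thetas, lmax=2.0, activation=np.tanh):
        """X' = activation(sum_k T^(k) X Theta^(k)) with the Chebyshev recursion."""
        L_tilde = (2.0 / lmax) * L_n - np.eye(L_n.shape[0])
        Tx_prev = X                      # T^(0) X
        Tx_curr = L_tilde @ X            # T^(1) X
        out = Tx_prev @ thetas[0]
        if len(thetas) > 1:
            out += Tx_curr @ thetas[1]
        for theta in thetas[2:]:
            Tx_next = 2 * L_tilde @ Tx_curr - Tx_prev
            out += Tx_next @ theta
            Tx_prev, Tx_curr = Tx_curr, Tx_next
        return activation(out)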

Page 12: Graph Convolutional Networks [2]

Graph Convolutional Networks [2]

Polynomial of order K → K layers of order 1;

Three simplifications:

1. $\lambda_{max} = 2 \ \Rightarrow\ \tilde{L} = \frac{2 L_n}{\lambda_{max}} - I = -D^{-1/2} A D^{-1/2} = -A_n$

2. $K = 1 \ \Rightarrow\ X' = X\Theta^{(0)} - A_n X \Theta^{(1)}$

3. $\Theta = \Theta^{(0)} = -\Theta^{(1)}$

Layer: $X' = \sigma\big((\underbrace{I + A_n}_{\hat{A}})\, X \Theta\big) = \sigma\big(\hat{A} X \Theta\big)$

For stability: $\hat{A} = D^{-1/2}(I + A)D^{-1/2}$

[Figure: example graph with nodes $x_1, \dots, x_4$ and normalized weights $a_{12}, a_{13}, a_{14}$]

[2] T. N. Kipf et al., “Semi-supervised classification with graph convolutional networks,” 2016.

10
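A minimal NumPy sketch of a single GCN layer. Following the renormalization in [2], the degrees here are computed from $I + A$; the weights are random placeholders:

    import numpy as np

    def gcn_layer(A, X, theta, activation=np.tanh):
        """X' = activation(A_hat X Theta), A_hat = D^{-1/2} (I + A) D^{-1/2}."""
        A_tilde = np.eye(A.shape[0]) + A
        d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
        A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
        return activation(A_hat @ X @ theta)

    A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
    X = np.random.randn(4, 3)
    theta = np.random.randn(3, 8) * 0.1
    X_new = gcn_layer(A, X, theta)     # shape (4, 8)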

Page 13: A General Paradigm

A General Paradigm

Page 14: Message Passing Neural Networks [3]

Message Passing Neural Networks [3]

[Figure: message passing on an example graph. Graph: nodes $x_1, \dots, x_4$ with edge attributes $e_{21}, e_{31}, e_{41}$. Messages: $m_{21}, m_{31}, m_{41}$ computed along the edges. Propagation: the messages are aggregated into the updated attribute $x'_1$.]

[3] J. Gilmer et al., “Neural message passing for quantum chemistry,” 2017.

11

Page 15: Message Passing Neural Networks [3]

Message Passing Neural Networks [3]

A general scheme for message-passing networks:

$x'_i = \gamma\Big(x_i,\ \square_{j \in \mathcal{N}(i)}\, \phi(x_i, x_j, e_{ji})\Big)$,

• $\phi$: message function, depends on $x_i$, $x_j$ and possibly the edge attribute $e_{ji}$ (we call the messages $m_{ji}$);

• $\square_{j \in \mathcal{N}(i)}$: aggregation function (sum, average, max, or something else...);

• $\gamma$: update function, the final transformation to obtain new attributes after aggregating messages.

[Figure: node $x'_1$ aggregating messages $m_{21}, m_{31}, m_{41}$ along edges $e_{21}, e_{31}, e_{41}$]

[3] J. Gilmer et al., “Neural message passing for quantum chemistry,” 2017.

12
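A minimal NumPy sketch of one message-passing step, with a linear-plus-tanh message function, sum aggregation, and a concatenate-then-transform update; all of these specific choices for $\phi$, $\square$ and $\gamma$ are illustrative placeholders:

    import numpy as np

    def message_passing(X, E, edges, W_msg, W_upd):
        """One step of x'_i = gamma(x_i, sum_j phi(x_i, x_j, e_ji))."""
        N = X.shape[0]
        agg = np.zeros((N, W_msg.shape[1]))
        for (j, i), e_ji in zip(edges, E):
            m_ji = np.tanh(np.concatenate([X[i], X[j], e_ji]) @ W_msg)  # phi
            agg[i] += m_ji                                              # sum aggregation
        return np.tanh(np.concatenate([X, agg], axis=1) @ W_upd)        # gamma

    N, F, S, F_out = 4, 3, 2, 5
    X = np.random.randn(N, F)
    edges = [(1, 0), (2, 0), (3, 0)]               # (j, i) pairs, meaning j -> i
    E = np.random.randn(len(edges), S)             # edge attributes e_ji
    W_msg = np.random.randn(2 * F + S, F_out) * 0.1
    W_upd = np.random.randn(F + F_out, F_out) * 0.1
    X_new = message_passing(X, E, edges, W_msg, W_upd)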

Page 16: Models

Models

Page 17: Graph Attention Networks [4]

Graph Attention Networks [4]

1. Update the node features: $h_i = \Theta_f x_i$, with $\Theta_f \in \mathbb{R}^{F' \times F}$.

2. Compute attention logits: $e_{ij} = \sigma\big(\theta_a^\top [h_i \,\|\, h_j]\big)$, with $\theta_a \in \mathbb{R}^{2F'}$.¹

3. Normalize with Softmax:

$a_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}$

4. Propagate using the attention coefficients:

$x'_i = \sum_{j \in \mathcal{N}(i)} a_{ij} h_j$

[Figure: example graph with features $h_1, \dots, h_4$ and attention coefficients $a_{21}, a_{31}, a_{41}$]

¹ $\|$ indicates concatenation

[4] P. Velickovic et al., “Graph attention networks,” 2017.

13
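A NumPy sketch of steps 1-4 for a single attention head on a dense adjacency matrix. LeakyReLU is used for $\sigma$ as in [4]; the weights are random placeholders and every node is assumed to have at least one neighbor:

    import numpy as np

    def gat_layer(A, X, Theta_f, theta_a):
        H = X @ Theta_f.T                                      # 1. h_i = Theta_f x_i
        N = A.shape[0]
        logits = np.full((N, N), -np.inf)                      # -inf masks non-neighbors
        for i in range(N):
            for j in range(N):
                if A[i, j] != 0:
                    z = np.concatenate([H[i], H[j]]) @ theta_a
                    logits[i, j] = np.where(z > 0, z, 0.2 * z) # 2. LeakyReLU logits
        alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
        alpha /= alpha.sum(axis=1, keepdims=True)              # 3. softmax over neighbors
        return alpha @ H                                       # 4. x'_i = sum_j a_ij h_j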

Page 18: Edge-Conditioned Convolution [5]

Edge-Conditioned Convolution [5]

Key idea: incorporate edge attributes into the messages.

Consider an MLP $\phi: \mathbb{R}^S \to \mathbb{R}^{F F'}$, called a filter-generating network:

$\Theta^{(ji)} = \mathrm{reshape}(\phi(e_{ji}))$

Use the edge-dependent weights to compute messages:

$x'_i = \Theta^{(i)} x_i + \sum_{j \in \mathcal{N}(i)} \Theta^{(ji)} x_j + b$

[Figure: messages $m_{21}, m_{31}, m_{41}$ conditioned on edge attributes $e_{21}, e_{31}, e_{41}$]

[5] M. Simonovsky et al., “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” 2017.

14
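A NumPy sketch of the idea: a tiny MLP maps each edge attribute to an $F \times F'$ weight matrix, which then multiplies the corresponding neighbor. All weights are random placeholders and a plain Python loop is used for clarity:

    import numpy as np

    def ecc_layer(X, edges, E, W1, W2, Theta_root, b):
        """x'_i = Theta^(i) x_i + sum_j reshape(phi(e_ji)) x_j + b."""
        F, F_out = Theta_root.shape
        out = X @ Theta_root + b                   # self term
        for (j, i), e_ji in zip(edges, E):
            h = np.tanh(e_ji @ W1)                 # filter-generating MLP phi
            Theta_ji = (h @ W2).reshape(F, F_out)  # edge-dependent weights
            out[i] += X[j] @ Theta_ji
        return out

    N, F, S, F_out, hidden = 4, 3, 2, 5, 16
    X = np.random.randn(N, F)
    edges = [(1, 0), (2, 0), (3, 0)]
    E = np.random.randn(len(edges), S)
    W1 = np.random.randn(S, hidden) * 0.1
    W2 = np.random.randn(hidden, F * F_out) * 0.1
    Theta_root = np.random.randn(F, F_out) * 0.1
    b = np.zeros(F_out)
    X_new = ecc_layer(X, edges, E, W1, W2, Theta_root, b)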

Page 19: So which GNN do I use?

So which GNN do I use?

• GCNConv (Kipf & Welling)

• ChebConv (Defferrard et al.)

• GraphSageConv (Hamilton et al.)

• ARMAConv (Bianchi et al.)

• ECCConv (Simonovsky & Komodakis)

• GATConv (Velickovic et al.)

• GCSConv (Bianchi et al.)

• APPNPConv (Klicpera et al.)

• GINConv (Xu et al.)

• DiffusionConv (Li et al.)

• GatedGraphConv (Li et al.)

• AGNNConv (Thekumparampil et al.)

• TAGConv (Du et al.)

• CrystalConv (Xie & Grossman)

• EdgeConv (Wang et al.)

• MessagePassing (Gilmer et al.)

15

Page 20: A Good Recipe [6]

A Good Recipe [6]

Message passing scheme:

• Message: $m_{ji} = \mathrm{PReLU}(\mathrm{BatchNorm}(\Theta x_j + b))$

• Aggregate: $m^{\mathrm{agg}}_i = \sum_{j \in \mathcal{N}(i)} m_{ji}$

• Update: $x'_i = x_i \,\|\, m^{\mathrm{agg}}_i$;

Architecture:

• Pre- and post-process node features using 2-layer MLPs;

• 4-6 message passing steps;

[Diagram: 2-layer MLP → Message Passing × 4 → 2-layer MLP]

[6] J. You et al., “Design space for graph neural networks,” 2020.

16
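A simplified NumPy sketch of one step of this recipe. The BatchNorm is replaced here by a plain standardization over the batch of messages, which is only a stand-in for a proper trainable layer:

    import numpy as np

    def prelu(x, a=0.25):
        return np.where(x > 0, x, a * x)

    def recipe_step(A, X, Theta, b):
        M = X @ Theta + b                                   # linear part of m_ji (depends on x_j only)
        M = (M - M.mean(axis=0)) / (M.std(axis=0) + 1e-5)   # stand-in for BatchNorm
        M = prelu(M)
        M_agg = A @ M                                       # sum over the neighbors of each node
        return np.concatenate([X, M_agg], axis=1)           # update: x' = x || m_agg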

Page 21: How do we use this?

How do we use this?

Node-level learning (e.g., social networks).

Graph-level learning (e.g., molecules).

17

Page 22: GNN libraries

GNN libraries

18

Page 23: Graph Convolution

Graph Convolution

Page 24: Discrete Convolution

Discrete Convolution

[1 2 3 4 5 6] * [1 0 1] = [4 6 8 10]

Recall: CNNs compute a discrete convolution:

$(f \star g)[n] = \sum_{m=-M}^{M} f[n-m]\, g[m] \qquad (1)$

19

Page 25: Convolution Theorem

Convolution Theorem

Given two functions $f$ and $g$, their convolution $f \star g$ can be expressed as:

$f \star g = \mathcal{F}^{-1}\{\mathcal{F}\{f\} \cdot \mathcal{F}\{g\}\} \qquad (2)$

where $\mathcal{F}$ is the Fourier transform and $\mathcal{F}^{-1}$ its inverse.

Can we exploit this property?

20
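A quick NumPy check of the theorem for circular (periodic) convolution, which is the setting where the discrete version holds exactly:

    import numpy as np

    rng = np.random.default_rng(0)
    f = rng.standard_normal(16)
    g = rng.standard_normal(16)

    # Circular convolution computed directly from the definition...
    direct = np.array([sum(f[(n - m) % 16] * g[m] for m in range(16)) for n in range(16)])

    # ...and via the convolution theorem: f * g = IFFT(FFT(f) . FFT(g)).
    spectral = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

    print(np.allclose(direct, spectral))  # True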

Page 26: What is the Fourier transform?

What is the Fourier transform?

Key intuition – we are representing a function in a different basis.

$\mathcal{F}\{f\}[k] = \hat{f}[k] = \sum_{n=0}^{N-1} f[n]\, e^{-i\frac{2\pi}{N}kn}$

$\mathcal{F}^{-1}\{\hat{f}\}[n] = f[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat{f}[k]\, e^{i\frac{2\pi}{N}kn}$

[Figure: a signal written as the sum of sinusoidal components]

21

Page 27: From FT to GFT

From FT to GFT

The eigenvectors of the Laplacian for a path graph can be obtained analytically:

$u_k[n] = \begin{cases} 1 & \text{for } k = 0 \\ e^{i\pi(k+1)n/N} & \text{for odd } k,\ k < N-1 \\ e^{-i\pi k n/N} & \text{for even } k,\ k > 0 \\ \cos(\pi n) & \text{for odd } k,\ k = N-1 \end{cases}$

Looks familiar?

[Figure: Laplacian eigenvectors $u_1$, $u_2$, $u_4$, resembling sinusoids]

22

Page 28: From FT to GFT

From FT to GFT

[Diagram: convolution ↔ Fourier transform ↔ Fourier eigenbasis ↔ Laplacian ↔ path graph]

• Drop the “grid” assumption

• Replace $e^{-i\frac{2\pi}{N}kn}$ with a generic $u_k[n]$:

$\mathcal{F}_G\{f\}[k] = \sum_{n=0}^{N-1} f[n]\, u_k[n]$

• GFT: $\mathcal{F}_G\{f\} = \hat{f} = U^\top f$;

• IGFT: $\mathcal{F}_G^{-1}\{\hat{f}\} = f = U \hat{f}$

23

Page 29: Graph Convolution

Graph Convolution

Recall:

• Convolution theorem: $f \star g = \mathcal{F}^{-1}\{\mathcal{F}\{f\} \cdot \mathcal{F}\{g\}\}$

• Spectral theorem: $L = U \Lambda U^\top = \sum_{i=0}^{N-1} \lambda_i u_i u_i^\top$

Graph signals: $f, g$ → GFT: $U^\top f,\ U^\top g$ → Multiply²: $U^\top f \odot U^\top g$ → IGFT: $U\big(U^\top f \odot U^\top g\big)$

Graph filter: $U\big(U^\top f \odot U^\top g\big) = U \cdot \underbrace{\mathrm{diag}(U^\top g)}_{g(\Lambda)} \cdot U^\top f = \underbrace{U\, g(\Lambda)\, U^\top}_{g(L)}\, f = g(L)\, f$

² $\odot$ indicates element-wise multiplication

24
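A NumPy sketch of the whole pipeline on a small graph: eigendecompose L, take the GFT of two signals, multiply element-wise in the spectral domain, and go back with the IGFT; the result equals $U\,\mathrm{diag}(U^\top g)\,U^\top f$. The 4-node star graph is just an assumed example:

    import numpy as np

    A = np.array([[0, 1, 1, 1],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)              # spectral theorem: L = U Lambda U^T

    f = np.random.randn(4)                  # graph signals
    g = np.random.randn(4)

    f_hat, g_hat = U.T @ f, U.T @ g         # GFT
    filtered = U @ (f_hat * g_hat)          # IGFT of the element-wise product

    # Same result written as a graph filter g(L) f = U diag(U^T g) U^T f.
    print(np.allclose(filtered, U @ np.diag(g_hat) @ U.T @ f))  # True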

Page 30: Spectral GCNs

Spectral GCNs

Page 31: Spectral GCNs

Spectral GCNs

A first idea [7]: the transformation of each individual eigenvalue is learned with a free parameter $\theta_i$.

Problems:

• O(N) parameters;

• not localized in the node domain (the one thing we actually want);

• $U \cdot g(\Lambda) \cdot U^\top$ costs $O(N^2)$;

$g_\theta(\Lambda) = \mathrm{diag}(\theta_0, \theta_1, \dots, \theta_{N-2}, \theta_{N-1})$

[7] J. Bruna et al., “Spectral networks and locally connected networks on graphs,” 2013.

25

Page 32: Spectral GCNs

Spectral GCNs

Better idea [7]:

• Localized in node domain ↔ smooth in spectral domain;

• Learn only a few parameters $\theta_i$;

• Interpolate the other eigenvalues using a smooth cubic spline.

Localized and $O(1)$ parameters, but multiplying by $U$ twice is still expensive.

[Figure: a few learned filter values $\theta_n$ interpolated with a smooth cubic spline]

[7] J. Bruna et al., “Spectral networks and locally connected networks on graphs,” 2013.

26

Page 33: Chebyshev Polynomials [1]

Chebyshev Polynomials [1]

The same recursion is used to filter eigenvalues:

$T^{(0)} = I$

$T^{(1)} = \Lambda$

$T^{(k)} = 2\, \Lambda\, T^{(k-1)} - T^{(k-2)}$

[Figure: Chebyshev polynomials $T^{(1)}, \dots, T^{(6)}$ on $[-1, 1]$]

[1] M. Defferrard et al., “Convolutional neural networks on graphs with fast localized spectral filtering,” 2016.

27

Page 34: References i

References i

[1] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.

[2] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations (ICLR), 2016.

[3] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” arXiv preprint arXiv:1704.01212, 2017.

[4] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.

[5] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.

28

Page 35: References ii

References ii

[6] J. You, R. Ying, and J. Leskovec, “Design space for graph neural networks,” arXiv preprint arXiv:2011.08843, 2020.

[7] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” arXiv preprint arXiv:1312.6203, 2013.

29