
Page 1: Relational Representation Learning

Relational Representation Learning with Graph Neural Networks

Matthias Fey [email protected]

/rusty1s

Page 2: Relational Representation Learning

Roadmap

1. Motivation & Introduction
2. Common Operators
3. Efficient Computation of Graph Neural Networks
4. Relation to the Weisfeiler-Lehman Algorithm
5. Recent Extensions: Hierarchical Models, Deep(er) Models, Unsupervised Models, Generative Models
6. Conclusion


Page 3: Relational Representation Learning

Motivation & Introduction


Page 4: Relational Representation Learning

Motivation

Enable learning on irregularly structured data, e.g., graphs, point clouds, and manifolds:

borrow from/generalize successful concepts on Euclidean domains, e.g., convolution

Build relational inductive biases into the model: enable learning about entities, relations, and rules for composing them

known as either Geometric Deep Learning or Relational Representation Learning

Bronstein et al.: Geometric Deep Learning: Going Beyond Euclidean Data (IEEE Sig. Proc. Mag. 2017)
Battaglia et al.: Relational Inductive Biases, Deep Learning, and Graph Networks (arXiv:1806.01261)

Page 5: Relational Representation Learning

A Non-Exhaustive List of Applications

Visual Scene Understanding
[Figure: scene graph for the caption "a man with a red helmet riding a motorcycle down a countryside dirt road", with entities such as Man, Motorcycle, Helmet, and Road linked by relations like Wear, On, and Down.]

Autonomous Driving

Reasoning in Knowledge Graphs
[Figure: knowledge graph around Mikhail Baryshnikov (:ballet_dancer) with relations educated_at (Vaganova Academy, :university), citizen_of (U.S.A., :country), and awarded (Vilcek prize, :award).]

Point Cloud Analysis

Dynamics in Physical Systems
[Figure: observed dynamics and the corresponding interaction graph.]

Page 6: Relational Representation Learning

A Non-Exhaustive List of Applications


Social Networks
Prediction of Molecular Properties and Drug Discovery
Protein-Protein Interactions
3D Shape Analysis
Human Pose Estimation

Page 7: Relational Representation Learning

Graph Neural Networks (GNNs)

Generalization of the convolution operator to graphs: from discrete to continuous filters, handling dynamic positions and a dynamic number of neighbors.

Fey et al.: SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels (CVPR 2018)

Page 8: Relational Representation Learning

Graph Neural Networks (GNNs)

Given a graph $G = (\mathcal{V}, \mathcal{E})$ with node features $\vec{h}_v^{(t)} \in \mathbb{R}^F$ in layer $t$, GNNs can be expressed as a neighborhood aggregation or message passing scheme:

$\vec{h}_v^{(t)} = \mathrm{COMBINE}^{(t)}\big(\vec{h}_v^{(t-1)}, \vec{m}_v^{(t)}\big)$

$\vec{m}_v^{(t)} = \mathrm{AGGREGATE}^{(t)}\big(\big\{\vec{h}_w^{(t-1)} : w \in \mathcal{N}(v)\big\}\big)$

where $\mathrm{AGGREGATE}^{(t)}$ is permutation-invariant.

Gilmer et al.: Neural Message Passing for Quantum Chemistry (ICML 2017)
Xu et al.: How Powerful are Graph Neural Networks? (ICLR 2019)

Page 9: Relational Representation Learning

Common Operators


Page 10: Relational Representation Learning

Operators for Learning on Graphs


Xu et al.: How Powerful are Graph Neural Networks? (ICLR 2019)
Hamilton et al.: Inductive Representation Learning on Large Graphs (NIPS 2017)
Kipf and Welling: Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
Velickovic et al.: Graph Attention Networks (ICLR 2018)

$\vec{h}_v^{(t)} = \sigma\Big(\Theta^{(t)} \sum_{w \in \mathcal{N}(v) \cup \{v\}} C_{v,w}^{(t-1)} \, \vec{h}_w^{(t-1)}\Big)$

with non-linearity $\sigma$, trainable parameters $\Theta^{(t)}$, normalization coefficients $C_{v,w}^{(t-1)}$, and added self-loops $\{v\}$.

Normalization can be either ...

... static: $C_{v,w}^{(t)} := 1$
... structure-dependent: $C_{v,w}^{(t)} := |\mathcal{N}(v)|^{-1}$
... data-dependent, a.k.a. attention
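As a rough sketch (not the reference implementation), the structure-dependent variant can be written with a dense adjacency matrix; the names graph_conv, adj, and theta are illustrative:

import torch

def graph_conv(x, adj, theta):
    # x: [N, F_in] node features, adj: [N, N] dense adjacency, theta: [F_in, F_out]
    adj = adj + torch.eye(adj.size(0))              # add self-loops
    deg_inv = 1.0 / adj.sum(dim=1, keepdim=True)    # C_{v,w} = |N(v)|^{-1}
    return torch.relu(deg_inv * (adj @ x @ theta))  # aggregate, transform, non-linearity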

Page 11: Relational Representation Learning

Operators for Learning on Point Clouds


Qi et al.: Deep Learning on Point Sets for 3D Classification and Segmentation (CVPR 2017)
Qi et al.: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (NIPS 2017)
Fey et al.: SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels (CVPR 2018)

Neighborhood is typically given by ball query or k-nearest-neighbor search.

PointNet(++):

$\vec{h}_v^{(t)} = \max_{w \in \mathcal{N}(v) \cup \{v\}} \phi_\Theta^{(t)}\big(\vec{h}_w^{(t-1)}, \vec{p}_w - \vec{p}_v\big)$

with a trainable function $\phi_\Theta^{(t)}$, e.g., an MLP, applied to the relative Cartesian coordinates $\vec{p}_w - \vec{p}_v$.

The formulation can easily be extended to handle arbitrary edge features!
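A loose sketch of this operator under stated assumptions: the per-point neighbor index lists come from a k-NN search and include the point itself, and mlp is any trainable torch module mapping F+3 input features to F' output features:

import torch

def pointnet_layer(h, pos, neighbors, mlp):
    # h: [N, F] features, pos: [N, 3] point positions,
    # neighbors: per-point index tensors (assumed to include the point itself)
    out = []
    for v, nbrs in enumerate(neighbors):
        rel = pos[nbrs] - pos[v]                     # relative Cartesian coordinates
        msg = mlp(torch.cat([h[nbrs], rel], dim=1))  # phi(h_w, p_w - p_v)
        out.append(msg.max(dim=0).values)            # max aggregation over neighbors
    return torch.stack(out)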

Page 12: Relational Representation Learning

Efficient Computation of Graph Neural Networks


Page 13: Relational Representation Learning

Efficient Parallelization of GNNs

Fey and Lenssen: Fast Graph Representation Learning with PyTorch Geometric (ICLR-W 2019)

$\mathrm{AGGREGATE}^{(t)}(v) = \square_{w \in \mathcal{N}(v)} \, \phi_\Theta^{(t)}\big(\vec{h}_v^{(t-1)}, \vec{h}_w^{(t-1)}, \vec{e}_{w,v}^{(t-1)}\big)$

$\square$ is a placeholder for either sum, mean, or max.

Edges are stored in COO format, enabling vectorized computation:

1. Gather node information into edge-parallel space
2. Scatter edge information into node-parallel space

[Figure: features $\vec{h}_1, \dots, \vec{h}_4$ are gathered along incoming edges into messages $\phi_\Theta(\vec{h}_1, \vec{h}_w, \vec{e}_{w,1})$, which are scattered back and aggregated into $\vec{h}_1$.]
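A minimal sketch of the two-step gather/scatter pattern in plain PyTorch, assuming sum aggregation and ignoring edge features (PyTorch Geometric itself relies on dedicated scatter kernels); propagate is a hypothetical name:

import torch

def propagate(x, edge_index):
    # x: [N, F] node features; edge_index: [2, E] COO connectivity,
    # row 0 = source nodes w, row 1 = target nodes v
    src, dst = edge_index
    messages = x[src]                 # 1. gather into edge-parallel space
    out = torch.zeros_like(x)
    out.index_add_(0, dst, messages)  # 2. scatter(-sum) into node-parallel space
    return out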

Page 14: Relational Representation Learning

Mini-Batching in GNNs

Fey and Lenssen: Fast Graph Representation Learning with PyTorch Geometric (ICLR-W 2019)

Graphs $\mathcal{G}_1, \mathcal{G}_2, \mathcal{G}_3$ are batched into a single disconnected graph by stacking their node features and arranging their adjacency matrices block-diagonally:

$\mathrm{GNN}(H, A)$ with $H = \begin{bmatrix} H_1 \\ H_2 \\ H_3 \end{bmatrix}$ and block-diagonal $A$

No additional overhead due to the sparse layout! Input examples can have different sizes!
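A hypothetical collate function illustrating the idea for sparse COO inputs; graphs is assumed to be a list of (x, edge_index) pairs:

import torch

def collate(graphs):
    xs, edge_indices, offset = [], [], 0
    for x, edge_index in graphs:
        xs.append(x)
        edge_indices.append(edge_index + offset)  # shift node indices into the batch graph
        offset += x.size(0)
    return torch.cat(xs, dim=0), torch.cat(edge_indices, dim=1)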

Page 15: Relational Representation Learning

PyTorch Geometric: A Fast GNN Library

✓ provides over 25 GNN implementations
✓ access to over 100 benchmark datasets
✓ extensible via a simple message passing API
✓ leverages dedicated CUDA kernels
✓ supports multiple GPUs
✓ thoroughly documented

https://github.com/rusty1s/pytorch_geometric

Fey and Lenssen: Fast Graph Representation Learning with PyTorch Geometric (ICLR-W 2019)

Page 16: Relational Representation Learning

PyTorch Geometric: A Fast GNN Library

Fey and Lenssen: Fast Graph Representation Learning with PyTorch Geometric (ICLR-W 2019)

from torch.nn import Module
from torch.nn.functional import relu, softmax
from torch_geometric.nn import GCNConv

class MyOwnNet(Module):
    def __init__(self, in_channels, out_channels):
        # layer initialization
        super(MyOwnNet, self).__init__()
        self.conv1 = GCNConv(in_channels, 16)
        self.conv2 = GCNConv(16, out_channels)

    def forward(self, x, edge_index):
        # network execution flow
        x = self.conv1(x, edge_index)
        x = relu(x)
        x = self.conv2(x, edge_index)
        x = softmax(x, dim=1)
        return x
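A hypothetical usage example with random data, assuming PyTorch Geometric is installed, just to show the call signature:

import torch

x = torch.randn(4, 8)                     # 4 nodes with 8 features each
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 3]])  # COO connectivity
model = MyOwnNet(in_channels=8, out_channels=2)
out = model(x, edge_index)                # per-node class scores, shape [4, 2]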

Page 17: Relational Representation Learning

PyTorch Geometric: A Fast GNN Library

Fey and Lenssen: Fast Graph Representation Learning with PyTorch Geometric (ICLR-W 2019)

class MyOwnConv(MessagePassing):
    def __init__(self, ...):
        # 'add', 'mean' or 'max' aggregation
        super(MyOwnConv, self).__init__('add')

    def forward(self, x, edge_index):
        # Pass everything needed to construct messages.
        return self.propagate(edge_index, x=x)

    def message(self, x_i, x_j):
        # Construct messages to aggregate; features get automatically
        # mapped to the respective target (x_i) and source (x_j) nodes.
        return x_j  # e.g., plain neighbor features

Page 18: Relational Representation Learning

Relation to the Weisfeiler-Lehman Algorithm


Page 19: Relational Representation Learning

How Powerful are Graph Neural Networks?

WL-Test: a graph isomorphism heuristic via iterative color refinement:

1. Aggregate colors from neighboring nodes
2. Hash aggregated colors into unique new colors

[Figure: initialization.]

Xu et al.: How Powerful are Graph Neural Networks? (ICLR 2019)
Morris et al.: Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks (AAAI 2019)

Page 20: Relational Representation Learning

How Powerful are Graph Neural Networks?

WL-Test: a graph isomorphism heuristic via iterative color refinement:

1. Aggregate colors from neighboring nodes
2. Hash aggregated colors into unique new colors
3. Graphs are non-isomorphic if color histograms differ after $T$ rounds

[Figure: first iteration.]

Xu et al.: How Powerful are Graph Neural Networks? (ICLR 2019)
Morris et al.: Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks (AAAI 2019)
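A compact sketch of the refinement loop, with adj as a hypothetical adjacency dict (node to neighbor list) and colors as a dict of initial node colors:

from collections import Counter

def wl_histogram(adj, colors, T):
    for _ in range(T):
        colors = {v: hash((colors[v], tuple(sorted(colors[w] for w in adj[v]))))
                  for v in adj}
    return Counter(colors.values())  # color histogram after T rounds

# Differing histograms prove non-isomorphism; equal histograms are inconclusive.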

Page 21: Relational Representation Learning

How Powerful are Graph Neural Networks?

GNNs are at most as powerful as the WL-Test in distinguishing graph structures.

Xu et al.: How Powerful are Graph Neural Networks? (ICLR 2019)
Morris et al.: Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks (AAAI 2019)

GNNs are as powerful as the WL-Test iff AGGREGATE and COMBINE are injective.

Use sum instead of mean or max aggregation.

[Figure: example pairs of neighborhoods $v$ vs. $v'$ that mean and/or max aggregation cannot distinguish, but sum can.]
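A tiny illustration of why sum is the more expressive aggregator: the two neighbor multisets below are indistinguishable under mean and max, yet differ under sum:

import torch

a = torch.tensor([1., 2.])          # neighbor multiset {1, 2}
b = torch.tensor([1., 1., 2., 2.])  # neighbor multiset {1, 1, 2, 2}
print(a.mean(), b.mean())  # 1.5 and 1.5 -> mean fails
print(a.max(), b.max())    # 2 and 2     -> max fails
print(a.sum(), b.sum())    # 3 and 6     -> sum distinguishes them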

Page 22: Relational Representation Learning

Recent GNN Extensions


Page 23: Relational Representation Learning

Hierarchical Models

Either via deterministic coarsening methods ...

Voxel Grids (pool_2d generalization)

Graclus (greedy pair matching)
[Figure: graph coarsening hierarchy produced by greedy pair matching.]

Iterative sampling of the most distant points
[Figure: farthest point sampling of a point cloud from 1024 to 512, 256, and 128 points.]

Mesh downsampling via quadric matrices
[Figure: excerpt from Garland and Heckbert, Surface Simplification Using Quadric Error Metrics; the original model with 5,804 faces is approximated with 994, 532, 248, and 64 faces, and features such as horns and hooves persist through many simplifications.]

Defferrard et al.: Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
Simonovsky and Komodakis: Dynamic Edge-Conditioned Filters in CNNs on Graphs (CVPR 2017)
Qi et al.: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (NIPS 2017)
Ranjan et al.: Generating 3D Faces using Convolutional Mesh Autoencoders (ECCV 2018)

Page 24: Relational Representation Learning

Hierarchical Models

... or differentiable pooling modules

Ying et al.: Hierarchical Graph Representation Learning with Differentiable Pooling (NeurIPS 2018)

DiffPool: one GNN learns node embeddings, another GNN learns cluster assignments.

Compute node features, compute a soft assignment, then coarsen graph and features (with backprop through all steps):

$H = \mathrm{GNN}_1(H, A)$
$S = \mathrm{softmax}\big(\mathrm{GNN}_2(H, A)\big)$ (soft assignment matrix)
$H' = S^\top H$
$A' = S^\top A S$

[Figure: the three stages applied to an example graph with features $h_1, h_2, h_3$.]
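A dense sketch of one DiffPool step, under the assumption that gnn_embed and gnn_pool are callables taking (h, adj):

import torch

def diffpool(h, adj, gnn_embed, gnn_pool):
    z = gnn_embed(h, adj)                       # node embeddings,  [N, F]
    s = torch.softmax(gnn_pool(h, adj), dim=1)  # soft assignments, [N, C]
    return s.t() @ z, s.t() @ adj @ s           # coarsened H' [C, F] and A' [C, C]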

Page 25: Relational Representation Learning

Hierarchical Models

... or differentiable pooling modules

Top-k Pooling: filter nodes based on a scoring function

Cangea et al.: Towards Sparse Hierarchical Graph Classifiers (NeurIPS-W 2018)

[Figure: GCN, pool, GCN, pool architecture; a global readout (mean ‖ max concatenation) is taken after each block, summed, and fed to an MLP for prediction.]

$\vec{y} = \tanh(H \cdot \vec{\theta})$
$\vec{i} = \mathrm{top}_k(\vec{y})$
$H' = (H \odot \vec{y})_{\vec{i}}$ (multiplying by $\vec{y}$ ensures gradient computation for $\vec{\theta}$)
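A short sketch of the scoring-and-filtering step; theta is the trainable projection vector from the formula, and topk_pool is an illustrative name:

import torch

def topk_pool(h, theta, k):
    y = torch.tanh(h @ theta)            # one score per node, shape [N]
    idx = y.topk(k).indices              # indices of the k highest-scoring nodes
    return h[idx] * y[idx].unsqueeze(1)  # gate by y so theta receives gradients

One would also subset the adjacency to the kept nodes; that bookkeeping is omitted here.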

Page 26: Relational Representation Learning

Deep(er) Models

"Washed out" representations after just a few layers.

Idea: "Jump back" to earlier representations.

Xu et al.: Representation Learning on Graphs with Jumping Knowledge Networks (ICML 2018)

Layer aggregation of $\vec{h}_v^{(1)}, \vec{h}_v^{(2)}, \vec{h}_v^{(3)}, \dots$ into $\vec{h}_v^{(\mathrm{final})}$:

Concatenation: $\vec{h}_v^{(1)} \,\Vert\, \dots \,\Vert\, \vec{h}_v^{(T)}$
Pooling: $\max\big(\vec{h}_v^{(1)}, \dots, \vec{h}_v^{(T)}\big)$
Weighted Sum: $\sum_{t=1}^{T} \alpha_v^{(t)} \vec{h}_v^{(t)}$ (attention-based scores $\alpha_v^{(t)}$)
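The three aggregation choices as one-liners over a list hs of per-layer representations, each of shape [N, F]; alpha of shape [T, N] stands in for learned attention scores:

import torch

def jk_concat(hs):
    return torch.cat(hs, dim=1)                            # [N, T * F]

def jk_max(hs):
    return torch.stack(hs).max(dim=0).values               # element-wise max, [N, F]

def jk_weighted(hs, alpha):
    return (alpha.unsqueeze(-1) * torch.stack(hs)).sum(0)  # sum_t alpha_v^(t) h_v^(t)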

Page 27: Relational Representation Learning

Deep(er) Models

Extension: Allow jumps directly while aggregating.

Aggregate more global information in one branch while falling back to more local information in others

Fey: Just Jump: Dynamic Neighborhood Aggregation in Graph Neural Networks (ICLR-W 2019)

[Figure: while aggregating over $\mathcal{N}(v)$ via $\phi_\Theta^{(3)}$, each node attends over its previous representations (no state, previous state, current state, self-preserving state) to form $\vec{h}_v^{(3)}$.]

Page 28: Relational Representation Learning

Unsupervised Models

based on message passing models

Velickovic et al.: Deep Graph Infomax (ICLR 2019)

[Figure: a corruption function $\mathcal{C}$ derives a corrupted graph $(\widetilde{X}, \widetilde{A})$ from $(X, A)$; a shared encoder $\mathcal{E}$ produces embeddings $\vec{h}_i$ of $(H, A)$ and $\widetilde{\vec{h}}_j$ of $(\widetilde{H}, \widetilde{A})$; a global readout function (e.g., global mean pool) summarizes $H$ into $\vec{s} \in \mathbb{R}^D$; a discriminator $\mathcal{D}$ scores pairs against $\vec{s}$.]

Maximize mutual information:

$\sum_{v \in \mathcal{V}} \mathbb{E}_{(H,A)}\Big[\log \mathcal{D}\big(\vec{h}_v, \vec{s}\big)\Big] + \mathbb{E}_{(\widetilde{H},\widetilde{A})}\Big[\log\Big(1 - \mathcal{D}\big(\widetilde{\vec{h}}_v, \vec{s}\big)\Big)\Big]$
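A simplified sketch of this objective with a bilinear discriminator; h and h_corrupt are assumed encoder outputs for the real and corrupted graph, and W is the discriminator's trainable matrix:

import torch

def dgi_loss(h, h_corrupt, W):
    s = h.mean(dim=0)                       # global readout: mean pooling
    pos = torch.sigmoid(h @ W @ s)          # D(h_v, s) for real nodes
    neg = torch.sigmoid(h_corrupt @ W @ s)  # D(h~_v, s) for corrupted nodes
    # Maximizing the MI objective = minimizing this binary cross-entropy.
    return -(torch.log(pos) + torch.log(1.0 - neg)).mean()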

Page 29: Relational Representation Learning

Generative Models


Kipf and Welling: Variational Graph Auto-Encoders (NIPS-W 2016)
Pan et al.: Adversarially Regularized Graph Autoencoder for Graph Embedding (IJCAI 2018)
Simonovsky and Komodakis: GraphVAE: Towards Generation of Small Graphs Using VAEs (arXiv:1802.03480)
De Cao and Kipf: MolGAN: An Implicit Generative Model for Small Molecular Graphs (ICML-W 2018)
You et al.: Generating Realistic Graphs with Deep Auto-regressive Models (ICML 2018)

(Variational / Adversarially Regularized) Autoencoders: GNN encoder and inner product/MLP decoder; requires an expensive graph matching procedure, and the graph size is fixed in advance

Generative Adversarial Nets for Graphs: MLP generator and GNN discriminator; likelihood-free, but the size is still fixed in advance

Auto-regressive Graph Generation: decomposes graph generation into a sequence of node and edge formations; not fixed in size, but inherently sequential

Page 30: Relational Representation Learning

Conclusion

GNNs have interesting applications in many different areas

GNNs follow a simple message passing scheme to learn localized embeddings

GNNs can be implemented very efficiently

GNNs can be enhanced by and combined with various concepts (hierarchical, deep(er), unsupervised, and generative models)

GNNs are really hot right now 🔥🔥🔥
