graph neural networks - chrome.ws.dei.polimi.it

Graph neural networks

February 2018

Visin Francesco

Graph Nets — Francesco Visin

Who I am


Outline

● Motivation and examples
● Graph nets
  ○ (Semi)-formal definition
  ○ Interaction network
  ○ Relation network
  ○ Gated graph sequence neural network
  ○ Attention is all you need
● Implementation example
● Conclusions


Motivation

● The modern machine learning toolkit is well-suited to data that is:
  ○ Fixed-length (MLPs)
  ○ Sequential (RNNs)
  ○ Spatial (ConvNets)
● But there is not much to support graph-structured data.

Image Credit - Diane Harris Cline, Vossman


Motivation

● The world is complex, yet very structured
● Many things and problems are fundamentally relational
● Many problems can be naturally represented as graphs.


Tasks on graphs categorization

● Inferring structure from unstructured data
● Reasoning about structure
● Decoding structure


Image Credit - Leibe et Al.

Categorization examples

Decoding / Inferring / Reasoning

● Parsing language
● Visual scene recognition


Categorization examples

Decoding / Inferring / Reasoning

● Physical state prediction (Image Credit - Battaglia et al.)
● Molecule structure generation (Image Credit - Li et al.)


Categorization examples

Decoding / Inferring / Reasoning

● Drug toxicity prediction
● Finding the treasure

Image Credit - Andrew Doane and lastspark, from Noun Project
Image Credit - Greg Rodgers


Graphs & Computer Science/Machine Learning

● Graph algorithms
  ○ Dijkstra
  ○ Bellman-Ford

Image Credit - Artyom Kalinin




● Hand-crafted features, graph kernels, etc.
  ○ Measure similarity between graphs

Image Credit - Ghosh et al.
Image Credit - Vishwanathan et al.




● Hand-crafted features, graph kernels, etc.
  ○ Measure similarity between graphs
● Graphical models
  ○ Restricted Boltzmann Machines
  ○ Deep Belief Networks
  ○ Deep Boltzmann Machines

Image Credit - Geoffrey Hinton




● Hand-crafted features, graph kernels, etc.
● Graphical models
  ○ Restricted Boltzmann Machines
  ○ Deep Belief Networks
  ○ Deep Boltzmann Machines
● Graph neural nets!

Image Credit - Back to the future!


Right! ...but what is a graph net?


Graph nets: Literature (partial)

● Growing interest:
  ○ Graph Neural Networks (Scarselli et al 2007; 2008)
  ○ Pointer Networks (Vinyals et al 2015)
  ○ Graph Convolutional Networks (Bruna et al 2013; Duvenaud et al 2015; Henaff et al 2015; Kipf & Welling 2016; Schlichtkrull et al 2017; Defferrard et al 2017)
  ○ Gated Graph Neural Networks (Li et al 2015)
  ○ Interaction Networks (Battaglia et al 2016; Watters et al 2017; Raposo et al 2017)
  ○ Message Passing Networks (Gilmer et al. 2017)
  ○ Deep Generative Models of Graphs (Li et al. 2018)


Graph nets: High level

● Graph nets (GNs) are a class of models that:
  ○ Use graphs as
    ■ inputs and/or
    ■ outputs and/or
    ■ latent representation




  ○ Manipulate graph-structured representations
  ○ Share model components across entities and relations



Graph nets: Core idea

Task: Is there a golden object in the scene?

● MLP: inefficient learning; it needs to see the object of interest in all possible positions
● ConvNet: efficient; it applies the same function at each position (e.g. is the golden object at position x?) and then pools over the outcomes


Graph nets: Core idea

Task: Are there two red objects close to each other?

● ConvNet: use a kernel that is large enough (e.g. 2x2 for this task) and check whether two red objects fall within its receptive field

[Figure: the patches observed by the 2x2 kernel]

● Note: within the patch, the ConvNet needs to experience all permutations


Graph nets: Core idea

Graph nets: think of your observation as a graph, with node states and edges.

● The number of edges grows quadratically!
● Represent the nodes sparsely, by detecting the objects of interest. Still O(n^2) edges!
● Use a sparse representation for the edges, if you know which objects are allowed to interact
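To make the edge counts on this slide concrete, here is a minimal sketch in plain Python. It compares a fully connected directed graph with a sparse one; the `chain_neighbours` rule is a hypothetical example of "knowing which objects are allowed to interact" and is not taken from the talk.

```python
from itertools import permutations


def fully_connected_edges(num_nodes):
    """All directed (sender, receiver) pairs, excluding self-loops."""
    return list(permutations(range(num_nodes), 2))


def sparse_edges(num_nodes, allowed):
    """Keep only the pairs accepted by the `allowed` predicate."""
    return [(s, r) for s, r in fully_connected_edges(num_nodes) if allowed(s, r)]


def chain_neighbours(sender, receiver):
    """Hypothetical interaction rule: only adjacent nodes in a chain interact."""
    return abs(sender - receiver) == 1


for n in (4, 16, 64):
    dense = fully_connected_edges(n)             # n * (n - 1) edges
    sparse = sparse_edges(n, chain_neighbours)   # 2 * (n - 1) edges
    print(n, len(dense), len(sparse))
```

With the dense representation the number of edges grows as n * (n - 1), while the chain rule keeps it linear in n.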



Graph nets: Core idea

Reason in terms of pairs!

● The same function is re-used for every edge to compute its effect
● The same function is re-used for every node to compute its new state, given its current state and the sum of all incoming effects
● Invariant to the position of objects!
● Invariant to the distance between objects in topological space
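The sharing described above can be sketched in a few lines of plain Python. The functions `edge_effect` and `node_update` are hypothetical stand-ins for the learned networks, chosen only so the arithmetic is easy to follow. Because incoming effects are summed, the result should not depend on the order in which edges are listed.

```python
def edge_effect(sender_state, receiver_state):
    """Hypothetical shared edge function, applied to every edge."""
    return [s - r for s, r in zip(sender_state, receiver_state)]


def node_update(state, summed_effect):
    """Hypothetical shared node function, applied to every node."""
    return [x + 0.5 * e for x, e in zip(state, summed_effect)]


def propagate(states, edges):
    """One round of message passing over a dict of node states."""
    dim = len(next(iter(states.values())))
    summed = {node: [0.0] * dim for node in states}
    for sender, receiver in edges:
        effect = edge_effect(states[sender], states[receiver])
        summed[receiver] = [a + b for a, b in zip(summed[receiver], effect)]
    return {node: node_update(states[node], summed[node]) for node in states}


states = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [3.0, 1.0]}
edges = [("a", "b"), ("c", "b"), ("b", "a")]

result = propagate(states, edges)
# Listing the edges in a different order gives the same new states.
shuffled = propagate(states, list(reversed(edges)))
print(result == shuffled)
```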


Graph nets: States update

● Edge effects:
  ○ edge type
  ○ <related node states>
  ○ global state
● Node update:
  ○ node state
  ○ summed effects on incoming edges (sum is commutative and associative!)
  ○ global state
● Global state update:
  ○ <node states>
  ○ global state
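The three updates listed on this slide can be chained as in the sketch below. It is an illustration only: states are plain floats, the bodies of `edge_fn`, `node_fn` and `global_fn` are hypothetical placeholders for learned networks, and feeding the already updated node states into the global update is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    sender: int
    receiver: int
    edge_type: float


def edge_fn(edge_type, sender_state, receiver_state, global_state):
    """Edge effect from the edge type, the related node states and the global state."""
    return edge_type * (sender_state - receiver_state) + global_state


def node_fn(node_state, summed_effects, global_state):
    """New node state from its state, the summed incoming effects and the global state."""
    return node_state + 0.1 * summed_effects + global_state


def global_fn(node_states, global_state):
    """New global state from all node states and the previous global state."""
    return global_state + sum(node_states) / len(node_states)


def gn_step(node_states, edges, global_state):
    effects = [
        edge_fn(e.edge_type, node_states[e.sender], node_states[e.receiver], global_state)
        for e in edges
    ]
    summed = [0.0] * len(node_states)
    for e, effect in zip(edges, effects):
        summed[e.receiver] += effect  # sum over incoming edges
    new_nodes = [node_fn(s, agg, global_state) for s, agg in zip(node_states, summed)]
    new_global = global_fn(new_nodes, global_state)  # assumption: uses updated nodes
    return new_nodes, effects, new_global


new_nodes, effects, new_global = gn_step(
    [1.0, 2.0, 4.0], [Edge(0, 1, 1.0), Edge(2, 1, 1.0)], 0.0
)
print(new_nodes, effects, new_global)
```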


Graph nets: Components

● G = <O, R>: graph
● O = {o_1, o_2, .., o_m}: collection of objects
  ○ o_i = {o_i^1, o_i^2, ..., o_i^n}: object i with n features
● R = {r_1, r_2, .., r_k}: collection of relations between objects
  ○ r_k = <o_i, o_j, a_k>: relation k between o_i and o_j with attribute a_k
● E = {e_k}: collection of effects of relations
● X = {x_l}: collection of external effects
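One possible way to hold these components in code is sketched below. The class and field names (`Relation`, `Graph`, `obj_i`, `obj_j`, `check`) are illustrative choices, not something defined in the talk, and the sketch deliberately does not say which of the two objects is the sender.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    obj_i: int               # index of o_i in the object collection O
    obj_j: int               # index of o_j in the object collection O
    attribute: List[float]   # a_k


@dataclass
class Graph:
    objects: List[List[float]]   # O: one feature vector per object
    relations: List[Relation]    # R

    def check(self) -> None:
        """Raise ValueError if feature sizes differ or a relation points nowhere."""
        if len({len(o) for o in self.objects}) > 1:
            raise ValueError("all objects must have the same number of features")
        for k, rel in enumerate(self.relations):
            for index in (rel.obj_i, rel.obj_j):
                if not 0 <= index < len(self.objects):
                    raise ValueError(f"relation {k} refers to missing object {index}")


graph = Graph(
    objects=[[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],
    relations=[Relation(0, 1, [1.0]), Relation(2, 1, [0.5])],
)
graph.check()
```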


Graph nets: Components

out = A(g( O( (X, R(m(G))))))

Annotations on the formula, from the innermost term outwards:
● Graph
● Marshalling function
● Relation model: predict the effects
● External effects
● Aggregation function
● Object model
● Abstraction model


Interaction networks

out = A(g( O( (X, R(m(G))))))

Annotations on the formula:
● Graph
● Marshalling function
● Relation model: predict the effects
● External effects
● Objects
● Object model
● Abstraction model

Implementation of the components, as listed on the slide:
● MLP
● Fixed! Matrix prod + concat
● MLP columnwise
● Fixed! Matrix prod
● MLP columnwise
● Fixed! Matrix prod
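The "matrix prod + concat" and "MLP columnwise" annotations can be read as the following pipeline. This is a rough sketch in plain Python, assuming binary receiver and sender matrices `R_r` and `R_s` with one column per relation, as in the matrix formulation of Battaglia et al. 2016. The toy `relation_model` and `object_model` are hypothetical stand-ins for the MLPs, and the abstraction model is left out.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def columnwise(fn, matrix):
    """Apply fn to every column and stack the outputs as columns again."""
    return transpose([fn(col) for col in transpose(matrix)])


O = [[1.0, 2.0, 4.0]]            # object states: 1 feature x 3 objects
X = [[0.0, 0.0, 0.0]]            # external effects, one column per object
R_s = [[1, 0], [0, 0], [0, 1]]   # senders:   relation 0 from object 0, relation 1 from object 2
R_r = [[0, 0], [1, 1], [0, 0]]   # receivers: both relations end in object 1
R_a = [[1.0, 1.0]]               # relation attributes, one column per relation


def relation_model(col):
    """Stand-in for the relation MLP; a column is (receiver, sender, attribute)."""
    receiver, sender, attribute = col
    return [attribute * (sender - receiver)]


def object_model(col):
    """Stand-in for the object MLP; a column is (state, external effect, summed effect)."""
    state, external, effect = col
    return [state + external + 0.1 * effect]


B = matmul(O, R_r) + matmul(O, R_s) + R_a   # marshalling: matrix products + row concat
E = columnwise(relation_model, B)           # one effect per relation
E_bar = matmul(E, transpose(R_r))           # aggregation: sum effects per receiver
C = O + X + E_bar                           # row concat of states, external and summed effects
P = columnwise(object_model, C)             # new object states
print(P)
```

Only the two functions applied columnwise would be learned; the marshalling and aggregation steps are fixed matrix operations.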


● Reason about interactions between objects
● Physical reasoning
  ○ n-body domain: galaxy of objects that interact
  ○ bouncing balls: balls in a box bounce on walls and can collide
  ○ string composed of masses connected by springs
● Learnable physics engine simulation!
● Works/transfers to variable number of objects!
● Graph to graph


Image Credit - Battaglia et al.



http://www.youtube.com/watch?v=2xdkZ4wmv5M

http://www.youtube.com/watch?v=qZLw8wmrJGg


out = A(g( O( (X, R(m(G))))))

Relation networks

Implicit (fixed: tuples)

Not modeled


out = A(g( O( (X, R(m(G))))))

Relation networks

MLP columnwise

Any aggregation function commutative and associative

MLP


Relation networks

● Reason about objects and their relations
  ○ Classification of scenes from graph representation
  ○ Classification of scenes from raw input
  ○ Exploit the relations to perform one-shot relation learning on a new scene
● Graph to vectors
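A relation network reduces a set of objects to a single vector by applying one function to every pair, aggregating, and then applying a second function. The sketch below is a plain-Python illustration: `g_theta` and `f_phi` are hypothetical stand-ins for the two MLPs, and including the (i, i) pairs is an assumption.

```python
from itertools import product


def g_theta(o_i, o_j):
    """Hypothetical per-pair function (an MLP applied columnwise in the real model)."""
    return [a * b for a, b in zip(o_i, o_j)] + [abs(a - b) for a, b in zip(o_i, o_j)]


def f_phi(summed):
    """Hypothetical readout function (an MLP in the real model)."""
    return sum(summed)


def relation_network(objects):
    # The edges are implicit: every ordered pair of objects is considered.
    pair_outputs = [g_theta(a, b) for a, b in product(objects, repeat=2)]
    # Any commutative and associative aggregation would do; a sum is used here.
    summed = [sum(values) for values in zip(*pair_outputs)]
    return f_phi(summed)


objects = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
score = relation_network(objects)
reordered_score = relation_network(list(reversed(objects)))
print(score, reordered_score)
```

Because the aggregation is a sum over all pairs, reordering the objects should leave the output unchanged.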


out = A(g( O( (X, R(m(G))))))

Relation networks on CLEVR


out = A(g( O( (X, R(m(G))))))

Gated graph sequence neural networks

Connectivity function

Linear

Sum over nodes + soft self attention

GRU


Gated graph sequence neural networks

● Reason about verification of computer programs
● Attempt to prove code properties, such as memory safety
● bAbI tasks (language comprehension)
● Per-node predictions
● Gated (GRU-style) updates to the nodes
● Internal propagation steps while generating outputs sequentially
● First follow-up to Scarselli 2008
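The gated node update can be illustrated with a heavily simplified sketch. The assumptions here are scalar node states, a single edge type with one linear weight, no bias terms, and arbitrary parameter values; none of these come from the talk. Each propagation step sends linear messages along the edges and then applies a GRU-style update to every node.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_node_update(h, a, p):
    """GRU-style update of one node state h given its aggregated message a."""
    z = sigmoid(p["wz"] * a + p["uz"] * h)               # update gate
    r = sigmoid(p["wr"] * a + p["ur"] * h)               # reset gate
    h_tilde = math.tanh(p["w"] * a + p["u"] * (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde


def ggnn_propagate(h, edges, p, steps):
    """Run `steps` rounds of message passing with gated updates."""
    for _ in range(steps):
        messages = [0.0] * len(h)
        for sender, receiver in edges:
            messages[receiver] += p["edge"] * h[sender]  # linear message, summed per node
        h = [gru_node_update(hv, av, p) for hv, av in zip(h, messages)]
    return h


# Hypothetical parameter values, chosen only for illustration.
params = {"wz": 1.0, "uz": 1.0, "wr": 1.0, "ur": 1.0, "w": 1.0, "u": 1.0, "edge": 1.0}
initial = [1.0, 0.0, 0.0]
edges = [(0, 1), (1, 2)]

after_one = ggnn_propagate(initial, edges, params, steps=1)
after_two = ggnn_propagate(initial, edges, params, steps=2)
print(after_one, after_two)
```

In this toy chain, node 2 only receives information that originated at node 0 after the second propagation step, which is why several internal steps are run before reading out.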


out = A(g( O( (X, R(m(G))))))

Attention is all you need

Different input tokens (nodes)

Masking (optional)

Linear (resulting in key, query & value)

Attention (including accumulation); head Concat


Attention is all you need

● Consider each time-step (input or hidden state) as a node
● Reason about the inner relations in its hidden state (over time)
  ○ Exploit self-attention as a form of recursive memory
  ○ Multiple attention heads in parallel (equivalent to multiple edges between the same nodes in an Interaction Network)
● Not explicitly introduced as a graph network approach
● Equivalent to a Graph Convolutional Net, or a graph net with a fully connected graph but with attention on the edges
● Graph to vector
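Seen as a graph net, a single attention head treats every token as a node and weights the edge from each node to every other node. The sketch below implements scaled dot-product attention in plain Python. It takes queries, keys and values as given, so the linear projections and the concatenation of several heads are omitted, and the optional `mask` argument is a hypothetical way to switch edges off.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values, mask=None):
    """Scaled dot-product attention; mask[i][j] is True if node i may attend to node j.

    Every row of the mask must allow at least one node.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for i, query in enumerate(queries):
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        if mask is not None:
            scores = [s if mask[i][j] else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)   # attention weights on the edges into node i
        outputs.append([
            sum(w * value[c] for w, value in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[1.0], [3.0]]

full = self_attention(queries, keys, values)
causal = self_attention(queries, keys, values, mask=[[True, False], [True, True]])
print(full, causal)
```

With identical keys the weights are uniform, so each node receives the mean of the values; with the mask, the first node can only attend to itself.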


Cool! Where can I get one??

Graph Propagation Core

    import tensorflow as tf  # TensorFlow 1.x graph-mode API
    import sonnet as snt

    # node states, shape [num_nodes, state_dim], and edges, shape [num_edges, 2],
    # with one (from, to) pair of node indices per row
    states = tf.placeholder(tf.float32)
    edges = tf.placeholder(tf.int32)

    # use sonnet or your favorite toolkit to build
    # feedforward networks
    effect_net = snt.Sequential(...)
    node_net = snt.Sequential(...)

    # compute effects / messages along each edge, edge
    # features can be fed into the model too if available
    states_from = tf.gather(states, edges[:, 0])
    states_to = tf.gather(states, edges[:, 1])
    concat_states = tf.concat([states_from, states_to], axis=-1)
    effects = effect_net(concat_states)

    # aggregate incoming effects by a sum, or average
    aggregated_effects = tf.unsorted_segment_sum(
        effects, edges[:, 1], tf.shape(states)[0])

    # update the node states
    states = node_net(tf.concat([aggregated_effects, states], axis=-1))

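[Figure: worked example with four node states (0, 1, 2, 3) and three edges given as (from, to) pairs: (0, 1), (1, 2), (3, 1). An effect is computed for each edge, the effects are summed per receiving node, and the sums are used to compute the new states of nodes 0 to 3.]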
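The gather and segment-sum steps of the propagation core can be traced by hand on the edge list from the figure. The sketch below uses plain Python instead of TensorFlow; the node state values and the subtraction used in place of `effect_net` and `node_net` are made up for illustration.

```python
def gather(rows, indices):
    """Plain-Python analogue of tf.gather along the first axis."""
    return [rows[i] for i in indices]


def unsorted_segment_sum(rows, segment_ids, num_segments):
    """Plain-Python analogue of tf.unsorted_segment_sum for lists of rows."""
    out = [[0.0] * len(rows[0]) for _ in range(num_segments)]
    for row, segment in zip(rows, segment_ids):
        for column, value in enumerate(row):
            out[segment][column] += value
    return out


states = [[1.0], [2.0], [3.0], [4.0]]   # hypothetical states for nodes 0..3
edges = [(0, 1), (1, 2), (3, 1)]        # (from, to) pairs, as in the figure

senders = [s for s, _ in edges]
receivers = [r for _, r in edges]
states_from = gather(states, senders)
states_to = gather(states, receivers)
concat_states = [f + t for f, t in zip(states_from, states_to)]

# Stand-in for effect_net: sender state minus receiver state.
effects = [[pair[0] - pair[1]] for pair in concat_states]

# Node 1 receives two effects; nodes 0 and 3 receive none and keep a zero sum.
aggregated = unsorted_segment_sum(effects, receivers, len(states))

# Stand-in for node_net: add the aggregated effect to the old state.
new_states = [[s[0] + a[0]] for s, a in zip(states, aggregated)]
print(aggregated, new_states)
```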

Output Modules

    # graph-level output
    graph_vec = tf.reduce_sum(states, axis=0)
    graph_output = graph_level_network(graph_vec)
    ...  # feed into your favorite loss

    # node-level output
    node_outputs = node_level_net(states)
    ...  # feed into your favorite loss

    # edge-level output
    states_from = tf.gather(states, edges[:, 0])
    states_to = tf.gather(states, edges[:, 1])
    concat_states = tf.concat([states_from, states_to], axis=-1)
    edge_outputs = edge_level_net(concat_states)
    ...  # feed into your favorite loss

Graph-level output

Node-level output

Edge-level output

Output Modules

    # relational network style graph-level output
    # concat_states <- paired states for all (i,j) pairs
    effects = effect_net(concat_states)
    graph_vec = tf.reduce_sum(effects, axis=0)
    graph_output = graph_level_network(graph_vec)
    ...  # feed into your favorite loss

Relation network style graph-level output


Conclusions

● Graph nets are a powerful model for reasoning about graph-structured data
● There are three main categories of tasks in this domain:
  ○ vector to graph
  ○ graph to graph
  ○ graph to vector
● Several variants of graph nets have been proposed in the literature
● They are easy to implement!
● Sky is the limit: surprise us!!!

THANK YOU

Credits

Razvan Pascanu, Victor Bapst
