
Which AI for structured data?

Graph Neural Networks

Romain Raveaux

romain.raveaux@univ-tours.fr

Associate Professor (Maître de conférences)

Université de Tours

Laboratoire d'informatique (LIFAT)

RFAI team

1

Outline

• Graph Neural Networks
  • Encoder
  • Decoder

• The model
  • Matrix formalization
  • Message passing formalization

• The losses

2

Previous episode

• Introduction and graph-based problems: http://romain.raveaux.free.fr/document/cours%20IA%20DI5%20graphs%20introV2.pdf

3

Graph Neural Network (GNN)

Taxonomy: graph/node embedding

• Explicit embedding
  • Through feature extraction
  • End-to-end learning: this is where GNNs sit

• Implicit embedding
  • Graph space

4

Graph Neural Network

• Input: A graph

• Output: Node embeddings

• Assumptions: stationarity and compositionality

• The goal: Graph Neural Networks (GNNs) perform end-to-end learning, including feature extraction and classification.

5

Graph Neural Networks as an encoder

Output: a node embedding

6

Graph Neural Networks as an encoder

7

Two Key Components

8

• Encoder maps each node to a low-dimensional vector.

• Similarity function specifies how relationships in vector space map to relationships in the original network.

ENC(v) = z_v : a node v in the input graph is mapped to a d-dimensional embedding z_v.

similarity(u, v) ≈ z_u · z_v : the similarity of u and v in the original network is approximated by the dot product between the node embeddings.

What is the goal of the encoder?

• Optimize the encoder so that two nodes have similar embeddings if they…
  1. are connected?
  2. share neighbors?
  3. have a high Pr(v | u), the estimated probability of visiting v starting from u?

9

Decoder:

10
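The simplest decoder consistent with the similarity definition above is a dot product between two embeddings. A minimal NumPy sketch; the embedding values and node indices are made up for illustration:

```python
import numpy as np

# Z holds one d-dimensional embedding per node (here 4 nodes, d = 3);
# the values are arbitrary, just for illustration.
Z = np.array([[0.1, 0.9, 0.0],
              [0.2, 0.8, 0.1],
              [0.9, 0.1, 0.7],
              [0.8, 0.0, 0.9]])

def decode(Z, u, v):
    # Dot-product decoder: similarity(u, v) ~ z_u . z_v
    return Z[u] @ Z[v]

print(decode(Z, 0, 1))  # large: nodes 0 and 1 have similar embeddings
print(decode(Z, 0, 2))  # small: nodes 0 and 2 do not

# All pairwise similarities at once
S = Z @ Z.T
```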

Basics of "conventional" neural networks

11

A Linear model

• 𝑋 ∈ ℝ𝑚×1 : input

• W∈ ℝ

• B∈ ℝ

• 𝑌 ∈ ℝ𝑚×1 : output (prediction)

• 𝑌 = 𝑋𝑊 + 𝐵

12

𝑋 = [𝑥1; 𝑥2; 𝑥3]    𝑊 = [𝑤1]

𝑋𝑊 = [𝑥1𝑤1; 𝑥2𝑤1; 𝑥3𝑤1]

A Linear model : for multi-dimensional input

• 𝑋 ∈ ℝ𝑚×𝑑 : input

• W∈ ℝ𝑑×1

• B∈ ℝ

• 𝑌 ∈ ℝ𝑚×1 : output (prediction)

• 𝑌 = 𝑋𝑊 + 𝐵

13

𝑋 = [𝑥1𝑎 𝑥1𝑏; 𝑥2𝑎 𝑥2𝑏; 𝑥3𝑎 𝑥3𝑏]    𝑊 = [𝑤𝑎; 𝑤𝑏]

𝑋𝑊 = [𝑥1𝑎𝑤𝑎 + 𝑥1𝑏𝑤𝑏; 𝑥2𝑎𝑤𝑎 + 𝑥2𝑏𝑤𝑏; 𝑥3𝑎𝑤𝑎 + 𝑥3𝑏𝑤𝑏]

A Linear model : for multi-dimensional input and output

• 𝑋 ∈ ℝ𝑚×𝑑 : input

• W∈ ℝ𝑑×𝑛

• B∈ ℝ𝑛

• 𝑌 ∈ ℝ𝑚×𝑛 : output (prediction)

• 𝑌 = 𝑋𝑊 + 𝐵

14

𝑋 = [𝑥1𝑎 𝑥1𝑏; 𝑥2𝑎 𝑥2𝑏; 𝑥3𝑎 𝑥3𝑏]    𝑊 = [𝑤1𝑎 𝑤1𝑏; 𝑤2𝑎 𝑤2𝑏]

𝑋𝑊 = [𝑥1𝑎𝑤1𝑎 + 𝑥1𝑏𝑤2𝑎   𝑥1𝑎𝑤1𝑏 + 𝑥1𝑏𝑤2𝑏;
      𝑥2𝑎𝑤1𝑎 + 𝑥2𝑏𝑤2𝑎   𝑥2𝑎𝑤1𝑏 + 𝑥2𝑏𝑤2𝑏;
      𝑥3𝑎𝑤1𝑎 + 𝑥3𝑏𝑤2𝑎   𝑥3𝑎𝑤1𝑏 + 𝑥3𝑏𝑤2𝑏]

A non-linear model : for multi-dimensional input/output

• 𝑋 ∈ ℝ𝑚×𝑑 : input

• W∈ ℝ𝑑×𝑛

• B∈ ℝ𝑛

• 𝑅(𝑎), 𝜎(𝑎) : a non-linear function

• 𝑌 ∈ ℝ𝑚×𝑛 : output (prediction)

• 𝑌 = 𝜎(𝑋𝑊 + 𝐵)

Image credit: Julien Mille

15

Can we stack the model? The endomorphism property

• 𝑋 ∈ ℝ𝑚×𝑑 : input
• 𝑊(1) ∈ ℝ𝑑×𝑛, 𝐵(1) ∈ ℝ𝑛
• 𝑌(1) ∈ ℝ𝑚×𝑛 : intermediate result
• 𝑊(2) ∈ ℝ𝑛×𝑜, 𝐵(2) ∈ ℝ𝑜
• 𝑌(2) ∈ ℝ𝑚×𝑜 : output (prediction)

• 𝑌(1) = 𝜎(𝑋𝑊(1) + 𝐵(1))
• 𝑌(2) = 𝜎(𝑌(1)𝑊(2) + 𝐵(2))
• 𝑌(2) = 𝜎(𝜎(𝑋𝑊(1) + 𝐵(1))𝑊(2) + 𝐵(2))

• 𝑌(𝑙) = 𝑓(𝑌(𝑙−1), 𝑊(𝑙), 𝐵(𝑙)) = 𝑓(𝑌(𝑙−1), 𝜃(𝑙))
• 𝑌(𝑙) = 𝑓(𝑓(𝑋, 𝜃(𝑙−1)), 𝜃(𝑙)) : composition of functions
• 𝐹(𝑋, Θ) = 𝑓𝜃(𝑙) ∘ 𝑓𝜃(𝑙−1) : composition of functions

Multi-Layer Perceptron (MLP): a sequence of linear operations and non-linear operations.

Why these non-linear operations? Without them, a stack of linear layers collapses into a single linear model.

16
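To make the stacking concrete, here is a small NumPy sketch of the two-layer model above; the sizes m, d, n, o and the random values are arbitrary. It also checks the point about non-linearities: with σ removed, the two layers collapse into a single linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n, o = 5, 4, 3, 2          # arbitrary sizes, for illustration only

X  = rng.normal(size=(m, d))     # input
W1 = rng.normal(size=(d, n)); B1 = rng.normal(size=n)
W2 = rng.normal(size=(n, o)); B2 = rng.normal(size=o)

sigma = lambda a: np.maximum(a, 0.0)   # ReLU as the non-linearity

Y1 = sigma(X @ W1 + B1)          # Y(1) = sigma(X W(1) + B(1)), shape (m, n)
Y2 = sigma(Y1 @ W2 + B2)         # Y(2) = sigma(Y(1) W(2) + B(2)), shape (m, o)

# Without sigma, the composition is still a single linear model:
# X W1 W2 + (B1 W2 + B2) has the same form as X W + B.
Y_linear = (X @ W1 + B1) @ W2 + B2
assert np.allclose(Y_linear, X @ (W1 @ W2) + (B1 @ W2 + B2))
```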

The basics of artificial neural networks

17

The basics of graph neural networks

18

Adjacency matrix

19

Degree matrix

20

Exponentiation of the adjacency matrix

21
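As a minimal illustration of these three objects, the sketch below builds the adjacency matrix A, the degree matrix D and A² for a small made-up graph; the entry (A^k)[i, j] counts the walks of length k from node i to node j.

```python
import numpy as np

# Undirected toy graph on 4 nodes with edges (0-1), (1-2), (2-3), (0-2)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

# Degree matrix: diagonal matrix of the row sums of A
D = np.diag(A.sum(axis=1))

# Exponentiation of the adjacency matrix:
# (A @ A)[i, j] = number of walks of length 2 from i to j
A2 = A @ A
print(A2[0, 3])   # 1: the single length-2 walk 0 -> 2 -> 3
print(A2[0, 0])   # 2: the walks 0 -> 1 -> 0 and 0 -> 2 -> 0
```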

Key idea and intuition [Kipf and Welling, 2016]

• The key idea is to generate node embeddings based on local neighborhoods.

• The intuition is to aggregate node information from their neighbors using neural networks.

• Nodes have embeddings at each layer, and the neural network can be of arbitrary depth. The "layer-0" embedding of node u is its input feature vector, i.e. 𝐹𝑢.

22

GNN: a pictorial model

23

Topology and Features

• 𝐴 ∈ ℝ|𝑉|×|𝑉| : adjacency matrix (topology/structure)

• 𝐻 ∈ ℝ|𝑉|×𝑑 : a d-dimensional feature vector for each node

• 𝐴𝐻 ∈ ℝ|𝑉|×𝑑 : feature aggregation

24

𝐴 = [𝑎1𝑎 𝑎1𝑏; 𝑎2𝑎 𝑎2𝑏]    𝐻 = [ℎ1𝑎 ℎ1𝑏; ℎ2𝑎 ℎ2𝑏]

𝐴𝐻 = [𝑎1𝑎ℎ1𝑎 + 𝑎1𝑏ℎ2𝑎   𝑎1𝑎ℎ1𝑏 + 𝑎1𝑏ℎ2𝑏;
      𝑎2𝑎ℎ1𝑎 + 𝑎2𝑏ℎ2𝑎   𝑎2𝑎ℎ1𝑏 + 𝑎2𝑏ℎ2𝑏]
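The same aggregation in NumPy, on a made-up 3-node graph: row u of AH is the sum of the feature vectors of u's neighbors.

```python
import numpy as np

# Adjacency of a toy graph on 3 nodes: 0-1 and 1-2 are edges
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

# One 2-dimensional feature vector per node (arbitrary values)
H = np.array([[1., 10.],
              [2., 20.],
              [3., 30.]])

AH = A @ H
# Row 1 of AH = H[0] + H[2] = [4., 40.]: node 1 aggregates its neighbors' features
print(AH)
```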

Simple example of f

25

Topology and Features

• 𝐴𝐻 ∈ ℝ|𝑉|×𝑑= Feature aggregation

• 𝑊 ∈ ℝ𝑑×𝑙 : parameter matrix whose size does not depend on |V|

26

𝐴𝐻 = [𝑎ℎ1𝑎 𝑎ℎ1𝑏; 𝑎ℎ2𝑎 𝑎ℎ2𝑏]    𝑊 = [𝑊1𝑎 𝑊1𝑏 𝑊1𝑐; 𝑊2𝑎 𝑊2𝑏 𝑊2𝑐]

𝐴𝐻𝑊 = [𝑎ℎ1𝑎𝑊1𝑎 + 𝑎ℎ1𝑏𝑊2𝑎   𝑎ℎ1𝑎𝑊1𝑏 + 𝑎ℎ1𝑏𝑊2𝑏   𝑎ℎ1𝑎𝑊1𝑐 + 𝑎ℎ1𝑏𝑊2𝑐;
       𝑎ℎ2𝑎𝑊1𝑎 + 𝑎ℎ2𝑏𝑊2𝑎   𝑎ℎ2𝑎𝑊1𝑏 + 𝑎ℎ2𝑏𝑊2𝑏   𝑎ℎ2𝑎𝑊1𝑐 + 𝑎ℎ2𝑏𝑊2𝑐]

The Math

27

• Basic approach: average the neighbor messages and apply a neural network.

ℎ𝑣(0) = 𝐹𝑣 : the initial "layer-0" embedding of 𝑣 is equal to its node features.

ℎ𝑣(𝑘) = 𝜎( 𝑊𝑘 Σ𝑢∈𝑁(𝑣) ℎ𝑢(𝑘−1) / |𝑁(𝑣)| + 𝐵𝑘 ℎ𝑣(𝑘−1) ),  𝑘 = 1, …, 𝐾

where ℎ𝑣(𝑘) is the kth-layer embedding of 𝑣, 𝜎 is a non-linearity (e.g., ReLU or tanh), the first term is the average of the neighbors' previous-layer embeddings, and the last term involves the previous-layer embedding of 𝑣 itself.

Training the Model

28

• After K-layers of neighborhood aggregation, we get output embeddings for each node.

• We can feed these embeddings into any loss function and run stochastic gradient descent to train the aggregation parameters.

• 𝑊𝑘 and 𝐵𝑘 are the trainable matrices (i.e., what we learn).

Inductive Capability

29

• Inductive node embeddings generalize to entirely unseen graphs.

• E.g., train on a protein interaction graph from model organism A and generate embeddings on newly collected data about organism B: train on one graph, generalize to a new graph.

Inductive Capability

30

• Train on a snapshot of the graph; when a new node arrives, generate an embedding for the new node.

• Many application settings constantly encounter previously unseen nodes (e.g., Reddit, YouTube, Google Scholar, …).

• Need to generate new embeddings "on the fly".

Source: Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW 2018.

Two issues with this simple example

• Issue 1: for every node, f sums up the feature vectors of all neighboring nodes, but not of the node itself.
  • Fix: simply add the identity matrix to A.

• Issue 2: A is typically not normalized, so the multiplication with A completely changes the scale of the feature vectors.
  • Fix: normalize A such that all rows sum to one (i.e., multiply by the inverse degree matrix, D⁻¹A).

31

Altogether: [Kipf and Welling, 2016]

32

• The two patches mentioned before, plus

• A better (symmetric) normalization of the adjacency matrix: use Â = D̃^(−1/2)(A + I)D̃^(−1/2), where D̃ is the degree matrix of A + I.

O(|E|) time complexity overall.
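Putting the two fixes and the symmetric normalization together, here is a compact NumPy sketch of one propagation layer in the spirit of [Kipf and Welling, 2016]. The toy graph, features and weights are invented, and a dense implementation is used for readability (a sparse one would give the O(|E|) cost mentioned above).

```python
import numpy as np

def gcn_layer(A, H, W, sigma=lambda a: np.maximum(a, 0.0)):
    """One layer: sigma(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])            # fix 1: add self-loops
    d = A_tilde.sum(axis=1)                     # degrees of A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # fix 2: (symmetric) normalization
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    return sigma(A_hat @ H @ W)

# Toy graph, features and weights (arbitrary values)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))

H1 = gcn_layer(A, H, W)    # new node embeddings, shape (3, 4)
```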

Graph Convolutional Networks

33

• Kipf et al.’s Graph Convolutional Networks (GCNs) are a slight variation on the neighborhood aggregation idea:

Graph Convolutional Networks

34

Basic neighborhood aggregation vs. GCN neighborhood aggregation:

• GCN uses the same transformation matrix for the self embedding and the neighbor embeddings.

• Instead of a simple average, GCN uses a per-neighbor normalization that varies across neighbors.

Graph Convolutional Networks

35

• Empirically, they found this configuration to give the best results.

• More parameter sharing.

• Down-weights high degree neighbors.

From matrix computations to message passing algorithms

36
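The matrix products above can equivalently be written as a per-node message-passing procedure: every node collects its neighbors' embeddings (messages), aggregates them, and updates its own embedding. Below is a rough sketch of that view, with a plain sum as the aggregation and an invented toy graph; the final assertion checks that it matches the matrix form σ(AHW).

```python
import numpy as np

# Adjacency list of a toy 3-node path graph
neighbors = {0: [1], 1: [0, 2], 2: [1]}
H = np.array([[1., 10.],
              [2., 20.],
              [3., 30.]])
W = np.array([[1., 0., 1.],
              [0., 1., 1.]])
sigma = lambda a: np.maximum(a, 0.0)

H_new = np.zeros((len(neighbors), W.shape[1]))
for v, neigh in neighbors.items():
    # 1. message: each neighbor u sends its current embedding H[u]
    messages = [H[u] for u in neigh]
    # 2. aggregate: a plain sum, i.e. the same thing as row v of A @ H
    m_v = np.sum(messages, axis=0)
    # 3. update: linear transform + non-linearity
    H_new[v] = sigma(m_v @ W)

# The per-node loop equals the matrix computation sigma(A H W)
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
assert np.allclose(H_new, sigma(A @ H @ W))
```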


More complex models [Nowak et al., 2017]

46

GraphSAGE Idea

47

• So far we have aggregated the neighbor messages by taking their (weighted) average; can we do better?

GraphSAGE Idea

48

• Simple neighborhood aggregation: take the average of the neighbors' embeddings.

• GraphSAGE: replace the average by AGG, any differentiable function that maps a set of vectors to a single vector.

GraphSAGE Differences

49

• Generalized aggregation (AGG instead of a simple average).

• Concatenate the self embedding and the aggregated neighbor embedding.

GraphSAGE Variants

50

• Mean: take the element-wise mean of the neighbor vectors.

• Pool: transform the neighbor vectors and apply a symmetric vector function (element-wise mean/max).

• LSTM: apply an LSTM to a random permutation of the neighbors.
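A sketch of the GraphSAGE-style update with the mean aggregator: aggregate the neighbors, concatenate with the node's own embedding, then apply a linear map and a non-linearity. The graph, features and weights are made up, and the neighbor sampling and output L2-normalization used in the full method are omitted.

```python
import numpy as np

neighbors = {0: [1, 2], 1: [0], 2: [0]}
H = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
rng = np.random.default_rng(1)
d_in, d_out = H.shape[1], 3
W = rng.normal(size=(2 * d_in, d_out))   # acts on [self || aggregated neighbors]
sigma = lambda a: np.maximum(a, 0.0)

H_new = np.zeros((len(neighbors), d_out))
for v, neigh in neighbors.items():
    agg = np.mean([H[u] for u in neigh], axis=0)   # mean aggregator (AGG)
    concat = np.concatenate([H[v], agg])           # self || neighbors
    H_new[v] = sigma(concat @ W)
```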

Graph Attention Network (GAT)

• The contributions of neighboring nodes to the central node are neither identical (as in GraphSAGE) nor pre-determined (as in GCN).

• GAT learns the relative weights between two connected nodes.

51

P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," in Proc. of ICLR, 2018.
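A rough single-head sketch of the attention mechanism described in the cited paper: attention logits come from a shared linear map and a learned vector a applied to the concatenated pair, pass through a LeakyReLU, and are softmax-normalized over each node's neighborhood (self-loops included). All numeric values below are invented.

```python
import numpy as np

neighbors = {0: [0, 1, 2], 1: [1, 0], 2: [2, 0]}   # self-loops included
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
rng = np.random.default_rng(2)
W = rng.normal(size=(2, 3))         # shared linear transform
a = rng.normal(size=(2 * 3,))       # attention vector on [Wh_i || Wh_j]
leaky = lambda x: np.where(x > 0, x, 0.2 * x)

Wh = H @ W
H_new = np.zeros_like(Wh)
for i, neigh in neighbors.items():
    # Unnormalized attention logits e_ij = LeakyReLU(a^T [Wh_i || Wh_j])
    e = np.array([leaky(np.concatenate([Wh[i], Wh[j]]) @ a) for j in neigh])
    alpha = np.exp(e) / np.exp(e).sum()            # softmax over the neighborhood
    # Weighted combination of neighbors replaces the uniform average
    H_new[i] = sum(a_j * Wh[j] for a_j, j in zip(alpha, neigh))
```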

GNN as an encoder

52

Applications and losses

• Let us recall that Z is the output of the GNN.

53

Unsupervised learning

• The graph factorization problem is the problem of predicting whether two nodes are linked or not.

• The loss involves a similarity measure between two node embeddings (e.g., the dot product z_u · z_v).

• Variations of graph factorization:
  • DeepWalk [Perozzi et al., 2014]
  • node2vec [Grover and Leskovec, 2016]

54

Unsupervised embedding: Graph factorization

• Use stochastic gradient descent (SGD) as a general optimization method.

The graph factorization problem is the problem of predicting if two nodes are linked or not.

55
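A minimal sketch of a graph-factorization style objective in this spirit: push the dot product of embeddings of linked nodes toward 1 and of non-linked pairs toward 0, here with a squared-error loss (the exact loss form is an assumption for illustration) optimized by gradient descent directly on Z. In a GNN, Z would be the encoder output and the gradients would flow back into the encoder parameters.

```python
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(3)
Z = rng.normal(scale=0.1, size=(3, 2))   # stands in for the GNN output

lr = 0.02
for step in range(1000):
    S = Z @ Z.T                 # predicted similarities (dot-product decoder)
    R = S - A                   # residual w.r.t. "linked or not"
    grad = 4 * R @ Z            # d/dZ ||Z Z^T - A||_F^2 (A and Z Z^T are symmetric)
    Z -= lr * grad

# Z Z^T now approximates A: the connected pairs (0,1) and (1,2)
# score higher than the non-edge (0,2).
print((Z @ Z.T).round(2))
```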

Supervised learning

• Node classification: e.g., in a social network.

56

Supervised learning

• Node classification :

57

Node-wise classification

58

Supervised learning: node classification

• Last layer: the activation function of the last layer is a softmax.

• The loss: the cross-entropy between the predicted distribution and the true node labels.

59

(The softmax corresponds to a Gibbs distribution over the classes.)
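A sketch of this classification head: a softmax over the final node embeddings Z and a cross-entropy loss over the labelled nodes; all values are invented.

```python
import numpy as np

Z = np.array([[2.0, 0.1, -1.0],     # one row of class scores per node
              [0.3, 1.5,  0.2],
              [-0.5, 0.1, 2.2]])
y = np.array([0, 1, 2])             # ground-truth class of each node

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

P = softmax(Z)                                    # class probabilities per node
loss = -np.log(P[np.arange(len(y)), y]).mean()    # cross-entropy over labelled nodes
print(loss)
```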

Supervised learning: graph classification

• A global average pooling layer must be added to gather all the node embeddings of a given graph into a single vector Z'.

• Z' can be fed to an MLP for classification.

• The loss is the cross-entropy.

60
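A sketch of the readout described above: global average pooling of the node embeddings of each graph gives one vector Z' per graph, which can then be fed to an MLP; sizes and values are arbitrary.

```python
import numpy as np

# Node embeddings Z for two graphs, stored with a graph id per node
Z = np.array([[1., 2.],
              [3., 4.],
              [5., 6.],
              [7., 8.]])
graph_id = np.array([0, 0, 0, 1])   # nodes 0-2 belong to graph 0, node 3 to graph 1

# Global average pooling: one embedding per graph
Z_prime = np.stack([Z[graph_id == g].mean(axis=0) for g in np.unique(graph_id)])
print(Z_prime)   # shape (2, 2); can now be fed to an MLP + cross-entropy
```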

Semi-supervised learning: node classification

• The problem of classifying nodes in a graph where labels are only available for a small subset of nodes.

• Label information is smoothed over the graph via some form of explicit graph-based regularization.

• The assumption is that connected nodes in the graph are likely to share the same label (class).

• This holds, for instance, for a neighborhood graph.

61

So, 𝑙reg = ||𝑍𝑍ᵀ − 𝐴||²Frobenius.
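Combining the pieces as a sketch: a cross-entropy term on the few labelled nodes plus the graph regularizer l_reg = ||ZZᵀ − A||²_F above; the mixing weight λ and all numeric values are made up.

```python
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Z = np.array([[1.2, -0.3],          # GNN output: 3 nodes, 2 classes
              [0.4,  0.5],
              [-0.8, 1.1]])
labeled = np.array([0])             # only node 0 is labelled
y = np.array([0])                   # its class
lam = 0.1                           # regularization weight (arbitrary)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

P = softmax(Z)
l_sup = -np.log(P[labeled, y]).mean()            # supervised part, labelled nodes only
l_reg = np.linalg.norm(Z @ Z.T - A, 'fro') ** 2  # graph-based regularization
loss = l_sup + lam * l_reg
```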

Some applications

• Node classification on citation networks: the input is a citation network where nodes are papers, edges are citation links, optionally with bag-of-words features on the nodes. The target for each node is a paper category (e.g., stat.ML, cs.LG, …).

62

Image/molecule classification

• Graph classification: an image is represented as a graph, either from raw pixels (a regular grid, so all images share the same graph) or from superpixels (an irregular graph).

64

Graph classification

Taken from M. Bronstein, CVPR Tutorial, 2017

65

Examples on molecular graphs:
1. cancerous or non-cancerous molecules
2. determination of the boiling point

Summary on GNN

66

Recommended reading

• Tutorials:
  • Geometric Deep Learning, Tutorial, CVPR 2017. http://geometricdeeplearning.com/
  • Deep Learning on Graphs with Graph Convolutional Networks. http://deeploria.gforge.inria.fr/thomasTalk.pdf
  • Graph-based Methods in Pattern Recognition & Document Image Analysis. http://gmprdia.univ-lr.fr/

• List of papers:
  • Gilmer et al., Neural Message Passing for Quantum Chemistry, 2017. https://arxiv.org/abs/1704.01212
  • Kipf et al., Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017. https://arxiv.org/abs/1609.02907
  • Defferrard et al., Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, NIPS 2016. https://arxiv.org/abs/1606.09375
  • Bruna et al., Spectral Networks and Locally Connected Networks on Graphs, ICLR 2014. https://arxiv.org/abs/1312.6203
  • Duvenaud et al., Convolutional Networks on Graphs for Learning Molecular Fingerprints, NIPS 2015. https://arxiv.org/abs/1509.09292
  • Li et al., Gated Graph Sequence Neural Networks, ICLR 2016. https://arxiv.org/abs/1511.05493
  • Battaglia et al., Interaction Networks for Learning about Objects, Relations and Physics, NIPS 2016. https://arxiv.org/abs/1612.00222
  • Kearnes et al., Molecular Graph Convolutions: Moving Beyond Fingerprints, 2016. https://arxiv.org/abs/1603.00856

67
