TRANSCRIPT
Deep Learning on Graphs with Graph Convolutional Networks
Thomas Kipf, 22 March 2017
[Figure: GCN schematic: input graph → hidden layers with ReLU activations → output]
joint work with Max Welling (University of Amsterdam)
BDL Workshop @ NIPS 2016; conference paper @ ICLR 2017 (+ new paper on arXiv)
Deep Learning on Graph-Structured Data Thomas Kipf
The success story of deep learning
Speech data
Natural language processing (NLP)
…
Deep neural nets that exploit:
- translation invariance (weight sharing)
- hierarchical compositionality
Recap: Deep learning on Euclidean data
Euclidean data: grids, sequences…
2D grid
1D grid
We know how to deal with this:
Convolutional neural networks (CNNs)
(Animation by Vincent Dumoulin) (Source: Wikipedia)
or recurrent neural networks (RNNs)
(Source: Christopher Olah’s blog)
Convolutional neural networks (on grids)
(Animation by Vincent Dumoulin)
Single CNN layer with 3x3 filter:
…
Update for a single pixel:
• Transform neighbors individually
• Add everything up
Full update:
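The two steps above can be sketched in NumPy (my own minimal illustration, not the talk's code; the image, filter size, and weights are made-up placeholders): each of the 9 positions in the 3x3 window gets its own weight matrix, and the transformed neighbors are summed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 5x5 image with 2 channels, 3x3 filter, 4 output channels.
H, W, C_in, C_out = 5, 5, 2, 4
image = rng.normal(size=(H, W, C_in))

# One weight matrix per position in the 3x3 window ("transform neighbors individually").
weights = rng.normal(size=(3, 3, C_in, C_out))

def update_pixel(img, i, j, w):
    """Single CNN-layer update for pixel (i, j): transform each neighbor, add up."""
    out = np.zeros(w.shape[-1])
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if 0 <= ni < img.shape[0] and 0 <= nj < img.shape[1]:
                out += img[ni, nj] @ w[di + 1, dj + 1]   # add everything up
    return out

new_center = update_pixel(image, 2, 2, weights)
print(new_center.shape)  # (4,)
```

The full update applies the same shared weights at every pixel, which is exactly the weight sharing that the graph version will need to reproduce.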
Graph-structured data
What if our data looks like this?
Real-world examples:
• Social networks
• World-wide web
• Protein-interaction networks
• Telecommunication networks
• Knowledge graphs
• …
Graphs: Definitions
7
• Set of trainable parameters • Trainable in time • Applicable even if the input graph changes
Model wish list:
… …
…
Input
Hidden layer Hidden layer
ReLU
Output
ReLUGraph:
: Set of nodes ,
: Set of edges
(adjacency matrix):
We can define:
(can also be weighted)
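For concreteness, an adjacency matrix for a small undirected graph can be built like this (an illustrative sketch; the node count and edge list are made up):

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, undirected edges.
num_nodes = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

A = np.zeros((num_nodes, num_nodes))
for i, j in edges:
    A[i, j] = 1.0   # A_ij = 1 if there is an edge (v_i, v_j)
    A[j, i] = 1.0   # undirected graph: A is symmetric
print(A)
```

For a weighted graph one would store edge weights instead of 1.0.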
A naive approach
• Take adjacency matrix A and feature matrix X
• Concatenate them
• Feed them into a deep (fully connected) neural net
• Done?
Problems:
• Huge number of parameters
• Needs to be re-trained if number of nodes changes
• Does not generalize across graphs
We need weight sharing!
=> CNNs on graphs, or “Graph Convolutional Networks” (GCNs)
CNNs on graphs with spatial filters
Consider this undirected graph:
Calculate update for node in red:
How is this related to spectral CNNs on graphs? ➡ Localized 1st-order approximation of spectral filters [Kipf & Welling, ICLR 2017]
Update rule: h_i^(l+1) = σ( h_i^(l) W_0^(l) + Σ_{j ∈ N_i} (1/c_ij) h_j^(l) W_1^(l) )
N_i: neighbor indices of node i
c_ij: norm. constant (per edge)
(related idea was first proposed in Scarselli et al. 2009)
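A minimal NumPy sketch of a per-node update of this kind (my own illustration, not the talk's code): a separate weight matrix for the self-connection, neighbor messages scaled by a per-edge normalization constant, here assumed to be c_ij = sqrt(d_i * d_j); graph, features, and weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph as an adjacency matrix (hypothetical example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)

F_in, F_out = 3, 2
H = rng.normal(size=(A.shape[0], F_in))   # h_i: current node features
W0 = rng.normal(size=(F_in, F_out))       # weights for the self-connection
W1 = rng.normal(size=(F_in, F_out))       # shared weights for neighbors

def relu(x):
    return np.maximum(x, 0)

def update_node(i):
    """h_i <- sigma( h_i W0 + sum_{j in N_i} (1/c_ij) h_j W1 ), c_ij = sqrt(d_i d_j)."""
    neighbors = np.nonzero(A[i])[0]       # N_i: neighbor indices
    msg = sum(H[j] / np.sqrt(deg[i] * deg[j]) for j in neighbors)
    return relu(H[i] @ W0 + msg @ W1)

h2_new = update_node(2)
print(h2_new.shape)  # (2,)
```

The same shared W0, W1 are applied at every node, regardless of its degree, which is what makes the layer applicable to graphs of any size.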
Vectorized form
H^(l+1) = σ( H^(l) W_0^(l) + D^(-1) A H^(l) W_1^(l) ),  with  D_ii = Σ_j A_ij
Or treat self-connections in the same way:
H^(l+1) = σ( D̂^(-1/2) Â D̂^(-1/2) H^(l) W^(l) ),  with  Â = A + I_N  and  D̂_ii = Σ_j Â_ij
Â is typically sparse
➡ We can use sparse matrix multiplications!
➡ Efficient implementation in Theano or TensorFlow
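In SciPy, this sparse, vectorized propagation might look as follows (a sketch under the Â = A + I construction described on the slide; graph, features, and weights are made-up placeholders, and this is not the talk's actual implementation):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Hypothetical sparse graph: 4 nodes.
A = sp.csr_matrix(np.array([[0, 1, 1, 0],
                            [1, 0, 1, 0],
                            [1, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))

# A_hat = A + I: treat self-connections like ordinary edges.
A_hat = A + sp.eye(A.shape[0])

# Symmetric normalization: D_hat^{-1/2} A_hat D_hat^{-1/2}.
d = np.asarray(A_hat.sum(axis=1)).flatten()
D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# One GCN layer: H' = ReLU(A_norm H W); the sparse matmul is the cheap part.
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
H_next = np.maximum(A_norm @ H @ W, 0)
print(H_next.shape)  # (4, 2)
```

Because A_norm stays sparse, the cost of a layer scales with the number of edges rather than the square of the number of nodes.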
Model architecture (with spatial filters)
[Figure: GCN architecture: input graph → hidden layers with ReLU activations → output]
Input: feature matrix X, preprocessed adjacency matrix Â (or more sophisticated filters / basis functions)
[Kipf & Welling, ICLR 2017]
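Putting the pieces together, a forward pass through an architecture of this shape (input → hidden layers with ReLU → output) could be sketched like this; the dimensions, weights, and the `normalize_adj`/`gcn_forward` helper names are my own illustrative choices, not the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    """Preprocess: A_hat = A + I, then D_hat^{-1/2} A_hat D_hat^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A, X, weights):
    """Stack of GCN layers; ReLU between hidden layers (sketch)."""
    A_norm = normalize_adj(A)
    H = X
    for l, W in enumerate(weights):
        H = A_norm @ H @ W
        if l < len(weights) - 1:   # no ReLU after the output layer
            H = np.maximum(H, 0)
    return H

# Hypothetical example: 4 nodes, 3 input features, one hidden layer, 2 outputs.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
weights = [rng.normal(size=(3, 8)), rng.normal(size=(8, 2))]
Z = gcn_forward(A, X, weights)
print(Z.shape)  # (4, 2)
```

Note that Â is computed once from the graph and reused across all layers; only the weight matrices differ per layer.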
What does it do? An example.
Forward pass through an untrained 3-layer GCN model, parameters initialized randomly; 2-dim output per node.
[Figure: f(Karate Club network) = 2-dim node embeddings]
Produces (useful?) random embeddings!
Connection to Weisfeiler-Lehman algorithm
A “classical” approach for node feature assignment
Useful as a graph isomorphism check for most graphs (exception: highly regular graphs)
GCN: h_i^(l+1) = σ( Σ_{j ∈ N_i} (1/c_ij) h_j^(l) W^(l) ), a differentiable analogue of the WL node-relabeling step
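The classical 1-dimensional Weisfeiler-Lehman procedure (color refinement) repeatedly hashes each node's label together with the multiset of its neighbors' labels. A small self-contained sketch (my own illustration; the graph is made up, and a signature tuple plus relabeling stands in for the hash):

```python
# 1-dim Weisfeiler-Lehman color refinement (illustrative sketch).
def wl_refine(adj, labels, num_iters=3):
    """adj: dict node -> neighbor list; labels: dict node -> initial label."""
    for _ in range(num_iters):
        new_labels = {}
        for v, neigh in adj.items():
            # "Hash" of own label plus the sorted multiset of neighbor labels.
            signature = (labels[v], tuple(sorted(labels[u] for u in neigh)))
            new_labels[v] = signature
        # Compress signatures back to small integer labels.
        palette = {sig: i for i, sig in enumerate(sorted(set(new_labels.values())))}
        labels = {v: palette[sig] for v, sig in new_labels.items()}
    return labels

# Hypothetical toy graph: a path 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = wl_refine(adj, {v: 0 for v in adj})
print(labels)  # the two end nodes share one color, the two middle nodes another
```

The GCN layer has the same gather-from-neighbors structure, but replaces the discrete hash with a differentiable, parameterized transformation.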
Semi-supervised classification on graphs
Setting: some nodes are labeled (black circle); all other nodes are unlabeled
Task: Predict node label of unlabeled nodes
Standard approach: graph-based regularization (“smoothness constraints”):
L = L_0 + λ L_reg,  with  L_reg = Σ_{i,j} A_ij ||f(X_i) − f(X_j)||²
assumes: connected nodes are likely to share the same label
[Zhu et al., 2003]
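The smoothness penalty behind this classical approach, the sum of A_ij ||f(x_i) − f(x_j)||² over edges, equals a quadratic form in the graph Laplacian D − A, which a short check makes concrete (graph and predictions are made-up placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical graph and per-node predictions f(x_i).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f = rng.normal(size=(4, 1))   # scalar prediction per node

# Smoothness penalty: sum_ij A_ij * ||f_i - f_j||^2 ...
pairwise = sum(A[i, j] * np.sum((f[i] - f[j]) ** 2)
               for i in range(4) for j in range(4))

# ... equals 2 * f^T (D - A) f, with the graph Laplacian D - A
# (factor 2 because the sum runs over ordered pairs of a symmetric A).
L = np.diag(A.sum(axis=1)) - A
quadratic = 2 * float(f.T @ L @ f)

print(np.isclose(pairwise, quadratic))  # True
```

The penalty is small exactly when connected nodes get similar predictions, which is the smoothness assumption stated above.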
Semi-supervised classification on graphs
Embedding-based approaches use a two-step pipeline:
1) Get an embedding for every node
2) Train a classifier on the node embeddings
Examples: DeepWalk [Perozzi et al., 2014], node2vec [Grover & Leskovec, 2016]
Problem: the embeddings are not optimized for classification!

Idea: train a graph-based classifier end-to-end using a GCN.
Evaluate the loss on labeled nodes only:
L = − Σ_{l ∈ Y_L} Σ_f Y_lf ln Z_lf
Y_L: set of labeled node indices
Y: label matrix
Z: GCN output (after softmax)
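Evaluating the cross-entropy on labeled nodes only can be sketched like this (hypothetical arrays throughout; the Z rows are assumed to already be softmax outputs of a GCN):

```python
import numpy as np

# Hypothetical setup: 5 nodes, 3 classes; only nodes 0 and 3 are labeled.
labeled = [0, 3]                          # Y_L: set of labeled node indices
Y = np.zeros((5, 3))                      # one-hot label matrix
Y[0, 1] = 1.0
Y[3, 2] = 1.0

logits = np.random.default_rng(0).normal(size=(5, 3))
Z = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax output

# Cross-entropy evaluated on the labeled nodes only; the unlabeled
# nodes still influence Z through graph propagation, just not the loss.
loss = -np.sum(Y[labeled] * np.log(Z[labeled]))
print(loss > 0)  # True
```

This masking is what makes the setup semi-supervised: gradients flow through all nodes, but supervision comes from the labeled subset.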
Toy example (semi-supervised learning)
Video also available here: http://tkipf.github.io/graph-convolutional-networks
Classification on citation networks
(Figure from: Bronstein, Bruna, LeCun, Szlam, Vandergheynst, 2016)
Input: Citation networks (nodes are papers, edges are citation links, optionally bag-of-words features on nodes)
Target: Paper category (e.g. stat.ML, cs.LG, …)
Experiments and results
[Table: dataset statistics]
Model: 2-layer GCN
[Table: classification results (accuracy), including a variant with no input features]
[Figure: learned representations (t-SNE embedding of hidden layer activations)]
(Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017)
Other recent work
Spectral domain (global filters; hard to generalize):
• SyncSpecCNN (Spectral Transformer Nets) [Yi et al., 2016]
Spatial domain (local filters; limited field of view):
• Chebyshev CNN (approximate spectral filters in the spatial domain) [Defferrard et al., 2016]
• Mixture model CNN (learned basis functions) [Monti et al., 2016]
Conclusions and open questions
Results from the past 1-2 years have shown:
• The deep learning paradigm can be extended to graphs
• State-of-the-art results in a number of domains
• End-to-end training wins over multi-stage approaches (graph kernels, DeepWalk, node2vec, etc.)

Open problems:
• Learn basis functions from scratch (first step: [Monti et al., 2016])
• Robustness of filters to changes in graph structure
• Learn representations of whole graphs (first step: [Duvenaud et al., 2015])
• Theoretical understanding of what is learned
• Common benchmark datasets
• …
Further reading
Blog post Graph Convolutional Networks: http://tkipf.github.io/graph-convolutional-networks
Code on Github: http://github.com/tkipf/gcn
You can get in touch with me via:
• E-Mail: [email protected]
• Twitter: @thomaskipf
• Web: http://tkipf.github.io

Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017: https://arxiv.org/abs/1609.02907
Kipf & Welling, Variational Graph Auto-Encoders, NIPS BDL Workshop, 2016: https://arxiv.org/abs/1611.07308
Project funded by SAP