geometric representations of graphs and words …people.dsv.su.se/~henke/dsws/devdatt.pdf ·...
TRANSCRIPT
Geometric Representations of Graphs and Words and Applications
Devdatt Dubhashi
Computer Science and Engg., Chalmers and GU
Machine Learning, Algorithms, Computational Biology Group
www.cs.chalmers.se/research/lab
Data Science, Stockholm
Dec 4–5, 2014
DD () | Geometric Representations of Graphs and Words and Applications | Data Science, Stockholm | Dec 4–5, 2014 | 1 / 32
“Era of Big Data is Here”
Large-scale, time-varying, heterogeneous, inter-related, etc.
Geometric Embeddings: Unifying Framework for Learning
Embed graph or word or other data in high (moderate) dimensional
Euclidean space in a way that preserves structure.
Can exploit geometry to design fast algorithms.
Can exploit arsenal of machine learning techniques that work on vector
structured data.
Lovasz ϑ and orthogonal labellings
[Figure: graph on nodes u1, ..., u5, shown in animation steps 1 to 5]
Orthogonal Labelling
$U = [u_1, \dots, u_n]$ is an orthogonal labelling of G if
$u_i^\top u_j = 0$ whenever $(i, j) \notin E$
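The definition can be checked mechanically. The following is a minimal sketch in plain Python; the function name `is_orthogonal_labelling`, the edge-list input format and the tolerance `tol` are assumptions made for illustration and do not come from the talk.

```python
from itertools import combinations

def is_orthogonal_labelling(U, edges, tol=1e-9):
    """Return True if u_i . u_j == 0 (up to tol) for every non-edge (i, j).

    U     : list of vectors (one per node), all of the same length
    edges : iterable of undirected edges (i, j) with 0-based node indices
    """
    edge_set = {frozenset(e) for e in edges}
    for i, j in combinations(range(len(U)), 2):
        if frozenset((i, j)) in edge_set:
            continue  # adjacent nodes are unconstrained
        dot = sum(a * b for a, b in zip(U[i], U[j]))
        if abs(dot) > tol:
            return False
    return True

# Path 0 - 1 - 2: the only non-edge is (0, 2), so u_0 and u_2 must be orthogonal.
path_edges = [(0, 1), (1, 2)]
good = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
bad = [[1.0, 0.0], [0.6, 0.8], [1.0, 0.0]]
print(is_orthogonal_labelling(good, path_edges))  # True
print(is_orthogonal_labelling(bad, path_edges))   # False
```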
Lovasz ϑ and orthogonal labellings
[Figure: unit vectors u1, ..., u5 arranged around the handle c; annotation 1/√ϑ]
$$\vartheta(G) = \min_{U} \; \min_{\|c\| = 1} \; \max_{i} \; \frac{1}{(c^\top u_i)^2}$$
$\cos^{-1}(1/\sqrt{\vartheta})$: the maximum angle between the “handle” c and any of the $u_i$,
minimized over all valid orthogonal representations and handles.
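As a worked instance of this formula, the sketch below builds the classical "umbrella" labelling of the 5-cycle and evaluates the inner expression for it. The helper name `handle_value` and the coordinate layout are assumptions made for illustration. The code only evaluates one labelling and one handle, which gives an upper bound on ϑ; that this bound equals the known value ϑ(C5) = √5 is a standard result, not something the code proves.

```python
import math

def handle_value(U, c):
    """max_i 1 / (c . u_i)^2 for unit vectors U and a unit handle c."""
    return max(1.0 / sum(a * b for a, b in zip(c, u)) ** 2 for u in U)

# Umbrella construction for the 5-cycle: handle along the z-axis, five unit
# "ribs" at equal angle theta from it, opened until non-adjacent ribs
# (two steps apart on the cycle) become orthogonal.
t = -math.cos(4 * math.pi / 5)          # = (1 + sqrt(5)) / 4
cos_theta = math.sqrt(t / (1 + t))      # makes u_k . u_{k+2} = 0
sin_theta = math.sqrt(1 - cos_theta ** 2)
U = [
    (sin_theta * math.cos(2 * math.pi * k / 5),
     sin_theta * math.sin(2 * math.pi * k / 5),
     cos_theta)
    for k in range(5)
]
c = (0.0, 0.0, 1.0)

# Non-adjacent pairs of the 5-cycle are (k, k+2 mod 5).
max_violation = max(
    abs(sum(a * b for a, b in zip(U[k], U[(k + 2) % 5]))) for k in range(5)
)
value = handle_value(U, c)
print(max_violation < 1e-12, value)  # value is sqrt(5) up to rounding
```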
Applications of ϑ(G)
Finding large independent sets/cliques
Graph coloring
Finding planted cliques in random graphs
Finding maxcut
· · ·
Problem
Can't compute ϑ for graphs with more than a few hundred nodes!
SDP: “Efficient” in theory but disastrous in practice.
Approximating ϑ(G) by ω(K)
Vinay Jethava, Ph.D. (2014)
Given $K = \frac{A}{\rho} + I$, find the best handle c ⇔ one-class SVM
[Figure: unit vectors u1, ..., u5 arranged around the handle c; annotation 1/√ω]
$$\omega(K) = \max_{x_i \ge 0} \; 2\sum_{i=1}^{n} x_i - \sum_{i}\sum_{j} x_i x_j K_{ij}$$
How close is the one-class SVM solution ω(K) to ϑ(G)?
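For one small graph the question can be probed numerically. The sketch below maximizes the one-class SVM objective by projected gradient ascent on the 5-cycle. Several things are assumptions of this sketch and not taken from the talk: the kernel is taken to be K = A/ρ + I with ρ = −λ_min(A) (which makes K positive semidefinite), the solver is a plain gradient method and not an SVM package, and the function name `omega`, the step size and the iteration count are illustrative.

```python
import math

def omega(K, steps=2000, lr=0.1):
    """Approximate max_{x >= 0} 2*sum(x) - x^T K x by projected gradient ascent."""
    n = len(K)
    x = [0.0] * n
    for _ in range(steps):
        Kx = [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Gradient of the objective is 2 - 2*K x (K symmetric); project onto x >= 0.
        x = [max(0.0, x[i] + lr * (2.0 - 2.0 * Kx[i])) for i in range(n)]
    Kx = [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]
    return 2.0 * sum(x) - sum(x[i] * Kx[i] for i in range(n))

# 5-cycle adjacency matrix.
n = 5
A = [[1.0 if (i - j) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
# The smallest adjacency eigenvalue of the 5-cycle is 2*cos(4*pi/5);
# rho = -lambda_min makes K = A/rho + I positive semidefinite.
rho = -2.0 * math.cos(4.0 * math.pi / 5.0)
K = [[A[i][j] / rho + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

print(omega(K))  # close to sqrt(5), the value of theta for the 5-cycle
```

On this highly symmetric graph the two quantities coincide; the talk's question concerns how far apart they can be in general.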
[Figure: handle c with the annotations 1/√ω and 1/√ϑ shown side by side]
Using the SVM-ϑ Theory
Vinay Jethava, Ph.D. 2014
Solving classic combinatorial optimization problems on graphs:
maxcut, max-k-cut, graph colouring, replacing SDP by SVM.
Finding planted cliques.
Integrative analysis of networks: finding common dense subgraphs in
multiple graphs using multiple kernel learning.
Google's word2vec: Words and Contexts
A context $c_{w_i}$ of an occurrence of a word $w_i$ in a text $w_1, \dots, w_i, \dots, w_N$ is a
small window of words $w_{i-t}, \dots, w_{i-1}, w_{i+1}, \dots, w_{i+t}$ occurring around
$w_i$.
word2vec assigns vectors $u_w, u_c$ to words and contexts using a simple
logistic regression model based on the dot product $u_w^\top u_c$.
Completely unsupervised method that scales to very large corpora, e.g.
Wikipedia.
Words that occur in the same kinds of contexts will get assigned
similar vectors.
Surprising ability to capture semantic information (not very well
understood).
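The two ingredients named above, context windows and a logistic model on a dot product, can be sketched in a few lines of plain Python. The function names `contexts` and `pair_probability` are assumptions made for illustration, and the sketch leaves open how vectors are trained and how a whole window is mapped to a single context vector; it is not the word2vec implementation.

```python
import math

def contexts(tokens, t):
    """Yield (word, context) pairs; the context is up to t words on each side."""
    for i, w in enumerate(tokens):
        left = tokens[max(0, i - t):i]
        right = tokens[i + 1:i + 1 + t]
        yield w, left + right

def pair_probability(u_w, u_c):
    """Logistic model: sigmoid of the dot product u_w^T u_c."""
    dot = sum(a * b for a, b in zip(u_w, u_c))
    return 1.0 / (1.0 + math.exp(-dot))

tokens = "the bass line of the song".split()
pairs = list(contexts(tokens, t=1))
print(pairs[1])                                  # ('bass', ['the', 'line'])
print(pair_probability([1.0, 0.0], [0.0, 1.0]))  # 0.5, since the dot product is 0
```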
word2vec
Extending word2vec
Multiple vectors corresponding to different senses of a word.
graph2vec uses the same idea for embeddings of graph-structured data.
Applications
Language Technology
(VR Framework Project “Culturomics”, 2013–17, 17 M SEK)
Business Intelligence
(SSF Data Intensive Systems, 2012-16, 25 M SEK)
Computational Biology
(VR project 2011-15)
Word Sense Disambiguation (WSD)
Fundamental problem in language technology:
I went fishing for some sea bass.
The bass line of the song is too weak.
Using word vectors for WSD
(Fredrik Johansson, Mikael Kageback, Richard Johansson, 2014, in
progress)
Train multiple vectors corresponding to different senses based on the
different contexts (also represented by vectors).
Cluster the contexts in a non-parametric, data-driven fashion.
Use clusters to assign senses, possibly using semantic networks such as
WordNet or BabelNet.
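One simple way to cluster context vectors without fixing the number of senses in advance is threshold-based incremental clustering, sketched below. This is an illustrative stand-in, not the method of the work cited on this slide; the function names, the cosine threshold of 0.5 and the toy vectors are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster_contexts(context_vectors, threshold=0.5):
    """Assign each context vector to the most similar cluster centroid, opening
    a new cluster (a new sense) when no centroid is similar enough."""
    centroids, counts, labels = [], [], []
    for v in context_vectors:
        sims = [cosine(v, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else None
        if best is None or sims[best] < threshold:
            centroids.append(list(v))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            # Running mean update of the chosen centroid.
            centroids[best] = [c + (x - c) / counts[best]
                               for c, x in zip(centroids[best], v)]
            labels.append(best)
    return labels

# Toy context vectors: two "fish" contexts and two "music" contexts for "bass".
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(cluster_contexts(vectors))  # [0, 0, 1, 1]
```

The number of clusters is determined by the data and the threshold, which is the sense in which the sketch is non-parametric.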
Multi-document Summarization
Fundamental problem in information extraction:
word2vec for Multi-document Summarization
(Olof Mögren, Mikael Kageback and Vinay Jethava 2014)
Measure similarity between sentences using similarity of vectors for
their constituent words.
Use this similarity in submodular optimization of coherence and
diversity of summary.
Can combine with other similarities using multiple kernel learning
(MKL).
Tool available for use at Findwise Labs.
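The pipeline described on this slide can be sketched as follows: sentence vectors from word vectors, a similarity matrix, and greedy maximization of a submodular objective. The objective used here is the facility-location function sum_i max_{j in S} sim(i, j), chosen as a standard monotone submodular example; it is an assumption of this sketch, as are the averaging of word vectors, the function names and the toy vectors. None of it is claimed to match the cited system.

```python
import math

def mean_vector(words, word_vectors):
    """Average the vectors of the known words; unknown words are skipped."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def greedy_summary(sentences, word_vectors, budget):
    """Greedily pick `budget` sentences maximising sum_i max_{j in S} sim(i, j)."""
    vecs = [mean_vector(s.split(), word_vectors) for s in sentences]
    n = len(sentences)
    sim = [[max(0.0, cosine(vecs[i], vecs[j])) for j in range(n)] for i in range(n)]
    chosen, covered = [], [0.0] * n
    for _ in range(min(budget, n)):
        def gain(j):
            # Marginal increase of the objective when sentence j is added.
            return sum(max(covered[i], sim[i][j]) - covered[i] for i in range(n))
        best = max((j for j in range(n) if j not in chosen), key=gain)
        chosen.append(best)
        covered = [max(covered[i], sim[i][best]) for i in range(n)]
    return [sentences[j] for j in chosen]

word_vectors = {"cats": [1.0, 0.0], "dogs": [0.9, 0.1],
                "stocks": [0.0, 1.0], "bonds": [0.1, 0.9]}
sentences = ["cats chase dogs", "dogs chase cats", "stocks and bonds fell"]
summary = greedy_summary(sentences, word_vectors, budget=2)
print(summary)  # one of the two pet sentences plus the finance sentence
```

The second pick skips the near-duplicate pet sentence because its marginal gain is zero, which is how the objective rewards diversity.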
Entity Disambiguation (ED)
“Chris Anderson” (TED) vs. “Chris Anderson” (WIRED)
[Figure: graph G with nodes v1, v2, subgraphs G1, G2, and kernel value K(G1, G2)]
Comparing graphs
Graph kernels compare how similar two graphs are
$K : \mathcal{G} \times \mathcal{G} \to \mathbb{R}$
Defined using features extracted from subgraphs
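A minimal sketch of such a kernel, assuming a very simple feature map: each graph is represented by the histogram of its shortest-path lengths, and the kernel is the dot product of two histograms. This is a simplified illustration in the spirit of path-based kernels, not a reimplementation of any published kernel; the function names and the adjacency-dict input format are assumptions.

```python
from collections import Counter, deque

def shortest_path_histogram(adj):
    """Count unordered node pairs by shortest-path length (BFS from every node)."""
    hist = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for target, d in dist.items():
            if d > 0:
                hist[d] += 1
    # Every pair was counted from both endpoints.
    return Counter({d: c // 2 for d, c in hist.items()})

def histogram_kernel(adj1, adj2):
    """K(G1, G2) = dot product of the two path-length histograms."""
    h1, h2 = shortest_path_histogram(adj1), shortest_path_histogram(adj2)
    return sum(h1[d] * h2[d] for d in h1)

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(histogram_kernel(triangle, triangle))  # 9
print(histogram_kernel(triangle, path3))     # 6
```

Because the kernel is an inner product of explicit feature vectors, it is symmetric and positive semidefinite by construction.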
Graph kernels
Existing graph kernels
Random Walks (Gärtner et al., '03)
Shortest paths (Borgwardt & Kriegel, '05)
Graphlets (Shervashidze, Mehlhorn et al., '09)
· · ·
“Local” subgraphs cannot capture global properties:
girth (length of the shortest cycle)
chromatic number χ(G)
maxcut
Kernels based on ϑ embedding
[Figure, three panels: (a) ϑ(G); (b) Subgraph $U_{G|B}$; (c) Lovasz value $\vartheta_B$]
Fredrik Johansson, Lic. 2014
$$K(G^{(1)}, G^{(2)}) = \sum_{(B_1, B_2),\; |B_1| = |B_2|} k(\vartheta_{B_1}, \vartheta_{B_2})$$
where $B_i$ is a subgraph of $G^{(i)}$.
Computing Efficiently
Problems
The Lovasz-ϑ kernel involves a sum over all subgraphs: too
expensive!
Computing ϑ(G) has complexity $O(n^5)$, where n is the number of
nodes. The optimization is a semidefinite program.
Solutions (Johansson et al., 2014)
Random sampling! $O(n \log n / \epsilon^2)$ samples are enough.
Use the SVM-ϑ approximation.
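The sampling idea can be sketched as a Monte Carlo estimate: draw random node subsets from each graph, evaluate a per-subgraph value, and average a base kernel over sampled pairs of equal size. Everything specific here is an assumption of the sketch and not taken from the cited work: the uniform sampling scheme, the RBF base kernel, the normalization, the function names, and above all the `value` callback, which stands in for the Lovasz value of the induced subgraph (the demo uses the induced edge count only so that the code runs without an SDP or SVM solver).

```python
import math
import random

def induced_edge_count(adj, nodes):
    """Stand-in subgraph value: number of edges inside the node subset."""
    nodes = set(nodes)
    return sum(len(adj[v] & nodes) for v in nodes) // 2

def sampled_subgraph_features(adj, value, num_samples, rng):
    """Sample node subsets B uniformly at random and record (|B|, value of G[B])."""
    nodes = sorted(adj)
    feats = []
    for _ in range(num_samples):
        size = rng.randint(1, len(nodes))
        B = rng.sample(nodes, size)
        feats.append((size, value(adj, B)))
    return feats

def sampled_kernel(adj1, adj2, value, num_samples=50, sigma=1.0, seed=0):
    """Average an RBF base kernel over sampled subgraph pairs of equal size."""
    rng = random.Random(seed)
    f1 = sampled_subgraph_features(adj1, value, num_samples, rng)
    f2 = sampled_subgraph_features(adj2, value, num_samples, rng)
    total = 0.0
    for s1, v1 in f1:
        for s2, v2 in f2:
            if s1 == s2:
                total += math.exp(-((v1 - v2) ** 2) / (2 * sigma ** 2))
    return total / (num_samples ** 2)

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
k_value = sampled_kernel(triangle, path3, induced_edge_count)
print(k_value)
```

Replacing `induced_edge_count` with a function that returns the SVM-ϑ value of the induced subgraph would combine both solutions listed on the slide.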
Classifying Graphs with Planted Structures
[Figure, two panels: (a) Random graph; (b) Graph with planted clique]
Theorem (F. Johansson, ICML 2014)
With high probability, there is a linear separator separating G(n, p) and
G(n, p, k) graphs, for large enough $k = 2t\sqrt{\frac{n(1-p)}{p}}$, with margin
$\gamma \ge O(t) - o(\sqrt{n})$
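The two graph families in the theorem can be generated as follows. The sketch reads G(n, p, k) as a G(n, p) graph with a clique planted on k randomly chosen nodes, which is an interpretation of the slide's notation; the function names and the edge-set representation are assumptions. The sketch only produces instances and does not implement the separator from the theorem.

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """Erdos-Renyi G(n, p): each of the n*(n-1)/2 edges is present with probability p."""
    return {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}

def gnp_planted_clique(n, p, k, rng):
    """G(n, p, k): a G(n, p) graph plus a clique on k randomly chosen nodes."""
    edges = gnp(n, p, rng)
    clique = rng.sample(range(n), k)
    edges |= {frozenset(e) for e in combinations(clique, 2)}
    return edges, sorted(clique)

rng = random.Random(0)
edges, clique = gnp_planted_clique(n=30, p=0.5, k=6, rng=rng)
print(all(frozenset(e) in edges for e in combinations(clique, 2)))  # True
```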
Results on graph kernel benchmarks
Table: Average classification accuracy (%) on benchmark datasets. The best result in each column is marked with *.
Kernels   PTC     MUTAG   ENZYME   NCIA
SP        63.0    87.2    30.5     67.3*
GL        63.1    83.5    26.7     62.9
RW        60.6    85.6    21.2     63.1
Lo-ϑ      64.3*   86.2    26.5     65.2
SVM-ϑ     63.8    87.8*   33.5*    62.7
Entity Disambiguation (ED)
[Figure: graph G with nodes v1, v2, subgraphs G1, G2, and kernel value K(G1, G2)]
CIKM 2013 paper and tool implemented in Recorded Future pipeline.
Integrative Analysis/Data Fusion
How to combine multiple sources of evidence?
How to combine measurements of different data types, e.g. RNA-seq,
microarray, copy number variation, ...?
How to find what is common and what is different across multiple
networks?
Common dense subgraphs in multiple graphs
⇒ Integrative analysis of microarray gene expression datasets
Detecting motifs
Identifying functional groups
(Vinay Jethava et al, NIPS 2013, JMLR 2014)
Multiple kernel learning
Kernel: similarity matrix between objects of a certain type.
MKL: data fusion of multiple sources of information.
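The basic fusion step can be sketched as a non-negative weighted sum of kernel matrices over the same objects. In multiple kernel learning the weights are learned from data; here they are fixed by hand, which is a simplification of this sketch. The function name `combine_kernels`, the matrix names and the toy values are assumptions.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of kernel (similarity) matrices over the same objects.

    With non-negative weights, a sum of positive semidefinite kernels is again
    positive semidefinite, so the result is a valid kernel.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

# Two toy similarity matrices over the same three objects, e.g. from two data sources.
K_expr = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
K_net = [[1.0, 0.2, 0.6], [0.2, 1.0, 0.4], [0.6, 0.4, 1.0]]
K = combine_kernels([K_expr, K_net], weights=[0.5, 0.5])
print(K[0][1])  # 0.5 up to rounding
```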
Summary
Geometric representations
are a unifying framework for learning problems from diverse contexts.
www.cs.chalmers.se/research/lab