Deep Learning: Representation Learning
Machine Learning in Medicine
Asan Agibetov, [email protected]
Medical University of Vienna
Center for Medical Statistics, Informatics and Intelligent Systems
Section for Artificial Intelligence and Decision Support
Währinger Strasse 25A, 1090 Vienna, OG1.06
December 05, 2019
Supervised vs. Unsupervised
▶ Supervised: Given a dataset D = {(x, y)} of inputs x with targets y, learn to predict y from x (MLE)
  ▶ L(D) = ∑_{(x,y)∈D} − log p(y|x) (a toy computation is sketched below)
  ▶ Clear task: learn to predict y from x
▶ Unsupervised: Given a dataset D = {x} of inputs x, learn to predict… what?
  ▶ L(D) = ?
  ▶ We want a single task that will allow the network to generalize to many other tasks (which ones?)
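As a minimal, hypothetical illustration of the supervised objective (not from the lecture), the loss L(D) = ∑ −log p(y|x) can be computed directly from a model's predicted class probabilities:

```python
import numpy as np

# Toy predicted probabilities p(y|x) for 3 examples over 4 classes (made-up numbers)
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.20, 0.50, 0.20, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])
targets = np.array([0, 1, 3])          # true labels y for each input x

# Supervised MLE objective: L(D) = sum over (x, y) of -log p(y|x)
loss = -np.log(probs[np.arange(len(targets)), targets]).sum()
print(loss)
```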
Can we learn data?

▶ One way: density modeling
  ▶ L(D) = ∑_{x∈D} − log p(x)
▶ Goal: learn the true distribution (everything about the data)
▶ Issues:
  1. curse of dimensionality (complex interactions between variables)
  2. focus on low-level details (pixel correlations, word n-grams) rather than on “high-level” structure (image contents, semantics)
  3. even if we learn the underlying structure, how do we access and exploit that knowledge for future tasks (representation learning)?
Can we memorize randomized pixels?

▶ ImageNet training set ∼ 1.28M images, each assigned 1 of 1000 labels
▶ If labels are equally probable, shuffled labels contain ∼ log2(1000) × 1.28M ≈ 12.8 Mbits
▶ For good generalization a CNN has to learn ∼ 12.8 Mbits of information with ∼ 30M weights
  ▶ about 2 weights to learn 1 bit of information
▶ Each image is 128×128 pixels, so shuffled images contain ∼ 500 Gbits
▶ ∼ 50 × 10^10 vs. ∼ 12.8 × 10^6 bits: roughly 4 orders of magnitude more information to learn (the arithmetic is sketched below)
▶ A modern CNN (∼ 30M weights) can memorize randomized ImageNet labels
▶ Could it memorize randomized images?
Zhang et al., ”Understanding Deep Learning Requires Rethinking Generalization”, 2016
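The back-of-the-envelope numbers above can be reproduced directly; a small sketch of the arithmetic (assuming 1.28M images, 1000 equally likely labels, and 128×128 RGB pixels at 8 bits per channel):

```python
import math

n_images, n_classes = 1.28e6, 1000

# Bits needed to memorize a random labelling of the training set (~12.8 Mbits)
label_bits = n_images * math.log2(n_classes)

# Bits contained in the shuffled images themselves (~500 Gbits)
image_bits = n_images * 128 * 128 * 3 * 8

print(f"{label_bits / 1e6:.1f} Mbits of labels vs {image_bits / 1e9:.0f} Gbits of pixels")
print(f"ratio: {image_bits / label_bits:.0f}x  (about 4 orders of magnitude)")
```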
ConvNets have very strong biases

▶ Even a randomly initialized AlexNet produces features that, once fed through an MLP, give 12% accuracy on ImageNet
▶ A random classifier would give 0.1% on ImageNet (1 in 1000 categories)
▶ The good performance of convnets is intimately tied to their convolutional structure
  ▶ a strong prior on the input signal
Drawbacks of supervised learning

▶ The success of convnets has largely depended on the availability of large supervised datasets
▶ This means that further success depends on the availability of even larger supervised datasets
▶ Obviously, supervised datasets are expensive to obtain
▶ But they are also more prone to the introduction of biases
”Seeing through the human reporting bias...”, Misra et al., CVPR 2016
What’s wrong with learning tasks?

▶ We want rapid generalization to new tasks and situations
  ▶ just as humans learn skills and apply them to tasks, rather than the other way around
  ▶ “Stop learning tasks, start learning skills” - Satinder Singh @ DeepMind
▶ Is there enough information in the targets to learn transferable skills?
  ▶ targets for supervised learning contain far less information than the input data
  ▶ unsupervised learning gives us an essentially unlimited supply of information about the world
Self-supervised learning

▶ Define a (simple) prediction task which requires some semantic understanding
  ▶ conditional prediction (less uncertainty, less high-dimensional)
▶ Example (see the sketch below)
  ▶ input: two image patches from the same image
  ▶ task: predict their spatial relationship
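A minimal sketch of how such a pretext task could be set up (hypothetical helper, not the exact method from the slide): sample a patch and one of its eight neighbours from the same image, and use their relative position as a free label.

```python
import numpy as np

# Eight possible positions of a neighbouring patch relative to a centre patch
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           ( 0, -1),          ( 0, 1),
           ( 1, -1), ( 1, 0), ( 1, 1)]

def sample_patch_pair(image, patch=32, rng=np.random):
    """Return (centre_patch, neighbour_patch, relation_label) for one image."""
    h, w = image.shape[:2]
    # centre location chosen so that every neighbour patch stays inside the image
    y = rng.randint(patch, h - 2 * patch)
    x = rng.randint(patch, w - 2 * patch)
    label = rng.randint(len(OFFSETS))               # the self-supervised target
    dy, dx = OFFSETS[label]
    centre    = image[y:y + patch, x:x + patch]
    neighbour = image[y + dy * patch:y + (dy + 1) * patch,
                      x + dx * patch:x + (dx + 1) * patch]
    return centre, neighbour, label

# usage: feed both patches through a CNN and train it to predict `label`
img = np.random.rand(128, 128, 3)
centre, neighbour, label = sample_patch_pair(img)
```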
Self-supervised learning on images
Learning representations (embeddings)

▶ Learn a feature space where points are mapped close to each other
  ▶ if they share semantic information in input space
▶ PCA and K-means are still very good baselines (a sketch follows below)
▶ But can we learn better “representations”?
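Since PCA and K-means are named as baselines, here is a minimal sketch (scikit-learn, random stand-in data) of using them exactly that way: project into a low-dimensional space and cluster there.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(500, 64)               # 500 samples, 64 raw input features

# Baseline "representation": the top-10 principal components
Z = PCA(n_components=10).fit_transform(X)

# Cluster in the learned (linear) feature space
labels = KMeans(n_clusters=5, n_init=10).fit_predict(Z)
print(Z.shape, labels[:10])
```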
Embeddings: representing symbolic data

▶ Goal: represent symbolic data in a continuous space
  ▶ easily measure relatedness (distance measures on vectors)
  ▶ use linear algebra and DL to perform complex reasoning
▶ Challenges
  ▶ discrete nature: easy to count, but not obvious how to represent
  ▶ cannot use backprop (no gradients) on discrete units
  ▶ the embedded continuous space can potentially be very large (high-dimensional vector spaces)
  ▶ data not associated with a regular grid structure like an image (e.g., text)
Learning word representations

▶ learning word representations from raw text (without any supervision)
▶ applications:
  ▶ text classification
  ▶ ranking (e.g., Google search)
  ▶ machine translation
  ▶ chatbots
Latent Semantic Analysis

▶ Problem: Find similar documents in a corpus
▶ Solution:
  ▶ construct a term/document matrix - (normalized) occurrence counts

(Image credit) Course notes ”An introduction to Deep Learning”, Marc’Aurelio Ranzato
”Indexing by Latent Semantic Analysis”, Deerwester et al., JASIS, 1990
Latent Semantic Analysis (contd.)

▶ Problem: Find similar documents in a corpus
▶ Solution:
  ▶ construct a term/document matrix - (normalized) occurrence counts
  ▶ perform SVD (singular value decomposition)
▶ x_{i,j}: # of times word i appears in document j

(Image credit) Course notes ”An introduction to Deep Learning”, Marc’Aurelio Ranzato
Latent Semantic Analysis (contd.)

▶ Each column of Vᵀ is the representation of a document in the corpus
▶ Each column is a D-dimensional vector
  ▶ we can use it to compare and retrieve documents
▶ x_{i,j}: # of times word i appears in document j

(Image credit) Course notes ”An introduction to Deep Learning”, Marc’Aurelio Ranzato
Latent Semantic Analysis (contd.)

▶ Each row of U is a vectorial representation of a word in the dictionary
  ▶ aka an embedding (the full SVD pipeline is sketched below)
▶ x_{i,j}: # of times word i appears in document j
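A compact sketch of the whole LSA pipeline from the last few slides (toy corpus, plain numpy SVD): build the term/document count matrix X, factor X = U S Vᵀ, then read word embeddings off the rows of U and document representations off the columns of Vᵀ.

```python
import numpy as np
from collections import Counter

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "doctors prescribe drugs to patients"]

vocab = sorted({w for d in docs for w in d.split()})
word_index = {w: i for i, w in enumerate(vocab)}

# Term/document matrix: X[i, j] = # of times word i appears in document j
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w, c in Counter(d.split()).items():
        X[word_index[w], j] = c

# SVD: X = U * diag(S) * Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

word_embeddings = U    # row i  = embedding of word i
doc_embeddings  = Vt   # column j = representation of document j
print(word_embeddings.shape, doc_embeddings.shape)
```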
Word embeddings

▶ Convert words (symbols) into D-dimensional vectors
  ▶ D becomes a hyperparameter
▶ In the embedded space
  ▶ compare words (compare vectors)
  ▶ apply machine learning (DL) to represent sequences of words
  ▶ use a weighted sum of embeddings (linear combination of vectors) for document retrieval
Bi-gram

▶ A bi-gram models the probability of a word given the preceding one

  p(w_k | w_{k−1}),  w_k ∈ V

▶ The bi-gram model builds a matrix of counts (a toy construction is sketched below)

  c(w_k | w_{k−1}) =
    ⎡ c_{1,1}     …     c_{1,|V|}  ⎤
    ⎢    …      c_{i,j}     …      ⎥
    ⎣ c_{|V|,1}   …     c_{|V|,|V|} ⎦

▶ c_{i,j}: number of times word i is preceded by word j
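A toy construction of the bi-gram count matrix (hypothetical mini-corpus), with C[i, j] counting how often word i is preceded by word j:

```python
import numpy as np

corpus = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# C[i, j] = number of times word i is preceded by word j
C = np.zeros((len(vocab), len(vocab)))
for prev, cur in zip(corpus, corpus[1:]):
    C[idx[cur], idx[prev]] += 1

# conditional bi-gram probabilities p(w_k | w_{k-1}): normalize each column
P = C / np.maximum(C.sum(axis=0, keepdims=True), 1)
print(vocab)
print(P)
```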
Factorized bi-gram

▶ factorize (via SVD) the bi-gram matrix (a sketch follows below)
  ▶ reduce the # of parameters to learn
  ▶ become more robust to noise (entries with low counts)

  c(w_k | w_{k−1}) =
    ⎡ c_{1,1}     …     c_{1,|V|}  ⎤
    ⎢    …      c_{i,j}     …      ⎥
    ⎣ c_{|V|,1}   …     c_{|V|,|V|} ⎦
  = UV

▶ rows of U ∈ ℝ^{|V|×D} - “output” word embeddings
▶ columns of V ∈ ℝ^{D×|V|} - “input” word embeddings
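The factorization itself can be sketched with a truncated SVD (random stand-in counts, small embedding dimension D):

```python
import numpy as np

D = 2                                                    # embedding dimension (hyperparameter)
C = np.random.poisson(1.0, size=(8, 8)).astype(float)    # stand-in |V| x |V| count matrix

U_full, S, Vt_full = np.linalg.svd(C, full_matrices=False)
U = U_full[:, :D] * S[:D]    # |V| x D: rows are "output" word embeddings
V = Vt_full[:D, :]           # D x |V|: columns are "input" word embeddings

approx = U @ V               # rank-D approximation of the counts
print(np.linalg.norm(C - approx))   # reconstruction error (more robust to noisy low counts)
```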
Factorize the bi-gram with a NN

▶ we could express the UV factorization with a two-layer (linear) neural network (a sketch follows below)

(Image credit) Course notes ”An introduction to Deep Learning”, Marc’Aurelio Ranzato
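A sketch of the same idea as a two-layer linear network in PyTorch (assumed vocabulary size and embedding dimension; not the course's code): the first layer looks up the "input" embedding of the preceding word, the second layer scores every possible next word.

```python
import torch
import torch.nn as nn

V_size, D = 1000, 64                         # vocabulary size and embedding dimension

model = nn.Sequential(
    nn.Embedding(V_size, D),                 # "input" embeddings: look up w_{k-1}
    nn.Linear(D, V_size, bias=False),        # "output" embeddings: scores for w_k
)

prev_words = torch.randint(0, V_size, (32,))   # a batch of preceding words
next_words = torch.randint(0, V_size, (32,))   # the words that followed them

logits = model(prev_words)                     # (32, |V|) unnormalized scores
loss = nn.functional.cross_entropy(logits, next_words)
loss.backward()                                # gradients flow to both embedding tables
print(loss.item())
```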
Recap

▶ LSA learns word embeddings via co-occurrences across documents
▶ bi-gram learns word embeddings that only take into account the next word
▶ if you combine the two you get word2vec
  ▶ use more context around the word (n ≥ 2)
  ▶ but look for context only around the word of interest
Word2Vec: Continuous bag of words
(Image credit) Course notes ”An introduction to Deep Learning”, Marc’Aurelio Ranzato
Word2Vec: Skip-gram

▶ similar to the bi-gram, but predicts the N preceding and N following words
▶ words with the same context will have similar embeddings (e.g., cat & kitty)
▶ input projection - a look-up table
▶ bulk of the computation - prediction of the words in context
▶ learning by cross-entropy minimization via SGD (training-pair generation is sketched below)

(Image credit) Course notes ”An introduction to Deep Learning”, Marc’Aurelio Ranzato
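A sketch of how skip-gram training pairs could be generated (toy corpus, window of 2 words on each side); each centre word is paired with every word in its context, and a model like the two-layer network above is trained on these pairs with cross-entropy.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (centre, context) index pairs for skip-gram training."""
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield centre, tokens[j]

corpus = "the cat sat on the mat".split()
idx = {w: i for i, w in enumerate(sorted(set(corpus)))}
tokens = [idx[w] for w in corpus]

pairs = list(skipgram_pairs(tokens, window=2))
print(pairs[:6])   # (centre word, one of its +/-2 neighbours)
```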
Continuous representations for discrete data
Mikolov et al., ”word2vec” NIPS 2014
Word2Vec: Linguistic regularities in Word Vector Space

▶ the word vector space implicitly encodes linguistic regularities
▶ distributed word representations implicitly contain syntactic and semantic information
▶ KING is similar to QUEEN as MAN is similar to WOMAN
▶ KING is similar to KINGS as MAN is similar to MEN
(Image credit) ”Efficient estimation of word representations”, Mikolov et al., NIPS 2013
Word2Vec: Vector operations in Word Vector Space
▶ vector operations (addition/subtraction) give intuitive results (a sketch follows below)

(Image credit) ”Efficient estimation of word representations”, Mikolov et al., NIPS 2013
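The analogy arithmetic can be sketched with plain cosine similarity (random stand-in vectors here; in practice they would come from a trained word2vec model): compute king − man + woman and look for the nearest word vector.

```python
import numpy as np

def nearest(query, embeddings, exclude=()):
    """Return the word whose vector is most cosine-similar to `query`."""
    best, best_sim = None, -np.inf
    for word, vec in embeddings.items():
        if word in exclude:
            continue
        sim = vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# stand-in embeddings; a trained model would put "queen" closest to the query
emb = {w: np.random.randn(50) for w in ["king", "queen", "man", "woman", "apple"]}

query = emb["king"] - emb["man"] + emb["woman"]
print(nearest(query, emb, exclude={"king", "man", "woman"}))
```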
Word2Vec: Linguistic regularities in Word Vector Space (contd.)

(Image credit) ”Efficient estimation of word representations”, Mikolov et al., NIPS 2013
Recap

▶ Embeddings of words (from 1-hot to distributed representations):
  ▶ understand similarity between words
  ▶ plug them into any parametric ML model
▶ Several ways to learn word embeddings
  ▶ word2vec is among the most popular ones
▶ word2vec leverages large amounts of unlabeled data
AI for network biology
Deep Learning for Network Biology. ISMB Tutorial 2018
AI for network biology: link prediction
Deep Learning for Network Biology. ISMB Tutorial 2018
Representing biological knowledge
Agibetov et al., ”Shared hypothesis testing”, J. Biomed. Sem., 2018
Biological link prediction
Agibetov et al., ”Shared hypothesis testing”, J. Biomed. Sem., 2018
Link prediction as distance-based inference in embedding space
Learning graph embeddings

▶ Learn a link estimator Q(u, v) ↦ [0, 1] (u, v node pairs) and approximate the graph structure (connectivity) with MLE (maximum likelihood estimation)
▶ Pr(G) = ∏_{(u,v)∈E_train} Q(u, v) · ∏_{(u,v)∉E_train} (1 − Q(u, v))
▶ If Q is a perfect estimator then Pr(x) = 1 iff x = G (i.e., the graph can be fully reconstructed)
▶ Q can be trained to estimate links at different orders, i.e., approximate Aⁿ (a sketch of the likelihood follows below)
Haija, et al., ”Graph likelihood”, CIKM17, NeurIPS 2018
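A minimal sketch of the graph-likelihood objective above (stand-in embeddings and edges, not the cited implementation): node embeddings parametrize a link estimator Q(u, v) = σ(zᵤ·zᵥ), and the negative log of Pr(G) is summed over observed edges and sampled non-edges.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 20, 8
Z = rng.normal(size=(n_nodes, dim))           # node embeddings (the parameters to learn)

def Q(u, v):
    """Link estimator Q(u, v) in [0, 1] from the embedding dot product."""
    return 1.0 / (1.0 + np.exp(-Z[u] @ Z[v]))

edges     = [(0, 1), (1, 2), (2, 3)]          # (u, v) in E_train
non_edges = [(0, 5), (4, 9), (7, 13)]         # sampled (u, v) not in E_train

# negative log of Pr(G) = prod_{E} Q(u,v) * prod_{not E} (1 - Q(u,v))
nll = -sum(np.log(Q(u, v)) for u, v in edges) \
      - sum(np.log(1.0 - Q(u, v)) for u, v in non_edges)
print(nll)   # minimized w.r.t. Z by gradient descent in practice
```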
Animation
▶ Animations are always good
Tensor decomposition for multi-relational knowledge graph completion

▶ Learn functions A_{results in}(f_i, f_j) (one for each relation type k)
▶ minimize the reconstruction error (sketched below)

  1/2 ∑_k ‖A_k − E R_k Eᵀ‖²_F

▶ E - entity embeddings
Nickel et al, ”RESCAL”, ICML 2011
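The RESCAL objective can be written down directly from the formula above (random stand-in tensors, just to show what is being minimized): for each relation k, the adjacency slice A_k is approximated by E R_k Eᵀ.

```python
import numpy as np

n, d, K = 10, 4, 3                        # entities, embedding dim, relation types
rng = np.random.default_rng(0)

A = rng.integers(0, 2, size=(K, n, n)).astype(float)   # one adjacency slice per relation
E = rng.normal(size=(n, d))                             # entity embeddings
R = rng.normal(size=(K, d, d))                          # one relation matrix per relation type

# reconstruction error: 1/2 * sum_k || A_k - E R_k E^T ||_F^2
err = 0.5 * sum(np.linalg.norm(A[k] - E @ R[k] @ E.T, 'fro') ** 2 for k in range(K))
print(err)   # RESCAL minimizes this w.r.t. E and the R_k
```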
Polypharmacy and side effects link prediction
Zitnik et al. Oxford. 2018
Representation learning in Hyperbolic space
Nickel and Kiela. NIPS 2017
Hierarchical relationship from hyperbolic embeddings
Nickel and Kiela. ICML 2018
Clusters of proteins and age groups from hyperbolic coordinates
Lobato et al. Bioinformatics 2018
Why non-Euclidean space - (low-dim) manifolds
▶ Computing in a lower-dimensional space means manipulating fewer degrees of freedom
▶ Non-linear degrees of freedom often make more intuitive sense
  ▶ cities on the Earth are better localized by giving their longitude and latitude (2 dimensions)
  ▶ instead of giving their position x, y, z in Euclidean 3D space
What’s so special about Riemannian geometry - curvature
Model of hyperbolic geometry
Properties of hyperbolic geometry
Computing lengths in hyperbolic geometry
Approximation of graph distance
Hyperbolic embeddings

As in the Euclidean case, we try to learn a link estimator Q(u, v) ↦ [0, 1] (u, v node pairs) with MLE

▶ Pr(G) = ∏_{(u,v)∈E_train} Q(u, v) · ∏_{(u,v)∉E_train} (1 − Q(u, v))
▶ If Q is a perfect estimator then Pr(x) = 1 iff x = G (i.e., the graph can be fully reconstructed)

The embeddings are the parameters Θ of the link estimator Q, trained with a cross-entropy loss L and negative sampling

▶ L(Θ) = ∑_{(u,v)} log [ e^{−d(u,v)} / ∑_{v′∈neg(u)} e^{−d(u,v′)} ]
▶ But we perform all computations in hyperbolic space (a sketch follows below)
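A sketch of the pieces above with the Poincaré-ball distance standing in for d(u, v) (made-up embeddings; the full method also requires the Riemannian gradient updates shown on the following slides):

```python
import numpy as np

def poincare_dist(x, y, eps=1e-9):
    """Distance between two points inside the unit Poincare ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2)) + eps
    return np.arccosh(1 + 2 * sq / denom)

rng = np.random.default_rng(0)
emb = {i: rng.uniform(-0.5, 0.5, size=2) for i in range(6)}   # points kept inside the ball

def loss_term(u, v, negatives):
    """Negative log-softmax term for one observed edge (u, v) against sampled negatives."""
    num = np.exp(-poincare_dist(emb[u], emb[v]))
    den = sum(np.exp(-poincare_dist(emb[u], emb[w])) for w in negatives)
    return -np.log(num / den)

print(loss_term(0, 1, negatives=[2, 3, 4]))
```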
Riemannian optimization
Backpropagation to learn embeddings
Nickel and Kiela. NIPS 2017
Link prediction for multi-relational biological knowledge graphs
Hyperbolic Large-Margin classifier (SVM)
Agibetov, Samwald. SemDeep-4@ISWC 2018
Cho et al. arXiv 2018
Performance evaluation
Agibetov, Dorffner, Samwald. SemDeep-5@IJCAI 2019
Cancer subtype identification
Wang et al. ”Similarity network fusion for aggregating data types on agenomic scale.” Nature Methods. 2014
Graph Convolutional Neural Networks (GCN)
Bronstein. ”GCN Tutorial”. 2019
GCN applications
Zitnik et al. Bioinformatics. 2018
Predicting metastatic events in breast cancer
Chereda H. et al. ”Utilizing Molecular Network Information via GraphConvolutional Neural Networks to Predict Metastatic Event in Breast Cancer.”Stud. Health. Technol. Inform. 2019