Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics
Fabio Massimo Zanzotto, ART Group
Dipartimento di Ingegneria dell’Impresa, University of Rome ”Tor Vergata”
© F.M.Zanzotto
Prequel
Recognizing Textual Entailment (RTE)
The task (Dagan et al., 2005): given a text T and a hypothesis H, decide whether T implies H.
T1
H1
“Farmers feed cows animal extracts”
“Cows eat animal extracts”
P1: T1 → H1
RTE as a classification task:
• Selecting the best learning algorithm
• Defining the feature space
Learning RTE Classifiers: the feature space
Training examples:
• P1: T1 → H1 (entailment)
  T1: “Farmers feed cows animal extracts”; H1: “Cows eat animal extracts”
• P2: T2 ↛ H2 (no entailment)
  T2: “They feed dolphins fish”; H2: “Fish eat dolphins”
• P3: T3 → H3 (entailment)
  T3: “Mothers feed babies milk”; H3: “Babies eat milk”

Classification

Relevant features: rules with variables (first-order rules), e.g.
• X feed Y → X eat Y
• X feed Y → Y eat X
RTE 2 Results

First Author (Group)       Accuracy   Average Precision
Hickl (LCC)                75.4%      80.8%
Tatu (LCC)                 73.8%      71.3%
Zanzotto (Milan & Rome)    63.9%      64.4%
Adams (Dallas)             62.6%      62.8%
Bos (Rome & Leeds)         61.6%      66.9%

Learning RTE Classifiers: the feature space

[Figure: the rule “X feed Y → X eat Y” as a pair of tree fragments: a VP fragment VP → VB(“feed”) NP(X) NP(Y) and a sentence fragment S → NP(X) VP(VB(“eat”) NP(Y))]

Rules with variables (first-order rules): X feed Y → X eat Y

Zanzotto&Moschitti, Automatic learning of textual entailments with cross-pair similarities, Coling-ACL, 2006
Adding semantics: Distributional Semantics
Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010
[Figure: two entailment rules with identical syntactic structure, anchored by the distributionally similar verb pairs killed → died and murdered → died]

Distributional Semantics: promising!!!
Compositional Distributional Semantics (CDS)
Mitchell&Lapata (2008) set a general model for bigrams that assigns a distributional meaning z to a sequence of two words “x y”:

z = f(x, y, R, K)

where R is the relation between x and y and K is an external knowledge.
An active research area!
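As a concrete illustration (not from the slides), the two simplest instances of z = f(x, y, R, K) studied by Mitchell & Lapata are vector addition and element-wise multiplication; a minimal sketch with made-up toy count vectors:

```python
import numpy as np

# Toy distributional vectors (made-up values; real models use corpus counts).
x = np.array([1.0, 0.0, 2.0])   # e.g. "moving"
y = np.array([0.0, 3.0, 1.0])   # e.g. "hands"

def compose_additive(x, y):
    """z = x + y: the simplest instance of z = f(x, y, R, K)."""
    return x + y

def compose_multiplicative(x, y):
    """z = x * y (element-wise): another instance from Mitchell & Lapata."""
    return x * y

z_add = compose_additive(x, y)
z_mul = compose_multiplicative(x, y)
```

Both functions ignore R and K; richer instances (like the full additive model below) encode R in learned parameters.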
Compositional Distributional Semantics (CDS)

[Figure: a “distributional” semantic space containing vectors for x = “moving”, y1 = “hands”, y2 = “car”, and the composed vectors z1 = “moving hands” and z2 = “moving car”]

A “distributional” semantic space; composing “distributional” meaning.
Compositional Distributional Semantics (CDS)

Mitchell&Lapata (2008) set a general model for bigrams that assigns a distributional meaning to a sequence of two words “x y”:

z = f(x, y, R, K)

For example, x = moving, y = hands, and z = moving hands.
CDS: Full Additive Model

The full additive model:

z = A_R x + B_R y

The matrices A_R and B_R can be estimated with:
- positive examples taken from dictionaries (e.g. “contact /ˈkɒntækt/ [kon-takt] 2. close interaction”)
- multivariate regression models

Zanzotto, Korkontzelos, Fallucchi, Manandhar, Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd COLING, 2010
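A minimal sketch of the full additive model and its estimation by multivariate regression; the data here is random in place of dictionary-derived examples, and A_true/B_true are hypothetical matrices used only to generate it:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical "true" matrices, used only to generate toy training data.
A_true = rng.standard_normal((d, d))
B_true = rng.standard_normal((d, d))

# Toy positive examples (x_i, y_i, z_i): word pair plus observed phrase vector.
X = rng.standard_normal((20, d))
Y = rng.standard_normal((20, d))
Z = X @ A_true.T + Y @ B_true.T

# Multivariate regression: solve [A B] from the stacked system [x; y] -> z.
XY = np.hstack([X, Y])                       # shape (n, 2d)
W, *_ = np.linalg.lstsq(XY, Z, rcond=None)   # shape (2d, d)
A_hat, B_hat = W[:d].T, W[d:].T

def full_additive(x, y, A=A_hat, B=B_hat):
    """z = A_R x + B_R y."""
    return A @ x + B @ y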
CDS: Recursive Full Additive Model
[Figure: the tree of “cows eat animal extracts” composed bottom-up: the NN model combines “animal” and “extracts”, and the VN model combines “eat” with “cows” and with the composed noun phrase]

Let’s scale up to sentences by recursively applying the model!
Let’s apply it to RTE…
Extremely poor results.
Recursive Full Additive Model: a closer look

«cows eat animal extracts»:
v = A_VN eat + B_VN cows + B_VN A_NN extracts + B_VN B_NN animal

«chickens eat beef extracts»:
u = A_VN eat + B_VN chickens + B_VN A_NN extracts + B_VN B_NN beef

… evaluating the similarity: v · u
Ferrone&Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013
Recursive Full Additive Model: a closer look

v = A_VN eat + B_VN cows + B_VN A_NN extracts + B_VN B_NN animal
u = A_VN eat + B_VN chickens + B_VN A_NN extracts + B_VN B_NN beef

Each term pairs a structure (a product of matrices, e.g. B_VN B_NN) with a meaning (a lexical vector, e.g. animal). Is the structure-structure interaction < 1?

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
The prequel …

• Recognizing Textual Entailment: feature spaces of the rules with variables
• Adding distributional semantics
• Distributional Semantics: binary CDS and recursive CDS
• Terms such as B_VN B_NN beef pair structure with meaning:

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
Distributed Tree Kernels
v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Tree Kernels
[Figure: the parse tree of “Farmers feed cows animal extracts” mapped to a huge 0/1 vector whose dimensions t_i, t_j, … index all possible tree fragments]

T1 · T2 = Σ_i α_i τ_i · Σ_j β_j τ_j

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
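The tree kernel is just a dot product between (implicit) fragment-indicator vectors; a toy sketch with hand-listed fragment multisets (the bracketed fragment names are illustrative, not the paper's notation):

```python
from collections import Counter

# Toy fragment multisets standing in for the huge indicator vectors over
# all tree fragments; names are illustrative only.
frags_T1 = Counter({"(S (NP)(VP))": 1, "(VP (VB feed)(NP)(NP))": 1, "(NP (NNS cows))": 1})
frags_T2 = Counter({"(S (NP)(VP))": 1, "(NP (NNS cows))": 1, "(VP (VB eat)(NP))": 1})

def tree_kernel(f1, f2):
    """T1 · T2: sum of products of fragment counts, i.e. shared fragments."""
    return sum(f1[t] * f2[t] for t in f1 if t in f2)

k = tree_kernel(frags_T1, frags_T2)
```

Here the two toy trees share two fragments, so the kernel value is 2; real tree kernels weight each fragment (the α_i, β_j above) and enumerate fragments implicitly.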
Tree Kernels in Smaller Vectors
[Figure: the same parse tree, mapped first to a huge 0/1 fragment-indicator vector and then to a small dense vector of reals]

CDS desiderata:
- Vectors are smaller
- Vectors are obtained with a compositional function

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Names for the «Distributed» World
[Small dense vectors of reals encode the trees]

Distributed Trees (DT)
Distributed Tree Fragments (DTF)
Distributed Tree Kernels (DTK)

As we are encoding trees in small vectors, the tradition is distributed structures (Plate, 1994).
• Compositionally building Distributed Tree Fragments
• Distributed Tree Fragments are a nearly orthonormal base of R^d
• Distributed Trees can be efficiently computed
• DTKs should approximate Tree Kernels
DTK: Expected properties and challenges
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Compositionally building Distributed Tree Fragments
Basic elements:
• N: a set of nearly orthogonal random vectors for node labels
• a basic vector composition function with some ideal properties

A distributed tree fragment is the application of the composition function to the node vectors, in the order given by a depth-first visit of the tree.
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Building Distributed Tree Fragments

Properties of the ideal function:
1. Non-commutativity with a very high degree k
2. Non-associativity
3. Bilinearity
4.–6. Approximation: norm preservation and orthogonality preservation

With these properties, we demonstrated that DTFs are a nearly orthonormal base.
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
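A sketch of one concrete realization of the ideal function, shuffled circular convolution (fixed random permutations followed by FFT-based circular convolution); dimensions, seeds, and labels are arbitrary choices for illustration. Composed fragment vectors stay nearly unit-length, and swapping the order of children yields a nearly orthogonal vector:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 4096

# Fixed random permutations make circular convolution non-commutative
# and non-associative (one realization of the "ideal" composition function).
p1, p2 = rng.permutation(d), rng.permutation(d)

def rand_vec():
    """A nearly unit, nearly orthogonal random vector for a node label."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def compose(a, b):
    """Shuffled circular convolution, computed via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a[p1]) * np.fft.fft(b[p2])))

S, NP, VP = rand_vec(), rand_vec(), rand_vec()

# Depth-first composition of the fragment (S (NP)(VP)).
dtf1 = compose(S, compose(NP, VP))
dtf2 = compose(S, compose(VP, NP))   # different child order: different fragment

cos = dtf1 @ dtf2 / (np.linalg.norm(dtf1) * np.linalg.norm(dtf2))
```

With d = 4096 the cosine between the two differently ordered fragments concentrates near 0, while each composed vector keeps norm close to 1.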
Building Distributed Trees
Given a tree T, the distributed representation of its subtrees is the vector

𝐓 = Σ_{τ ∈ S(T)} τ⃗

where S(T) is the set of the subtrees of T.
[Figure: S(T) for the parse tree of “Farmers feed cows animal extracts”: the set of its subtrees]
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Building Distributed Trees
A more efficient approach:

𝐓 = Σ_{n ∈ N(T)} s(n)

where N(T) is the set of nodes of T and s(n) is defined recursively: one base case when n is a terminal node, and one composition over the children when n → c1…cm.

Computing a Distributed Tree is linear with respect to the size of N(T).
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
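To see that dot products of distributed trees approximate the tree kernel, a toy sketch that builds 𝐓 the naive way, as the sum of the DTFs of a hand-listed subtree set; the paper's recursive algorithm computes the same vector in time linear in |N(T)|, and the trees and labels here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8192
p1, p2 = rng.permutation(d), rng.permutation(d)
_vecs = {}

def vec(label):
    """One fixed, nearly orthogonal random vector per node label."""
    if label not in _vecs:
        v = rng.standard_normal(d)
        _vecs[label] = v / np.linalg.norm(v)
    return _vecs[label]

def compose(a, b):
    # Shuffled circular convolution, one realization of the ideal function.
    return np.real(np.fft.ifft(np.fft.fft(a[p1]) * np.fft.fft(b[p2])))

def dtf(tree):
    """Depth-first composition of a tree given as (label, children...)."""
    label, *children = tree
    out = vec(label)
    for c in children:
        out = compose(out, dtf(c))
    return out

# Hand-listed subtree sets S(T) for two toy trees (normally enumerated from T).
S_T1 = [("S", ("NP",), ("VP",)), ("VP", ("VB", ("feed",)), ("NP",)), ("NP", ("NNS",))]
S_T2 = [("S", ("NP",), ("VP",)), ("VP", ("VB", ("eat",)), ("NP",)), ("NP", ("NNS",))]

DT1 = sum(dtf(t) for t in S_T1)
DT2 = sum(dtf(t) for t in S_T2)

exact = 2            # shared fragments: the S subtree and the NP subtree
approx = DT1 @ DT2   # DTK estimate of the tree kernel
```

Matched fragments contribute roughly 1 each (their squared near-unit norm) and mismatched pairs contribute noise near 0, so the dot product lands close to the exact count of shared fragments.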
DTK: Expected properties and challenges
Property 1 (Nearly Unit Vectors): ‖τ⃗‖ ≈ 1
Property 2 (Nearly Orthogonal Vectors): τ⃗1 · τ⃗2 ≈ 0 for τ1 ≠ τ2
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Task-based Analysis

Question Classification and Recognizing Textual Entailment, with these realizations of the ideal function:
• Shuffled normalized element-wise product
• Shuffled circular convolution

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Remarks
[Small dense vectors of reals]

Distributed Tree Fragments (DTF): a nearly orthonormal base that embeds R^m in R^d
Distributed Trees (DT): can be efficiently computed
Distributed Tree Kernels (DTK): approximate Tree Kernels
Side effect: reduced time complexity
• Tree kernels (TK) (Collins & Duffy, 2001) have quadratic complexity
• Current techniques control this complexity (Moschitti, 2006), (Rieck et al., 2010), (Shin et al., 2011)

DTKs change the complexity, as they can be used with Linear Kernel Machines:

                  SVM+TK           LinearSVM+DTK
                                   Shuf. Prod.        Shuf. Circ. Conv.
Training          O(n³|N(T)|²)     O(n|N(T)|d)        O(n|N(T)|d log d)
Classification    O(n|N(T)|²)      O(|N(T)|d)         O(|N(T)|d log d)

n: # of training examples; |N(T)|: # of nodes of the tree T
Zanzotto&Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, Accepted conditioned to minor revisions
Sequel
• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels
Distributed Representation Parsing (DRP): the idea

[Figure: a sentence mapped directly to the distributed tree that the Distributed Tree Encoder (DT) would produce from its parse]

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013
Distributed Representation Parsing (DRP): the idea

[Figure: two routes from the sentence «We booked the flight» to a distributed tree: the Symbolic Parser (SP) followed by the Distributed Tree Encoder (DT), versus Distributed Representation Parsing (DRP), i.e. the Sentence Encoder (D) followed by the Transducer (P)]
DRP: Sentence Encoder

• Non-Lexicalized Sentence Models: bag-of-postags; n-grams of postags
• Lexicalized Sentence Models: unigrams; unigrams + n-grams of postags
DRP: Transducer

Estimation: Principal Component Analysis and Partial Least Squares estimation of T = PS.
Approximation: the Moore-Penrose pseudoinverse (Penrose, 1955) derives P = T S⁺_(k), where k is the number of selected singular values.
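A minimal sketch of the pseudoinverse estimation of the transducer, solving T = PS as P = TS⁺; the sentence encodings and distributed trees here are random stand-ins, and the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, j, d = 200, 50, 100   # sentences, sentence-encoding dim, distributed-tree dim

# Toy stand-ins: columns of S are sentence encodings, columns of T the
# corresponding distributed trees; P_true is a hypothetical linear transducer.
S = rng.standard_normal((j, n))
P_true = rng.standard_normal((d, j))
T = P_true @ S

# Estimate P from T = P S with the Moore-Penrose pseudoinverse: P = T S+.
P_hat = T @ np.linalg.pinv(S)

# "Parsing" a sentence is then a single matrix-vector product.
t_pred = P_hat @ S[:, 0]
```

With noiseless data and S of full row rank, S S⁺ = I and the transducer is recovered exactly; on real treebank data only an approximation is possible, hence the truncation to k singular values.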
Experimental set-up

• Data
  – English Penn Treebank with standard split
  – Distributed trees with 3 values of λ (0, 0.2, 0.4) and 2 models: Unlexicalized/Lexicalized
  – Dimension of the reduced space: 4,096 and 8,192
• System Comparison
  – Distributed Symbolic Parser DSP(s) = DT(SP(s))
  – Symbolic Parser: Bikel Parser (Bikel, 2004) with Collins settings (Collins, 2003)
• Parameter Estimation
  – Parameters: k for the pseudo-inverse; j for the sentence encoders D
  – Maximization of the similarity (see parsing performance) on Section 24
Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013
Evaluation Measure
«Distributed» Parsing Performance
Unlexicalized Trees
Lexicalized Trees
Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013
Sequel
• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels
Distributed Convolution Kernels on Countable Sets

The following general property holds: the distributed kernel approximates its symbolic counterpart, where
• CK is a convolution kernel
• DCK is the related distributed convolution kernel

Implemented Distributed Convolution Kernels:
• Distributed Tree Kernel
• Distributed Subpath Kernel
• Distributed Route Kernel
• Distributed String Kernel
• Distributed Partial Tree Kernel
Zanzotto&Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, Accepted conditioned to minor revisions
Sequel
• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels
Going back to RTE and distributional semantics
Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010
[Figure: two entailment rules with identical syntactic structure, anchored by the distributionally similar verb pairs killed → died and murdered → died]

Distributional Semantics: promising!!!
A Novel Look at the Recursive Full Additive Model

v = A_VN eat + B_VN cows + B_VN A_NN extracts + B_VN B_NN animal
u = A_VN eat + B_VN chickens + B_VN A_NN extracts + B_VN B_NN beef

Each term pairs a structure (a product of matrices) with a meaning (a lexical vector):

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
A Novel Look at the Recursive Full Additive Model

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j) = Σ_{i,j} ⟨V_i^T U_j, v_i u_j^T⟩_Frob

Choosing the structure matrices so that V_i^T U_j ≈ I if Struct_i = Struct_j and ≈ 0 if Struct_i ≠ Struct_j:

v · u ≈ Σ_{i,j | Struct_i = Struct_j} v_i · u_j

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted
«Convolution Conjecture»

Compositional Distributional Models based on linear algebra and Convolution Kernels are intimately related: the similarity equations between two vectors/tensors obtained with CDSMs can be decomposed into operations performed on the subparts of the input phrases.

For example:

Convolution Kernel:  T1 · T2 = Σ_i α_i τ_i · Σ_j β_j τ_j
Recursive Full Additive Model:  v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted
Distributed Smoothed Tree Kernels

[Figure: lexicalized trees such as S:killed → NP VP(VB NP), decomposed into a syntactic part and a lexical part:
synt(S:killed → NP VP(VB NP)) = S → NP VP(VB NP)
head(S:killed → NP VP(VB NP)) = killed]
Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
Distributed Smoothed Tree Kernels

In general, for a lexicalized tree t, we define synt(t), its bare syntactic fragment, and head(t), the distributional vector of its lexical head.
Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
Distributed Smoothed Tree Kernels

The Distributed Smoothed Tree:

𝐓 = Σ_{t_i ∈ S(t)} synt(t_i) head(t_i)^T

where S(t) is the set of subtrees of t. The resulting dot (Frobenius) product defines the kernel.

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
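A toy check of why the Frobenius product is the right similarity here: for one-subtree matrices, ⟨s1 h1^T, s2 h2^T⟩_Frob factorizes exactly into a structure similarity times a head (meaning) similarity. The vectors below are random stand-ins for synt(·) and head(·):

```python
import numpy as np

rng = np.random.default_rng(1)
d_struct, d_lex = 512, 64

def unit(dim):
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Toy subtree representations: (synt(t_i), head(t_i)) pairs.
s1, h_killed = unit(d_struct), unit(d_lex)
s2, h_murdered = unit(d_struct), unit(d_lex)

# One-subtree "distributed smoothed trees" as outer products.
T1 = np.outer(s1, h_killed)
T2 = np.outer(s2, h_murdered)

# Frobenius product factorizes: <s1 h1^T, s2 h2^T>_F = (s1·s2)(h1·h2)
frob = np.sum(T1 * T2)
factored = (s1 @ s2) * (h_killed @ h_murdered)
```

Because the full 𝐓 is a sum of such outer products, the kernel between two trees decomposes into structure-times-meaning terms over pairs of subtrees.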
What’s next...
[Recap map]
Prequel:
• Recognizing Textual Entailment: feature spaces of the rules with variables
• Adding distributional semantics
• Distributional Semantics: binary CDS and recursive CDS (terms such as B_VN B_NN beef pair structure with meaning)
Sequel:
• Tree Kernels: T1 · T2 = Σ_i α_i τ_i^(1) · Σ_j β_j τ_j^(2)
• Distributed Tree Kernels: v · u = Σ_i v_i · Σ_j u_j = Σ_{i,j} v_i u_j
• Distributed Representation Parsing
• Distributed Convolution Kernels on Countable Sets
What’s next:
• Adding back distributional meaning: Distributed Convolution Kernels on Countable Sets
• Lexicalized Distributed Representation Parsing

Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics
After what’s next… what’s for?

• Applications
  – Indexing structured information for fast syntax-aware Information Retrieval
  – Semantic Text Similarity
  – Fast Document Summarization
  – Indexing structured information for XML Information Retrieval
  – Any other suggestion?
• Accelerator
  – Optimizing the code with GPU programming (CUDA)
Credits

Lorenzo Dell’Arciprete, Marco Pennacchiotti, Alessandro Moschitti, Yashar Mehdad, Ioannis Korkontzelos, Lorenzo Ferrone

Code for Distributed Tree Kernels and Distributed Convolution Kernels:
http://code.google.com/p/distributed-tree-kernels/
Distributed Tree Kernels, Compositional Distributional Semantics, Brain&Computer

[Figure: a parse tree encoded into and decoded from a distributed representation]
If you want to read more…

Distributed Tree Kernels
• Zanzotto, F. M. & Dell'Arciprete, L. Distributed Tree Kernels, Proceedings of International Conference on Machine Learning, 2012
Tree Kernels and Distributional Semantics
• Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
Compositional Distributional Semantics
• Zanzotto, F. M.; Korkontzelos, I.; Fallucchi, F. & Manandhar, S. Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2010
Distributed and Distributional Tree Kernels
• Zanzotto, F. M. & Dell'Arciprete, L. Distributed Representations and Distributional Semantics, Proceedings of the ACL-HLT 2011 workshop on Distributional Semantics and Compositionality (DiSCo), 2011
My first life: Learning Textual Entailment Recognition Systems

Initial idea
• Zanzotto, F. M. & Moschitti, A. Automatic learning of textual entailments with cross-pair similarities, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 2006
First refinement of the algorithm
• Moschitti, A. & Zanzotto, F. M. Fast and Effective Kernels for Relational Learning from Texts, Proceedings of 24th Annual International Conference on Machine Learning, 2007
Adding shallow semantics
• Pennacchiotti, M. & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment, Proceedings of International Conference RANLP-2007, 2007
A comprehensive description
• Zanzotto, F. M.; Pennacchiotti, M. & Moschitti, A. A Machine Learning Approach to Textual Entailment Recognition, NATURAL LANGUAGE ENGINEERING, 2009
My first life: Learning Textual Entailment Recognition Systems

Adding distributional semantics
• Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
A valid kernel with an efficient algorithm
• Zanzotto, F. M. & Dell'Arciprete, L. Efficient kernels for sentence pair classification, Conference on Empirical Methods on Natural Language Processing, 2009
• Zanzotto, F. M.; Dell'Arciprete, L. & Moschitti, A. Efficient Graph Kernels for Textual Entailment Recognition, FUNDAMENTA INFORMATICAE
Applications
• Zanzotto, F. M.; Pennacchiotti, M. & Tsioutsiouliklis, K. Linguistic Redundancy in Twitter, Proceedings of 2011 Conference on Empirical Methods on Natural Language Processing (EmNLP), 2011
Extracting RTE corpora
• Zanzotto, F. M. & Pennacchiotti, M. Expanding textual entailment corpora from Wikipedia using co-training, Proceedings of the COLING-Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010
Learning verb relations
• Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
My second life: Parallels between Brains and Computers

• Zanzotto, F. M. & Croce, D. Comparing EEG/ERP-like and fMRI-like Techniques for Reading Machine Thoughts, BI 2010: Proceedings of the Brain Informatics Conference, Toronto, 2010
• Zanzotto, F. M.; Croce, D. & Prezioso, S. Reading what Machines "Think": a Challenge for Nanotechnology, Joint Conferences on Advanced Materials, 2009
• Zanzotto, F. M. & Croce, D. Reading what machines "think", BI 2009: Proceedings of the Brain Informatics Conference, Beijing, China, October 2009
• Prezioso, S.; Croce, D. & Zanzotto, F. M. Reading what machines "think": a challenge for nanotechnology, JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2011
• Zanzotto, F. M.; Dell'Arciprete, L. & Korkontzelos, Y. Rappresentazione distribuita e semantica distribuzionale dalla prospettiva dell'Intelligenza Artificiale [Distributed representation and distributional semantics from the Artificial Intelligence perspective], TEORIE & MODELLI, 2010
Thank you for your attention
Structured Feature Spaces: Dimensionality Reduction

[Figure: parse trees mapped to huge 0/1 fragment-indicator vectors and then to small dense vectors of reals]

Traditional dimensionality reduction techniques:
• Singular Value Decomposition
• Random Indexing
• Feature Selection

Not applicable!
Computational Complexity of DTK

• n: size of the tree
• k: selected tree fragments
• qw: reducing factor
• O(.): worst-case complexity
• A(.): average-case complexity
Time Complexity Analysis
• DTK time complexity is independent of the tree sizes!
Outline

• DTK: Expected properties and challenges
• Model: Distributed Tree Fragments; Distributed Trees
• Experimental evaluation
• Remarks
• Back to Compositional Distributional Semantics
• Future Work
Towards Distributional Distributed Trees

• Distributed Tree Fragments: non-terminal nodes n and terminal nodes w are both given random vectors
• Distributional Distributed Tree Fragments: non-terminal nodes n are given random vectors; terminal nodes w are given distributional vectors

Caveat (Property 2): random vectors are nearly orthogonal (t⃗1 · t⃗2 ≈ 0); distributional vectors are not.
Zanzotto&Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011
Experimental Set-up

• Task-based comparison: Corpus: RTE1, 2, 3, 5; Measure: Accuracy
• Distributed/Distributional vector size: 250
• Distributional vectors: Corpus: UKWaC (Ferraresi et al., 2008); LSA applied with k = 250
Zanzotto&Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011
Accuracy Results
Zanzotto&Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011
Adding semantics: shallow semantics

Pennacchiotti&Zanzotto, Learning Shallow Semantic Rules for Textual Entailment, Proceeding of RANLP, 2007

Learning example: T → H
T: “For my younger readers, Chapman killed John Lennon more than twenty years ago.”
H: “John Lennon died more than twenty years ago.”

[Figure: a generalized rule with typed variables, “X killed Y → Y died”, where killed causes died]
Empirical Evaluation of Properties

• Non-commutativity: OK
• Distributivity over the sum: OK
• Norm preservation: ?
• Orthogonality preservation: ?