Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics Fabio Massimo Zanzotto ART Group Dipartimento di Ingegneria dell’Impresa University of Rome ”Tor Vergata”


Post on 23-Feb-2016


Page 1: Fabio Massimo  Zanzotto ART Group Dipartimento di Ingegneria dell’Impresa

Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics

Fabio Massimo Zanzotto
ART Group, Dipartimento di Ingegneria dell’Impresa
University of Rome "Tor Vergata"

Page 2

© F.M.Zanzotto

Prequel

Page 3

Recognizing Textual Entailment (RTE)

The task (Dagan et al., 2005): given a text T and a hypothesis H, decide whether T implies H.

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”
P1: (T1, H1)

RTE as a classification task
• Selecting the best learning algorithm
• Defining the feature space

Page 4

Recognizing Textual Entailment (RTE)

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”
P1: (T1, H1)

Page 5

Learning RTE Classifiers: the feature space

Training examples

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”
P1: (T1, H1)

T2: “They feed dolphins fish”
H2: “Fish eat dolphins”
P2: (T2, H2)

Classification

T3: “Mothers feed babies milk”
H3: “Babies eat milk”
P3: (T3, H3)

Relevant Features: Rules with Variables (First-order rules)
• from the training examples: feed X Y → X eat Y, and feed X Y → Y eat X
• for the pair to classify: feed X Y → X eat Y

Page 6

Learning RTE Classifiers: the feature space

Rules with Variables (First-order rules)

feed X Y → X eat Y

[Figure: the rule drawn as a pair of syntactic trees, (VP (VB feed) (NP X) (NP Y)) → (S (NP X) (VP (VB eat) (NP Y)))]

RTE 2 Results

Average Precision   Accuracy   First Author (Group)
80.8%               75.4%      Hickl (LCC)
71.3%               73.8%      Tatu (LCC)
64.4%               63.9%      Zanzotto (Milan & Rome)
62.8%               62.6%      Adams (Dallas)
66.9%               61.6%      Bos (Rome & Leeds)

Zanzotto&Moschitti, Automatic learning of textual entailments with cross-pair similarities, Coling-ACL, 2006

Page 7

Adding semantics: Distributional Semantics

[Figure: two rules with variables drawn as tree pairs. The first maps (S NP (VP (VB killed) (NP X))) to (S (NP X) (VP (VB died))); the second is the same pair with "murdered" in place of "killed". Distributional Semantics links the two rules.]

Promising!!!

Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010

Page 8

Compositional Distributional Semantics (CDS)

Mitchell&Lapata (2008) set a general model for bigrams that assigns a distributional meaning $\vec{z}$ to a sequence of two words “x y”:

$\vec{z} = f(\vec{x}, \vec{y}, R, K)$

– R is the relation between x and y
– K is external knowledge

An active research area!

Page 9

Compositional Distributional Semantics (CDS)

[Figure: a “distributional” semantic space containing the word vectors for "moving", "hands" and "car", together with the composed vectors for "moving hands" and "moving car"]

A “distributional” semantic space. Composing “distributional” meaning.

Page 10

Compositional Distributional Semantics (CDS)

Mitchell&Lapata (2008) set a general model for bigrams that assigns a distributional meaning $\vec{z}$ to a sequence of two words “x y”:

$\vec{z} = f(\vec{x}, \vec{y}, R, K)$

– R is the relation between x and y
– K is external knowledge

Example: x = moving, y = hands, z = moving hands

[Figure: f combines the vectors x and y into z]

Page 11

CDS: Full Additive Model

The full additive model

$\vec{z} = A_R \vec{x} + B_R \vec{y}$

Matrices $A_R$ and $B_R$ can be estimated with:
- positive examples taken from dictionaries
- multivariate regression models

Example of a dictionary entry: contact /ˈkɒntækt/ [kon-takt] 2. close interaction

Zanzotto, Korkontzelos, Fallucchi, Manandhar, Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd COLING, 2010
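A minimal sketch of the full additive composition on this slide, z = A_R x + B_R y, in plain Python. The two-dimensional vectors and the matrix entries are made-up numbers for illustration only; in the approach described here the matrices would be estimated from data, which this sketch does not attempt.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec_add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

def full_additive(A_R, B_R, x, y):
    """Full additive model: z = A_R x + B_R y."""
    return vec_add(mat_vec(A_R, x), mat_vec(B_R, y))

# Hypothetical matrices for one relation R and hypothetical word vectors.
A_R = [[1.0, 0.0], [0.0, 0.5]]
B_R = [[0.5, 0.0], [0.0, 1.0]]
moving = [1.0, 2.0]
hands = [4.0, 2.0]

moving_hands = full_additive(A_R, B_R, moving, hands)
print(moving_hands)  # [3.0, 3.0]
```

The composed vector lives in the same space as the word vectors, so it can be compared with other word or phrase vectors by a dot product.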

Page 12

CDS: Recursive Full Additive Model

Let’s scale up to sentences by recursively applying the model!

[Figure: the structure of «cows eat animal extracts», with a VN relation between "eat" and "cows", a VN relation between "eat" and "extracts", and an NN relation between "extracts" and "animal"; f is applied recursively over this structure]

Let’s apply it to RTE

Extremely poor results

Page 13

Recursive Full Additive Model: a closer look

«cows eat animal extracts»

$\vec{v} = f(\text{cows eat animal extracts}) = 2A_{VN}\vec{eat} + B_{VN}\vec{cows} + B_{VN}A_{NN}\vec{extracts} + B_{VN}B_{NN}\vec{animal}$

«chickens eat beef extracts»

$\vec{u} = f(\text{chickens eat beef extracts}) = 2A_{VN}\vec{eat} + B_{VN}\vec{chickens} + B_{VN}A_{NN}\vec{extracts} + B_{VN}B_{NN}\vec{beef}$

… evaluating the similarity: $\vec{v} \cdot \vec{u}$ expands into a sum of dot products between the terms of the two sums.

Ferrone&Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013
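The expansion of the recursive full additive model for «cows eat animal extracts» can be checked numerically. The sketch below assumes that the sentence vector is the sum of the two VN compositions, which is the reading that yields the 2 A_VN eat term; the matrices and word vectors are small made-up integers so that the comparison is exact.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_mul(A, B):
    """Matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(*vectors):
    """Element-wise sum of any number of vectors."""
    return [sum(components) for components in zip(*vectors)]

def f(A, B, x, y):
    """One step of the full additive model: A x + B y."""
    return add(mat_vec(A, x), mat_vec(B, y))

# Hypothetical relation matrices and word vectors (illustration only).
A_VN, B_VN = [[1, 2], [0, 1]], [[2, 0], [1, 1]]
A_NN, B_NN = [[1, 1], [0, 2]], [[0, 1], [1, 0]]
eat, cows, extracts, animal = [1, 0], [0, 1], [1, 1], [2, 1]

# Recursive application over the structure of the sentence.
animal_extracts = f(A_NN, B_NN, extracts, animal)
v_recursive = add(f(A_VN, B_VN, eat, cows),
                  f(A_VN, B_VN, eat, animal_extracts))

# Expanded form: 2 A_VN eat + B_VN cows + B_VN A_NN extracts + B_VN B_NN animal.
v_expanded = add([2 * c for c in mat_vec(A_VN, eat)],
                 mat_vec(B_VN, cows),
                 mat_vec(mat_mul(B_VN, A_NN), extracts),
                 mat_vec(mat_mul(B_VN, B_NN), animal))

print(v_recursive == v_expanded)  # True
```

Each term of the expanded form is a product of relation matrices applied to a single word vector, which is what makes the term-by-term analysis of the similarity possible.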

Page 14

Recursive Full Additive Model: a closer look

$\vec{v} = 2A_{VN}\vec{eat} + B_{VN}\vec{cows} + B_{VN}A_{NN}\vec{extracts} + B_{VN}B_{NN}\vec{animal}$

$\vec{u} = 2A_{VN}\vec{eat} + B_{VN}\vec{chickens} + B_{VN}A_{NN}\vec{extracts} + B_{VN}B_{NN}\vec{beef}$

Each term has a structure part (the product of matrices) and a meaning part (the word vector).

Annotation on the slide: <1?

$\vec{v} \cdot \vec{u} = \sum_i V_i \vec{v}_i \cdot \sum_j U_j \vec{u}_j = \sum_{i,j} V_i \vec{v}_i \cdot U_j \vec{u}_j$

Ferrone&Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013

Page 15

The prequel …

Recognizing Textual Entailment
Feature Spaces of the Rules with Variables
adding distributional semantics
Distributional Semantics
Binary CDS
Recursive CDS

$B_{VN}B_{NN}\vec{beef}$: structure ($B_{VN}B_{NN}$) and meaning ($\vec{beef}$)

$\vec{v} \cdot \vec{u} = \sum_i V_i \vec{v}_i \cdot \sum_j U_j \vec{u}_j = \sum_{i,j} V_i \vec{v}_i \cdot U_j \vec{u}_j$

Page 16

Distributed Tree Kernels

$\vec{v} \cdot \vec{u} = \sum_i V_i \vec{v}_i \cdot \sum_j U_j \vec{u}_j = \sum_{i,j} V_i \vec{v}_i \cdot U_j \vec{u}_j$

Page 17

Tree Kernels

[Figure: the parse tree T of “Farmers feed cows animal extracts” and two of its tree fragments, t_i and t_j. T is represented as a sparse vector with one dimension per tree fragment: 1 in the dimensions of the fragments it contains (such as t_i and t_j), 0 elsewhere.]

$T_1 \cdot T_2 = \sum_i \alpha_i \vec{\tau}_i \cdot \sum_j \beta_j \vec{\tau}_j$

$\vec{v} \cdot \vec{u} = \sum_i V_i \vec{v}_i \cdot \sum_j U_j \vec{u}_j = \sum_{i,j} V_i \vec{v}_i \cdot U_j \vec{u}_j$

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
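As a point of reference for the tree kernel written as a dot product in the space of tree fragments, here is a small sketch of the standard recursive computation in the style of Collins & Duffy, which counts common fragments without enumerating them. The nested-tuple tree encoding, the function names and the decay parameter `lam` are choices made for this sketch, not taken from the slides.

```python
def nodes(t):
    """Internal nodes of a tree encoded as (label, child, ...); words are strings."""
    if isinstance(t, str):
        return []
    found = [t]
    for child in t[1:]:
        found.extend(nodes(child))
    return found

def production(n):
    """The grammar production at node n, keeping words and labels distinct."""
    return (n[0],) + tuple(("w", c) if isinstance(c, str) else ("n", c[0])
                           for c in n[1:])

def delta(n1, n2, lam=1.0):
    """Weighted number of common fragments rooted in n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):          # words contribute no further choice
            result *= 1.0 + delta(c1, c2, lam)
    return result

def tree_kernel(t1, t2, lam=1.0):
    """Sum of delta over all pairs of nodes: quadratic in the tree sizes."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

t1 = ("S", ("NP", ("NNS", "Farmers")),
           ("VP", ("VB", "feed"), ("NP", ("NNS", "cows"))))
t2 = ("S", ("NP", ("NNS", "Farmers")),
           ("VP", ("VB", "feed"), ("NP", ("NNS", "dolphins"))))

print(tree_kernel(t1, t1))  # 36.0
print(tree_kernel(t1, t2))
```

With `lam = 1.0` the value for a tree against itself is the squared norm of its fragment-count vector; here it is 36 because the bare fragment (NP NNS) occurs twice in the tree.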

Page 18

Tree Kernels in Smaller Vectors

[Figure: the same tree T and tree fragments t_i and t_j as on the previous slide. The sparse 0/1 vectors in the space of tree fragments are mapped to small dense vectors with entries such as 0.00921011, 0.00039842, 0.00032132, …]

CDS desiderata
- Vectors are smaller
- Vectors are obtained with a Compositional Function

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 19

Names for the «Distributed» World

[Figure: small dense vectors with entries such as 0.00921011, 0.00039842, 0.00032132, …]

Distributed Trees (DT)
Distributed Tree Fragments (DTF)
Distributed Tree Kernels (DTK)

As we are encoding trees in small vectors, the tradition is distributed structures (Plate, 1994)

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 20

DTK: Expected properties and challenges

• Compositionally building Distributed Tree Fragments
• Distributed Tree Fragments are a nearly orthonormal base of R^d
• Distributed Trees can be efficiently computed
• DTKs should approximate Tree Kernels

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 21

DTK: Expected properties and challenges

• Compositionally building Distributed Tree Fragments
• Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d
• Distributed Trees can be efficiently computed
• DTKs should approximate Tree Kernels

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 22

Compositionally building Distributed Tree Fragments

Basic elements
• N, a set of nearly orthogonal random vectors for node labels
• a basic vector composition function with some ideal properties

A distributed tree fragment is the application of the composition function on the node vectors, according to the order given by a depth first visit of the tree.

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
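A rough sketch of how distributed tree fragments could be built from these two basic elements. Everything concrete here is an assumption made for illustration: the vector size `D`, the use of a shuffled element-wise product as the composition function, the `sqrt(D)` scaling that keeps norms close to 1, and the left-to-right order in which children are folded in. It is not the reference implementation.

```python
import math
import random

D = 4096                       # vector size (hypothetical choice)
rng = random.Random(0)         # fixed seed so the sketch is repeatable

def random_unit_vector():
    """A random direction in R^D; two such vectors are nearly orthogonal."""
    v = [rng.gauss(0.0, 1.0) for _ in range(D)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Two fixed random permutations: shuffling each argument differently
# makes the element-wise product non-commutative.
P1 = list(range(D)); rng.shuffle(P1)
P2 = list(range(D)); rng.shuffle(P2)

def compose(a, b):
    """Shuffled element-wise product, rescaled to keep the norm near 1."""
    scale = math.sqrt(D)
    return [scale * a[i] * b[j] for i, j in zip(P1, P2)]

NODE_VECTORS = {}

def node_vector(label):
    """One random vector per node label, created on first use."""
    if label not in NODE_VECTORS:
        NODE_VECTORS[label] = random_unit_vector()
    return NODE_VECTORS[label]

def fragment_vector(t):
    """Vector of a fragment given as a label or as (label, child, ...)."""
    if isinstance(t, str):
        return node_vector(t)
    v = node_vector(t[0])
    for child in t[1:]:
        v = compose(v, fragment_vector(child))
    return v

frag_a = fragment_vector(("S", "NP", "VP"))
frag_b = fragment_vector(("VP", "VB", "NP"))
print(dot(frag_a, frag_a))   # close to 1
print(dot(frag_a, frag_b))   # close to 0
```

The two printed values illustrate the "nearly orthonormal" behaviour: a fragment vector has norm close to 1 and different fragments have dot product close to 0, up to noise that shrinks as `D` grows.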

Page 23

Building Distributed Tree Fragments

Properties of the ideal function
1. Non-commutativity with a very high degree k
2. Non-associativity
3. Bilinearity

Approximation
4. 5. 6.

We demonstrated that DTFs are a nearly orthonormal base.

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 24

DTK: Expected properties and challenges

• Compositionally building Distributed Tree Fragments
• Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d
• Distributed Trees can be efficiently computed
• DTKs should approximate Tree Kernels

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 25

Building Distributed Trees

Given a tree T, the distributed representation of its subtrees is the vector:

where S(T) is the set of the subtrees of T

[Figure: S(T) illustrated for the parse tree of “Farmers feed cows animal extracts”: S(T) = { …, … } is the set of tree fragments of T]

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 26

Building Distributed Trees

A more efficient approach

N(T) is the set of nodes of T
s(n) is defined as:
– if n is terminal
– if n → c1…cm

Computing a Distributed Tree is linear with respect to the size of N(T)

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
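The idea of computing a distributed tree with one visit of the nodes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: s(n) is taken to be the zero vector for terminals and, for n → c1…cm, the composition of the vector of n with (c_i + s(c_i)) for each child; no decay factor is used; and the composition function is a shuffled element-wise product chosen because it is bilinear. The sketch then checks the result against an explicit sum over all tree fragments.

```python
import math
import random

D = 64
rng = random.Random(1)
P1 = list(range(D)); rng.shuffle(P1)
P2 = list(range(D)); rng.shuffle(P2)
VECTORS = {}

def vec(label):
    """One random unit vector per node label (hypothetical encoding)."""
    if label not in VECTORS:
        v = [rng.gauss(0.0, 1.0) for _ in range(D)]
        norm = math.sqrt(sum(x * x for x in v))
        VECTORS[label] = [x / norm for x in v]
    return VECTORS[label]

def compose(a, b):
    """Shuffled element-wise product: bilinear and non-commutative."""
    return [math.sqrt(D) * a[i] * b[j] for i, j in zip(P1, P2)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def label(n):
    return n if isinstance(n, str) else n[0]

def distributed_tree(t):
    """Sum of s(n) over all nodes, computing each s(n) exactly once."""
    total = [0.0] * D
    def s(n):
        nonlocal total
        if isinstance(n, str):               # terminal node
            return [0.0] * D
        v = vec(n[0])
        for c in n[1:]:
            v = compose(v, add(vec(label(c)), s(c)))
        total = add(total, v)
        return v
    s(t)
    return total

def fragments(n):
    """Vectors of all tree fragments rooted in n, enumerated one by one."""
    if isinstance(n, str):
        return []
    partial = [vec(n[0])]
    for c in n[1:]:
        options = [vec(label(c))] + fragments(c)
        partial = [compose(p, o) for p in partial for o in options]
    return partial

def internal_nodes(n):
    if isinstance(n, str):
        return []
    return [n] + [m for c in n[1:] for m in internal_nodes(c)]

tree = ("S", ("NP", ("NNS", "Farmers")),
             ("VP", ("VB", "feed"), ("NP", ("NNS", "cows"))))
fast = distributed_tree(tree)
slow = [0.0] * D
n_fragments = 0
for node in internal_nodes(tree):
    for fv in fragments(node):
        slow = add(slow, fv)
        n_fragments += 1
print(n_fragments, max(abs(x - y) for x, y in zip(fast, slow)))
```

The recursive version touches each node once, while the explicit enumeration grows with the number of fragments; the two agree because the composition function is bilinear.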

Page 27

DTK: Expected properties and challenges

• Compositionally building Distributed Tree Fragments
• Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d
• Distributed Trees can be efficiently computed
• DTKs should approximate Tree Kernels

Property 1 (Nearly Unit Vectors)

Property 2 (Nearly Orthogonal Vectors)

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 28

Task-based Analysis for x

[Plots: Question Classification; Recognizing Textual Entailment]

with these realizations of the ideal function:
• Shuffled normalized element-wise product
• Shuffled circular convolution

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Page 29

Remarks

[Figure: small dense vectors with entries such as 0.00921011, 0.00039842, 0.00032132, …]

• Distributed Tree Fragments (DTF) are a nearly orthonormal base that embeds R^m in R^d
• Distributed Trees (DT) can be efficiently computed
• Distributed Tree Kernels (DTK) approximate Tree Kernels

Page 30

Side effect: reduced time complexity

• Tree kernels (TK) (Collins & Duffy, 2001) have quadratic complexity
• Current techniques control this complexity (Moschitti, 2006), (Rieck et al., 2010), (Shin et al., 2011)

DTKs change the complexity as they can be used with Linear Kernel Machines

                 SVM+TK            LinearSVM+DTK (Shuf. Prod.)   LinearSVM+DTK (Shuf. Circ. Conv.)
Training         O(n^3 |N(T)|^2)   O(n |N(T)| d)                 O(n |N(T)| d log d)
Classification   O(n |N(T)|^2)     O(|N(T)| d)                   O(|N(T)| d log d)

n: # of training examples
|N(T)|: # of nodes of the tree T

Zanzotto&Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, Accepted conditioned to minor revisions

Page 31

Sequel

• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels


Page 33

Distributed Representation Parsing (DRP): the idea

[Figure: Distributed Tree Encoder (DT)]

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Page 34

Distributed Representation Parsing (DRP): the idea

[Figure: two routes starting from the sentence «We booked the flight». Symbolic route: Symbolic Parser (SP), then Distributed Tree Encoder (DT). Distributed Representation Parsing (DRP): Sentence Encoder (D), then Transducer (P).]

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Page 35

DRP: Sentence Encoder

• Non-Lexicalized Sentence Models
– Bag-of-postags
– N-grams of postags
• Lexicalized Sentence Models
– Unigrams
– Unigrams + N-grams of postags

[Figure: Distributed Representation Parsing (DRP) applied to «We booked the flight»: Sentence Encoder (D), then Transducer (P)]

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Page 36

DRP: Transducer

Estimation: Principal Component Analysis and Partial Least Square Estimation

$T = PS$

Approximation: Moore-Penrose pseudoinverse (Penrose, 1955) to derive

$P_{(k)} = T S^{+}_{(k)}$

where k is the number of selected singular values

[Figure: Distributed Representation Parsing (DRP) applied to «We booked the flight»: Sentence Encoder (D), then Transducer (P)]

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013
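For readers less familiar with the truncated pseudoinverse used to estimate the transducer, the step can be unpacked with the singular value decomposition of S. The symbols U, Σ and V below are introduced only for this illustration (the slide names T, P, S and k), and the subscript k denotes keeping the k largest singular values with their singular vectors.

```latex
S = U \Sigma V^{\top}, \qquad
S^{+}_{(k)} = V_{k}\, \Sigma_{k}^{-1}\, U_{k}^{\top}, \qquad
P_{(k)} = T\, S^{+}_{(k)}
```

Keeping only k singular values discards the directions of S with the smallest variance, which regularizes the estimate of P; k is then a parameter to tune.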

Page 37

Experimental set-up

• Data
– English Penn Treebank with standard split
– Distributed trees with 3 values of λ (0, 0.2, 0.4) and 2 models: Unlexicalized/Lexicalized
– Dimension of the reduced space (4,096 and 8,192)
• System Comparison
– Distributed Symbolic Parser: DSP(s) = DT(SP(s))
– Symbolic Parser: Bikel Parser (Bikel, 2004) with Collins Settings (Collins, 2003)
• Parameter Estimation
– Parameters: k for the pseudo-inverse, j for the sentence encoders D
– Maximization of the similarity (see parsing performance) on Section 24

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Page 38

«Distributed» Parsing Performance

Evaluation Measure

[Plots: Unlexicalized Trees; Lexicalized Trees]

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Page 39

Sequel

• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels

Page 40

Distributed Convolution Kernels on Countable Sets

The following general property holds:

where
• CK is a convolution kernel
• DCK is the related distributed convolution kernel

Implemented Distributed Convolution Kernels
• Distributed Tree Kernel
• Distributed Subpath Kernel
• Distributed Route Kernel
• Distributed String Kernel
• Distributed Partial Tree Kernel

Zanzotto&Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, Accepted conditioned to minor revisions

Page 41

Sequel

• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels

Page 42

Going back to RTE and distributional semantics

[Figure: two rules with variables drawn as tree pairs. The first maps (S NP (VP (VB killed) (NP X))) to (S (NP X) (VP (VB died))); the second is the same pair with "murdered" in place of "killed". Distributional Semantics links the two rules.]

Promising!!!

Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010

Page 43

A Novel Look at the Recursive Full Additive Model

$\vec{v} = 2A_{VN}\vec{eat} + B_{VN}\vec{cows} + B_{VN}A_{NN}\vec{extracts} + B_{VN}B_{NN}\vec{animal}$

$\vec{u} = 2A_{VN}\vec{eat} + B_{VN}\vec{chickens} + B_{VN}A_{NN}\vec{extracts} + B_{VN}B_{NN}\vec{beef}$

Each term has a structure part (the product of matrices) and a meaning part (the word vector).

Annotation on the slide: <1?

$\vec{v} \cdot \vec{u} = \sum_i V_i \vec{v}_i \cdot \sum_j U_j \vec{u}_j = \sum_{i,j} V_i \vec{v}_i \cdot U_j \vec{u}_j$

Ferrone&Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013

Page 44

A Novel Look at the Recursive Full Additive Model

$\vec{v} \cdot \vec{u} = \sum_i V_i \vec{v}_i \cdot \sum_j U_j \vec{u}_j = \sum_{i,j} V_i \vec{v}_i \cdot U_j \vec{u}_j = \sum_{i,j} \langle V_i^{T} U_j, \vec{v}_i \vec{u}_j^{T} \rangle_{Frob}$

Choosing:

$V_i^{T} U_j \approx I$ if $Struct_i = Struct_j$
$V_i^{T} U_j \approx 0$ if $Struct_i \neq Struct_j$

$\vec{v} \cdot \vec{u} \approx \sum_{\{i,j \mid Struct_i = Struct_j\}} \vec{v}_i \cdot \vec{u}_j$

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted
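The rewriting of each term as a Frobenius product rests on the identity (V v) · (U u) = ⟨VᵀU, v uᵀ⟩_F. The short check below uses made-up 2×2 integer matrices and vectors; the helper names are ours. It also shows the special case in which VᵀU is the identity, where the term reduces to the plain dot product v · u.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def outer(a, b):
    """Outer product a b^T."""
    return [[x * y for y in b] for x in a]

def frobenius(A, B):
    """Frobenius product: sum of the element-wise products."""
    return sum(x * y for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical structure matrices and word vectors.
V = [[1, 2], [3, 4]]
U = [[0, 1], [1, 1]]
v = [1, -1]
u = [2, 3]

lhs = dot(mat_vec(V, v), mat_vec(U, u))
rhs = frobenius(mat_mul(transpose(V), U), outer(v, u))
print(lhs, rhs)  # -8 -8

# Same structure with an orthogonal matrix R: R^T R = I, so the term is v . u.
R = [[0, 1], [-1, 0]]
same_structure = dot(mat_vec(R, v), mat_vec(R, u))
print(same_structure, dot(v, u))  # -1 -1
```

When the product of the two structure matrices is close to the identity for equal structures and close to zero otherwise, only the pairs with matching structure survive in the sum.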

Page 45

«Convolution Conjecture»

Compositional Distributional Models based on linear algebra and Convolution Kernels are intimately related.

The similarity equations between two vectors/tensors obtained with CDSMs can be decomposed into operations performed on the subparts of the input phrases.

For example:

Convolution Kernel: $T_1 \cdot T_2 = \sum_i \alpha_i \vec{\tau}_i \cdot \sum_j \beta_j \vec{\tau}_j$

Recursive Full Additive Model: $\vec{v} \cdot \vec{u} = \sum_i V_i \vec{v}_i \cdot \sum_j U_j \vec{u}_j = \sum_{i,j} V_i \vec{v}_i \cdot U_j \vec{u}_j$

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted

Page 46

Distributed Smoothed Tree Kernels

[Figure: lexicalized trees whose root carries the head word, such as (S:killed NP (VP VB NP)) and (S:murdered NP (VP VB NP))]

For the lexicalized tree t = (S:killed NP (VP VB NP)):
synt(t) = (S NP (VP VB NP))
head(t) = killed

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014

Page 47

Distributed Smoothed Tree Kernels

In general, for a lexicalized tree:

we define

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014

Page 48

Distributed Smoothed Tree Kernels

Distributed Smoothed Tree

$\mathbf{T} = \sum_{t_i \in S(t)} \mathbf{T}_i = \sum_{t_i \in S(t)} \vec{synt}(t_i)\, \vec{head}(t_i)^{T}$

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014

The resulting dot (Frobenius) product
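One way to see what the Frobenius product of two distributed smoothed trees computes is the algebraic identity ⟨a bᵀ, c dᵀ⟩_F = (a · c)(b · d): the product of two sums of outer products decomposes into a sum, over pairs of fragments, of a syntactic similarity times a head-word similarity. The sketch below verifies this with tiny made-up vectors; the (synt, head) pairs are hypothetical stand-ins for the vectors of real fragments.

```python
def outer(a, b):
    """Outer product a b^T."""
    return [[x * y for y in b] for x in a]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def frobenius(A, B):
    """Frobenius product: sum of the element-wise products."""
    return sum(x * y for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def smoothed_tree(fragments):
    """Sum over fragments of synt(t_i) head(t_i)^T."""
    rows, cols = len(fragments[0][0]), len(fragments[0][1])
    total = [[0] * cols for _ in range(rows)]
    for synt, head in fragments:
        total = mat_add(total, outer(synt, head))
    return total

# Hypothetical (synt, head) vector pairs for the fragments of two trees.
tree_1 = [([1, 0], [1, 2]), ([0, 1], [3, 1])]
tree_2 = [([1, 0], [2, 0]), ([0, 1], [1, 1])]

direct = frobenius(smoothed_tree(tree_1), smoothed_tree(tree_2))
decomposed = sum(dot(s1, s2) * dot(h1, h2)
                 for s1, h1 in tree_1 for s2, h2 in tree_2)
print(direct, decomposed)  # 6 6
```

If the syntactic vectors are nearly orthonormal, the first factor acts as an approximate test that two fragments share the same structure, and the second factor adds the distributional similarity of their head words.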

Page 49

What’s next...

Page 50

[Overview diagram with regions labelled Prequel and Sequel]

Recognizing Textual Entailment
Feature Spaces of the Rules with Variables
adding distributional semantics
Distributional Semantics
Binary CDS
Recursive CDS

$B_{VN}B_{NN}\vec{beef}$: structure ($B_{VN}B_{NN}$) and meaning ($\vec{beef}$)

$\vec{v} \cdot \vec{u} = \sum_i \vec{v}_i \sum_j \vec{u}_j = \sum_{i,j} \vec{v}_i \vec{u}_j$

Tree Kernels

$T_1 \cdot T_2 = \sum_i \alpha_i \vec{\tau}_i^{(1)} \sum_j \beta_j \vec{\tau}_j^{(2)}$

Distributed Tree Kernels
Distributed Representation Parsing
Distributed Convolution Kernels on Countable Sets

Page 51

Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics

[Overview diagram with regions labelled Prequel, Sequel and What’s next]

Recognizing Textual Entailment
Feature Spaces of the Rules with Variables
adding distributional semantics
Distributional Semantics
Binary CDS
Recursive CDS

$B_{VN}B_{NN}\vec{beef}$: structure ($B_{VN}B_{NN}$) and meaning ($\vec{beef}$)

$\vec{v} \cdot \vec{u} = \sum_i \vec{v}_i \sum_j \vec{u}_j = \sum_{i,j} \vec{v}_i \vec{u}_j$

Tree Kernels

$T_1 \cdot T_2 = \sum_i \alpha_i \vec{\tau}_i^{(1)} \sum_j \beta_j \vec{\tau}_j^{(2)}$

Distributed Tree Kernels
Distributed Representation Parsing
Distributed Convolution Kernels on Countable Sets

What’s next
• Adding back distributional meaning
• Distributed Convolution Kernels on Countable Sets
• Lexicalized Distributed Representation Parsing

Page 52

After what’s next… what’s for?

• Applications
– Indexing structured information for Fast Syntax-aware Information Retrieval
– Semantic Text Similarity
– Fast Document Summarization
– Indexing structured information for XML Information Retrieval

Any other suggestion?

• Accelerator
– Optimizing the code with GPU programming (CUDA)

Page 53

Credits

• Lorenzo Dell’Arciprete
• Marco Pennacchiotti
• Alessandro Moschitti
• Yashar Mehdad
• Ioannis Korkontzelos
• Lorenzo Ferrone

Code for Distributed Tree Kernels and Distributed Convolution Kernels:

http://code.google.com/p/distributed-tree-kernels/

Page 55

[Diagram of research areas: Distributed Tree Kernels; Compositional Distributional Semantics; Brain&Computer]

Page 56

If you want to read more…

Distributed Tree Kernels
• Zanzotto, F. M. & Dell'Arciprete, L. Distributed Tree Kernels, Proceedings of International Conference on Machine Learning, 2012

Tree Kernels and Distributional Semantics
• Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010

Compositional Distributional Semantics
• Zanzotto, F. M.; Korkontzelos, I.; Fallucchi, F. & Manandhar, S. Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2010

Distributed and Distributional Tree Kernels
• Zanzotto, F. M. & Dell'Arciprete, L. Distributed Representations and Distributional Semantics, Proceedings of the ACL-HLT 2011 workshop on Distributional Semantics and Compositionality (DiSCo), 2011

Page 57

My first life: Learning Textual Entailment Recognition Systems

Initial Idea
• Zanzotto, F. M. & Moschitti, A. Automatic learning of textual entailments with cross-pair similarities, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 2006

First refinement of the algorithm
• Moschitti, A. & Zanzotto, F. M. Fast and Effective Kernels for Relational Learning from Texts, Proceedings of 24th Annual International Conference on Machine Learning, 2007

Adding shallow semantics
• Pennacchiotti, M. & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment, Proceeding of International Conference RANLP - 2007, 2007

A comprehensive description
• Zanzotto, F. M.; Pennacchiotti, M. & Moschitti, A. A Machine Learning Approach to Textual Entailment Recognition, Natural Language Engineering, 2009

Page 58

My first life: Learning Textual Entailment Recognition Systems

Adding Distributional Semantics
• Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010

A valid kernel with an efficient algorithm
• Zanzotto, F. M. & Dell'Arciprete, L. Efficient kernels for sentence pair classification, Conference on Empirical Methods on Natural Language Processing, 2009
• Zanzotto, F. M.; Dell'Arciprete, L. & Moschitti, A. Efficient Graph Kernels for Textual Entailment Recognition, Fundamenta Informaticae

Applications
• Zanzotto, F. M.; Pennacchiotti, M. & Tsioutsiouliklis, K. Linguistic Redundancy in Twitter, Proceedings of 2011 Conference on Empirical Methods on Natural Language Processing (EMNLP), 2011

Extracting RTE Corpora
• Zanzotto, F. M. & Pennacchiotti, M. Expanding textual entailment corpora from Wikipedia using co-training, Proceedings of the COLING-Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010

Learning Verb Relations
• Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Page 59

My second life: Parallels between Brains and Computers

• Zanzotto, F. M. & Croce, D. Comparing EEG/ERP-like and fMRI-like Techniques for Reading Machine Thoughts, BI 2010: Proceedings of the Brain Informatics Conference - Toronto, 2010
• Zanzotto, F. M.; Croce, D. & Prezioso, S. Reading what Machines "Think": a Challenge for Nanotechnology, Joint Conferences on Advanced Materials, 2009
• Zanzotto, F. M. & Croce, D. Reading what machines "think", BI 2009: Proceedings of the Brain Informatics Conference - Beijing, China, October 2009
• Prezioso, S.; Croce, D. & Zanzotto, F. M. Reading what machines "think": a challenge for nanotechnology, Journal of Computational and Theoretical Nanoscience, 2011
• Zanzotto, F. M.; Dell'Arciprete, L. & Korkontzelos, Y. Rappresentazione distribuita e semantica distribuzionale dalla prospettiva dell'Intelligenza Artificiale, Teorie & Modelli, 2010

Page 60

Thank you for the attention

Page 61

Structured Feature Spaces: Dimensionality Reduction

[Figure: the parse tree T of “Farmers feed cows animal extracts” and two of its tree fragments, t_i and t_j, first as sparse 0/1 vectors in the space of tree fragments and then as small dense vectors with entries such as 0.00921011, 0.00039842, 0.00032132, …]

Traditional Dimensionality Reduction Techniques
• Singular Value Decomposition
• Random Indexing
• Feature Selection

Not applicable

Page 62

Computational Complexity of DTK

• n: size of the tree
• k: selected tree fragments
• qw: reducing factor
• O(.): worst-case complexity
• A(.): average-case complexity

Page 63

Time Complexity Analysis

• DTK time complexity is independent of the tree sizes!

Page 64

Outline

• DTK: Expected properties and challenges
• Model:
– Distributed Tree Fragments
– Distributed Trees
• Experimental evaluation
• Remarks
• Back to Compositional Distributional Semantics
• Future Work

Page 65

Towards Distributional Distributed Trees

• Distributed Tree Fragments
– Non-terminal nodes n: random vectors
– Terminal nodes w: random vectors
• Distributional Distributed Tree Fragments
– Non-terminal nodes n: random vectors
– Terminal nodes w: distributional vectors

Caveat: Property 2 (nearly orthogonal vectors, $\vec{t}_1 \cdot \vec{t}_2 \approx 0$)

Random vectors are nearly orthogonal. Distributional vectors are not.

Zanzotto&Dell'Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011

Page 66

Experimental Set-up

• Task Based Comparison:
– Corpus: RTE1,2,3,5
– Measure: Accuracy
• Distributed/Distributional Vector Size: 250
• Distributional Vectors:
– Corpus: UKWaC (Ferraresi et al., 2008)
– LSA: applied with k=250

Zanzotto&Dell'Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011

Page 67

Accuracy Results

Zanzotto&Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011

Page 68

Adding semantics: Shallow semantics

Learning example

T: “For my younger readers, Chapman killed John Lennon more than twenty years ago.”
H: “John Lennon died more than twenty years ago.”

A generalized rule

[Figure: the rule as a pair of syntactic trees with typed variables X and Y, in which "killed" and "died" are linked by the relation "causes" (cs)]

Variables with Types

Pennacchiotti&Zanzotto, Learning Shallow Semantic Rules for Textual Entailment, Proceeding of RANLP, 2007

Page 69

Empirical Evaluation of Properties

• Non-commutativity: OK
• Distributivity over the sum: OK
• Norm preservation: ?
• Orthogonality preservation: ?

Page 70

[Figure: two routes starting from the sentence «We booked the flight». Symbolic route: Symbolic Parser (SP), then Distributed Tree Encoder (DT). Distributed Representation Parsing (DRP): Sentence Encoder (D), then Transducer (P).]