Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics
Fabio Massimo Zanzotto, ART Group
Dipartimento di Ingegneria dell’Impresa, University of Rome ”Tor Vergata”
© F.M.Zanzotto
Prequel
Recognizing Textual Entailment (RTE)
The task (Dagan et al., 2005): given a text T and a hypothesis H, decide whether T implies H.
T1
H1
“Farmers feed cows animal extracts”
“Cows eat animal extracts”
P1: T1 → H1
RTE as a classification task:
• Selecting the best learning algorithm
• Defining the feature space
Learning RTE Classifiers: the feature space
Training examples:
• P1: T1 → H1 (entailment)
  T1: “Farmers feed cows animal extracts”; H1: “Cows eat animal extracts”
• P2: T2 ↛ H2 (no entailment)
  T2: “They feed dolphins fish”; H2: “Fish eat dolphins”
• P3: T3 → H3 (entailment)
  T3: “Mothers feed babies milk”; H3: “Babies eat milk”

Classification

Relevant features: rules with variables (first-order rules), e.g.
• X feed Y → X eat Y
• X feed Y → Y eat X
RTE 2 Results

First Author (Group)       Accuracy   Average Precision
Hickl (LCC)                75.4%      80.8%
Tatu (LCC)                 73.8%      71.3%
Zanzotto (Milan & Rome)    63.9%      64.4%
Adams (Dallas)             62.6%      62.8%
Bos (Rome & Leeds)         61.6%      66.9%

Learning RTE Classifiers: the feature space

[Figure: the rule “X feed Y → X eat Y” as a pair of tree fragments: a VP fragment VP → VB(“feed”) NP(X) NP(Y) and a sentence fragment S → NP(X) VP(VB(“eat”) NP(Y))]

Rules with variables (first-order rules): X feed Y → X eat Y

Zanzotto&Moschitti, Automatic learning of textual entailments with cross-pair similarities, Coling-ACL, 2006
Adding semantics: Distributional Semantics
Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010
[Figure: two entailment rules with identical syntactic structure, anchored by the distributionally similar verb pairs killed → died and murdered → died]

Distributional Semantics: promising!!!
Compositional Distributional Semantics (CDS)
Mitchell&Lapata (2008) set a general model for bigrams that assigns a distributional meaning z to a sequence of two words “x y”:

z = f(x, y, R, K)

where R is the relation between x and y and K is an external knowledge.
An active research area!
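As a concrete illustration (not from the slides), the two simplest instances of z = f(x, y, R, K) studied by Mitchell & Lapata are vector addition and element-wise multiplication; a minimal sketch with made-up toy count vectors:

```python
import numpy as np

# Toy distributional vectors (made-up values; real models use corpus counts).
x = np.array([1.0, 0.0, 2.0])   # e.g. "moving"
y = np.array([0.0, 3.0, 1.0])   # e.g. "hands"

def compose_additive(x, y):
    """z = x + y: the simplest instance of z = f(x, y, R, K)."""
    return x + y

def compose_multiplicative(x, y):
    """z = x * y (element-wise): another instance from Mitchell & Lapata."""
    return x * y

z_add = compose_additive(x, y)
z_mul = compose_multiplicative(x, y)
```

Both functions ignore R and K; richer instances (like the full additive model below) encode R in learned parameters.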
Compositional Distributional Semantics (CDS)

[Figure: a “distributional” semantic space containing vectors for x = “moving”, y1 = “hands”, y2 = “car”, and the composed vectors z1 = “moving hands” and z2 = “moving car”]

A “distributional” semantic space; composing “distributional” meaning.
Compositional Distributional Semantics (CDS)

Mitchell&Lapata (2008) set a general model for bigrams that assigns a distributional meaning to a sequence of two words “x y”:

z = f(x, y, R, K)

For example, x = moving, y = hands, and z = moving hands.
CDS: Full Additive Model

The full additive model:

z = A_R x + B_R y

The matrices A_R and B_R can be estimated with:
- positive examples taken from dictionaries (e.g. “contact /ˈkɒntækt/ [kon-takt] 2. close interaction”)
- multivariate regression models

Zanzotto, Korkontzelos, Fallucchi, Manandhar, Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd COLING, 2010
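A minimal sketch of the full additive model and its estimation by multivariate regression; the data here is random in place of dictionary-derived examples, and A_true/B_true are hypothetical matrices used only to generate it:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical "true" matrices, used only to generate toy training data.
A_true = rng.standard_normal((d, d))
B_true = rng.standard_normal((d, d))

# Toy positive examples (x_i, y_i, z_i): word pair plus observed phrase vector.
X = rng.standard_normal((20, d))
Y = rng.standard_normal((20, d))
Z = X @ A_true.T + Y @ B_true.T

# Multivariate regression: solve [A B] from the stacked system [x; y] -> z.
XY = np.hstack([X, Y])                       # shape (n, 2d)
W, *_ = np.linalg.lstsq(XY, Z, rcond=None)   # shape (2d, d)
A_hat, B_hat = W[:d].T, W[d:].T

def full_additive(x, y, A=A_hat, B=B_hat):
    """z = A_R x + B_R y."""
    return A @ x + B @ y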
CDS: Recursive Full Additive Model
[Figure: the tree of “cows eat animal extracts” composed bottom-up: the NN model combines “animal” and “extracts”, and the VN model combines “eat” with “cows” and with the composed noun phrase]

Let’s scale up to sentences by recursively applying the model!
Let’s apply it to RTE…
Extremely poor results.
Recursive Full Additive Model: a closer look

«cows eat animal extracts»:
v = A_VN eat + B_VN cows + B_VN A_NN extracts + B_VN B_NN animal

«chickens eat beef extracts»:
u = A_VN eat + B_VN chickens + B_VN A_NN extracts + B_VN B_NN beef

… evaluating the similarity: v · u
Ferrone&Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013
Recursive Full Additive Model: a closer look

v = A_VN eat + B_VN cows + B_VN A_NN extracts + B_VN B_NN animal
u = A_VN eat + B_VN chickens + B_VN A_NN extracts + B_VN B_NN beef

Each term pairs a structure (a product of matrices, e.g. B_VN B_NN) with a meaning (a lexical vector, e.g. animal). Is the structure-structure interaction < 1?

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
The prequel …

• Recognizing Textual Entailment: feature spaces of the rules with variables
• Adding distributional semantics
• Distributional Semantics: binary CDS and recursive CDS
• Terms such as B_VN B_NN beef pair structure with meaning:

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
Distributed Tree Kernels
v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Tree Kernels
[Figure: the parse tree of “Farmers feed cows animal extracts” mapped to a huge 0/1 vector whose dimensions t_i, t_j, … index all possible tree fragments]

T1 · T2 = Σ_i α_i τ_i · Σ_j β_j τ_j

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
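The tree kernel is just a dot product between (implicit) fragment-indicator vectors; a toy sketch with hand-listed fragment multisets (the bracketed fragment names are illustrative, not the paper's notation):

```python
from collections import Counter

# Toy fragment multisets standing in for the huge indicator vectors over
# all tree fragments; names are illustrative only.
frags_T1 = Counter({"(S (NP)(VP))": 1, "(VP (VB feed)(NP)(NP))": 1, "(NP (NNS cows))": 1})
frags_T2 = Counter({"(S (NP)(VP))": 1, "(NP (NNS cows))": 1, "(VP (VB eat)(NP))": 1})

def tree_kernel(f1, f2):
    """T1 · T2: sum of products of fragment counts, i.e. shared fragments."""
    return sum(f1[t] * f2[t] for t in f1 if t in f2)

k = tree_kernel(frags_T1, frags_T2)
```

Here the two toy trees share two fragments, so the kernel value is 2; real tree kernels weight each fragment (the α_i, β_j above) and enumerate fragments implicitly.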
Tree Kernels in Smaller Vectors
[Figure: the same parse tree, mapped first to a huge 0/1 fragment-indicator vector and then to a small dense vector of reals]

CDS desiderata:
- Vectors are smaller
- Vectors are obtained with a compositional function

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Names for the «Distributed» World
[Small dense vectors of reals encode the trees]

Distributed Trees (DT)
Distributed Tree Fragments (DTF)
Distributed Tree Kernels (DTK)

As we are encoding trees in small vectors, the tradition is distributed structures (Plate, 1994).
• Compositionally building Distributed Tree Fragments
• Distributed Tree Fragments are a nearly orthonormal base of R^d
• Distributed Trees can be efficiently computed
• DTKs should approximate Tree Kernels
DTK: Expected properties and challenges
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Compositionally building Distributed Tree Fragments
Basic elements:
• N: a set of nearly orthogonal random vectors for node labels
• a basic vector composition function with some ideal properties

A distributed tree fragment is the application of the composition function to the node vectors, in the order given by a depth-first visit of the tree.
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Building Distributed Tree Fragments

Properties of the ideal function:
1. Non-commutativity with a very high degree k
2. Non-associativity
3. Bilinearity
4.–6. Approximation: norm preservation and orthogonality preservation

With these properties, we demonstrated that DTFs are a nearly orthonormal base.
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
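A sketch of one concrete realization of the ideal function, shuffled circular convolution (fixed random permutations followed by FFT-based circular convolution); dimensions, seeds, and labels are arbitrary choices for illustration. Composed fragment vectors stay nearly unit-length, and swapping the order of children yields a nearly orthogonal vector:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 4096

# Fixed random permutations make circular convolution non-commutative
# and non-associative (one realization of the "ideal" composition function).
p1, p2 = rng.permutation(d), rng.permutation(d)

def rand_vec():
    """A nearly unit, nearly orthogonal random vector for a node label."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def compose(a, b):
    """Shuffled circular convolution, computed via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a[p1]) * np.fft.fft(b[p2])))

S, NP, VP = rand_vec(), rand_vec(), rand_vec()

# Depth-first composition of the fragment (S (NP)(VP)).
dtf1 = compose(S, compose(NP, VP))
dtf2 = compose(S, compose(VP, NP))   # different child order: different fragment

cos = dtf1 @ dtf2 / (np.linalg.norm(dtf1) * np.linalg.norm(dtf2))
```

With d = 4096 the cosine between the two differently ordered fragments concentrates near 0, while each composed vector keeps norm close to 1.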
Building Distributed Trees
Given a tree T, the distributed representation of its subtrees is the vector

𝐓 = Σ_{τ ∈ S(T)} τ⃗

where S(T) is the set of the subtrees of T.
[Figure: S(T) for the parse tree of “Farmers feed cows animal extracts”: the set of its subtrees]
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Building Distributed Trees
A more efficient approach:

𝐓 = Σ_{n ∈ N(T)} s(n)

where N(T) is the set of nodes of T and s(n) is defined recursively: one base case when n is a terminal node, and one composition over the children when n → c1…cm.

Computing a Distributed Tree is linear with respect to the size of N(T).
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
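To see that dot products of distributed trees approximate the tree kernel, a toy sketch that builds 𝐓 the naive way, as the sum of the DTFs of a hand-listed subtree set; the paper's recursive algorithm computes the same vector in time linear in |N(T)|, and the trees and labels here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8192
p1, p2 = rng.permutation(d), rng.permutation(d)
_vecs = {}

def vec(label):
    """One fixed, nearly orthogonal random vector per node label."""
    if label not in _vecs:
        v = rng.standard_normal(d)
        _vecs[label] = v / np.linalg.norm(v)
    return _vecs[label]

def compose(a, b):
    # Shuffled circular convolution, one realization of the ideal function.
    return np.real(np.fft.ifft(np.fft.fft(a[p1]) * np.fft.fft(b[p2])))

def dtf(tree):
    """Depth-first composition of a tree given as (label, children...)."""
    label, *children = tree
    out = vec(label)
    for c in children:
        out = compose(out, dtf(c))
    return out

# Hand-listed subtree sets S(T) for two toy trees (normally enumerated from T).
S_T1 = [("S", ("NP",), ("VP",)), ("VP", ("VB", ("feed",)), ("NP",)), ("NP", ("NNS",))]
S_T2 = [("S", ("NP",), ("VP",)), ("VP", ("VB", ("eat",)), ("NP",)), ("NP", ("NNS",))]

DT1 = sum(dtf(t) for t in S_T1)
DT2 = sum(dtf(t) for t in S_T2)

exact = 2            # shared fragments: the S subtree and the NP subtree
approx = DT1 @ DT2   # DTK estimate of the tree kernel
```

Matched fragments contribute roughly 1 each (their squared near-unit norm) and mismatched pairs contribute noise near 0, so the dot product lands close to the exact count of shared fragments.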
DTK: Expected properties and challenges
Property 1 (Nearly Unit Vectors): ‖τ⃗‖ ≈ 1
Property 2 (Nearly Orthogonal Vectors): τ⃗1 · τ⃗2 ≈ 0 for τ1 ≠ τ2
Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Task-based Analysis

Question Classification and Recognizing Textual Entailment, with these realizations of the ideal function:
• Shuffled normalized element-wise product
• Shuffled circular convolution

Zanzotto&Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Remarks
[Small dense vectors of reals]

Distributed Tree Fragments (DTF): a nearly orthonormal base that embeds R^m in R^d
Distributed Trees (DT): can be efficiently computed
Distributed Tree Kernels (DTK): approximate Tree Kernels
Side effect: reduced time complexity
• Tree kernels (TK) (Collins & Duffy, 2001) have quadratic complexity
• Current techniques control this complexity (Moschitti, 2006), (Rieck et al., 2010), (Shin et al., 2011)

DTKs change the complexity, as they can be used with Linear Kernel Machines:

                  SVM+TK           LinearSVM+DTK
                                   Shuf. Prod.        Shuf. Circ. Conv.
Training          O(n³|N(T)|²)     O(n|N(T)|d)        O(n|N(T)|d log d)
Classification    O(n|N(T)|²)      O(|N(T)|d)         O(|N(T)|d log d)

n: # of training examples; |N(T)|: # of nodes of the tree T
Zanzotto&Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, Accepted conditioned to minor revisions
Sequel
• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels
Distributed Representation Parsing (DRP): the idea

[Figure: a sentence mapped directly to the distributed tree that the Distributed Tree Encoder (DT) would produce from its parse]

Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013
Distributed Representation Parsing (DRP): the idea

[Figure: two routes from the sentence «We booked the flight» to a distributed tree: the Symbolic Parser (SP) followed by the Distributed Tree Encoder (DT), versus Distributed Representation Parsing (DRP), i.e. the Sentence Encoder (D) followed by the Transducer (P)]
DRP: Sentence Encoder

• Non-Lexicalized Sentence Models: bag-of-postags; n-grams of postags
• Lexicalized Sentence Models: unigrams; unigrams + n-grams of postags
DRP: Transducer

Estimation: Principal Component Analysis and Partial Least Squares estimation of T = PS.
Approximation: the Moore-Penrose pseudoinverse (Penrose, 1955) derives P = T S⁺_(k), where k is the number of selected singular values.
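A minimal sketch of the pseudoinverse estimation of the transducer, solving T = PS as P = TS⁺; the sentence encodings and distributed trees here are random stand-ins, and the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, j, d = 200, 50, 100   # sentences, sentence-encoding dim, distributed-tree dim

# Toy stand-ins: columns of S are sentence encodings, columns of T the
# corresponding distributed trees; P_true is a hypothetical linear transducer.
S = rng.standard_normal((j, n))
P_true = rng.standard_normal((d, j))
T = P_true @ S

# Estimate P from T = P S with the Moore-Penrose pseudoinverse: P = T S+.
P_hat = T @ np.linalg.pinv(S)

# "Parsing" a sentence is then a single matrix-vector product.
t_pred = P_hat @ S[:, 0]
```

With noiseless data and S of full row rank, S S⁺ = I and the transducer is recovered exactly; on real treebank data only an approximation is possible, hence the truncation to k singular values.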
Experimental set-up

• Data
  – English Penn Treebank with standard split
  – Distributed trees with 3 values of λ (0, 0.2, 0.4) and 2 models: Unlexicalized/Lexicalized
  – Dimension of the reduced space: 4,096 and 8,192
• System Comparison
  – Distributed Symbolic Parser DSP(s) = DT(SP(s))
  – Symbolic Parser: Bikel Parser (Bikel, 2004) with Collins settings (Collins, 2003)
• Parameter Estimation
  – Parameters: k for the pseudo-inverse; j for the sentence encoders D
  – Maximization of the similarity (see parsing performance) on Section 24
Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013
Evaluation Measure
«Distributed» Parsing Performance
Unlexicalized Trees
Lexicalized Trees
Zanzotto&Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013
Sequel
• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels
Distributed Convolution Kernels on Countable Sets

The following general property holds: the distributed kernel approximates its symbolic counterpart, where
• CK is a convolution kernel
• DCK is the related distributed convolution kernel

Implemented Distributed Convolution Kernels:
• Distributed Tree Kernel
• Distributed Subpath Kernel
• Distributed Route Kernel
• Distributed String Kernel
• Distributed Partial Tree Kernel
Zanzotto&Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, Accepted conditioned to minor revisions
Sequel
• Towards Structured Prediction: Distributed Representation Parsing
• Generalizing the theory: Distributed Convolution Kernels on Countable Sets
• Adding back distributional semantics: Distributed Smoothed Tree Kernels
Going back to RTE and distributional semantics
Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010
[Figure: two entailment rules with identical syntactic structure, anchored by the distributionally similar verb pairs killed → died and murdered → died]

Distributional Semantics: promising!!!
A Novel Look at the Recursive Full Additive Model

v = A_VN eat + B_VN cows + B_VN A_NN extracts + B_VN B_NN animal
u = A_VN eat + B_VN chickens + B_VN A_NN extracts + B_VN B_NN beef

Each term pairs a structure (a product of matrices) with a meaning (a lexical vector):

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)
A Novel Look at the Recursive Full Additive Model

v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j) = Σ_{i,j} ⟨V_i^T U_j, v_i u_j^T⟩_Frob

Choosing the structure matrices so that V_i^T U_j ≈ I if Struct_i = Struct_j and ≈ 0 if Struct_i ≠ Struct_j:

v · u ≈ Σ_{i,j | Struct_i = Struct_j} v_i · u_j

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted
«Convolution Conjecture»

Compositional Distributional Models based on linear algebra and Convolution Kernels are intimately related: the similarity equations between two vectors/tensors obtained with CDSMs can be decomposed into operations performed on the subparts of the input phrases.

For example:

Convolution Kernel:  T1 · T2 = Σ_i α_i τ_i · Σ_j β_j τ_j
Recursive Full Additive Model:  v · u = Σ_i V_i v_i · Σ_j U_j u_j = Σ_{i,j} (V_i v_i) · (U_j u_j)

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted
Distributed Smoothed Tree Kernels

[Figure: lexicalized trees such as S:killed → NP VP(VB NP), decomposed into a syntactic part and a lexical part:
synt(S:killed → NP VP(VB NP)) = S → NP VP(VB NP)
head(S:killed → NP VP(VB NP)) = killed]
Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
Distributed Smoothed Tree Kernels

In general, for a lexicalized tree t, we define synt(t), its bare syntactic fragment, and head(t), the distributional vector of its lexical head.
Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
Distributed Smoothed Tree Kernels

The Distributed Smoothed Tree:

𝐓 = Σ_{t_i ∈ S(t)} synt(t_i) head(t_i)^T

where S(t) is the set of subtrees of t. The resulting dot (Frobenius) product defines the kernel.

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
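A toy check of why the Frobenius product is the right similarity here: for one-subtree matrices, ⟨s1 h1^T, s2 h2^T⟩_Frob factorizes exactly into a structure similarity times a head (meaning) similarity. The vectors below are random stand-ins for synt(·) and head(·):

```python
import numpy as np

rng = np.random.default_rng(1)
d_struct, d_lex = 512, 64

def unit(dim):
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Toy subtree representations: (synt(t_i), head(t_i)) pairs.
s1, h_killed = unit(d_struct), unit(d_lex)
s2, h_murdered = unit(d_struct), unit(d_lex)

# One-subtree "distributed smoothed trees" as outer products.
T1 = np.outer(s1, h_killed)
T2 = np.outer(s2, h_murdered)

# Frobenius product factorizes: <s1 h1^T, s2 h2^T>_F = (s1·s2)(h1·h2)
frob = np.sum(T1 * T2)
factored = (s1 @ s2) * (h_killed @ h_murdered)
```

Because the full 𝐓 is a sum of such outer products, the kernel between two trees decomposes into structure-times-meaning terms over pairs of subtrees.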
What’s next...
[Recap map]
Prequel:
• Recognizing Textual Entailment: feature spaces of the rules with variables
• Adding distributional semantics
• Distributional Semantics: binary CDS and recursive CDS (terms such as B_VN B_NN beef pair structure with meaning)
Sequel:
• Tree Kernels: T1 · T2 = Σ_i α_i τ_i^(1) · Σ_j β_j τ_j^(2)
• Distributed Tree Kernels: v · u = Σ_i v_i · Σ_j u_j = Σ_{i,j} v_i u_j
• Distributed Representation Parsing
• Distributed Convolution Kernels on Countable Sets
What’s next:
• Adding back distributional meaning: Distributed Convolution Kernels on Countable Sets
• Lexicalized Distributed Representation Parsing

Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics
After what’s next… what’s for?

• Applications
  – Indexing structured information for fast syntax-aware Information Retrieval
  – Semantic Text Similarity
  – Fast Document Summarization
  – Indexing structured information for XML Information Retrieval
  – Any other suggestion?
• Accelerator
  – Optimizing the code with GPU programming (CUDA)
Credits

Lorenzo Dell’Arciprete, Marco Pennacchiotti, Alessandro Moschitti, Yashar Mehdad, Ioannis Korkontzelos, Lorenzo Ferrone

Code for Distributed Tree Kernels and Distributed Convolution Kernels:
http://code.google.com/p/distributed-tree-kernels/
Distributed Tree Kernels, Compositional Distributional Semantics, Brain&Computer

[Figure: a parse tree encoded into and decoded from a distributed representation]
If you want to read more…

Distributed Tree Kernels
• Zanzotto, F. M. & Dell'Arciprete, L. Distributed Tree Kernels, Proceedings of International Conference on Machine Learning, 2012
Tree Kernels and Distributional Semantics
• Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
Compositional Distributional Semantics
• Zanzotto, F. M.; Korkontzelos, I.; Fallucchi, F. & Manandhar, S. Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2010
Distributed and Distributional Tree Kernels
• Zanzotto, F. M. & Dell'Arciprete, L. Distributed Representations and Distributional Semantics, Proceedings of the ACL-HLT 2011 workshop on Distributional Semantics and Compositionality (DiSCo), 2011
My first life: Learning Textual Entailment Recognition Systems

Initial idea
• Zanzotto, F. M. & Moschitti, A. Automatic learning of textual entailments with cross-pair similarities, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 2006
First refinement of the algorithm
• Moschitti, A. & Zanzotto, F. M. Fast and Effective Kernels for Relational Learning from Texts, Proceedings of 24th Annual International Conference on Machine Learning, 2007
Adding shallow semantics
• Pennacchiotti, M. & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment, Proceedings of International Conference RANLP-2007, 2007
A comprehensive description
• Zanzotto, F. M.; Pennacchiotti, M. & Moschitti, A. A Machine Learning Approach to Textual Entailment Recognition, NATURAL LANGUAGE ENGINEERING, 2009
My first life: Learning Textual Entailment Recognition Systems

Adding distributional semantics
• Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
A valid kernel with an efficient algorithm
• Zanzotto, F. M. & Dell'Arciprete, L. Efficient kernels for sentence pair classification, Conference on Empirical Methods on Natural Language Processing, 2009
• Zanzotto, F. M.; Dell'Arciprete, L. & Moschitti, A. Efficient Graph Kernels for Textual Entailment Recognition, FUNDAMENTA INFORMATICAE
Applications
• Zanzotto, F. M.; Pennacchiotti, M. & Tsioutsiouliklis, K. Linguistic Redundancy in Twitter, Proceedings of 2011 Conference on Empirical Methods on Natural Language Processing (EmNLP), 2011
Extracting RTE corpora
• Zanzotto, F. M. & Pennacchiotti, M. Expanding textual entailment corpora from Wikipedia using co-training, Proceedings of the COLING-Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010
Learning verb relations
• Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
My second life: Parallels between Brains and Computers

• Zanzotto, F. M. & Croce, D. Comparing EEG/ERP-like and fMRI-like Techniques for Reading Machine Thoughts, BI 2010: Proceedings of the Brain Informatics Conference, Toronto, 2010
• Zanzotto, F. M.; Croce, D. & Prezioso, S. Reading what Machines "Think": a Challenge for Nanotechnology, Joint Conferences on Advanced Materials, 2009
• Zanzotto, F. M. & Croce, D. Reading what machines "think", BI 2009: Proceedings of the Brain Informatics Conference, Beijing, China, October 2009
• Prezioso, S.; Croce, D. & Zanzotto, F. M. Reading what machines "think": a challenge for nanotechnology, JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2011
• Zanzotto, F. M.; Dell'Arciprete, L. & Korkontzelos, Y. Rappresentazione distribuita e semantica distribuzionale dalla prospettiva dell'Intelligenza Artificiale [Distributed representation and distributional semantics from the Artificial Intelligence perspective], TEORIE & MODELLI, 2010
Thank you for your attention
Structured Feature Spaces: Dimensionality Reduction

[Figure: parse trees mapped to huge 0/1 fragment-indicator vectors and then to small dense vectors of reals]

Traditional dimensionality reduction techniques:
• Singular Value Decomposition
• Random Indexing
• Feature Selection

Not applicable!
Computational Complexity of DTK

• n: size of the tree
• k: selected tree fragments
• qw: reducing factor
• O(.): worst-case complexity
• A(.): average-case complexity
Time Complexity Analysis
• DTK time complexity is independent of the tree sizes!
Outline

• DTK: Expected properties and challenges
• Model: Distributed Tree Fragments; Distributed Trees
• Experimental evaluation
• Remarks
• Back to Compositional Distributional Semantics
• Future Work
Towards Distributional Distributed Trees

• Distributed Tree Fragments: non-terminal nodes n and terminal nodes w are both given random vectors
• Distributional Distributed Tree Fragments: non-terminal nodes n are given random vectors; terminal nodes w are given distributional vectors

Caveat (Property 2): random vectors are nearly orthogonal (t⃗1 · t⃗2 ≈ 0); distributional vectors are not.
Zanzotto&Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011
Experimental Set-up

• Task-based comparison: Corpus: RTE1, 2, 3, 5; Measure: Accuracy
• Distributed/Distributional vector size: 250
• Distributional vectors: Corpus: UKWaC (Ferraresi et al., 2008); LSA applied with k = 250
Zanzotto&Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011
Accuracy Results
Zanzotto&Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011
Adding semantics: shallow semantics

Pennacchiotti&Zanzotto, Learning Shallow Semantic Rules for Textual Entailment, Proceeding of RANLP, 2007

Learning example: T → H
T: “For my younger readers, Chapman killed John Lennon more than twenty years ago.”
H: “John Lennon died more than twenty years ago.”

[Figure: a generalized rule with typed variables, “X killed Y → Y died”, where killed causes died]
Empirical Evaluation of Properties

• Non-commutativity: OK
• Distributivity over the sum: OK
• Norm preservation: ?
• Orthogonality preservation: ?