Mono- and bilingual modeling of selectional preferences

Sebastian Padó

Institute for Computational Linguistics

Heidelberg University

(joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

Some context

• Computational lexical semantics: modeling the meaning of words and phrases
• Distributional approach
  • Observe the usage of words in corpora
  • Robustness: broad coverage, manageable complexity
  • Flexibility: corpus choice determines model

[Diagram: Knowledge, Corpus, Structure]

Methods: Distributional semantics

Phenomena: Semantic relations in bilingual dictionaries

Application: Predictions of plausibility judgments

Plausibility of Verb-Relation-Argument Triples

Verb  Relation  Argument  Plausibility
eat   subject   customer  6.9
eat   object    customer  1.5
eat   subject   apple     1.0
eat   object    apple     6.4

• Central aspect of language
  • Selectional preferences [Katz & Fodor 1963, Wilks 1975]
  • Generalization of lexical similarity
  • Incremental language processing [McRae & Matsuki 2009]
  • Disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]

Modelling Plausibility

• Approximating plausibility by frequency
• Two lexical variables: frequency of most triples is zero
• Implausibility or sparse data?
  • Generalization based on an ontology (WordNet) [Resnik 1996]
  • Generalization based on vector space [Erk, Padó, and Padó 2010]

English corpus:

(eat, obj, apple)      100
(eat, obj, hat)        1
(eat, obj, telephone)  0
(eat, obj, caviar)     0

(eat, obj, apple): highly plausible
(eat, obj, hat): somewhat plausible
(eat, obj, telephone): ?
(eat, obj, caviar): ?
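The sparse-data problem above can be illustrated with a minimal sketch. It assumes that plausibility is approximated by the relative frequency of an argument given the verb and relation; the function name `relative_frequency` is hypothetical, and only the counts come from the slide.

```python
from collections import Counter

# Counts of observed triples, copied from the slide.
# Unobserved triples are simply absent (Counter returns 0 for them).
triple_counts = Counter({
    ("eat", "obj", "apple"): 100,
    ("eat", "obj", "hat"): 1,
})


def relative_frequency(verb, rel, arg, counts=triple_counts):
    """Maximum-likelihood estimate of P(arg | verb, rel) from raw counts."""
    total = sum(c for (v, r, _), c in counts.items() if (v, r) == (verb, rel))
    if total == 0:
        return 0.0
    return counts[(verb, rel, arg)] / total


for arg in ("apple", "hat", "telephone", "caviar"):
    print(arg, relative_frequency("eat", "obj", arg))
```

Under this estimate "telephone" and "caviar" receive the same score of zero, so frequency alone cannot tell an implausible argument from an unseen but plausible one.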

Semantic Spaces

• Characterization of word meaning through a profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998]
• Geometrically: vector in high-dimensional space
• High vector similarity implies high semantic similarity
  • Nearest neighbors = synonyms

            cultiver  rouler
mandarine   5         1
clémentine  4         1
voiture     1         20

[Figure: French (Fr) vector space with axes cultiver and rouler; points mandarine, clémentine, voiture]
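The toy space above can be sketched in a few lines. Cosine is used as the similarity measure here as an assumption; the slides do not name a specific measure, and the helper names `cosine` and `nearest_neighbor` are hypothetical.

```python
import math

# Co-occurrence counts from the slide; dimensions are (cultiver, rouler).
vectors = {
    "mandarine": (5, 1),
    "clémentine": (4, 1),
    "voiture": (1, 20),
}


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor(word, space=vectors):
    """Return the other word in the space with the highest cosine to `word`."""
    others = (w for w in space if w != word)
    return max(others, key=lambda w: cosine(space[word], space[w]))


print(nearest_neighbor("mandarine"))
print(cosine(vectors["mandarine"], vectors["voiture"]))
```

With these counts, "mandarine" lies much closer to "clémentine" than to "voiture", matching the intuition that nearest neighbors tend to be near-synonyms.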

Similarity-based generalization [Pado, Pado & Erk 2010]

• Plausibility is average vector space similarity to seen arguments
  • (v, r, a): verb – relation – argument head word triple
  • seenargs: set of argument head words seen in the corpus
  • wt: weight function
  • Z: normalization constant
  • sim: semantic (vector space) similarity
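One reading consistent with the symbols listed above is Pl(v, r, a) = sum over a' in seenargs(v, r) of (wt(v, r, a') / Z) · sim(a, a'). The sketch below implements that reading under loud assumptions: wt is taken to be the corpus frequency of the seen argument, Z the sum of all weights, and sim the cosine. The two-dimensional vectors are invented purely for illustration; only the counts for "apple" and "hat" come from the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity; stands in for `sim` (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def plausibility(arg, seen_args, vectors, sim=cosine):
    """Weighted average similarity of `arg` to the seen arguments.

    seen_args maps each seen head word a' to its weight wt(v, r, a');
    Z is assumed to be the sum of those weights.
    """
    z = sum(seen_args.values())
    if z == 0:
        return 0.0
    return sum(wt * sim(vectors[arg], vectors[a]) for a, wt in seen_args.items()) / z


# Hypothetical vectors (not from the slides), for illustration only.
toy_vectors = {
    "apple": (9, 1),
    "hat": (2, 6),
    "caviar": (8, 2),
    "telephone": (1, 9),
}

# Seen objects of "eat" with their corpus counts (from the slides).
seen_objects_of_eat = {"apple": 100, "hat": 1}

for candidate in ("caviar", "telephone"):
    print(candidate, plausibility(candidate, seen_objects_of_eat, toy_vectors))
```

In this toy setting the unseen argument "caviar" scores higher than the unseen argument "telephone", which is the distinction that raw frequency cannot make.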

Geometrical interpretation

[Figure: words plotted in vector space (Peter, husband, child, orange, apple, breakfast, caviar, telephone), with regions labeled "Seen subjects of “eat”" and "Seen objects of “eat”"]

Evaluation

• Triples with human plausibility ratings [McRae et al. 1996]
• Evaluation: correlation of model predictions with human judgments
  • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
• Result: vector space model nearly attains the quality of the “deep” model, at 98% coverage

Model                               Coverage  Spearman’s rho
Resnik 1996 [ontology-based]        100%      0.123 n.s.
EPP [vector space-based]            98%       0.325 ***
U. Pado et al. 2006 [“deep” model]  78%       0.415 ***
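The evaluation measure can be sketched as follows: Spearman’s ρ is the Pearson correlation of the two rank vectors. This is a plain standard-library sketch with average ranks for ties. The human ratings are the four "eat" triples shown earlier in the talk, while the model scores are hypothetical values chosen for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (sx * sy)


human = [6.9, 1.5, 1.0, 6.4]   # ratings from the slide
model = [0.8, 0.3, 0.1, 0.7]   # hypothetical model scores
print(spearman_rho(human, model))
```

Because ρ depends only on the ordering, a model does not need to reproduce the rating scale; it only needs to rank the triples the way the human raters do.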

From one to many languages…

• Vector space model reduces the need for language resources to predict plausibility judgments
  • No ontologies
• Still necessary: observations of triples, target words
  • Large, accurately parsed corpus
  • Problematic for basically all languages except English
• Can we extend our strategy to new languages?

Model                             Resources         Spearman’s ρ
Resnik [Brockmann & Lapata 2002]  TIGER + GermaNet  .37
EPP [Pado & Peirsman 2010]        HGC               .33

Predicting plausibility for new languages

• Transfer with a bilingual lexicon [Koehn and Knight 2002]
  • Cross-lingual knowledge transfer
• Print dictionaries are problematic
  • Instead: acquire from distributional data

cultiver – grow
pomme – apple

(cultiver, Obj, pomme) → English model (English corpus) → (grow, obj, apple): highly plausible
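The transfer step can be sketched as a lexicon lookup followed by a query to the English model. Everything beyond the two lexicon entries is an assumption: the function names are hypothetical, the English model is a stub with an invented score, and relation labels are assumed to carry over between languages after lowercasing.

```python
# Bilingual lexicon entries from the slide.
lexicon = {"cultiver": "grow", "pomme": "apple"}


def translate_triple(triple, lexicon):
    """Translate verb and argument; return None if either is not covered."""
    verb, rel, arg = triple
    if verb not in lexicon or arg not in lexicon:
        return None
    # Assumption: relation labels are shared across languages.
    return (lexicon[verb], rel.lower(), lexicon[arg])


def toy_english_model(triple):
    """Stub standing in for the English plausibility model (invented score)."""
    scores = {("grow", "obj", "apple"): 0.9}
    return scores.get(triple, 0.0)


def transfer_plausibility(triple, lexicon, english_model):
    """Score a source-language triple with the English model."""
    translated = translate_triple(triple, lexicon)
    if translated is None:
        return None
    return english_model(translated)


print(transfer_plausibility(("cultiver", "Obj", "pomme"), lexicon, toy_english_model))
```

Returning None for uncovered words keeps missing lexicon entries distinct from triples that the English model judges implausible.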

Bilingual semantic space

• Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]
  • Dimensions are bilingual word pairs, can be bootstrapped
  • Frequencies observable from comparable corpora
• Nearest neighbors: cross-lingual synonyms ⟷ translations

           (cultiver, grow)  (rouler, drive)
mandarine  5                 1
mandarin   4                 2
car        1                 20

[Figure: bilingual vector space with axes cultiver/grow and rouler/drive; points mandarine (Fr), mandarin (E), car (E)]
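Lexicon extraction from the joint space can be sketched as a ranked cross-lingual neighbor search. The counts are the ones from the table above; the use of cosine, the language tags on the keys, and the function name `cross_lingual_neighbors` are assumptions made for this illustration.

```python
import math

# Joint space: dimensions are the word pairs (cultiver, grow) and (rouler, drive).
# Keys carry a language tag so that candidates can be restricted by language.
space = {
    ("fr", "mandarine"): (5, 1),
    ("en", "mandarin"): (4, 2),
    ("en", "car"): (1, 20),
}


def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cross_lingual_neighbors(word, source_lang, target_lang, space, n=1):
    """Return the n target-language words most similar to the source word."""
    source_vec = space[(source_lang, word)]
    scored = [
        (cosine(source_vec, vec), w)
        for (lang, w), vec in space.items()
        if lang == target_lang
    ]
    scored.sort(reverse=True)
    return [w for _, w in scored[:n]]


print(cross_lingual_neighbors("mandarine", "fr", "en", space, n=2))
```

Taking only the first neighbor yields a one-to-one lexicon, while larger n yields a ranked list of translation candidates.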

Nearest neighbors in bilingual space

• Similar usages / context profiles do not necessarily indicate synonymy

       (cultiver, grow)  (rouler, drive)
pear   5                 1
pomme  4                 2
car    1                 20

[Figure: bilingual vector space with axes cultiver/grow and rouler/drive; points pear (E), pomme (Fr), car (E)]

• Bilingual case: Peirsman & Pado (2011)
  • Lexicon extraction for EN/DE and EN/NL

Evaluation against Gold Standard

• Evaluation of nearest cross-lingual neighbors against a translators’ dictionary

Analysis of 200 noun pairs (EN-DE)

Meta-Relation               Relation     Frequency  Example
Synonymy (50%)                           99         Verhältnis - relationship
Semantic similarity (16%)   Antonymy     1          Inneres - exterior
                            Co-Hyponymy  15         Straßenbahn - bus
                            Hyponymy     3          Kunstwerk - painting
                            Hypernymy    15         Dramatiker - poet
Semantic relatedness (19%)               39         Kapitel - essay
Errors (14%)                             28         DDR-Zeit – trainee

Similarity by relation

How to proceed?

• Classical reaction: focus on cross-lingual synonyms
  • Aggressive filtering of nearest-neighbor lists
  • Risk: sparse data issues
• Our hypothesis (preliminary version):
  • Non-synonymous pairs still provide information about bilingual similarity
  • Should be exploited for cross-lingual knowledge transfer
  • Experimental validation: vary number of synonyms, observe effect on cross-lingual knowledge transfer

Varying the number of neighbors

• Nearest neighbors: 50% are synonyms
• Further neighbors: quick decline to 10% synonyms

Experimental setup

rouler – drive
bagnole – jalopy, banger, car

(bagnole, subj, rouler) → English model (English corpus)

Consider plausibilities for:
(jalopy, subj, drive)
(banger, subj, drive)
(car, subj, drive)

Details

• Model:
  • English model: trained on BNC as before
  • Bilingual lexicon extracted from BNC and Stuttgarter Nachrichtenkorpus HGC as comparable corpora
  • Prediction based on n nearest English neighbours for German argument
• Evaluation:
  • 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]
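Prediction from the n nearest neighbours can be sketched as below. How the talk combines the scores of the individual neighbours is not stated; averaging is an assumption here. The neighbour list reuses the bagnole example from the experimental setup, triples follow the (v, r, a) order, and the English-model scores and all function names are hypothetical.

```python
def predict_with_neighbors(verb_en, rel, neighbors, english_model, n):
    """Average the English model's scores over the n nearest neighbours.

    `neighbors` is a list of English translation candidates for the
    source-language argument, ordered from nearest to farthest.
    """
    candidates = neighbors[:n]
    if not candidates:
        return None
    scores = [english_model((verb_en, rel, arg)) for arg in candidates]
    return sum(scores) / len(scores)


def toy_english_model(triple):
    """Stub English plausibility model with invented scores."""
    scores = {
        ("drive", "subj", "car"): 0.9,
        ("drive", "subj", "jalopy"): 0.6,
        ("drive", "subj", "banger"): 0.3,
    }
    return scores.get(triple, 0.0)


# Hypothetical neighbour ranking for "bagnole".
neighbors_of_bagnole = ["car", "jalopy", "banger"]

for n in (1, 2, 3):
    print(n, predict_with_neighbors("drive", "subj", neighbors_of_bagnole, toy_english_model, n))
```

Larger n pools evidence from more English words, which counters sparse data but lets less exact translations influence the score.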

Results – EN-DE

                        1 NN  2 NN  3 NN  4 NN  5 NN
Translated English EPP  0.34  0.41  0.44  0.46  0.40

Model                              Resources                               Spearman’s ρ
Resnik [Brockmann & Lapata 2002]   TIGER corpus, German WordNet (GermaNet)  .37
EPP German [Pado & Peirsman 2010]  HGC corpus parsed with PCFG             .33

• Result: Transfer model significantly better than monolingual model, but only if non-synonymous neighbors are included

Results: Details

                          1 NN  2 NN  3 NN  4 NN  5 NN
English EPP (all)         0.34  0.41  0.44  0.46  0.40
English EPP (subjects)    0.53  0.51  0.56  0.56  0.55
English EPP (objects)     0.58  0.61  0.61  0.64  0.58
English EPP (pp objects)  0.33  0.45  0.45  0.46  0.42

Sources of the positive effect

• Non-synonyms are in fact informative for plausibility translation
• Semantically similar verbs: eat – munch – feast
  • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
• Semantically related verbs: peel – cook – eat
  • Schemas/narrative chains: shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]

Our hypothesis with qualifications

• Using non-synonymous translation pairs is helpful
  1. if transferred knowledge is lexical
     • Many infrequently observed datapoints
  2. if knowledge is stable across semantically related/similar word pairs
     • Counterexample: polarity/sentiment judgments
     • food – feast – grub
     • Parallel experiment: best results for single nearest neighbor

Summary

• Plausibility can be modeled with fairly shallow methods
  • Seen head words plus generalization in vector space
  • Precondition: accurately parsed corpus
• If unavailable: transfer from better-endowed language
  • Translation through automatically induced lexicons
• Transfer of knowledge about certain phenomena can benefit from non-synonymous translations
  • Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …
