mono- and bilingual modeling of selectional preferences

Mono- and bilingual modeling of selectional preferences. Sebastian Padó, Institute for Computational Linguistics, Heidelberg University (joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

Upload: melosa

Post on 24-Feb-2016

TRANSCRIPT

Page 1

Mono- and bilingual modeling of selectional preferences

Sebastian Padó

Institute for Computational Linguistics

Heidelberg University

(joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

Page 2

Some context

• Computational lexical semantics: modeling the meaning of words and phrases
• Distributional approach
  • Observe the usage of words in corpora
  • Robustness: broad coverage, manageable complexity
  • Flexibility: the choice of corpus determines the model

(Diagram: Corpus → Knowledge)

Page 3

Structure

• Methods: distributional semantics
• Phenomena: semantic relations in bilingual dictionaries
• Application: prediction of plausibility judgments

Page 4

Plausibility of Verb-Relation-Argument Triples

Verb   Relation   Argument   Plausibility
eat    subject    customer   6.9
eat    object     customer   1.5
eat    subject    apple      1.0
eat    object     apple      6.4

• Central aspect of language
• Selectional preferences [Katz & Fodor 1963, Wilks 1975]
• Generalization of lexical similarity
• Incremental language processing [McRae & Matsuki 2009]
• Disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]

Page 5

Modeling Plausibility

• Approximating plausibility by frequency
• Two lexical variables: the frequency of most triples is zero
• Implausibility or sparse data?
• Generalization based on an ontology (WordNet) [Resnik 1996]
• Generalization based on a vector space [Erk, Padó, and Padó 2010]

English corpus:
(eat, obj, apple)       100   → highly plausible
(eat, obj, hat)           1   → somewhat plausible
(eat, obj, telephone)     0   → ?
(eat, obj, caviar)        0   → ?
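A minimal Python sketch of the frequency-based approximation, using the toy counts above. The dictionary layout and the helper name `freq_plausibility` are illustrative assumptions, not part of the original model; the point is only to show why raw frequency cannot separate the two unseen triples.

```python
from collections import Counter

# Toy corpus counts for (verb, relation, argument) triples, from the example above.
COUNTS = Counter({
    ("eat", "obj", "apple"): 100,
    ("eat", "obj", "hat"): 1,
})


def freq_plausibility(triple, counts):
    """Relative frequency of the argument among all arguments seen with (verb, relation).

    Hypothetical helper: unseen triples get 0.0 because Counter returns 0 for missing keys.
    """
    verb, rel, _ = triple
    total = sum(c for (v, r, _a), c in counts.items() if (v, r) == (verb, rel))
    return counts[triple] / total if total else 0.0


for arg in ["apple", "hat", "telephone", "caviar"]:
    print(arg, round(freq_plausibility(("eat", "obj", arg), COUNTS), 3))
```

Both telephone and caviar receive 0.0, so frequency alone cannot tell an implausible triple from one that simply was not observed; this is the gap that ontology- or vector-space-based generalization addresses.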

Page 6

Semantic Spaces

• Characterization of word meaning through a profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998]
• Geometrically: a vector in a high-dimensional space
• High vector similarity implies high semantic similarity
• Nearest neighbors = synonyms

Example (French; counts of co-occurrence with the context verbs cultiver and rouler):

             cultiver   rouler
mandarine    5          1
clémentine   4          1
voiture      1          20

(Figure: mandarine, clémentine, and voiture plotted in the two-dimensional space spanned by cultiver and rouler)
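The counts in the table above can be turned into vectors and compared directly. This sketch assumes cosine similarity, a common choice for such spaces; the slide does not name the similarity measure, and `nearest_neighbor` is a hypothetical helper.

```python
import math

# Co-occurrence counts from the table above; dimensions are the context verbs (cultiver, rouler).
VECTORS = {
    "mandarine": (5, 1),
    "clémentine": (4, 1),
    "voiture": (1, 20),
}


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor(word, vectors):
    """Return the most cosine-similar other word in the space."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))


print(nearest_neighbor("mandarine", VECTORS))                     # clémentine
print(round(cosine(VECTORS["mandarine"], VECTORS["clémentine"]), 3))  # 0.999
print(round(cosine(VECTORS["mandarine"], VECTORS["voiture"]), 3))     # 0.245
```

Cosine ignores vector length, so a frequent and a rare word with the same context profile still count as similar.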

Page 7

Similarity-based generalization [Pado, Pado & Erk 2010]

• Plausibility is the weighted average vector-space similarity to the seen arguments
• (v, r, a): verb – relation – argument head word triple
• seenargs: set of argument head words seen in the corpus for v and r
• wt: weight function
• Z: normalization constant
• sim: semantic (vector space) similarity
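The slide's equation did not survive extraction. The following is a reconstruction from the symbols listed above, following the similarity-based formulation in Erk, Padó & Padó (2010); the notation on the original slide may differ. With Z defined as the sum of the weights, the plausibility is a weighted average of similarities to the seen arguments.

```latex
\mathrm{Pl}(v, r, a) = \sum_{a' \in \mathrm{seenargs}(v, r)} \frac{\mathrm{wt}(v, r, a')}{Z(v, r)} \cdot \mathrm{sim}(a, a'),
\qquad
Z(v, r) = \sum_{a' \in \mathrm{seenargs}(v, r)} \mathrm{wt}(v, r, a')
```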

Page 8

Geometrical interpretation

(Figure: words in the vector space: Peter, husband, child, orange, apple, breakfast, caviar, telephone; two regions are labeled “Seen subjects of ‘eat’” and “Seen objects of ‘eat’”)
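A minimal Python sketch of the weighted-average plausibility described on the previous slide. Cosine for sim and raw corpus frequency for wt are assumptions (the slides leave both open), and the three-dimensional vectors and counts are invented toy data: caviar is treated as an unseen object of "eat", as in the corpus example on Page 5.

```python
import math


def cosine(u, v):
    """Cosine similarity; stands in for sim(a, a') (an assumption)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def plausibility(arg, seen, vectors, sim=cosine):
    """Weighted average similarity of `arg` to the seen arguments of one (verb, relation).

    seen: dict mapping each seen argument a' to its weight wt(v, r, a').
    The normalization constant Z is the sum of the weights.
    """
    z = sum(seen.values())
    if z == 0:
        return 0.0
    return sum(w * sim(vectors[arg], vectors[a2]) for a2, w in seen.items()) / z


# Hypothetical vectors; the three dimensions could be food, person, and device contexts.
VECTORS = {
    "apple": (9, 1, 0),
    "orange": (8, 1, 0),
    "breakfast": (7, 2, 0),
    "caviar": (6, 1, 0),
    "telephone": (0, 1, 9),
}
# Hypothetical seen objects of "eat", weighted by corpus frequency.
SEEN_OBJ_EAT = {"apple": 100, "orange": 20, "breakfast": 30}

print(round(plausibility("caviar", SEEN_OBJ_EAT, VECTORS), 3))
print(round(plausibility("telephone", SEEN_OBJ_EAT, VECTORS), 3))
```

Although neither caviar nor telephone was seen as an object of "eat", caviar lies close to the seen objects and receives a much higher score; this is the generalization that raw frequency could not provide.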

Page 9

Evaluation

• Triples with human plausibility ratings [McRae et al. 1996]
• Evaluation: correlation of model predictions with human judgments
  • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
• Result: the vector space model comes close to the quality of the “deep” model while covering 98% of the triples

Model                                Coverage   Spearman’s ρ
Resnik 1996 [ontology-based]         100%       0.123 (n.s.)
EPP [vector space-based]             98%        0.325 ***
U. Pado et al. 2006 [“deep” model]   78%        0.415 ***
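A self-contained sketch of the evaluation metric: Spearman’s ρ is Pearson’s correlation computed on ranks, with tied values given their average rank. The human ratings are the four "eat" triples from Page 4; the model scores are hypothetical. In practice a library routine such as SciPy’s `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Human ratings for the four "eat" triples on Page 4; model scores are hypothetical.
human = [6.9, 1.5, 1.0, 6.4]
model = [0.90, 0.20, 0.10, 0.80]
print(spearman(human, model))  # 1.0: the model orders the triples exactly as the humans do
```

Because only ranks matter, the model’s scores need not be on the same scale as the human ratings.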

Page 10

From one to many languages…

• The vector space model reduces the need for language resources to predict plausibility judgments
  • No ontologies required
• Still necessary: observations of triples and target words
  • A large, accurately parsed corpus
  • Problematic for basically all languages except English
• Can we extend our strategy to new languages?

German results:
Model                               Resources          Spearman’s ρ
Resnik [Brockmann & Lapata 2002]    TIGER + GermaNet   .37
EPP [Pado & Peirsman 2010]          HGC                .33

Page 11

Predicting plausibility for new languages

• Transfer with a bilingual lexicon [Koehn and Knight 2002]
  • Cross-lingual knowledge transfer
• Print dictionaries are problematic
  • Instead: acquire the lexicon from distributional data

Example (French to English):
Bilingual lexicon: cultiver – grow; pomme – apple
(cultiver, obj, pomme) is translated to (grow, obj, apple) and scored by the English model (trained on an English corpus): highly plausible
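A minimal sketch of transfer through a bilingual lexicon, keeping only the single best translation per word. The lexicon, the lookup-table "English model", and its score of 0.9 are hypothetical placeholders for a real plausibility model trained on an English corpus.

```python
# Hypothetical bilingual lexicon (French -> English), as in the example above.
LEXICON = {"cultiver": "grow", "pomme": "apple"}

# Hypothetical stand-in for an English plausibility model trained on an English corpus.
ENGLISH_SCORES = {("grow", "obj", "apple"): 0.9}


def english_model(triple):
    """Score an English (verb, relation, argument) triple; unknown triples get 0.0."""
    return ENGLISH_SCORES.get(triple, 0.0)


def transfer(triple, lexicon, model):
    """Translate verb and argument word by word, keep the relation, score in English."""
    verb, rel, arg = triple
    return model((lexicon[verb], rel, lexicon[arg]))


print(transfer(("cultiver", "obj", "pomme"), LEXICON, english_model))  # 0.9
```

The relation label is kept unchanged, which assumes that grammatical relations correspond across the two languages.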

Page 12

Bilingual semantic space

• Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]
  • Dimensions are bilingual word pairs and can be bootstrapped
  • Frequencies are observable from comparable corpora
• Nearest neighbors: cross-lingual synonyms ⟷ translations

Example:
                 (cultiver, grow)   (rouler, drive)
mandarine (Fr)   5                  1
mandarin (En)    4                  2
car (En)         1                  20

(Figure: mandarine, mandarin, and car plotted in the bilingual space spanned by cultiver/grow and rouler/drive)
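Because the shared dimensions are word pairs rather than words, French and English vectors from the table above can be compared directly. This sketch again assumes cosine similarity; `translation_candidates` is a hypothetical helper that ranks English words as translation candidates for a French word.

```python
import math

# Joint space: each dimension is a bilingual word pair, here (cultiver, grow) and (rouler, drive).
FRENCH = {"mandarine": (5, 1)}
ENGLISH = {"mandarin": (4, 2), "car": (1, 20)}


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_candidates(fr_word, french, english, n=1):
    """English words ranked by cosine similarity to a French word in the joint space."""
    target = french[fr_word]
    ranked = sorted(english, key=lambda en: cosine(target, english[en]), reverse=True)
    return ranked[:n]


print(translation_candidates("mandarine", FRENCH, ENGLISH))       # ['mandarin']
print(translation_candidates("mandarine", FRENCH, ENGLISH, n=2))  # ['mandarin', 'car']
```

With the pear/pomme counts on Page 13, the same procedure returns pear for pomme: the nearest neighbor has a similar usage profile but is not a translation.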

Page 13

Nearest neighbors in bilingual space

• Similar usages / context profiles do not necessarily indicate synonymy

Example:
             (cultiver, grow)   (rouler, drive)
pear (En)    5                  1
pomme (Fr)   4                  2
car (En)     1                  20

(Figure: pear, pomme, and car in the bilingual space spanned by cultiver/grow and rouler/drive; pomme lies closest to pear)

• Bilingual case: Peirsman & Pado (2011)
  • Lexicon extraction for EN/DE and EN/NL

Page 14

Evaluation against Gold Standard

•Evaluation of nearest cross-lingual neighbors against a translators’ dictionary
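One way to score nearest neighbors against a dictionary is top-k accuracy: a source word counts as correct if one of its k highest-ranked neighbors is a listed translation. The slides do not state the exact measure, so this is an assumption. The first candidates reuse examples from Page 15; the remaining candidates and the gold entries are hypothetical.

```python
def precision_at_k(candidates, gold, k=1):
    """Share of source words whose k highest-ranked candidates include a gold translation.

    candidates: source word -> English words ranked by bilingual similarity.
    gold: source word -> set of translations listed in the dictionary.
    """
    if not candidates:
        return 0.0
    hits = sum(1 for word, ranked in candidates.items()
               if gold.get(word, set()) & set(ranked[:k]))
    return hits / len(candidates)


# Hypothetical neighbor lists and dictionary entries.
CANDIDATES = {
    "Verhältnis": ["relationship", "ratio"],
    "Straßenbahn": ["bus", "tram"],
    "Kapitel": ["essay", "chapter"],
}
GOLD = {
    "Verhältnis": {"relationship", "ratio"},
    "Straßenbahn": {"tram", "streetcar"},
    "Kapitel": {"chapter"},
}

print(precision_at_k(CANDIDATES, GOLD, k=1))  # 1 of 3 words
print(precision_at_k(CANDIDATES, GOLD, k=2))  # 3 of 3 words
```

Under this measure, a co-hyponym (bus) or a related word (essay) in first position counts as a miss, even though it may still carry useful information.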

Page 15

Analysis of 200 noun pairs (EN-DE)

Meta-relation                Relation      Frequency   Example
Synonymy (50%)               –             99          Verhältnis – relationship
Semantic similarity (16%)    Antonymy      1           Inneres – exterior
                             Co-hyponymy   15          Straßenbahn – bus
                             Hyponymy      3           Kunstwerk – painting
                             Hypernymy     15          Dramatiker – poet
Semantic relatedness (19%)   –             39          Kapitel – essay
Errors (14%)                 –             28          DDR-Zeit – trainee

Page 16

Similarity by relation

Page 17

How to proceed?

• Classical reaction: focus on cross-lingual synonyms
  • Aggressive filtering of nearest-neighbor lists
  • Risk: sparse data issues
• Our hypothesis (preliminary version):
  • Non-synonymous pairs still provide information about bilingual similarity
  • They should be exploited for cross-lingual knowledge transfer
  • Experimental validation: vary the number of nearest neighbors used as translations, observe the effect on cross-lingual knowledge transfer

Page 18

Varying the number of neighbors

• Nearest neighbors: 50% are synonyms
• Further neighbors: the share of synonyms declines quickly to 10%

Page 19

Experimental setup

Example (French to English):
Bilingual lexicon: rouler – drive; bagnole – jalopy, banger, car
(rouler, subj, bagnole) is scored by the English model (trained on an English corpus)
Consider the plausibilities for:
(drive, subj, jalopy)
(drive, subj, banger)
(drive, subj, car)

Page 20

Details

• Model:
  • English model: trained on the BNC, as before
  • Bilingual lexicon extracted from the BNC and the Stuttgarter Nachrichtenkorpus (HGC) as comparable corpora
  • Prediction based on the n nearest English neighbors of the German argument
• Evaluation:
  • 90 German (v, r, a) triples with human plausibility ratings [Brockmann & Lapata 2003]
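A minimal sketch of prediction from the n nearest English neighbors, using the bagnole example from Page 19 with triples in (v, r, a) order. Translating the verb 1-best and averaging the English scores without weights are assumptions (the slides do not say how neighbor scores are combined); all data and the lookup-table English model are hypothetical.

```python
def transfer_nn(triple, verb_lexicon, arg_neighbors, english_model, n):
    """Score a source-language (verb, relation, argument) triple with an English model.

    The verb is translated 1-best; the argument is replaced by each of its n nearest
    English neighbors, and the resulting English plausibilities are averaged.
    """
    verb, rel, arg = triple
    en_verb = verb_lexicon[verb]
    neighbors = arg_neighbors.get(arg, [])[:n]
    scores = [english_model((en_verb, rel, en_arg)) for en_arg in neighbors]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data for the bagnole example.
VERB_LEXICON = {"rouler": "drive"}
ARG_NEIGHBORS = {"bagnole": ["jalopy", "banger", "car"]}  # ranked by bilingual similarity
ENGLISH_SCORES = {
    ("drive", "subj", "jalopy"): 0.7,
    ("drive", "subj", "banger"): 0.6,
    ("drive", "subj", "car"): 0.9,
}


def english_model(triple):
    """Placeholder English plausibility model: table lookup, 0.0 for unknown triples."""
    return ENGLISH_SCORES.get(triple, 0.0)


for n in (1, 2, 3):
    print(n, round(transfer_nn(("rouler", "subj", "bagnole"), VERB_LEXICON,
                                ARG_NEIGHBORS, english_model, n), 3))
```

Increasing n trades precision of the translation for robustness: rarer neighbors such as jalopy may be poorly attested in the English corpus, while a more frequent neighbor such as car contributes better-supported evidence.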

Page 21

Results – EN-DE

                         1 NN   2 NN   3 NN   4 NN   5 NN
Translated English EPP   0.34   0.41   0.44   0.46   0.40

Model                               Resources                       Spearman’s ρ
Resnik [Brockmann & Lapata 2002]    TIGER corpus, GermaNet          .37
EPP German [Pado & Peirsman 2010]   HGC corpus parsed with a PCFG   .33

• Result: the transfer model is significantly better than the monolingual models, but only if non-synonymous neighbors are included

Page 22

Results: Details

                           1 NN   2 NN   3 NN   4 NN   5 NN
English EPP (all)          0.34   0.41   0.44   0.46   0.40
English EPP (subjects)     0.53   0.51   0.56   0.56   0.55
English EPP (objects)      0.58   0.61   0.61   0.64   0.58
English EPP (PP objects)   0.33   0.45   0.45   0.46   0.42
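A few lines that read the table above row by row and pick the number of neighbors with the highest correlation. This only re-analyzes the reported numbers; it says nothing about statistical significance, and `best_n` is a hypothetical helper.

```python
# Spearman's rho for the EN-DE transfer model with n = 1..5 nearest English neighbors
# (values copied from the table above).
RESULTS = {
    "all": [0.34, 0.41, 0.44, 0.46, 0.40],
    "subjects": [0.53, 0.51, 0.56, 0.56, 0.55],
    "objects": [0.58, 0.61, 0.61, 0.64, 0.58],
    "pp objects": [0.33, 0.45, 0.45, 0.46, 0.42],
}


def best_n(scores):
    """Return the 1-based n with the highest score (the smallest such n on ties)."""
    return max(range(len(scores)), key=scores.__getitem__) + 1


for relation, scores in RESULTS.items():
    n = best_n(scores)
    gain = scores[n - 1] - scores[0]
    print(f"{relation}: best n = {n}, rho = {scores[n - 1]:.2f}, gain over 1 NN = {gain:+.2f}")
```

For every row the best setting uses more than one neighbor (n = 4, 3, 4, 4), in line with the observation that non-synonymous neighbors help.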

Page 23

Sources of the positive effect

• Non-synonyms are in fact informative for transferring plausibility
• Semantically similar verbs: eat – munch – feast
  • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
• Semantically related verbs: peel – cook – eat
  • Schemas/narrative chains: shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]

Page 24

Our hypothesis, with qualifications

• Using non-synonymous translation pairs is helpful
  1. if the transferred knowledge is lexical
     • Many infrequently observed data points
  2. if the knowledge is stable across semantically related/similar word pairs
• Counterexample: polarity/sentiment judgments
  • food – feast – grub
  • Parallel experiment: best results for the single nearest neighbor

Page 25

Summary

• Plausibility can be modeled with fairly shallow methods
  • Seen head words plus generalization in vector space
  • Precondition: an accurately parsed corpus
• If unavailable: transfer from a better-endowed language
  • Translation through automatically induced lexicons
• Transfer of knowledge about certain phenomena can benefit from non-synonymous translations
  • Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …