
Page 1: A Generative Model for Parsing Natural Language to Meaning Representations

A Generative Model for Parsing Natural Language to Meaning Representations

Wei Lu, Hwee Tou Ng, Wee Sun Lee

National University of Singapore

Luke S. Zettlemoyer

Massachusetts Institute of Technology

Page 2: A Generative Model for Parsing Natural Language to Meaning Representations

Classic Goal of NLP: Understanding Natural Language

• Mapping Natural Language (NL) to Meaning Representations (MR)

How many states do not have rivers ?

[Diagram: natural language sentences paired with their meaning representations]

Page 3: A Generative Model for Parsing Natural Language to Meaning Representations

Meaning Representation (MR)

The MR for "How many states do not have rivers ?" is a tree of MR productions:

QUERY:answer(NUM)
  NUM:count(STATE)
    STATE:exclude(STATE STATE)
      STATE:state(all)
      STATE:loc_1(RIVER)
        RIVER:river(all)

Page 4: A Generative Model for Parsing Natural Language to Meaning Representations

MR production

• Meaning representation production (MR production)

• Example:

NUM:count(STATE)

• Semantic category: NUM
• Function symbol: count
• Child semantic category: STATE
• At most 2 child semantic categories
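To make the structure concrete, here is a minimal sketch of an MR production as a data type; the class and field names are hypothetical, not from the authors' implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MRProduction:
    """One MR production, e.g. NUM:count(STATE).
    Hypothetical structure for illustration only."""
    category: str         # semantic category, e.g. "NUM"
    function: str         # function symbol, e.g. "count"
    children: tuple = ()  # child semantic categories

    def __post_init__(self):
        # the model assumes at most 2 child semantic categories
        assert len(self.children) <= 2

    def __str__(self):
        return f"{self.category}:{self.function}({' '.join(self.children)})"

# The example production NUM:count(STATE)
print(MRProduction("NUM", "count", ("STATE",)))  # NUM:count(STATE)
```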

Page 5: A Generative Model for Parsing Natural Language to Meaning Representations

Task Description

• Training data: NL-MR pairs
• Input: a new NL sentence
• Output: an MR

Page 6: A Generative Model for Parsing Natural Language to Meaning Representations

Challenge

• Mapping of individual NL words to their associated MR productions is not given in the NL-MR pairs

Page 7: A Generative Model for Parsing Natural Language to Meaning Representations

Mapping Words to MR Productions

[Diagram: each NL phrase is aligned to the MR production that generates it]

how many → NUM:count(STATE)
states   → STATE:state(all)
do not   → STATE:exclude(STATE STATE)
have     → STATE:loc_1(RIVER)
rivers   → RIVER:river(all)
?        → QUERY:answer(NUM)

Page 8: A Generative Model for Parsing Natural Language to Meaning Representations

Talk Outline

• Generative model
  – Goal: a flexible model that can parse a wide range of input sentences
  – Efficient algorithms for EM training and decoding
  – In practice: the correct output is often in the top-k list, but is not always the best-scoring option
• Reranking
  – Global features
• Evaluation
  – The generative model combined with the reranking technique achieves state-of-the-art performance

Page 9: A Generative Model for Parsing Natural Language to Meaning Representations

Hybrid Tree

A hybrid tree for the example NL-MR pair. Internal nodes are MR productions; each node's children form a hybrid sequence of NL words and semantic categories:

QUERY:answer(NUM)               hybrid sequence: NUM ?
  NUM:count(STATE)              hybrid sequence: How many STATE
    STATE:exclude(STATE STATE)  hybrid sequence: STATE1 do not STATE2
      STATE:state(all)          hybrid sequence: states
      STATE:loc_1(RIVER)        hybrid sequence: have RIVER
        RIVER:river(all)        hybrid sequence: rivers

Page 10: A Generative Model for Parsing Natural Language to Meaning Representations

Model Parameters

[Hybrid tree for "How many states do not have rivers ?", as on the previous slide]

P(w, m, T) = P(QUERY:answer(NUM) | -, arg=1)
           * P(NUM ? | QUERY:answer(NUM))
           * P(NUM:count(STATE) | QUERY:answer(NUM), arg=1)
           * P(How many STATE | NUM:count(STATE))
           * P(STATE:exclude(STATE STATE) | NUM:count(STATE), arg=1)
           * P(STATE1 do not STATE2 | STATE:exclude(STATE STATE))
           * P(STATE:state(all) | STATE:exclude(STATE STATE), arg=1)
           * P(states | STATE:state(all))
           * P(STATE:loc_1(RIVER) | STATE:exclude(STATE STATE), arg=2)
           * P(have RIVER | STATE:loc_1(RIVER))
           * P(RIVER:river(all) | STATE:loc_1(RIVER), arg=1)
           * P(rivers | RIVER:river(all))

w: the NL sentence; m: the MR; T: the hybrid tree

MR model parameters: ρ(m' | m, arg=k)

Page 11: A Generative Model for Parsing Natural Language to Meaning Representations

Model Parameters

[Hybrid tree for "How many states do not have rivers ?", as on the previous slides]

P(How many STATE | NUM:count(STATE))
  = P(m → w Y | NUM:count(STATE))      (hybrid pattern selection)
  * P(How | NUM:count(STATE), BEGIN)
  * P(many | NUM:count(STATE), How)
  * P(STATE | NUM:count(STATE), many)
  * P(END | NUM:count(STATE), STATE)

w: the NL sentence; m: the MR; T: the hybrid tree

Pattern parameters: Φ(r | m)

Page 12: A Generative Model for Parsing Natural Language to Meaning Representations

Hybrid Patterns

#RHS   Hybrid Pattern          # Patterns
0      M → w                   1
1      M → [w] Y [w]           4
2      M → [w] Y [w] Z [w]     8
2      M → [w] Z [w] Y [w]     8

• M is an MR production, w is a word sequence

• Y and Z are the first and second child MR productions, respectively

Note: [] denotes optional
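The pattern counts follow from the bracketed word sequences being independently optional: 2^2 = 4 for one child, and 2^3 = 8 for each ordering with two children. A quick sketch that verifies the counts (my own enumeration code, not the paper's):

```python
from itertools import product

def hybrid_patterns(template):
    """Expand a pattern template: every 'w' slot is optional, category
    slots (Y, Z) are mandatory. Returns all concrete patterns."""
    n_optional = template.count("w")
    patterns = []
    for keep in product([False, True], repeat=n_optional):
        it = iter(keep)
        patterns.append([s for s in template if s != "w" or next(it)])
    return patterns

print(len(hybrid_patterns(["w", "Y", "w"])))            # 4
print(len(hybrid_patterns(["w", "Y", "w", "Z", "w"])))  # 8
print(len(hybrid_patterns(["w", "Z", "w", "Y", "w"])))  # 8
```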


Page 13: A Generative Model for Parsing Natural Language to Meaning Representations

Model Parameters

[Hybrid tree for "How many states do not have rivers ?", as on the previous slides]

P(How many STATE | NUM:count(STATE))
  = P(m → w Y | NUM:count(STATE))      (hybrid pattern selection)
  * P(How | NUM:count(STATE), BEGIN)
  * P(many | NUM:count(STATE), How)
  * P(STATE | NUM:count(STATE), many)
  * P(END | NUM:count(STATE), STATE)

w: the NL sentence; m: the MR; T: the hybrid tree

Emission parameters: θ(t | m, Λ)


Page 14: A Generative Model for Parsing Natural Language to Meaning Representations

Assumptions: Model I, II, III

[Diagram: NUM:count(STATE) emits the chain BEGIN → How → many → STATE → END]

Model I (Unigram):    Θ(ti | M, Λ) = P(ti | M)
Model II (Bigram):    Θ(ti | M, Λ) = P(ti | M, ti-1)
Model III (Mixgram):  Θ(ti | M, Λ) = [P(ti | M, ti-1) + P(ti | M)] * 0.5
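A minimal sketch of the three assumptions as code; the unigram and bigram probability tables are assumed to come from training, and the dictionary layout here is my own:

```python
def theta(model, token, production, prev_token, unigram, bigram):
    """Emission parameter Θ(t | M, Λ) under Models I, II, III.

    unigram[(production, token)]             approximates P(t | M)
    bigram[(production, prev_token, token)]  approximates P(t | M, t_prev)
    """
    p_uni = unigram.get((production, token), 0.0)
    p_bi = bigram.get((production, prev_token, token), 0.0)
    if model == "I":    # unigram: context-independent
        return p_uni
    if model == "II":   # bigram: condition on the previous token
        return p_bi
    if model == "III":  # mixgram: even interpolation of I and II
        return 0.5 * (p_bi + p_uni)
    raise ValueError(f"unknown model: {model}")
```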

Page 15: A Generative Model for Parsing Natural Language to Meaning Representations

Model Parameters

• MR model parameters: Σmi ρ(mi | mj, arg=k) = 1
  They model the meaning representation.

• Emission parameters: Σt Θ(t | mj, Λ) = 1
  They model the emission of words and semantic categories from MR productions. Λ is the context.

• Pattern parameters: Σr Φ(r | mj) = 1
  They model the selection of hybrid patterns.
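Putting the three parameter classes together, P(w, m, T) is a product over the nodes of the hybrid tree, as in the example decomposition earlier. A minimal sketch, assuming a hypothetical node structure and dictionary-backed parameters:

```python
def joint_prob(node, rho, phi, theta):
    """P(w, m, T) as a product over hybrid-tree nodes. `node` is a
    hypothetical object with fields:
      production  the MR production at this node
      pattern     the hybrid pattern chosen here (r)
      emissions   list of (token, context) pairs emitted at this node
      children    list of (k, child_node), k the argument position
    The root's own ρ(m_root | -, arg=1) is multiplied in by the caller.
    """
    p = phi[(node.pattern, node.production)]               # Φ(r | m)
    for token, context in node.emissions:
        p *= theta[(token, node.production, context)]      # Θ(t | m, Λ)
    for k, child in node.children:
        p *= rho[(child.production, node.production, k)]   # ρ(m' | m, arg=k)
        p *= joint_prob(child, rho, phi, theta)
    return p
```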


Page 16: A Generative Model for Parsing Natural Language to Meaning Representations

Parameter Estimation

• MR model parameters are easy to estimate
• Learning the emission parameters and pattern parameters is challenging
• Inside-outside algorithm with EM (skeleton sketched below)
  – Naïve implementation: O(n^6 m)
  – n: number of words in an NL sentence
  – m: number of MR productions in an MR
• Improved efficient algorithm
  – Two-layer dynamic programming
  – Improved time complexity: O(n^3 m)
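A high-level skeleton of the training loop; the two-layer dynamic program that makes the E-step efficient is the paper's contribution and is only a stand-in (`expected_counts`) here:

```python
from collections import defaultdict

def normalize(counts):
    """Expected counts keyed by (outcome, condition) -> conditional probs."""
    totals = defaultdict(float)
    for (outcome, cond), c in counts.items():
        totals[cond] += c
    return {(o, c): v / totals[c] for (o, c), v in counts.items()}

def em_train(pairs, theta, phi, expected_counts, n_iters=10):
    """EM skeleton for emission (theta) and pattern (phi) parameters.
    `expected_counts(nl, mr, theta, phi)` stands in for the inside-outside
    dynamic program; it returns expected counts of emission and pattern
    events under the current parameters."""
    for _ in range(n_iters):
        e_counts, p_counts = defaultdict(float), defaultdict(float)
        for nl, mr in pairs:                            # E-step
            e, p = expected_counts(nl, mr, theta, phi)
            for k, v in e.items():
                e_counts[k] += v
            for k, v in p.items():
                p_counts[k] += v
        theta, phi = normalize(e_counts), normalize(p_counts)  # M-step
    return theta, phi
```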


Page 17: A Generative Model for Parsing Natural Language to Meaning Representations

Decoding

• Given an NL sentence w, find the optimal MR m*:
  m* = argmax_m P(m | w)
     = argmax_m Σ_T P(m, T | w)
     = argmax_m Σ_T P(w, m, T)
• We instead find the most likely hybrid tree:
  m* = argmax_m max_T P(w, m, T)
• Similar DP techniques are employed
• An exact top-k decoding algorithm was implemented (sketched below)
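A sketch of the decoding interface; `candidates(sentence)` stands in for the dynamic-programming search over hybrid trees, which is not reproduced here:

```python
import heapq

def decode(sentence, candidates, k=50):
    """Find m* = argmax_m max_T P(w, m, T) and the exact top-k hybrid
    trees (used later by the reranker). `candidates(sentence)` stands in
    for the DP search; it yields (P(w, m, T), m, T) triples."""
    scored = list(candidates(sentence))
    top_k = heapq.nlargest(k, scored, key=lambda c: c[0])
    best = {}                                  # best single tree per MR
    for p, m, tree in scored:
        if m not in best or p > best[m]:
            best[m] = p
    m_star = max(best, key=best.get)
    return m_star, top_k
```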


Page 18: A Generative Model for Parsing Natural Language to Meaning Representations

Reranking

• Weakness of the generative model
  – Lacks the ability to model long-range dependencies
• Reranking with the averaged perceptron (sketched below)
  – Output space: hybrid trees from the exact top-k (k=50) decoding algorithm, for each training/testing instance's NL sentence
  – Single correct reference: output of the Viterbi algorithm for each training instance
  – Feature functions: features 1-5 are indicator functions, while feature 6 is real-valued
  – A threshold b prunes unreliable predictions even when they score the highest, to optimize F-measure
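A compact sketch of the averaged perceptron reranker; `features` maps a hybrid tree to a dict of feature values (the templates are on the next slide), and everything else is standard perceptron machinery:

```python
def train_reranker(instances, features, n_epochs=10):
    """Averaged perceptron. Each instance is (candidates, reference):
    the top-k hybrid trees from the base model and the single correct
    reference tree."""
    w, w_sum, t = {}, {}, 0
    for _ in range(n_epochs):
        for candidates, reference in instances:
            best = max(candidates, key=lambda c: dot(features(c), w))
            if best != reference:            # mistake-driven update
                update(w, features(reference), +1.0)
                update(w, features(best), -1.0)
            t += 1
            for f, v in w.items():           # accumulate for averaging
                w_sum[f] = w_sum.get(f, 0.0) + v
    return {f: v / t for f, v in w_sum.items()}

def dot(feats, w):
    return sum(v * w.get(f, 0.0) for f, v in feats.items())

def update(w, feats, scale):
    for f, v in feats.items():
        w[f] = w.get(f, 0.0) + scale * v
```

At test time, the top-k list for a sentence is rescored with the averaged weights, and the threshold b rejects the best candidate when its score falls below b.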


Page 19: A Generative Model for Parsing Natural Language to Meaning Representations

Reranking Features: Examples

[Hybrid tree for "How many states do not have rivers ?", as on earlier slides]

Feature 1: Hybrid Rule: an MR production and its child hybrid sequence
Feature 2: Expanded Hybrid Rule: an MR production and its child hybrid sequence, expanded
Feature 3: Long-range Unigram: an MR production and an NL word appearing below it in the tree
Feature 4: Grandchild Unigram: an MR production and its grandchild NL word
Feature 5: Two-Level Unigram: an MR production, its parent production, and its child NL word
Feature 6: Model Log-Probability: logarithm of the base model's joint probability, log(P(w, m, T))
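As an illustration of how two of these templates might be extracted (the node fields here are hypothetical, and the real templates include all six features):

```python
def extract_features(tree):
    """Sketch of feature extraction over a hybrid tree. Assumes each node
    exposes its production and the NL words appearing below it."""
    feats = {}
    for node in tree.nodes():
        for word in node.words_below:       # Feature 3: long-range unigram
            feats[("f3", node.production, word)] = 1.0
        for word in node.grandchild_words:  # Feature 4: grandchild unigram
            feats[("f4", node.production, word)] = 1.0
    feats["f6"] = tree.log_prob             # Feature 6: real-valued score
    return feats
```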

Page 20: A Generative Model for Parsing Natural Language to Meaning Representations

Related Work

• SILT (2005) by Kate, Wong, and Mooney
  – A system that learns deterministic rules to transform either sentences or their syntactic parse trees into meaning structures
• WASP (2006) by Wong and Mooney
  – A system motivated by statistical machine translation techniques
• KRISP (2006) by Kate and Mooney
  – A discriminative approach in which meaning representation structures are constructed from the natural language strings hierarchically

Page 21: A Generative Model for Parsing Natural Language to Meaning Representations

Evaluation Metrics

• Precision = (# correct output structures) / (# output structures)

• Recall = (# correct output structures) / (# input sentences)

• F measure = 2 / (1/Precision + 1/Recall)
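These reduce to three lines of arithmetic; for instance, with hypothetical counts of 80 correct structures out of 90 outputs on 100 input sentences:

```python
def prf(n_correct, n_output, n_inputs):
    """Precision, recall, and F measure as defined above."""
    precision = n_correct / n_output
    recall = n_correct / n_inputs
    f = 2.0 / (1.0 / precision + 1.0 / recall)  # harmonic mean
    return precision, recall, f

print(prf(80, 90, 100))  # (0.889, 0.800, 0.842), rounded
```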


Page 22: A Generative Model for Parsing Natural Language to Meaning Representations

Comparison over three models

• I/II/III: Unigram/Bigram/Mixgram model; +R: with reranking
• Reranking is shown to be effective
• Overall, Model III with reranking performs best

Evaluations

Model     Geoquery (880)         Robocup (300)
          Prec.  Rec.   F        Prec.  Rec.   F
I         81.3   77.1   79.1     71.1   64.0   67.4
II        89.0   76.0   82.0     82.4   57.7   67.8
III       86.2   81.8   84.0     70.4   63.3   66.7
I + R     87.5   80.5   83.8     79.1   67.0   72.6
II + R    93.2   73.6   82.3     88.4   56.0   68.6
III + R   89.3   81.5   85.2     82.5   67.7   74.4

Page 23: A Generative Model for Parsing Natural Language to Meaning Representations

Comparison with other models

On Geoquery:
• Able to handle more than 25% of the inputs that could not be handled by previous systems
• Error reduction rate of 22%

Evaluations

System         Geoquery (880)         Robocup (300)
               Prec.  Rec.   F        Prec.  Rec.   F
SILT           89.0   54.1   67.3     83.9   50.7   63.2
WASP           87.2   74.8   80.5     88.9   61.9   73.0
KRISP          93.3   71.7   81.1     85.2   61.9   71.7
Model III + R  89.3   81.5   85.2     82.5   67.7   74.4

Page 24: A Generative Model for Parsing Natural Language to Meaning Representations

Evaluations

Comparison on other languages

• Achieves performance comparable to the previous system

System         English                Spanish
               Prec.  Rec.   F        Prec.  Rec.   F
WASP           95.42  70.00  80.76    91.99  72.40  81.03
Model III + R  91.46  72.80  81.07    95.19  79.20  86.46

System         Japanese               Turkish
               Prec.  Rec.   F        Prec.  Rec.   F
WASP           91.98  74.40  82.86    96.96  62.40  75.93
Model III + R  87.56  76.00  81.37    93.82  66.80  78.04

Page 25: A Generative Model for Parsing Natural Language to Meaning Representations

Contributions

• Introduced a hybrid tree representation framework for this task

• Proposed a new generative model that can be applied to the task of transforming NL sentences to MRs

• Developed a new dynamic programming algorithm for efficient training and decoding

• The approach, augmented with reranking, achieves state-of-the-art performance on benchmark corpora, with a notable improvement in recall


Page 26: A Generative Model for Parsing Natural Language to Meaning Representations

Questions?
