
Natural Language Understanding

Assignment 1: Semantic Parsing

Siva Reddy, Frank Keller, Mirella Lapata
Due date: Friday 14th February 2014, 4:00pm

Submission Information To submit your assignment, use the submit command in the directory with your code and answers by running the following:

submit nlu cw1 geoparser.py output*.txt answers.pdf

Please do not submit any other code/data.

Plagiarism Please remember that plagiarism is a university offense. Do not show your written/coded solutions to anyone else, or try to see anyone else’s, and do not discuss the specifics of your solutions with other students. You may discuss the general topics surrounding the problems with one another, ideally after you have considered them yourself. However, to ensure that you actually understand the issues yourself, you must write up your solutions by yourself, away from your friends. The solution or approach you describe should be one you have chosen. If you don’t understand it, don’t write it – it will generally be obvious you don’t understand. And if you have questions or problems involving the specifics of your solution, please contact the course teaching staff rather than your fellow students. Finally, if you choose to use any outside sources of information or ideas for this or future assignments, remember to acknowledge those sources appropriately (e.g., provide the URL, the name of the book, or, if you’re at a loss for specifics, even say “In my class on XX last year, we learned that . . . ”).

Question 1 [10 marks]

Answer all the questions in a PDF file named answers.pdf. These questions test your understanding of the use of lambda calculus for semantic construction (if you require a refresher on lambda calculus, please consult Jurafsky and Martin, Ch. 17 and 18).

1. Simplify (λx.x)2.

2. Simplify (λx.x ∗ x)3.

3. Simplify ((λx.λy.eat(y, x))John)Pizza.

4. Simplify (λf.f(f(f(x))))g.

5. Simplify (λf.f(f(f(x))))(λt.a(c(t))).

6. If F(Edinburgh) = Scotland loves Edinburgh, what is F? (Write a lambda expression for F.)

7. If A = λx.president(x), B = λy.human(y) and ((C)A)B = ∃z[statement(z, president(z), human(z))], what is C?

8. If F (λx.peak(Texas, x)) = (λx.highest(peak(Texas, x))) what is F?

1

Page 2: Natural Langauge Understanding Assignment 1: Semantic Parsing

9. In the sentence Edinburgh loves Scotland what is the lambda expression for the verb?

10. In the sentence Edinburgh loves Scotland what is the lambda expression for the verb phrase?
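Simplification here is just beta-reduction: substitute the argument for the bound variable. For the arithmetic-style items you can sanity-check your answers by mirroring the expressions with Python lambdas (a sketch; the tuples below are an illustrative stand-in for logical terms like eat(y, x), not the notation expected in answers.pdf):

```python
# Beta-reduction mirrored with Python lambdas: applying a lambda
# substitutes the argument for the bound variable.

identity = lambda x: x          # λx.x
print(identity(2))              # (λx.x)2  →  2

square = lambda x: x * x        # λx.x*x
print(square(3))                # (λx.x*x)3  →  9

# λx.λy.eat(y, x): a tuple stands in for the logical term eat(y, x)
eat = lambda x: lambda y: ("eat", y, x)
print(eat("John")("Pizza"))     # → ("eat", "Pizza", "John"), i.e. eat(Pizza, John)

# λf.f(f(f(x))) applied to g yields g(g(g(x)))
apply3 = lambda f: f(f(f("x")))
print(apply3(lambda t: ("g", t)))   # → ("g", ("g", ("g", "x")))
```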

Question 2 [20 marks]

Your task is to implement a Semantic Parser in Python which converts natural language (NL) questions to database (DB) queries. For example,

NL Question: which rivers run through states bordering new mexico
DB Query:    answer(A,(river(A), traverse(A,B), state(B),
             next_to(B,C), const(C,stateid(newmexico))))

We will work with a database of US geography called Geoquery. When we run the above query on Geoquery, we get the following answers: ["arkansas", "canadian", "cimarron", "colorado", "gila", "green", "neosho", "north platte", "pecos", "red", "republican", "rio grande", "san juan", "smoky hill", "south platte", "washita"]. You are provided with geolib.py, a Python API containing useful functions. Use geolib.execute_geoquery to retrieve answers to database queries. Please download the code from the NLU course homepage.

Your first task is to build semantic parses of questions from their corresponding syntactic phrase structure trees. To simplify the problem, we will assume the syntactic tree and the semantic parse tree have equivalent structure. You are provided with a lexicon which maps leaf words to lambda semantic categories (use the function geolib.word_to_semantic_categories to retrieve all semantic categories of a word). Use these semantic categories to build up the semantic parses of each subtree in the syntactic tree in a bottom-up fashion using lambda functional application. An example semantic parse construction is shown in Figure 1. Here, the word bordering consumes newmexico to form bordering newmexico, and this in turn is consumed by states to form states bordering newmexico. To simplify further, assume a word on the left-hand side consumes (functional application) the word on the right-hand side.
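A minimal sketch of this bottom-up scheme, using a toy tree and lexicon rather than the assignment's actual geolib data structures (tuples stand in for lambda-calculus terms):

```python
# Bottom-up semantic composition under the simplifying assumption that
# the left-hand child always consumes (is applied to) the right-hand child.
# A tree is either a word (str) or a (left, right) pair; the lexicon maps
# words to toy "semantic" values.

def compose(tree, lexicon):
    if isinstance(tree, str):
        return lexicon[tree]          # leaf: look up the word's semantics
    left, right = tree
    functor = compose(left, lexicon)  # left-hand side is the functor
    argument = compose(right, lexicon)
    return functor(argument)          # functional application

lexicon = {
    "bordering": lambda p: lambda x: ("next_to", x, p),
    "newmexico": ("const", "stateid(newmexico)"),
    "states": lambda p: lambda x: ("state", x, p(x)),
}

# bordering consumes newmexico; states then consumes the result,
# mirroring the composition order described in the text.
sem = compose(("states", ("bordering", "newmexico")), lexicon)
print(sem("B"))  # ("state", "B", ("next_to", "B", ("const", "stateid(newmexico)")))
```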

The syntactic trees are provided to you along with the question in JSON format. You could use your own tree parser or use the geolib.read_tree and geolib.get_children functions to load trees into objects. Write all your code in geoparser.py.

1. Complete the function convert_tree_to_semantic_expressions. It should retrieve all the semantic expressions of a sentence. Pseudo code is provided inside the function. Use the package nltk.logic to work with lambda expressions. [10 marks]

2. How would you reduce the number of parses generated? (You need not implement this.) [5 marks]

3. How do you think we built the lexicon files data/main_lexicon.txt and data/types.txt? [5 marks]

Question 3 [50 marks]

In this question, use the training data (data/training.json.txt) to learn a Structured Perceptron model (code provided in structuredperceptron.py) which ranks semantic parses of a given sentence. Your learning aim is to score the correct parse higher than all other predictions. As you know, Question 2.1 predicts all possible parses of a given sentence, but only one (or a few) among them will be correct. Use the gold query provided in the training data to pick a gold parse. Using the first best prediction of your model and the gold parse, you can train your model to learn to rank gold parses higher than other predicted parses.

Define a set of features which you think should help in scoring the correct parses higher. A feature has a name and a tupled value containing different linguistic qualities of a semantic parse which help in determining if the parse is good. For example, we know that whenever run appears in the NL question, we are likely to see the predicate traverse in the DB query. So we can create a feature named Word_Predicate with (word, predicate) as its value, e.g., (run, traverse). You can use a tuple of any length, but as the size increases, the sparsity increases too. You can create a feature using the class geolib.Feature(feature_name, feature_value). Another useful feature could be Predicate_Predicate, which signifies frequently co-occurring predicates, e.g., traverse and river. Implement this feature. Adding additional features will attract extra credit.
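A hedged sketch of what extracting these two feature types might look like, using plain tuples and a Counter in place of the provided geolib.Feature class (which is assumed here, not shown):

```python
from collections import Counter
from itertools import combinations

def extract_features(words, predicates):
    """Count Word_Predicate and Predicate_Predicate co-occurrence features.

    A feature is keyed by (name, value); the Counter holds its frequency,
    mirroring the features-dictionary idea described in the text.
    """
    feats = Counter()
    for w in words:                     # every (word, predicate) pair
        for p in predicates:
            feats[("Word_Predicate", (w, p))] += 1
    # unordered pairs of distinct predicates co-occurring in one query
    for p1, p2 in combinations(sorted(set(predicates)), 2):
        feats[("Predicate_Predicate", (p1, p2))] += 1
    return feats

feats = extract_features(["which", "rivers", "run"], ["river", "traverse"])
print(feats[("Word_Predicate", ("run", "traverse"))])        # 1
print(feats[("Predicate_Predicate", ("river", "traverse"))]) # 1
```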

The code already implements the Word_Predicate feature. It is easier to extract features while you are building up a parse (in convert_tree_to_semantic_expressions itself) than to extract features after getting the parse. While building the parse from constituents, sum up the features of the constituents to form a new feature dictionary, along with adding new features. The SemanticParseLambda class contains a lambda expression and its features stored in a dictionary. A features dictionary contains the features and their associated frequency/probability in the semantic parse.

Once you extract features, you can train a perceptron model using predicted and gold parse features. If properly trained, your model will learn to assign higher weights to good features like Feature{name: Word_Predicate, value: (run, traverse)}.
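The update itself is the standard structured-perceptron rule: when the top-ranked parse differs from the gold parse, add the gold parse's features to the weights and subtract the predicted parse's. A sketch with plain dictionaries (the provided structuredperceptron.py is assumed to implement something along these lines):

```python
def perceptron_update(weights, gold_feats, pred_feats, lr=1.0):
    """Structured-perceptron update: w += lr * (gold - predicted)."""
    for f, v in gold_feats.items():
        weights[f] = weights.get(f, 0.0) + lr * v
    for f, v in pred_feats.items():
        weights[f] = weights.get(f, 0.0) - lr * v
    return weights

def score(weights, feats):
    # A parse's score is the dot product of its features with the weights.
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

gold = {("Word_Predicate", ("run", "traverse")): 1.0}
pred = {("Word_Predicate", ("run", "loc")): 1.0}
w = perceptron_update({}, gold, pred)
# After one update the gold parse outscores the wrong prediction.
print(score(w, gold) > score(w, pred))   # True
```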

Split up the training data into two parts: one for training and the other for development. Train your model on the training set and test it on the development set to get an idea of your model's performance and to understand which features worked. You can write a simple evaluation function to see how many of your predictions are correct. Once you are satisfied, retrain the model on the complete data. You might also have to choose the number of training iterations to build a good model (for a large number of iterations the model might overfit, and for a small number of iterations it might not generalize well).
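The split and the evaluation function can be sketched as follows (the 80/20 split and the names split_data/accuracy are illustrative choices, not part of the provided code):

```python
import random

def split_data(data, dev_fraction=0.2, seed=0):
    """Shuffle and split into (train, dev); fixed seed for reproducibility."""
    data = list(data)
    random.Random(seed).shuffle(data)
    k = int(len(data) * dev_fraction)
    return data[k:], data[:k]

def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold parses."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

train, dev = split_data(range(10))
print(len(train), len(dev))            # 8 2
print(accuracy([1, 2, 3], [1, 2, 4]))  # 0.6666666666666666
```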

1. Give some examples of useful features. [5 marks]

2. Complete the function learn_from_training_data (check the pseudo code for details). [15 marks]

3. Complete the function rank_semantic_parse_queries (check the pseudo code). [10 marks]

4. Complete the function predict_test_data_answers (check the pseudo code). [10 marks]

5. How good is your model on the development dataset? [10 marks]

6. Run evaluate_me, which will generate the outputs required for us to evaluate your model for Questions 2.1, 3.2, 3.3 and 3.4.

Question 4 [20 marks]

1. Give an example sentence where our simplifying assumption of the left-hand-side node consuming the right-hand-side node in the syntactic phrase tree fails to generate a correct semantic parse. How do you think you could handle such cases? [10 marks]

2. How do you think you can learn a semantic parser when you are not provided with gold-standard queries, as we were in this assignment? [10 marks]


[Figure 1: tree diagram omitted; only the caption is recoverable from the transcript.]

Figure 1: Semantic Parse construction from Phrase Structure Tree for the sentence which rivers run through states bordering new mexico. Figure generated using http://dylnb.github.io/LambdaCalculator/