Earley’s algorithm


Posted on 15-Jan-2016


Earley’s algorithm

Earley’s algorithm employs the dynamic programming technique to address the weaknesses of general top-down parsing.

Dynamic programming involves storing results so that they never need to be recomputed.

Dynamic programming reduces the exponential time requirement to a polynomial one: O(N³), where N is the length of the input in words.


Data structure

Earley’s algorithm uses a data structure called a chart to store information about the progress of the parse.

A chart contains an entry for each position in the input. A position occurs before the first word, between words, and after the last word.

0 word1 1 word2 2 … N-1 wordN N

A position is represented by a number; positions in the input are numbered from 0 (at the left) to N (at the right).
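To make the numbering concrete, here is a small sketch (the helper names make_chart and positions are hypothetical) that builds an empty chart with one entry per position and interleaves the position numbers with the words of the running example, “book that flight”.

```python
def make_chart(words):
    """Return an empty chart: one entry (a list of states) per position 0..N."""
    return [[] for _ in range(len(words) + 1)]


def positions(words):
    """Interleave position numbers with the words: '0 w1 1 w2 2 ... wN N'."""
    out = ["0"]
    for i, word in enumerate(words, start=1):
        out += [word, str(i)]   # word i sits between positions i-1 and i
    return " ".join(out)


words = "book that flight".split()
chart = make_chart(words)
print(len(chart))         # 4: chart[0] .. chart[3]
print(positions(words))   # 0 book 1 that 2 flight 3
```

A three-word input therefore has four chart entries, chart[0] to chart[3].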


Chart details

A chart entry consists of a sequence of states. A state represents
– a subtree corresponding to a single grammar rule
– information about how much of a rule has been processed
– information about the span of the subtree w.r.t. the input

A state is represented by an annotated grammar rule:
– a dot (•) is used to show how much of the rule has been processed
– a pair of positions, [x,y], indicates the span of the subtree w.r.t. the input; x is the position of the left edge of the subtree, and y is the position of the dot.
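A state can be modelled as a dotted rule plus its span. The sketch below is one possible encoding, not the one used in the source: the class name State and the field names lhs, rhs, dot, start and end are assumptions. Printing a state reproduces the notation used in the chart tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """A dotted grammar rule annotated with its span [start, end]."""
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # number of RHS symbols already processed
    start: int    # x: position of the left edge of the subtree
    end: int      # y: position of the dot in the input

    def next_symbol(self):
        """Symbol immediately right of the dot, or None if the rule is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} [{self.start},{self.end}]"


s = State("VP", ("Verb", "NP"), 1, 0, 1)
print(s)                 # VP → Verb • NP [0,1]
print(s.next_symbol())   # NP
```

Making the class frozen keeps states hashable and comparable by value, which is convenient when checking whether a state is already in a chart entry.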


Three operators on a chart

Predictor
– applies when the NonTerminal to the right of • in a state is not a POS category (i.e. is not a pre-terminal)
– adds states to the current chart entry

Scanner
– applies when the NonTerminal to the right of • in a state is a POS category (i.e. is a pre-terminal)
– adds states to the next chart entry

Completer
– applies when there is no NonTerminal (and hence no Terminal) to the right of • in a state (i.e. • is at the end)
– adds states to the current chart entry
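The choice of operator depends only on what follows the dot. A minimal sketch, assuming a dotted rule is passed as its right-hand side tuple plus the index of the dot, and assuming the set of POS categories below (both operator_for and POS are hypothetical names):

```python
# Hypothetical set of POS (pre-terminal) categories for the running example.
POS = {"Aux", "Det", "ProperNoun", "Verb", "Noun"}


def operator_for(rhs, dot, pos=POS):
    """Name the chart operator that applies to a state whose rule has
    right-hand side `rhs` and whose dot sits before rhs[dot]."""
    if dot == len(rhs):
        return "Completer"    # dot at the end: nothing to its right
    if rhs[dot] in pos:
        return "Scanner"      # a POS category (pre-terminal) follows the dot
    return "Predictor"        # any other non-terminal follows the dot


print(operator_for(("NP", "VP"), 0))         # Predictor
print(operator_for(("Verb", "NP"), 0))       # Scanner
print(operator_for(("Det", "Nominal"), 2))   # Completer
```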


Predictor

Suppose the state to which Predictor applies is:

X → … • NT … [x,y]

Predictor adds, to the current chart entry, a new state for each possible expansion of NT. For each expansion EX of NT, the state added is

NT → • EX [y,y]
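A sketch of the Predictor, assuming states are plain tuples (lhs, rhs, dot, start, end) and assuming a grammar dictionary holding only the rules that appear in the “book that flight” trace (GRAMMAR and predictor are illustrative names). Applied to the dummy start state and then to S → • NP VP, it should produce states 1 to 5 of the trace, in the same order.

```python
# Assumed grammar: only the rules that appear in the "book that flight" trace.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}


def predictor(state, entry, grammar=GRAMMAR):
    """Given X → ... • NT ... [x,y], add NT → • EX [y,y] to the current
    entry for every expansion EX of NT. States are (lhs, rhs, dot, start, end)."""
    _lhs, rhs, dot, _x, y = state
    nt = rhs[dot]                       # the non-terminal right of the dot
    for ex in grammar[nt]:
        new = (nt, ex, 0, y, y)         # dot at the far left, empty span [y,y]
        if new not in entry:            # skip states already in this entry
            entry.append(new)


start = ("γ", ("S",), 0, 0, 0)          # dummy start state: γ → • S [0,0]
chart0 = [start]
predictor(start, chart0)                # adds the three S expansions
predictor(chart0[1], chart0)            # expands S → • NP VP
```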


Scanner

Suppose the state to which Scanner applies is:

X → … • POS … [x,y]

Scanner adds, to the next chart entry, a new state if the word in the next position can be a member of the category POS. The new state added is

POS → word • [y,y+1]
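The Scanner in the same tuple encoding, as a sketch with a hypothetical lexicon (LEXICON) that lists only the categories the trace needs. It also shows the duplicate check used later in the trace: scanning for VP → • Verb NP adds nothing because Verb → book • [0,1] is already in chart[1].

```python
# Hypothetical lexicon for the running example: word → possible POS categories.
LEXICON = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def scanner(state, words, chart, lexicon=LEXICON):
    """Given X → ... • POS ... [x,y], add POS → word • [y,y+1] to chart[y+1]
    when the word between positions y and y+1 can be a POS.
    States are (lhs, rhs, dot, start, end) tuples."""
    _lhs, rhs, dot, _x, y = state
    pos = rhs[dot]
    if y < len(words) and pos in lexicon.get(words[y], set()):
        new = (pos, (words[y],), 1, y, y + 1)
        if new not in chart[y + 1]:     # already present: no change
            chart[y + 1].append(new)


words = "book that flight".split()
chart = [[] for _ in range(len(words) + 1)]
scanner(("S", ("Aux", "NP", "VP"), 0, 0, 0), words, chart)   # "book" is not an Aux
scanner(("VP", ("Verb",), 0, 0, 0), words, chart)            # adds Verb → book • [0,1]
scanner(("VP", ("Verb", "NP"), 0, 0, 0), words, chart)       # same state: no change
```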


Completer

Suppose the state to which Completer applies is:

X → … • [x,y]

Completer adds, to the current chart entry, a new state for each possible reduction using the (now completed) state. For each state (from any earlier chart entry) of the form

Y → … • X … [w,x]

a new state of the following form is added:

Y → … X • … [w,y]
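A sketch of the Completer under the same assumptions (tuple states, and chart[k] holding the states whose span ends at k, so the waiting states Y → … • X … [w,x] are found in chart[x]). The function name completer is illustrative. The example replays the step that completes Verb → book •.

```python
def completer(state, chart):
    """Given a completed X → ... • [x,y], add Y → ... X • ... [w,y] to chart[y]
    for every state Y → ... • X ... [w,x] already in chart[x].
    States are (lhs, rhs, dot, start, end); chart[k] holds states ending at k."""
    X, _rhs, _dot, x, y = state
    for lhs, rhs, dot, w, _end in list(chart[x]):   # copy: chart[x] is chart[y] if x == y
        if dot < len(rhs) and rhs[dot] == X:         # waiting for an X right after the dot
            new = (lhs, rhs, dot + 1, w, y)
            if new not in chart[y]:
                chart[y].append(new)


# Part of chart[0] and chart[1] just before Verb → book • is processed.
chart = [
    [("S", ("VP",), 0, 0, 0), ("VP", ("Verb",), 0, 0, 0), ("VP", ("Verb", "NP"), 0, 0, 0)],
    [("Verb", ("book",), 1, 0, 1)],
]
completer(chart[1][0], chart)   # adds VP → Verb • [0,1] and VP → Verb • NP [0,1]
```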


Completer (modification)

In order to recover parse tree information from the chart once parsing is complete, we need to modify the completer slightly.

Each state in the chart must be given a unique identifier (N for state N).

Each time the completer adds a state, it also appends the unique identifier of the completed state to the list of previous states for that new state (which is a copy of an already existing state that was waiting for the category the current state has just completed).
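The modified completer only needs to copy the waiting state's list of previous states and append the identifier of the state just completed. A sketch with a hypothetical State record carrying an id and a prev list, and a hypothetical helper complete_with_backpointer, replaying how state 21 of the trace is built from states 15 and 19:

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical chart state with a unique id and a list of previous states."""
    id: int
    lhs: str
    rhs: tuple
    dot: int
    start: int
    end: int
    prev: list = field(default_factory=list)   # ids of the completed children


def complete_with_backpointer(done, waiting, new_id):
    """Copy `waiting` (Y → ... • X ... [w,x]) with the dot moved past X,
    the span extended to done.end, and done.id appended to its previous states."""
    return State(new_id, waiting.lhs, waiting.rhs, waiting.dot + 1,
                 waiting.start, done.end, waiting.prev + [done.id])


s15 = State(15, "NP", ("Det", "Nominal"), 1, 1, 2, [14])   # NP → Det • Nominal [1,2] [14]
s19 = State(19, "Nominal", ("Noun",), 1, 2, 3, [18])       # Nominal → Noun • [2,3] [18]
s21 = complete_with_backpointer(s19, s15, 21)
print(s21.prev)   # [14, 19]
```

Because `waiting.prev + [done.id]` builds a new list, the waiting state keeps its own list unchanged and can still be advanced by other completed states.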


Initial state of chart for “book that flight”

chart[0] chart[1] chart[2] chart[3]

0: γ → • S


chart[0] – initial state

Id   state                      span   previous states
0:   γ → • S                    [0,0]  []

This is a dummy start state.


chart[0] – after 0: γ → • S (Predictor)

Id   state                      span   previous states
0:   γ → • S                    [0,0]  []
1:   S → • NP VP                [0,0]  []
2:   S → • Aux NP VP            [0,0]  []
3:   S → • VP                   [0,0]  []


chart[0] – after 1: S → • NP VP (Predictor)

Id   state                      span   previous states
0:   γ → • S                    [0,0]  []
1:   S → • NP VP                [0,0]  []
2:   S → • Aux NP VP            [0,0]  []
3:   S → • VP                   [0,0]  []
4:   NP → • Det Nominal         [0,0]  []
5:   NP → • ProperNoun          [0,0]  []


chart[1] – after 2: S → • Aux NP VP (Scanner)

Since the input does not start with an auxiliary verb, the scanner does not add any state to chart[1], which therefore remains empty.


chart[0] – after 3: S → • VP (Predictor)

Id   state                      span   previous states
0:   γ → • S                    [0,0]  []
1:   S → • NP VP                [0,0]  []
2:   S → • Aux NP VP            [0,0]  []
3:   S → • VP                   [0,0]  []
4:   NP → • Det Nominal         [0,0]  []
5:   NP → • ProperNoun          [0,0]  []
6:   VP → • Verb                [0,0]  []
7:   VP → • Verb NP             [0,0]  []


chart[1] – after 4: NP → • Det Nominal (Scanner)

Since the input does not start with a determiner, the scanner does not add any state to chart[1], which therefore remains empty.


chart[1] – after 5: NP → • ProperNoun (Scanner)

Since the input does not start with a proper noun, the scanner does not add any state to chart[1], which therefore remains empty.


chart[1] – after 6: VP → • Verb (Scanner)

Id   state                      span   previous states
8:   Verb → book •              [0,1]  []


chart[1] – after 7: VP → • Verb NP (Scanner)

Id   state                      span   previous states
8:   Verb → book •              [0,1]  []

The state to be added is already in chart[1], so no change.


After finishing processing of chart[0]

chart[0]
0:   γ → • S                    [0,0]  []
1:   S → • NP VP                [0,0]  []
2:   S → • Aux NP VP            [0,0]  []
3:   S → • VP                   [0,0]  []
4:   NP → • Det Nominal         [0,0]  []
5:   NP → • ProperNoun          [0,0]  []
6:   VP → • Verb                [0,0]  []
7:   VP → • Verb NP             [0,0]  []

chart[1]
8:   Verb → book •              [0,1]  []


chart[1] – after 8: Verb → book • (Completer)

Id   state                      span   previous states
8:   Verb → book •              [0,1]  []
9:   VP → Verb •                [0,1]  [8]
10:  VP → Verb • NP             [0,1]  [8]

The completer advances the dot in those states, already in the chart, that have annotation [0,0] and are waiting for a Verb (states 6 and 7).

More generally, for a completed state X → … • with annotation [j,k], the completer advances the dot in those states already in chart[j] that have annotation [i,j] and have X immediately to the right of the dot; each new state has annotation [i,k].


chart[1] – after 9: VP → Verb • (Completer)

Id   state                      span   previous states
8:   Verb → book •              [0,1]  []
9:   VP → Verb •                [0,1]  [8]
10:  VP → Verb • NP             [0,1]  [8]
11:  S → VP •                   [0,1]  [9]


chart[1] – after 10: VP → Verb • NP (Predictor)

Id   state                      span   previous states
8:   Verb → book •              [0,1]  []
9:   VP → Verb •                [0,1]  [8]
10:  VP → Verb • NP             [0,1]  [8]
11:  S → VP •                   [0,1]  [9]
12:  NP → • Det Nominal         [1,1]  []
13:  NP → • ProperNoun          [1,1]  []


chart[1] – after 11: S → VP • (Completer)

Id   state                      span   previous states
8:   Verb → book •              [0,1]  []
9:   VP → Verb •                [0,1]  [8]
10:  VP → Verb • NP             [0,1]  [8]
11:  S → VP •                   [0,1]  [9]
12:  NP → • Det Nominal         [1,1]  []
13:  NP → • ProperNoun          [1,1]  []

The book does not process this state. I’m not sure why.

However, if it were processed, the completer would advance 0: γ → • S to γ → S • [0,1], and this would clearly not indicate a successful parse, since it does not span the entire input.


chart[2] – after 12: NP → • Det Nominal (Scanner)

Id   state                      span   previous states
14:  Det → that •               [1,2]  []


chart[2] – after 13: NP → • ProperNoun (Scanner)

Id   state                      span   previous states
14:  Det → that •               [1,2]  []

Since the next word, “that”, is not a proper noun, the scanner does not add any state to chart[2], which therefore remains the same.


After finishing processing of chart[1]

chart[1]
8:   Verb → book •              [0,1]  []
9:   VP → Verb •                [0,1]  [8]
10:  VP → Verb • NP             [0,1]  [8]
11:  S → VP •                   [0,1]  [9]
12:  NP → • Det Nominal         [1,1]  []
13:  NP → • ProperNoun          [1,1]  []

chart[2]
14:  Det → that •               [1,2]  []


chart[2] – after 14: Det → that • (Completer)

Id   state                      span   previous states
14:  Det → that •               [1,2]  []
15:  NP → Det • Nominal         [1,2]  [14]


chart[2] – after 15: NP → Det • Nominal (Predictor)

Id   state                      span   previous states
14:  Det → that •               [1,2]  []
15:  NP → Det • Nominal         [1,2]  [14]
16:  Nominal → • Noun           [2,2]  []
17:  Nominal → • Noun Nominal   [2,2]  []


chart[3] – after 16: Nominal → • Noun (Scanner)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []


chart[3] – after 17: Nominal → • Noun Nominal (Scanner)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []

The state to be added is already in chart[3], so no change.


After finishing processing of chart[2]

chart[2]
14:  Det → that •               [1,2]  []
15:  NP → Det • Nominal         [1,2]  [14]
16:  Nominal → • Noun           [2,2]  []
17:  Nominal → • Noun Nominal   [2,2]  []

chart[3]
18:  Noun → flight •            [2,3]  []


chart[3] – after 18: Noun → flight • (Completer)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []
19:  Nominal → Noun •           [2,3]  [18]
20:  Nominal → Noun • Nominal   [2,3]  [18]


chart[3] – after 19: Nominal → Noun • (Completer)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []
19:  Nominal → Noun •           [2,3]  [18]
20:  Nominal → Noun • Nominal   [2,3]  [18]
21:  NP → Det Nominal •         [1,3]  [14, 19]


chart[3] – after 20: Nominal → Noun • Nominal (Predictor)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []
19:  Nominal → Noun •           [2,3]  [18]
20:  Nominal → Noun • Nominal   [2,3]  [18]
21:  NP → Det Nominal •         [1,3]  [14, 19]
22:  Nominal → • Noun           [3,3]  []
23:  Nominal → • Noun Nominal   [3,3]  []


chart[3] – after 21: NP → Det Nominal • (Completer)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []
19:  Nominal → Noun •           [2,3]  [18]
20:  Nominal → Noun • Nominal   [2,3]  [18]
21:  NP → Det Nominal •         [1,3]  [14, 19]
22:  Nominal → • Noun           [3,3]  []
23:  Nominal → • Noun Nominal   [3,3]  []
24:  VP → Verb NP •             [0,3]  [8, 21]


chart[3] – after 22: Nominal → • Noun (Scanner)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []
19:  Nominal → Noun •           [2,3]  [18]
20:  Nominal → Noun • Nominal   [2,3]  [18]
21:  NP → Det Nominal •         [1,3]  [14, 19]
22:  Nominal → • Noun           [3,3]  []
23:  Nominal → • Noun Nominal   [3,3]  []
24:  VP → Verb NP •             [0,3]  [8, 21]

Since there is no more input, no new states are added.


chart[3] – after 23: Nominal → • Noun Nominal (Scanner)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []
19:  Nominal → Noun •           [2,3]  [18]
20:  Nominal → Noun • Nominal   [2,3]  [18]
21:  NP → Det Nominal •         [1,3]  [14, 19]
22:  Nominal → • Noun           [3,3]  []
23:  Nominal → • Noun Nominal   [3,3]  []
24:  VP → Verb NP •             [0,3]  [8, 21]

Since there is no more input, no new states are added.


chart[3] – after 24: VP → Verb NP • (Completer)

Id   state                      span   previous states
18:  Noun → flight •            [2,3]  []
19:  Nominal → Noun •           [2,3]  [18]
20:  Nominal → Noun • Nominal   [2,3]  [18]
21:  NP → Det Nominal •         [1,3]  [14, 19]
22:  Nominal → • Noun           [3,3]  []
23:  Nominal → • Noun Nominal   [3,3]  []
24:  VP → Verb NP •             [0,3]  [8, 21]
25:  S → VP •                   [0,3]  [24]


We’re done!

All states in chart[3] have been processed, no new states have been added to chart[4], and a state with LHS S spanning the entire input is in chart[3]:

25:  S → VP •                   [0,3]  [24]
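Putting the three operators together gives a small recognizer. This is a sketch, not the textbook's code: it assumes the grammar and a minimal lexicon read off the trace, no ε-rules, and hypothetical names (earley, accepts). Unlike the slides, it also runs the completer on S → VP • [0,1], which adds γ → S • [0,1]. As a result, states 0 to 13 match the slides, every later id is one higher, and the last state added is γ → S • [0,3].

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}   # assumed
POS = {"Aux", "Det", "ProperNoun", "Verb", "Noun"}


def earley(words, grammar=GRAMMAR, lexicon=LEXICON, pos=POS):
    """Fill an Earley chart (no ε-rules assumed).
    states[id] = (lhs, rhs, dot, start, end, prev); chart[k] lists ids ending at k."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    states = []

    def enqueue(lhs, rhs, dot, start, end, prev):
        key = (lhs, rhs, dot, start, end)
        if all(states[i][:5] != key for i in chart[end]):   # skip duplicates
            states.append(key + (prev,))
            chart[end].append(len(states) - 1)

    enqueue("γ", ("S",), 0, 0, 0, [])                       # dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):                            # chart[k] grows as we go
            sid = chart[k][i]
            i += 1
            lhs, rhs, dot, start, _end, _prev = states[sid]
            if dot == len(rhs):                             # Completer
                for j in list(chart[start]):
                    l2, r2, d2, s2, _e2, p2 = states[j]
                    if d2 < len(r2) and r2[d2] == lhs:
                        enqueue(l2, r2, d2 + 1, s2, k, p2 + [sid])
            elif rhs[dot] in pos:                           # Scanner
                if k < n and rhs[dot] in lexicon.get(words[k], set()):
                    enqueue(rhs[dot], (words[k],), 1, k, k + 1, [])
            else:                                           # Predictor
                for ex in grammar.get(rhs[dot], []):
                    enqueue(rhs[dot], ex, 0, k, k, [])
    return states, chart


def accepts(states, chart):
    """True if the last chart entry holds a complete S spanning the whole input."""
    n = len(chart) - 1
    return any(states[i][0] == "S" and states[i][2] == len(states[i][1])
               and states[i][3] == 0 for i in chart[n])


states, chart = earley("book that flight".split())
print(accepts(states, chart))   # True
```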


Recovering the tree

The basic idea is to trace back through the “previous state” links:

25:  S → VP •                   [0,3]  [24]
24:  VP → Verb NP •             [0,3]  [8, 21]
21:  NP → Det Nominal •         [1,3]  [14, 19]
19:  Nominal → Noun •           [2,3]  [18]
18:  Noun → flight •            [2,3]  []
14:  Det → that •               [1,2]  []
8:   Verb → book •              [0,1]  []

The resulting parse tree, in bracket notation:

[S [VP [Verb book] [NP [Det that] [Nominal [Noun flight]]]]]
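Following the previous-state links can be written as a short recursive function. The sketch below hard-codes the states on the path listed above (TRACE and build_tree are hypothetical names) and prints the tree in bracket notation. It treats a state with no previous states as a lexical POS → word • state, which is only safe because this grammar has no ε-rules.

```python
# States on the path from state 25, copied from the trace: id → (lhs, rhs, previous states).
TRACE = {
    25: ("S", ("VP",), [24]),
    24: ("VP", ("Verb", "NP"), [8, 21]),
    21: ("NP", ("Det", "Nominal"), [14, 19]),
    19: ("Nominal", ("Noun",), [18]),
    18: ("Noun", ("flight",), []),
    14: ("Det", ("that",), []),
    8: ("Verb", ("book",), []),
}


def build_tree(state_id, states=TRACE):
    """Follow previous-state links to rebuild a bracketed parse tree."""
    lhs, rhs, prev = states[state_id]
    if not prev:                       # lexical state: POS → word •
        return f"[{lhs} {' '.join(rhs)}]"
    children = " ".join(build_tree(p, states) for p in prev)
    return f"[{lhs} {children}]"


print(build_tree(25))
# [S [VP [Verb book] [NP [Det that] [Nominal [Noun flight]]]]]
```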