25.11.2003 csa3050: Parsing Algorithms 1 1
CSA3050: NLP Algorithms
Parsing Algorithms 1
• Top Down
• Bottom-Up
• Left Corner
References
• This lecture is based on material found in Jurafsky & Martin, chapter 10.
• Relevant material available from Vince.
Simple Grammar
Parsing Problem
• Find all trees such that:
  – root is S
  – leaves exactly cover all the input words
Parsing as Search
• Search within a space defined by:
  – Start state
  – Goal state
  – State-to-state transformations
• Shape of space depends on parsing strategy.
• Two distinct strategies for finding the parse trees:
  – Top down
  – Bottom up
Top Down
• A top down parser tries to build the tree from the root node S down to the leaves, replacing each node that has a non-terminal label with the RHS of a corresponding grammar rule.
• Nodes with pre-terminal (word class) labels are compared to input words.
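A single top down step can be pictured as a function from one search state to its successor states. The sketch below is illustrative only: the state representation (a tuple of symbols still to be derived, plus an input position), the name `successors`, and the small grammar and lexicon are assumptions in the style of Jurafsky & Martin chapter 10, not something taken from the slides.

```python
# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}

# Hypothetical lexicon: word -> set of pre-terminal (word class) labels.
LEXICON = {
    "book": {"Noun", "Verb"},
    "that": {"Det"},
    "flight": {"Noun"},
}


def successors(state, words):
    """Return the states reachable from `state` in one top down step."""
    symbols, pos = state
    if not symbols:
        return []
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: replace it with the RHS of each matching rule.
        return [(tuple(rhs) + rest, pos) for rhs in GRAMMAR[first]]
    # Pre-terminal: compare it with the current input word.
    if pos < len(words) and first in LEXICON.get(words[pos], set()):
        return [(rest, pos + 1)]
    return []


if __name__ == "__main__":
    for nxt in successors((("S",), 0), ["book", "that", "flight"]):
        print(nxt)
```

Note that expanding a non-terminal never looks at `words`; only the pre-terminal branch does.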
Top Down Search Space
[Figure: top down search space, with the start node and goal node marked]
Bottom Up
• Each state is a forest of trees.
• Start node is a forest of nodes labelled with pre-terminal categories (word classes derived from lexicon)
• Transformations look for places where RHS of rules can fit.
• Any such place is replaced with a node labelled with LHS of rule.
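The bottom up transformations described above can be sketched as follows. This is a minimal illustration under stated assumptions: each tree in the forest is represented only by its root label, the grammar and lexicon are a hypothetical toy in the style of Jurafsky & Martin chapter 10, and the names `start_forests`, `reductions` and `recognize_bottom_up` are invented for this example.

```python
from itertools import product

# Hypothetical toy grammar and lexicon (not taken from the slides).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def start_forests(words):
    """One start forest per way of assigning a word class to every word."""
    choices = [sorted(LEXICON.get(w, set())) for w in words]
    return [tuple(cats) for cats in product(*choices)]


def reductions(forest):
    """Replace each place where a rule's RHS fits with the rule's LHS."""
    result = []
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            n = len(rhs)
            for i in range(len(forest) - n + 1):
                if list(forest[i:i + n]) == rhs:
                    result.append(forest[:i] + (lhs,) + forest[i + n:])
    return result


def recognize_bottom_up(words):
    """Search the space of forests for the goal state, a single S."""
    agenda = start_forests(words)
    seen = set(agenda)
    while agenda:
        forest = agenda.pop()
        if forest == ("S",):
            return True
        for nxt in reductions(forest):
            if nxt not in seen:
                seen.add(nxt)
                agenda.append(nxt)
    return False


if __name__ == "__main__":
    print(recognize_bottom_up(["book", "that", "flight"]))
```

A full parser would keep the trees themselves rather than only their root labels; the sketch only decides whether some sequence of reductions reaches S.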
Bottom Up Search Space
Top Down vs Bottom Up
• Top down
  – For: Never wastes time exploring trees that cannot be derived from S.
  – Against: Can generate trees that are not consistent with the input.
• Bottom up
  – For: Never wastes time building trees that are not consistent with the input words.
  – Against: Can generate subtrees that can never lead to an S node.
Development of a Concrete Strategy
• Combine best features of both top down and bottom up strategies.
  – Top down, grammar directed control.
  – Bottom up filtering.
• Examination of alternatives in parallel uses too much memory.
• Depth first strategy using agenda-based control.
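A depth first, agenda-based top down recognizer might look like the sketch below. It is not the algorithm from the slides (which did not survive in this transcript): the agenda is assumed to be a plain stack of (symbols, position) states, the grammar and lexicon are a hypothetical toy in the style of Jurafsky & Martin chapter 10, and the grammar deliberately contains no left-recursive rules.

```python
# Hypothetical toy grammar and lexicon (not taken from the slides).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "does": {"Aux"}, "this": {"Det"}, "that": {"Det"}, "a": {"Det"},
    "book": {"Noun", "Verb"}, "flight": {"Noun"}, "meal": {"Noun"},
    "include": {"Verb"},
}


def recognize_top_down(words):
    """Return True if `words` can be derived from S."""
    # The agenda is a stack, so the most recently added state is explored
    # first (depth first) and only one alternative is active at a time.
    agenda = [(("S",), 0)]
    while agenda:
        symbols, pos = agenda.pop()
        if not symbols:
            if pos == len(words):
                return True  # every symbol derived, every word covered
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Push alternatives in reverse so the first rule is tried first.
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((tuple(rhs) + rest, pos))
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            agenda.append((rest, pos + 1))
    return False


if __name__ == "__main__":
    print(recognize_top_down("does this flight include a meal".split()))
```

Because untried alternatives simply wait on the stack, memory use stays proportional to the depth of the current path rather than to the number of trees being explored in parallel.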
Top Down Algorithm
Derivation: top down, left-to-right, depth first
A Problem with the Algorithm
• Note that the first three steps of the parse involve a failed attempt to expand the first rule, S → NP VP.
• The parser recursively expands the leftmost NT of this rule (NP).
• While all this work is going on, the input is not even consulted!
• Only when a terminal symbol is encountered is the input compared and the failure discovered.
• This is pretty inefficient.
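The wasted work can be made visible by recording the states a depth first top down recognizer visits. The sketch below is an assumption-laden illustration: the example sentence, the toy grammar and lexicon (in the style of Jurafsky & Martin chapter 10) and the name `trace_top_down` are all hypothetical and are not taken from the derivation on the slides.

```python
# Hypothetical toy grammar and lexicon (not taken from the slides).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "does": {"Aux"}, "this": {"Det"}, "a": {"Det"},
    "flight": {"Noun"}, "meal": {"Noun"}, "include": {"Verb"},
}


def trace_top_down(words):
    """Return the states visited, in order, until success or exhaustion."""
    agenda = [(("S",), 0)]
    visited = []
    while agenda:
        state = agenda.pop()
        visited.append(state)
        symbols, pos = state
        if not symbols:
            if pos == len(words):
                return visited
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Expansion of a non-terminal: the input is not consulted here.
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((tuple(rhs) + rest, pos))
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            # Only a pre-terminal is ever compared with the input.
            agenda.append((rest, pos + 1))
    return visited


if __name__ == "__main__":
    words = "does this flight include a meal".split()
    for symbols, pos in trace_top_down(words)[:5]:
        print(pos, " ".join(symbols))
```

With this toy grammar, the recognizer expands S → NP VP and then NP → Det Nominal before the pre-terminal Det is finally compared with the first word and the attempt fails.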
Bottom Up Filtering
• We know the current input word must serve as the first word in the derivation of the unexpanded node the parser is currently processing.
• Therefore the parser should not consider any grammar rule for which the current word cannot serve as the "left corner".
• The left corner is the first word (or preterminal node) along the left edge of a derivation.
Left Corner
The nodes Verb and prefer are each left corners of VP
Left Corner
• B is a left corner of A iff A ⇒* Bα, for non-terminal A, pre-terminal B and symbol string α.
• Possible left corners of all non-terminal categories can be determined in advance and placed in a table.
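One possible way to precompute such a table is a fixed-point iteration over the first symbol of every rule. The sketch below assumes a grammar with no empty right-hand sides; the grammar itself is a hypothetical toy in the style of Jurafsky & Martin chapter 10, and `left_corner_table` is a name invented for this example.

```python
# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
# Any symbol that is not a key is treated as a pre-terminal.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corner_table(grammar):
    """Map each non-terminal A to the pre-terminals B with A =>* B alpha."""
    table = {a: set() for a in grammar}
    changed = True
    while changed:  # repeat until no entry grows any further
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                first = rhs[0]
                if first in grammar:
                    # Non-terminal first symbol: inherit its left corners.
                    new = table[first]
                else:
                    # Pre-terminal first symbol: it is itself a left corner.
                    new = {first}
                if not new <= table[a]:
                    table[a] |= new
                    changed = True
    return table


if __name__ == "__main__":
    for category, corners in left_corner_table(GRAMMAR).items():
        print(category, sorted(corners))
```

The iteration also terminates on left-recursive rules, since a rule such as A → A β can never add anything new to the entry for A.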
Example of Left Corner Table
Category    Left Corners
S           Det, Proper-Noun, Aux, Verb
NP          Det, Proper-Noun
Nominal     Noun
VP          Verb
How to use the Left Corner Table
• If attempting to parse category A, only consider rules A → Bα for which category(current input) ∈ LeftCorners(B).
• S → NP VP
  S → Aux NP VP
  S → VP
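The filtering rule above can be sketched as a lookup before expansion. The left corner table below is copied from the example table in this lecture and the three S rules are the ones listed above; the remaining rules, the lexicon, the names `left_corners` and `viable_rules`, and the convention that a pre-terminal is its own only left corner are assumptions made for the illustration.

```python
# Left corner table as given in the example table.
LEFT_CORNERS = {
    "S": {"Det", "Proper-Noun", "Aux", "Verb"},
    "NP": {"Det", "Proper-Noun"},
    "Nominal": {"Noun"},
    "VP": {"Verb"},
}

# The S rules are from the slide; the other rules are hypothetical.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}

# Hypothetical lexicon: word -> set of word classes.
LEXICON = {
    "does": {"Aux"}, "that": {"Det"}, "book": {"Noun", "Verb"},
    "flight": {"Noun"}, "Houston": {"Proper-Noun"},
}


def left_corners(symbol):
    """Assumption: a pre-terminal counts as its own (only) left corner."""
    return LEFT_CORNERS.get(symbol, {symbol})


def viable_rules(category, word):
    """Keep only rules category -> B alpha whose B admits the word."""
    word_classes = LEXICON.get(word, set())
    return [rhs for rhs in GRAMMAR[category]
            if word_classes & left_corners(rhs[0])]


if __name__ == "__main__":
    for word in ["does", "that", "book"]:
        print(word, viable_rules("S", word))
```

With "does" as the current word, only S → Aux NP VP survives the filter, so the failed expansion of S → NP VP is never attempted.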
Next Week
• Problems with top down parser
  – left recursion
  – repeated work
• Earley Algorithm
• Assignment
• See J & M ch 10