
Upload: neil-shepherd

Post on 03-Jan-2016


October 2005 csa3180: Parsing Algorithms 1

CSA350: NLP Algorithms

Sentence Parsing I
• The Parsing Problem
• Parsing as Search
• Top Down/Bottom Up Parsing Strategies


References

• This lecture is based on material found in
  – Jurafsky & Martin chapter 10

• Relevant material available from Vince.


Why not use FS techniques for parsing NL sentences?

• Descriptive Adequacy
  – Some NL phenomena cannot be described within the FS (finite-state) framework.
  – Example: central embedding

• Notational Adequacy
  – Elegance with which the notation describes real-world objects. Elegance implies
    • Notation which allows short descriptions.
    • Notation which exploits similarities between different structures and permits general properties to be stated.
    • Representation of dependency and hierarchy


Central Embedding

• The following sentences
  – The cat spat
    1 1
  – The cat the boy saw spat
    1 2 2 1
  – The cat the boy the girl liked saw spat
    1 2 3 3 2 1

• require at least a grammar of the form S → A^n B^n
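
A minimal sketch of why the A^n B^n pattern needs more than finite-state memory: the recogniser below keeps an unbounded counter of noun phrases still waiting for their verb. The tag names "NP" and "V" and the function name are assumptions made for illustration (each "the cat" is collapsed to one NP tag); they do not come from the lecture.

```python
def is_centre_embedded(tags):
    """Recognise NP^n V^n for n >= 1, e.g. NP NP V V for 'the cat the boy saw spat'."""
    pending = 0          # noun phrases still waiting for their verb
    i = 0
    # Count the opening noun phrases.
    while i < len(tags) and tags[i] == "NP":
        pending += 1
        i += 1
    if pending == 0:
        return False
    # Each verb closes one pending noun phrase.
    while i < len(tags) and tags[i] == "V":
        pending -= 1
        i += 1
    # Accept only if all input was consumed and every NP was matched.
    return i == len(tags) and pending == 0
```

Because `pending` can grow without bound, no fixed number of states can stand in for it, which is the descriptive-adequacy point made above.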


DCG-style Grammar/Lexicon

s --> np, vp.
s --> aux, np, vp.
s --> vp.
np --> det, nom.
np --> pn.
nom --> noun.
nom --> noun, nom.
nom --> nom, pp.
pp --> prep, np.
vp --> v.
vp --> v, np.

det --> [that];[this];[a].
noun --> [book];[flight];[meal];[money].
v --> [book];[include];[prefer].
aux --> [does].
prep --> [from];[to];[on].
pn --> ['Houston'];['TWA'].


Parse Tree

A valid parse tree for a grammar G is a tree
  – whose root is the start symbol for G
  – whose interior nodes are nonterminals of G
  – in which the children of a node T (from left to right) correspond to the symbols on the right hand side of some production for T in G
  – whose leaf nodes are terminal symbols of G.

• Every sentence generated by a grammar has a corresponding valid parse tree

• Every valid parse tree exactly covers a sentence generated by the grammar


Parsing Problem

• Given grammar G and sentence A, find all valid parse trees for G that exactly cover A

[Figure: parse tree for "book that flight": [S [VP [V book] [NP [Det that] [Nom [N flight]]]]]]


Soundness and Completeness

• A parser is sound if every parse tree it returns is valid.

• A parser is complete for grammar G if for all s ∈ L(G)
  – it terminates
  – it produces the corresponding parse tree

• For many purposes, we settle for sound but incomplete parsers


Parsing as Search

• Search within a space defined by
  – Start State
  – Goal State
  – State to state transformations

• Two distinct parsing strategies:
  – Top down
  – Bottom up

• Different parsing strategy, different state space, different problem.

• Parsing strategy ≠ search strategy
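
To illustrate the last point, here is a small sketch in which the parsing strategy (top down: rewrite the leftmost non-terminal) fixes the state space, while the search strategy is chosen separately by how the agenda is read. The toy grammar, the state encoding as tuples of symbols, and the function names are assumptions for illustration only.

```python
from collections import deque

# Toy grammar (an assumption); symbols without rules are pre-terminals.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Noun")],
    "VP": [("V", "NP"), ("V",)],
}

def successors(state):
    """Top-down transformation: rewrite the leftmost non-terminal with each of its rules."""
    for i, symbol in enumerate(state):
        if symbol in GRAMMAR:
            return [state[:i] + rhs + state[i + 1:] for rhs in GRAMMAR[symbol]]
    return []   # only pre-terminals left: nothing to expand

def search(start, goal, depth_first):
    """Same state space either way; only the agenda discipline changes."""
    agenda, visited = deque([start]), []
    while agenda:
        # Stack behaviour gives depth-first search, queue behaviour breadth-first.
        state = agenda.pop() if depth_first else agenda.popleft()
        visited.append(state)
        if state == goal:
            return visited
        agenda.extend(successors(state))
    return visited
```

Calling `search(("S",), ("V", "Det", "Noun"), depth_first=True)` and the same call with `depth_first=False` both reach the goal, but they visit the states in a different order and in different numbers.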


Top Down

• Each state is a tree (which encodes the current state of the parse).

• A top down parser tries to build from the root node S down to the leaves by replacing each node that has a non-terminal label with the RHS of a corresponding grammar rule.

• Nodes with pre-terminal (word class) labels are compared to input words.
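
A minimal sketch of this procedure as a backtracking top-down parser. It is an illustration under stated assumptions, not the algorithm from the lecture: `GRAMMAR` and `LEXICON` are a Python rendering of part of the DCG-style grammar, the left-recursive rule `nom --> nom, pp` is deliberately left out because this naive expansion would loop on it, and all function names are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PN"]],
    "Nom": [["Noun", "Nom"], ["Noun"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"},
    "book": {"Noun", "V"}, "flight": {"Noun"}, "meal": {"Noun"},
    "money": {"Noun"}, "include": {"V"}, "prefer": {"V"},
    "does": {"Aux"}, "Houston": {"PN"}, "TWA": {"PN"},
}

def expand(symbol, words, i):
    """Yield (tree, next_position) for every way `symbol` can cover words from i."""
    if symbol not in GRAMMAR:
        # Pre-terminal: this is the only point where the input is consulted.
        if i < len(words) and symbol in LEXICON.get(words[i], ()):
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:              # non-terminal: try each rule in order
        for children, j in expand_sequence(rhs, words, i):
            yield (symbol, *children), j

def expand_sequence(symbols, words, i):
    """Cover words from i with the given symbols, left to right."""
    if not symbols:
        yield [], i
        return
    for first, j in expand(symbols[0], words, i):
        for rest, k in expand_sequence(symbols[1:], words, j):
            yield [first] + rest, k

def parse_top_down(words):
    """All trees rooted in S that exactly cover the input."""
    return [tree for tree, j in expand("S", words, 0) if j == len(words)]
```

For "book that flight" the rules `S → NP VP` and `S → Aux NP VP` are tried and abandoned before `S → VP` succeeds.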


Top Down Search Space

[Figure: top down search space, with the start node and goal node marked]


Bottom Up

• Each state is a forest of trees.

• Start node is a forest of nodes labelled with pre-terminal categories (word classes derived from lexicon)

• Transformations look for places where RHS of rules can fit.

• Any such place is replaced with a node labelled with LHS of rule.
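
The four points above can be sketched as an exhaustive search over forests. This is an illustrative sketch, not the lecture's algorithm: `RULES` and `LEXICON` are an assumed Python rendering of the DCG-style grammar, a forest is a tuple of trees, and the function names are hypothetical. One start forest is built for each way of assigning word classes to the words.

```python
from itertools import product

RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PN",)),
    ("Nom", ("Noun",)), ("Nom", ("Noun", "Nom")), ("Nom", ("Nom", "PP")),
    ("PP", ("Prep", "NP")),
    ("VP", ("V",)), ("VP", ("V", "NP")),
]
LEXICON = {
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"},
    "book": {"Noun", "V"}, "flight": {"Noun"}, "meal": {"Noun"},
    "money": {"Noun"}, "include": {"V"}, "prefer": {"V"},
    "does": {"Aux"}, "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
    "Houston": {"PN"}, "TWA": {"PN"},
}

def successors(forest):
    """Every forest obtained by replacing one match of a rule's RHS with an LHS node."""
    labels = [tree[0] for tree in forest]
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(forest) - n + 1):
            if tuple(labels[i:i + n]) == rhs:
                yield forest[:i] + ((lhs, *forest[i:i + n]),) + forest[i + n:]

def parse_bottom_up(words):
    """All S trees covering the input, found by searching the space of forests."""
    # Start forests: one pre-terminal node per word, for each word class assignment.
    starts = product(*[[(c, w) for c in sorted(LEXICON.get(w, ()))] for w in words])
    agenda, seen, parses = list(starts), set(), []
    while agenda:
        forest = agenda.pop()
        if forest in seen:                   # the same forest can be reached in several orders
            continue
        seen.add(forest)
        if len(forest) == 1 and forest[0][0] == "S":
            parses.append(forest[0])         # goal state: a single tree rooted in S
        agenda.extend(successors(forest))
    return parses
```

In this sketch the left-recursive rule `Nom → Nom PP` does not cause looping, because every reduction consumes existing nodes of the forest.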


Bottom Up Search Space

[Figure: bottom up search space]


Top Down vs Bottom Up: General

• Top down
  – For: Never wastes time exploring trees that cannot be derived from S
  – Against: Can generate trees that are not consistent with the input

• Bottom up
  – For: Never wastes time building trees that are not grounded in segments of the input text.
  – Against: Can generate subtrees that can never lead to an S node.


Top Down Parsing - Remarks

• Top-down parsers do well if there is useful grammar driven control: search can be directed by the grammar.

• Left recursive rules can cause problems.

• A top-down parser will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with V, and the sentence starts with V.

• Top-down is unsuitable for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up as lexical lookup.

• Useless work: expands things that are possible top-down but not there.

• Repeated work: anywhere there is common substructure
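
On the left recursion remark: a rule such as `nom --> nom, pp` lets a top-down parser keep expanding `nom` without ever consuming input. The sketch below is one assumed way to spot such non-terminals before parsing; the function name is hypothetical, the grammar is a Python rendering of the DCG-style rules, and the check assumes the grammar has no empty productions.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PN"]],
    "Nom": [["Noun"], ["Noun", "Nom"], ["Nom", "PP"]],
    "PP": [["Prep", "NP"]],
    "VP": [["V"], ["V", "NP"]],
}

def left_recursive(grammar):
    """Return the non-terminals that can reach themselves through leftmost symbols."""
    # leftmost[A] = non-terminals that appear first in some rule for A
    leftmost = {a: {rhs[0] for rhs in rules if rhs and rhs[0] in grammar}
                for a, rules in grammar.items()}
    found = set()
    for a in grammar:
        reached, frontier = set(), [a]
        while frontier:
            for b in leftmost[frontier.pop()]:
                if b not in reached:
                    reached.add(b)
                    frontier.append(b)
        if a in reached:                     # A =>+ A ... via leftmost expansion
            found.add(a)
    return found
```

For the grammar above the only non-terminal reported is `Nom`.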


Bottom Up Parsing - Remarks

• Empty categories: termination problem unless rewriting of empty constituents is somehow restricted (but then it’s generally incomplete)

• Inefficient when there is great lexical ambiguity (grammar driven control might help here)

• Conversely, it is data-directed: it attempts to parse the words that are there.

• Both TD (LL) and BU (LR) parsers can do work exponential in the sentence length on NLP problems

• Useless work: locally possible, but globally impossible.

• Repeated work: anywhere there is common substructure


Development of a Concrete Strategy

• Combine best features of both top down and bottom up strategies.
  – Top down, grammar directed control.
  – Bottom up filtering.

• Examination of alternatives in parallel uses too much memory.

• Depth first strategy using agenda-based control.
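
One way these three ideas might fit together is sketched below. It is an assumption-laden illustration rather than the lecture's algorithm: the bottom up filter is implemented as a left-corner table (which word classes can begin each symbol), the agenda is a plain Python list used as a stack, the grammar omits the left-recursive rule, and all names are hypothetical. The recogniser also counts rule expansions so the effect of the filter can be observed.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PN"]],
    "Nom": [["Noun", "Nom"], ["Noun"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"},
    "book": {"Noun", "V"}, "flight": {"Noun"}, "meal": {"Noun"},
    "money": {"Noun"}, "include": {"V"}, "prefer": {"V"},
    "does": {"Aux"}, "Houston": {"PN"}, "TWA": {"PN"},
}

def left_corners(grammar):
    """Map each non-terminal to the pre-terminals that can start it."""
    def corners(symbol, active):
        if symbol not in grammar:            # a pre-terminal starts itself
            return {symbol}
        if symbol in active:                 # guard against left-recursive loops
            return set()
        found = set()
        for rhs in grammar[symbol]:
            found |= corners(rhs[0], active | {symbol})
        return found
    return {symbol: corners(symbol, frozenset()) for symbol in grammar}

def recognise(words, use_filter=True):
    """Return (accepted, number_of_rule_expansions)."""
    table = left_corners(GRAMMAR)
    agenda = [(("S",), 0)]                   # state: (symbols still to find, input position)
    expansions = 0
    while agenda:
        symbols, i = agenda.pop()            # last in, first out: depth first
        if not symbols:
            if i == len(words):
                return True, expansions
            continue
        first, rest = symbols[0], symbols[1:]
        cats = LEXICON.get(words[i], set()) if i < len(words) else set()
        if first not in GRAMMAR:             # pre-terminal: compare with the input word
            if first in cats:
                agenda.append((rest, i + 1))
            continue
        for rhs in reversed(GRAMMAR[first]): # reversed so the first rule is popped first
            starts = table.get(rhs[0], {rhs[0]})
            if use_filter and not (starts & cats):
                continue                     # bottom up filter: rule cannot begin with this word
            expansions += 1
            agenda.append((tuple(rhs) + rest, i))
    return False, expansions
```

With the filter switched on, rules such as `S → NP VP` are never expanded for an input that begins with a verb, so fewer expansions are counted than with `use_filter=False`.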


Top Down Algorithm


Derivation: top down, left-to-right, depth first


A Problem with the Algorithm

• Note that the first three steps of the parse involve a failed attempt to expand the first rule S → NP VP.

• The parser recursively expands the leftmost non-terminal (NT) of this rule (NP).

• While all this work is going on, the input is not even consulted!

• Only when a terminal symbol is encountered is the input compared and the failure discovered.

• This is pretty inefficient.