7/2/2015cpsc503 winter 20101 cpsc 503 computational linguistics lecture 8 giuseppe carenini

49
03/25/22 CPSC503 Winter 2010 1 CPSC 503 Computational Linguistics Lecture 8 Giuseppe Carenini

Upload: elaine-ross

Post on 21-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

04/19/23 CPSC503 Winter 2010 1

CPSC 503Computational Linguistics

Lecture 8Giuseppe Carenini

04/19/23 CPSC503 Winter 2010 2

Today 5/10

• Start Syntax / Parsing (Chp 12!)

04/19/23 CPSC503 Winter 2010 3

Knowledge-Formalisms Map(next three lectures)

Logical formalisms (First-Order Logics)

Rule systems (and prob. versions)(e.g., (Prob.) Context-Free

Grammars)

State Machines (and prob. versions)

(Finite State Automata,Finite State Transducers, Markov Models)

Morphology

Syntax

PragmaticsDiscourse

and Dialogue

Semantics

AI planners

04/19/23 CPSC503 Winter 2010 4

Today 5/10

• English Syntax • Context-Free Grammar for English

– Rules– Trees– Recursion– Problems

• Start Parsing

04/19/23 CPSC503 Winter 2010 5

SyntaxDef. The study of how sentences are formed by

grouping and ordering words

Example: Ming and Sue prefer morning flights

* Ming Sue flights morning and prefer

Groups behave as single unit wrt Substitution, Movement, Coordination

04/19/23 CPSC503 Winter 2010 6

Syntax: Useful tasks

• Why should you care?– Grammar checkers– Basis for semantic interpretation

•Question answering •Information extraction•Summarization

– Machine translation– ……

04/19/23 CPSC503 Winter 2010 7

Key Constituents – with heads (English)• Noun phrases• Verb phrases• Prepositional

phrases • Adjective phrases• Sentences

• (Det) N (PP)• (Qual) V (NP)• (Deg) P (NP)• (Deg) A (PP)• (NP) (I) (VP)

Some simple specifiersCategory Typical function

ExamplesDeterminer specifier of N the, a, this,

no..Qualifier specifier of V never,

often..Degree word specifier of A or P very,

almost..

Complements?

(Specifier) X (Complement)

04/19/23 CPSC503 Winter 2010 8

Key Constituents: Examples• Noun phrases

• Verb phrases

• Prepositional phrases

• Adjective phrases

• Sentences

• (Det) N (PP) the cat on the

table• (Qual) V (NP) never eat a cat• (Deg) P (NP) almost in the net• (Deg) A (PP) very happy

about it• (NP) (I) (VP) a mouse -- ate it

04/19/23 CPSC503 Winter 2010 9

Context Free Grammar (Example)

• S -> NP VP• NP -> Det NOMINAL• NOMINAL -> Noun• VP -> Verb• Det -> a• Noun -> flight• Verb -> left

Terminal

Non-terminal

Start-symbol

04/19/23 CPSC503 Winter 2010 10

CFG more complex Example

LexiconGrammar with example phrases

04/19/23 CPSC503 Winter 2010 11

Context Free Grammars (CFGs)

• Define a Formal Language (un/grammatical sentences)

• Generative Formalism– Generate strings in the language– Reject strings not in the language– Impose structures (trees) on strings

in the language

04/19/23 CPSC503 Winter 2010 12

CFG: Formal Definitions

• 4-tuple (non-term., term., productions, start)

• (N, , P, S)

• P is a set of rules A; AN, (N)*

• A derivation is the process of rewriting 1 into m (both strings in (N)*) by

applying a sequence of rules: 1 * m

• L G = W|w* and S * w

04/19/23 CPSC503 Winter 2010 13

Derivations as Trees

flight

Nominal

Nominal

Context Free?

04/19/23 CPSC503 Winter 2010 14

CFG Parsing

• It is completely analogous to running a finite-state transducer with a tape– It’s just more powerful

• Chpt. 13

Parser

I prefer a morning flight

flight

Nominal

Nominal

04/19/23 CPSC503 Winter 2010 15

Other Options• Regular languages (FSA) A xB or A x

– Too weak (e.g., cannot deal with recursion in a general way – no center-embedding)

• CFGs A (also produce more understandable and “useful” structure)

• Context-sensitive A ; ≠– Can be computationally intractable

• Turing equiv. ; ≠– Too powerful / Computationally

intractable

04/19/23 CPSC503 Winter 2010 16

Common Sentence-Types• Declaratives: A plane left

S -> NP VP

• Imperatives: Leave!S -> VP

• Yes-No Questions: Did the plane leave?S -> Aux NP VP

• WH Questions: Which flights serve breakfast?

S -> WH NP VP

When did the plane leave?S -> WH Aux NP VP

04/19/23 CPSC503 Winter 2010 17

NP: more detailsNP -> Specifiers N Complements

• NP -> (Predet)(Det)(Card)(Ord)(Quant) (AP) Nom

e.g., all the other cheap cars

• Nom -> Nom PP (PP) (PP) e.g., reservation on BA456 from NY to

YVRNom -> Nom GerundVP e.g., flight arriving on Monday Nom -> Nom RelClause Nom RelClause ->(who | that) VP e.g., flight that arrives in the evening

04/19/23 CPSC503 Winter 2010 18

Conjunctive Constructions• S -> S and S

– John went to NY and Mary followed him

• NP -> NP and NP– John went to NY and Boston

• VP -> VP and VP– John went to NY and visited MOMA

• …• In fact the right rule for English is

X -> X and X

04/19/23 CPSC503 Winter 2010 19

Problems with CFGs

• Agreement

• Subcategorization

04/19/23 CPSC503 Winter 2010 20

Agreement• In English,

– Determiners and nouns have to agree in number

– Subjects and verbs have to agree in person and number

• Many languages have agreement systems that are far more complex than this (e.g., gender).

04/19/23 CPSC503 Winter 2010 21

Agreement

• This dog• Those dogs

• This dog eats• You have it• Those dogs eat

• *This dogs• *Those dog

• *This dog eat• *You has it• *Those dogs

eats

04/19/23 CPSC503 Winter 2010 22

Possible CFG Solution

• S -> NP VP• NP -> Det Nom• VP -> V NP• …

• SgS -> SgNP SgVP• PlS -> PlNp PlVP• SgNP -> SgDet SgNom• PlNP -> PlDet PlNom• PlVP -> PlV NP• SgVP3p ->SgV3p NP• …

Sg = singularPl = plural

OLD Grammar

NEW Grammar

04/19/23 CPSC503 Winter 2010 23

CFG Solution for Agreement

• It works and stays within the power of CFGs

• But it doesn’t scale all that well (explosion in the number of rules)

04/19/23 CPSC503 Winter 2010 24

Subcategorization

• *John sneezed the book• *I prefer United has a flight• *Give with a flight

• Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments (see first table)

04/19/23 CPSC503 Winter 2010 25

Subcategorization

• Sneeze: John sneezed

• Find: Please find [a flight to NY]NP

• Give: Give [me]NP[a cheaper fare]NP

• Help: Can you help [me]NP[with a flight]PP

• Prefer: I prefer [to leave earlier]TO-VP

• Told: I was told [United has a flight]S

• …

04/19/23 CPSC503 Winter 2010 26

So?

• So the various rules for VPs overgenerate.– They allow strings containing verbs

and arguments that don’t go together– For example:

•VP -> V NP therefore Sneezed the book•VP -> V S therefore go she will go there

04/19/23 CPSC503 Winter 2010 27

Possible CFG Solution

• VP -> V• VP -> V NP• VP -> V NP PP• …

• VP -> IntransV• VP -> TransV NP

• VP -> TransPPto NP PPto

• …

• TransPPto -> hand,give,..

This solution has the same problem as the one for agreement

OLD Grammar

NEW Grammar

04/19/23 CPSC503 Winter 2010 28

CFG for NLP: summary• CFGs cover most syntactic structure in

English.

• But there are problems (over-generation)– That can be dealt with adequately, although

not elegantly, by staying within the CFG framework.

• Many practical computational grammars simply rely on CFG

• For more elegant / concise approaches see Chpt 15 “Features and Unification”

04/19/23 CPSC503 Winter 2010 29

Dependency Grammars• Syntactic structure: binary relations

between words

• Links: grammatical function or very general semantic relation

• Abstract away from word-order variations (simpler grammars)

• Useful features in many NLP applications (for classification, summarization and NLG)

04/19/23 CPSC503 Winter 2010 30

Today 5/10

• English Syntax • Context-Free Grammar for English

– Rules– Trees– Recursion– Problems

• Start Parsing (if time left)

04/19/23 CPSC503 Winter 2010 31

Parsing with CFGs

Assign valid trees: covers all and only the elements of the input and has an S at

the top

Parser

I prefer a morning flight

flight

Nominal

Nominal

CFG

Sequence of words Valid parse trees

04/19/23 CPSC503 Winter 2010 32

Parsing as Search• S -> NP VP• S -> Aux NP VP• NP -> Det Noun• VP -> Verb• Det -> a• Noun -> flight• Verb -> left,

arrive• Aux -> do, does

Search space of possible parse trees

CFG

defines

Parsing: find all trees that cover all and only the words in the input

04/19/23 CPSC503 Winter 2010 33

Constraints on Search

Parser

I prefer a morning flight

flight

Nominal

NominalCFG

(search space)

Sequence of words Valid parse trees

Search Strategies: • Top-down or goal-directed• Bottom-up or data-directed

04/19/23 CPSC503 Winter 2010 34

Top-Down Parsing• Since we’re trying to find trees

rooted with an S (Sentences) start with the rules that give us an S.

• Then work your way down from there to the words. flightInput

:

04/19/23 CPSC503 Winter 2010 35

Next step: Top Down Space

• When POS categories are reached, reject trees whose leaves fail to match all words in the input

…….. …….. ……..

04/19/23 CPSC503 Winter 2010 36

Bottom-Up Parsing• Of course, we also want trees that

cover the input words. So start with trees that link up with the words in the right way.

• Then work your way up from there.

flight

flight

flight

04/19/23 CPSC503 Winter 2010 37

Two more steps: Bottom-Up Space

flightflightflight

flightflight

flightflight

…….. …….. ……..

04/19/23 CPSC503 Winter 2010 38

Top-Down vs. Bottom-Up• Top-down

– Only searches for trees that can be answers

– But suggests trees that are not consistent with the words

• Bottom-up– Only forms trees consistent with the

words– Suggest trees that make no sense

globally

04/19/23 CPSC503 Winter 2010 39

So Combine Them

• Top-down: control strategy to generate trees

• Bottom-up: to filter out inappropriate parses

Top-down Control strategy:• Depth vs. Breadth first• Which node to try to expand next• Which grammar rule to use to expand a

node

(left-most)

(textual order)

04/19/23 CPSC503 Winter 2010 40

Top-Down, Depth-First, Left-to-Right Search

Sample sentence: “Does this flight include a meal?”

04/19/23 CPSC503 Winter 2010 41

Example “Does this flight include a meal?”

04/19/23 CPSC503 Winter 2010 42

flightflight

Example “Does this flight include a meal?”

04/19/23 CPSC503 Winter 2010 43

flight flight

Example “Does this flight include a meal?”

04/19/23 CPSC503 Winter 2010 44

Adding Bottom-up Filtering

The following sequence was a waste of time because an NP cannot generate a parse tree starting with an AUX

Aux Aux Aux Aux

04/19/23 CPSC503 Winter 2010 45

Bottom-Up FilteringCategory Left Corners

S Det, Proper-Noun, Aux, Verb

NP Det, Proper-Noun

Nominal Noun

VP Verb

Aux Aux Aux

04/19/23 CPSC503 Winter 2010 46

Problems with TD-BU-filtering

• Ambiguity• Repeated Parsing

• SOLUTION: Earley Algorithm (once again dynamic programming!)

04/19/23 CPSC503 Winter 2010 47

For Next Time

• Read Chapter 13 (Parsing)• Optional: Read Chapter 15 (Features and

Unification) – skip algorithms and implementation

04/19/23 CPSC503 Winter 2010 48

Grammars and Constituency• Of course, there’s nothing easy or obvious

about how we come up with right set of constituents and the rules that govern how they combine...

• That’s why there are so many different theories of grammar and competing analyses of the same data.

• The approach to grammar, and the analyses, adopted here are very generic (and don’t correspond to any modern linguistic theory of grammar).

04/19/23 CPSC503 Winter 2010 49

Syntactic Notions so far...

• N-grams: prob. distr. for next word can be effectively approximated knowing previous n words

• POS categories are based on:– distributional properties (what other

words can occur nearby) – morphological properties (affixes they

take)