7/2/2015cpsc503 winter 20101 cpsc 503 computational linguistics lecture 8 giuseppe carenini
TRANSCRIPT
04/19/23 CPSC503 Winter 2010 3
Knowledge-Formalisms Map(next three lectures)
Logical formalisms (First-Order Logics)
Rule systems (and prob. versions)(e.g., (Prob.) Context-Free
Grammars)
State Machines (and prob. versions)
(Finite State Automata,Finite State Transducers, Markov Models)
Morphology
Syntax
PragmaticsDiscourse
and Dialogue
Semantics
AI planners
04/19/23 CPSC503 Winter 2010 4
Today 5/10
• English Syntax • Context-Free Grammar for English
– Rules– Trees– Recursion– Problems
• Start Parsing
04/19/23 CPSC503 Winter 2010 5
SyntaxDef. The study of how sentences are formed by
grouping and ordering words
Example: Ming and Sue prefer morning flights
* Ming Sue flights morning and prefer
Groups behave as single unit wrt Substitution, Movement, Coordination
04/19/23 CPSC503 Winter 2010 6
Syntax: Useful tasks
• Why should you care?– Grammar checkers– Basis for semantic interpretation
•Question answering •Information extraction•Summarization
– Machine translation– ……
04/19/23 CPSC503 Winter 2010 7
Key Constituents – with heads (English)• Noun phrases• Verb phrases• Prepositional
phrases • Adjective phrases• Sentences
• (Det) N (PP)• (Qual) V (NP)• (Deg) P (NP)• (Deg) A (PP)• (NP) (I) (VP)
Some simple specifiersCategory Typical function
ExamplesDeterminer specifier of N the, a, this,
no..Qualifier specifier of V never,
often..Degree word specifier of A or P very,
almost..
Complements?
(Specifier) X (Complement)
04/19/23 CPSC503 Winter 2010 8
Key Constituents: Examples• Noun phrases
• Verb phrases
• Prepositional phrases
• Adjective phrases
• Sentences
• (Det) N (PP) the cat on the
table• (Qual) V (NP) never eat a cat• (Deg) P (NP) almost in the net• (Deg) A (PP) very happy
about it• (NP) (I) (VP) a mouse -- ate it
04/19/23 CPSC503 Winter 2010 9
Context Free Grammar (Example)
• S -> NP VP• NP -> Det NOMINAL• NOMINAL -> Noun• VP -> Verb• Det -> a• Noun -> flight• Verb -> left
Terminal
Non-terminal
Start-symbol
04/19/23 CPSC503 Winter 2010 11
Context Free Grammars (CFGs)
• Define a Formal Language (un/grammatical sentences)
• Generative Formalism– Generate strings in the language– Reject strings not in the language– Impose structures (trees) on strings
in the language
04/19/23 CPSC503 Winter 2010 12
CFG: Formal Definitions
• 4-tuple (non-term., term., productions, start)
• (N, , P, S)
• P is a set of rules A; AN, (N)*
• A derivation is the process of rewriting 1 into m (both strings in (N)*) by
applying a sequence of rules: 1 * m
• L G = W|w* and S * w
04/19/23 CPSC503 Winter 2010 14
CFG Parsing
• It is completely analogous to running a finite-state transducer with a tape– It’s just more powerful
• Chpt. 13
Parser
I prefer a morning flight
flight
Nominal
Nominal
04/19/23 CPSC503 Winter 2010 15
Other Options• Regular languages (FSA) A xB or A x
– Too weak (e.g., cannot deal with recursion in a general way – no center-embedding)
• CFGs A (also produce more understandable and “useful” structure)
• Context-sensitive A ; ≠– Can be computationally intractable
• Turing equiv. ; ≠– Too powerful / Computationally
intractable
04/19/23 CPSC503 Winter 2010 16
Common Sentence-Types• Declaratives: A plane left
S -> NP VP
• Imperatives: Leave!S -> VP
• Yes-No Questions: Did the plane leave?S -> Aux NP VP
• WH Questions: Which flights serve breakfast?
S -> WH NP VP
When did the plane leave?S -> WH Aux NP VP
04/19/23 CPSC503 Winter 2010 17
NP: more detailsNP -> Specifiers N Complements
• NP -> (Predet)(Det)(Card)(Ord)(Quant) (AP) Nom
e.g., all the other cheap cars
• Nom -> Nom PP (PP) (PP) e.g., reservation on BA456 from NY to
YVRNom -> Nom GerundVP e.g., flight arriving on Monday Nom -> Nom RelClause Nom RelClause ->(who | that) VP e.g., flight that arrives in the evening
04/19/23 CPSC503 Winter 2010 18
Conjunctive Constructions• S -> S and S
– John went to NY and Mary followed him
• NP -> NP and NP– John went to NY and Boston
• VP -> VP and VP– John went to NY and visited MOMA
• …• In fact the right rule for English is
X -> X and X
04/19/23 CPSC503 Winter 2010 20
Agreement• In English,
– Determiners and nouns have to agree in number
– Subjects and verbs have to agree in person and number
• Many languages have agreement systems that are far more complex than this (e.g., gender).
04/19/23 CPSC503 Winter 2010 21
Agreement
• This dog• Those dogs
• This dog eats• You have it• Those dogs eat
• *This dogs• *Those dog
• *This dog eat• *You has it• *Those dogs
eats
04/19/23 CPSC503 Winter 2010 22
Possible CFG Solution
• S -> NP VP• NP -> Det Nom• VP -> V NP• …
• SgS -> SgNP SgVP• PlS -> PlNp PlVP• SgNP -> SgDet SgNom• PlNP -> PlDet PlNom• PlVP -> PlV NP• SgVP3p ->SgV3p NP• …
Sg = singularPl = plural
OLD Grammar
NEW Grammar
04/19/23 CPSC503 Winter 2010 23
CFG Solution for Agreement
• It works and stays within the power of CFGs
• But it doesn’t scale all that well (explosion in the number of rules)
04/19/23 CPSC503 Winter 2010 24
Subcategorization
• *John sneezed the book• *I prefer United has a flight• *Give with a flight
• Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments (see first table)
04/19/23 CPSC503 Winter 2010 25
Subcategorization
• Sneeze: John sneezed
• Find: Please find [a flight to NY]NP
• Give: Give [me]NP[a cheaper fare]NP
• Help: Can you help [me]NP[with a flight]PP
• Prefer: I prefer [to leave earlier]TO-VP
• Told: I was told [United has a flight]S
• …
04/19/23 CPSC503 Winter 2010 26
So?
• So the various rules for VPs overgenerate.– They allow strings containing verbs
and arguments that don’t go together– For example:
•VP -> V NP therefore Sneezed the book•VP -> V S therefore go she will go there
04/19/23 CPSC503 Winter 2010 27
Possible CFG Solution
• VP -> V• VP -> V NP• VP -> V NP PP• …
• VP -> IntransV• VP -> TransV NP
• VP -> TransPPto NP PPto
• …
• TransPPto -> hand,give,..
This solution has the same problem as the one for agreement
OLD Grammar
NEW Grammar
04/19/23 CPSC503 Winter 2010 28
CFG for NLP: summary• CFGs cover most syntactic structure in
English.
• But there are problems (over-generation)– That can be dealt with adequately, although
not elegantly, by staying within the CFG framework.
• Many practical computational grammars simply rely on CFG
• For more elegant / concise approaches see Chpt 15 “Features and Unification”
04/19/23 CPSC503 Winter 2010 29
Dependency Grammars• Syntactic structure: binary relations
between words
• Links: grammatical function or very general semantic relation
• Abstract away from word-order variations (simpler grammars)
• Useful features in many NLP applications (for classification, summarization and NLG)
04/19/23 CPSC503 Winter 2010 30
Today 5/10
• English Syntax • Context-Free Grammar for English
– Rules– Trees– Recursion– Problems
• Start Parsing (if time left)
04/19/23 CPSC503 Winter 2010 31
Parsing with CFGs
Assign valid trees: covers all and only the elements of the input and has an S at
the top
Parser
I prefer a morning flight
flight
Nominal
Nominal
CFG
Sequence of words Valid parse trees
04/19/23 CPSC503 Winter 2010 32
Parsing as Search• S -> NP VP• S -> Aux NP VP• NP -> Det Noun• VP -> Verb• Det -> a• Noun -> flight• Verb -> left,
arrive• Aux -> do, does
Search space of possible parse trees
CFG
defines
Parsing: find all trees that cover all and only the words in the input
04/19/23 CPSC503 Winter 2010 33
Constraints on Search
Parser
I prefer a morning flight
flight
Nominal
NominalCFG
(search space)
Sequence of words Valid parse trees
Search Strategies: • Top-down or goal-directed• Bottom-up or data-directed
04/19/23 CPSC503 Winter 2010 34
Top-Down Parsing• Since we’re trying to find trees
rooted with an S (Sentences) start with the rules that give us an S.
• Then work your way down from there to the words. flightInput
:
04/19/23 CPSC503 Winter 2010 35
Next step: Top Down Space
• When POS categories are reached, reject trees whose leaves fail to match all words in the input
…….. …….. ……..
04/19/23 CPSC503 Winter 2010 36
Bottom-Up Parsing• Of course, we also want trees that
cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.
flight
flight
flight
04/19/23 CPSC503 Winter 2010 37
Two more steps: Bottom-Up Space
flightflightflight
flightflight
flightflight
…….. …….. ……..
04/19/23 CPSC503 Winter 2010 38
Top-Down vs. Bottom-Up• Top-down
– Only searches for trees that can be answers
– But suggests trees that are not consistent with the words
• Bottom-up– Only forms trees consistent with the
words– Suggest trees that make no sense
globally
04/19/23 CPSC503 Winter 2010 39
So Combine Them
• Top-down: control strategy to generate trees
• Bottom-up: to filter out inappropriate parses
Top-down Control strategy:• Depth vs. Breadth first• Which node to try to expand next• Which grammar rule to use to expand a
node
(left-most)
(textual order)
04/19/23 CPSC503 Winter 2010 40
Top-Down, Depth-First, Left-to-Right Search
Sample sentence: “Does this flight include a meal?”
04/19/23 CPSC503 Winter 2010 44
Adding Bottom-up Filtering
The following sequence was a waste of time because an NP cannot generate a parse tree starting with an AUX
Aux Aux Aux Aux
04/19/23 CPSC503 Winter 2010 45
Bottom-Up FilteringCategory Left Corners
S Det, Proper-Noun, Aux, Verb
NP Det, Proper-Noun
Nominal Noun
VP Verb
Aux Aux Aux
04/19/23 CPSC503 Winter 2010 46
Problems with TD-BU-filtering
• Ambiguity• Repeated Parsing
• SOLUTION: Earley Algorithm (once again dynamic programming!)
04/19/23 CPSC503 Winter 2010 47
For Next Time
• Read Chapter 13 (Parsing)• Optional: Read Chapter 15 (Features and
Unification) – skip algorithms and implementation
04/19/23 CPSC503 Winter 2010 48
Grammars and Constituency• Of course, there’s nothing easy or obvious
about how we come up with right set of constituents and the rules that govern how they combine...
• That’s why there are so many different theories of grammar and competing analyses of the same data.
• The approach to grammar, and the analyses, adopted here are very generic (and don’t correspond to any modern linguistic theory of grammar).