parsing, and context-free grammarsmcollins/cs4705-spring2019/slides/... · 2019-01-17 · syntactic...


Parsing, and Context-Free Grammars

Michael Collins, Columbia University


Overview

- An introduction to the parsing problem

- Context-free grammars

- A brief(!) sketch of the syntax of English

- Examples of ambiguous structures


Parsing (Syntactic Structure)

INPUT: Boeing is located in Seattle.

OUTPUT:

(S (NP (N Boeing))
   (VP (V is)
       (VP (V located)
           (PP (P in)
               (NP (N Seattle))))))


Syntactic Formalisms

- Work in formal syntax goes back to Chomsky’s PhD thesis in the 1950s

- Examples of current formalisms: minimalism, lexical functional grammar (LFG), head-driven phrase-structure grammar (HPSG), tree adjoining grammars (TAG), categorial grammars


Data for Parsing Experiments

- Penn WSJ Treebank = 50,000 sentences with associated trees

- Usual set-up: 40,000 training sentences, 2400 test sentences

An example tree:

[Figure: Penn WSJ Treebank parse tree for the sentence below. Part-of-speech tags: Canadian/NNP Utilities/NNPS had/VBD 1988/CD revenue/NN of/IN C$/$ 1.16/CD billion/CD ,/PUNC, mainly/RB from/IN its/PRP$ natural/JJ gas/NN and/CC electric/JJ utility/NN businesses/NNS in/IN Alberta/NNP ,/PUNC, where/WRB the/DT company/NN serves/VBZ about/RB 800,000/CD customers/NNS ./PUNC. Phrase labels in the tree: NP, QP, PP, ADVP, WHADVP, VP, S, SBAR, TOP.]

Canadian Utilities had 1988 revenue of C$ 1.16 billion , mainly from its natural gas and electric utility businesses in Alberta , where the company serves about 800,000 customers .


The Information Conveyed by Parse Trees

(1) Part of speech for each word (N = noun, V = verb, DT = determiner)

(S (NP (DT the) (N burglar)) (VP (V robbed) (NP (DT the) (N apartment))))


The Information Conveyed by Parse Trees (continued)

(2) Phrases

(S (NP (DT the) (N burglar)) (VP (V robbed) (NP (DT the) (N apartment))))

Noun Phrases (NP): “the burglar”, “the apartment”

Verb Phrases (VP): “robbed the apartment”

Sentences (S): “the burglar robbed the apartment”


The Information Conveyed by Parse Trees (continued)

(3) Useful Relationships

(S (NP subject) (VP (V verb)))

(S (NP (DT the) (N burglar)) (VP (V robbed) (NP (DT the) (N apartment))))

⇒ “the burglar” is the subject of “robbed”
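The three kinds of information listed on these slides can be read off a tree mechanically. The following is a minimal sketch, assuming a nested-tuple encoding of trees (`(label, child, ...)`, with words as plain strings); the encoding and the helper names `words`, `pos_tags`, `phrases` and `subject_verb` are illustrative choices, not part of the lecture.

```python
# A tree is (label, child, child, ...); a leaf child is a plain string (a word).
TREE = ("S",
        ("NP", ("DT", "the"), ("N", "burglar")),
        ("VP", ("V", "robbed"),
               ("NP", ("DT", "the"), ("N", "apartment"))))


def words(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(words(child))
    return out


def pos_tags(tree):
    """(1) Part of speech for each word: nodes whose only child is a word."""
    if isinstance(tree, str):
        return []
    if len(tree) == 2 and isinstance(tree[1], str):
        return [(tree[1], tree[0])]
    out = []
    for child in tree[1:]:
        out.extend(pos_tags(child))
    return out


def phrases(tree, label):
    """(2) Phrases: the word string under every node with the given label."""
    if isinstance(tree, str):
        return []
    found = [" ".join(words(tree))] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(phrases(child, label))
    return found


def subject_verb(tree):
    """(3) Read (subject, verb) off an S -> NP VP node whose VP starts with V."""
    label, *children = tree
    if label == "S" and len(children) == 2:
        np_node, vp_node = children
        if np_node[0] == "NP" and vp_node[0] == "VP" and vp_node[1][0] == "V":
            return " ".join(words(np_node)), " ".join(words(vp_node[1]))
    return None


print(pos_tags(TREE))
print(phrases(TREE, "NP"), phrases(TREE, "VP"))
print(subject_verb(TREE))
```

The subject rule here only covers the simple S → NP VP configuration shown on the slide; real treebank trees would need a richer set of patterns.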


An Example Application: Machine Translation

- English word order is subject – verb – object

- Japanese word order is subject – object – verb

English: IBM bought Lotus

Japanese: IBM Lotus bought

English: Sources said that IBM bought Lotus yesterday

Japanese: Sources yesterday IBM Lotus bought that said


(S (NP-A Sources)
   (VP⇔ (SBAR-A⇔ (S (NP yesterday)
                    (NP-A IBM)
                    (VP⇔ (NP-A Lotus) (VB bought)))
                 (COMP that))
        (VB said)))
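One way to picture the reordering marked by ⇔ is as a reversal of a node's children. The sketch below is a toy illustration under stated assumptions: trees are nested tuples, the function `reorder` and the choice of which labels to swap are hypothetical, and the adverb "yesterday" is left out of the second example to keep the tree small.

```python
def words(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(words(child))
    return out


def reorder(tree, swap_labels):
    """Reverse the children of every node whose label is in swap_labels."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [reorder(child, swap_labels) for child in children]
    if label in swap_labels:
        children.reverse()
    return (label, *children)


# English: IBM bought Lotus
simple = ("S", ("NP", "IBM"), ("VP", ("VB", "bought"), ("NP", "Lotus")))

# English: Sources said that IBM bought Lotus
embedded = ("S", ("NP", "Sources"),
            ("VP", ("VB", "said"),
                   ("SBAR", ("COMP", "that"), simple)))

print(" ".join(words(reorder(simple, {"VP"}))))
print(" ".join(words(reorder(embedded, {"VP", "SBAR"}))))
```

Reversing only VP and SBAR moves each verb after its object and the complementizer after its clause, which matches the verb-final order in the slide's examples.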


Context-Free Grammars

Hopcroft and Ullman, 1979

A context-free grammar G = (N, Σ, R, S) where:

- N is a set of non-terminal symbols

- Σ is a set of terminal symbols

- R is a set of rules of the form X → Y1 Y2 ... Yn for n ≥ 0, X ∈ N, Yi ∈ (N ∪ Σ)

- S ∈ N is a distinguished start symbol


A Context-Free Grammar for English

N = {S, NP, VP, PP, DT, Vi, Vt, NN, IN}

S = S

Σ = {sleeps, saw, man, woman, telescope, the, with, in}

R =

S → NP VP

VP → Vi
VP → Vt NP
VP → VP PP

NP → DT NN
NP → NP PP

PP → IN NP

Vi → sleeps
Vt → saw

NN → man
NN → woman
NN → telescope

DT → the

IN → with
IN → in

Note: S = sentence, VP = verb phrase, NP = noun phrase, PP = prepositional phrase, DT = determiner, Vi = intransitive verb, Vt = transitive verb, NN = noun, IN = preposition
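The four-tuple definition maps directly onto plain data structures. Below is a minimal sketch of this grammar in Python, with a check that each rule has the form X → Y1 ... Yn with X ∈ N and every Yi ∈ N ∪ Σ. The names `NONTERMINALS`, `TERMINALS`, `RULES`, `START` and `is_well_formed` are illustrative, and the requirement that N and Σ be disjoint is an added assumption that the slides do not state.

```python
NONTERMINALS = {"S", "NP", "VP", "PP", "DT", "Vi", "Vt", "NN", "IN"}
TERMINALS = {"sleeps", "saw", "man", "woman", "telescope", "the", "with", "in"}
START = "S"

# Each rule is (left-hand side, tuple of right-hand-side symbols).
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("Vi",)),
    ("VP", ("Vt", "NP")),
    ("VP", ("VP", "PP")),
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("Vi", ("sleeps",)),
    ("Vt", ("saw",)),
    ("NN", ("man",)),
    ("NN", ("woman",)),
    ("NN", ("telescope",)),
    ("DT", ("the",)),
    ("IN", ("with",)),
    ("IN", ("in",)),
]


def is_well_formed(nonterminals, terminals, rules, start):
    """Check the conditions in the definition of G = (N, Sigma, R, S)."""
    if start not in nonterminals:
        return False
    if nonterminals & terminals:  # assumption: the two alphabets are disjoint
        return False
    symbols = nonterminals | terminals
    return all(
        lhs in nonterminals and all(y in symbols for y in rhs)
        for lhs, rhs in rules
    )


print(is_well_formed(NONTERMINALS, TERMINALS, RULES, START))
```

Note that the terminal `in` and the non-terminal `IN` are distinct symbols; the encoding relies on case to keep them apart.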


Left-Most Derivations

A left-most derivation is a sequence of strings s1 ... sn, where

- s1 = S, the start symbol

- sn ∈ Σ∗, i.e. sn is made up of terminal symbols only

- Each si for i = 2 ... n is derived from si−1 by picking the left-most non-terminal X in si−1 and replacing it by some β where X → β is a rule in R

For example: [S], [NP VP], [D N VP], [the N VP], [the man VP], [the man Vi], [the man sleeps]

Representation of a derivation as a tree:

(S (NP (D the) (N man)) (VP (Vi sleeps)))


An Example

DERIVATION          RULES USED
S                   S → NP VP
NP VP               NP → DT N
DT N VP             DT → the
the N VP            N → dog
the dog VP          VP → VB
the dog VB          VB → laughs
the dog laughs

(S (NP (DT the) (N dog)) (VP (VB laughs)))
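The derivation above can be replayed step by step: at each step, find the left-most non-terminal and replace it with the right-hand side of the next rule. This is a minimal sketch; the function name `leftmost_derivation` and the list-of-symbols representation are illustrative assumptions.

```python
def leftmost_derivation(start, rules, nonterminals):
    """Apply rules in order, each time rewriting the left-most non-terminal.

    Returns the list of strings s1 ... sn, each as a list of symbols.
    """
    current = [start]
    steps = [list(current)]
    for lhs, rhs in rules:
        # Index of the left-most non-terminal in the current string, if any.
        i = next((k for k, sym in enumerate(current) if sym in nonterminals), None)
        if i is None:
            raise ValueError("no non-terminal left to rewrite")
        if current[i] != lhs:
            raise ValueError(f"left-most non-terminal is {current[i]}, not {lhs}")
        current[i:i + 1] = rhs
        steps.append(list(current))
    return steps


NONTERMINALS = {"S", "NP", "VP", "DT", "N", "VB"}
RULES_USED = [
    ("S", ["NP", "VP"]),
    ("NP", ["DT", "N"]),
    ("DT", ["the"]),
    ("N", ["dog"]),
    ("VP", ["VB"]),
    ("VB", ["laughs"]),
]

for step in leftmost_derivation("S", RULES_USED, NONTERMINALS):
    print(" ".join(step))
```

Applying the same rules in a different order raises an error, because a left-most derivation fixes which non-terminal is rewritten at every step.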


Properties of CFGs

- A CFG defines a set of possible derivations

- A string s ∈ Σ∗ is in the language defined by the CFG if there is at least one derivation that yields s

- Each string in the language generated by the CFG may have more than one derivation (“ambiguity”)
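Both properties can be checked by counting parse trees: a string is in the language if the count is at least one, and it is ambiguous if the count exceeds one. The sketch below counts trees for the small English grammar given earlier (S, NP, VP, PP with the words sleeps, saw, man, woman, telescope, the, with, in) using memoised recursion over word spans. This chart-style technique is not introduced on these slides; the rule tables `BINARY`, `UNARY`, `LEXICON` and the function `count_trees` are assumptions made for illustration.

```python
from functools import lru_cache

BINARY = {  # rules X -> Y Z
    "S": [("NP", "VP")],
    "VP": [("Vt", "NP"), ("VP", "PP")],
    "NP": [("DT", "NN"), ("NP", "PP")],
    "PP": [("IN", "NP")],
}
UNARY = {"VP": ["Vi"]}  # rules X -> Y where Y is a non-terminal
LEXICON = {  # rules X -> word
    "Vi": {"sleeps"},
    "Vt": {"saw"},
    "NN": {"man", "woman", "telescope"},
    "DT": {"the"},
    "IN": {"with", "in"},
}


def count_trees(sentence, start="S"):
    """Count the parse trees rooted in `start` that yield the sentence."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(x, i, j):
        """Number of trees rooted in x that span words[i:j]."""
        total = 0
        if j - i == 1 and words[i] in LEXICON.get(x, ()):
            total += 1
        for y in UNARY.get(x, ()):
            total += count(y, i, j)
        for y, z in BINARY.get(x, ()):
            # Every split point is strictly inside the span, so spans shrink.
            for k in range(i + 1, j):
                total += count(y, i, k) * count(z, k, j)
        return total

    return count(start, 0, len(words))


for s in ["the man sleeps",
          "the man saw the woman with the telescope",
          "man the sleeps"]:
    print(s, "->", count_trees(s))
```

For this grammar each parse tree corresponds to exactly one left-most derivation, so counting trees is the same as counting left-most derivations. The second sentence has two trees because the PP can attach to the VP or to the NP.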


An Example of Ambiguity

(S (NP he)
   (VP (VP (VB drove)
           (PP (IN down) (NP (DT the) (NN street))))
       (PP (IN in) (NP (DT the) (NN car)))))


An Example of Ambiguity (continued)

(S (NP he)
   (VP (VB drove)
       (PP (IN down)
           (NP (NP (DT the) (NN street))
               (PP (IN in) (NP (DT the) (NN car)))))))


The Problem with Parsing: Ambiguity

INPUT: She announced a program to promote safety in trucks and vans

⇓

POSSIBLE OUTPUTS:

[Figure: six parse trees for the input sentence. They differ in where the PP “in trucks and vans” attaches and in whether “and vans” is coordinated with “trucks” or with a larger NP.]

And there are more...


A Comprehensive Grammar of the English Language

Randolph Quirk, Si…um, Geoffrey Leech, Jan Svartvik

Product Details (from Amazon)

Hardcover: 1779 pages
Publisher: Longman; 2nd Revised edition
Language: English
ISBN-10: 0582517346
ISBN-13: 978-0582517349
Product Dimensions: 8.4 x 2.4 x 10 inches
Shipping Weight: 4.6 pounds


A Brief Overview of English Syntax

Parts of Speech (tags from the Brown corpus):

- Nouns
  NN = singular noun, e.g., man, dog, park
  NNS = plural noun, e.g., telescopes, houses, buildings
  NNP = proper noun, e.g., Smith, Gates, IBM

- Determiners
  DT = determiner, e.g., the, a, some, every

- Adjectives
  JJ = adjective, e.g., red, green, large, idealistic


A Fragment of a Noun Phrase Grammar

N̄ ⇒ NN
N̄ ⇒ NN N̄
N̄ ⇒ JJ N̄
N̄ ⇒ N̄ N̄
NP ⇒ DT N̄

NN ⇒ box
NN ⇒ car
NN ⇒ mechanic
NN ⇒ pigeon

DT ⇒ the
DT ⇒ a

JJ ⇒ fast
JJ ⇒ metal
JJ ⇒ idealistic
JJ ⇒ clay


Prepositions, and Prepositional Phrases

- Prepositions
  IN = preposition, e.g., of, in, out, beside, as


An Extended Grammar

N̄ ⇒ NN
N̄ ⇒ NN N̄
N̄ ⇒ JJ N̄
N̄ ⇒ N̄ N̄
NP ⇒ DT N̄

PP ⇒ IN NP
N̄ ⇒ N̄ PP

NN ⇒ box
NN ⇒ car
NN ⇒ mechanic
NN ⇒ pigeon

DT ⇒ the
DT ⇒ a

JJ ⇒ fast
JJ ⇒ metal
JJ ⇒ idealistic
JJ ⇒ clay

IN ⇒ in
IN ⇒ under
IN ⇒ of
IN ⇒ on
IN ⇒ with
IN ⇒ as

Generates: in a box, under the box, the fast car mechanic under the pigeon in the box, . . .
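To get a feel for what this grammar generates, one can sample from it by expanding non-terminals at random. The sketch below is an illustration only: `Nbar` stands in for N̄, the function `generate` and its `max_depth` cut-off are hypothetical, and the uniform random choice among rules is an arbitrary assumption (the grammar itself assigns no probabilities).

```python
import random

# The first rule listed for each non-terminal leads to a finite expansion,
# which the depth cut-off below relies on.
RULES = {
    "NP": [["DT", "Nbar"]],
    "Nbar": [["NN"], ["NN", "Nbar"], ["JJ", "Nbar"],
             ["Nbar", "Nbar"], ["Nbar", "PP"]],
    "PP": [["IN", "NP"]],
    "NN": [["box"], ["car"], ["mechanic"], ["pigeon"]],
    "DT": [["the"], ["a"]],
    "JJ": [["fast"], ["metal"], ["idealistic"], ["clay"]],
    "IN": [["in"], ["under"], ["of"], ["on"], ["with"], ["as"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a list of words by choosing rules at random."""
    if symbol not in RULES:
        return [symbol]  # a terminal: emit the word itself
    options = RULES[symbol]
    if depth >= max_depth:
        # Past the depth budget, always take the first rule so that the
        # recursive Nbar rules cannot keep the expansion going forever.
        options = options[:1]
    out = []
    for sym in rng.choice(options):
        out.extend(generate(sym, rng, depth + 1, max_depth))
    return out


rng = random.Random(0)  # fixed seed so that runs are repeatable
for _ in range(5):
    print(" ".join(generate("NP", rng)))
    print(" ".join(generate("PP", rng)))
```

Without the depth cut-off, rules such as N̄ ⇒ N̄ N̄ could recurse without bound, which is also why the set of strings the grammar generates is infinite.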


Verbs, Verb Phrases, and Sentences

- Basic Verb Types
  Vi = Intransitive verb, e.g., sleeps, walks, laughs
  Vt = Transitive verb, e.g., sees, saw, likes
  Vd = Ditransitive verb, e.g., gave

- Basic VP Rules
  VP → Vi
  VP → Vt NP
  VP → Vd NP NP

- Basic S Rule
  S → NP VP

Examples of VP:
sleeps, walks, likes the mechanic, gave the mechanic the fast car

Examples of S:
the man sleeps, the dog walks, the dog gave the mechanic the fast car


PPs Modifying Verb Phrases

A new rule: VP → VP PP

New examples of VP:
sleeps in the car, walks like the mechanic, gave the mechanic the fast car on Tuesday, . . .


Complementizers, and SBARs

- Complementizers
  COMP = complementizer, e.g., that

- SBAR
  SBAR → COMP S

Examples:
that the man sleeps, that the mechanic saw the dog . . .


More Verbs

- New Verb Types
  V[5], e.g., said, reported
  V[6], e.g., told, informed
  V[7], e.g., bet

- New VP Rules
  VP → V[5] SBAR
  VP → V[6] NP SBAR
  VP → V[7] NP NP SBAR

Examples of New VPs:

said that the man sleeps
told the dog that the mechanic likes the pigeon
bet the pigeon $50 that the mechanic owns a fast car


Coordination

- A New Part-of-Speech:
  CC = Coordinator, e.g., and, or, but

- New Rules
  NP → NP CC NP
  N̄ → N̄ CC N̄
  VP → VP CC VP
  S → S CC S
  SBAR → SBAR CC SBAR


We’ve Only Scratched the Surface...

- Agreement
  The dogs laugh vs. The dog laughs

- Wh-movement
  The dog that the cat liked

- Active vs. passive
  The dog saw the cat vs. The cat was seen by the dog

- If you’re interested in reading more:
  Syntactic Theory: A Formal Introduction, 2nd Edition. Ivan A. Sag, Thomas Wasow, and Emily M. Bender.


Sources of Ambiguity

- Part-of-Speech ambiguity
  NN → duck
  Vi → duck

(VP (VP (Vt saw) (NP (PRP her) (NN duck)))
    (PP (IN with) (NP the telescope)))

(VP (VP (V saw) (S (NP her) (VP (Vi duck))))
    (PP (IN with) (NP the telescope)))


(S (NP I)
   (VP (VP (Vi drove)
           (PP (IN down) (NP (DT the) (NN road))))
       (PP (IN in) (NP (DT the) (NN car)))))


(S (NP I)
   (VP (Vi drove)
       (PP (IN down)
           (NP (NP (DT the) (NN road))
               (PP (IN in) (NP (DT the) (NN car)))))))


Two analyses for: John was believed to have been shot by Bill


Sources of Ambiguity: Noun Premodifiers

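- Noun premodifiers:

[Figure: two parse trees for the NP “the fast car mechanic” (D the, JJ fast, NN car, NN mechanic). In one, “fast” modifies “car mechanic”; in the other, “fast car” modifies “mechanic”.]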