CS395T: Structured Models for NLP, Lecture 7: Parsing I



Page 1: CS395T: Structured Models for NLP, Lecture 7 (gdurrett/courses/fa2017/lectures/lec7-parsing-4pp.pdf)

CS395T: Structured Models for NLP, Lecture 7: Parsing I

Greg Durrett, adapted from Dan Klein (UC Berkeley)

Administrivia

Project 1 due one week from today!

Recall: EM for HMMs

Maximize a lower bound on the log marginal likelihood:

log Σ_y P(x, y | θ) ≥ E_{q(y)}[ log P(x, y | θ) ] + Entropy[q(y)]

EM: alternating maximization

E-step: maximize w.r.t. q:  q_t = P(y | x, θ_{t-1})

M-step: maximize w.r.t. θ:  θ_t = argmax_θ E_{q_t}[ log P(x, y | θ) ]

Supervised learning from fractional annotation
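The bound above can be checked numerically. A minimal sketch, with a made-up toy joint distribution (the numbers are illustrative, not from the lecture): any q gives a lower bound, and the E-step choice q(y) = P(y | x) makes the bound tight.

```python
import math

# Toy joint P(x, y) over three latent values y for one fixed observation x
# (illustrative numbers, assumed for this sketch).
p_xy = {"a": 0.10, "b": 0.06, "c": 0.04}

log_marginal = math.log(sum(p_xy.values()))  # log sum_y P(x, y)

def lower_bound(q):
    """E_q[log P(x, y)] + Entropy[q] for a distribution q over y."""
    expected = sum(q[y] * math.log(p_xy[y]) for y in p_xy)
    entropy = -sum(q[y] * math.log(q[y]) for y in p_xy if q[y] > 0)
    return expected + entropy

# Any q lower-bounds the log marginal likelihood...
uniform = {y: 1 / 3 for y in p_xy}
assert lower_bound(uniform) <= log_marginal + 1e-12

# ...and q(y) = P(y | x) (the E-step) makes the bound tight.
z = sum(p_xy.values())
posterior = {y: p / z for y, p in p_xy.items()}
assert abs(lower_bound(posterior) - log_marginal) < 1e-9
```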

Road Map

§ Done: sequences – generative, discriminative, supervised, unsupervised
§ Now: trees (parsing) – a little more linguistics...
§ This week: constituency – lots of generative models
§ Next week: dependency (Project 2) – more discriminative models

Page 2

This Lecture

§ Constituency formalism
§ (Probabilistic) Context-Free Grammars
§ CKY
§ Refining grammars
§ Next time: finish constituency + writing tips

Syntax

Parse Trees

The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market

Phrase Structure Parsing

§ Phrase structure parsing organizes syntax into constituents or brackets
§ In general, this involves nested trees
§ Linguists can, and do, argue about details
§ Lots of ambiguity
§ Dependency (next week) makes more sense for some languages

Page 3

Constituency Tests

§ How do we know what nodes go in the tree?
§ Classic constituency tests:
  § Substitution by proform
  § Clefting (It was with a spoon...)
  § Answer ellipsis (What did you eat?)
  § Coordination
§ Cross-linguistic arguments, too

Conflicting Tests

§ Constituency isn't always clear
§ Phonological reduction:
  § I will go → I'll go
  § I want to go → I wanna go
  § à le centre → au centre
§ Coordination:
  § He went to and came from the store.

La vélocité des ondes sismiques ('the velocity of seismic waves')

Classical NLP: Parsing

§ Write symbolic or logical rules:
§ Use deduction systems to prove parses from words
  § Minimal grammar on "Fed raises" sentence: 36 parses
  § Simple 10-rule grammar: 592 parses
  § Real-size grammar: many millions of parses
§ This scaled very badly and didn't yield broad-coverage tools

Grammar (CFG):
ROOT → S
S → NP VP
NP → DT NN
NP → NN NNS
NP → NP PP
VP → VBP NP
VP → VBP NP PP
PP → IN NP

Lexicon:
NN → interest
NNS → raises
VBP → interest
VBZ → raises

Ambiguities

Page 4

Ambiguities: PP Attachment

§ I cleaned the dishes in my pajamas
§ I cleaned the dishes in the sink

Syntactic Ambiguities I

§ Prepositional phrases: They cooked the beans in the pot on the stove with handles.
§ Particle vs. preposition: The puppy tore up the staircase.
§ Complement structures: The tourists objected to the guide that they couldn't hear. / She knows you like the back of her hand.
§ Gerund vs. participial adjective: Visiting relatives can be boring. / Changing schedules frequently confused passengers.

Syntactic Ambiguities II

§ Modifier scope within NPs: impractical design requirements / plastic cup holder
§ Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

Page 5

Dark Ambiguities

§ Dark ambiguities: most analyses are shockingly bad (meaning they don't have an interpretation you can get your mind around)
§ Unknown words and new usages
§ Solution: we need probabilistic techniques to handle this uncertainty

This analysis corresponds to the correct parse of "This will panic buyers!"

PCFGs

Probabilistic Context-Free Grammars

§ A context-free grammar is a tuple <N, T, S, R>
§ N: the set of non-terminals
  § Phrasal categories: S, NP, VP, ADJP, etc.
  § Parts of speech (pre-terminals): NN, JJ, DT, VB
§ T: the set of terminals (the words)
§ S: the start symbol
  § Often written as ROOT or TOP (not S, since not all "sentences" are sentences)
§ R: the set of rules
  § Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
  § Examples: S → NP VP, VP → VP CC VP
  § Also called rewrites or productions
§ A PCFG adds:
  § A top-down production probability per rule, P(Y1 Y2 … Yk | X)
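One way to hold a PCFG in code is a map from each left-hand side to its expansions with probabilities; since P(Y1 … Yk | X) is a conditional distribution, the expansions of each X must sum to 1. A minimal sketch (names and toy rules are illustrative, not the lecture's code):

```python
# A PCFG fragment: lhs -> list of (rhs tuple, probability) pairs.
rules = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.5), (("NP", "PP"), 0.5)],
    "VP": [(("VBP", "NP"), 0.7), (("VBP", "NP", "PP"), 0.3)],
}

def is_normalized(rules, tol=1e-9):
    """Check that each left-hand side's expansion probabilities sum to 1."""
    return all(abs(sum(p for _, p in exps) - 1.0) < tol
               for exps in rules.values())

assert is_normalized(rules)
```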

Treebank Grammars

§ Need a PCFG for broad-coverage parsing.
§ Can take a grammar right off the trees (doesn't work well):

ROOT → S        1
S → NP VP .     1
NP → PRP        1
VP → VBD ADJP   1
…

§ Maximum-likelihood estimate: get P(NP → PRP | NP) by counting + normalizing
§ Better results by enriching the grammar (lexicalization, other techniques)
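The counting-and-normalizing step can be sketched directly: P(X → rhs) = count(X → rhs) / count(X). A minimal sketch with made-up rule observations shaped like the example above (the data is illustrative):

```python
from collections import Counter, defaultdict

# Rules read off trees, one (lhs, rhs) pair per local tree (toy data).
observed = [
    ("ROOT", ("S",)),
    ("S", ("NP", "VP", ".")),
    ("NP", ("PRP",)),
    ("NP", ("DT", "NN")),
    ("NP", ("PRP",)),
    ("VP", ("VBD", "ADJP")),
]

def mle_pcfg(observed):
    """Maximum-likelihood rule probabilities: count, then normalize per lhs."""
    rule_counts = Counter(observed)
    lhs_counts = Counter(lhs for lhs, _ in observed)
    probs = defaultdict(dict)
    for (lhs, rhs), c in rule_counts.items():
        probs[lhs][rhs] = c / lhs_counts[lhs]
    return probs

probs = mle_pcfg(observed)
assert probs["NP"][("PRP",)] == 2 / 3  # NP seen 3 times, NP -> PRP twice
```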

Page 6

Chomsky Normal Form

§ Chomsky normal form:
  § All rules of the form X → Y Z or X → w
§ In principle, this is no limitation on the space of (P)CFGs
  § N-ary rules introduce new non-terminals
  § Unaries/empties are "promoted"
§ NOT equivalent to this:
§ In practice: binarize the grammar, keep unaries

(Figure: VP → VBD NP PP PP binarized using intermediate dotted symbols such as [VP → VBD NP •] and [VP → VBD NP PP •].)
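Left-to-right binarization of an n-ary rule can be sketched as follows; the intermediate-symbol naming mimics the dotted symbols in the figure, but the exact convention is my choice for illustration:

```python
def binarize(lhs, rhs):
    """Binarize an n-ary rule left-to-right, introducing intermediate symbols."""
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules = []
    current = lhs
    for i in range(len(rhs) - 2):
        # Intermediate symbol records what has been generated so far.
        new_sym = f"[{lhs} -> {' '.join(rhs[:i + 1])} .]"
        rules.append((current, (rhs[i], new_sym)))
        current = new_sym
    rules.append((current, (rhs[-2], rhs[-1])))
    return rules

out = binarize("VP", ["VBD", "NP", "PP", "PP"])
assert len(out) == 3
assert out[0] == ("VP", ("VBD", "[VP -> VBD .]"))
```

Reversing the renaming afterwards recovers the original n-ary tree, which is why binarization loses nothing in principle.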

CKY Parsing

A Recursive Parser

bestScore(X, i, j, s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max over k and rules X → Y Z of
      score(X → Y Z) * bestScore(Y, i, k, s) * bestScore(Z, k, j, s)

§ max over k and rule being applied
§ Will this parser work?
§ Can also organize things bottom-up
§ Note: not every tag/nonterminal can be built over every span!

A Bottom-Up Parser (CKY)

bestScore(s)
  for (i : [0, n-1])
    for (X : tags[s[i]])
      score[X][i][i+1] = tagScore(X, s[i])
  for (diff : [2, n])
    for (i : [0, n-diff])
      j = i + diff
      for (X → Y Z : rules)
        for (k : [i+1, j-1])
          score[X][i][j] = max(score[X][i][j],
                               score(X → Y Z) * score[Y][i][k] * score[Z][k][j])

(Diagram: X over span [i, j] built from Y over [i, k] and Z over [k, j].)
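The bottom-up pseudocode can be turned into runnable Python. A minimal sketch for a binarized PCFG with no unaries (the toy grammar and function names are assumptions for illustration):

```python
from collections import defaultdict

def cky(sent, tag_scores, rules):
    """Viterbi CKY: best score per (nonterminal, span).
    tag_scores: word -> {tag: score}; rules: list of (X, Y, Z, rule_score)."""
    n = len(sent)
    score = defaultdict(float)  # (X, i, j) -> best score, default 0.0
    for i, w in enumerate(sent):
        for tag, s in tag_scores[w].items():
            score[(tag, i, i + 1)] = s
    for diff in range(2, n + 1):          # span length
        for i in range(n - diff + 1):
            j = i + diff
            for X, Y, Z, rs in rules:
                for k in range(i + 1, j):  # split point
                    cand = rs * score[(Y, i, k)] * score[(Z, k, j)]
                    if cand > score[(X, i, j)]:
                        score[(X, i, j)] = cand
    return score

# Toy grammar, assumed for illustration.
tag_scores = {"dogs": {"NP": 1.0}, "chase": {"V": 1.0}, "cats": {"NP": 1.0}}
rules = [("VP", "V", "NP", 0.5), ("S", "NP", "VP", 1.0)]
chart = cky(["dogs", "chase", "cats"], tag_scores, rules)
assert chart[("S", 0, 3)] == 0.5
```

Keeping backpointers alongside the scores (the rule and split point that achieved each max) would let us reconstruct the best tree, not just its score.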

Page 7

Unary Rules

§ Unary rules?
§ Problem: the dynamic program is self-referential!

bestScore(X, i, j, s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max of
      max over k, X → Y Z: score(X → Y Z) * bestScore(Y, i, k, s) * bestScore(Z, k, j, s)
      max over X → Y:      score(X → Y) * bestScore(Y, i, j, s)

Unary Closure

§ We need unaries to be non-cyclic
§ Can address by pre-calculating the unary closure
§ Rather than having zero or more unaries, always have exactly one
§ Alternate unary and binary layers
§ Reconstruct unary chains afterwards

(Figure: trees such as NP → DT NN and VP → VBD NP, with a unary chain through S and SBAR collapsed to a single VP → SBAR step.)

Alternating Layers

bestScoreU(X, i, j, s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max over X → Y: score(X → Y) * bestScoreB(Y, i, j)

bestScoreB(X, i, j, s)
  return max over k, X → Y Z: score(X → Y Z) * bestScoreU(Y, i, k) * bestScoreU(Z, k, j)
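Pre-computing the unary closure means finding, for every pair (X, Y), the best-scoring chain of unary rewrites X →* Y. A minimal sketch via repeated relaxation (this converges because chain scores are products of probabilities ≤ 1, so cycles cannot keep improving entries; names are illustrative):

```python
def unary_closure(unary_rules, symbols):
    """unary_rules: {(X, Y): P(Y | X)}. Returns {(X, Y): best unary-chain score},
    including the empty chain (X, X) with score 1."""
    closure = {(s, s): 1.0 for s in symbols}
    closure.update(unary_rules)
    changed = True
    while changed:
        changed = False
        # Try extending every known chain X ->* Y by one rule Y -> Z.
        for (x, y), p in list(closure.items()):
            for (y2, z), q in unary_rules.items():
                if y2 == y and closure.get((x, z), 0.0) < p * q:
                    closure[(x, z)] = p * q
                    changed = True
    return closure

uc = unary_closure({("S", "VP"): 0.1, ("VP", "VB"): 0.5}, {"S", "VP", "VB"})
assert abs(uc[("S", "VB")] - 0.05) < 1e-12  # chain S -> VP -> VB
```

With the closure in hand, the unary layer above can apply "exactly one" closed unary step, and the intermediate chain (here S → VP → VB) is reconstructed afterwards.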

Analysis

Page 8


Time: Theory

§ How much time will it take to parse?
  § For each diff (≤ n)
    § For each i (≤ n)
      § For each rule X → Y Z
        § For each split point k: do constant work
§ Total time: |rules| · n³
§ A simple grammar takes 0.1 sec to parse a 20-word sentence; bigger grammars can take 10+ seconds unoptimized

(Diagram: X over span [i, j] built from Y over [i, k] and Z over [k, j].)

Time: Practice

§ Parsing with the vanilla treebank grammar (~20K rules; not an optimized parser!)
§ Observed exponent: 3.6
§ Why's it worse in practice?
  § Longer sentences "unlock" more of the grammar
  § All kinds of systems issues don't scale

Same-Span Reachability

(Figure: which nonterminals are reachable over the same spans; the large group is ADJP, ADVP, FRAG, INTJ, NP, PP, PRN, QP, S, SBAR, UCP, VP, WHNP, with TOP, LST, CONJP, WHADJP, WHADVP, WHPP, NX, NAC, SBARQ, SINV, RRC, SQ, X, and PRT shown separately.)

Page 9

Efficient CKY

§ Lots of tricks to make CKY efficient
§ Some of them are little engineering details:
  § E.g., first choose k, then enumerate through the Y : [i, k] which are non-zero, then loop through rules by left child.
  § The optimal layout of the dynamic program depends on the grammar
§ Some are algorithmic improvements:
  § Pruning: rule out chunks of the chart based on a simpler model

Learning PCFGs

Typical Experimental Setup

§ Corpus: Penn Treebank, WSJ
  § Training: sections 02-21
  § Development: section 22 (here, first 20 files)
  § Test: section 23
§ Accuracy: F1, the harmonic mean of per-node labeled precision and recall.
§ Here, also size: the number of symbols in the grammar.
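Labeled bracket F1 can be sketched by treating each parse as a multiset of (label, start, end) spans and comparing predicted spans against gold spans (a minimal sketch; real scorers like EVALB add conventions this omits):

```python
from collections import Counter

def bracket_f1(gold_spans, pred_spans):
    """F1 over labeled (label, start, end) brackets, as multisets."""
    gold, pred = Counter(gold_spans), Counter(pred_spans)
    matched = sum((gold & pred).values())  # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)]
pred = [("S", 0, 3), ("NP", 0, 2), ("VP", 1, 3)]
assert abs(bracket_f1(gold, pred) - 2 / 3) < 1e-9  # P = R = 2/3
```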

Treebank PCFGs

§ Use PCFGs for broad-coverage parsing
§ Can take a grammar right off the trees (doesn't work well):

ROOT → S        1
S → NP VP .     1
NP → PRP        1
VP → VBD ADJP   1
…

Model      F1
Baseline   72.0   [Charniak 96]

Page 10

Conditional Independence?

§ Not every NP expansion can fill every NP slot
§ A grammar with symbols like "NP" won't be context-free
§ Statistically, conditional independence is too strong

Non-Independence

§ Independence assumptions are often too strong.
§ Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).
§ Also: the subject and object expansions are correlated!

(Chart: expansion frequencies for NP → PP / DT NN / PRP. All NPs: 11% / 9% / 6%. NPs under S: 9% / 9% / 21%. NPs under VP: 23% / 7% / 4%.)

Grammar Refinement

§ Structure annotation [Johnson '98, Klein & Manning '03]
§ Lexicalization [Collins '99, Charniak '00]
§ Latent variables [Matsuzaki et al. '05, Petrov et al. '06]

Structural Annotation

Page 11

The Game of Designing a Grammar

§ Annotation refines base treebank symbols to improve the statistical fit of the grammar
  § Structural annotation

Vertical Markovization

§ Vertical Markov order: rewrites depend on the past k ancestor nodes (cf. parent annotation)

(Charts: F1 climbs from roughly 72% at order 1 to roughly 79% at order 3, while the number of grammar symbols grows from about 5K to about 25K, across orders 1, 2v, 2, 3v, 3.)

Klein and Manning (2003)
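Order-2 vertical Markovization is parent annotation: each nonterminal is rewritten to record its parent. A minimal sketch on tuple-encoded trees (the tree encoding and "X^Parent" naming are my choices for illustration):

```python
def annotate_parents(tree, parent="ROOT"):
    """Annotate each nonterminal with its parent label (vertical order 2).
    Trees are (label, child, child, ...) tuples; leaves are word strings."""
    if isinstance(tree, str):  # a word: leave unchanged
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}"
    return (new_label, *[annotate_parents(c, label) for c in children])

t = ("S", ("NP", "he"), ("VP", ("VBD", "ran")))
out = annotate_parents(t)
assert out == ("S^ROOT", ("NP^S", "he"), ("VP^S", ("VBD^VP", "ran")))
```

Reading rules off the annotated trees then yields distinct statistics for, e.g., NP^S (subjects) vs. NP^VP (objects), which is exactly the non-independence observed earlier.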

Horizontal Markovization

(Charts: F1 stays between roughly 70% and 74% across horizontal orders 0, 1, 2v, 2, ∞, while the number of symbols grows from about 3K to about 12K.)

Klein and Manning (2003)

Tag Splits

§ Problem: Treebank tags are too coarse.
§ Example: sentential, PP, and other prepositions are all marked IN.
§ Partial solution: subdivide the IN tag.

Annotation   F1     Size
Previous     78.3   8.0K
SPLIT-IN     80.3   8.1K

Klein and Manning (2003)

Page 12

A Fully Annotated (Unlex) Tree

Some Test Set Results

§ Beats "first generation" lexicalized parsers.
§ Baseline: ~72

Parser        LP     LR     F1     CB     0 CB
Magerman 95   84.9   84.6   84.7   1.26   56.6
Collins 96    86.3   85.8   86.0   1.14   59.9
K+M 2003      86.9   85.7   86.3   1.10   60.3
Charniak 97   87.4   87.5   87.4   1.00   62.1
Collins 99    88.7   88.6   88.6   0.90   67.1

Klein and Manning (2003)

Lexicalization

The Game of Designing a Grammar

§ Annotation refines base treebank symbols to improve the statistical fit of the grammar
  § Structural annotation [Johnson '98, Klein and Manning '03]
  § Head lexicalization [Collins '99, Charniak '00]

Page 13

Problems with PCFGs

§ If we do no annotation, these trees differ only in one rule:
  § VP → VP PP
  § NP → NP PP
§ Parse will go one way or the other, regardless of words
§ Lexicalization allows us to be sensitive to specific words

Problems with PCFGs

§ What's different between basic PCFG scores here?
§ What (lexical) correlations need to be scored?

Lexicalized Trees

§ Add "head words" to each phrasal node
  § Syntactic vs. semantic heads
  § Headship not in (most) treebanks
  § Usually use head rules, e.g.:
    § NP:
      § Take leftmost NP
      § Take rightmost N*
      § Take rightmost JJ
      § Take right child
    § VP:
      § Take leftmost VB*
      § Take leftmost VP
      § Take left child

Lexicalized PCFGs?

§ Problem: we now have to estimate the probabilities of fully lexicalized rules
§ Never going to get these atomically off of a treebank
§ Solution: break up the derivation into smaller steps

(Figure: the derivation broken into four steps.)

Page 14

Lexical Derivation Steps

§ A derivation of a local tree [Collins 99]
§ Choose a head tag and word: P(child symbol | parent, head word)
§ Generate children from the head sequentially: P(child tag, child head | parent, head symbol, head word)
§ Finish generating the children; each new one conditions on the previous ones

Lexicalized CKY

§ How big is the state space?
§ Nonterminals = number of symbols × vocab size: way too large!
§ Can't use standard CKY

Lexicalized CKY

§ Track the index h of the head in the dynamic program: O(n⁵)
§ Better algorithms next lecture!

bestScore(X, i, j, h, s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max of
      max over k, h', X → Y Z: score(X[h] → Y[h] Z[h']) * bestScore(Y, i, k, h, s) * bestScore(Z, k, j, h', s)
      max over k, h', X → Y Z: score(X[h] → Y[h'] Z[h]) * bestScore(Y, i, k, h', s) * bestScore(Z, k, j, h, s)

(Diagram: X[h] over span [i, j] built from Y[h] over [i, k] and Z[h'] over [k, j].)

Results

§ Some results:
  § Collins 99: 88.6 F1 (generative lexical)
  § Charniak and Johnson 05: 89.7 / 91.3 F1 (generative lexical / reranked)
  § McClosky et al. 06: 92.1 F1 (generative + reranking + self-training)
§ 92.1 was SOTA for around 8 years!

Page 15

Latent Variable PCFGs

The Game of Designing a Grammar

§ Annotation refines base treebank symbols to improve the statistical fit of the grammar
  § Parent annotation [Johnson '98]
  § Head lexicalization [Collins '99, Charniak '00]
  § Automatic clustering?

Latent Variable Grammars

(Figure: a parse tree and sentence map to a set of derivations under the grammar parameters.)

Learning Latent Annotations

EM algorithm:
§ Brackets are known
§ Base categories are known
§ Only induce subcategories

Just like forward-backward for HMMs: learn label refinements; the base labels are known!

(Figure: a tree with latent symbols X1…X7 over "He was right.", with forward and backward passes.)

Page 16

Refinement of the DT tag

(Figure: DT hierarchically refined into DT-1, DT-2, DT-3, DT-4.)

Hierarchical refinement

Hierarchical Estimation Results

(Chart: parsing accuracy (F1) rises from about 74 toward 90 as the total number of grammar symbols grows from 100 to 1700.)

Model                   F1
Flat Training           87.3
Hierarchical Training   88.4

Refinement of the , (comma) tag

§ Splitting all categories equally is wasteful:

Page 17

Adaptive Splitting

§ Want to split complex categories more
§ Idea: split everything, then roll back the splits that were least useful

Adaptive Splitting Results

Model              F1
Previous           88.4
With 50% Merging   89.5

Number of Phrasal Subcategories
(Chart: learned subcategory counts per phrasal category, highest for NP, VP, and PP and near zero for categories like ROOT and LST; categories shown include NP, VP, PP, ADVP, S, ADJP, SBARQ, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBAR, RRC, WHADJP, X, ROOT, LST.)

Number of Lexical Subcategories
(Chart: learned subcategory counts per POS tag, highest for tags like NNP, JJ, NNS, and NN and near zero for SYM, RP, LS, and #.)

Page 18

Learned Splits

§ Proper nouns (NNP):
NNP-14: Oct. Nov. Sept.
NNP-12: John Robert James
NNP-2:  J. E. L.
NNP-1:  Bush Noriega Peters
NNP-15: New San Wall
NNP-3:  York Francisco Street

§ Personal pronouns (PRP):
PRP-0: It He I
PRP-1: it he they
PRP-2: it them him

§ Cardinal numbers (CD):
CD-7:  one two Three
CD-4:  1989 1990 1988
CD-11: million billion trillion
CD-0:  1 50 100
CD-3:  1 30 31
CD-9:  78 58 34

Hierarchical Pruning

coarse:         … QP NP VP …
split in two:   … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:  … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight: … (and so on)

Page 19

Bracket Posteriors

Speedup

§ 100x speedup if you use the full coarse-to-fine hierarchy vs. just parsing with the finest grammar
§ Parse the test set in 15 minutes: 2 sentences/second

Final Results (Accuracy)

Lang   Parser                                ≤ 40 words F1   all F1
ENG    Charniak & Johnson '05 (generative)   90.1            89.6
ENG    Split / Merge                         90.6            90.1
GER    Dubey '05                             76.3            -
GER    Split / Merge                         80.8            80.1
CHN    Chiang et al. '02                     80.0            76.6
CHN    Split / Merge                         86.3            83.4

Ensemble of split-merge parsers = best cross-lingual parser for ~7 years!