A Survey of Unsupervised Grammar Induction
Baskaran Sankaran
Senior Supervisor: Dr Anoop Sarkar
School of Computing Science, Simon Fraser University


Page 1: A Survey of  Unsupervised Grammar Induction

A Survey of Unsupervised Grammar Induction
Baskaran Sankaran

Senior Supervisor: Dr Anoop Sarkar

School of Computing Science, Simon Fraser University

Page 2: A Survey of  Unsupervised Grammar Induction

2

Motivation

Languages have hidden regularities:
◦ karuppu naay puunaiyai thurathiyathu ("the black dog chased the cat")
◦ iruttil karuppu uruvam marainthathu ("in the dark, a black shape disappeared")
◦ naay thurathiya puunai vekamaaka ootiyathu ("the cat chased by the dog ran fast")


Page 4: A Survey of  Unsupervised Grammar Induction

4

FORMAL STRUCTURES

Page 5: A Survey of  Unsupervised Grammar Induction

5

Phrase-Structure

Sometimes the bribed became partners in the company

Page 6: A Survey of  Unsupervised Grammar Induction

6

Phrase-Structure

Binarize, CNF

• Sparsity issue with words
• Use POS tags

[Figure: the binarized phrase-structure tree for the sentence, over the POS sequence RB DT VBN VBD NNS IN DT NN]

Binarized grammar:
S → ADVP @S
@S → NP VP
VP → VBD @VP
@VP → NP PP
NP → DT VBN
NP → DT NN
NP → NNS
PP → IN NP
ADVP → RB
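The "Binarize, CNF" step, which introduces the @X intermediate symbols shown above, can be sketched as follows (a minimal illustration; the function name is my own, not from the survey):

```python
# Right-binarize PCFG rules with more than two RHS symbols,
# introducing an @LHS intermediate symbol as on the slide.

def binarize(rules):
    """rules: list of (lhs, rhs_list). Returns CNF-style binary rules."""
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new = "@" + lhs.lstrip("@")   # e.g. S -> @S, VP -> @VP
            out.append((lhs, [rhs[0], new]))
            lhs, rhs = new, rhs[1:]
        out.append((lhs, list(rhs)))
    return out

rules = [("S", ["ADVP", "NP", "VP"]), ("VP", ["VBD", "NP", "PP"])]
print(binarize(rules))
```

Running this on the two ternary rules above yields exactly the S → ADVP @S, @S → NP VP, VP → VBD @VP, @VP → NP PP decomposition of the slide.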

Page 7: A Survey of  Unsupervised Grammar Induction

7

Evaluation Metric-1

Unsupervised induction:
◦ Binarized output tree, possibly unlabelled

Evaluation:
◦ Gold treebank parse
◦ Recall: % of true constituents found
◦ Also precision and F-score
◦ Wall Street Journal (WSJ) dataset

[Figure: induced unlabelled binary tree, with all non-terminals labelled X, over the POS sequence RB DT VBN VBD NNS IN DT NN]
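Unlabelled bracket scores can be computed directly from constituent spans; the sketch below (not the official EVALB tool, and the example spans are invented) shows the recall/precision/F-score computation described above:

```python
# Unlabelled constituent scoring: compare the set of (start, end) spans
# in the induced tree against the gold treebank spans.

def bracket_scores(gold_spans, pred_spans):
    gold, pred = set(gold_spans), set(pred_spans)
    hits = len(gold & pred)                     # true constituents found
    recall = hits / len(gold)
    precision = hits / len(pred)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [(0, 8), (1, 3), (3, 8), (4, 8), (6, 8)]   # hypothetical gold spans
pred = [(0, 8), (1, 3), (4, 8), (5, 8)]           # hypothetical induced spans
p, r, f = bracket_scores(gold, pred)
print(p, r)   # 0.75 0.6
```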

Page 8: A Survey of  Unsupervised Grammar Induction

8

Dependency Structure

[Figure: dependency tree for "Sometimes the bribed became partners in the company" in head-outward form, with head-marked categories VBD*, VBN*, NNS*, IN*, NN*]

Page 9: A Survey of  Unsupervised Grammar Induction

9

Dependency Structure

POS: RB DT VBN VBD NNS IN DT NN

Sometimes the bribed became partners in the company

Page 10: A Survey of  Unsupervised Grammar Induction

10

Evaluation Metric-2

Unsupervised induction:
◦ Generates directed dependency arcs

Evaluation:
◦ Compute (directed) attachment accuracy
◦ Gold dependencies
◦ WSJ10 dataset

POS: RB DT VBN VBD NNS IN DT NN

Sometimes the bribed became partners in the company
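Directed attachment accuracy is a per-token comparison against the gold head of each word; a minimal sketch, using the example sentence's gold heads and an invented predicted tree:

```python
# Directed attachment accuracy: fraction of tokens whose predicted head
# index matches the gold head index (-1 marks the root).

def attachment_accuracy(gold_heads, pred_heads):
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# "Sometimes the bribed became partners in the company"
# gold: became is the root; "in" attaches to "partners"
gold = [3, 2, 3, -1, 3, 4, 7, 5]
# hypothetical induced tree that wrongly attaches "in" to "became"
pred = [3, 2, 3, -1, 3, 3, 7, 5]
print(attachment_accuracy(gold, pred))   # 0.875
```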

Page 11: A Survey of  Unsupervised Grammar Induction

11

Unsupervised Grammar Induction

To learn the hidden structure of a language:
◦ POS tag sequences as input
◦ Generates phrase structure / dependencies
◦ No attempt to find the meaning

Overview:
◦ Phrase-structure and dependency grammars
◦ Mostly on English (a few on Chinese, German, etc.)
◦ Learning restricted to shorter sentences
◦ Significantly lags behind supervised methods

Page 12: A Survey of  Unsupervised Grammar Induction

12

PHRASE-STRUCTURE INDUCTION

Page 13: A Survey of  Unsupervised Grammar Induction

13

Toy Example

Corpus:
the dog bites a man
dog sleeps
a dog bites a bone
the man sleeps

Grammar:
S → NP VP     NP → N      N → man
VP → V NP     Det → a     N → bone
VP → V        Det → the   V → sleeps
NP → Det N    N → dog     V → bites
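As a concrete check on the toy grammar, a minimal CKY-style recognizer (my sketch, not part of the survey; the unary rules VP → V and NP → N are closed over inside each chart cell) can verify which strings the grammar accepts:

```python
# CKY recognizer for the toy grammar above.

LEX = {"the": {"Det"}, "a": {"Det"}, "dog": {"N"}, "man": {"N"},
       "bone": {"N"}, "bites": {"V"}, "sleeps": {"V"}}
BINARY = {("NP", "VP"): "S", ("V", "NP"): "VP", ("Det", "N"): "NP"}
UNARY = {"V": "VP", "N": "NP"}          # VP -> V, NP -> N

def close_unary(cell):
    """Add parents reachable by unary rules until a fixed point."""
    changed = True
    while changed:
        changed = False
        for sym in list(cell):
            parent = UNARY.get(sym)
            if parent and parent not in cell:
                cell.add(parent)
                changed = True
    return cell

def accepts(words):
    n = len(words)
    chart = {(i, i + 1): close_unary(set(LEX[w])) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for k in range(i + 1, i + span):
                for l in chart[(i, k)]:
                    for r in chart[(k, i + span)]:
                        if (l, r) in BINARY:
                            cell.add(BINARY[(l, r)])
            chart[(i, i + span)] = close_unary(cell)
    return "S" in chart[(0, n)]

print(accepts("the dog bites a man".split()))   # True
print(accepts("dog sleeps".split()))            # True
print(accepts("bites the dog".split()))         # False
```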

Page 14: A Survey of  Unsupervised Grammar Induction

14

EM for PCFG (Baker ’79; Lari and Young ’90)

Inside-Outside:
◦ EM instance for probabilistic CFGs
◦ Generalization of forward–backward for HMMs
◦ Non-terminals are fixed
◦ Estimates maximum-likelihood rule probabilities

Initially, every rule over the non-terminals and the lexicon is a candidate:

S → NP VP    NP → Det N    NP → N    VP → V    VP → V NP    VP → NP V
Det → the | a | dog | man | bone | bites | sleeps
N → the | a | dog | man | bone | bites | sleeps
V → the | a | dog | man | bone | bites | sleeps

After EM, only the plausible rules retain non-zero probability:

S → NP VP      1.0
NP → Det N     0.875
NP → N         0.125
VP → V         0.5
VP → V NP      0.5
Det → the      0.428571
Det → a        0.571429
N → dog        0.5
N → man        0.375
N → bone       0.125
V → bites      0.5
V → sleeps     0.5
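The E-step of inside-outside rests on inside probabilities: inside(i, j, A) is the probability that A derives words i..j. A minimal sketch over a hypothetical mini-grammar (the rule probabilities here are invented for illustration, not the EM estimates above):

```python
# Inside probabilities for a PCFG in CNF.

from collections import defaultdict

BINARY = {"S": [("NP", "VP", 1.0)],
          "NP": [("Det", "N", 1.0)],
          "VP": [("V", "NP", 1.0)]}
LEXICAL = {("Det", "the"): 0.5, ("Det", "a"): 0.5,
           ("N", "dog"): 0.5, ("N", "man"): 0.5,
           ("V", "bites"): 1.0}

def inside(words):
    n = len(words)
    chart = defaultdict(float)            # (i, j, symbol) -> probability
    for i, w in enumerate(words):
        for (tag, word), p in LEXICAL.items():
            if word == w:
                chart[(i, i + 1, tag)] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for parent, rules in BINARY.items():
                total = 0.0
                for l, r, p in rules:
                    for k in range(i + 1, j):
                        total += p * chart[(i, k, l)] * chart[(k, j, r)]
                if total:
                    chart[(i, j, parent)] = total
    return chart

c = inside("the dog bites a man".split())
print(c[(0, 5, "S")])   # 0.0625
```

The M-step would combine these inside values with the corresponding outside values to collect expected rule counts and renormalize.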

Page 15: A Survey of  Unsupervised Grammar Induction

15

Inside-Outside

[Figure: inside and outside probabilities for the rule @S → NP VP over "Sometimes the bribed became partners in the company"]
◦ Rule probability: P(@S → NP VP)
◦ Inside: P(NP → the bribed), P(VP → became … company)
◦ Outside: P(S → Sometimes @S)

Page 16: A Survey of  Unsupervised Grammar Induction

16

Constraining Search

Sometimes the bribed became partners in the company

(Pereira and Schabes ’92; Schabes et al. ’93)

Page 17: A Survey of  Unsupervised Grammar Induction

17

Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99)

Treebank bracketings:
◦ Bracketing boundaries constrain induction

What happens with limited supervision?
◦ More bracketed data exposed iteratively
◦ 0% bracketed data — Recall: 50.0
◦ 100% bracketed data — Recall: 78.0

Right-branching baseline — Recall: 76.0

Page 18: A Survey of  Unsupervised Grammar Induction

18

Distributional clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00)

Cluster the word sequences:
◦ Context: adjacent words or boundaries
◦ Relative frequency distribution of contexts

the black dog bites the man
the man eats an apple

Identifies constituents:
◦ Evaluation on ATIS corpus — Recall: 35.6
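The context distributions being clustered can be sketched as follows (my illustration; the cited systems differ in detail, e.g. in how boundaries and longer contexts are handled):

```python
# Distributional signature of a candidate constituent: the relative
# frequency of its (left word, right word) contexts in the corpus.

from collections import Counter

def context_distribution(corpus, target):
    """corpus: list of tokenized sentences; target: word sequence."""
    m = len(target)
    contexts = Counter()
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]       # sentence boundaries
        for i in range(1, len(padded) - m):
            if padded[i:i + m] == target:
                contexts[(padded[i - 1], padded[i + m])] += 1
    total = sum(contexts.values())
    return {c: n / total for c, n in contexts.items()}

corpus = [s.split() for s in
          ["the black dog bites the man", "the man eats an apple"]]
print(context_distribution(corpus, ["the", "man"]))
```

Sequences with similar context distributions (e.g. "the man" and "the black dog") are clustered together as likely constituents.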

Page 19: A Survey of  Unsupervised Grammar Induction

19

Constituent-Context Model (Klein and Manning ’02)

• Valid constituents in a tree should not cross

[Figure: two candidate bracketings, shown as unlabelled binary trees with X non-terminals, over the POS sequence RB DT VBN VBD NNS IN DT NN]

Page 20: A Survey of  Unsupervised Grammar Induction

20

Constituent-Context Model

[Figure: induced unlabelled binary tree over "Sometimes the bribed became partners in the company"]

Recall — Right-branch: 70.0; CCM: 81.6

Page 21: A Survey of  Unsupervised Grammar Induction

21

DEPENDENCY INDUCTION

Page 22: A Survey of  Unsupervised Grammar Induction

22

Dependency Model w/ Valence

(Klein and Manning ’04)

Simple generative model:
◦ Choose head — P(Root)
◦ Argument — P(a | h, dir); attachment dir (right, left)
◦ End — P(End | h, dir, v); valence v (head outward)

Dir Accuracy — CCM: 23.8; DMV: 43.2; Joint: 47.5

POS: RB DT VBN VBD NNS IN DT NN

Sometimes the bribed became partners in the company

[Figure: the DMV generating the dependency tree head-outward, word by word]

• Head — P(Root)
• Argument — P(a | h, dir)
• End — P(End | h, dir, v)
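The three distributions above score a dependency tree as a product of root, attachment, and stop decisions; a sketch of that computation with hypothetical uniform parameters (learning these by EM is the hard part the model leaves to inside-outside-style training):

```python
# Log-probability of a dependency tree under DMV-style factors:
# P_root(h), P_attach(a | h, dir), P_stop(end | h, dir, valence).

import math

def dmv_log_prob(tags, heads, p_root, p_attach, p_stop):
    """heads[i]: index of token i's head, or -1 for the root."""
    deps = {h: {"left": [], "right": []} for h in range(len(tags))}
    logp = 0.0
    for i, h in enumerate(heads):
        if h == -1:
            logp += math.log(p_root(tags[i]))
        else:
            deps[h]["left" if i < h else "right"].append(i)
    for h in range(len(tags)):
        for d in ("left", "right"):
            for v, a in enumerate(deps[h][d]):
                # decide to continue (valence v > 0: head already has a
                # dependent on this side), then attach the argument
                logp += math.log(1.0 - p_stop(tags[h], d, v > 0))
                logp += math.log(p_attach(tags[a], tags[h], d))
            logp += math.log(p_stop(tags[h], d, len(deps[h][d]) > 0))
    return logp

# hypothetical uniform parameters, just to exercise the computation
tags = ["RB", "DT", "VBN", "VBD", "NNS", "IN", "DT", "NN"]
heads = [3, 2, 3, -1, 3, 4, 7, 5]     # became (VBD) is the root
print(dmv_log_prob(tags, heads,
                   lambda h: 0.1,
                   lambda a, h, d: 0.1,
                   lambda h, d, v: 0.5))
```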

Page 23: A Survey of  Unsupervised Grammar Induction

23

DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10)

Extended Valence Grammar (EVG):
◦ Valence frames for the head
◦ Allows different distributions over arguments
◦ Dir Acc: 65.0

Lexicalization (L-EVG):
◦ Dir Acc: 68.8

Tree Substitution Grammar (TSG):
◦ Tree fragments instead of CFG rules
◦ Dir Acc: 67.7

POS: RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company

Page 24: A Survey of  Unsupervised Grammar Induction

24

MULTILINGUAL SETTING

Page 25: A Survey of  Unsupervised Grammar Induction

25

Bilingual Alignment & Parsing (Wu ’97)

Inversion Transduction Grammar (ITG):
◦ Allows reordering

[Figure: ITG tree S → X X over the sentence pair, with word alignments e1–f3, e2–f4, e3–f1, e4–f2]

e1 e2 e3 e4
f1 f2 f3 f4
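The straight/inverted combination at each ITG node can be sketched by evaluating a derivation to its two yields (the tree encoding here is my own, chosen to reproduce the slide's e1–f3, e2–f4, e3–f1, e4–f2 alignment):

```python
# Evaluate an ITG derivation tree to its source and target strings.

def realize(node):
    """Leaf: (e_word, f_word). Internal: (op, left, right),
    op '[]' (straight) or '<>' (inverted)."""
    if len(node) == 2:
        e, f = node
        return [e], [f]
    op, left, right = node
    le, lf = realize(left)
    re_, rf = realize(right)
    if op == "[]":                 # straight: same order in both languages
        return le + re_, lf + rf
    return le + re_, rf + lf       # inverted: swapped order on the f side

tree = ("<>",
        ("[]", ("e1", "f3"), ("e2", "f4")),
        ("[]", ("e3", "f1"), ("e4", "f2")))
print(realize(tree))
```

A single inverted node at the root is enough here: it swaps the two halves on the target side, turning e1 e2 e3 e4 into f1 f2 f3 f4.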

Page 26: A Survey of  Unsupervised Grammar Induction

26

Bilingual Parsing (Snyder et al. ’09)

Bilingual parsing:
◦ PP attachment ambiguity: I saw (the student (from MIT)1 )2
◦ Not ambiguous in Urdu, which glosses as: I ((MIT of) student) saw

Page 27: A Survey of  Unsupervised Grammar Induction

27

Summary & Overview

Parametric search methods:
◦ EM for PCFG
◦ Constrain with bracketing
◦ Contrastive Estimation
◦ CCM, DMV (+ Prototype)
◦ EVG & L-EVG
◦ TSG + DMV

Structural search methods:
◦ Distributional Clustering
◦ Data-oriented Parsing

State of the art:
◦ Phrase-structure (CCM + DMV) — Recall: 88.0
◦ Dependency (Lexicalized EVG) — Dir Acc: 68.8

Page 28: A Survey of  Unsupervised Grammar Induction

28

QUESTIONS?

Thanks!

Page 29: A Survey of  Unsupervised Grammar Induction

29

Motivation

Languages have hidden regularities

Page 30: A Survey of  Unsupervised Grammar Induction

30

Motivation

Languages have hidden regularities:
◦ The guy in China
◦ … new leader in China
◦ That’s what I am asking you …
◦ I am telling you …

Page 31: A Survey of  Unsupervised Grammar Induction

31

Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’95; Liang and Klein ’08; Spitkovsky et al. ’10)

Phrase-structure:
◦ Finds local maxima instead of the global optimum
◦ Multiple ordered adjunctions

Both phrase-structure & dependency:
◦ Disconnect between likelihood and the optimal grammar

Page 32: A Survey of  Unsupervised Grammar Induction

32

Constituent-Context Model (Klein and Manning ’02)

CCM:
◦ Only constituent identity
◦ Valid constituents in a tree should not cross

Page 33: A Survey of  Unsupervised Grammar Induction

33

Bootstrap phrases (Haghighi and Klein ’06)

Bootstrap with seed examples for constituent types:
◦ Chosen from most frequent treebank phrases
◦ Induces labels for constituents — Recall: 59.6

Integrate with CCM:
◦ CCM generates brackets (constituents)
◦ Proto labels them — Recall: 68.4

Page 34: A Survey of  Unsupervised Grammar Induction

34

Dependency Model w/ Valence (Klein and Manning ’04)

Simple generative model:
◦ Choose head; attachment dir (right, left)
◦ Valence (head outward)
◦ End of generation modelled separately

Dir Acc: 43.2

POS: RB DT VBN VBD NNS IN DT NN

Sometimes the bribed became partners in the company

Page 35: A Survey of  Unsupervised Grammar Induction

35

Learn from how not to speak

Contrastive Estimation (Smith and Eisner ’05):
◦ Log-linear model of dependency
◦ Features f(q, T): P(Root); P(a | h, dir); P(End | h, dir, v)
◦ Maximizes conditional likelihood

Page 36: A Survey of  Unsupervised Grammar Induction

36

Learn from how not to speak (Smith and Eisner ’05)

Contrastive Estimation — e.g. the brown cat vs. cat brown the
◦ Neighborhoods: transpose (Trans); delete or transpose (DelOrTrans)

Dir Acc: 48.8
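The Trans neighborhood can be sketched as follows (a minimal illustration; DelOrTrans would additionally generate the strings obtained by deleting one word):

```python
# The Trans neighborhood of a sentence: all strings obtained by
# transposing one adjacent word pair. Contrastive estimation moves
# probability mass from these implicit negative examples onto the
# observed sentence.

def trans_neighborhood(words):
    out = []
    for i in range(len(words) - 1):
        swapped = list(words)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        out.append(" ".join(swapped))
    return out

print(trans_neighborhood("the brown cat".split()))
# ['brown the cat', 'the cat brown']
```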

Page 37: A Survey of  Unsupervised Grammar Induction

37

DMV Extensions-1 (Cohen and Smith ’08, ’09)

Tying parameters:
◦ Correlated Topic Model (CTM): correlation between different word types
◦ Two ways of tying parameters:
  Logistic Normal (LN) — Dir Acc: 61.3
  Shared LN — Dir Acc: 61.3

Page 38: A Survey of  Unsupervised Grammar Induction

38

DMV Extensions-2 (Blunsom and Cohn ’10)

[Figure: the dependency tree for "Sometimes the bribed became partners in the company" carved into tree-substitution fragments, e.g. an elementary tree rooted at VBD with lexical anchor became and argument slots VBN and NNS, and one rooted at IN anchored by in with an NN slot]

Page 39: A Survey of  Unsupervised Grammar Induction

39

DMV Extensions-2(Blunsom and Cohn ’10)

Tree Substitution Grammar (TSG):
◦ Lexicalized trees
◦ Hierarchical prior: different levels of backoff

Dir Acc: 67.7

[Figure: TSG elementary trees anchored by became (VBD) and in (IN)]