A Survey of Unsupervised Grammar Induction

Post on 22-Feb-2016



TRANSCRIPT

A Survey of Unsupervised Grammar Induction
Baskaran Sankaran
Senior Supervisor: Dr Anoop Sarkar
School of Computing Science, Simon Fraser University

2

Motivation
Languages have hidden regularities
karuppu naay puunaiyai thurathiyathu (‘the black dog chased the cat’)
iruttil karuppu uruvam marainthathu (‘a black figure disappeared in the dark’)
naay thurathiya puunai vekamaaka ootiyathu (‘the cat the dog chased ran fast’)

3

MotivationLanguages have hidden

regularitieskaruppu naay puunaiyai thurathiyathuiruttil karuppu uruvam marainthathunaay thurathiya puunai vekamaaka ootiyathu

4

FORMAL STRUCTURES

5

Phrase-Structure

Sometimes the bribed became partners in the company

6

Phrase-Structure
Binarize, CNF
• Sparsity issue with words
• Use POS tags
[Binarized tree over the tags RB DT VBN VBD NNS IN DT NN, using the rules:]
S → ADVP @S
@S → NP VP
VP → VBD @VP
@VP → NP PP
NP → DT VBN
NP → DT NN
NP → NNS
PP → IN NP
ADVP → RB
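The binarization step above can be sketched in code. This is our illustration, not the talk's exact procedure: trees are nested `(label, children...)` tuples, and extra children are folded right-branching under intermediate `@`-labelled nodes, as in @S and @VP on the slide.

```python
def binarize(tree):
    """Right-binarize an n-ary tree of (label, children...) tuples."""
    if isinstance(tree, str):          # a leaf (POS tag)
        return tree
    label = tree[0]
    children = [binarize(c) for c in tree[1:]]
    while len(children) > 2:
        # fold the last two children under an intermediate @-node
        children[-2:] = [("@" + label, children[-2], children[-1])]
    return (label, *children)

# The example sentence's tree, over POS tags only
tree = ("S", ("ADVP", "RB"),
        ("NP", "DT", "VBN"),
        ("VP", "VBD", ("NP", "NNS"), ("PP", "IN", ("NP", "DT", "NN"))))
print(binarize(tree))
```

Running this yields exactly the rules on the slide: S → ADVP @S, @S → NP VP, VP → VBD @VP, @VP → NP PP.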

7

Evaluation Metric-1
Unsupervised Induction
◦ Binarized output tree, possibly unlabelled
Evaluation
◦ Gold treebank parse
◦ Recall: % of true constituents found
◦ Also precision and F-score
◦ Wall Street Journal (WSJ) dataset
[Induced binary tree with unlabelled (X) internal nodes over the tags RB DT VBN VBD NNS IN DT NN]
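The bracket-scoring metric described here can be sketched as follows. The span sets are hand-written for illustration; a real evaluation reads them off the gold treebank tree and the induced tree.

```python
def bracket_scores(gold, induced):
    """Unlabelled PARSEVAL-style precision, recall, and F-score
    over sets of (start, end) constituent spans."""
    matched = len(gold & induced)
    precision = matched / len(induced)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative spans over "Sometimes the bribed became partners in the company"
gold = {(0, 8), (1, 3), (3, 8), (4, 5), (5, 8), (6, 8)}
induced = {(0, 8), (1, 8), (1, 3), (3, 8), (5, 8), (6, 8)}
p, r, f = bracket_scores(gold, induced)
print(round(p, 3), round(r, 3), round(f, 3))
```

Five of the six gold spans are recovered, so recall here is 5/6; the induced tree also proposes one span not in the gold set, giving the same precision.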

8

Dependency Structure
[Dependency tree for "Sometimes the bribed became partners in the company": became (VBD) is the root, heading Sometimes (RB), bribed (VBN, which heads the DT), partners (NNS), and in (IN), which heads company (NN, which heads its DT)]

9

Dependency Structure
[Arc diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

10

Evaluation Metric-2
Unsupervised Induction
◦ Generates directed dependency arcs
Compute (directed) attachment accuracy
◦ Gold dependencies
◦ WSJ10 dataset
[Arc diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company
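Directed attachment accuracy is simply the fraction of tokens whose predicted head matches the gold head. A minimal sketch, with head indices hand-written for illustration (-1 marks the root), not taken from the talk:

```python
def attachment_accuracy(gold_heads, pred_heads):
    """Directed attachment accuracy: % of tokens with the correct head."""
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# "Sometimes the bribed became partners in the company" (indices 0..7)
gold = [3, 2, 3, -1, 3, 3, 7, 5]   # e.g. "in" attaches to "became"
pred = [3, 2, 3, -1, 3, 4, 7, 5]   # induced tree attaches "in" to "partners"
print(attachment_accuracy(gold, pred))  # 0.875
```

One of eight heads differs, giving 7/8 = 0.875; an undirected variant would also credit arcs with the right endpoints but the wrong direction.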

11

Unsupervised Grammar Induction

To learn the hidden structure of a language
◦ POS tag sequences as input
◦ Generates phrase structure / dependencies
◦ No attempt to find the meaning
Overview
◦ Phrase-structure and dependency grammars
◦ Mostly on English (a few on Chinese, German, etc.)
◦ Learning restricted to shorter sentences
◦ Significantly lags behind the supervised methods

12

PHRASE-STRUCTURE INDUCTION

13

Toy Example
Corpus:
the dog bites a man
dog sleeps
a dog bites a bone
the man sleeps
Grammar:
S → NP VP    NP → N      N → man
VP → V NP    Det → a     N → bone
VP → V       Det → the   V → sleeps
NP → Det N   N → dog     V → bites

14

EM for PCFG (Baker ’79; Lari and Young ’90)
Inside-Outside
◦ EM instance for probabilistic CFG
◦ Generalization of Forward-Backward for HMMs
◦ Non-terminals are fixed
◦ Estimate maximum-likelihood rule probabilities

Initial rule set (all combinations):
S → NP VP; NP → Det N; NP → N; VP → V; VP → V NP; VP → NP V;
Det → the; Det → a; Det → dog; Det → man; Det → bone; Det → bites; Det → sleeps;
N → the; N → a; N → dog; N → man; N → bone; N → bites; N → sleeps;
V → the; V → a; V → dog; V → man; V → bone; V → bites; V → sleeps

Estimated probabilities (rules not shown go to zero):
S → NP VP 1.0
NP → Det N 0.875
NP → N 0.125
VP → V 0.5
VP → V NP 0.5
Det → the 0.428571
Det → a 0.571429
N → dog 0.5
N → man 0.375
N → bone 0.125
V → bites 0.5
V → sleeps 0.5
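The full Inside-Outside algorithm is more than fits on a slide, but its core, the inside pass, can be sketched. This is our illustration: a CKY-style dynamic program computing the total probability of a sentence under a tiny CNF grammar whose probabilities are made up for the example, not the learned values above.

```python
from collections import defaultdict

# Toy CNF grammar with illustrative probabilities
binary_rules = {                    # (A, (B, C)): P(A -> B C)
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
lexical_rules = {                   # (A, word): P(A -> word)
    ("Det", "the"): 0.5, ("Det", "a"): 0.5,
    ("N", "dog"): 0.5, ("N", "man"): 0.5,
    ("V", "bites"): 1.0,
}

def inside_prob(words, root="S"):
    """Inside pass: chart[(i, j)][A] = P(A derives words[i:j])."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):                       # width-1 spans
        for (nt, word), p in lexical_rules.items():
            if word == w:
                chart[(i, i + 1)][nt] += p
    for width in range(2, n + 1):                       # wider spans
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                   # split point
                for (a, (b, c)), p in binary_rules.items():
                    chart[(i, j)][a] += p * chart[(i, k)][b] * chart[(k, j)][c]
    return chart[(0, n)][root]

print(inside_prob("the dog bites a man".split()))  # 0.0625
```

The outside pass runs the symmetric recursion top-down; EM then combines the two to get expected rule counts, exactly as sketched on the next slide.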

15

Inside-Outside
Sometimes the bribed became partners in the company
[Diagram: for the rule @S → NP VP, the outside probability P(S → Sometimes @S) is combined with the rule probability P(@S → NP VP) and the inside probabilities P(NP → the bribed) and P(VP → became … company)]

16

Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93)
Sometimes the bribed became partners in the company

17

Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99)
Treebank bracketings
◦ Bracketing boundaries constrain induction
What happens with limited supervision?
◦ More bracketed data exposed iteratively
◦ 0% bracketed data – Recall: 50.0
◦ 100% bracketed data – Recall: 78.0
Right-branching baseline – Recall: 76.0

18

Distributional clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00)
Cluster the word sequences
◦ Context: adjacent words or boundaries
◦ Relative frequency distribution of contexts
  the black dog bites the man
  the man eats an apple
Identifies constituents
◦ Evaluation on ATIS corpus – Recall: 35.6
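The distributional signal can be sketched in a few lines. This is our illustration of the idea, not any one paper's method: every contiguous word sequence is characterized by the distribution of its (left word, right word) contexts, with "#" marking a sentence boundary, and sequences with similar context distributions are candidate constituents.

```python
from collections import Counter, defaultdict

def context_counts(corpus, max_len=3):
    """Count (left, right) contexts for every word sequence up to max_len."""
    contexts = defaultdict(Counter)
    for sent in corpus:
        padded = ["#"] + sent + ["#"]
        for s in range(1, len(padded) - 1):
            for e in range(s, min(s + max_len, len(padded) - 1)):
                seq = tuple(padded[s:e + 1])
                contexts[seq][(padded[s - 1], padded[e + 1])] += 1
    return contexts

corpus = ["the black dog bites the man".split(),
          "the man eats an apple".split()]
counts = context_counts(corpus)
print(counts[("the", "man")])   # contexts ("bites", "#") and ("#", "eats")
```

"the man" occurs both sentence-finally and sentence-initially, so its context distribution looks NP-like; clustering these distributions groups such sequences together.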

19

Constituent-Context Model (Klein and Manning ’02)
• Valid constituents in a tree should not cross
[Diagrams: the labelled gold tree (S, NP, VP, PP, ADVP) and an induced binary tree with unlabelled X nodes, both over the tags RB DT VBN VBD NNS IN DT NN]
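The non-crossing constraint can be stated in code. A minimal sketch, with spans as (start, end) pairs: two spans cross when they overlap without one containing the other, and a span set is tree-compatible iff no pair crosses.

```python
def crosses(a, b):
    """True if spans a and b overlap without nesting."""
    (i, j), (k, l) = sorted((a, b))
    return i < k < j < l

def tree_compatible(spans):
    """True if no pair of spans crosses (so they can form one tree)."""
    spans = list(spans)
    return not any(crosses(spans[i], spans[j])
                   for i in range(len(spans))
                   for j in range(i + 1, len(spans)))

print(crosses((1, 3), (2, 5)))                     # True: partial overlap
print(tree_compatible([(0, 8), (1, 3), (3, 8)]))   # True: nested/disjoint
```

CCM exploits exactly this: a bracketing is only scored if its proposed constituents are mutually non-crossing.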

20

Constituent-Context Model
Sometimes the bribed became partners in the company
[Induced binary tree with unlabelled X nodes over the tags RB DT VBN VBD NNS IN DT NN]
Recall – Right-branching: 70.0; CCM: 81.6

21

DEPENDENCY INDUCTION

22

Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head – P(Root)
◦ Argument – P(a | h, dir)
  Attachment dir (right, left)
◦ End – P(End | h, dir, v)
  Valence (head outward)
Dir Accuracy – CCM: 23.8; DMV: 43.2; Joint: 47.5
[Dependency diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

23

DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10)
Extended Valence (EVG)
◦ Valence frames for the head
◦ Allows different distributions over arguments – Dir Acc: 65.0
Lexicalization (L-EVG) – Dir Acc: 68.8
Tree Substitution Grammar
◦ Tree fragments instead of CFG rules – Dir Acc: 67.7
[Dependency diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

24

MULTILINGUAL SETTING

25

Bilingual Alignment & Parsing (Wu ’97)
Inversion Transduction Grammar (ITG)
◦ Allows reordering
[Diagram: synchronous parse of e1 e2 e3 e4 / f1 f2 f3 f4, with inverted spans pairing e1–f3, e2–f4, e3–f1, e4–f2]

26

Bilingual Parsing (Snyder et al. ’09)
◦ PP attachment ambiguity
  I saw (the student (from MIT)1 )2
◦ Not ambiguous in Urdu
  I ((MIT of) student) saw   [word-by-word gloss of the Urdu sentence]

27

Summary & Overview
Methods surveyed (grouped on the slide into Parametric Search Methods and Structural Search Methods):
◦ EM for PCFG
◦ Constrain with bracketing
◦ Contrastive Estimation
◦ Distributional Clustering
◦ CCM
◦ DMV
◦ EVG & L-EVG
◦ TSG + DMV
◦ Data-oriented Parsing
◦ Prototype
State-of-the-art
• Phrase-structure (CCM + DMV) – Recall: 88.0
• Dependency (Lexicalized EVG) – Dir Acc: 68.8

28

QUESTIONS?

Thanks!

29

Motivation
Languages have hidden regularities

30

Motivation
Languages have hidden regularities
◦ The guy in China
◦ … new leader in China
◦ That’s what I am asking you …
◦ I am telling you …

31

Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’05; Liang and Klein ’08; Spitkovsky et al. ’10)
Phrase-structure
◦ Finds local maxima instead of global
◦ Multiple ordered adjunctions
Both phrase-structure & dependency
◦ Disconnect between likelihood and the optimal grammar

32

Constituent-Context Model (Klein and Manning ’02)
CCM
◦ Only constituent identity
◦ Valid constituents in a tree should not cross

33

Bootstrap phrases (Haghighi and Klein ’06)
Bootstrap with seed examples for constituent types
◦ Chosen from most frequent treebank phrases
◦ Induces labels for constituents – Recall: 59.6
Integrate with CCM
◦ CCM generates brackets (constituents)
◦ Proto labels them – Recall: 68.4

34

Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head; attachment dir (right, left)
◦ Valence (head outward)
◦ End of generation modelled separately
Dir Acc: 43.2
[Dependency diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

35

Learn from how not to speak
Contrastive Estimation (Smith and Eisner ’05)
◦ Log-linear model of dependency
  Features: f(q, T) – P(Root); P(a | h, dir); P(End | h, dir, v)
◦ Conditional likelihood

36

Learn from how not to speak (Smith and Eisner ’05)
Contrastive Estimation
◦ Ex. the brown cat vs. cat brown the
◦ Neighborhoods: Transpose (Trans), delete & transpose (DelOrTrans)
Dir Acc: 48.8
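Contrastive estimation's implicit negative evidence comes from perturbations of each observed sentence. A minimal sketch of the Trans neighborhood (swap one adjacent pair); the function name is ours, not from Smith and Eisner ’05:

```python
def trans_neighborhood(words):
    """All sentences obtained by transposing one adjacent word pair."""
    out = []
    for i in range(len(words) - 1):
        swapped = list(words)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        out.append(swapped)
    return out

for n in trans_neighborhood("the brown cat".split()):
    print(" ".join(n))
# brown the cat
# the cat brown
```

The model is trained to put more mass on "the brown cat" than on its neighborhood of mostly ill-formed variants like "cat brown the", which is where the "how not to speak" signal comes from.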

37

DMV Extensions-1 (Cohen and Smith ’08, ’09)
Tying parameters
◦ Correlated Topic Model (CTM)
  Correlation between different word types
◦ Two types of tying parameters
  Logistic Normal (LN) – Dir Acc: 61.3
  Shared LN – Dir Acc: 61.3

38

DMV Extensions-2 (Blunsom and Cohn ’10)
[Diagram: the dependency tree for "Sometimes the bribed became partners in the company" decomposed into lexicalized tree-substitution fragments, e.g. a VBD fragment anchored on "became" with VBN and NNS slots, and an IN fragment anchored on "in" with an NN slot]

39

DMV Extensions-2 (Blunsom and Cohn ’10)
Tree Substitution Grammar (TSG)
◦ Lexicalized trees
◦ Hierarchical prior
  Different levels of backoff
Dir Acc: 67.7
[Fragment diagrams: a VBD fragment anchored on "became" with VBN and NNS slots; an IN fragment anchored on "in" with an NN slot]
