A Survey of Unsupervised Grammar Induction

Post on 22-Feb-2016



TRANSCRIPT

A Survey of Unsupervised Grammar Induction
Baskaran Sankaran
Senior Supervisor: Dr Anoop Sarkar
School of Computing Science, Simon Fraser University

2

Motivation
Languages have hidden regularities
karuppu naay puunaiyai thurathiyathu (‘the black dog chased the cat’)
iruttil karuppu uruvam marainthathu (‘a black figure disappeared in the dark’)
naay thurathiya puunai vekamaaka ootiyathu (‘the cat the dog chased ran fast’)

3

MotivationLanguages have hidden

regularitieskaruppu naay puunaiyai thurathiyathuiruttil karuppu uruvam marainthathunaay thurathiya puunai vekamaaka ootiyathu

4

FORMAL STRUCTURES

5

Phrase-Structure

Sometimes the bribed became partners in the company

6

Phrase-Structure
Binarize, CNF
• Sparsity issue with words
• Use POS tags
[Binarized tree over the tags RB DT VBN VBD NNS IN DT NN, using the rules:]
S → ADVP @S
@S → NP VP
VP → VBD @VP
@VP → NP PP
NP → DT VBN
NP → DT NN
NP → NNS
PP → IN NP
ADVP → RB
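The binarization step above can be sketched in code. This is our illustration, not the talk's exact procedure: trees are nested `(label, children...)` tuples, and extra children are folded right-branching under intermediate `@`-labelled nodes, as in @S and @VP on the slide.

```python
def binarize(tree):
    """Right-binarize an n-ary tree of (label, children...) tuples."""
    if isinstance(tree, str):          # a leaf (POS tag)
        return tree
    label = tree[0]
    children = [binarize(c) for c in tree[1:]]
    while len(children) > 2:
        # fold the last two children under an intermediate @-node
        children[-2:] = [("@" + label, children[-2], children[-1])]
    return (label, *children)

# The example sentence's tree, over POS tags only
tree = ("S", ("ADVP", "RB"),
        ("NP", "DT", "VBN"),
        ("VP", "VBD", ("NP", "NNS"), ("PP", "IN", ("NP", "DT", "NN"))))
print(binarize(tree))
```

Running this yields exactly the rules on the slide: S → ADVP @S, @S → NP VP, VP → VBD @VP, @VP → NP PP.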

7

Evaluation Metric-1
Unsupervised Induction
◦ Binarized output tree, possibly unlabelled
Evaluation
◦ Gold treebank parse
◦ Recall: % of true constituents found
◦ Also precision and F-score
◦ Wall Street Journal (WSJ) dataset
[Induced binary tree with unlabelled (X) internal nodes over the tags RB DT VBN VBD NNS IN DT NN]
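The bracket-scoring metric described here can be sketched as follows. The span sets are hand-written for illustration; a real evaluation reads them off the gold treebank tree and the induced tree.

```python
def bracket_scores(gold, induced):
    """Unlabelled PARSEVAL-style precision, recall, and F-score
    over sets of (start, end) constituent spans."""
    matched = len(gold & induced)
    precision = matched / len(induced)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative spans over "Sometimes the bribed became partners in the company"
gold = {(0, 8), (1, 3), (3, 8), (4, 5), (5, 8), (6, 8)}
induced = {(0, 8), (1, 8), (1, 3), (3, 8), (5, 8), (6, 8)}
p, r, f = bracket_scores(gold, induced)
print(round(p, 3), round(r, 3), round(f, 3))
```

Five of the six gold spans are recovered, so recall here is 5/6; the induced tree also proposes one span not in the gold set, giving the same precision.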

8

Dependency Structure
[Dependency tree for "Sometimes the bribed became partners in the company": became (VBD) is the root, heading Sometimes (RB), bribed (VBN, which heads the DT), partners (NNS), and in (IN), which heads company (NN, which heads its DT)]

9

Dependency Structure
[Arc diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

10

Evaluation Metric-2
Unsupervised Induction
◦ Generates directed dependency arcs
Compute (directed) attachment accuracy
◦ Gold dependencies
◦ WSJ10 dataset
[Arc diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company
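Directed attachment accuracy is simply the fraction of tokens whose predicted head matches the gold head. A minimal sketch, with head indices hand-written for illustration (-1 marks the root), not taken from the talk:

```python
def attachment_accuracy(gold_heads, pred_heads):
    """Directed attachment accuracy: % of tokens with the correct head."""
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# "Sometimes the bribed became partners in the company" (indices 0..7)
gold = [3, 2, 3, -1, 3, 3, 7, 5]   # e.g. "in" attaches to "became"
pred = [3, 2, 3, -1, 3, 4, 7, 5]   # induced tree attaches "in" to "partners"
print(attachment_accuracy(gold, pred))  # 0.875
```

One of eight heads differs, giving 7/8 = 0.875; an undirected variant would also credit arcs with the right endpoints but the wrong direction.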

11

Unsupervised Grammar Induction

To learn the hidden structure of a language
◦ POS tag sequences as input
◦ Generates phrase structure / dependencies
◦ No attempt to find the meaning
Overview
◦ Phrase-structure and dependency grammars
◦ Mostly on English (a few on Chinese, German, etc.)
◦ Learning restricted to shorter sentences
◦ Significantly lags behind the supervised methods

12

PHRASE-STRUCTURE INDUCTION

13

Toy Example
Corpus:
the dog bites a man
dog sleeps
a dog bites a bone
the man sleeps
Grammar:
S → NP VP    NP → N      N → man
VP → V NP    Det → a     N → bone
VP → V       Det → the   V → sleeps
NP → Det N   N → dog     V → bites

14

EM for PCFG (Baker ’79; Lari and Young ’90)
Inside-Outside
◦ EM instance for probabilistic CFG
◦ Generalization of Forward-Backward for HMMs
◦ Non-terminals are fixed
◦ Estimate maximum-likelihood rule probabilities

Initial rule set (all combinations):
S → NP VP; NP → Det N; NP → N; VP → V; VP → V NP; VP → NP V;
Det → the; Det → a; Det → dog; Det → man; Det → bone; Det → bites; Det → sleeps;
N → the; N → a; N → dog; N → man; N → bone; N → bites; N → sleeps;
V → the; V → a; V → dog; V → man; V → bone; V → bites; V → sleeps

Estimated probabilities (rules not shown go to zero):
S → NP VP 1.0
NP → Det N 0.875
NP → N 0.125
VP → V 0.5
VP → V NP 0.5
Det → the 0.428571
Det → a 0.571429
N → dog 0.5
N → man 0.375
N → bone 0.125
V → bites 0.5
V → sleeps 0.5
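The full Inside-Outside algorithm is more than fits on a slide, but its core, the inside pass, can be sketched. This is our illustration: a CKY-style dynamic program computing the total probability of a sentence under a tiny CNF grammar whose probabilities are made up for the example, not the learned values above.

```python
from collections import defaultdict

# Toy CNF grammar with illustrative probabilities
binary_rules = {                    # (A, (B, C)): P(A -> B C)
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
lexical_rules = {                   # (A, word): P(A -> word)
    ("Det", "the"): 0.5, ("Det", "a"): 0.5,
    ("N", "dog"): 0.5, ("N", "man"): 0.5,
    ("V", "bites"): 1.0,
}

def inside_prob(words, root="S"):
    """Inside pass: chart[(i, j)][A] = P(A derives words[i:j])."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):                       # width-1 spans
        for (nt, word), p in lexical_rules.items():
            if word == w:
                chart[(i, i + 1)][nt] += p
    for width in range(2, n + 1):                       # wider spans
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                   # split point
                for (a, (b, c)), p in binary_rules.items():
                    chart[(i, j)][a] += p * chart[(i, k)][b] * chart[(k, j)][c]
    return chart[(0, n)][root]

print(inside_prob("the dog bites a man".split()))  # 0.0625
```

The outside pass runs the symmetric recursion top-down; EM then combines the two to get expected rule counts, exactly as sketched on the next slide.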

15

Inside-Outside
Sometimes the bribed became partners in the company
[Diagram: for the rule @S → NP VP, the outside probability P(S → Sometimes @S) is combined with the rule probability P(@S → NP VP) and the inside probabilities P(NP → the bribed) and P(VP → became … company)]

16

Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93)
Sometimes the bribed became partners in the company

17

Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99)
Treebank bracketings
◦ Bracketing boundaries constrain induction
What happens with limited supervision?
◦ More bracketed data exposed iteratively
◦ 0% bracketed data – Recall: 50.0
◦ 100% bracketed data – Recall: 78.0
Right-branching baseline – Recall: 76.0

18

Distributional clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00)
Cluster the word sequences
◦ Context: adjacent words or boundaries
◦ Relative frequency distribution of contexts
  the black dog bites the man
  the man eats an apple
Identifies constituents
◦ Evaluation on ATIS corpus – Recall: 35.6
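The distributional signal can be sketched in a few lines. This is our illustration of the idea, not any one paper's method: every contiguous word sequence is characterized by the distribution of its (left word, right word) contexts, with "#" marking a sentence boundary, and sequences with similar context distributions are candidate constituents.

```python
from collections import Counter, defaultdict

def context_counts(corpus, max_len=3):
    """Count (left, right) contexts for every word sequence up to max_len."""
    contexts = defaultdict(Counter)
    for sent in corpus:
        padded = ["#"] + sent + ["#"]
        for s in range(1, len(padded) - 1):
            for e in range(s, min(s + max_len, len(padded) - 1)):
                seq = tuple(padded[s:e + 1])
                contexts[seq][(padded[s - 1], padded[e + 1])] += 1
    return contexts

corpus = ["the black dog bites the man".split(),
          "the man eats an apple".split()]
counts = context_counts(corpus)
print(counts[("the", "man")])   # contexts ("bites", "#") and ("#", "eats")
```

"the man" occurs both sentence-finally and sentence-initially, so its context distribution looks NP-like; clustering these distributions groups such sequences together.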

19

Constituent-Context Model (Klein and Manning ’02)
• Valid constituents in a tree should not cross
[Diagrams: the labelled gold tree (S, NP, VP, PP, ADVP) and an induced binary tree with unlabelled X nodes, both over the tags RB DT VBN VBD NNS IN DT NN]
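The non-crossing constraint can be stated in code. A minimal sketch, with spans as (start, end) pairs: two spans cross when they overlap without one containing the other, and a span set is tree-compatible iff no pair crosses.

```python
def crosses(a, b):
    """True if spans a and b overlap without nesting."""
    (i, j), (k, l) = sorted((a, b))
    return i < k < j < l

def tree_compatible(spans):
    """True if no pair of spans crosses (so they can form one tree)."""
    spans = list(spans)
    return not any(crosses(spans[i], spans[j])
                   for i in range(len(spans))
                   for j in range(i + 1, len(spans)))

print(crosses((1, 3), (2, 5)))                     # True: partial overlap
print(tree_compatible([(0, 8), (1, 3), (3, 8)]))   # True: nested/disjoint
```

CCM exploits exactly this: a bracketing is only scored if its proposed constituents are mutually non-crossing.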

20

Constituent-Context Model
Sometimes the bribed became partners in the company
[Induced binary tree with unlabelled X nodes over the tags RB DT VBN VBD NNS IN DT NN]
Recall – Right-branching: 70.0; CCM: 81.6

21

DEPENDENCY INDUCTION

22

Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head – P(Root)
◦ Argument – P(a | h, dir)
  Attachment dir (right, left)
◦ End – P(End | h, dir, v)
  Valence (head outward)
Dir Accuracy – CCM: 23.8; DMV: 43.2; Joint: 47.5
[Dependency diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

23

DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10)
Extended Valence (EVG)
◦ Valence frames for the head
◦ Allows different distributions over arguments – Dir Acc: 65.0
Lexicalization (L-EVG) – Dir Acc: 68.8
Tree Substitution Grammar
◦ Tree fragments instead of CFG rules – Dir Acc: 67.7
[Dependency diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

24

MULTILINGUAL SETTING

25

Bilingual Alignment & Parsing (Wu ’97)
Inversion Transduction Grammar (ITG)
◦ Allows reordering
[Diagram: synchronous parse of e1 e2 e3 e4 / f1 f2 f3 f4, with inverted spans pairing e1–f3, e2–f4, e3–f1, e4–f2]

26

Bilingual Parsing (Snyder et al. ’09)
◦ PP attachment ambiguity
  I saw (the student (from MIT)1 )2
◦ Not ambiguous in Urdu
  I ((MIT of) student) saw   [word-by-word gloss of the Urdu sentence]

27

Summary & Overview
Methods surveyed (grouped on the slide into Parametric Search Methods and Structural Search Methods):
◦ EM for PCFG
◦ Constrain with bracketing
◦ Contrastive Estimation
◦ Distributional Clustering
◦ CCM
◦ DMV
◦ EVG & L-EVG
◦ TSG + DMV
◦ Data-oriented Parsing
◦ Prototype
State-of-the-art
• Phrase-structure (CCM + DMV) – Recall: 88.0
• Dependency (Lexicalized EVG) – Dir Acc: 68.8

28

QUESTIONS?

Thanks!

29

Motivation
Languages have hidden regularities

30

Motivation
Languages have hidden regularities
◦ The guy in China
◦ … new leader in China
◦ That’s what I am asking you …
◦ I am telling you …

31

Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’05; Liang and Klein ’08; Spitkovsky et al. ’10)
Phrase-structure
◦ Finds local maxima instead of global
◦ Multiple ordered adjunctions
Both phrase-structure & dependency
◦ Disconnect between likelihood and the optimal grammar

32

Constituent-Context Model (Klein and Manning ’02)
CCM
◦ Only constituent identity
◦ Valid constituents in a tree should not cross

33

Bootstrap phrases (Haghighi and Klein ’06)
Bootstrap with seed examples for constituent types
◦ Chosen from most frequent treebank phrases
◦ Induces labels for constituents – Recall: 59.6
Integrate with CCM
◦ CCM generates brackets (constituents)
◦ Proto labels them – Recall: 68.4

34

Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head; attachment dir (right, left)
◦ Valence (head outward)
◦ End of generation modelled separately
Dir Acc: 43.2
[Dependency diagram over the tags RB DT VBN VBD NNS IN DT NN]
Sometimes the bribed became partners in the company

35

Learn from how not to speak
Contrastive Estimation (Smith and Eisner ’05)
◦ Log-linear model of dependency
  Features: f(q, T) – P(Root); P(a | h, dir); P(End | h, dir, v)
◦ Conditional likelihood

36

Learn from how not to speak (Smith and Eisner ’05)
Contrastive Estimation
◦ Ex. the brown cat vs. cat brown the
◦ Neighborhoods: Transpose (Trans), delete & transpose (DelOrTrans)
Dir Acc: 48.8
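Contrastive estimation's implicit negative evidence comes from perturbations of each observed sentence. A minimal sketch of the Trans neighborhood (swap one adjacent pair); the function name is ours, not from Smith and Eisner ’05:

```python
def trans_neighborhood(words):
    """All sentences obtained by transposing one adjacent word pair."""
    out = []
    for i in range(len(words) - 1):
        swapped = list(words)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        out.append(swapped)
    return out

for n in trans_neighborhood("the brown cat".split()):
    print(" ".join(n))
# brown the cat
# the cat brown
```

The model is trained to put more mass on "the brown cat" than on its neighborhood of mostly ill-formed variants like "cat brown the", which is where the "how not to speak" signal comes from.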

37

DMV Extensions-1 (Cohen and Smith ’08, ’09)
Tying parameters
◦ Correlated Topic Model (CTM)
  Correlation between different word types
◦ Two types of tying parameters
  Logistic Normal (LN) – Dir Acc: 61.3
  Shared LN – Dir Acc: 61.3

38

DMV Extensions-2 (Blunsom and Cohn ’10)
[Diagram: the dependency tree for "Sometimes the bribed became partners in the company" decomposed into lexicalized tree-substitution fragments, e.g. a VBD fragment anchored on "became" with VBN and NNS slots, and an IN fragment anchored on "in" with an NN slot]

39

DMV Extensions-2 (Blunsom and Cohn ’10)
Tree Substitution Grammar (TSG)
◦ Lexicalized trees
◦ Hierarchical prior
  Different levels of backoff
Dir Acc: 67.7
[Fragment diagrams: a VBD fragment anchored on "became" with VBN and NNS slots; an IN fragment anchored on "in" with an NN slot]
