Methods in Unsupervised Dependency Parsing

Mohammad Sadegh Rasooli

Candidacy exam
Department of Computer Science

Columbia University

April 1st, 2016

Overview

1 Introduction
    Dependency Grammar
    Dependency Parsing

2 Fully Unsupervised Parsing Models
    Unsupervised Parsing
    Dependency Model with Valence (DMV)
    Common Learning Algorithms for DMV
    Discussion

3 Syntactic Transfer Models
    Approaches in Syntactic Transfer
    Direct Syntactic Transfer
    Annotation Projection
    Discussion

4 Conclusion

Dependency Grammar

- A formal grammar introduced by [Tesniere, 1959], inspired by valency theory in chemistry

[Figure: valency diagram with nodes A, B, C, CH3, D, E, F, G]

- In a dependency tree, each word has exactly one parent and can have any number of dependents

- Benefit: explicit representation of syntactic roles

[Figure: dependency tree for "Economic news had little effect on financial markets ." with arc labels nmod, sbj, nmod, obj, nmod, nmod, pc, punc]

Dependency Parsing

- State-of-the-art parsing models are very accurate
  - Requirement: large amounts of annotated trees

- ≤50 treebanks available, ~7000 languages without any treebank

- Treebank development: an expensive and time-consuming task

- Five years of work for the Penn Chinese Treebank [Hwa et al., 2005]

- Unsupervised dependency parsing is an alternative approach when no treebank is available

Unsupervised Parsing

- Goal: Develop an accurate parser without annotated data
- Common assumptions
  - Part-of-speech (POS) information is available
  - Raw data is available

Initial Attempts

- The seminal work of [Carroll and Charniak, 1992] and [Paskin, 2002] tried different techniques and achieved interesting results

- Their models could not beat the baseline of attaching every word to the next word

[Figure: "Learning Language", Supervised NLP vs. Unsupervised NLP]
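The next-word baseline is simple enough to write down directly. The sketch below builds the baseline heads and scores them by directed attachment accuracy. The gold heads are illustrative values chosen for this example, not taken from a treebank, and attaching the last word to ROOT is likewise an assumption.

```python
def next_word_baseline(n):
    """Heads for a sentence of n words (1-indexed; 0 denotes ROOT).

    Every word attaches to the word on its right; the last word is
    attached to ROOT (an assumption made for this sketch).
    """
    return [i + 1 for i in range(1, n)] + [0]


def directed_accuracy(predicted, gold):
    """Fraction of words whose predicted head equals the gold head."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


words = "Economic news had little effect on financial markets .".split()
# Illustrative gold heads (hypothetical, one per word, 0 = ROOT).
gold_heads = [2, 3, 0, 5, 3, 5, 8, 6, 3]
baseline_heads = next_word_baseline(len(words))
accuracy = directed_accuracy(baseline_heads, gold_heads)
```

With these illustrative heads the baseline gets 4 of 9 attachments right, which gives a sense of why such a trivial rule is a non-trivial bar for head-final noun phrases.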

DMV: the First Breakthrough

- The dependency model with valence (DMV) [Klein and Manning, 2004] was the first model to beat the baseline

- Most later papers extend the DMV, either in its inference method or in its parameter definition

The Dependency Model with Valence

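- Input x, output y, \( p(x, y \mid \theta) = P(y^{(0)} \mid \$, \theta) \)
  - \( \theta_c \): parameters for dependency attachment
  - \( \theta_s \): parameters for stopping (taking no further dependents)
  - \( \mathrm{adj}(j) \): true iff \( x_j \) is adjacent to its parent
  - \( \mathrm{dep}_{dir}(j) \): set of dependents of \( x_j \) in direction \( dir \)

Recursive calculation

\[
\begin{aligned}
P(y^{(i)} \mid x_i, \theta) = \prod_{dir \in \{\leftarrow, \rightarrow\}} \; & \theta_s(\mathrm{stop} \mid x_i, dir, [\mathrm{dep}_{dir}(i) \stackrel{?}{=} \emptyset]) \\
& \times \prod_{j \in \mathrm{dep}_{dir}(i)} (1 - \theta_s(\mathrm{stop} \mid x_i, dir, \mathrm{adj}(j))) \\
& \qquad \times \theta_c(x_j \mid x_i, dir) \times P(y^{(j)} \mid x_j, \theta)
\end{aligned}
\]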
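The recursion can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name, the dictionary layout of the parameters, and the reading of adj(j) as "x_j is the nearest dependent on that side" (the convention of Klein and Manning's valence flag) are all assumptions. The stop factor is conditioned on whether the head has no dependents in that direction, as in the formula. The uniform parameter values are placeholders chosen only to make the arithmetic easy to check.

```python
from collections import defaultdict


def dmv_tree_probability(tags, heads, theta_c, theta_s):
    """Joint probability of a tagged sentence and its dependency tree.

    tags[0] must be the artificial ROOT; heads[j] is the index of the
    parent of token j (heads[0] is ignored).
    theta_c[(dependent_tag, head_tag, direction)] -> attachment probability
    theta_s[(head_tag, direction, flag)]          -> stop probability
    """
    def dependents(i, direction):
        # Dependents of token i on one side, ordered nearest to farthest.
        if direction == "left":
            return [j for j in range(i - 1, 0, -1) if heads[j] == i]
        return [j for j in range(i + 1, len(tags)) if heads[j] == i]

    def subtree(i):
        p = 1.0
        for direction in ("left", "right"):
            deps = dependents(i, direction)
            # Stop factor, conditioned on whether this side has no dependents.
            p *= theta_s[(tags[i], direction, len(deps) == 0)]
            for rank, j in enumerate(deps):
                adjacent = rank == 0  # assumption: adj(j) = nearest dependent
                p *= 1.0 - theta_s[(tags[i], direction, adjacent)]
                p *= theta_c[(tags[j], tags[i], direction)]
                p *= subtree(j)
        return p

    # ROOT takes exactly one dependent to its right, with no stop decisions.
    root_child = heads.index(0, 1)
    return theta_c[(tags[root_child], tags[0], "right")] * subtree(root_child)


# Placeholder parameters: every attachment 0.25, every stop decision 0.5.
theta_c = defaultdict(lambda: 0.25)
theta_s = defaultdict(lambda: 0.5)
tags = ["ROOT", "PRN", "VB", "DT", "NN"]
heads = [-1, 2, 0, 4, 2]  # PRN <- VB, VB <- ROOT, DT <- NN, NN <- VB
prob = dmv_tree_probability(tags, heads, theta_c, theta_s)
```

With these placeholder values the tree contributes four attachment factors, eight stop factors, and three continue factors, so the product is 0.25^4 × 0.5^11 = 2^-19.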

DMV: A Running Example

ROOT PRN VB DT NN

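\[
\begin{aligned}
P(y^{(0)}) &= \theta_c(\mathrm{VB} \mid \mathrm{ROOT}, \rightarrow) \times P(y^{(2)} \mid \mathrm{VB}, \theta) \\
P(y^{(2)} \mid \mathrm{VB}, \theta) &= \theta_s(\mathrm{stop} \mid \mathrm{VB}, \leftarrow, \mathrm{true}) \times (1 - \theta_s(\mathrm{stop} \mid \mathrm{VB}, \leftarrow, \mathrm{false})) \\
&\quad \times \theta_c(\mathrm{PRN} \mid \mathrm{VB}, \leftarrow) \times P(y^{(1)} \mid \mathrm{PRN}, \theta) \\
&\quad \times \theta_s(\mathrm{stop} \mid \mathrm{VB}, \rightarrow, \mathrm{true}) \times (1 - \theta_s(\mathrm{stop} \mid \mathrm{VB}, \rightarrow, \mathrm{false})) \\
&\quad \times \theta_c(\mathrm{NN} \mid \mathrm{VB}, \rightarrow) \times P(y^{(4)} \mid \mathrm{NN}, \theta) \\
P(y^{(1)} \mid \mathrm{PRN}, \theta) &= \theta_s(\mathrm{stop} \mid \mathrm{PRN}, \leftarrow, \mathrm{false}) \times \theta_s(\mathrm{stop} \mid \mathrm{PRN}, \rightarrow, \mathrm{false}) \\
P(y^{(4)} \mid \mathrm{NN}, \theta) &= \theta_s(\mathrm{stop} \mid \mathrm{NN}, \leftarrow, \mathrm{true}) \times (1 - \theta_s(\mathrm{stop} \mid \mathrm{NN}, \leftarrow, \mathrm{false})) \\
&\quad \times \theta_c(\mathrm{DT} \mid \mathrm{NN}, \leftarrow) \times P(y^{(3)} \mid \mathrm{DT}, \theta) \\
&\quad \times \theta_s(\mathrm{stop} \mid \mathrm{NN}, \rightarrow, \mathrm{false}) \\
P(y^{(3)} \mid \mathrm{DT}, \theta) &= \theta_s(\mathrm{stop} \mid \mathrm{DT}, \leftarrow, \mathrm{false}) \times \theta_s(\mathrm{stop} \mid \mathrm{DT}, \rightarrow, \mathrm{false})
\end{aligned}
\]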

DMV: Parameter Estimation

- Parameter estimation is based on occurrence counts; e.g.

\[
\theta_c(w_j \mid w_i, \rightarrow) = \frac{\mathrm{count}(w_i \rightarrow w_j)}{\sum_{w' \in V} \mathrm{count}(w_i \rightarrow w')}
\]

- In an unsupervised setting, we can use dynamic programming (the Inside-Outside algorithm [Lari and Young, 1990]) to estimate the model parameters θ
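When trees are observed, the count-based estimate is a relative frequency. The sketch below illustrates it for the attachment parameters only. The function name, the `(tags, heads)` input layout, and the two-sentence toy treebank are assumptions made for illustration.

```python
from collections import Counter


def estimate_attachment(treebank):
    """Relative-frequency estimate of theta_c(dependent | head, direction).

    treebank: list of (tags, heads) pairs; tags[0] is ROOT and heads[j]
    is the parent index of token j (heads[0] is ignored).
    Returns a dict keyed by (dependent_tag, head_tag, direction).
    """
    pair_counts = Counter()
    head_counts = Counter()
    for tags, heads in treebank:
        for j in range(1, len(tags)):
            i = heads[j]
            # The dependent lies to the right of its head when i < j.
            direction = "right" if i < j else "left"
            pair_counts[(tags[i], direction, tags[j])] += 1
            head_counts[(tags[i], direction)] += 1
    return {
        (dep, head, direction): count / head_counts[(head, direction)]
        for (head, direction, dep), count in pair_counts.items()
    }


# Hypothetical two-sentence treebank.
toy_treebank = [
    (["ROOT", "PRN", "VB", "DT", "NN"], [-1, 2, 0, 4, 2]),
    (["ROOT", "NN", "VB"], [-1, 2, 0]),
]
theta_c = estimate_attachment(toy_treebank)
```

In the toy data VB takes a left dependent twice, once PRN and once NN, so each of those two attachments receives probability 0.5. In the unsupervised case the integer counts are replaced by expected counts.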

Problems with DMV

- A non-convex optimization problem for DMV

- A local optimum is not necessarily a global optimum

- Very sensitive to the initialization

- The original model has no built-in way to encode constraints

- Lack of expressiveness

- Low supervised accuracy (upper bound)

- Needs inductive bias
  - Post-processing the DMV output by fixing the determiner-noun direction gave a huge improvement [Klein and Manning, 2004]

[Figure: DET NOUN attachment]

Extensions to DMV

- Changing the learning algorithm from EM
  - Contrastive estimation [Smith and Eisner, 2005]
  - Bayesian models [Headden III et al., 2009, Cohen and Smith, 2009a, Blunsom and Cohn, 2010, Naseem et al., 2010, Marecek and Straka, 2013]

- Local optima problem
  - Switching between different objectives [Spitkovsky et al., 2013]

- Lack of expressiveness
  - Lexicalization [Headden III et al., 2009]
  - Parameter tying [Cohen and Smith, 2009b, Headden III et al., 2009]
  - Tree substitution grammars [Blunsom and Cohn, 2010]
  - Reranking with a richer model [Le and Zuidema, 2015]

Extensions to DMV

- Inductive bias
  - Adding constraints
    - Posterior regularization [Gillenwater et al., 2010]
    - Forcing unambiguity [Tu and Honavar, 2012]
    - Universal knowledge [Naseem et al., 2010]
  - Stop probability estimation from raw text [Marecek and Straka, 2013]

- Alternatives to DMV
  - Convex objective based on convex hull of plausible trees [Grave and Elhadad, 2015]

Common Learning Algorithms for DMV

- Expectation maximization (EM) [Dempster et al., 1977]

- Posterior regularization (PR) [Ganchev et al., 2010]

- Variational Bayes (VB) [Beal, 2003]

- PR + VB [Naseem et al., 2010]

Expectation Maximization (EM) Algorithm

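- Start with initial parameters \( \theta^{(t)} \) in iteration t = 1

- Repeat until \( \theta^{(t)} \approx \theta^{(t+1)} \)

  - E step: compute the posterior distribution over trees

    \[
    \forall i = 1 \ldots N;\ \forall y \in \mathcal{Y}_{x_i}: \quad
    q_i^{(t)}(y) \leftarrow p_{\theta^{(t)}}(y \mid x_i) = \frac{p_{\theta^{(t)}}(x_i, y)}{\sum_{y' \in \mathcal{Y}_{x_i}} p_{\theta^{(t)}}(x_i, y')}
    \]

  - M step: re-estimate the parameters θ by maximizing the expected complete-data log-likelihood

    \[
    \theta^{(t+1)} \leftarrow \arg\max_{\theta} \sum_{i=1}^{N} \sum_{y \in \mathcal{Y}_{x_i}} q_i^{(t)}(y) \log p_{\theta}(x_i, y)
    \]

  - \( t \leftarrow t + 1 \)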

Another interpretation of the E step [Neal and Hinton, 1998]:

\[
q^{(t)} \leftarrow \arg\min_{q} \mathrm{KL}\big(q(Y) \,\|\, p_{\theta^{(t)}}(Y \mid X)\big)
\]
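A small numerical check makes the KL view concrete: the normalized posterior attains KL = 0, while any other distribution over the same candidate trees gives a positive value. The three joint probabilities below are hypothetical numbers, and the helper names are introduced only for this sketch.

```python
import math


def posterior(joint):
    """Normalize joint scores p(x, y) over the candidate trees of one sentence."""
    z = sum(joint.values())
    return {y: p / z for y, p in joint.items()}


def kl(q, p):
    """KL(q || p) for two distributions over the same finite set."""
    return sum(q[y] * math.log(q[y] / p[y]) for y in q if q[y] > 0)


# Hypothetical joint probabilities p(x, y) for three candidate trees.
joint = {"tree_a": 0.06, "tree_b": 0.03, "tree_c": 0.01}
post = posterior(joint)
uniform = {y: 1 / len(joint) for y in joint}

kl_posterior = kl(post, post)    # the E-step solution
kl_uniform = kl(uniform, post)   # any other q is strictly worse
```

Constrained variants of EM keep this objective and restrict or penalize the choice of q, which is why the KL form is the convenient starting point.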

M step

For a categorical distribution, the optimal parameters are obtained by normalization:

\[
\theta^{(t+1)}(y \mid x) = \frac{\sum_{i=1}^{N} q_i^{(t)}(y \mid x)}{\sum_{y'} \sum_{i=1}^{N} q_i^{(t)}(y' \mid x)}
\]
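The E and M steps can be sketched end to end on a toy problem in which the candidate trees of each sentence are enumerated explicitly. This is a simplification for illustration, not the DMV itself: there are no stop probabilities or directions, a tree is just a list of (head tag, dependent tag) events, and a real system would replace the enumeration with the Inside-Outside algorithm. All names and the two-sentence corpus are assumptions.

```python
from collections import defaultdict
from math import prod


def em(corpus, theta, iterations):
    """EM over explicitly enumerated candidate trees.

    corpus: one entry per sentence, each a list of candidate trees;
            a tree is a list of (head_tag, dependent_tag) events.
    theta:  dict mapping (head_tag, dependent_tag) to a probability.
    """
    for _ in range(iterations):
        expected = defaultdict(float)
        for candidates in corpus:
            # E step: q_i(y) = p(x_i, y) / sum over y' of p(x_i, y')
            joint = [prod(theta[event] for event in tree) for tree in candidates]
            z = sum(joint)
            for tree, p in zip(candidates, joint):
                for event in tree:
                    expected[event] += p / z
        # M step: normalize the expected counts for each head tag.
        totals = defaultdict(float)
        for (head, _dep), count in expected.items():
            totals[head] += count
        theta = {
            (head, dep): (expected[(head, dep)] / totals[head]
                          if totals[head] > 0 else theta[(head, dep)])
            for (head, dep) in theta
        }
    return theta


# Sentence 1 ("DT NN") is ambiguous; sentence 2 ("NN") has a single tree.
tree_a = [("ROOT", "NN"), ("NN", "DT")]
tree_b = [("ROOT", "DT"), ("DT", "NN")]
corpus = [[tree_a, tree_b], [[("ROOT", "NN")]]]
theta0 = {("ROOT", "NN"): 0.5, ("ROOT", "DT"): 0.5,
          ("NN", "DT"): 1.0, ("DT", "NN"): 1.0}

theta1 = em(corpus, theta0, iterations=1)
theta20 = em(corpus, theta0, iterations=20)
```

After one iteration the probability of NN heading the sentence rises from 0.5 to 0.75, because the unambiguous second sentence contributes a full count; further iterations push it toward 1.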

Posterior Regularization

- Prior knowledge is encoded as constraints

- Only the E step changes; the M step remains unchanged

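Original objective

\[
q^{(t)} \leftarrow \arg\min_{q} \mathrm{KL}\big(q(Y) \,\|\, p_{\theta^{(t)}}(Y \mid X)\big)
\]

Modified objective

\[
q^{(t)} \leftarrow \arg\min_{q} \mathrm{KL}\big(q(Y) \,\|\, p_{\theta^{(t)}}(Y \mid X)\big) + \sigma \sum_{i} b_i
\quad \text{s.t.} \quad \big\| E_q[\phi_i(X, Y)] \big\|_{\beta} \le b_i
\]

\( \sigma \) is the regularization coefficient and \( b_i \) is the proposed numerical constraint for sentence i.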

Posterior Regularization Constraints

Types of constraints:

- Number of unique child-head tag pairs in a sentence (fewer is better) [Gillenwater et al., 2010]

- Number of preserved pre-defined linguistic rules in a tree (more is better) [Naseem et al., 2010]

- Information entropy of the sentence (lower is better) [Tu and Honavar, 2012]

Mohammad Sadegh Rasooli Methods in Unsupervised Dependency Parsing

Page 39: Methods in Unsupervised Dependency Parsingrasooli/papers/candidacy2016.pdf · Fully Unsupervised Parsing Models Syntactic Transfer Models Conclusion Methods in Unsupervised Dependency

IntroductionFully Unsupervised Parsing Models

Syntactic Transfer ModelsConclusion

Unsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussion

Common Learning Algorithms for DMV

- Expectation maximization (EM) [Dempster et al., 1977]
- Posterior regularization (PR) [Ganchev et al., 2010]
- Variational Bayes (VB) [Beal, 2003]
- PR + VB [Naseem et al., 2010]

Variational Bayes

- A Bayesian model that encodes prior information
- Affects only the M step; the E step remains unchanged

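M step

\[ \theta^{(t+1)}(y \mid x) = \frac{\sum_{i=1}^{N} q_i^{(t)}(y \mid x)}{\sum_{y'} \sum_{i=1}^{N} q_i^{(t)}(y' \mid x)} \]

Modified M step in VB

\[ \theta^{(t+1)}(y \mid x) = \frac{F\big(\alpha_y + \sum_{i=1}^{N} q_i^{(t)}(y \mid x)\big)}{F\big(\sum_{y'} \big(\alpha_{y'} + \sum_{i=1}^{N} q_i^{(t)}(y' \mid x)\big)\big)} \]

$\alpha$ is the prior

$F(v) = e^{\Psi(v)}$

$\Psi$ is the digamma function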
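
The two M steps can be compared with a minimal sketch for a single conditioning context x. The function names, the dictionary layout of the expected counts, and the hand-written digamma approximation (recurrence plus an asymptotic series, used here only to avoid an external dependency) are assumptions made for illustration.

```python
import math


def digamma(v):
    """Approximate the digamma function for v > 0."""
    result = 0.0
    # Shift v upward with psi(v) = psi(v + 1) - 1/v until the series is accurate.
    while v < 6.0:
        result -= 1.0 / v
        v += 1.0
    inv2 = 1.0 / (v * v)
    # Asymptotic series: ln v - 1/(2v) - 1/(12 v^2) + 1/(120 v^4) - 1/(252 v^6)
    result += math.log(v) - 0.5 / v - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def em_m_step(counts):
    """Standard M step: normalize the expected counts."""
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}


def vb_m_step(counts, alpha):
    """Modified M step: apply F(v) = exp(digamma(v)) to prior plus counts."""
    def F(v):
        return math.exp(digamma(v))

    denominator = F(sum(alpha[y] + counts[y] for y in counts))
    return {y: F(alpha[y] + counts[y]) / denominator for y in counts}


# Hypothetical expected counts for two outcomes y under one context x.
counts = {"a": 3.0, "b": 1.0}
alpha = {"a": 0.1, "b": 0.1}
print(em_m_step(counts))
print(vb_m_step(counts, alpha))
```

Unlike the EM estimate, the VB weights generally do not sum to one; rare outcomes are discounted more strongly than frequent ones.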

VB + PR

- Makes use of both methods [Naseem et al., 2010]:
  - E step as in PR
  - M step as in VB

Discussion

- Significant improvements?
  - Yes!
- Satisfying performance?
  - No!
    - Mostly optimized for English
    - Far less accurate than a supervised model

Unsupervised Parsing Improvement Over Time

[Bar chart: unlabeled dependency accuracy on WSJ testing data]

System            Accuracy (%)   Citation shown on slide
Random            30.1
Adjacent          33.6
DMV               35.9           [Klein and Manning, 2004]
2008              40.5           [Cohen et al., 2008]
2009              41.4           [Cohen and Smith, 2009a]
2010              55.7           [Blunsom and Cohn, 2010]
2011              59.1           [Spitkovsky et al., 2011]
2012              61.2           [Spitkovsky et al., 2012]
2013              64.4           [Spitkovsky et al., 2013]
2015              66.2           [Le and Zuidema, 2015]
DMV-supervised    76.3
Supervised        94.4

15 minutes of programming to write down rules gives ≈ 60% accuracy!

Syntactic Transfer Models

- Transfer learning: learn a problem X and apply it to a similar (but not the same) problem Y
- Challenges: feature mismatch, domain mismatch, and lack of sufficient similarity between the two problems
- Syntactic transfer: learn a parser from languages L1 ... Lm and use it to parse language Lm+1
- Challenges: mismatch in lexical features, difference in word order

Approaches in Syntactic Transfer

- Direct transfer: train a parser directly on treebanks for languages L1 ... Lm and apply it to language Lm+1
- Annotation projection: use parallel data and project supervised parse trees in source language Ls to the target language through word alignment
- Treebank translation: develop an SMT system, translate source treebanks to the target language, and train on the translated treebank [Tiedemann et al., 2014]

Direct Syntactic Transfer

- A supervised parser gets input x and outputs the best tree y*, using lexical features φ^(l)(x, y) and unlexicalized features φ^(p)(x, y):

\[ y^{*}(x) = \arg\max_{y \in \mathcal{Y}(x)} \; \theta_l \cdot \phi^{(l)}(x, y) + \theta_p \cdot \phi^{(p)}(x, y) \]

- A direct transfer model cannot make use of lexical features.
- Direct delexicalized transfer only uses unlexicalized features [Cohen et al., 2011, McDonald et al., 2011]
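
A minimal sketch of the delexicalized part of the score, θp · φ^(p), for a single arc. The feature templates, function names, and weights are hypothetical choices made here for illustration; the point is only that the features depend on POS tags and direction, never on word forms, so weights learned on source languages can be applied unchanged to a target language.

```python
def delexicalized_arc_features(head_tag, child_tag, head_index, child_index):
    """Unlexicalized features for one arc: POS tags and attachment direction only."""
    direction = "R" if child_index > head_index else "L"
    return [
        f"h={head_tag}|c={child_tag}",
        f"h={head_tag}|c={child_tag}|dir={direction}",
        f"c={child_tag}|dir={direction}",
    ]


def arc_score(theta_p, features):
    """Dot product of a sparse weight vector with binary features."""
    return sum(theta_p.get(feature, 0.0) for feature in features)


# Hypothetical weights learned on source-language treebanks.
theta_p = {"h=NOUN|c=DET": 0.5, "h=NOUN|c=DET|dir=L": 1.5}

# A determiner attached to the noun on its right, in any language with this tag sequence.
features = delexicalized_arc_features("NOUN", "DET", head_index=2, child_index=1)
print(arc_score(theta_p, features))
```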

Direct Delexicalized Transfer: Pros and Cons

Pros

- Simplicity: can employ any supervised parser
- More accurate than fully unsupervised models

Cons

- No treatment for word order difference
- Lack of lexical features

Addressing Problems in Direct Delexicalized Transfer

- Word order difference
- Lack of lexical features

The World Atlas of Language Structures (WALS)

- The World Atlas of Language Structures (WALS) [Dryer and Haspelmath, 2013] is a large database of structural (phonological, grammatical, lexical) properties for nearly 3,000 languages

Selective Sharing: Addressing the Word Order Problem

- Use typological features, such as the subject-verb order, for each source and target language.
- In addition to the original parameters, share typological features among languages that have specific orderings in common
- Added features: original features conjoined with each typological feature
- Discriminative models with selective sharing achieve very high accuracies [Täckström et al., 2013, Zhang and Barzilay, 2015]
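
The "original features conjoined with each typological feature" idea can be sketched as below. The function name, the feature string format, and the `subject_verb` key are assumptions for illustration and are not taken from the cited systems or from the WALS feature inventory.

```python
def conjoin_with_typology(base_features, typology):
    """Return the base features plus one conjoined copy per typological feature."""
    conjoined = list(base_features)
    for name, value in sorted(typology.items()):
        for feature in base_features:
            conjoined.append(f"{feature}&{name}={value}")
    return conjoined


base = ["h=VERB|c=NOUN|dir=L"]

# Hypothetical languages: A and B share subject-verb order, C differs.
lang_a = conjoin_with_typology(base, {"subject_verb": "SV"})
lang_b = conjoin_with_typology(base, {"subject_verb": "SV"})
lang_c = conjoin_with_typology(base, {"subject_verb": "VS"})

print(lang_a)
print(set(lang_a) & set(lang_c))
```

Languages A and B fire the same conjoined feature and so share its weight, while language C shares only the unconjoined base feature with them.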

Addressing the Lack of Lexical Features

- Using bilingual dictionaries to transfer lexical features [Durrett et al., 2012, Xiao and Guo, 2015]
- Creating cross-lingual word representations
  - without parallel text [Duong et al., 2015]
  - using parallel text [Zhang and Barzilay, 2015, Guo et al., 2016]
- Successful models use cross-lingual word representations built from parallel text
- Could we leverage more if we have parallel text?
  - Yes!
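
One very simplified way to picture dictionary-based lexical transfer is to map each target word to a source word before extracting lexical features. This is only an illustrative sketch under that assumption, not the method of the cited papers; the function names, the `<UNK>` placeholder, and the toy dictionary are hypothetical.

```python
def lexical_features(head_word, child_word):
    """Lexicalized arc features over word forms."""
    return [
        f"hw={head_word}",
        f"cw={child_word}",
        f"hw={head_word}|cw={child_word}",
    ]


def transferred_lexical_features(head_word, child_word, dictionary):
    """Translate target words with a bilingual dictionary, then extract features.

    Words missing from the dictionary fall back to a shared placeholder.
    """
    source_head = dictionary.get(head_word, "<UNK>")
    source_child = dictionary.get(child_word, "<UNK>")
    return lexical_features(source_head, source_child)


# Toy German-to-English dictionary (hypothetical entries).
dictionary = {"Hund": "dog", "bellt": "barks"}
print(transferred_lexical_features("bellt", "Hund", dictionary))
print(transferred_lexical_features("bellt", "Dackel", dictionary))
```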

Annotation Projection

- Steps in annotation projection
  1 Prepare bitext
  2 Align bitext
  3 Parse source sentence with a supervised parser
  4 Project dependencies
  5 Train on the projected dependencies
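
Step 4 can be sketched as follows, assuming one-to-one word alignments and 1-based token indices with 0 for ROOT. The function name and the toy sentence pair are hypothetical; real projection also has to handle unaligned and many-to-many links, which this sketch simply skips.

```python
def project_dependencies(source_heads, alignment):
    """Project source dependencies onto target tokens through word alignment.

    source_heads: dict mapping source token index -> source head index (0 = ROOT).
    alignment:    dict mapping source token index -> aligned target token index.
    Returns a dict mapping target token index -> target head index; arcs whose
    child or head is unaligned are left out (a partial tree).
    """
    target_heads = {}
    for source_child, source_head in source_heads.items():
        if source_child not in alignment:
            continue
        target_child = alignment[source_child]
        if source_head == 0:
            target_heads[target_child] = 0
        elif source_head in alignment:
            target_heads[target_child] = alignment[source_head]
    return target_heads


# Toy source tree (hypothetical): token 2 is the root, tokens 1 and 3 depend on it.
source_heads = {1: 2, 2: 0, 3: 2}
# The target sentence places the aligned root word last.
alignment = {1: 1, 2: 3, 3: 2}
print(project_dependencies(source_heads, alignment))
```

The projected (possibly partial) trees are what step 5 trains on.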

Projecting Dependencies from Parallel Data

Running example (English source, German target):

English: The political priorities must be set by this House and the MEPs . ROOT
German: Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

1. Bitext: prepare bitext
2. Align: align bitext (e.g. via Giza++)
3. Parse: parse source sentence with a supervised parser
4. Project: project dependencies
5. Train: train on the projected dependencies (only the German sentence is shown at this step)
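The projection step can be sketched in a few lines. This is a minimal illustration, assuming a one-to-one word alignment and 1-based word indices with 0 standing for ROOT; the function name `project_dependencies` and the toy input are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Optional


def project_dependencies(
    source_heads: List[int],
    alignment: Dict[int, int],
    target_length: int,
) -> List[Optional[int]]:
    """Project source heads onto target words through a one-to-one alignment.

    source_heads[i - 1] is the head of source word i (1-based, 0 = ROOT).
    alignment maps a source word index to a target word index (both 1-based).
    Entry j - 1 of the result is the projected head of target word j,
    or None when no head could be projected.
    """
    target_heads: List[Optional[int]] = [None] * target_length
    for src_dep, tgt_dep in alignment.items():
        src_head = source_heads[src_dep - 1]
        if src_head == 0:
            # The source root projects to a target root.
            target_heads[tgt_dep - 1] = 0
        elif src_head in alignment:
            # Both ends of the source arc are aligned, so copy the arc.
            target_heads[tgt_dep - 1] = alignment[src_head]
        # Otherwise the source head is unaligned and the word stays headless.
    return target_heads


# Toy input (assumed): a three-word source sentence, a four-word target sentence.
toy_heads = [2, 3, 0]
toy_alignment = {1: 1, 2: 2, 3: 4}
projected = project_dependencies(toy_heads, toy_alignment, target_length=4)
print(projected)  # [2, 4, None, 0]
```

Target word 3 has no aligned source word, so it receives no head. Trees with such gaps are the partial trees that the later slides discuss.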


Practical Problems

- Most translations are not word-to-word
  - Partial alignments
- Alignment errors
- Supervised parsers are not perfect
- Difference in syntactic behavior across languages
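One common way to reduce alignment errors is to keep only the links that both directional aligners agree on, then discard any word that is still linked more than once. The sketch below illustrates that intersection heuristic under assumed input conventions (sets of index pairs); the function name is hypothetical, and the slides do not say which heuristic any of the cited systems uses.

```python
from typing import Dict, Set, Tuple


def intersect_alignments(
    src_to_tgt: Set[Tuple[int, int]],
    tgt_to_src: Set[Tuple[int, int]],
) -> Dict[int, int]:
    """Return one-to-one links present in both alignment directions.

    src_to_tgt holds (source, target) pairs; tgt_to_src holds (target, source)
    pairs. Links that appear in only one direction, or that involve a word
    with more than one surviving link, are dropped.
    """
    both = src_to_tgt & {(s, t) for (t, s) in tgt_to_src}
    src_count: Dict[int, int] = {}
    tgt_count: Dict[int, int] = {}
    for s, t in both:
        src_count[s] = src_count.get(s, 0) + 1
        tgt_count[t] = tgt_count.get(t, 0) + 1
    return {s: t for s, t in both if src_count[s] == 1 and tgt_count[t] == 1}


# Toy input (assumed): the two directions disagree about source word 3.
forward = {(1, 1), (2, 2), (3, 2)}
backward = {(1, 1), (2, 2), (4, 3)}
print(intersect_alignments(forward, backward))  # {1: 1, 2: 2}
```

The price of the higher precision is coverage: every dropped link is a target word that may end up without a projected head.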


Approaches in Annotation Projection

- Post-processing alignments with rules and filtering sparse trees [Hwa et al., 2005]
- Use projected dependencies as constraints in posterior regularization [Ganchev et al., 2009]
- Use projected dependencies to lexicalize a direct model [McDonald et al., 2011]
- Entropy regularization on projected trees [Ma and Xia, 2014]
- Start with fully projected trees and self-train on partial trees [Rasooli and Collins, 2015]
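The distinction between fully projected and partial trees can be made concrete with a small helper. This is only a sketch under assumptions: a projected tree is represented as a list of heads with `None` for words that received no head, and the names `density` and `split_by_density` are hypothetical. It does not reproduce the training procedure of any cited paper.

```python
from typing import List, Optional, Tuple

Tree = List[Optional[int]]


def density(tree: Tree) -> float:
    """Fraction of target words that received a projected head."""
    if not tree:
        return 0.0
    return sum(head is not None for head in tree) / len(tree)


def split_by_density(trees: List[Tree]) -> Tuple[List[Tree], List[Tree]]:
    """Separate fully projected trees from partial ones; drop empty projections."""
    full = [t for t in trees if density(t) == 1.0]
    partial = [t for t in trees if 0.0 < density(t) < 1.0]
    return full, partial


# Toy input (assumed): one full tree, one partial tree, one with no arcs at all.
projected_trees = [[2, 0], [2, None, 0], [None, None]]
full_trees, partial_trees = split_by_density(projected_trees)
print(full_trees)     # [[2, 0]]
print(partial_trees)  # [[2, None, 0]]
```

A parser could first be trained on `full_trees` and then used to fill the gaps in `partial_trees`, which is the general idea behind starting from full trees and self-training on partial ones.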


Discussion

- Significant improvements?
  - Yes!
- Satisfying performance?
  - Yes!
- Mostly optimized for rich-resource languages


Unsupervised Parsing Best Models Comparison

Figure (bar chart): average unlabeled dependency accuracy on 6 EU languages; y-axis from 50 to 90.

Model                  Accuracy  Source
Unsupervised           56.1      [Grave and Elhadad, 2015]
Direct transfer        77.8      [Ammar et al., 2016]
Annotation projection  82.2      [Rasooli and Collins, 2015]
Supervised             87.5
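For reference, the metric in the figure can be computed as below. This is a minimal sketch, assuming that unlabeled dependency accuracy is the percentage of words whose predicted head equals the gold head and that the average is taken over per-language scores. The function names are hypothetical, and the slides do not state details such as punctuation handling.

```python
from typing import Sequence


def unlabeled_accuracy(
    gold: Sequence[Sequence[int]],
    predicted: Sequence[Sequence[int]],
) -> float:
    """Percentage of words whose predicted head matches the gold head."""
    correct = 0
    total = 0
    for gold_heads, pred_heads in zip(gold, predicted):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(g == p for g, p in zip(gold_heads, pred_heads))
        total += len(gold_heads)
    return 100.0 * correct / total if total else 0.0


def average_over_languages(scores: Sequence[float]) -> float:
    """Unweighted mean of per-language accuracies."""
    return sum(scores) / len(scores)


# Toy input (assumed): two sentences, one wrong head out of five words.
gold_heads = [[2, 0], [0, 1, 2]]
pred_heads = [[2, 0], [0, 1, 1]]
print(unlabeled_accuracy(gold_heads, pred_heads))  # 80.0
```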


Conclusion

- Read 28+ papers about
  - Unsupervised dependency parsing
  - Direct cross-lingual transfer of dependency parsers
  - Annotation projection for cross-lingual transfer
- It seems that more effort may decrease the need for new treebanks!


Thanks

Thanks a lot

Danke sehr


References

Ammar, W., Mulcaire, G., Ballesteros, M., Dyer, C., and Smith, N. A. (2016). One parser, many languages. arXiv preprint arXiv:1602.01595.

Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. University of London, London.

Blunsom, P. and Cohn, T. (2010). Unsupervised induction of tree substitution grammars for dependency parsing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1204–1213, Cambridge, MA. Association for Computational Linguistics.

Carroll, G. and Charniak, E. (1992). Two experiments on learning probabilistic dependency grammars from corpora. Department of Computer Science, Univ.


Cohen, S. B., Das, D., and Smith, N. A. (2011). Unsupervised structure prediction with non-parallel multilingual guidance. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 50–61, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Cohen, S. B., Gimpel, K., and Smith, N. A. (2008). Logistic normal priors for unsupervised probabilistic grammar induction. In Advances in Neural Information Processing Systems, pages 321–328.

Cohen, S. B. and Smith, N. A. (2009a). Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, pages 74–82, Stroudsburg, PA, USA. Association for Computational Linguistics.


Cohen, S. B. and Smith, N. A. (2009b). Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 74–82. Association for Computational Linguistics.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38.

Dryer, M. S. and Haspelmath, M., editors (2013). WALS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Duong, L., Cohn, T., Bird, S., and Cook, P. (2015). Cross-lingual transfer for unsupervised dependency parsing without parallel data. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 113–122, Beijing, China. Association for Computational Linguistics.


Durrett, G., Pauls, A., and Klein, D. (2012). Syntactic transfer using a bilingual lexicon. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1–11, Jeju Island, Korea. Association for Computational Linguistics.

Ganchev, K., Gillenwater, J., and Taskar, B. (2009). Dependency grammar induction via bitext projection constraints. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 369–377, Suntec, Singapore. Association for Computational Linguistics.

Ganchev, K., Graça, J., Gillenwater, J., and Taskar, B. (2010). Posterior regularization for structured latent variable models. The Journal of Machine Learning Research, 11:2001–2049.


Gillenwater, J., Ganchev, K., Graça, J., Pereira, F., and Taskar, B. (2010). Sparsity in dependency grammar induction. In Proceedings of the ACL 2010 Conference Short Papers, pages 194–199. Association for Computational Linguistics.

Grave, E. and Elhadad, N. (2015). A convex and feature-rich discriminative approach to dependency grammar induction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1375–1384, Beijing, China. Association for Computational Linguistics.

Guo, J., Che, W., Yarowsky, D., Wang, H., and Liu, T. (2016). A representation learning framework for multi-source transfer parsing. In The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, Arizona, USA.


Headden III, W. P., Johnson, M., and McClosky, D. (2009). Improving unsupervised dependency parsing with richer contexts and smoothing. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 101–109, Boulder, Colorado. Association for Computational Linguistics.

Hwa, R., Resnik, P., Weinberg, A., Cabezas, C., and Kolak, O. (2005). Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering, 11(03):311–325.

Klein, D. and Manning, C. D. (2004). Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA. Association for Computational Linguistics.


Lari, K. and Young, S. J. (1990). The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech & Language, 4(1):35–56.

Le, P. and Zuidema, W. (2015). Unsupervised dependency parsing: Let's use supervised parsers. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 651–661, Denver, Colorado. Association for Computational Linguistics.

Ma, X. and Xia, F. (2014). Unsupervised dependency parsing with transferring distribution via parallel guidance and entropy regularization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1337–1348, Baltimore, Maryland. Association for Computational Linguistics.


Mareček, D. and Straka, M. (2013). Stop-probability estimates computed on a large corpus improve unsupervised dependency parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 281–290, Sofia, Bulgaria. Association for Computational Linguistics.

McDonald, R., Petrov, S., and Hall, K. (2011). Multi-source transfer of delexicalized dependency parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 62–72, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Naseem, T., Chen, H., Barzilay, R., and Johnson, M. (2010). Using universal linguistic knowledge to guide grammar induction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1234–1244, Cambridge, MA. Association for Computational Linguistics.


Neal, R. M. and Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355–368. Springer.

Paskin, M. A. (2002). Grammatical bigrams. Advances in Neural Information Processing Systems, 14(1):91–97.

Rasooli, M. S. and Collins, M. (2015). Density-driven cross-lingual transfer of dependency parsers. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 328–338, Lisbon, Portugal. Association for Computational Linguistics.

Smith, N. A. and Eisner, J. (2005). Guiding unsupervised grammar induction using contrastive estimation. In Proceedings of IJCAI Workshop on Grammatical Inference Applications, pages 73–82.


Spitkovsky, V. I., Alshawi, H., Chang, A. X., and Jurafsky, D. (2011). Unsupervised dependency parsing without gold part-of-speech tags. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1281–1290. Association for Computational Linguistics.

Spitkovsky, V. I., Alshawi, H., and Jurafsky, D. (2012). Bootstrapping dependency grammar inducers from incomplete sentence fragments via austere models. In ICGI, pages 189–194.

Spitkovsky, V. I., Alshawi, H., and Jurafsky, D. (2013). Breaking out of local optima with count transforms and model recombination: A study in grammar induction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1983–1995, Seattle, Washington, USA. Association for Computational Linguistics.

Täckström, O., McDonald, R., and Nivre, J. (2013). Target language adaptation of discriminative transfer parsers. Transactions for ACL.


Tesnière, L. (1959). Éléments de syntaxe structurale. Librairie C. Klincksieck.

Tiedemann, J., Agić, Ž., and Nivre, J. (2014). Treebank translation for cross-lingual parser induction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pages 130–140, Ann Arbor, Michigan. Association for Computational Linguistics.

Tu, K. and Honavar, V. (2012). Unambiguity regularization for unsupervised learning of probabilistic grammars. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1324–1334. Association for Computational Linguistics.


Xiao, M. and Guo, Y. (2015). Annotation projection-based representation learning for cross-lingual dependency parsing. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 73–82, Beijing, China. Association for Computational Linguistics.

Zhang, Y. and Barzilay, R. (2015). Hierarchical low-rank tensors for multilingual transfer parsing. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal.
