
Page 1:

Low-Resource Semantic Role Labeling

Matthew R. Gormley, Margaret Mitchell,
Benjamin Van Durme, Mark Dredze

CLSP Seminar; September 12, 2014

Low-Resource Semantic Role Labeling

Matthew R. Gormley (1), Margaret Mitchell (2), Benjamin Van Durme (1), Mark Dredze (1)

(1) Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD 21211
(2) Microsoft Research, Redmond, WA 98052

mrg@cs.jhu.edu | memitc@microsoft.com | vandurme@cs.jhu.edu | mdredze@cs.jhu.edu

Abstract

We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL). We examine how performance changes without syntactic supervision, comparing both joint and pipelined methods to induce latent syntax. This work highlights a new application of unsupervised grammar induction and demonstrates several approaches to SRL in the absence of supervised syntax. Our best models obtain competitive results in the high-resource setting and state-of-the-art results in the low-resource setting, reaching 72.48% F1 averaged across languages. We release our code for this work along with a larger toolkit for specifying arbitrary graphical structure.[1]

[1] http://www.cs.jhu.edu/~mrg/software/

1 Introduction

The goal of semantic role labeling (SRL) is to identify predicates and arguments and label their semantic contribution in a sentence. Such labeling defines who did what to whom, when, where and how. For example, in the sentence "The kids ran the marathon", ran assigns a role to kids to denote that they are the runners, and a role to marathon to denote that it is the race course.

Models for SRL have increasingly come to rely on an array of NLP tools (e.g., parsers, lemmatizers) in order to obtain state-of-the-art results (Björkelund et al., 2009; Zhao et al., 2009). Each tool is typically trained on hand-annotated data, thus placing SRL at the end of a very high-resource NLP pipeline. However, richly annotated data such as that provided in parsing treebanks is expensive to produce, and may be tied to specific domains (e.g., newswire). Many languages do not have such supervised resources (low-resource languages), which makes exploring SRL cross-linguistically difficult.

The problem of SRL for low-resource languages is an important one to solve, as solutions pave the way for a wide range of applications: accurate identification of the semantic roles of entities is a critical step for any application sensitive to semantics, from information retrieval to machine translation to question answering.

In this work, we explore models that minimize the need for high-resource supervision. We examine approaches in a joint setting, where we marginalize over latent syntax to find the optimal semantic role assignment, and a pipeline setting, where we first induce an unsupervised grammar. We find that the joint approach is a viable alternative for making reasonable semantic role predictions, outperforming the pipeline models. These models can be effectively trained with access to only SRL annotations, and mark a state-of-the-art contribution for low-resource SRL.

To better understand the effect of the low-resource grammars and features used in these models, we further include comparisons with (1) models that use higher-resource versions of the same features; (2) state-of-the-art high-resource models; and (3) previous work on low-resource grammar induction. In sum, this paper makes several experimental and modeling contributions, summarized below.

Experimental contributions:
– Comparison of pipeline and joint models for SRL.
– Subtractive experiments that consider the removal of supervised data.
– Analysis of the induced grammars in unsupervised, distantly-supervised, and joint training settings.

See our paper from ACL '14

Page 2:

Our End Task: Shallow Semantics

– Representation: Semantic Role Labeling (SRL)
  • Intuitively captures who did what to whom, when and where
  • Similar to open-domain relation extraction
– Languages: Catalan, Czech, German, English, Spanish, Chinese

[Figure: three example sentences with POS tags and semantic role arcs (Agent, Theme, Holder, Patient): "Morsi chaired the FJP" (NNP VBD DT NNP), "Pres. #morsi cre8s unrest" (NN NNP VBZ NN), "Proyas tb dirigio' la peli" (NNP RB VBD DT NN)]

Page 3:

Intermediate Tasks: Syntax

– Dependency parsing
  • Captures the structure of the sentence
  • Diverges from the shallow semantics representation
– Part-of-speech (POS) tagging

[Figure: the same three example sentences, now with POS tags, dependency arcs, and semantic role arcs shown together]

Page 4:

Background: The Supervised SRL Pipeline

[Figure: pipeline output for "President Morsi creates unrest" — lemmas (president, morsi, create, unrest), morphology (num=s, per=3s, tense=p), POS tags (NN NNP VBZ NN), and semantic roles (Agent, Theme, Holder)]

Pipelined Training: train each component of the pipeline independently, using the predictions of the previous stage(s) as features. (A minimal sketch follows.)

Tools:
• Semantic role labeler
• Dependency parser
• Part-of-speech tagger
• Morphological feature extractor
• Lemmatizer

• Costly
• Does not work well on informal text
• Resources not available across languages
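This pipelined regime is easy to pin down in code. Below is a minimal sketch, assuming hypothetical stage objects with a fit/predict interface; none of these names come from the paper or its released toolkit.

```python
# A minimal sketch of pipelined training; the stage objects and their
# fit/predict interface are hypothetical, not the paper's actual toolkit.
def train_pipeline(stages, sentences, gold):
    """stages: ordered (name, model) pairs, e.g. lemmatizer -> morphological
    analyzer -> POS tagger -> dependency parser -> semantic role labeler."""
    feats = [{"tokens": s} for s in sentences]
    for name, stage in stages:
        stage.fit(feats, gold[name])                  # train on gold labels for this stage
        for f, pred in zip(feats, stage.predict(feats)):
            f[name] = pred                            # predictions feed later stages as features
    return stages
```

Each stage only ever sees the previous stages' predictions, which is exactly how errors propagate down the pipeline.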

Page 5:

Background: The Supervised SRL Pipeline (same slide as Page 4, now with an "Our Emphasis" callout on the pipeline)

Page 6:

This Talk in a Nutshell

• We want to do SRL in a new language.
• Syntax helps, but is very expensive to annotate.
• Having annotated syntax would be nice, but we can make progress without it!

Supervised Annotation | Cost
Semantic roles | $$$
Dependency parses | $$$$$$$
Part-of-speech (POS) tags | $$
Morphology | $$$
Lemmas | $

Page 7:

This Talk in a Nutshell (same slide as Page 6)

Page 8:

Supervised SRL as a Factor Graph

• Jointly identify and classify semantic roles
• One variable per possible labeled edge
• O(n²) independent logistic regressions (a minimal sketch follows)

[Figure: the sentence "Juan_Carlos abdica su reino" ("Juan_Carlos abdicates his throne"), tokens 1-4, with one role variable per ordered token pair: R2,1, R1,2, R3,2, R2,3, R3,1, R1,3, R4,3, R3,4, R4,2, R2,4, R4,1, R1,4]
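To make the O(n²) structure concrete, here is a minimal sketch of the per-edge classification; the label set, weight layout, and feature extractor are illustrative assumptions, not the paper's actual feature set or role inventory.

```python
from itertools import permutations

LABELS = ["Agent", "Theme", "Holder", "NO_ROLE"]   # illustrative role inventory

def score(weights, feats, label):
    # Linear score of one labeled edge: sum of weights of (feature, label) pairs.
    return sum(weights.get((f, label), 0.0) for f in feats)

def predict_roles(n, weights, extract):
    # One role variable R[p,a] per ordered (predicate, argument) pair of the
    # n tokens: O(n^2) variables, each an independent logistic regression.
    roles = {}
    for p, a in permutations(range(1, n + 1), 2):
        feats = extract(p, a)                        # hypothetical feature extractor
        roles[(p, a)] = max(LABELS, key=lambda y: score(weights, feats, y))
    return roles
```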

Page 9:

Supervised SRL as a Factor Graph

• Jointly identify and classify semantic roles
• One variable per possible labeled edge
• O(n²) independent logistic regressions

[Figure: one assignment to the role variables of "Juan_Carlos abdica su reino" — Agent, Theme, and Holder on three variables, all others empty]

Page 10:

Supervised SRL as a Factor Graph

• Jointly identify and classify semantic roles
• One variable per possible labeled edge
• O(n²) independent logistic regressions

[Figure: an alternative assignment — Agent, Theme, and Time — illustrating that each role variable is decided independently]

Page 11:

The Design Space: High Resource vs. Low Resource, Pipeline vs. Joint

• Pipeline — Pro: easy to throw in lots of features. Con: propagation of errors.
• Joint — Pro: confidence flows between levels of the model. Con: features must permit efficient inference.
• High resource — Pro: high-accuracy parsers. Con: expensive.
• Low resource — latent syntax from the DMV, DMV+C, or joint marginalization. Pro: cheap, easily deployable. Con: low-accuracy latent syntax.

Page 12:

Contributions

• Experimental contributions:
  – Comparison of pipeline and joint models for SRL.
  – Subtractive experiments that consider the removal of supervised data.
  – Analysis of the induced grammars in (1) unsupervised, (2) distantly-supervised, and (3) joint training settings.
• Modeling contributions:
  – A simpler joint CRF for syntactic and semantic dependency parsing than previously reported.
  – A new application of unsupervised grammar induction: low-resource SRL.
  – Constrained grammar induction using SRL for distant supervision.
  – Use of Brown clusters in place of POS tags for low-resource SRL.

Page 13:

Three Training Settings for Latent Syntax

1. Fully Unsupervised (DMV)
2. Distantly Supervised (DMV+C)
3. Jointly Learned with SRL (Marginalized)

Page 14:

1. Fully Unsupervised (Latent Syntax)

A. Brown clusters (Brown et al., 1992) in place of POS tags
   • Clusters formed by hierarchical clustering; maximizes likelihood under a latent-class bigram model
B. Syntax from the Dependency Model with Valence (DMV) (Klein & Manning, 2004)
   • Children generated recursively
   • Viterbi EM training (Spitkovsky et al., 2010)

(A minimal sketch of Brown-cluster features follows.)
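To make the Brown-cluster substitution concrete, here is a minimal sketch of deriving cluster bit-string features (the bc0/bc1-style properties that appear in the feature templates later in the deck); the cluster map and prefix lengths are illustrative assumptions.

```python
# Illustrative Brown-cluster map: word -> bit-string path in the cluster
# hierarchy; in the paper's setting the clusters are learned from Wikipedia.
BROWN = {"president": "0010", "morsi": "110110001",
         "creates": "0111011", "unrest": "001101"}

def brown_features(word, prefix_lens=(4, 6)):
    # Emit the full bit string plus fixed-length prefixes; short prefixes act
    # as coarse, POS-like word classes (cf. the length-5 prefixes in Table 1).
    path = BROWN.get(word.lower(), "<unk>")
    feats = [f"bc={path}"]
    if path != "<unk>":
        feats += [f"bc{n}={path[:n]}" for n in prefix_lens]
    return feats

print(brown_features("Morsi"))  # ['bc=110110001', 'bc4=1101', 'bc6=110110']
```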

Page 15:

2. Distantly Supervised (Latent Syntax)

• Viterbi EM training of the DMV
• Observes the semantic graph during training
• Constrains a CKY parser in the E-step to respect the SRL annotations

Algorithm:
1. Define the DMV as a PCFG (Cohn et al., 2010)
2. CKY parse (Younger, 1967; Aho and Ullman, 1972)
3. Populate cells with SRL-compatible non-terminals

(A hedged compatibility sketch follows.)
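The paper spells out the exact cell-level constraint; as a hedged sketch only, assume for illustration that a tree "respects" the SRL graph when every predicate-argument pair is joined by a dependency arc in some direction, and check complete trees post hoc instead of constraining chart cells.

```python
# A hedged sketch: the compatibility criterion (each semantic arc must
# coincide with a syntactic arc in some direction) is an illustrative
# assumption, and this post-hoc filter over whole trees stands in for the
# paper's cell-level CKY constraint.
def srl_compatible(tree_arcs, srl_arcs):
    arcs = set(tree_arcs)
    return all((p, a) in arcs or (a, p) in arcs for p, a in srl_arcs)

# "<WALL> Juan_Carlos abdica su reino": tree rooted at abdica (token 2).
tree = {(0, 2), (2, 1), (2, 4), (4, 3)}
print(srl_compatible(tree, {(2, 1), (2, 4)}))  # True
print(srl_compatible(tree, {(1, 4)}))          # False: no arc joins 1 and 4
```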

Page 16:

2. Distantly Supervised (same slide as Page 15, now showing the example "<WALL> Juan_Carlos abdica su reino" with its semantic arcs: Agent, Theme, Holder)

Page 17:

2. Distantly Supervised (same slide as Page 16, advancing the animation)

Page 18:

3. Joint Model (Latent Syntax)

Marginalize over latent syntax to find the optimal semantic role assignment.
• Model: slight simplification of Naradowsky et al. (2012); jointly identify and classify semantic roles
• Inference: belief propagation, with the inside-outside algorithm embedded in a global factor (Smith & Eisner, 2008)
• Brown clusters in place of POS tags

Page 19:

3. Joint Model (Latent Syntax)

[Figure: the sentence "<WALL> Juan_Carlos abdica su reino" (tokens 0-4) with a grid of binary link variables, three of them switched on (✔) to encode one candidate dependency tree]

How do we encode a syntactic dependency tree with binary variables? (A minimal validity-check sketch follows.)
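One way to see the encoding: represent the link variables that are switched on as a set of (head, modifier) pairs, and test the constraints a global tree factor must enforce. The function below is an illustrative sketch, not the toolkit's implementation.

```python
# Encode a dependency tree over tokens 1..n (0 = <WALL>) as the set of ON
# link variables, and check the tree constraints: exactly one head per
# token, and every token reaches <WALL> without cycles.
def is_valid_tree(links, n):
    parents = {}
    for h, m in links:
        if m in parents:                      # at most one head per token
            return False
        parents[m] = h
    if set(parents) != set(range(1, n + 1)):  # every token needs a head
        return False
    for m in range(1, n + 1):
        seen, cur = set(), m
        while cur != 0:                       # climb heads until <WALL>
            if cur in seen:
                return False                  # cycle
            seen.add(cur)
            cur = parents[cur]
    return True

# "Juan_Carlos abdica su reino": abdica (2) heads 1 and 4; reino (4) heads su (3).
print(is_valid_tree({(0, 2), (2, 1), (2, 4), (4, 3)}, 4))  # True
print(is_valid_tree({(0, 2), (2, 1), (1, 2), (4, 3)}, 4))  # False: 2 has two heads
```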

Page 20:

3. Joint Model (same slide as Page 19)

Page 21:

3. Joint Model (same slide as Page 19, with a different set of link variables switched on, encoding a different candidate tree)

Page 22:

3. Joint Model (same slide as Page 19)

Page 23:

3. Joint Model (Latent Syntax)

[Figure: the link-variable grid for one dependency tree over "<WALL> Juan_Carlos abdica su reino", shown alongside the role-variable grid with Agent, Theme, and Holder assigned]

Now we jointly encode the semantic roles and the syntactic dependency tree.

Page 24:

3. Joint Model (same slide as Page 23, with the role and link variables interleaved in a single grid)

Page 25:

Features and Feature Selection

• Define millions of features using 100+ feature templates
• Incorporate feature ideas from:
  – Koo et al. (2008)
  – Björkelund et al. (2009)
  – Zhao et al. (2009)
  – Lluís et al. (2013)

Unigram Templates:
word(p) lemma(p) pos(p) bc0(p) bc1(p) morpho(p) deprel(p) lc(p) chpre5(p) capitalized(p) wordTopN(p) morpho1(p) morpho2(p) morpho3(p) eachmorpho(p) pos(-1(p)) deprel(-1(p)) bc0(-1(p)) pos(1(p)) deprel(1(p)) bc0(1(p)) pos(head(p)) deprel(head(p)) bc0(head(p)) pos(lns(p)) deprel(lns(p)) bc0(lns(p)) pos(rns(p)) deprel(rns(p)) bc0(rns(p)) pos(lmc(p)) deprel(lmc(p)) bc0(lmc(p)) pos(rmc(p)) deprel(rmc(p)) bc0(rmc(p)) pos(lnc(p)) deprel(lnc(p)) bc0(lnc(p)) pos(rnc(p)) deprel(rnc(p)) bc0(rnc(p)) pos(lowsv(p)) deprel(lowsv(p)) bc0(lowsv(p)) pos(lowsn(p)) deprel(lowsn(p)) bc0(lowsn(p)) pos(highsv(p)) deprel(highsv(p)) bc0(highsv(p)) pos(highsn(p)) deprel(highsn(p)) bc0(highsn(p)) word(c) lemma(c) pos(c) bc0(c) bc1(c) morpho(c) deprel(c) lc(c) chpre5(c) capitalized(c) wordTopN(c) morpho1(c) morpho2(c) morpho3(c) eachmorpho(c) pos(-1(c)) deprel(-1(c)) bc0(-1(c)) pos(1(c)) deprel(1(c)) bc0(1(c)) pos(head(c)) deprel(head(c)) bc0(head(c)) pos(lns(c)) deprel(lns(c)) bc0(lns(c)) pos(rns(c)) deprel(rns(c)) bc0(rns(c)) pos(lmc(c)) deprel(lmc(c)) bc0(lmc(c)) pos(rmc(c)) deprel(rmc(c)) bc0(rmc(c)) pos(lnc(c)) deprel(lnc(c)) bc0(lnc(c)) pos(rnc(c)) deprel(rnc(c)) bc0(rnc(c)) pos(lowsv(c)) deprel(lowsv(c)) bc0(lowsv(c)) pos(lowsn(c)) deprel(lowsn(c)) bc0(lowsn(c)) pos(highsv(c)) deprel(highsv(c)) bc0(highsv(c)) pos(highsn(c)) deprel(highsn(c)) bc0(highsn(c)) pos(seq(line(p,c))) deprel(seq(line(p,c))) bc0(seq(line(p,c))) pos(seq(children(p))) deprel(seq(children(p))) bc0(seq(children(p))) pos(seq(path(p,c))) deprel(seq(path(p,c))) bc0(seq(path(p,c))) pos(dir(seq(path(p,c)))) deprel(dir(seq(path(p,c)))) bc0(dir(seq(path(p,c)))) relative(p,c) distance(p,c) geneology(p,c) len(path(p,c)) continuity(path(p,c)) pathGrams sentlen

What about pairs of unigram templates?

Page 26:

Features and Feature Selection

Use Information Gain (IG) to find the top unigram templates (Martins et al., 2011). Then combine the top unigram templates to find the top bigram templates.

Property | Possible values
1 word form | all word forms
2 lower-case word form | all lower-case forms
3 5-char word form prefix | all 5-char form prefixes
4 capitalization | True, False
5 top-800 word form | top-800 word forms
6 brown cluster | 000, 1100, 010110001, ...
7 brown cluster, length 5 | length-5 prefixes of brown clusters
8 lemma | all word lemmas
9 POS tag | NNP, CD, JJ, DT, ...
10 morphological features | Gender, Case, Number, ... (different across languages)
11 dependency label | SBJ, NMOD, LOC, ...
12 edge direction | Up, Down

Table 1: Word and edge properties in templates.

i, i-1, i+1 | noFarChildren(w_i) | linePath(w_p, w_c)
parent(w_i) | rightNearSib(w_i) | depPath(w_p, w_c)
allChildren(w_i) | leftNearSib(w_i) | depPath(w_p, w_lca)
rightNearChild(w_i) | firstVSupp(w_i) | depPath(w_c, w_lca)
rightFarChild(w_i) | lastVSupp(w_i) | depPath(w_lca, w_root)
leftNearChild(w_i) | firstNSupp(w_i) |
leftFarChild(w_i) | lastNSupp(w_i) |

Table 2: Word positions used in templates. Based on the current word position (i), positions related to the current word w_i, the possible parent and child (w_p, w_c), the lowest common ancestor of parent and child (w_lca), and the syntactic root (w_root).

We train our CRF models by maximizing conditional log-likelihood using stochastic gradient descent with an adaptive learning rate (AdaGrad) (Duchi et al., 2011) over mini-batches. (A minimal AdaGrad sketch follows.)

The unary and binary factors are defined with exponential family potentials. In the next section, we consider binary features of the observations (the sentence and labels from previous pipeline stages) which are conjoined with the state of the variables in the factor.
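For reference, the AdaGrad update named above (Duchi et al., 2011) is a one-liner per coordinate. A minimal sketch, with an assumed grad callback and illustrative hyperparameters rather than the paper's settings:

```python
import numpy as np

def adagrad(grad, w, batches, eta=0.1, eps=1e-8):
    # grad(w, batch) is assumed to return the mini-batch gradient of the
    # negative conditional log-likelihood; eta is the base learning rate.
    g2 = np.zeros_like(w)                     # running sum of squared gradients
    for batch in batches:
        g = grad(w, batch)
        g2 += g * g
        w -= eta * g / (np.sqrt(g2) + eps)    # per-coordinate adaptive step
    return w
```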

3.3 Features for CRF Models

Our feature design stems from two key ideas. First, for SRL, it has been observed that feature bigrams (the concatenation of simple features such as a predicate's POS tag and an argument's word) are important for state-of-the-art results (Zhao et al., 2009; Björkelund et al., 2009). Second, for syntactic dependency parsing, combining Brown cluster features with word forms or POS tags yields high accuracy even with little training data (Koo et al., 2008).

We create binary indicator features for each model using feature templates. Our feature template definitions build from those used by the top-performing systems in the CoNLL-2009 Shared Task, Zhao et al. (2009) and Björkelund et al. (2009), and from features in syntactic dependency parsing (McDonald et al., 2005; Koo et al., 2008).

Template | Possible values
relative position | before, after, on
distance, continuity | Z+
binned distance | > 2, 5, 10, 20, 30, or 40
genealogical relationship | parent, child, ancestor, descendant
path-grams | e.g., "the NN went"

Table 3: Additional standalone templates.

Template Creation — Feature templates are defined over triples of ⟨property, positions, order⟩. Properties, listed in Table 1, are extracted from word positions within the sentence, shown in Table 2. Single positions for a word w_i include its syntactic parent, its leftmost farthest child (leftFarChild), its rightmost nearest sibling (rightNearSib), etc. Following Zhao et al. (2009), we include the notion of verb and noun supports and sections of the dependency path. Also following Zhao et al. (2009), properties from a set of positions can be put together in three possible orders: as the given sequence, as a sorted list of unique strings, and removing all duplicated neighbored strings (a minimal sketch of these orderings follows below). We consider both template unigrams and bigrams, combining two templates in sequence.

Additional templates we include are the relative position (Björkelund et al., 2009), genealogical relationship, distance (Zhao et al., 2009), and binned distance (Koo et al., 2008) between two words in the path. From Lluís et al. (2013), we use 1-, 2-, 3-gram path features of words/POS tags (path-grams), and the number of non-consecutive token pairs in a predicate-argument path (continuity).
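The three orderings are straightforward to implement; a minimal sketch (the function names are ours, not the paper's):

```python
from itertools import groupby

def as_sequence(props):
    return list(props)             # (1) the given sequence

def sorted_unique(props):
    return sorted(set(props))      # (2) a sorted list of unique strings

def dedup_neighbors(props):
    return [p for p, _ in groupby(props)]   # (3) collapse duplicated neighbors

props = ["NN", "NN", "VBZ", "NN"]
print(as_sequence(props))      # ['NN', 'NN', 'VBZ', 'NN']
print(sorted_unique(props))    # ['NN', 'VBZ']
print(dedup_neighbors(props))  # ['NN', 'VBZ', 'NN']
```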

3.4 Feature Selection

Constructing all feature template unigrams and bigrams would yield an unwieldy number of features. We therefore determine the top N template bigrams for a dataset and factor a according to an information gain measure (Martins et al., 2011):

IG_{a,m} = \sum_{f \in T_m} \sum_{x_a} p(f, x_a) \log_2 \frac{p(f, x_a)}{p(f)\, p(x_a)}

where T_m is the m-th feature template, f is a particular instantiation of that template, and x_a is an assignment to the variables in factor a. The probabilities are empirical estimates computed from the training data. This is simply the mutual information of the feature template instantiation with the variable assignment. (A minimal sketch of this computation follows.)

This filtering approach was treated as a simple baseline in Martins et al. (2011) to contrast with increasingly popular gradient-based regularization approaches. Unlike the gradient-based ap-

Page 27:

Experiments

Datasets:
– Semantic Roles:
  • CoNLL-2009 Shared Task
  • Languages: Catalan, Czech, German, English, Spanish, Chinese
– Grammar Induction:
  • Additional experiments on the WSJ portion of the Penn Treebank, for comparability
– Brown Clusters:
  • Wikipedia

CoNLL-2009 Supervised Data:
• Semantic roles
• Dependency parses
• Part-of-speech tags
• Morphological features
• Lemmas

Page 28:

Experiments

• Abbreviations for Latent Syntax:
  1. Fully Unsupervised (DMV)
  2. Distantly Supervised (DMV+C)
  3. Jointly Learned with SRL (Marginalized)
• Abbreviations for Tag Types:
  1. Part-of-speech Tags (pos)
  2. Brown Clusters (bc)

Page 29:

Grammar Induction Analysis: does the latent syntax look any good?

Dependency Parser | Avg UAS | (six per-language columns; English is the fourth)
Supervised | 87.1 | 89.4 85.3 89.6 88.4 89.2 80.7

Page 30:

Grammar Induction Analysis: does the latent syntax look any good?

Dependency Parser | Avg UAS | (six per-language columns; English is the fourth)
Supervised | 87.1 | 89.4 85.3 89.6 88.4 89.2 80.7
Marginalized, IGB | 50.2 | 52.4 43.4 41.3 52.6 55.2 56.2
Marginalized, IGc | 43.8 | 50.3 45.8 27.2 44.2 46.3 48.5
DMV+C (bc) | 40.2 | 46.3 37.5 28.7 40.6 50.4 37.5
DMV+C (pos) | 37.5 | 50.2 34.9 21.5 36.9 49.8 32
DMV (pos) | 30.2 | 45.3 22.7 20.9 32.9 41.9 17.2
DMV (bc) | 22.1 | 18.8 32.8 19.6 22.4 20.5 18.6

The gap between the supervised and the not-so-supervised parsers is very large.

Page 31:

Subtractive Experiments: effectiveness of our joint models as the available supervision is decreased

[Bar charts: SRL F1 (0-80) for Catalan, Spanish, and German as supervision is removed; x-axis: – (127+32), Dep (40+32), Mor (30+32), POS (23+32), Lem (21+32) feature templates]

CoNLL-2009 Supervised Data:
• Semantic roles
• Dependency parses
• Morphology
• Part-of-speech tags
• Lemmas

Page 32:

Subtractive Experiments (same slide as Page 31, advancing the animation: one more supervision type is removed from the list)

Page 33:

Subtractive Experiments (same slide as Page 31, advancing the animation further)

Page 34:

Subtractive Experiments (same slide as Page 31, advancing the animation further)

Page 35:

Subtractive Experiments (same slide as Page 31, with a single chart)

Page 36:

Learning Curves — Lowest-resource setting: joint training yields higher SRL F1 than distant supervision.

[The slide reproduces the following excerpt from the paper:]

heads, which runs counter to our head-based syntactic representation. This creates a mismatched train/test scenario: we must train our model to predict argument heads, but then test on our model's ability to predict argument spans.[8] We therefore train our models on the CoNLL-2008 argument heads,[9] and post-process and convert from heads to spans using the conversion algorithm available from Johansson and Nugues (2008).[10] The heads are either from an MBR tree or an oracle tree. This gives Boxwell et al. (2011) the advantage, since our syntactic dependency parses are optimized to pick out semantic argument heads, not spans.

Table 5 presents our results. Boxwell et al. (2011) (B'11) uses additional supervision in the form of a CCG tag dictionary derived from supervised data with (tdc) and without (tc) a cutoff. Our model does very poorly on the '05 span-based evaluation because the constituent bracketing of the marginalized trees is inaccurate. This is elucidated by instead evaluating on the oracle spans, where our F1 scores are higher than Boxwell et al. (2011). We also contrast with relevant high-resource methods with span/head conversions from Johansson and Nugues (2008): Punyakanok et al. (2008) (PRY'08) and Johansson and Nugues (2008) (JN'08).

Subtractive Study — In our subsequent experiments, we study the effectiveness of our models as the available supervision is decreased. We incrementally remove dependency syntax, morphological features, POS tags, then lemmas. For these experiments, we utilize the coarse-grained feature set (IGc), which includes Brown clusters.

Across languages, we find the largest drop in F1 when we remove POS tags; and we find a gain in F1 when we remove lemmas. This indicates that lemmas, which are a high-resource annotation, may not provide a significant benefit for this task. The effect of removing morphological features is different across languages, with little change in performance for Catalan and Spanish,

[8] We were unable to obtain the system output of Boxwell et al. (2011) in order to convert their spans to dependencies and evaluate the other mismatched train/test setting.
[9] CoNLL-2005, -2008, and -2009 were derived from PropBank and share the same source text; -2008 and -2009 use argument heads.
[10] Specifically, we use their Algorithm 2, which produces the span dominated by each argument, with special handling of the case when the argument head dominates that of the predicate. Also following Johansson and Nugues (2008), we recover the '05 sentences missing from the '08 evaluation set.

Rem | #FT | ca | de | es
– | 127+32 | 74.46 | 72.62 | 74.23
Dep | 40+32 | 67.43 | 64.24 | 67.18
Mor | 30+32 | 67.84 | 59.78 | 66.94
POS | 23+32 | 64.40 | 54.68 | 62.71
Lem | 21+32 | 64.85 | 54.89 | 63.80

Table 6: Subtractive experiments. Each row contains the F1 for SRL only (without sense disambiguation) where the supervision type of that row and all above it have been removed. Removed supervision types (Rem) are: syntactic dependencies (Dep), morphology (Mor), POS tags (POS), and lemmas (Lem). #FT indicates the number of feature templates used (unigrams+bigrams).

[Figure 3: Learning curve for semantic dependency supervision in Catalan and German. F1 of SRL only (without sense disambiguation) shown as the number of training sentences is increased (0-60,000); curves for Catalan/Marginalized, Catalan/DMV+C, German/Marginalized, German/DMV+C.]

but a drop in performance for German. This may reflect a difference between the languages, or may reflect the difference between the annotation of the languages: both the Catalan and Spanish data originated from the Ancora project,[11] while the German data came from another source.

Figure 3 contains the learning curve for SRL supervision in our lowest-resource setting for two example languages, Catalan and German. This shows how the F1 of SRL changes as we adjust the number of training examples. We find that the joint training approach to grammar induction yields consistently higher SRL performance than its distantly supervised counterpart.

4.5 Analysis of Grammar Induction

Table 7 shows grammar induction accuracy in low-resource settings. We find that the gap between the supervised parser and the unsupervised methods is quite large, despite the reasonable accuracy both methods achieve for the SRL end task.

[11] http://clic.ub.edu/corpus/ancora

Page 37:

SRL Main Results

Parse | SRL Approach | Feat | Avg SRL F1 | (six per-language columns; English is the fourth, per Page 42)
Gold | Pipeline | IGc | 84.98 | 84.97 87.65 79.14 86.54 84.22 87.35
Gold | Pipeline | IGB | 84.74 | 85.15 86.64 79.50 85.77 84.40 86.95
Gold | Naradowsky et al. (2012) | — | 72.73 | 69.59 74.84 66.49 78.55 68.93 77.97
Supervised | Björkelund et al. (2009) | — | 81.55 | 80.01 85.41 79.71 85.63 79.91 78.60
Supervised | Zhao et al. (2009) | — | 80.85 | 80.32 85.19 75.99 85.44 80.46 77.72
Supervised | Pipeline | IGc | 78.03 | 76.24 83.34 74.19 81.96 76.12 76.35
Supervised | Pipeline | IGB | 75.68 | 74.59 81.61 69.08 78.86 74.51 75.44
Joint (Marginalized) | Joint | IGc | 72.48 | 71.35 81.03 65.15 76.16 71.03 70.14
Joint (Marginalized) | Joint | IGB | 72.40 | 71.55 80.04 64.8 75.57 71.21 71.21
Joint (Marginalized) | Naradowsky et al. (2012) | — | 71.27 | 67.99 73.16 67.26 76.12 66.74 76.32
Distant (DMV+C) | Pipeline | IGc | 70.08 | 68.21 79.63 62.25 73.81 68.73 67.86
Distant (DMV+C) | Pipeline | IGB | 65.61 | 61.89 77.48 58.97 69.11 63.31 62.92
Unsup. (DMV) | Pipeline | IGc | 69.26 | 68.04 79.58 58.47 74.78 68.36 66.35
Unsup. (DMV) | Pipeline | IGB | 66.81 | 63.31 77.38 59.91 72.02 65.96 62.28

IGc: features from Information Gain template selection, coarse-grained properties.
IGB: features from Information Gain template selection, Björkelund et al. (2009) properties.

Page 38:

SRL Main Results (same table as Page 37, highlighting the High-Resource settings)

Page 39:

SRL Main Results (same table as Page 37, highlighting the Low-Resource settings)

Page 40:

SRL Main Results (same table as Page 37, highlighting both the High-Resource and the Low-Resource settings)

Page 41:

Comparison with Work in Grammar Induction in the Low-Resource Setting

WSJ portion of the Penn Treebank:

Approach | Distant Supervision | Unlabeled Syntactic Dependency Accuracy
Spitkovsky et al. (2010) | None | ·
Spitkovsky et al. (2013) | None | ·
Spitkovsky et al. (2010) | HTML | ·
Naseem and Barzilay (2011) | ACE05 | ·
DMV | None | ·
DMV+C | SRL | ·
Marginalized, IGc | SRL | ·
Marginalized, IGB | SRL | ·

(Accuracy values shown on the slide, in an order not recoverable here: 58.9, 48.8, 44.8, 24.8, 59.4, 50.4, 64.4, 44.8.)

• MBR decoding of the marginalized grammars is the best DMV-based method
• May get gains with better search to break out of local optima

Page 42:

Is dependency accuracy the right evaluation metric?

In the joint model, higher dependency accuracy (UAS) does not always correlate with higher labeled F1 on SRL.

Parse | SRL Approach | Feat | English UAS | English SRL F1
Joint (Marginalized) | Joint | IGc | 44.2 | 76.16
Joint (Marginalized) | Joint | IGB | 52.6 | 75.57

Page 43:

Conclusions

• Semantic role labeling doesn't necessarily require a long, costly pipeline of NLP tools (cf. Boxwell et al. (2011); Naradowsky et al. (2012))
• The "quality" of the latent syntax has a big effect (especially with limited end-task training data)
• Joint models seem to outperform the pipeline models in the low-resource setting

Page 44:

Questions?