A Metric-based Framework for Automatic Taxonomy Induction



Page 1: A Metric-based Framework for Automatic Taxonomy Induction

6/27/13

1

A Metric-based Framework for Automatic Taxonomy Induction

Hui Yang and Jamie Callan
Language Technologies Institute
Carnegie Mellon University
ACL 2009, Singapore

ROADMAP

• Introduction
• Related Work
• Metric-Based Taxonomy Induction Framework
• The Features
• Experimental Results
• Conclusions


INTRODUCTION

• Semantic taxonomies, such as WordNet, play an important role in solving knowledge-rich problems
• Limitations of manually created taxonomies
  ◦ Rarely complete
  ◦ Difficult to extend with new terms from emerging or changing domains
  ◦ Time-consuming to create, which may make them infeasible for specialized domains and personalized tasks

INTRODUCTION

• Automatic Taxonomy Induction is a solution to
  ◦ Augment existing resources
  ◦ Quickly produce new taxonomies for specialized domains and personalized tasks
• Subtasks in Automatic Taxonomy Induction
  ◦ Term extraction
  ◦ Relation formation
• This paper focuses on Relation Formation


Related Work

• Pattern-based Approaches
  ◦ Define lexical-syntactic patterns for relations, and use these patterns to discover instances
  ◦ Have been applied to extract is-a, part-of, sibling, synonym, causal, and other relations
  ◦ Strength: Highly accurate
  ◦ Weakness: Sparse coverage of patterns
• Clustering-based Approaches
  ◦ Hierarchically cluster terms based on similarities of their meanings, usually represented by a feature vector
  ◦ Have only been applied to extract is-a and sibling relations
  ◦ Strength: Allow discovery of relations that do not explicitly appear in text; higher recall
  ◦ Weaknesses: Generally fail to produce coherent clusters for small corpora [Pantel and Pennacchiotti 2006]; hard to label non-leaf nodes
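
To make the pattern-based approach concrete, the following is a minimal sketch of a single lexical-syntactic pattern ("X such as Y") written as a regular expression. The pattern choice, the function name `find_isa_pairs`, and the example sentence are illustrative assumptions, not the patterns of any system cited here; the sketch matches single-word terms only.

```python
import re

# One illustrative is-a pattern: "<hypernym> such as <hyponym>".
# Only single-word terms are matched; this is a simplification.
SUCH_AS = re.compile(r"\b(\w+) such as (\w+)")

def find_isa_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the pattern."""
    return [(m.group(2), m.group(1)) for m in SUCH_AS.finditer(text)]

pairs = find_isa_pairs("He sells balls such as basketballs.")
```

A sentence that never uses the pattern yields no pairs, which illustrates the sparse-coverage weakness listed above.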

A UNIFIED SOLUTION

• Combine strengths of both approaches in a unified framework
  ◦ Flexibly incorporate heterogeneous features
  ◦ Use lexical-syntactic patterns as one type of feature in a clustering framework

Metric-based Taxonomy Induction


THE FRAMEWORK

• A novel framework, which
  ◦ Incrementally clusters terms
  ◦ Transforms taxonomy induction into a multi-criteria optimization
  ◦ Uses heterogeneous features
• Optimization based on two criteria
  ◦ Minimization of taxonomy structures ↔ Minimum Evolution Assumption
  ◦ Modeling of term abstractness ↔ Abstractness Assumption

LET’S BEGIN WITH SOME IMPORTANT DEFINITIONS

• A Taxonomy is a data model defined by a Concept Set, a Relationship Set, and a Domain


MORE DEFINITIONS

[Figure: taxonomy tree with node labels Game Equipment, ball, and table]

A Full Taxonomy:

AssignedTermSet = {game equipment, ball, table, basketball, volleyball, soccer, table-tennis table, snooker table}
UnassignedTermSet = {}

MORE DEFINITIONS

[Figure: taxonomy tree with node labels Game Equipment, ball, and table]

A Partial Taxonomy:

AssignedTermSet = {game equipment, ball, table, basketball, volleyball}
UnassignedTermSet = {soccer, table-tennis table, snooker table}
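
The full/partial distinction can be sketched as a small data structure. The class name `PartialTaxonomy`, its methods, and the parent links used below (for example, basketball under ball) are assumptions made for illustration; the slides give only the two term sets.

```python
from dataclasses import dataclass, field

@dataclass
class PartialTaxonomy:
    """Hypothetical container: child -> parent links plus unassigned terms."""
    parent: dict = field(default_factory=dict)    # assigned term -> parent (root -> None)
    unassigned: set = field(default_factory=set)  # terms not yet placed in the tree

    @property
    def assigned(self):
        return set(self.parent)

    def is_full(self):
        # A taxonomy is full once no term is left unassigned.
        return not self.unassigned

    def assign(self, term, parent):
        """Move a term from the unassigned set into the tree under `parent`."""
        if term not in self.unassigned:
            raise ValueError(f"{term!r} is not unassigned")
        if parent not in self.parent:
            raise ValueError(f"{parent!r} is not in the taxonomy")
        self.unassigned.remove(term)
        self.parent[term] = parent

taxonomy = PartialTaxonomy(
    parent={"game equipment": None, "ball": "game equipment",
            "table": "game equipment", "basketball": "ball", "volleyball": "ball"},
    unassigned={"soccer", "table-tennis table", "snooker table"},
)
taxonomy.assign("soccer", "ball")  # assumed placement, for illustration only
```

Assigning the two remaining terms would empty the unassigned set, turning the partial taxonomy into a full one.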


MORE DEFINITIONS: Ontology Metric

[Figure: example taxonomy with edge distances 1.5, 2, 1, and 1, and pairwise ontology-metric values d(·,·) = 2, 1, and 4.5; node labels include ball and table]
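
One reading of the ontology metric that is consistent with the values on this slide (1.5 + 2 + 1 = 4.5) is that the distance between two terms is the sum of edge distances on the path connecting them. The sketch below assumes that reading; the tree shape, the assignment of weights to edges, and the function names are hypothetical.

```python
# Undirected weighted tree. Which weight belongs to which edge is an assumption.
EDGES = {
    ("game equipment", "ball"): 1.5,
    ("game equipment", "table"): 2.0,
    ("ball", "basketball"): 1.0,
    ("table", "snooker table"): 1.0,
}

def build_adjacency(edges):
    """Turn an edge list into a symmetric adjacency map."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj

def path_distance(adj, source, target):
    """Sum of edge weights on the unique tree path from source to target."""
    stack = [(source, None, 0.0)]
    while stack:
        node, prev, dist = stack.pop()
        if node == target:
            return dist
        for nxt, w in adj[node].items():
            if nxt != prev:  # never walk back along the edge just used
                stack.append((nxt, node, dist + w))
    raise ValueError("terms are not connected")

adj = build_adjacency(EDGES)
d = path_distance(adj, "ball", "snooker table")  # path: 1.5 + 2.0 + 1.0
```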

ASSUMPTIONS: Minimum Evolution Assumption

The optimal ontology is the one that introduces the least change in information.
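
A greedy version of this assumption can be sketched as follows. The slides do not define the information function, so the sketch assumes it is the sum of path distances over all term pairs, and it only considers attaching a new term as a leaf. All names, weights, and pairwise distances are hypothetical.

```python
from itertools import combinations

def path_distance(parent, weight, a, b):
    """Tree distance via the lowest common ancestor of a and b."""
    def to_root(t):
        dist, seen = 0.0, {}
        while t is not None:
            seen[t] = dist              # distance from the start term up to t
            dist += weight.get(t, 0.0)  # weight[t] is the edge from t to its parent
            t = parent[t]
        return seen
    up_a, up_b = to_root(a), to_root(b)
    return min(up_a[t] + up_b[t] for t in up_a if t in up_b)

def info(parent, weight):
    """Assumed information function: sum of distances over all term pairs."""
    return sum(path_distance(parent, weight, a, b)
               for a, b in combinations(parent, 2))

def best_parent(parent, weight, term, pair_distance):
    """Attach `term` under the existing node that changes info() the least."""
    base = info(parent, weight)
    def change(candidate):
        p = {**parent, term: candidate}
        w = {**weight, term: pair_distance[(term, candidate)]}
        return abs(info(p, w) - base)
    return min(parent, key=change)

parent = {"game equipment": None, "ball": "game equipment", "table": "game equipment"}
weight = {"ball": 1.0, "table": 1.0}
pair_distance = {("basketball", "game equipment"): 2.0,
                 ("basketball", "ball"): 1.0,
                 ("basketball", "table"): 3.0}
choice = best_parent(parent, weight, "basketball", pair_distance)
```

With these made-up distances the information change is 8 under game equipment, 6 under ball, and 12 under table, so the term is placed under ball.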


ILLUSTRATION: Minimum Evolution Assumption

[Figure sequence: a taxonomy built up step by step; node labels appear in the order ball, table, Game Equipment]

ASSUMPTIONS: Abstractness Assumption

Each abstraction level has its own information function.


ASSUMPTIONS: Abstractness Assumption

[Figure: taxonomy with node labels Game Equipment, ball, and table]

MULTIPLE CRITERION OPTIMIZATION

[Equation: its labeled parts are the Minimum Evolution objective function, the Abstractness objective function, and the scalarization variable]
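
A common way to scalarize two objectives is a weighted sum controlled by a single variable. The sketch below assumes that form; the function name, the variable `lam`, and the candidate costs are hypothetical and are not taken from the slide.

```python
def scalarized_objective(evolution_cost, abstractness_cost, lam):
    """Weighted-sum scalarization of two objectives, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * evolution_cost + (1.0 - lam) * abstractness_cost

# Hypothetical (evolution cost, abstractness cost) for two candidate placements.
candidates = {"under ball": (0.2, 0.9), "under table": (0.6, 0.1)}
best = min(candidates, key=lambda k: scalarized_objective(*candidates[k], lam=0.5))
```

Setting `lam` to 1 considers only the first objective and setting it to 0 considers only the second, so the variable trades the two criteria off against each other.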


ESTIMATING ONTOLOGY METRIC

• Assume the ontology metric is a linear interpolation of some underlying feature functions
• Use ridge regression to estimate and predict the ontology metric
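
The estimation step can be sketched with closed-form ridge regression, which solves (XᵀX + αI)w = Xᵀy for the feature weights w. This is a dependency-free illustration under assumed inputs: the feature rows, the target distances, and the value of `alpha` are made up.

```python
def fit_ridge(rows, targets, alpha=1.0):
    """Solve (X^T X + alpha I) w = X^T y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X + alpha I | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j else 0.0)
          for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict_distance(weights, feature_vector):
    """Predicted distance: the weighted sum of the feature scores."""
    return sum(w * f for w, f in zip(weights, feature_vector))

# Rows are term pairs, columns are hypothetical feature scores,
# and targets are known distances between those pairs.
X_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [1.0, 2.0, 3.0]
w = fit_ridge(X_train, y_train, alpha=1e-8)
```

A larger `alpha` shrinks the weights toward zero, which is the usual reason to prefer ridge regression over ordinary least squares when features are correlated.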

THE FEATURES

• Our framework allows a wide range of features to be used
• Input for the feature functions: Two terms
• Output: A numeric score to measure semantic distance between these two terms
• We can use the following types of feature functions, but are not restricted to only these:
  ◦ Contextual Features
  ◦ Term Co-occurrence
  ◦ Lexical-Syntactic Patterns
  ◦ Syntactic Dependency Features
  ◦ Word Length Difference
  ◦ Definition Overlap, etc.
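
Two of the listed feature types are simple enough to sketch as functions from a term pair to a numeric score. The exact definitions below (a word-count difference and a shared-word count) are one plausible reading and are assumptions, as are the function names and the example definitions.

```python
def word_length_difference(term_a, term_b):
    """Absolute difference in the number of words of the two terms."""
    return abs(len(term_a.split()) - len(term_b.split()))

def definition_overlap(definition_a, definition_b):
    """Number of distinct lowercase words shared by two definitions."""
    words_a = set(definition_a.lower().split())
    words_b = set(definition_b.lower().split())
    return len(words_a & words_b)

def feature_vector(term_a, term_b, definitions):
    """Stack the individual feature scores for one term pair."""
    return [
        float(word_length_difference(term_a, term_b)),
        float(definition_overlap(definitions[term_a], definitions[term_b])),
    ]

# Hypothetical definitions; a real system would look these up.
definitions = {
    "ball": "a round object used in games",
    "snooker table": "a table used for playing snooker",
}
vec = feature_vector("ball", "snooker table", definitions)
```

Because every feature has the same signature, adding another feature type only means appending one more score to the vector.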


EXPERIMENTAL RESULTS

• Task: Reconstruct taxonomies from WordNet and ODP
  ◦ Fragments of WordNet and ODP, not the entire resources
• Ground Truth: 50 hypernym taxonomies from WordNet; 50 hypernym taxonomies from ODP; 50 meronym taxonomies from WordNet
• Auxiliary Datasets: 1000 Google documents per term or per term pair; 100 Wikipedia documents per term
• Evaluation Metric: F1-measure (averaged by leave-one-out cross-validation)
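
The F1-measure for a single reconstructed taxonomy can be sketched by comparing predicted relations with the ground-truth relations. Representing a relation as a (child, parent) pair and the example sets below are assumptions for illustration.

```python
def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of relations."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (child, parent) relations.
gold = {("ball", "game equipment"), ("table", "game equipment"), ("basketball", "ball")}
predicted = {("ball", "game equipment"), ("basketball", "ball"), ("basketball", "table")}
score = f1_score(predicted, gold)
```

Here two of three predicted relations are correct and two of three gold relations are recovered, so precision, recall, and F1 are all 2/3. Under leave-one-out cross-validation, such a score would be computed for each held-out taxonomy and then averaged.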

DATASETS


PERFORMANCE OF TAXONOMY INDUCTION

• Compare our system (ME) with other state-of-the-art systems
  ◦ HE: 6 is-a patterns [Hearst 1992]
  ◦ GI: 3 part-of patterns [Girju et al. 2003]
  ◦ PR: a probabilistic framework [Snow et al. 2006]
  ◦ ME: our metric-based framework

PERFORMANCE OF TAXONOMY INDUCTION

• Our system (ME) consistently gives the best F1 for all three tasks
• Systems using heterogeneous features (ME and PR) achieve a significant absolute F1 gain (>30%)


FEATURES VS. RELATIONS

• This is the first study of the impact of using different features on taxonomy induction for different relations
• Co-occurrence and lexical-syntactic patterns are good for is-a, part-of, and sibling relations
• Contextual and syntactic dependency features are only good for the sibling relation

FEATURES VS. ABSTRACTNESS

• This is the first study of the impact of using different features on taxonomy induction for terms at different abstraction levels
• Contextual, co-occurrence, lexical-syntactic patterns, and syntactic dependency features work well for concrete terms
• Only co-occurrence works well for abstract terms


CONCLUSIONS

• This paper presents a novel metric-based taxonomy induction framework, which
  ◦ Combines strengths of pattern-based and clustering-based approaches
  ◦ Achieves better F1 than 3 state-of-the-art systems
• The first study on the impact of using different features on taxonomy induction for different types of relations and for terms at different abstraction levels

CONCLUSIONS

• This work is a general framework, which
  ◦ Allows a wider range of features
  ◦ Allows different metric functions at different abstraction levels
• This work has the potential to learn more complex taxonomies than previous approaches


THANK YOU AND QUESTIONS