![Page 1: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/1.jpg)
Crowdsourcing Inference-Rule Evaluation
Naomi Zeichner, Jonathan Berant, Ido Dagan
![Page 2: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/2.jpg)
Outline
Bar Ilan University @ ACL 2012
1. We address: Inference-Rule Evaluation
2. By: Crowdsourcing Rule Applications Annotation
3. Allowing us to: Empirically Compare Different Resources
![Page 7: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/7.jpg)
Inference Rules – important component in semantic applications
Rule: X brought up in Y → X raised in Y
Q: Where was Reagan raised?
A: Reagan was brought up in Dixon.
Hiring Event (PERSON, ROLE)
Text: Bob worked as an analyst for Dell
Rule: X work as Y → X hired as Y
PERSON: Bob, ROLE: analyst
![Page 14: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/14.jpg)
Evaluation - What are the options?
1. Impact on end task (QA, IE, RTE)
• Pro: What interests an inference system developer
• Con: Many components address multiple phenomena, so it is hard to assess the effect of a single resource.
2. Judge rule correctness directly
• Pro: Theoretically most intuitive
• Con: In fact hard to do; often results in low inter-annotator agreement.
• Examples:
  – X reside in Y → X live in Y
  – X reside in Y → X born in Y
  – X criticize Y → X attack Y
3. Instance-based evaluation (Szpektor et al., 2007; Bhagat et al., 2007)
• Pro: Simulates utility of rules in an application; yields high inter-annotator agreement.
![Page 24: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/24.jpg)
Instance Based Evaluation – Decisions
Target: Judge if a rule application is valid or not
Rule: X teach Y → X explain to Y
LHS: Steve teaches kids
RHS: Steve explains to kids
Rule: X resides in Y → X born in Y
LHS: He resides in Paris
RHS: He born in Paris
Rule: X turn in Y → X bring in Y
LHS: humans turn in bed
RHS: humans bring in bed
Our Goal: Robust, Replicable
![Page 28: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/28.jpg)
Crowdsourcing
• Recent trend of using crowdsourcing for annotation tasks
• Previous works (Snow et al., 2008; Wang and Callison-Burch, 2010; Mehdad et al., 2010; Negri et al., 2011)
  – Focused on RTE text-hypothesis pairs
  – Didn’t address annotation and evaluation of rules
Challenges
• Simplify
• Communicate
![Page 38: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/38.jpg)
Simplify Process
Rule: X teach Y → X explain to Y
LHS: Steve teaches kids
RHS: Steve explains to kids
Rule: X resides in Y → X born in Y
LHS: He resides in Paris
RHS: He born in Paris
Rule: X turn in Y → X bring in Y
LHS: humans turn in bed
RHS: humans bring in bed
Simple Tasks
1. Is a phrase meaningful?
• they observe holidays
• they celebrate holidays
• He born in Paris
• He resides in Paris
• humans turn in bed
• humans bring in bed
2. Judge if one phrase is true given another.
• Steve teaches kids / Steve explains to kids
• He resides in Paris / He born in Paris
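The two simple tasks can be read as a small routing procedure. The Python sketch below is one possible reading, not the authors' implementation: `RuleApplication`, `route`, and the label strings are hypothetical names, the judgment functions stand in for aggregated Turker answers, and dropping applications whose LHS is judged meaningless is an assumption that the slides do not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RuleApplication:
    rule: str  # e.g. "X teach Y -> X explain to Y"
    lhs: str   # instantiated left-hand side
    rhs: str   # instantiated right-hand side


def route(
    app: RuleApplication,
    is_meaningful: Callable[[str], bool],
    is_entailed: Callable[[str, str], bool],
) -> Optional[str]:
    """Return a label for one rule application, or None if it is dropped."""
    # Task 1: is each phrase meaningful?
    if not is_meaningful(app.lhs):
        return None  # assumption: nothing to evaluate without a meaningful LHS
    if not is_meaningful(app.rhs):
        return "non-entailment"  # meaningful LHS, meaningless RHS
    # Task 2: is the RHS true given the LHS?
    return "entailment" if is_entailed(app.lhs, app.rhs) else "non-entailment"


# Toy judgments chosen for illustration only, not the Turkers' actual answers.
meaningless = {"humans bring in bed"}
entailed_pairs = {("Steve teaches kids", "Steve explains to kids")}

teach = RuleApplication("X teach Y -> X explain to Y",
                        "Steve teaches kids", "Steve explains to kids")
turn = RuleApplication("X turn in Y -> X bring in Y",
                       "humans turn in bed", "humans bring in bed")

for app in (teach, turn):
    label = route(app,
                  lambda p: p not in meaningless,
                  lambda l, r: (l, r) in entailed_pairs)
    print(app.rule, "=>", label)
```

Splitting the decision this way means each Turker question has a yes/no answer about one or two short phrases, which is the simplification the slide argues for.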
![Page 42: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/42.jpg)
Communicate Entailment
Gold Standard
1. Educating: “Confusing” examples used as gold, with feedback if Turkers get them wrong
2. Enforcing: Unanimous examples used as gold to estimate Turker reliability
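The "Enforcing" use of gold examples can be sketched as a per-Turker accuracy estimate over the gold items. This is a minimal illustration under assumptions: the function names, the `(turker_id, item_id, label)` tuple layout, and the `min_accuracy` cutoff of 0.7 are all hypothetical, and the slides do not say how reliability was actually computed or thresholded.

```python
from collections import defaultdict


def turker_accuracy(judgments, gold):
    """Accuracy of each Turker on gold items only.

    judgments: iterable of (turker_id, item_id, label)
    gold: dict mapping item_id -> gold label
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for turker, item, label in judgments:
        if item in gold:  # non-gold items say nothing about reliability
            seen[turker] += 1
            correct[turker] += int(label == gold[item])
    return {t: correct[t] / seen[t] for t in seen}


def reliable_turkers(judgments, gold, min_accuracy=0.7):
    """Turkers whose gold accuracy reaches the (hypothetical) cutoff."""
    scores = turker_accuracy(judgments, gold)
    return {t for t, acc in scores.items() if acc >= min_accuracy}


gold = {"g1": "yes", "g2": "no"}
judgments = [
    ("t1", "g1", "yes"), ("t1", "g2", "no"), ("t1", "x9", "yes"),
    ("t2", "g1", "no"), ("t2", "g2", "no"),
]
print(turker_accuracy(judgments, gold))
print(reliable_turkers(judgments, gold))
```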
![Page 45: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/45.jpg)
Communicate - Effect of Communication

| | Without | With |
|---|---|---|
| Agreement with gold | 0.79 | 0.9 |
| Kappa with gold | 0.54 | 0.79 |
| False-positive rate | 18% | 6% |
| False-negative rate | 4% | 5% |

63% of annotations judged unanimously between annotators and with our annotation
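For readers who want to reproduce this kind of comparison on their own annotations, the four measures can be computed as follows. This is a sketch under assumptions: the slides do not say which kappa variant or which denominators were used, so Cohen's kappa and the usual definitions (false positives over gold negatives, false negatives over gold positives) are assumed here, and the function names and toy labels are hypothetical.

```python
def agreement_and_kappa(labels, gold):
    """Raw agreement and Cohen's kappa between two label sequences."""
    n = len(labels)
    observed = sum(a == b for a, b in zip(labels, gold)) / n
    categories = set(labels) | set(gold)
    # Chance agreement from each side's marginal label distribution.
    expected = sum((labels.count(c) / n) * (gold.count(c) / n) for c in categories)
    kappa = (observed - expected) / (1 - expected)  # undefined if expected == 1
    return observed, kappa


def error_rates(labels, gold, positive="E"):
    """False-positive and false-negative rates relative to gold."""
    fp = sum(l == positive and g != positive for l, g in zip(labels, gold))
    fn = sum(l != positive and g == positive for l, g in zip(labels, gold))
    gold_neg = sum(g != positive for g in gold)
    gold_pos = sum(g == positive for g in gold)
    return fp / gold_neg, fn / gold_pos


# Toy data: "E" = entailment, "N" = non-entailment.
crowd = ["E", "E", "N", "N", "E", "N", "E", "N"]
gold = ["E", "N", "N", "N", "E", "N", "E", "E"]
print(agreement_and_kappa(crowd, gold))
print(error_rates(crowd, gold))
```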
![Page 47: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/47.jpg)
Case Study – Data Set
• Executed four entailment rule learning methods on a set of 1B extractions extracted by ReVerb (Fader et al., 2011)
• Applied rules on randomly sampled extractions to get 20,000 rule applications
• Annotated each rule application using our framework
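The second step above, applying rules to sampled extractions, can be sketched in a few lines. This is an illustrative simplification, not the authors' pipeline: it assumes extractions are `(arg1, predicate, arg2)` triples with the predicate already in the same base form as the rule templates, ignores inflection (so it yields "Steve teach kids" rather than "Steve teaches kids"), and uses hypothetical function names.

```python
import random


def apply_rule(rule, extraction):
    """Instantiate a rule on an extraction, or return None if it does not match.

    rule: (lhs_template, rhs_template), e.g. ("X teach Y", "X explain to Y")
    extraction: (arg1, predicate, arg2)
    """
    lhs_template, rhs_template = rule
    arg1, predicate, arg2 = extraction
    if lhs_template != f"X {predicate} Y":
        return None
    slots = {"X": arg1, "Y": arg2}

    def fill(template):
        # Replace the X and Y tokens, leave every other token as is.
        return " ".join(slots.get(tok, tok) for tok in template.split())

    return fill(lhs_template), fill(rhs_template)


def sample_rule_applications(rules, extractions, k, seed=0):
    """Apply every matching rule to k randomly sampled extractions."""
    rng = random.Random(seed)
    sampled = rng.sample(extractions, min(k, len(extractions)))
    applications = []
    for extraction in sampled:
        for rule in rules:
            result = apply_rule(rule, extraction)
            if result is not None:
                applications.append((rule, *result))
    return applications


rules = [("X teach Y", "X explain to Y"), ("X reside in Y", "X born in Y")]
extractions = [("Steve", "teach", "kids"), ("He", "reside in", "Paris")]
for rule, lhs, rhs in sample_rule_applications(rules, extractions, k=2):
    print(lhs, "=>", rhs)
```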
![Page 48: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/48.jpg)
Case Study – Algorithm Comparison

| Algorithm | AUC |
|---|---|
| DIRT (Lin and Pantel, 2001) | 0.40 |
| Cover (Weeds and Weir, 2003) | 0.43 |
| BInc (Szpektor and Dagan, 2008) | 0.44 |
| Berant (Berant et al., 2010) | 0.52 |
![Page 49: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/49.jpg)
Case Study – Output
• Task 1
  – 1,012 meaningful LHS; meaningless RHS (non-entailment)
  – 8,264 both sides were judged meaningful (passed to Task 2)
• Task 2
  – 2,447 positive entailment
  – 3,108 negative entailment
• Overall
  – 6,567 rule applications
  – Annotated for $1000
  – About a week
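The counts on this slide are consistent with one another: the Task 1 non-entailments plus the two Task 2 outcomes add up to the overall total. The short check below makes that arithmetic explicit and derives two figures that are not on the slide, the share of positive entailments and an approximate cost per annotated rule application; the latter treats the $1000 figure as exact, which is an assumption.

```python
task1_rhs_meaningless = 1_012  # meaningful LHS, meaningless RHS (non-entailment)
task2_positive = 2_447         # positive entailment
task2_negative = 3_108         # negative entailment

total = task1_rhs_meaningless + task2_positive + task2_negative
non_entailment = task1_rhs_meaningless + task2_negative
positive_share = task2_positive / total
cost_per_application = 1000 / total  # assumes the cost was exactly $1000

print(total)                           # matches the 6,567 reported on the slide
print(non_entailment)
print(round(positive_share, 3))
print(round(cost_per_application, 3))
```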
![Page 51: Crowdsourcing Inference-Rule Evaluation Naomi Zeichner, Jonathan Berant, Ido Dagan](https://reader035.vdocuments.site/reader035/viewer/2022062423/56814aa2550346895db7b61c/html5/thumbnails/51.jpg)
Summary
A framework for crowdsourcing inference rule evaluation
• Simplifies instance-based evaluation
• Communicates the entailment decision to Turkers
• Proposed framework can be beneficial for
  – resource developers
  – inference system developers
Crowdsourcing forms and annotated extractions can be found at:
BIU NLP downloads: http://www.cs.biu.ac.il/~nlp/downloads
Thank You