Progress update Lin Ziheng

Post on 14-Dec-2015

Progress update

Lin Ziheng

System overview

Components – Connective classifier

• Features from Pitler and Nenkova (2009):
– Connective: because
– Self category: IN
– Parent category: SBAR
– Left sibling category: none
– Right sibling category: S
– Right sibling contains a VP: yes
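The syntactic features listed above can be read off a constituent parse tree. The following is a minimal sketch, not the system's actual code: it assumes a single-word connective and a tree stored as nested tuples, and the helper names (`is_leaf`, `find_leaf`, `syntactic_features`) and the example sentence are hypothetical.

```python
# Trees are nested tuples: (label, child, ...); a leaf is (POS, word).
# This representation and all function names are illustrative assumptions.

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def contains_label(node, label):
    """True if `node` or any of its descendants carries `label`."""
    if node[0] == label:
        return True
    if is_leaf(node):
        return False
    return any(contains_label(child, label) for child in node[1:])

def find_leaf(node, word, parent=None, index=0):
    """Return (leaf, parent, sibling index) for the first leaf spelling `word`."""
    if is_leaf(node):
        return (node, parent, index) if node[1].lower() == word else None
    for i, child in enumerate(node[1:]):
        found = find_leaf(child, word, node, i)
        if found is not None:
            return found
    return None

def syntactic_features(tree, connective):
    """Self, parent, and sibling categories for a single-word connective."""
    found = find_leaf(tree, connective.lower())
    if found is None:
        return None
    leaf, parent, i = found
    siblings = parent[1:] if parent is not None else ()
    left = siblings[i - 1] if i > 0 else None
    right = siblings[i + 1] if i + 1 < len(siblings) else None
    return {
        "connective": leaf[1].lower(),
        "self_category": leaf[0],
        "parent_category": parent[0] if parent is not None else "none",
        "left_sibling": left[0] if left is not None else "none",
        "right_sibling": right[0] if right is not None else "none",
        "right_sibling_has_vp": right is not None and contains_label(right, "VP"),
    }

# "He left because he was tired" (toy parse, written by hand)
example_tree = (
    "S",
    ("NP", ("PRP", "He")),
    ("VP", ("VBD", "left"),
     ("SBAR", ("IN", "because"),
      ("S", ("NP", ("PRP", "he")),
       ("VP", ("VBD", "was"), ("ADJP", ("JJ", "tired")))))),
)
features = syntactic_features(example_tree, "because")
```

On this toy parse the extracted values agree with the slide's example for "because": IN, SBAR, no left sibling, right sibling S containing a VP.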

Components – Connective classifier

• New features
– Conn POS
– Prev word + conn: even though, particularly since
– Prev word POS
– Prev word POS + conn POS
– Conn + next word
– Next word POS
– Conn POS + next word POS
– All lemmatized verbs in the sentence containing conn
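These contextual features need only the tokens, POS tags, and lemmas of the sentence. A rough sketch follows, assuming a single-token connective candidate at index `i` and pre-computed lemmas; the function name, the `"NONE"` padding value, and the `_` joiner are illustrative choices, not taken from the slides.

```python
def context_features(tokens, pos_tags, lemmas, i):
    """Lexical context features for the connective candidate at index i."""
    conn, conn_pos = tokens[i].lower(), pos_tags[i]
    has_prev, has_next = i > 0, i + 1 < len(tokens)
    prev_word = tokens[i - 1].lower() if has_prev else "NONE"
    prev_pos = pos_tags[i - 1] if has_prev else "NONE"
    next_word = tokens[i + 1].lower() if has_next else "NONE"
    next_pos = pos_tags[i + 1] if has_next else "NONE"
    # Penn Treebank verb tags all start with "VB" (VB, VBD, VBG, ...).
    verbs = sorted({lem for lem, pos in zip(lemmas, pos_tags)
                    if pos.startswith("VB")})
    return {
        "conn_pos": conn_pos,
        "prev+conn": prev_word + "_" + conn,
        "prev_pos": prev_pos,
        "prev_pos+conn_pos": prev_pos + "_" + conn_pos,
        "conn+next": conn + "_" + next_word,
        "next_pos": next_pos,
        "conn_pos+next_pos": conn_pos + "_" + next_pos,
        "verbs": verbs,
    }

tokens = ["He", "stayed", "even", "though", "it", "rained"]
pos_tags = ["PRP", "VBD", "RB", "IN", "PRP", "VBD"]
lemmas = ["he", "stay", "even", "though", "it", "rain"]
feats = context_features(tokens, pos_tags, lemmas, 3)
```

Here the candidate is "though", so the prev word + conn feature recovers the slide's "even though" pattern.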

Components – Argument labeler

Argument labeler – Argument position classifier

• Relative positions of Arg1
– Arg1 and Arg2 in the same sentence: SS (60.9%)
– Arg1 in the immediately previous sentence: IPS (30.1%)
– Arg1 in some non-adjacent previous sentence: NAPS (9.0%)
– Arg1 in some following sentence: FS (0%, only 8 instances)

• FS ignored
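The four position classes can be derived from sentence indices when building training labels. A small sketch, assuming Arg2 lies in the sentence that contains the connective and that sentences are numbered from 0; the function name is hypothetical.

```python
def arg1_position(arg1_sent, conn_sent):
    """Map the sentence indices of Arg1 and the connective to a position class."""
    offset = conn_sent - arg1_sent
    if offset == 0:
        return "SS"    # same sentence
    if offset == 1:
        return "IPS"   # immediately previous sentence
    if offset > 1:
        return "NAPS"  # non-adjacent previous sentence
    return "FS"        # following sentence (ignored by the classifier)
```

Instances labeled `"FS"` would simply be dropped before training, in line with the slide.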

Argument labeler – Argument position classifier

• Features:
– Connective string
– Conn POS
– Conn position in the sentence: first, second, third, third last, second last, or last
– Prev word
– Prev word POS
– Prev word + conn
– Prev word POS + conn POS
– Second prev word
– Second prev word POS
– Second prev word + conn
– Second prev word POS + conn POS
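The position feature buckets the connective's token index into six values. The sketch below is one possible reading: the slides do not say how ties are resolved in short sentences or what value a mid-sentence connective receives, so checking from the start first and the `"middle"` fallback are assumptions, as is the function name.

```python
def position_bucket(index, length):
    """Bucket a 0-based token index within a sentence of `length` tokens."""
    from_start = ["first", "second", "third"]
    from_end = ["last", "second_last", "third_last"]
    if index < 3:
        return from_start[index]          # assumption: start wins on ties
    distance_to_end = length - 1 - index
    if distance_to_end < 3:
        return from_end[distance_to_end]
    return "middle"                       # assumption: value not on the slide
```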

Argument labeler – Argument extractor

• SS cases: handcrafted a set of syntactically motivated rules to extract Arg1 and Arg2

Argument labeler – Argument extractor

• An example:

Argument labeler – Argument extractor

• IPS cases: label the sentence containing the connective as Arg2 and the immediately previous sentence as Arg1

• NAPS cases:
– Arg1 is located in the second previous sentence in 45.8% of the NAPS cases
– Use the majority decision and assume Arg1 is always in the second previous sentence
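The IPS and NAPS heuristics amount to picking whole sentences by offset. A minimal sketch, assuming that for NAPS (as for IPS) the sentence containing the connective is taken as Arg2, which the slide states only for IPS; the function name and the `None` return for a missing sentence are illustrative.

```python
def extract_previous_sentence_args(sentences, conn_sent, position):
    """Return (arg1, arg2) as whole sentences for IPS and NAPS cases."""
    if position == "IPS":
        offset = 1   # immediately previous sentence
    elif position == "NAPS":
        offset = 2   # majority decision: second previous sentence
    else:
        raise ValueError("only IPS and NAPS are handled here")
    arg1_index = conn_sent - offset
    arg1 = sentences[arg1_index] if arg1_index >= 0 else None
    return arg1, sentences[conn_sent]

doc = ["Prices rose.", "Analysts were surprised.", "However, demand held."]
ips_args = extract_previous_sentence_args(doc, 2, "IPS")
naps_args = extract_previous_sentence_args(doc, 2, "NAPS")
```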

Components – Explicit classifier

• Prasad et al. (2008) reported human agreement of 94% on Level 1 classes and 84% on Level 2 types

• A baseline using only connectives as features gives 95.7% and 86% on Sec. 23
– Difficult to improve accuracy on the test section

• 3 types of features:
– Connective string
– Conn POS
– Conn + prev word
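A connective-only baseline can be approximated by predicting each connective's most frequent sense in the training data. This is a stand-in for illustration; the slides do not say how the baseline was built, and the function names and toy training pairs below are made up.

```python
from collections import Counter, defaultdict

def train_majority_baseline(pairs):
    """Learn the most frequent sense for each connective string."""
    counts = defaultdict(Counter)
    for conn, sense in pairs:
        counts[conn.lower()][sense] += 1
    return {conn: c.most_common(1)[0][0] for conn, c in counts.items()}

def predict(model, conn, default=None):
    """Look up the learned sense; unseen connectives get `default`."""
    return model.get(conn.lower(), default)

# Toy data, not taken from the PDTB.
toy_pairs = [
    ("since", "Temporal"),
    ("since", "Contingency"),
    ("since", "Contingency"),
    ("because", "Contingency"),
    ("but", "Comparison"),
]
model = train_majority_baseline(toy_pairs)
```

The high baseline figures on the slide suggest that most connectives are close to unambiguous, which is why extra features add little.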

Components – Non-explicit classifier

• Non-explicit: Implicit, AltLex, EntRel, NoRel
– 11 Level 2 types for Implicit/AltLex, plus EntRel and NoRel: 13 types

• 4 feature sets from Lin et al. (2009)
– Contextual features
– Constituent parse features
– Dependency parse features
– Word-pair features

• 3 features to capture AltLex: Arg2_word1, Arg2_word2, Arg2_word3

Components – Attribution span labeler

• Two steps: split the text into clauses, and decide which clauses are attribution spans

• Rule-based clause splitter:
– First split a sentence into clauses by punctuation
– For each clause, further split it if one of the following production links is found: VP→SBAR, S→SINV, S→S, SINV→S, S→SBAR, VP→S
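The two splitting steps can be sketched as follows. This is an illustration under several assumptions: each production link is read as a parent category followed by a child category, the punctuation set is a guess, the parse is stored as nested tuples, and all names are hypothetical. The second function only locates candidate split points; it does not perform the split.

```python
# Assumed reading of the slide: (parent label, child label) pairs.
SPLIT_LINKS = {
    ("VP", "SBAR"), ("S", "SINV"), ("S", "S"),
    ("SINV", "S"), ("S", "SBAR"), ("VP", "S"),
}
# Assumed punctuation set; the slides do not list it.
PUNCTUATION = {",", ";", ":", ".", "?", "!", "``", "''", "--"}

def split_by_punctuation(tokens):
    """Step 1: cut the token list at punctuation tokens, dropping them."""
    clauses, current = [], []
    for tok in tokens:
        if tok in PUNCTUATION:
            if current:
                clauses.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses

def find_split_links(node):
    """Step 2 helper: list (parent, child) label pairs that match a split link.

    Trees are nested tuples: (label, child, ...); a leaf is (POS, word).
    """
    if len(node) == 2 and isinstance(node[1], str):
        return []
    links = []
    for child in node[1:]:
        if (node[0], child[0]) in SPLIT_LINKS:
            links.append((node[0], child[0]))
        links.extend(find_split_links(child))
    return links

clauses = split_by_punctuation(
    ["He", "said", ",", "however", ",", "that", "it", "rained", "."])
toy_tree = (
    "S",
    ("NP", ("PRP", "He")),
    ("VP", ("VBD", "said"),
     ("SBAR", ("S", ("NP", ("PRP", "it")), ("VP", ("VBD", "rained"))))),
)
links = find_split_links(toy_tree)
```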

Components – Attribution span labeler

• Attr span classifier features: (curr, prev and next clauses)
– Unigrams of curr
– Lowercased and lemmatized verbs in curr
– The first and last terms of curr
– The last term of prev
– The first term of next
– The last term of prev + the first term of curr
– The last term of curr + the first term of next
– The position of curr in the sentence
– Punctuation rules extracted from curr

Evaluation

• Train: 02-21, dev: 22, test: 23
• Each component is tested
– without and with error propagation (EP) from the previous component
– with gold standard (GS) parse trees and sentence boundaries, and with automatic (Auto) parser and sentence splitter

Evaluation – Connective classifier

• GS: increased acc and F1 by 2.05% and 3.05%
• Auto: increased acc and F1 by 1.71% and 2.54%
• Contextual info is helpful

Evaluation – Argument position classifier

• Able to accurately label SS
• But performs badly on the NAPS class
– Due to the similarity between IPS and NAPS classes

Evaluation – Argument extractor

• Human agreement on partial and exact matches: 94.5% and 90.2%

• Exact F1 much lower than partial F1
– Due to small portions of text deleted

Evaluation – Explicit classifier

• Baseline: using only connective strings
– 86%

• GS + no EP: F1 increased by 0.44%

Evaluation – Non-explicit classifier

• Majority baseline: all classified as EntRel
• Adding EP degrades F1 by ~13%, but still outperforms baseline by ~6%

Evaluation – Attribution span labeler

• When EP added: the decrease of F1 is largely due to the drop in precision

• When Auto added: the decrease of F1 is largely due to the drop in recall

Evaluation – The whole pipeline

• Definition: a relation is correct if its relation type is classified correctly, and both Arg1 and Arg2 are partially or exactly matched

• GS + EP
– Partial: 46.38% F1
– Exact: 31.72% F1
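The correctness definition above can be written as a predicate over a predicted and a gold relation. In this sketch argument spans are collections of token indices, and a partial match is taken to mean any token overlap; the slides do not define partial matching, so that criterion, the dictionary layout, and the function names are all assumptions.

```python
def spans_match(pred_span, gold_span, mode):
    """Compare two argument spans given as collections of token indices."""
    pred, gold = set(pred_span), set(gold_span)
    if mode == "exact":
        return pred == gold
    if mode == "partial":
        return bool(pred & gold)   # assumption: any overlap counts
    raise ValueError("mode must be 'exact' or 'partial'")

def relation_correct(pred, gold, mode="partial"):
    """A relation is correct if its type and both arguments match."""
    return (pred["type"] == gold["type"]
            and spans_match(pred["arg1"], gold["arg1"], mode)
            and spans_match(pred["arg2"], gold["arg2"], mode))

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

gold_rel = {"type": "Contingency", "arg1": range(0, 5), "arg2": range(6, 12)}
pred_rel = {"type": "Contingency", "arg1": range(1, 5), "arg2": range(6, 12)}
```

With these toy spans the prediction counts under partial matching but not under exact matching, because one Arg1 token is missing. The same effect is behind the gap between the partial and exact F1 figures.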

On-going changes

• Joint learning
• Change rule-based argument extractor to a machine learning approach