The Pyramid Method at DUC05

Ani Nenkova

Becky Passonneau

Kathleen McKeown

Other team members: David Elson, Advaith Siddharthan, Sergey Siegelman

Overview

• Review of Pyramids (Kathy)
• Characteristics of the responses
• Analyses (Ani)
• Scores and Significant Differences
• Reliability of Pyramid scoring
• Comparisons between annotators
• Impact of editing on scores
• Impact of Weight 1 SCUs
• Correlation with responsiveness and Rouge
• Lessons learned

Pyramids

• Uses multiple human summaries
  • Previous data indicated 5 needed for score stability
• Information is ranked by its importance
• Allows for multiple good summaries
• A pyramid is created from the human summaries
• Elements of the pyramid are content units
• System summaries are scored by comparison with the pyramid

Summarization Content Units

Near-paraphrases from different human summaries

Clause or less

Avoids explicit semantic representation

Emerges from analysis of human summaries

SCU: A cable car caught fire (Weight = 4)

A. The cause of the fire was unknown.

B. A cable car caught fire just after entering a mountainside tunnel in an alpine resort in Kaprun, Austria on the morning of November 11, 2000.

C. A cable car pulling skiers and snowboarders to the Kitzsteinhorn resort, located 60 miles south of Salzburg in the Austrian Alps, caught fire inside a mountain tunnel, killing approximately 170 people.

D. On November 10, 2000, a cable car filled to capacity caught on fire, trapping 180 passengers inside the Kitzsteinhorn mountain, located in the town of Kaprun, 50 miles south of Salzburg in the central Austrian Alps.

SCU: The cause of the fire is unknown (Weight = 1)

SCU: The accident happened in the Austrian Alps (Weight = 3)

Idealized representation

• Tiers of differentially weighted SCUs
• Top: few SCUs, high weight
• Bottom: many SCUs, low weight

[Figure: pyramid with tiers labeled W=3 (top), W=2, and W=1 (bottom)]
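The weighting behind these tiers can be sketched in a few lines of Python. This is a minimal illustration, not the annotation tool used for DUC05. It assumes each SCU has already been identified by hand and is represented by the set of model summaries that express it. The contributor sets below are hypothetical: they are chosen to reproduce the weights shown for the cable-car example (4, 3, and 1), but the slides do not state which summaries contribute to which SCU.

```python
from collections import defaultdict

# Hypothetical contributor sets: which model summaries (A-D) express each SCU.
# The resulting weights match the slides; the contributors are an assumption.
scu_contributors = {
    "A cable car caught fire": {"A", "B", "C", "D"},
    "The accident happened in the Austrian Alps": {"B", "C", "D"},
    "The cause of the fire is unknown": {"A"},
}


def scu_weights(contributors):
    """Weight of an SCU = number of model summaries that express it."""
    return {label: len(models) for label, models in contributors.items()}


def build_tiers(weights):
    """Group SCU labels into pyramid tiers keyed by weight."""
    tiers = defaultdict(list)
    for label, weight in weights.items():
        tiers[weight].append(label)
    return dict(tiers)


weights = scu_weights(scu_contributors)
tiers = build_tiers(weights)

# Print the pyramid from the top tier (highest weight) down.
for weight in sorted(tiers, reverse=True):
    print(f"W={weight}: {tiers[weight]}")
```

With more SCUs, the lower tiers fill up fastest, which gives the pyramid its shape.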

Creation of pyramids

• Done for each of 20 out of 50 sets
• Primary annotator, secondary checker
• Held round-table discussions of problematic constructions that occurred in this data set
  • Comma-separated lists: "Extractive reserves have been formed for managed harvesting of timber, rubber, Brazil nuts, and medical plants without deforestation."
  • General vs. specific: Eastern Europe vs. Hungary, Poland, Lithuania, and Turkey

Characteristics of the Responses

• Proportion of SCUs of Weight 1 is large: 44% (D324) to 81% (D695)
• Mean SCU weight: 1.9
• Agreement among human responders is quite low
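Both statistics are simple functions of the list of SCU weights in a pyramid. The sketch below shows one way to compute them; the function names and the example weights are made up for illustration and are not DUC05 data.

```python
def weight_one_proportion(weights):
    """Fraction of SCUs that appear in only one model summary."""
    return sum(1 for w in weights if w == 1) / len(weights)


def mean_weight(weights):
    """Average SCU weight over the whole pyramid."""
    return sum(weights) / len(weights)


# Made-up SCU weights for a single pyramid (not taken from any DUC05 set).
example_weights = [1, 1, 1, 1, 1, 1, 2, 2, 3, 5]

proportion = weight_one_proportion(example_weights)
average = mean_weight(example_weights)
print(f"weight-1 proportion: {proportion:.0%}, mean weight: {average:.1f}")
```

A high weight-1 proportion together with a low mean weight indicates that the human summarizers rarely chose the same content.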

SCU Weights

[Figure: number of SCUs at each weight]

Pyramids: DUC 2003

• 100 word summaries (vs. 250 word)
• 10 500-word articles per cluster (vs. 30 720-word articles)
• 3 clusters (vs. 20 clusters)
• Mean SCU Weight (7 models)
  • 2005: avg 1.9
  • 2003: avg 2.4
• Proportion of SCUs of W=1
  • 2005: avg 60%, 44% to 81%
  • 2003: avg 40%, 37% to 47%

DUC03 DUC05

[Figure: side-by-side charts for DUC03 and DUC05]

Computing pyramid scores: Ideally informative summary

• Does not include an SCU from a lower tier unless all SCUs from higher tiers are included as well


Original Pyramid Score

SCORE = D/MAX

• D: Sum of the weights of the SCUs in a summary
• MAX: Sum of the weights of the SCUs in an ideally informative summary
• Measures the proportion of good information in the summary: precision
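The ratio can be sketched as follows. This is a minimal sketch under stated assumptions: the slide does not give the size of the ideally informative summary, so the code takes it as an explicit parameter (`summary_size`, a hypothetical name), following the usual description in which the ideal summary has as many SCUs as the summary being scored. The pyramid and the peer summary are made-up examples.

```python
def max_weight(pyramid_weights, size):
    """Weight of an ideally informative summary with `size` SCUs.

    SCUs are taken from the top tier down, so no lower-tier SCU is used
    before every higher-tier SCU has been included.
    """
    ranked = sorted(pyramid_weights, reverse=True)
    return sum(ranked[:size])


def original_score(matched_weights, pyramid_weights, summary_size):
    """SCORE = D / MAX for one summary."""
    d = sum(matched_weights)                      # D: weight found in the summary
    mx = max_weight(pyramid_weights, summary_size)
    return d / mx if mx else 0.0


# Made-up pyramid: one SCU of weight 4, two of weight 3, and so on.
pyramid = [4, 3, 3, 2, 2, 1, 1, 1]

# Made-up peer: 4 SCUs in total, 3 of which match the pyramid
# (weights 4, 2, 1); the unmatched SCU contributes nothing to D.
score = original_score([4, 2, 1], pyramid, summary_size=4)
print(f"original pyramid score: {score:.3f}")
```

Here D = 7 and MAX = 4 + 3 + 3 + 2 = 12, so the summary carries 7/12 of the weight that an ideal summary of the same size would carry.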

Modified pyramid score (recall)

• EN = average number of SCUs in the human models
  • This is the number of content units humans chose to convey about the story
• W = weight of a maximally informative summary of size EN
• D/W is the modified pyramid score
  • Shows the proportion of expected good information
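A sketch of the modified score follows, again with made-up numbers. Two details are assumptions rather than something the slides state: EN is rounded to the nearest integer so that it can be used as a summary size, and the helper names are hypothetical.

```python
def max_weight(pyramid_weights, size):
    """Weight of a maximally informative summary with `size` SCUs."""
    ranked = sorted(pyramid_weights, reverse=True)
    return sum(ranked[:size])


def modified_score(matched_weights, pyramid_weights, model_scu_counts):
    """D / W, where W is the best achievable weight with EN SCUs."""
    # EN: average number of SCUs per human model summary.
    # Rounding is an assumption; note that Python rounds halves to even.
    en = round(sum(model_scu_counts) / len(model_scu_counts))
    w = max_weight(pyramid_weights, en)
    d = sum(matched_weights)
    return d / w if w else 0.0


# Made-up pyramid and made-up SCU counts for four human models.
pyramid = [4, 3, 3, 2, 2, 1, 1, 1]
model_scu_counts = [5, 6, 4, 5]          # EN = 5

# Made-up peer matching SCUs of weight 4, 2 and 1 (D = 7).
score = modified_score([4, 2, 1], pyramid, model_scu_counts)
print(f"modified pyramid score: {score:.3f}")
```

Because the denominator depends only on the pyramid and the human models, SCUs in the peer that match nothing in the pyramid do not change the score.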

Scoring Methods

• Presents scores for the 20 pyramid sets
• Recompute Rouge for comparison
  • We compute Rouge using only 7 models
  • 8 and 9 reserved for computing human performance
  • Best because of significant topic effect
• Comparisons between Pyramid (original, modified), responsiveness, and Rouge-SU4
  • Pyramid scores computed from multiple humans
  • Responsiveness is just one human's judgment
  • Rouge-SU4 equivalent to Rouge-2

Preview of Results

• Manual metrics
  • Large differences between humans and machines
  • No single system the clear winner
  • But a top group identified by all metrics
• Significant differences
  • Different predictions from manual and automatic metrics
• Correlations between metrics
  • Some correlation but one cannot be substituted for another
  • This is good

Human performance / Best system

              Pyramid      Modified     Resp       ROUGE-SU4
Human         B: 0.5472    B: 0.4814    A: 4.895   A: 0.1722
Human         A: 0.4969    A: 0.4617    B: 4.526   B: 0.1552
Best system   14: 0.2587   10: 0.2052   4: 2.85    15: 0.139

• Best system ~50% of human performance on manual metrics
• Best system ~80% of human performance on ROUGE
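The two percentages can be checked directly from the scores on this slide. The sketch below divides the best system score by the best human score for each metric; the dictionary and function names are illustrative.

```python
# Scores copied from the slide: best human and best system for each metric.
best_human = {"Pyramid": 0.5472, "Modified": 0.4814, "Resp": 4.895, "ROUGE-SU4": 0.1722}
best_system = {"Pyramid": 0.2587, "Modified": 0.2052, "Resp": 2.85, "ROUGE-SU4": 0.139}


def system_to_human_ratio(system_scores, human_scores):
    """Best system score as a fraction of the best human score, per metric."""
    return {metric: system_scores[metric] / human_scores[metric]
            for metric in human_scores}


ratios = system_to_human_ratio(best_system, best_human)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.0%}")
```

The three manual metrics come out between roughly 40% and 60%, while ROUGE-SU4 comes out near 80%, which is the gap the slide points to.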

Pyramid original   Modified     Resp       Rouge-SU4
14: 0.2587         10: 0.2052   4: 2.85    15: 0.139
17: 0.2492         17: 0.1972   14: 2.8    4: 0.134
15: 0.2423         14: 0.1908   10: 2.65   17: 0.1346
10: 0.2379         7: 0.1852    15: 2.6    19: 0.1275
4: 0.2321          15: 0.1808   17: 2.55   11: 0.1259
7: 0.2297          4: 0.177     11: 2.5    10: 0.1278
16: 0.2265         16: 0.1722   28: 2.45   6: 0.1239
6: 0.2197          11: 0.1703   21: 2.45   7: 0.1213
32: 0.2145         6: 0.1671    6: 2.4     14: 0.1264
21: 0.2127         12: 0.1664   24: 2.4    25: 0.1188
12: 0.2126         19: 0.1636   19: 2.4    21: 0.1183
11: 0.2116         21: 0.1613   6: 2.4     16: 0.1218
26: 0.2106         32: 0.1601   27: 2.35   24: 0.118
19: 0.2072         26: 0.1464   12: 2.35   12: 0.116
28: 0.2048         3: 0.145     7: 2.3     3: 0.1198
13: 0.1983         28: 0.1427   25: 2.2    28: 0.1203
3: 0.1949          13: 0.1424   32: 2.15   27: 0.110
1: 0.1747          25: 0.1406   3: 2.1     13: 0.1097


Significant Differences

• Manual metrics
  • Few differences between systems
  • Pyramid: 23 is worse
  • Responsive: 23 and 31 are worse
  • Both humans better than all systems
• Automatic (Rouge-SU4)
  • Many differences between systems
  • One human indistinguishable from 5 systems

Multiple and pairwise comparisons

• Multiple comparisons
  • Tukey's method
  • Control for the experiment-wise type I error
  • Show fewer significant differences
• Pairwise comparisons
  • Wilcoxon paired test
  • Controls the error for individual comparisons
  • Appropriate for checking how your system did during development
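A small calculation shows why controlling the experiment-wise error leaves fewer significant differences. The sketch below is not Tukey's procedure; it uses the simpler Bonferroni-style reasoning and assumes, unrealistically, that the pairwise tests are independent. The choice of 25 systems and a 0.05 level is illustrative.

```python
from math import comb


def familywise_error(n_systems, alpha=0.05):
    """Number of system pairs and the chance of at least one false positive
    if every pair is tested at level `alpha` (independence assumed)."""
    n_pairs = comb(n_systems, 2)
    return n_pairs, 1 - (1 - alpha) ** n_pairs


n_pairs, fwe = familywise_error(25)
per_test_alpha = 0.05 / n_pairs   # Bonferroni-style threshold per comparison

print(f"{n_pairs} pairs, experiment-wise error {fwe:.4f}")
print(f"per-comparison level needed to keep it at 0.05: {per_test_alpha:.5f}")
```

With 300 pairs, testing each at 0.05 makes at least one spurious difference almost certain, so a method that bounds the experiment-wise error has to demand much stronger evidence from each pair.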

Modified pyramid: significant differences

• One system accounts for most of the differences
• Humans significantly better than all systems

Peers with the same result are grouped on one row.

Peer                            Better than
21 32 6 12 19 11 16 4 15 7 14   23
17 10                           23 20
A B                             23 20 30 24 31 1 27 25 28 13 26 3 21 32 6 12 19 11 16 4 15 7 14 17 10

Responsiveness 1: Significant differences

• Differences primarily between 2 systems
• Differences between humans and each system

Peers with the same result are grouped on one row.

Peer                       Better than
26 13 20 3 32 25 7 12 27   23
6 16 19 24 21 28 11 17     23 31
15 10                      23 31 1
14                         23 31 1 30 26 13 20
4                          23 31 1 30 26 13 20 3
B A                        23 31 1 30 26 13 20 3 32 25 7 12 27 6 16 19 24 21 28 11 17 15 10 14 4

Responsive-2

• Similar shape to original

Peers with the same result are grouped on one row.

Peer                Better than
16 12 15 28 3 7 4   23
14 17 10            23 31 20
B A                 23 31 1 30 26 13 20 3 32 25 7 12 27 6 16 19 24 21 28 11 17 15 10 14 4

Skip-bigram: significant differences

• Many more differences between systems than any manual metric
• No difference between human and 5 systems

Peers with the same result are grouped on one row.

Peer                                   Better than
20 31 26 1                             23
32                                     23 20
11 28 13 30 27 3                       23 20 31
16 21 12 24 25 7 14 6 19 10 17 4 15    23 20 31 26 1
B                                      23 20 31 26 1 32 11 28 13 30 27 3 16 21 12 24 25 7 14 6
A                                      23 20 31 26 1 32 11 28 13 30 27 3 16 21 12 24 25 7 14 6 19 10 17 4 15


Pairwise comparisons: Modified Pyramid

Peer   Better than
10     25 27 31 24 30 20 23
17     3 25 27 24 30 20 23
14     25 27 1 24 30 20 23
7      13 25 27 31 24 30 20 23
15     3 25 27 1 24 30 20 23
4      25 27 31 24 30 20 23
16     24 30 23
11     24 30 23
19     24 30 23
12     30 23
6      31 30 23
32     24 30 20 23
21     24 30 23
3      30 23
26     23
13     23
28     30 20 23

Agreement between annotators

                    Overall   Low   High
Percent agreement   95%       90%   96%
Kappa               .57       .46   .62
Alpha               .57       .41   .59
Alpha-Dice          .67       .49   .68
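The gap between 95% raw agreement and a kappa near .57 is typical when one label dominates, because kappa discounts the agreement expected by chance. The sketch below computes Cohen's kappa for two annotators on made-up binary decisions. How the DUC05 annotations were actually coded for the kappa computation is not stated on the slide, so the labels and counts here are purely illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    return observed, (observed - expected) / (1 - expected)


# Made-up decisions for 100 items: "none" is by far the most common label.
annotator_a = ["none"] * 92 + ["match"] * 3 + ["match"] * 3 + ["none"] * 2
annotator_b = ["none"] * 92 + ["match"] * 3 + ["none"] * 3 + ["match"] * 2

observed, kappa = cohen_kappa(annotator_a, annotator_b)
print(f"percent agreement: {observed:.0%}, kappa: {kappa:.2f}")
```

In this example the annotators agree on 95 of 100 items, yet chance alone would produce about 90% agreement, so kappa lands near 0.5.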

Editing of participant annotations

• To correct obvious errors
• Ensures uniform checking
• Predominantly involved correcting the splitting of non-matching SCUs
• Average paired differences
  • Original: 0.0043
  • Modified: 0.0005
• Average magnitude of the difference
  • Original: 0.0115
  • Modified: 0.0032
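The two statistics differ in whether the sign of each change is kept. A minimal sketch, with made-up before/after scores standing in for the unedited and edited annotations:

```python
def paired_difference_stats(before, after):
    """Mean signed difference and mean absolute difference of paired scores."""
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = sum(diffs) / len(diffs)                      # signs can cancel
    mean_magnitude = sum(abs(d) for d in diffs) / len(diffs)  # signs ignored
    return mean_diff, mean_magnitude


# Made-up scores for four summaries before and after editing.
scores_before = [0.20, 0.25, 0.18, 0.22]
scores_after = [0.21, 0.24, 0.18, 0.23]

mean_diff, mean_magnitude = paired_difference_stats(scores_before, scores_after)
print(f"average paired difference: {mean_diff:.4f}")
print(f"average magnitude:         {mean_magnitude:.4f}")
```

The signed average shows whether editing shifts scores in one direction, while the magnitude shows how much an individual score typically moves; the magnitude is never smaller than the absolute value of the signed average.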

Excluding weight 1 SCUs

• Removing weight 1 SCUs improves agreement
  • Kappa: 0.64 (was 0.57)
• Annotating without weight 1 has negligible impact on scores
  • Set D324 done without weight 1 SCUs
  • Ave. magnitude between paired differences
  • On average 0.07 difference

Correlations: Pearson's, 25 systems

           Pyr-mod   Resp-1   Resp-2   R-2    R-SU4
Pyr-orig   0.96      0.77     0.86     0.84   0.80
Pyr-mod    -         0.81     0.90     0.90   0.86
Resp-1     -         -        0.83     0.92   0.92
Resp-2     -         -        -        0.88   0.87
R-2        -         -        -        -      0.98
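Each cell in this table is a Pearson correlation between two lists of per-system average scores. The sketch below implements the coefficient with the standard library and applies it to the original and modified pyramid scores of six systems (14, 17, 15, 10, 4, 7) copied from the ranking slide. Because it uses only six of the 25 systems, its result is not expected to reproduce the 0.96 reported here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Scores for systems 14, 17, 15, 10, 4, 7, in that order (from the ranking slide).
pyramid_original = [0.2587, 0.2492, 0.2423, 0.2379, 0.2321, 0.2297]
pyramid_modified = [0.1908, 0.1972, 0.1808, 0.2052, 0.1770, 0.1852]

r = pearson(pyramid_original, pyramid_modified)
print(f"Pearson r on six systems: {r:.2f}")
```

A high coefficient says two metrics order and space the systems similarly on average; it does not guarantee that they agree on which individual differences are significant.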


Questionable that responsiveness could be a gold standard

Pyramid and responsiveness

High correlation, but the metrics are not mutually substitutable

Pyramid and Rouge

High correlation, but the metrics are not mutually substitutable

Lessons Learned

• Comparing content is hard
  • All kinds of judgment calls
  • We didn't evaluate the NIST assessors in previous years
• Paraphrases
  • VP vs. NP
    • Ministers have been exchanged
    • Reciprocal ministerial visits
  • Length and constituent type
    • Robotics assists doctors in the medical operating theater
    • Surgeons started using robotic assistants

Modified scores better

• Easier peer annotation
• Can drop weight 1 SCUs
• Better agreement
• No emphasis on splitting non-matching SCUs

Agreement between annotators

• Participants can perform peer annotation reliably
• Absolute difference between scores
  • Original: 0.0555
  • Modified: 0.0617
  • Empirical prediction of difference: 0.06 (HLT 2004)

Correlations

Original and modified can substitute for each other

High correlation between manual and automatic, but automatic not yet a substitute

Similar patterns between pyramid and responsiveness

Current Directions

Automated identification of SCUs (Harnly et al 05)

Applied to DUC05 pyramid data set

Correlation of .91 with modified pyramid scores

Questions

• What was the experience annotating pyramids?
  • Does it shed insight on the problem?
  • Are people willing to do it again?
  • Would you have been willing to go through training?
• If you've done pyramid analysis, can you share your insights?


Annotators   Set ID   Alpha   Dice   Alpha-Dice
102:218      324      0.59    0.71   0.67
108:120      400      0.45    0.72   0.53
109:122      407      0.41    0.59   0.49
112:126      426      0.54    0.74   0.63
116:124      633      0.58    0.87   0.68
121:125      695      0.51    0.75   0.61

102:123      324      0.6     0.82   0.69
218:123      324      0.49    0.66   0.56

Correlations of Scores on Matched Sets

Annotators   Set ID   Pearson's w/ Orig   Pearson's w/ Modif
102:218      324      0.76 (.54-.89)      0.83 (.66-.92)
108:120      400      0.84 (.67-.92)      0.89 (.77-.95)
109:122      407      0.92 (.83-.96)      0.91 (.80-.96)
112:126      426      0.9 (.78-.95)       0.95 (.90-.98)
116:124      633      0.81 (.62-.91)      0.78 (.57-.90)
121:125      695      0.91 (.81-.96)      0.92 (.83-.96)

102:123      324      0.7 (.44-.85)       0.73 (.48-.87)
218:123      324      0.6 (.29-.80)       0.77 (.55-.89)