Exploiting Timelines to Enhance Multi-document Summarization

Jun-Ping Ng, Yan Chen, Min-Yen Kan and Zhoujun Li

National University of Singapore
Beihang University

Page 1: Exploiting Timelines to Enhance Multi-document Summarization

Exploiting Timelines to Enhance Multi-document Summarization

Jun-Ping Ng, Yan Chen, Min-Yen Kan and Zhoujun Li

National University of Singapore
Beihang University

Page 2: Exploiting Timelines to Enhance Multi-document Summarization

Cyclone Sidr

2007, JTWC designation: 06B

Image Courtesy: Univ. Wisconsin-Madison

“A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, …”

24 Jun 2014, ACL 2014 - Timelines in Summarization

Page 3: Exploiting Timelines to Enhance Multi-document Summarization

Image Courtesy: US Navy / Wikipedia

“… wiping out homes and trees in what officials described as the worst storm in years.”

Page 4: Exploiting Timelines to Enhance Multi-document Summarization

Image Courtesy: US State Department / Wikipedia

“More than 100,000 coastal villagers have been evacuated before the cyclone made landfall.”

Page 5: Exploiting Timelines to Enhance Multi-document Summarization

1991 Bangladesh Cyclone

Image Courtesy: US Navy / Wikipedia

“The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP.”

Page 6: Exploiting Timelines to Enhance Multi-document Summarization

[3] “The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP.”

[2] “More than 100,000 coastal villagers have been evacuated before the cyclone made landfall.”

[1] “A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years.”

Page 7: Exploiting Timelines to Enhance Multi-document Summarization

[3] “The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP.”

[2] “More than 100,000 coastal villagers have been evacuated before the cyclone made landfall.”

[1] “A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years.”

Page 8: Exploiting Timelines to Enhance Multi-document Summarization

Timelines from Text

[3] “The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP.”

[2] “More than 100,000 coastal villagers have been evacuated before the cyclone made landfall.”

[1] “A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years.”

Page 9: Exploiting Timelines to Enhance Multi-document Summarization

Key time spans are summary worthy

[3] “The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP.”

[2] “More than 100,000 coastal villagers have been evacuated before the cyclone made landfall.”

[1] “A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years.”

Page 10: Exploiting Timelines to Enhance Multi-document Summarization

Timelines + Summarization

[Diagram labels: Timelines (per input document); Timeline-derived features; Lexical and positional features; Summarization System; Summary]

Page 11: Exploiting Timelines to Enhance Multi-document Summarization

Outline

• Goal and Motivation
• Timeline Generation
• Integrating Timelines
– In Scoring: (Contextual) Importance, Density
– In Re-ordering: TimeMMR
• Experiments
• Discussion

Page 12: Exploiting Timelines to Enhance Multi-document Summarization

Timeline Generation


Page 13: Exploiting Timelines to Enhance Multi-document Summarization

1. Event-Event Temporal Classification


(Ng et al., 2013; EMNLP)

Page 14: Exploiting Timelines to Enhance Multi-document Summarization

2. Event-Timex Temporal Classification


(Ng and Kan, 2012; COLING)

Page 15: Exploiting Timelines to Enhance Multi-document Summarization

3. Timex Normalization

“Today” → June 6, 2014

(HeidelTime; Strötgen and Gertz, 2013)
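As a rough illustration of what normalization means here, the sketch below resolves a few relative expressions against a document date. It is a toy stand-in and not HeidelTime's interface: the lookup table `RELATIVE_OFFSETS` and the function `normalize_timex` are hypothetical names invented for this example.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical table of relative expressions and their day offsets.
# A real normalizer such as HeidelTime covers far more cases.
RELATIVE_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def normalize_timex(expression: str, document_date: date) -> Optional[str]:
    """Resolve a relative time expression to an ISO date, or None if unknown."""
    offset = RELATIVE_OFFSETS.get(expression.strip().lower())
    if offset is None:
        return None
    return (document_date + timedelta(days=offset)).isoformat()


# The slide's example: "Today" in a document dated June 6, 2014.
print(normalize_timex("Today", date(2014, 6, 6)))
```

The document date acts as the reference point, which is why the same word "Today" maps to different calendar dates in different input articles.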

Page 16: Exploiting Timelines to Enhance Multi-document Summarization

Timeline Construction

1. Map normalized timexes to timeline
2. Place events which OVERLAP with timexes onto timeline
3. Place events which OVERLAP with other events onto the timeline
4. Insert rest of events based on BEFORE/AFTER ordering

[Figure: example timeline; one span labelled “1999”]
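The four construction steps can be sketched as follows. This is a minimal sketch under several assumptions that are not on the slide: the timeline is a list of slots (sets of items), normalized timex values sort chronologically as strings, relations arrive as `(left, label, right)` triples, and the names `find_slot` and `build_timeline` are hypothetical. Contradictory relations are not reconciled, and events with no link to a placed item are simply left out.

```python
def find_slot(timeline, item):
    """Return the index of the slot holding item, or None if it is unplaced."""
    for index, slot in enumerate(timeline):
        if item in slot:
            return index
    return None


def build_timeline(timexes, relations):
    # Step 1: one slot per normalized timex, in chronological order.
    timeline = [{name} for name in sorted(timexes, key=timexes.get)]

    def place_overlaps(only_timex_anchors):
        placed_any = True
        while placed_any:
            placed_any = False
            for left, label, right in relations:
                if label != "OVERLAP":
                    continue
                for anchor, other in ((left, right), (right, left)):
                    if only_timex_anchors and anchor not in timexes:
                        continue
                    slot = find_slot(timeline, anchor)
                    if slot is not None and find_slot(timeline, other) is None:
                        timeline[slot].add(other)
                        placed_any = True

    place_overlaps(only_timex_anchors=True)   # Step 2: events overlapping timexes
    place_overlaps(only_timex_anchors=False)  # Step 3: events overlapping placed events

    # Step 4: insert remaining events next to a placed item via BEFORE/AFTER.
    placed_any = True
    while placed_any:
        placed_any = False
        for left, label, right in relations:
            if label == "AFTER":
                left, right = right, left  # rewrite "x AFTER y" as "y BEFORE x"
            elif label != "BEFORE":
                continue
            left_slot = find_slot(timeline, left)
            right_slot = find_slot(timeline, right)
            if left_slot is None and right_slot is not None:
                timeline.insert(right_slot, {left})
                placed_any = True
            elif right_slot is None and left_slot is not None:
                timeline.insert(left_slot + 1, {right})
                placed_any = True
    return timeline


# Toy input loosely based on the cyclone example; identifiers are made up.
timexes = {"t_thursday": "2007-11-15", "t_1991": "1991"}
relations = [
    ("smashed", "OVERLAP", "t_thursday"),
    ("wiping", "OVERLAP", "smashed"),
    ("evacuated", "BEFORE", "smashed"),
    ("killed", "OVERLAP", "t_1991"),
]
timeline = build_timeline(timexes, relations)
```

In this toy run the 1991 event shares a slot with its timex, the evacuation gets its own slot, and the two landfall events end up together with "Thursday".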

Page 17: Exploiting Timelines to Enhance Multi-document Summarization

Integrating Timelines into SWING

SWING (Ng et al., COLING 2012, TAC 2011)
State-of-the-art open-source extractive summarizer
https://github.com/WING-NUS/SWING

Basic, k of n sentence summaries

[Diagram labels: Temporal Processing; Summarization Pipeline; Time Span Importance; Contextual Time Span Importance; Sentence Temporal Coverage Density; TimeMMR]

Page 18: Exploiting Timelines to Enhance Multi-document Summarization

1. Time Span Importance (TSI)

• Time spans which contain many events are more salient

• Sentences which reference events in these time spans are thus better candidates for a summary
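A small sketch may make the idea concrete. The slide gives no formula, so the normalization below (event count divided by the largest count) and the averaging over a sentence's events are assumptions, as are the function names `time_span_importance` and `sentence_tsi`.

```python
from collections import Counter


def time_span_importance(event_spans):
    """Map each time span to its event count, scaled by the busiest span.

    event_spans maps an event identifier to the index of its time span.
    The scaling is an assumption; the slide only says busier spans are more salient.
    """
    counts = Counter(event_spans.values())
    if not counts:
        return {}
    peak = max(counts.values())
    return {span: count / peak for span, count in counts.items()}


def sentence_tsi(sentence_events, event_spans, tsi):
    """Average importance of the spans referenced by a sentence's events (assumed)."""
    scores = [tsi[event_spans[e]] for e in sentence_events if e in event_spans]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: two events fall in span 2, one each in spans 0 and 1.
event_spans = {"smashed": 2, "wiping": 2, "evacuated": 1, "killed": 0}
tsi = time_span_importance(event_spans)
busy_sentence = sentence_tsi(["smashed", "wiping"], event_spans, tsi)
quiet_sentence = sentence_tsi(["killed"], event_spans, tsi)
```

Under these assumptions the sentence whose events sit in the busiest span scores higher than the one referring to a sparsely populated span.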

Page 19: Exploiting Timelines to Enhance Multi-document Summarization

2. Contextual Time Span Importance (CTSI)

• Time spans near to important time spans are important

• Search left and right for local peaks

[Equation: CTSI definition; not recoverable from the transcript]
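Since the CTSI formula did not survive in this transcript, the following is only a guess at the shape of the computation: find local peaks in the per-span importance values, then let each span inherit a distance-discounted share of its nearest peak to the left and to the right. The discount `1 / (1 + distance)` and the names `local_peaks` and `contextual_tsi` are assumptions for illustration, not the authors' definition.

```python
def local_peaks(tsi):
    """Indices whose importance is positive and not below either neighbour."""
    peaks = []
    for i, value in enumerate(tsi):
        left = tsi[i - 1] if i > 0 else float("-inf")
        right = tsi[i + 1] if i < len(tsi) - 1 else float("-inf")
        if value > 0 and value >= left and value >= right:
            peaks.append(i)
    return peaks


def contextual_tsi(tsi):
    """Score each span by its nearest left and right peak, discounted by distance."""
    peaks = local_peaks(tsi)
    scores = []
    for i in range(len(tsi)):
        left = max((p for p in peaks if p <= i), default=None)
        right = min((p for p in peaks if p >= i), default=None)
        candidates = [
            tsi[p] / (1 + abs(p - i)) for p in (left, right) if p is not None
        ]
        scores.append(max(candidates, default=0.0))
    return scores


# Importance per time span, in timeline order; span 2 is the only peak.
tsi = [0.0, 0.25, 1.0, 0.25, 0.0]
ctsi = contextual_tsi(tsi)
```

With this toy input, the spans adjacent to the peak receive a higher contextual score than their own raw importance, which is the effect the bullets describe.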

Page 20: Exploiting Timelines to Enhance Multi-document Summarization

3. Sentence Temporal Coverage Density (TCD)

• Favour sentences which
– contain more events
– cover a wide variety of time spans
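The slide does not give a formula for TCD. One assumed reading, sketched below, is the number of distinct time spans a sentence touches divided by the number of events it contains. Note that this particular normalization rewards variety rather than raw event count, so a real system might keep the event count as a separate feature. The function name `temporal_coverage_density` is hypothetical.

```python
def temporal_coverage_density(sentence_events, event_spans):
    """Distinct time spans covered per event in the sentence (assumed form)."""
    placed = [e for e in sentence_events if e in event_spans]
    if not placed:
        return 0.0
    distinct_spans = {event_spans[e] for e in placed}
    return len(distinct_spans) / len(placed)


# Toy data: event identifier -> time span index.
event_spans = {"smashed": 2, "wiping": 2, "evacuated": 1, "made_landfall": 2}
varied = temporal_coverage_density(["evacuated", "made_landfall"], event_spans)
concentrated = temporal_coverage_density(["smashed", "wiping"], event_spans)
```

Here the sentence whose two events fall in different spans scores higher than the one whose events share a single span.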

Page 21: Exploiting Timelines to Enhance Multi-document Summarization

Identifying Redundancies

• SWING makes use of the Maximal Marginal Relevance (MMR) algorithm to identify redundancies in selected sentences

• MMR is based largely on surface lexical similarities

Idea: Let’s use time as a basis to penalize the selection of sentences from redundant time periods.


Page 22: Exploiting Timelines to Enhance Multi-document Summarization

TimeMMR

• Beyond lexical similarities, identify sentences which contain substantial time span overlap.
• Candidate sentences which share many time spans with selected sentences are penalized.

(1) An official in Barisal, 120 kilometres south of Dhaka, spoke of severe destruction as the 500 kilometre-wide mass of cloud passed overhead.

(2) “Many trees have been uprooted and houses and schools blown away,” Mostofa Kamal, a district relief and rehabilitation officer, told AFP by telephone.

(3) “Mud huts have been damaged and the roofs of several houses blown off,” said the state’s relief minister, Mortaza Hossain.

Lexically dissimilar but redundant

Proportion of overlap
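The penalty can be sketched as a greedy selection loop. This is an illustrative simplification: it drops the lexical similarity term that standard MMR uses and keeps only a time span penalty, and the weight `penalty_weight`, the tuple layout, and the function names are assumptions rather than SWING's implementation.

```python
def time_span_overlap(candidate_spans, selected_spans):
    """Proportion of the candidate's time spans already covered by the summary."""
    if not candidate_spans:
        return 0.0
    return len(candidate_spans & selected_spans) / len(candidate_spans)


def time_mmr_select(sentences, k, penalty_weight=0.5):
    """Greedily pick k sentences, penalizing time span overlap with earlier picks.

    sentences is a list of (text, relevance_score, set_of_time_spans) tuples.
    """
    remaining = list(sentences)
    selected, covered = [], set()
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda s: s[1] - penalty_weight * time_span_overlap(s[2], covered),
        )
        remaining.remove(best)
        selected.append(best[0])
        covered |= best[2]
    return selected


# Toy candidates: the first two describe damage in the same time span.
candidates = [
    ("official spoke of severe destruction", 0.9, {5}),
    ("trees uprooted, houses blown away", 0.8, {5}),
    ("villagers evacuated before landfall", 0.6, {4}),
]
summary = time_mmr_select(candidates, k=2)
```

In the toy run the second damage report is skipped in favour of the evacuation sentence, even though its relevance score is higher, because its only time span is already covered.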

Page 23: Exploiting Timelines to Enhance Multi-document Summarization

Experiments

• Data
– TAC 2010 dataset for training
– TAC 2011 dataset for testing
• Temporal Processing Systems
– HeidelTime (Strötgen and Gertz, 2013)
– E-T temporal classification (Ng and Kan, 2012)
– E-E temporal classification (Ng et al., 2013)
• Summarization baseline
– SWING (Ng et al., 2012)

Page 24: Exploiting Timelines to Enhance Multi-document Summarization

Results

#    Configuration                            R-2
R    SWING                                    0.1339
B1   CLASSY                                   0.1278
1    SWING + Timeline Features                0.1394*
2    SWING + Timeline Features + TimeMMR      0.1389

* = p < 0.1, ** = p < 0.05, against R row

Doesn’t seem very effective!

Page 25: Exploiting Timelines to Enhance Multi-document Summarization

Analysis: Timelines contain errors

• Errors from underlying temporal processing systems

• Simplifying assumptions made in timeline construction

• Lack of consistency checking and validation

For effective use, we must identify good timelines
• Identify timelines which potentially contain more errors
• Exclude these when performing summarization

Page 26: Exploiting Timelines to Enhance Multi-document Summarization

Reliability Filtering

• Short timelines can result when the system fails to extract or relate events and timexes
• Features derived from short timelines are prone to have extreme values
• Use the length of a timeline as a gauge of its accuracy
• Don’t use timelines shorter than average (as computed over the whole collection)
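The filtering rule is simple enough to sketch directly. The sketch assumes that a timeline's length is the number of entries it holds and that timelines are stored per document in a dictionary; `filter_reliable_timelines` is a hypothetical name.

```python
def filter_reliable_timelines(timelines):
    """Keep only timelines at least as long as the collection average.

    timelines maps a document identifier to its list of timeline entries.
    """
    if not timelines:
        return {}
    average = sum(len(t) for t in timelines.values()) / len(timelines)
    return {doc: t for doc, t in timelines.items() if len(t) >= average}


# Toy collection: lengths 5, 1 and 3 give an average of 3.
timelines = {
    "doc1": ["e1", "e2", "e3", "e4", "e5"],
    "doc2": ["e1"],
    "doc3": ["e1", "e2", "e3"],
}
kept = filter_reliable_timelines(timelines)
```

Documents whose timelines are dropped would presumably fall back to the lexical and positional features alone.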

Page 27: Exploiting Timelines to Enhance Multi-document Summarization

With Reliability Filtering

#    Configuration                                      R-2
R    SWING                                              0.1339
B1   CLASSY                                             0.1278
1    SWING + Timeline Features                          0.1394*
2    SWING + Timeline Features + TimeMMR                0.1389
3    SWING + Timeline Features [Filtered]               0.1418**
4    SWING + Timeline Features + TimeMMR [Filtered]     0.1402**

* = p < 0.1, ** = p < 0.05, against R row

TimeMMR doesn’t seem effective! Why?

Page 28: Exploiting Timelines to Enhance Multi-document Summarization

Does TimeMMR actually help?

L1 / R1: An Iraqi reporter threw his shoes at visiting U.S. President George W. Bush and called him a “dog” in Arabic during a news conference with Iraqi Prime Minister Nuri al-Maliki in Baghdad

L2 / R2: “All I can report is it is a size 10,.

L3: Muntadhar al-Zaidi, reporter of Baghdadiya television jumped and threw his two shoes one by one at the president, who ducked and thus narrowly missed being struck, raising chaos in the hall in Baghdad’s heavily fortified Green Zone.

R3: The incident occurred as Bush was appearing with Iraqi Prime Minister Nouri al-Maliki.

L4: The president lowered his head and the first shoe hit the American and Iraqi flags behind the two leaders.

R4: Muntadhar al-Zaidi, reporter of Baghdadiya television jumped and threw his two shoes one by one at the president, who ducked and thus narrowly missed being struck, raising chaos in the hall in Baghdad’s heavily fortified Green Zone.

L5 / R5: The The president lowered his head and the

Left: R-2: 0.2643, worse by R-2
Right: R-2: 0.2772, better by R-2

Possibly Redundant?

Could an (automated) evaluation metric cater for time?

Page 29: Exploiting Timelines to Enhance Multi-document Summarization

Conclusion

• Use of automatic timeline generation
• Integration of timelines into summarization
– Sentence scoring via timeline features
– Sentence re-ordering via TimeMMR
– Length based timeline filtering helps to ameliorate errors

For details on temporal processing, see: Jun Ping’s work at COLING 2012, EMNLP 2013 and his doctoral thesis (2014)

Questions? If not, ask for more detailed analysis!

Page 30: Exploiting Timelines to Enhance Multi-document Summarization

Additional Slides


Page 31: Exploiting Timelines to Enhance Multi-document Summarization

Related Work

• For Sentence Reordering
– Barzilay et al., 1999
• Recency as an indicator of salience
– Goldstein et al., 2000; Wan, 2007; Demartini et al., 2010
– Liu et al., 2009 (“Temporal Graph”)
– Wu, 2008 (“Largest Cluster”)
• TREC Temporal Summarization Track
– Not as relevant; about monitoring an event over time

Close to our TSI

Page 32: Exploiting Timelines to Enhance Multi-document Summarization

[Figure: two example summaries. Labels: “Baseline; worse” and “With time features; better”]

Page 33: Exploiting Timelines to Enhance Multi-document Summarization

TSI: A crane accident

With TSI, the cause of the accident in this summary is included; the alternative R1 sentence is background information and does not occur at any key time span.

[Figure: two example summaries. Labels: “With TSI; better” and “Without TSI; worse”]

Page 34: Exploiting Timelines to Enhance Multi-document Summarization

CTSI: Coral Reef Preservation

With CTSI, the “warn” and “disappear” events were promoted in importance due to their proximity to peak P.

[Figure: two example summaries. Labels: “With CTSI; better” and “Without CTSI; worse”]

Page 35: Exploiting Timelines to Enhance Multi-document Summarization

Timeline Caveats

• Some events span a long period of time (e.g., “1999”)
• Events are ordered based on the start of the duration
• Timeline captures relative order
• Construction algorithm does not attempt to reconcile contradictions

Page 36: Exploiting Timelines to Enhance Multi-document Summarization

Timex Normalization

Source: Bethard, 2013

Page 37: Exploiting Timelines to Enhance Multi-document Summarization

References

• Jun-Ping Ng, Interpreting Text with Time, Doctoral Thesis, National University of Singapore, 2014
• Jun-Ping Ng, Min-Yen Kan, Ziheng Lin, Wei Feng, Bin Chen, Jian Su, Chew-Lim Tan, Exploiting Discourse Analysis for Article-Wide Temporal Classification, EMNLP 2013
• Jun-Ping Ng, Praveen Bysani, Ziheng Lin, Min-Yen Kan, Chew-Lim Tan, Exploiting Category-Specific Information for Multi-Document Summarization, COLING 2012
• Jun-Ping Ng, Min-Yen Kan, Improved Temporal Relation Classification using Dependency Parses and Selective Crowdsourced Annotations, COLING 2012