Knowledge-rich approaches for text summarization

Minna Vasankari

27.11.2001


Structure

1. The idea

2. Conceptual summarization

3. Linguistic summarization

4. Example system: Plandoc

5. Summary


The idea

• Full text is not the only possible source material for summarization

• Other sources:

– databases

– simulation data

– user interaction sequences

– etc


The idea

• Data with structure

– easier to interpret than full text

– no source text => no shortcuts

– text generation phase is hard

– domain-dependency


Conceptual summarization

• Sorting the source material

– facts, events

• Choosing what is important

– must be included in the summary

• and what is potentially important

– can be left out or included


Conceptual summarization

• What is important?

– depends on the domain

– depends on the input material

– depends on the user


Conceptual summarization

• Importance of a fact

– manual decision

• Importance of an event

– manual decision

– frequency analysis


Conceptual summarization

– Potentially important facts/events are included only if they fit in

– Determined by

• space limit

• linguistic constraints

• possible ordering of facts
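The selection rule above can be sketched roughly as follows. This is a minimal illustration, not a method from the slides: the `Fact` class, the `select_facts` function and the use of a word count as the space limit are all assumptions, and linguistic constraints and fact ordering are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    important: bool  # True: must be included; False: only potentially important


def select_facts(facts, max_words):
    """Keep every important fact, then add potentially important ones while they fit."""
    chosen = [f for f in facts if f.important]
    used = sum(len(f.text.split()) for f in chosen)
    for f in facts:
        if f.important:
            continue
        cost = len(f.text.split())
        # Hypothetical space limit: total number of words in the summary.
        if used + cost <= max_words:
            chosen.append(f)
            used += cost
    return chosen


facts = [
    Fact("Karl Malone scored 39 points.", True),
    Fact("Jay Humphries came in as a reserve.", False),
    Fact("Denver lost its seventh straight game.", False),
]
summary = select_facts(facts, max_words=12)
```

With a limit of 12 words the important fact and the first optional fact fit, and the second optional fact is left out.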


Linguistic summarization

• Expressing the same information in fewer sentences

• Method: linguistic constructs & revision

• Danger: overly aggressive compression leads to unreadable sentences


Linguistic summarization

• Linguistic constructs:

– semantically rich words

– modifiers of nouns or verbs

– conjunction and ellipsis

– abridged references

– abstraction

– aggregation

– presentational techniques


Linguistic summarization

• Semantically rich words

– killing two birds with one stone

Karl Malone scored 39 points. +

Karl Malone's 39 point performance is equal to his season high.

becomes

Karl Malone tied his season high with 39 points.


Linguistic summarization

• Modifiers of nouns or verbs

– one fact specifies a verb or a noun in another fact

Jay Humphries scored 24 points. He came in as a reserve.

becomes

Reserve Jay Humphries scored 24 points.


Linguistic summarization

• Conjunction

– joining facts with "and" or "or"

Mick Reynes scored 265 points last season and

Jack Jones scored 265 points last season.

• Ellipsis

– removing repetition

Mick Reynes and Jack Jones scored 265 points last season.
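Conjunction followed by ellipsis can be sketched as below, assuming each fact is already split into a subject and a predicate. The function name `conjoin_with_ellipsis` and the (subject, predicate) representation are hypothetical; a real generator would work on richer structures than strings.

```python
def conjoin_with_ellipsis(facts):
    """Merge facts that share a predicate so the predicate is stated only once.

    facts: list of (subject, predicate) string pairs.
    """
    by_predicate = {}
    for subject, predicate in facts:
        by_predicate.setdefault(predicate, []).append(subject)

    sentences = []
    for predicate, subjects in by_predicate.items():
        if len(subjects) == 1:
            joined = subjects[0]
        else:
            # Conjunction of the subjects; the repeated predicate is elided.
            joined = ", ".join(subjects[:-1]) + " and " + subjects[-1]
        sentences.append(f"{joined} {predicate}.")
    return sentences


result = conjoin_with_ellipsis([
    ("Mick Reynes", "scored 265 points last season"),
    ("Jack Jones", "scored 265 points last season"),
])
# result == ["Mick Reynes and Jack Jones scored 265 points last season."]
```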


Linguistic summarization

• Abridged references

– using shorter names for already introduced things

San Antonio Spurs took a 127-111 victory over Denver Nuggets and handed Denver their seventh straight loss.


Linguistic summarization

• Abstraction

– replacing a series of events with a single event

mission start, movements, firing, damages, mission abort =>

failed mission


Linguistic summarization

• Aggregation

– connecting events with spatial or temporal adverbials

Site-A and Site-B simultaneously fired a missile.

• Presentational techniques

– using spatial or temporal adverbs

Site-A fired a missile at 1302. Three minutes later Site-B fired a missile.
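As a rough sketch of how the two techniques might be chosen between, the snippet below compares two event times. It assumes times are four-digit 24-hour strings such as "1302" and that the second event is not earlier than the first; the helper names are hypothetical, and the gap is printed as a digit rather than spelled out as on the slide.

```python
def minutes(hhmm):
    """Convert a four-digit 24-hour time string such as '1302' to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def describe_firings(first_site, first_time, second_site, second_time):
    gap = minutes(second_time) - minutes(first_time)
    if gap == 0:
        # Aggregation: one sentence joined by a temporal adverbial.
        return f"{first_site} and {second_site} simultaneously fired a missile."
    # Presentational technique: a relative time instead of a second absolute time.
    unit = "minute" if gap == 1 else "minutes"
    return (f"{first_site} fired a missile at {first_time}. "
            f"{gap} {unit} later {second_site} fired a missile.")


same = describe_firings("Site-A", "1302", "Site-B", "1302")
later = describe_firings("Site-A", "1302", "Site-B", "1305")
```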


Linguistic summarization

• Revision: approach 1

– First create a draft summary from important facts

– Then enrich the draft with potentially important facts

• Revision: approach 2

– Generate the draft by collecting similar facts into each sentence

– Compress the sentences with ellipsis etc.


Example system: Plandoc

• Application developed by K. McKeown, J. Robin and K. Kukich at Columbia University, New York, and Bell Communications Research (1995)

• Problem

– a telephone company engineer plans how a telephone route should be developed over the next 20 years

– the engineer uses the PLAN planning system software

– Goal: documentation of the planning process


Plandoc: input and output

• Input: a trace of the user's actions with the PLAN system

1. RUNID fiberall FIBER 6/19/93 act yes

2. FA 1301 2 1995

3. FA 1201 2 1995

4. FA 1501 3 1995

5. ANF 1201 1301 2 1995 24

END. 856.0 670.2


Plandoc: input and output

• Output: a 1-2 page report

– the initial plan PLAN proposed

– refinements the engineer made

– alternative refinements the engineer tried but rejected

– the final plan

• Purpose: documentation


Plandoc: conceptual summarization

• Important facts

– accepted parts of the initial plan + accepted refinements to it

= the final plan

– rejected refinements?

• the engineer decides


Plandoc: overview of the method

• Fact generator converts the input to an internal representation

– facts presented as feature structures (attribute/value pairs)

• Ontologizer enriches the facts with e.g. price information

• Discourse planner groups the facts

• A lexicalizer/sentence generator converts the groups into English


Plandoc: processing the input

Example: FA 1301 2 1995

Enriched feature structure:

class: refinement

ref-type: fiber

action: activation

csa-site: 1301

date: year: 1995, quarter: 2

price: $56.00K
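The fact-generator step for this one record type can be sketched as follows. The field order (site, quarter, year) is inferred from the example above, and `parse_fa` is a hypothetical name, not part of Plandoc. The price is left out because it is added later by the ontologizer from knowledge that is not in the trace.

```python
def parse_fa(line):
    """Turn an 'FA <site> <quarter> <year>' trace line into a feature structure (a dict)."""
    code, site, quarter, year = line.split()
    if code != "FA":
        raise ValueError(f"not an FA record: {line!r}")
    return {
        "class": "refinement",
        "ref-type": "fiber",
        "action": "activation",
        "csa-site": int(site),
        "date": {"year": int(year), "quarter": int(quarter)},
        # No "price" key here: the slides attribute it to the ontologizer.
    }


fact = parse_fa("FA 1301 2 1995")
```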


Plandoc: grouping facts into sentences

• Let's construct a sentence from the FA facts:

FA 1301 2 1995

FA 1201 2 1995

FA 1501 3 1995

1. Group facts by common action

– action = activation for all

– one sentence is needed

FA 1301 2 1995

FA 1201 2 1995

FA 1501 3 1995


Plandoc: grouping facts into sentences

2. For each common-action group (sentence):

(a) Collapse groups which differ by one feature into a single group

– two groups:

FA 1301, 1201 2 1995

FA 1501 3 1995


Plandoc: grouping facts into sentences

(b) If more than one group remains (sentence is broken into clauses by conjunction):

i. Find the feature that is shared across most groups (but does not have the same value for all)

FA 1301, 1201 2 1995

FA 1501 3 1995

• only the date feature is left and it has two values => two clauses are needed


Plandoc: grouping facts into sentences

ii. Sort the groups into subgroups by the most common shared feature (nested conjunction inside the clause)

– each group has only one member

FA 1301, 1201 2 1995

FA 1501 3 1995


Plandoc: grouping facts into sentences

iii. Repeat the selection of the most common shared feature and the sorting into subgroups until all have been sorted

– no subgroups left

iv. Sort the clauses by date

FA 1301, 1201 2 1995

FA 1501 3 1995


Plandoc: grouping facts into sentences

FA 1301, 1201 2 1995

FA 1501 3 1995

• The produced sentence:

This refinement activated fiber for CSAs 1301 and 1201 in 1995 Q2 and this refinement activated fiber for CSA 1501 in 1995 Q3.

• The final sentence after ellipsis:

This refinement activated fiber for CSAs 1301 and 1201 in 1995 Q2 and for CSA 1501 in 1995 Q3.
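The grouping walk-through for the three FA facts can be approximated with the sketch below. It is a simplification under stated assumptions, not Plandoc's actual discourse planner: facts are (site, quarter, year) tuples that already share the activation action, only the CSA site is collapsed, and the function name `describe_activations` is hypothetical.

```python
def describe_activations(facts):
    """Build one sentence from fiber activation facts given as (csa_site, quarter, year)."""
    # Collapse facts that differ only in the CSA site: same date, same group.
    groups = {}
    for site, quarter, year in facts:
        groups.setdefault((year, quarter), []).append(str(site))

    clauses = []
    # One clause per remaining group, sorted by date.
    for (year, quarter), sites in sorted(groups.items()):
        if len(sites) == 1:
            site_text = f"CSA {sites[0]}"
        else:
            site_text = "CSAs " + ", ".join(sites[:-1]) + " and " + sites[-1]
        clauses.append(f"for {site_text} in {year} Q{quarter}")

    # Ellipsis: the shared subject and verb phrase are stated only once.
    return "This refinement activated fiber " + " and ".join(clauses) + "."


sentence = describe_activations([(1301, 2, 1995), (1201, 2, 1995), (1501, 3, 1995)])
# sentence == "This refinement activated fiber for CSAs 1301 and 1201 in 1995 Q2 and for CSA 1501 in 1995 Q3."
```

The sketch does not limit how many clauses are conjoined, so long inputs would produce an overlong sentence.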


Plandoc: grouping facts into sentences

• Readability

This refinement extended fiber from fiber hub 8107 to CSAs 8128, 8126, 8121 and 8113 and from fiber hub 8120 to the CO in 1994 Q1 and from the CO to CSA 8120 in 1994 Q3, with the active fibers placed on the primary path.

– limit the number of facts conjoined

– limit the number of embedded conjunctions inside a clause


Summary

• Sources other than text can also be summarized

• Problems:

– choosing the important elements

– generating a compact and readable summary text

– domain-dependency


Summary

• Applications:

– automatic weather reports (not predictions!)

– simulation reports

– patient monitoring system summaries

– etc