
1

I256: Applied Natural Language Processing

Marti Hearst, Oct 2, 2006

2 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Contents

Introduction and Applications
Types of summarization tasks
Basic paradigms
Single document summarization
Evaluation methods

3 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Introduction

The problem: information overload
– 4 billion URLs indexed by Google
– 200 TB of data on the Web [Lyman and Varian 03]
– Information is created every day in enormous amounts

One solution: summarization. Abstracts promote current awareness:
– save reading time
– facilitate selection
– facilitate literature searches
– aid in the preparation of reviews

But what is an abstract?

4 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Introduction

Abstract: a brief but accurate representation of the contents of a document.

Goal: take an information source, extract the most important content from it, and present it to the user in a condensed form and in a manner sensitive to the user's needs.

Compression: the ratio of the length of the summary to the length of the source (e.g., a 100-word summary of a 1,000-word document has a compression of 10%).

5 (From lecture notes by Nachum Dershowitz & Dan Cohen)

History

The problem has been addressed since the 1950s [Luhn 58]
Numerous methods are currently being suggested
Most methods still rely on algorithms from the 1950s-70s
The problem is still hard, yet there are some applications:
– MS Word, www.newsinessence.com by Drago Radev's research group

6 (From lecture notes by Nachum Dershowitz & Dan Cohen)

7 (From lecture notes by Nachum Dershowitz & Dan Cohen)

MSWord AutoSummarize

8 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Applications

Abstracts for scientific and other articles
News summarization (mostly multiple document summarization)
Classification of articles and other written data
Web pages for search engines
Web access from PDAs, cell phones
Question answering and data gathering

9 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Types of Summaries

Indicative vs informative
– Informative: a substitute for the entire document
– Indicative: gives an idea of what is there

Background
– Does the reader have the needed prior knowledge?
– Expert reader vs novice reader

Query-based or general
– Query-based: a form is being filled in, and specific questions should be answered
– General: general-purpose summarization

10 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Types of Summaries (input)

Single document vs multiple documents

Domain specific (chemistry) or general

Genre specific (newspaper items) or general

11 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Types of Summaries (output)

Extract vs abstract
– Extracts: representative paragraphs/sentences/phrases/words, fragments of the original text
– Abstracts: a concise summary of the central subjects in the document
– Research shows that sometimes readers prefer extracts!

Language chosen for the summary
Format of the resulting summary (table/paragraph/key words)

12 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Methods

Quantitative heuristics, manually scored
Machine-learning based statistical scoring methods
Higher semantic/syntactic structures
Network (graph) based methods
Other methods (rhetorical analysis, lexical chains, co-reference chains)
AI methods

13 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Quantitative Heuristics

General method: score each entity (sentence, word); combine scores; choose best sentence(s)

Scoring techniques:
– Word frequencies throughout the text (Luhn 58)
– Position in the text (Edmundson 69, Lin & Hovy 97)
– Title method (Edmundson 69)
– Cue phrases in sentences (Edmundson 69)

14 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Using Word Frequencies (Luhn 58)

Very first work in automated summarization. Assumptions:
– Frequent words indicate the topic
– "Frequent" means with reference to the corpus frequency
– Clusters of frequent words indicate a summarizing sentence

Stemming based on similar prefix characters
Very common words and very rare words are ignored

15

Ranked Word Frequency

[Figure: Zipf's curve, word frequency vs. frequency rank]

16 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Word frequencies (Luhn 58)

Find consecutive sequences of high-weight keywords
Allow a certain number of gaps of low-weight terms
Sentences with the highest sum of cluster weights are chosen (see the sketch below)
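To make the clustering procedure concrete, here is a minimal Python sketch of Luhn-style sentence scoring. The stop list, frequency threshold, gap size, and function names are illustrative assumptions, not Luhn's original parameters:

```python
# Sketch of Luhn-style scoring: frequent non-stop words are keywords;
# each sentence is scored by its densest keyword cluster.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "that"}

def significant_words(sentences, min_freq=2):
    """Frequent content words are taken as topic indicators."""
    counts = Counter(w.lower() for s in sentences for w in s.split())
    return {w for w, c in counts.items()
            if c >= min_freq and w not in STOPWORDS}

def luhn_score(sentence, keywords, max_gap=4):
    """Best cluster score: (significant words)^2 / cluster span."""
    words = [w.lower() for w in sentence.split()]
    best = 0.0
    start = last = None   # bounds of the current keyword cluster
    count = 0             # significant words in the current cluster
    for i, w in enumerate(words):
        if w not in keywords:
            continue
        if last is None or i - last > max_gap:
            start, count = i, 0    # gap too large: start a new cluster
        count += 1
        last = i
        best = max(best, count * count / (last - start + 1))
    return best

def top_sentences(sentences, n=3):
    kw = significant_words(sentences)
    return sorted(sentences, key=lambda s: luhn_score(s, kw), reverse=True)[:n]
```

The squared-count-over-span score rewards sentences where keywords sit close together, which is the "cluster" intuition on this slide.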

17 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Position in the text (Edmundson 69)

Claim: important sentences occur in specific positions
– "Lead-based" summary: the inverse of position in the document works well for news
– Important information occurs in specific sections of the document (introduction/conclusion)
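A lead-based position score fits in one small function; the linear decay below is an illustrative choice, not a prescribed formula:

```python
# Sketch of lead-based position scoring: earlier sentences score higher.
# News summarizers often simply keep the first sentences.
def position_score(index, total):
    """index: 0-based sentence position; total: number of sentences."""
    return 1.0 - index / total
```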

18 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Title method (Edmundson 69)

Claim: the title of a document indicates its content
– Unless editors are being cute
– Not usually true for novels
– What about blogs?

Words in the title help find relevant content:
– create a list of title words, remove "stop words"
– use those as keywords to find important sentences (for example, with Luhn's methods)
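A sketch of the title method, with an illustrative stop list:

```python
# Sketch of the title method: title words, minus stop words, become
# keywords; sentences are scored by how many title keywords they contain.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for"}

def title_keywords(title):
    return {w.lower() for w in title.split() if w.lower() not in STOPWORDS}

def title_score(sentence, keywords):
    return len({w.lower() for w in sentence.split()} & keywords)
```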

19 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Cue phrases method (Edmundson 69)

Claim: important sentences contain cue words/indicative phrases
– "The main aim of the present paper is to describe…" (IND)
– "The purpose of this article is to review…" (IND)
– "In this report, we outline…" (IND)
– "Our investigation has shown that…" (INF)

Some words are considered bonus, others stigma
– bonus: comparatives, superlatives, conclusive expressions, etc.
– stigma: negatives, pronouns, etc.

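A minimal sketch of cue-phrase scoring; the bonus and stigma lists below are tiny illustrative samples, not Edmundson's actual lists:

```python
# Sketch of cue-phrase scoring with illustrative bonus/stigma lists.
# Phrase matching is deliberately simple (substring/word membership).
BONUS_PHRASES = {"in conclusion", "the main aim", "the purpose of this"}
STIGMA_WORDS = {"hardly", "impossible", "he", "she", "they"}

def cue_score(sentence, bonus=1.0, stigma=-1.0):
    text = sentence.lower()
    words = set(text.split())
    score = sum(bonus for p in BONUS_PHRASES if p in text)
    score += sum(stigma for w in STIGMA_WORDS if w in words)
    return score
```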

20 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Feature combination (Edmundson '69)

Linear contribution of 4 features: title, cue, keyword, position

Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S)

The weights are adjusted using training data with any minimization technique

Evaluated on a corpus of 200 chemistry articles
– Length ranged from 100 to 3,900 words
– Judges were told to extract 25% of the sentences, to maximize coherence and minimize redundancy

Features:
– position (sensitive to types of headings for sections)
– cue
– title
– keyword

Best results obtained with: cue + title + position
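The linear combination itself is a one-liner; the sketch below assumes the four per-sentence feature scores (such as those sketched above) have already been computed, and uses placeholder weights where the paper would use trained values:

```python
# Sketch of Edmundson's linear feature combination. The coefficients
# a, b, c, d stand in for trained weights; the defaults are placeholders.
def edmundson_weight(title, cue, keyword, position,
                     a=1.0, b=1.0, c=1.0, d=1.0):
    """Combine the four per-sentence feature scores into one weight."""
    return a * title + b * cue + c * keyword + d * position
```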

21 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Bayesian Classifier (Kupiec et al. 95)

Statistical learning method

Feature set:
– sentence length: |S| > 5
– fixed phrases: 26 manually chosen
– paragraph: sentence position in paragraph
– thematic words: presence of thematic (frequent content) words
– uppercase words: not common acronyms

Target label: binary, whether the sentence is included in the manual extract

Corpus: 188 document + summary pairs from scientific journals

22 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Bayesian Classifier (Kupiec et al. 95)

Uses a Bayesian classifier, where s is a sentence, S the set of summary sentences, and F_1, …, F_k the features:

P(s ∈ S | F_1, F_2, …, F_k) = P(F_1, F_2, …, F_k | s ∈ S) · P(s ∈ S) / P(F_1, F_2, …, F_k)

Assuming statistical independence:

P(s ∈ S | F_1, F_2, …, F_k) = [ ∏_{j=1}^{k} P(F_j | s ∈ S) ] · P(s ∈ S) / ∏_{j=1}^{k} P(F_j)

23 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Bayesian Classifier (Kupiec et al. 95)

Each probability is estimated empirically from a corpus
Higher-probability sentences are chosen to be in the summary (a minimal sketch follows)

Performance: for 25% summaries, 84% precision
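A minimal sketch of this classifier, assuming each sentence arrives as a tuple of discrete feature values and a training corpus of (features, in_summary) pairs; the add-one smoothing is my addition so unseen feature values do not zero out the empirical estimates:

```python
# Sketch of a Kupiec-style naive Bayes sentence extractor. Feature
# extraction (length, fixed phrases, etc.) is assumed to happen elsewhere.
from collections import defaultdict

def train(corpus):
    """corpus: iterable of (feature_tuple, in_summary_bool) pairs."""
    n = n_in = 0
    feat = defaultdict(int)      # counts for P(F_j = f)
    feat_in = defaultdict(int)   # counts for P(F_j = f | s in S)
    for features, in_summary in corpus:
        n += 1
        n_in += in_summary
        for j, f in enumerate(features):
            feat[(j, f)] += 1
            if in_summary:
                feat_in[(j, f)] += 1

    def score(features):
        """P(s in S | F_1..F_k), under the independence assumption."""
        p = n_in / n                                  # prior P(s in S)
        for j, f in enumerate(features):
            p *= (feat_in[(j, f)] + 1) / (n_in + 2)   # P(F_j | s in S)
            p /= (feat[(j, f)] + 1) / (n + 2)         # P(F_j)
        return p

    return score
```

Sentences would then be ranked by this score and the top 25% kept, matching the evaluation setting above.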

24 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Evaluation methods

When a manual summary is available:
1. choose a granularity (clause; sentence; paragraph)
2. create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match)
3. measure the similarity of each unit in the new summary to the most similar unit(s) in the manual summary
4. measure recall and precision (see the sketch below)

Otherwise:
1. Intrinsic: how good is the summary as a summary?
2. Extrinsic: how well does the summary help the user?
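For sentence granularity with perfect-match similarity, recall and precision reduce to set overlap, as in this small sketch:

```python
# Sketch of recall/precision at sentence granularity with perfect-match
# similarity (the simplest of the measures listed above).
def precision_recall(extract, manual):
    """extract, manual: collections of sentences (or sentence ids)."""
    extract, manual = set(extract), set(manual)
    overlap = len(extract & manual)
    precision = overlap / len(extract) if extract else 0.0
    recall = overlap / len(manual) if manual else 0.0
    return precision, recall
```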

25 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Intrinsic measures

(Glass-box): how good is the summary as a summary?

Problem: how do you measure the goodness of a summary?

Studies: compare to an ideal (Edmundson, 69; Kupiec et al., 95; Salton et al., 97; Marcu, 97) or supply criteria: fluency, informativeness, coverage, etc. (Brandow et al., 95).

The summary is evaluated on its own or by comparing it with the source:
– Is the text cohesive and coherent?
– Does it contain the main topics of the document?
– Are important topics omitted?

26 (From lecture notes by Nachum Dershowitz & Dan Cohen)

Extrinsic measures

(Black box): how well does the summary help a user with a task?

Problem: does summary quality correlate with performance?

Studies: GMAT tests (Morris et al., 92); news analysis (Miike et al., 94); IR (Mani and Bloedorn, 97); text categorization (SUMMAC 98; Sundheim, 98).

Evaluation in a specific task:
– Can the summary be used instead of the document?
– Can the document be classified by reading the summary?
– Can we answer questions by reading the summary?

27

The Document Understanding Conference (DUC)

This is really the text summarization competition; started in 2001.

Task and evaluation (for 2001-2004):
– Various target sizes were used (10-400 words)
– Both single and multiple-document summaries assessed
– Summaries were manually judged for both content and readability
– Each peer (human or automatic) summary was compared against a single model summary, using SEE (http://www.isi.edu/~cyl/SEE/), which estimates the percentage of information in the model that was covered in the peer
– Also used ROUGE (Lin '04) in 2004: Recall-Oriented Understudy for Gisting Evaluation; uses counts of n-gram overlap between candidate and gold-standard summary, assumes fixed-length summaries (see the sketch below)
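A minimal sketch of ROUGE-N as n-gram recall against a single reference (real ROUGE also handles multiple references, stemming, stop-word removal, and other options):

```python
# Sketch of ROUGE-N: clipped n-gram overlap between a candidate summary
# and one gold-standard reference, divided by reference n-gram count.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```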

28

The Document Understanding Conference (DUC)

Made a big change in 2005:
– An extrinsic evaluation was proposed but rejected (write a natural-disaster summary)
– Instead: a complex question-focused summarization task that required summarizers to piece together information from multiple documents to answer a question or set of questions as posed in a DUC topic
– Also indicated a desired granularity of information

29

The Document Understanding Conference (DUC)

Evaluation metrics for the new task:
– Grammaticality
– Non-redundancy
– Referential clarity
– Focus
– Structure and coherence
– Responsiveness (content-based evaluation)

This was a difficult task to do well in.

30

Let’s make a summarizer!

Each person (or pair) writes code for one small part of the problem, using Kupiec et al.'s method. We'll combine the parts in class.

31

Next Time

More on Bayesian classification
Other summarization approaches (Marcu paper)
Multi-document summarization (Goldstein et al. paper)
In-class summarizer!