Summarizing Consensus in Decision-Making Threads Stephen Wan Kathy McKeown

Post on 15-Jan-2016

Summarizing Consensus in Decision-Making Threads

Stephen Wan

Kathy McKeown

Presentation Outline

• The Problem

• Related work

• Description of Data 

• Architecture

• Issue Detection

• Evaluations

• Conclusions and Future Work

The Problem

• Email Thread Summarization:
  – I want a summary that allows me to participate in the discussions quickly

versus

• Single-document sentence extraction summarization:
  – In contrast, conventional summaries can only give you the gist of the thread

The Problem

Sentence Extraction Summary:

– Here ' s the plaque info .

– http://www.affordableawards.com/plaques/ordecon.htm

– I like the plaque , and aside for exchanging Dana ' s name for " Sally Slater " and ACM for " Ladies Auxilliary " , the wording is nice .

– We just need to contact the plaque folks and ask what format they need for the logo .

(Summ-it Summarisation Applet Version 1.1)

The Problem

A: [ISSUE] Let me know if you agree or disagree w/choice of plaque and (especially) wording.

B: [RESP] I like the plaque, and aside for exchanging Dana's name for "Sally Slater" and ACM for "Ladies Auxilliary", the wording is nice.

C: [RESP] I prefer Christy's wording to the plaque original.

The Problem

Research questions:
• Identifying the focus of the discussion: Issue
• Identifying the responses to the issue
• Determining consensus
  – What is agreement-disagreement?
    • Binary like a vote?
    • Discrete scale?
    • Continuous?

The Problem

Assumptions:
– Threads classified as decision-making threads
– No topic shifts in the thread
– The issue is presented in the first email
  • Ignoring responses within first email
– Two classes: agreement vs disagreement

Goal:
– Create a tool for extracting sentences to be given to a thread summarizer
  • rewrites extracted sentences into a nice coherent summary
  • aggregates agreement-disagreement?

The Problem

Our approach:
– Use the structure of the thread
  • Assumption: email correspondents have already done some of the work of manually filtering out the important issues
– Use machine learning to identify and classify agreement-disagreement responses

The Problem

First Email:
Subject: What Can I Do?
Sent 1: ... willing to help out ...
Sent 2: Going to go put a few more poster...
Sent 3: Why'd they have to pick today to clear off the bulletin boards!!!
Sent 4: I'm free to help with anything...
Sent 5: ... fire me back an email ...
Sent 6: Later,
Sent 7: [ISSUE] P.S. What's the status on bringing a book to get Stroustrup to sign?
Sent 8: Good idea/bad idea?
Sent 9: Proper/impolite?
Sent 10: Thx.

The Problem

A: [RESPONSE-Y] That's probably fine. ... he loves that kind of stuff.

B: [RESPONSE-Y] We can use it as a prize ...

C: Can we get our hands on 5 fresh copies of his C++ book???
[RESPONSE-Y] This definitely sounds like a cool idea to me!!!

D: [RESPONSE-M] Not sure how overboard we should go with this.
i did bring my own C++ books for him to sign, but I don't want to present him with a towering stack.
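As a rough illustration of what "aggregating agreement-disagreement" could look like once responses are tagged, the sketch below tallies the four tags from this example thread. Reading the `-M` suffix as "maybe/undecided" is an assumption, and the `tally` helper is hypothetical rather than part of the presented system.

```python
from collections import Counter

# Tags copied from the example thread above (one tagged response per author).
responses = [
    ("A", "RESPONSE-Y"),
    ("B", "RESPONSE-Y"),
    ("C", "RESPONSE-Y"),
    ("D", "RESPONSE-M"),
]

def tally(tagged):
    """Count responses per polarity suffix: Y (agree), N (disagree), M (assumed: maybe)."""
    counts = Counter(tag.rsplit("-", 1)[-1] for _, tag in tagged)
    return {label: counts.get(label, 0) for label in ("Y", "N", "M")}

summary = tally(responses)
print(summary)
```

A summarizer could turn such counts into a sentence like "three of four respondents agreed", but how the counts are verbalized is left open here.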

Related Work

Email Summarization:
– Sentence extraction using parent email for extra context
  • Lam, Rohall, Schmandt, Stern (2000)
– Sentence extraction with Question-Answer pairs
  • Murakoshi, Shimazu, Ochimizu (1999)
– Thread topic clustering and summarization
  • Newman and Blitzer 2003

Related Work

Internal work
– Identifying opinions and their polarity
  • Yu and Hatzivassiloglou 2003
– Summarizing consensus in meeting transcripts
  • Michel Galley
– KDD Thread Summarization
  • Sentence extraction, and QA
    – Lokesh Shrestha and Owen Rambow
  • Parsing Email Sentences
    – Aaron Harnly
  • Classifying thread types
    – Andrew and Julia Hirschberg
– Extracting NPs from email messages for gists
  • Muresan, Tzoukermann and Klavans 2001

Related Work

Spoken Dialogue Summarization with Similarities to Email:
– Detection of agreement in meeting transcripts
  • Hillard, Ostendorf and Shriberg 2003
– Sentence Extraction and QA pairs
  • Zechner 2002

Description of Data

ACM corpus:
– Archive of the Columbia University ACM student chapter committee email mailing list
– Participants: ~5-10 Computer Science students
– Language:
  • English
  • Hybrid of formal-informal language use
    – Both grammar and spelling
– Data: ~300 threads
– Status:
  • Current manual summarization effort
  • ~50 threads summarized

Description of Data

• Manual tagging of:
  – Issues, Miscellaneous Focus, Requests, Responses (Responses2)
  – Agreement or disagreement of the responses to issues

• Stats:
  – ISSUES: 88 (of 51 threads)
  – MISC. FOCI: 101
  – REQUESTS: 42
  – RESPONSES: 339
  – (RESPONSES2: 51)

Description of Data: Qualitative observations

1. Politeness of disagreements
[RESPONSE-N] I personally think that both the image & announcement look better centered than if they're placed next to one another.

– Offering alternative suggestions
A: [ISSUE] We put the evening of the 24th for Davis on hold at the Dean's office, so let us know if you would like us to reserve it.
B: [RESPONSE-N] OK, there's a big lecture hall in Pupin that we used for Stallman.
C: [RESPONSE-N] try 501Scherm also.

Description of Data: Qualitative observations

2. Asynchronicity of email medium:
Much more context repeated than in speech

Switchboard Example:
sd B.39 utt5: it is hard to get my attitude [ to, + {F uh, } to ] get myself up there. /
aa A.40 utt1: That is very true. /
...
+ B.49 utt1: -- less chance --
+ B.51 utt1: -- of hurting yourself. /
aa A.52 utt1: -- yeah, /

Description of Data: Qualitative observations

2. Asynchronicity of email medium:

Email Thread:

Disagreement
A: [ISSUE] Why don't we make an acm-recruit alias, and make it an opt-IN list for members, and we can then forward such crap.
B: [RESPONSE] I don't think we should allow companies/recruiters to contact members directly. Since we get a lot of our money ...

Agreement
A: [RESPONSE-Y] Pizza for the Doubleclick event sounds good.

Description of Data: Qualitative observations

3. Variety of stock phrases indicating agreement or disagreement

– manually found phrases might lead to brittle code

Disagreement:

A: [ISSUE] Why don't we make an acm-recruit alias, and make it an opt-IN list for members, and we can then forward such crap.

B: [RESPONSE] I don't think we should allow companies/recruiters to contact members directly.

Description of Data: Qualitative observations

Agreement:
A: [ISSUE] What do you guys imagine might happen tomorrow night if we were to start playing Galaxy Quest from Eugene's DVD...
B: [RESPONSE] That sounds great!
C: [RESPONSE] Excellent idea - considering the fact that people would have already paid for the ticket, they are not likely to just stand up and leave after the first minute :-)
D: [RESPONSE] Awesome! :) We should have no problems running it on DVD so long as we can get it to the projection booth by 8:15 to 8:30ish (or by whatever time Eugene decides to show up).
E: [RESPONSE] Excellent idea!
F: [RESPONSE] GREAT idea.

Description of Data: Qualitative observations

Agreement:

A: [ISSUE] Do we want to move the list to mailman?

B: [RESPONSE] Hmmm. Does mailman allow for "multiple owners"?
[RESPONSE] To me, that would be the one win.

C: [RESPONSE] It's actually not bad software.

Description of Data: Qualitative observations

4. Common functional words / syntactic patterns
– Negation terms
– Adverbs
  [RESPONSE-N] try 501Scherm also.
  [RESPONSE-Y] It's actually not bad software.
– Conditionals
  [RESPONSE-N] I would like to, if I didn't already promise my friend
– Auxiliary verbs
  • for question forms
  • Strength of opinion
  [RESPONSE] Color me biased, but I think we may want to ...
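The patterns listed on this slide could be turned into simple counts for a response classifier. The sketch below is only one way to do that: the word lists, the feature names, and the "ends in -ly" adverb heuristic are all assumptions made for illustration, not the feature set used in the presented work.

```python
import re

# Hypothetical, deliberately small word lists; a real system would need fuller ones.
NEGATION = {"not", "no", "never", "don't", "didn't", "won't", "can't"}
CONDITIONAL = {"if", "unless"}
AUXILIARY = {"do", "does", "did", "can", "could", "would", "should", "may", "might", "will"}

def features(sentence):
    """Count function-word cues in one response sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    starts_with_aux = bool(tokens) and tokens[0] in AUXILIARY
    return {
        "negation": sum(t in NEGATION for t in tokens),
        "conditional": sum(t in CONDITIONAL for t in tokens),
        "auxiliary": sum(t in AUXILIARY for t in tokens),
        # Crude adverb cue: tokens ending in "ly" (misses "also", "already", etc.).
        "adverb_ly": sum(t.endswith("ly") for t in tokens),
        # Auxiliary-initial sentence ending in "?" as a question-form cue.
        "aux_question": int(starts_with_aux and sentence.strip().endswith("?")),
    }

conditional_example = features("I would like to, if I didn't already promise my friend")
adverb_example = features("It's actually not bad software.")
question_example = features('Does mailman allow for "multiple owners"?')
```

Note that "It's actually not bad software." carries a negation cue yet is tagged as agreement, which is one reason such counts would serve as inputs to a learner rather than as rules.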

Interlude: Questions

Questions about data?

Architecture: Pipeline

[Figure: pipeline over the email thread: Isolate issue → Detect responses → Classify agreement → Extract sentences]

Architecture: Design Factors

Dependencies:
• Jama: linear algebra package
• Headliner package: vocabulary manager and tools
• KDD code for preprocessing ACM email corpus

Module design:
– APIs for data, machine learning code

Issue Detection

Sentence Extraction task:
• Based on replies
  Cosine similarity of each sentence in first email to a vector that represents the replies
  1. Centroid of replies
  2. SVD key sentence of replies
  3. SVD centroid of replies
• Based on whole thread
  4. SVD key sentence of the whole thread

Issue Detection: 1. Centroid of Replies

Find the sentence which has greatest cosine similarity to centroid vector representing the replies

[Figure: the replies are encoded as a matrix of sentences by terms (s sentences, t terms) and averaged into a centroid vector of length t. Each sentence of the issue email (Sent 1 … Sent n, each a vector of length t) is compared with the centroid, and the best match is output as the issue.]
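The centroid technique can be sketched in a few lines. This is a minimal illustration that assumes whitespace tokenisation and raw term counts (the slides do not say how terms were weighted), and the example sentences are made up, loosely echoing the Stroustrup thread.

```python
import math
from collections import Counter

def bag(sentence):
    """Bag-of-words term counts; lower-casing and whitespace splitting are assumptions."""
    return Counter(sentence.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Average the term vectors of all reply sentences."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}

def detect_issue(first_email, replies):
    """Return the first-email sentence most similar to the centroid of the replies."""
    c = centroid([bag(s) for s in replies])
    return max(first_email, key=lambda s: cosine(bag(s), c))

# Made-up toy thread.
first_email = [
    "i am free to help with anything",
    "what is the status on bringing a book to sign",
]
replies = [
    "we can use the book as a prize",
    "i did bring my own book to sign",
]
issue = detect_issue(first_email, replies)
print(issue)
```

Because common words such as "the" and "to" also count as overlap here, a stopword list would likely change which sentence wins on real data.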

Interpretation of Singular Value Decomposition

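$A_{t \times s} = U_{t \times r}\,\Sigma_{r \times r}\,V^{T}_{r \times s}$

• $A$ ($t$ terms by $s$ sentences): represents input text
• $U$ ($t \times r$): relates words to “concepts”
• $\Sigma$ ($r \times r$): describes “concepts”
• $V^{T}$ ($r \times s$): relates sentences to “concepts”

[Figure labels: concepts 1, 2, …, r; example words "astronaut" and "cosmonaut"]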
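To make the shapes in the decomposition concrete, here is a small numerical sketch using NumPy (the deck's own dependency is the Java package Jama; NumPy is swapped in purely for illustration). The matrix values and the extra terms "moon" and "car" are invented.

```python
import numpy as np

# Toy term-by-sentence count matrix A: t = 4 terms (rows), s = 3 sentences (columns).
A = np.array([
    [1.0, 0.0, 1.0],   # "astronaut"
    [0.0, 1.0, 1.0],   # "cosmonaut"
    [1.0, 1.0, 0.0],   # "moon" (invented)
    [0.0, 0.0, 1.0],   # "car" (invented)
])

# Reduced SVD: U is t x r, sigma holds the r singular values, Vt is r x s.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
r = len(sigma)

# Multiplying the three factors back together recovers A (up to rounding).
reconstructed = U @ np.diag(sigma) @ Vt

print(U.shape, sigma.shape, Vt.shape)
```

Each column of `Vt` is one sentence expressed in the r-dimensional "concept" space, which is the representation the SVD-based techniques compare.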

Issue Detection: 2. SVD Replies Key Sentence

[Figure: replies and issue email; the replies are encoded as the matrix $A$ ($t$ terms by $s$ sentences).]

Issue Detection: 2. SVD Replies Key Sentence

Key sentence as determined by SVD analysis

$A_{t \times s} = U_{t \times r}\,\Sigma_{r \times r}\,V^{T}_{r \times s}$

[Figure labels: concepts 1, 2, …, r; example words "astronaut" and "cosmonaut"]

Issue Detection: 2. SVD Replies Key Sentence

Key sentence as determined by SVD analysis

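[Figure: the key sentence is taken from $V^{T}$ ($r \times s$) as a vector of length $r$. The first-email sentences (a matrix over $s$ sentences and $t$ terms) are mapped through "U transpose" (labelled with dimensions $t$ and $r$ on the slide) into vectors Sent 1 … Sent n of length $r$. These are compared with the key sentence, and the best match is output as the issue.]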
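A possible reading of the SVD key-sentence technique is sketched below. Several choices are assumptions, since the slides do not spell them out: the key sentence is taken to be the reply with the largest absolute weight on the first concept, both the key sentence and the first-email sentences are mapped into concept space with U transpose, and raw counts over a whitespace-tokenised shared vocabulary are used. The toy thread is invented.

```python
import numpy as np

def term_sentence_matrix(sentences, vocab):
    """t x s count matrix: rows are vocabulary terms, columns are sentences."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            if token in index:
                A[index[token], j] += 1.0
    return A

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Treat numerically empty vectors as having no similarity.
    return float(u @ v / denom) if denom > 1e-12 else 0.0

def svd_key_sentence_issue(first_email, replies):
    """Return (issue sentence, key reply sentence)."""
    vocab = sorted({w for s in first_email + replies for w in s.lower().split()})
    A = term_sentence_matrix(replies, vocab)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    # Assumed selection rule: reply with the largest weight on the first concept.
    key = int(np.argmax(np.abs(Vt[0])))
    key_vec = U.T @ A[:, key]                              # key sentence in concept space
    Q = U.T @ term_sentence_matrix(first_email, vocab)     # first-email sentences, r x n
    scores = [cosine(Q[:, j], key_vec) for j in range(Q.shape[1])]
    return first_email[int(np.argmax(scores))], replies[key]

# Invented toy thread.
first_email = ["thanks for the update yesterday", "should we bring a book to sign"]
replies = [
    "bring a book sounds fine",
    "a signed book is a cool prize",
    "i did bring my own book",
]
issue, key_sentence = svd_key_sentence_issue(first_email, replies)
print(issue, "|", key_sentence)
```

Mapping with `U.T` leaves the singular values folded into the sentence vectors; dividing by `sigma` as well would be another defensible choice and can change the ranking.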

Issue Detection: 3. SVD Centroid of Replies

Centroid of the replies in reduced dimensions

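[Figure: the replies and the issue email feed a matrix of terms by sentences ($t \times s$). SVD yields $V$, which gives each of the $s$ sentences a vector of length $r$. The centroid of the reply vectors (length $r$) is compared with Sent 1 … Sent n (each of length $r$), and the best match is output as the issue.]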
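The reduced-dimension centroid variant might look like the following sketch. It assumes the SVD is computed over the reply sentences only and that first-email sentences are mapped into the same space with U transpose; the optional `k` parameter for truncating to the top concepts is hypothetical, as the slides do not state how many dimensions were kept.

```python
import numpy as np

def term_sentence_matrix(sentences, vocab):
    """t x s count matrix: rows are vocabulary terms, columns are sentences."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            if token in index:
                A[index[token], j] += 1.0
    return A

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Treat numerically empty vectors as having no similarity.
    return float(u @ v / denom) if denom > 1e-12 else 0.0

def svd_centroid_issue(first_email, replies, k=None):
    """Pick the first-email sentence closest to the replies' centroid in concept space."""
    vocab = sorted({w for s in first_email + replies for w in s.lower().split()})
    A = term_sentence_matrix(replies, vocab)
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    if k is not None:
        U = U[:, :k]                      # keep only the top-k concepts (hypothetical option)
    reduced_replies = U.T @ A             # each column: one reply sentence in concept space
    centroid = reduced_replies.mean(axis=1)
    Q = U.T @ term_sentence_matrix(first_email, vocab)
    scores = [cosine(Q[:, j], centroid) for j in range(Q.shape[1])]
    return first_email[int(np.argmax(scores))]

# Invented toy thread.
first_email = ["thanks for the update yesterday", "should we bring a book to sign"]
replies = [
    "bring a book sounds fine",
    "a signed book is a cool prize",
    "i did bring my own book",
]
issue = svd_centroid_issue(first_email, replies)
print(issue)
```

Without truncation this differs from the plain centroid method only in how first-email sentences are normalised, so any benefit of the SVD would have to come from dropping low-weight concepts.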

Issue Detection: 4. SVD Key Sentence of the Whole Thread

Key sentence as determined by SVD analysis

[Figure: the replies and the issue email together are encoded as the matrix $A$ ($t$ terms by $s$ sentences); SVD of $A$ yields $V^{T}$ ($r \times s$), from which the key sentence is selected.]

Evaluations: Issue Detection

Compare:

• Test Corpus:
  – Gold Standard: Manually tagged issues (~50)

• Extracted Issues:
  – 1 issue per thread extracted by each of the 4 techniques.

Evaluations: Issue Detection

Correct detection for single answer:

• Centroid of replies ~ 43.2%

• SVD centroid of replies ~ 35.3%

• SVD key sentence from replies ~35.3%

• SVD key sentence from thread ~ 37.3%

• Imaginary Oracle ~72.5%
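The sketch below shows how such figures could be computed. It assumes that the "imaginary oracle" counts a thread as correct whenever at least one of the techniques extracted a tagged issue, which is an interpretation of the slide rather than something it states. Thread ids, sentence indices, and predictions are made up.

```python
def accuracy(predictions, gold):
    """Fraction of threads whose single extracted sentence is a tagged issue."""
    hits = sum(1 for thread, sent in predictions.items() if sent in gold[thread])
    return hits / len(gold)

def oracle_accuracy(all_predictions, gold):
    """Upper bound: a thread is correct if any technique extracted a tagged issue."""
    hits = sum(
        1
        for thread in gold
        if any(preds.get(thread) in gold[thread] for preds in all_predictions.values())
    )
    return hits / len(gold)

# Made-up gold standard: thread id -> set of sentence indices tagged as issues.
gold = {"t1": {3}, "t2": {0, 5}, "t3": {2}, "t4": {1}}

# Made-up outputs of two techniques: thread id -> extracted sentence index.
all_predictions = {
    "centroid": {"t1": 3, "t2": 1, "t3": 2, "t4": 0},
    "svd_key": {"t1": 0, "t2": 5, "t3": 2, "t4": 0},
}

per_technique = {name: accuracy(p, gold) for name, p in all_predictions.items()}
upper_bound = oracle_accuracy(all_predictions, gold)
print(per_technique, upper_bound)
```

In this toy data each technique scores 0.5 while the oracle reaches 0.75, mirroring how the techniques can succeed on different threads.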

Discussion of Results

• When will SVD vs. Centroid work?
  – Could certain topics (e.g. time organization) be associated with presence of redundant information?
  – Maybe insufficient redundancy in the text?
    • E.g. the degree of manual context being added: isolated words versus phrases
    • Maybe use corpus-specific stopword list?

Future Work

• Try a machine learning approach to learn this oracle for issue detection

• Machine Learning for Response Identification

• Response Classification (agreement-disagreement)

Conclusion

• Email thread summarization

• Issue Detection using thread structure

• Singular Value Decomposition and Centroid methods as techniques for sentence extraction

Questions?

Future Work

• Bag-of-words isn’t enough
  – Try adding Hong’s list of positive and negative words to the response classification feature set
  – More sophisticated language modelling techniques, like using perplexity as a notion of similarity

Architecture and Algorithms: Response Identification

• Aim is to use a Machine Learning approach

• Currently using the same code for issue detection to extract responses – based on similarity

• The similarity measure will be a feature for machine learning.

Architecture and Algorithms: Response Classification

• Machine learning to classify responses into 2 classes: Agreement versus Disagreement

Description of Data: Switchboard Corpus

• Explain why we’re looking at this.
• Collected over phone line
• Manually transcribed
• Manually tagged for dialog acts
  – Based on DAMSL

Description of Data: Switchboard Agreement

• 5 point agreement scale (#total utterances)
  – Agreement (10,159, 5%)
  – Agreement in part (916)
  – Maybe (69)
  – Disagreement in part
  – Disagreement (303, 0.2%)

Description of Data: Switchboard Polar responses

• Yes answers (3032, 1%)

• No answers (1078, 1%)

• Positive non-yes answers (850, 0.4%)

• Negative non-no answers (299, 0.1%)

The Problem

A: [ISSUE] Perhaps we should schedule it next week or the last week of August.

B: [RESP-N] i won't be around next week.

C: [RESP-Y] I'm around until middle of next week (23rd or so), and then back the week after.

D: [RESP-N] I won't be back on campus till Sept. 3...

E: [RESP-N] Hi all, I'll be on campus Sept. 1 or so.