Post on 15-Jan-2016
Summarizing Consensus in Decision-Making Threads
Stephen Wan
Kathy McKeown
Presentation Outline
• The Problem
• Related Work
• Description of Data
• Architecture
• Issue Detection
• Evaluations
• Conclusions and Future Work
The Problem
• Email thread summarization:
– I want a summary that allows me to participate in the discussions quickly
versus
• Single-document sentence-extraction summarization:
– In contrast, conventional summaries can only give you the gist of the thread
The Problem
Sentence extraction summary:
– Here's the plaque info.
– http://www.affordableawards.com/plaques/ordecon.htm
– I like the plaque, and aside for exchanging Dana's name for "Sally Slater" and ACM for "Ladies Auxilliary", the wording is nice.
– We just need to contact the plaque folks and ask what format they need for the logo.
(Summ-it Summarisation Applet Version 1.1)
The Problem
A: [ISSUE] Let me know if you agree or disagree w/choice of plaque and (especially) wording.
B: [RESP] I like the plaque, and aside for exchanging Dana's name for "Sally Slater" and ACM for "Ladies Auxilliary", the wording is nice.
C: [RESP] I prefer Christy's wording to the plaque original.
The Problem
Research questions:
• Identifying the focus of the discussion: the issue
• Identifying the responses to the issue
• Determining consensus
– What is agreement-disagreement?
• Binary, like a vote?
• A discrete scale?
• Continuous?
The Problem
Assumptions:
– Threads classified as decision-making threads
– No topic shifts in the thread
– The issue is presented in the first email
• Ignoring responses within the first email
– Two classes: agreement vs. disagreement
Goal:
– Create a tool for extracting sentences to be given to a thread summarizer that
• rewrites extracted sentences into a nice coherent summary
• aggregates agreement-disagreement?
The Problem
Our approach:
– Use the structure of the thread
• Assumption: email correspondents have already done some of the work of manually filtering out the important issues
– Use machine learning to identify and classify agreement-disagreement responses
The Problem
First email:
Subject: What Can I Do?
Sent 1: ... willing to help out ...
Sent 2: Going to go put a few more poster...
Sent 3: Why'd they have to pick today to clear off the bulletin boards!!!
Sent 4: I'm free to help with anything ...
Sent 5: ... fire me back an email ...
Sent 6: Later,
Sent 7: [ISSUE] P.S. What's the status on bringing a book to get Stroustrup to sign?
Sent 8: Good idea/bad idea?
Sent 9: Proper/impolite?
Sent 10: Thx.
The Problem
A: [RESPONSE-Y] That's probably fine. ... he loves that kind of stuff.
B: [RESPONSE-Y] We can use it as a prize ...
C: Can we get our hands on 5 fresh copies of his C++ book???
[RESPONSE-Y] This definitely sounds like a cool idea to me!!!
D: [RESPONSE-M] Not sure how overboard we should go with this. i did bring my own C++ books for him to sign, but I don't want to present him with a towering stack.
Related Work
Email summarization:
– Sentence extraction using the parent email for extra context
• Lam, Rohall, Schmandt, Stern (2000)
– Sentence extraction with question-answer pairs
• Murakoshi, Shimazu, Ochimizu (1999)
– Thread topic clustering and summarization
• Newman and Blitzer (2003)
Related Work
Internal work:
– Identifying opinions and their polarity
• Yu and Hatzivassiloglou (2003)
– Summarizing consensus in meeting transcripts
• Michel Galley
– KDD thread summarization: sentence extraction and QA
• Lokesh Shrestha and Owen Rambow
– Parsing email sentences
• Aaron Harnly
– Classifying thread types
• Andrew and Julia Hirschberg
– Extracting NPs from email messages for gists
• Muresan, Tzoukermann and Klavans (2001)
Related Work
Spoken dialogue summarization with similarities to email:
– Detection of agreement in meeting transcripts
• Hillard, Ostendorf and Shriberg (2003)
– Sentence extraction and QA pairs
• Zechner (2002)
Description of Data
ACM corpus:
– Archive of the Columbia University ACM student chapter committee email mailing list
– Participants: ~5-10 Computer Science students
– Language: English; hybrid of formal-informal language use, in both grammar and spelling
– Data: ~300 threads
– Status: manual summarization effort in progress; ~50 threads summarized
Description of Data
• Manual tagging of:
– Issues, miscellaneous foci, requests, responses (Responses2)
– Agreement or disagreement of the responses to issues
• Stats:
– ISSUES: 88 (of 51 threads)
– MISC. FOCI: 101
– REQUESTS: 42
– RESPONSES: 339
– RESPONSES2: 51
Description of Data: Qualitative Observations
1. Politeness of disagreements
[RESPONSE-N] I personally think that both the image & announcement look better centered than if they're placed next to one another.
– Offering alternative suggestions:
A: [ISSUE] We put the evening of the 24th for Davis on hold at the Dean's office, so let us know if you would like us to reserve it.
B: [RESPONSE-N] OK, there's a big lecture hall in Pupin that we used for Stallman.
C: [RESPONSE-N] try 501Scherm also.
Description of Data: Qualitative Observations
2. Asynchronicity of the email medium: much more context is repeated than in speech
Switchboard example:
sd B.39 utt5: it is hard to get my attitude [ to, + {F uh, } to ] get myself up there. /
aa A.40 utt1: That is very true. /
...
+ B.49 utt1: -- less chance --
+ B.51 utt1: -- of hurting yourself. /
aa A.52 utt1: -- yeah, /
Description of Data: Qualitative Observations
2. Asynchronicity of the email medium:
Email thread:
Disagreement
A: [ISSUE] Why don't we make an acm-recruit alias, and make it an opt-IN list for members, and we can then forward such crap.
B: [RESPONSE] I don't think we should allow companies/recruiters to contact members directly. Since we get a lot of our money ...
Agreement
A: [RESPONSE-Y] Pizza for the Doubleclick event sounds good.
Description of Data: Qualitative Observations
3. Variety of stock phrases indicating agreement or disagreement
– Manually found phrases might lead to brittle code
Disagreement:
A: [ISSUE] Why don't we make an acm-recruit alias, and make it an opt-IN list for members, and we can then forward such crap.
B: [RESPONSE] I don't think we should allow companies/recruiters to contact members directly.
Description of Data: Qualitative Observations
Agreement:
A: [ISSUE] What do you guys imagine might happen tomorrow night if we were to start playing Galaxy Quest from Eugene's DVD...
B: [RESPONSE] That sounds great!
C: [RESPONSE] Excellent idea - considering the fact that people would have already paid for the ticket, they are not likely to just stand up and leave after the first minute :-)
D: [RESPONSE] Awesome! :) We should have no problems running it on DVD so long as we can get it to the projection booth by 8:15 to 8:30ish (or by whatever time Eugene decides to show up).
E: [RESPONSE] Excellent idea!
F: [RESPONSE] GREAT idea.
Description of Data: Qualitative Observations
Agreement:
A: [ISSUE] Do we want to move the list to mailman?
B: [RESPONSE] Hmmm. Does mailman allow for "multiple owners"?
[RESPONSE] To me, that would be the one win.
C: [RESPONSE] It's actually not bad software.
Description of Data: Qualitative Observations
4. Common functional words / syntactic patterns
– Negation terms
– Adverbs
[RESPONSE-N] try 501Scherm also.
[RESPONSE-Y] It's actually not bad software.
– Conditionals
[RESPONSE-N] I would like to, if I didn't already promise my friend
– Auxiliary verbs
• for question forms
• strength of opinion
[RESPONSE] Color me biased, but I think we may want to ...
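Function-word cues like these can be approximated with simple pattern matching. A minimal sketch in Python; the regexes and word lists here are illustrative assumptions, not the authors' actual feature set:

```python
import re

# Hypothetical cue patterns; the word lists are invented for illustration,
# not a published lexicon.
NEGATION = re.compile(r"\bnot\b|n't\b|\bno\b|\bnever\b", re.IGNORECASE)
ADVERBS = re.compile(r"\b(definitely|actually|also|sure)\b", re.IGNORECASE)
CONDITIONAL = re.compile(r"\b(if|would|could)\b", re.IGNORECASE)
AUXILIARY_Q = re.compile(r"^(do|does|did|can|should|would)\b", re.IGNORECASE)

def cue_features(sentence: str) -> dict:
    """Binary cue features for a response sentence."""
    return {
        "has_negation": bool(NEGATION.search(sentence)),
        "has_adverb": bool(ADVERBS.search(sentence)),
        "has_conditional": bool(CONDITIONAL.search(sentence)),
        "is_aux_question": bool(AUXILIARY_Q.match(sentence.strip())),
    }

print(cue_features("It's actually not bad software."))
# has_negation and has_adverb are both True for this example
```

Features like these could then feed the agreement-disagreement classifier alongside bag-of-words terms.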
Interlude: Questions
Questions about data?
Architecture: Pipeline
[Pipeline diagram: Isolate issue → Detect responses → Classify agreement → Extract sentences]
Architecture: Design Factors
Dependencies:
• Jama: linear algebra package
• Headliner package: vocabulary manager and tools
• KDD code for preprocessing the ACM email corpus
Module design:
– APIs for data and machine-learning code
Issue Detection
Sentence extraction task: compute the cosine similarity of each sentence in the first email to a vector that represents the replies.
• Based on the replies:
1. Centroid of replies
2. SVD key sentence of replies
3. SVD centroid of replies
• Based on the whole thread:
4. SVD key sentence of the whole thread
Issue Detection: 1. Centroid of Replies
Find the sentence that has the greatest cosine similarity to the centroid vector representing the replies.
[Diagram: the replies form a sentences-by-terms matrix; the reply centroid, a t-dimensional term vector, is compared against each sentence of the issue email.]
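The centroid technique can be sketched as follows. Tokenization and plain term-frequency weighting are simplifying assumptions here; the talk does not specify the exact term weighting used:

```python
import math
import re
from collections import Counter

def vectorize(sentence: str) -> Counter:
    """Bag-of-words term-frequency vector (a simplification: no tf-idf)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Average the reply-sentence vectors into a single centroid vector."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def extract_issue(first_email_sents, reply_sents):
    """Return the first-email sentence closest to the reply centroid."""
    c = centroid([vectorize(s) for s in reply_sents])
    return max(first_email_sents, key=lambda s: cosine(vectorize(s), c))

# Toy usage, loosely modeled on the Stroustrup-book thread above.
replies = ["That's probably fine.", "We can use it as a prize.",
           "This sounds like a cool idea!"]
first_email = ["I'm free to help with anything.",
               "What's the status on the book?",
               "Good idea / bad idea?"]
print(extract_issue(first_email, replies))  # → Good idea / bad idea?
```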
Interpretation of Singular Value Decomposition
[Diagram: A (t × s) = U (t × r) · Σ (r × r) · Vᵀ (r × s). A represents the input text; U relates words to "concepts"; Σ describes the "concepts"; Vᵀ relates sentences to "concepts". Illustrated with the classic astronaut/cosmonaut example.]
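The decomposition can be demonstrated with NumPy on a toy term-by-sentence matrix; the data below is invented for illustration:

```python
import numpy as np

# Toy term-by-sentence count matrix A (t x s): rows are terms, columns are
# sentences. Invented counts, echoing the astronaut/cosmonaut example.
terms = ["astronaut", "cosmonaut", "moon", "car"]
A = np.array([
    [1.0, 0.0, 1.0, 0.0],   # astronaut
    [0.0, 1.0, 1.0, 0.0],   # cosmonaut
    [1.0, 1.0, 0.0, 0.0],   # moon
    [0.0, 0.0, 0.0, 1.0],   # car
])

# SVD: A = U @ diag(S) @ Vt
# U relates terms to latent "concepts", S weights the concepts,
# Vt relates sentences to the same concepts.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, Vt.shape)   # (4, 4) (4,) (4, 4)

# Truncating to the top r concepts gives the reduced representation.
r = 2
A_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
```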
Issue Detection: 2. SVD Replies Key Sentence
[Diagram: the replies are collected into the t × s term-by-sentence matrix A.]
Issue Detection: 2. SVD Replies Key Sentence
Key sentence as determined by SVD analysis.
[Diagram: A (t × s) decomposed as U (t × r) · Σ (r × r) · Vᵀ (r × s), again with the astronaut/cosmonaut example.]
Issue Detection: 2. SVD Replies Key Sentence
Key sentence as determined by SVD analysis.
[Diagram: the first-email sentences are projected via Uᵀ (r × t) into the reduced r-dimensional space and compared against the key sentence.]
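Key-sentence selection from Vᵀ can be sketched as follows. Scoring by the column weight on the top singular dimension is one common heuristic and is an assumption here, since the slides do not pin down the exact scoring:

```python
import numpy as np

def key_sentence_index(A: np.ndarray) -> int:
    """Index of the 'key' sentence: the column of the term-by-sentence
    matrix A with the largest (absolute) weight on the top singular
    dimension. One common SVD selection heuristic, used here as a
    stand-in for the talk's unspecified scoring."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return int(np.argmax(np.abs(Vt[0])))

# Toy matrix: column 0 carries most of the dominant term pattern.
A = np.array([[2.0, 1.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(key_sentence_index(A))  # → 0
```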
Issue Detection: 3. SVD Centroid of Replies
Centroid of the replies in reduced dimensions.
[Diagram: the t × s terms-by-sentences matrix of the replies is reduced via SVD; the centroid and the issue-email sentences are compared in the r-dimensional space.]
Issue Detection: 4. SVD Thread Key Sentence
Key sentence as determined by SVD analysis of the whole thread.
[Diagram: SVD is applied to the t × s matrix A of the entire thread (issue email plus replies), yielding Vᵀ (r × s).]
Evaluations: Issue Detection
Compare:
• Test corpus:
– Gold standard: manually tagged issues (~50)
• Extracted issues:
– 1 issue per thread extracted by each of the 4 techniques
Evaluations: Issue Detection
Correct detection for a single answer:
• Centroid of replies: ~43.2%
• SVD centroid of replies: ~35.3%
• SVD key sentence from replies: ~35.3%
• SVD key sentence from thread: ~37.3%
• Imaginary oracle: ~72.5%
Discussion of Results
• When will SVD vs. centroid work?
– Could certain topics (e.g. time organization) be associated with the presence of redundant information?
– Maybe there is insufficient redundancy in the text?
• E.g. the degree of manual context being added: isolated words versus phrases
• Maybe use a corpus-specific stopword list?
Future Work
• Try a machine learning approach to learn this oracle for issue detection
• Machine Learning for Response Identification
• Response Classification (agreement-disagreement)
Conclusion
• Email thread summarization
• Issue Detection using thread structure
• Singular Value Decomposition and Centroid methods as techniques for sentence extraction
Questions?
Future Work
• Bag-of-words isn't enough
– Try adding Hong's list of positive and negative words to the response-classification feature set
– More sophisticated language-modelling techniques, e.g. using perplexity as a notion of similarity
Architecture and Algorithms: Response Identification
• Aim is to use a machine-learning approach
• Currently using the same code as for issue detection to extract responses, based on similarity
• The similarity measure will be a feature for machine learning
Architecture and Algorithms: Response Classification
• Machine learning to classify responses into 2 classes: Agreement versus Disagreement
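As a stand-in for the planned learner, here is a minimal sketch of the two-class task; the Naive Bayes model and the toy training examples are assumptions, not the authors' system:

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Minimal multinomial Naive Bayes for agreement vs. disagreement."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()

    def train(self, text, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens(text))

    def classify(self, text):
        vocab = len(set().union(*self.word_counts.values()))
        def log_prob(label):
            total = sum(self.word_counts[label].values())
            lp = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            for w in tokens(text):
                # Laplace smoothing for unseen words
                lp += math.log((self.word_counts[label][w] + 1) / (total + vocab))
            return lp
        return max(self.class_counts, key=log_prob)

nb = NaiveBayes()
# Toy training examples loosely modeled on the ACM corpus quotes above.
nb.train("That sounds great, excellent idea!", "AGREE")
nb.train("Awesome, great idea, sounds good to me", "AGREE")
nb.train("I don't think we should do that", "DISAGREE")
nb.train("Not sure how overboard we should go", "DISAGREE")
print(nb.classify("Excellent idea!"))  # → AGREE
```

In the actual system, cue features (negation, adverbs, conditionals) and the issue-similarity score would augment the bag-of-words features.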
Description of Data: Switchboard Corpus
• Explain why we're looking at this.
• Collected over phone lines
• Manually transcribed
• Manually tagged for dialog acts
– Based on DAMSL
Description of Data: Switchboard Agreement
• 5-point agreement scale (# total utterances):
– Agreement (10,159, 5%)
– Agreement in part (916)
– Maybe (69)
– Disagreement in part
– Disagreement (303, 0.2%)
Description of Data: Switchboard Polar Responses
• Yes answers (3,032, 1%)
• No answers (1,078, 1%)
• Positive non-yes answers (850, 0.4%)
• Negative non-no answers (299, 0.1%)
The Problem
A: [ISSUE] Perhaps we should schedule it next week or the last week of August.
B: [RESP-N] i won't be around next week.
C: [RESP-Y] I'm around until middle of next week (23rd or so), and then back the week after.
D: [RESP-N] I won't be back on campus till Sept. 3...
E: [RESP-N] Hi all, I'll be on campus Sept. 1 or so.