Download - Computational Extraction of Social and Interactional Meaning from Speech Dan Jurafsky and Mari Ostendorf Lecture 5: Agreement, Citation, Propositional

Computational Extraction of Social and Interactional Meaning from Speech

Dan Jurafsky and Mari Ostendorf

Lecture 5: Agreement, Citation, Propositional Attitude

Mari Ostendorf

Agreement, Citation, Propositional Attitude

Agreement vs. disagreement with propositions (and people)

How to make friends & influence people…
  Tool for affiliation, indicator of influence
  Tool for distancing, indicator of factions or rifts in groups

Important component of group problem solving

Speech Examples Revisited

A: This’s probably what the LDC uses. I mean they do a lot of transcription at the LDC.
B: OK.
A: I could ask my contacts at the LDC what it is they actually use.
B: Oh! Good idea, great idea.

A: After all these things, he raises hundreds of millions of dollars. I mean uh the fella
B: but he never stops talking about it.
A: but ok
B: Aren’t you supposed to y- I mean
A: well that’s a little- the Lord says
B: Does charity mean something if you’re constantly using it as a cudgel to beat your enemies over the- I’m better than you. I give money to charity.
A: Well look, now I…

Subgroups Example: Wikipedia Talk Page

By including the "Haditha Massacre" in the Human Rights Abuse section, we are effectively convicting the Marines that are currently on trial. I think we need to wait until the trial is over. – UnregisteredUser1

Disagree. All I see is the listing "Haditha killings (Under investigation)." Is the word Massacre used? If not, I believe it should be because this word fits every version of the story presented in the public, including Time, the US Marines, and the Iraqi Government. – RegisteredUser1

I agree with RegisteredUser1, this is about (current) history, not law. Just because something hasn't been decided by a court doesn't mean it didn't happen. It should be enough in the article to just mention that the marines charged/suspected of the massacre have not yet been convicted. –RegisteredUser2

I disagree, you cannot call it a human rights violation if it’s not stated what happened there. Also your statement "have not yet been convicted" is kind of the thing we are attempting to avoid. Without guilt or a better understanding of the situation I think it’s premature to put it in the human rights violation section. – RegisteredUser3

Actually, as long as NPOV, WP:Verifiability are maintained you can call it a human rights violation even if it is untrue. As Wikipedia says "As counterintuitive as it may seem, the threshold for inclusion in Wikipedia is verifiability, not truth." Like it or not, as long as there are reputable sources calling it a massacre and/or a human rights violation then it can be included in the article. —RegisteredUser4

Calling it a human rights violation in itself is POV. I also do not think anyone would appreciate you attempting to manipulate wiki policy for the sake of adding POV into an article. – RegisteredUser3

Influencing Example

There is a guideline that we shouldn't semi-protect articles linked from front page, so as to allow new editors a chance to edit articles they are most likely to read. But in this case all we are doing is enabling a swarm of socks. Semi-protection is definitely needed in this instance, with an apology should a new, well-intentioned editor actually show up amidst the swarm and be prevented from editing. Semi-protect this sucker, or we'll never determine the appropriate course of action for this article. RegUser2

Even though semi-protection is defidentally good for what is nominally "my" side … it's against policy and not appropriate. Please take it off. RegUser3

Is is absolutely not against policy. Wikipedia:Protection policy is very clear: … For this article at this time, it's necessary. That's in perfect compliance with policy. RegUser2

Removing the image without discussion is aggressively bad editing (which I am often guilty of). It's not vandalism. sprotect is only for vandalism. RegUser3

Repeated violations of 3RR and using sockpuppets, together with admitting that the purpose of removing the image is to curry favour with one's god and not to improve Wikipedia, doesn't so much cross the line from bad editing to vandalism as pole vault it. – RegUser4

Ok, my WP:AGF is falling. I still think sprotect is agressive, but not as badly as I did before. RegUser3

Influenced participant: alignment change

Online Political Discussion Forum

Q: Gavin Newsom- I expected more from him when I supported him in the 2003 election. He showed himself as a family-man/Catholic, but he ended up being the exact oppisate, supporting abortion, and giving homosexuals marriage licenses. I love San Francisco, but I hate the people. Sometimes, the people make me want to move to Sacramento or DC to fix things up.

R: And what is wrong with giving homosexuals the right to settle down with the person they love? What is it to you if a few limp-wrists get married in San Francisco? Homosexuals are people, too, who take out their garbage, pay their taxes, go to work, take care of their dogs, and what they do in their bedroom is none of your business.

Citations (from Teufel et al., 2006)

Following Pereira et al. ‘93, we measure word similarity by the relative entropy or Kullback-Leibler (KL) distance, between the corresponding conditional distributions.

His [Hindle’s] notion of similarity seems to agree with our intuitions in many cases, but it is not clear how it can be used directly to construct word classes and corresponding models of association.

Overview

Common threads
Examples:
  Agreements & disagreements in meetings
  Agreements & disagreements in online discussions
  Citation function
More common threads

(Plus examples from unpublished UW studies on Wikipedia discussions.)




Common Threads

Sentiment detection (sort of)
  Discussions: agreement/disagreement/neutral
  Citations: positive/negative/neutral (opt. contrast)

Most studies detect person/paper as target, not the proposition per se

Challenges
  Cultural bias & infrequent negatives
  Bag of words is not enough
  Identifying person/paper target of agreement (context can extend beyond the “sentiment” sentence)
  Computational modeling

Challenge: Cultural Bias

English meetings: many more agreements than disagreements
Mandarin wiki discussions: fewer explicit disagreements than in English
Citations: several studies find that negative citations are rare (presumably because they are politically dangerous)
People use positive words to soften the blow: “right but….”, “yeah” with negative intonation

Challenge: Polarity Words in BOW

Need to account for negation
  “agree” vs. “don’t agree”, “absolutely” vs. “absolutely not”
  BUT fewer than half the positive words in negative turns are lexically negated
Some part-of-speech issues, e.g. “well”
People include positive words to soften the blow
  dissenting turns have more positive words than negative
  “right” occurs 75 times in dissenting turns, 162 times in neutral turns & only 33 times in supporting turns
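To make the negation point concrete, here is a minimal sketch of polarity word counting that flips a cue word when a negator sits directly next to it. The three word lists are tiny illustrative assumptions, not the lexicons used in the studies, and the adjacency rule is a deliberate simplification.

```python
import re

# Tiny illustrative word lists (assumptions, not the studies' lexicons).
POSITIVE = {"agree", "right", "yeah", "absolutely", "good", "great"}
NEGATIVE = {"disagree", "wrong", "no", "sucks"}
NEGATORS = {"not", "don't", "never", "can't"}

def polarity_counts(turn):
    """Count positive/negative cue words in a turn, flipping the polarity of
    a cue that is directly preceded or followed by a negator
    (e.g. "don't agree", "absolutely not")."""
    # Normalize curly apostrophes so "don’t" matches the negator list.
    tokens = re.findall(r"[a-z']+", turn.lower().replace("’", "'"))
    counts = {"pos": 0, "neg": 0}
    for i, tok in enumerate(tokens):
        if tok not in POSITIVE and tok not in NEGATIVE:
            continue
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        negated = prev_tok in NEGATORS or next_tok in NEGATORS
        is_positive = (tok in POSITIVE) != negated
        counts["pos" if is_positive else "neg"] += 1
    return counts

print(polarity_counts("I don't agree"))
print(polarity_counts("right but you can't say that"))
```

The second call still counts “right” as positive even though the turn dissents, which is the softening effect described above: lexical negation handling alone does not fix bag-of-words polarity features.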

Polarity Word Trickiness (cont.)

Positive negatives
  “yeah larry i i want to correct something randi said of course”
  “right but but you you can't say that punching him in the back of the head is justified”
Negative positives
  “Steph- vent away – that sucks –”
  “no you stick with what you're doing”

Challenge: Identifying the Target

Baseline: The target is the most recent speaker:
  67% accurate for Wiki discussions
  80% accurate for meetings

Adding names doesn’t help much (70% accurate for Wiki discussions)

Target can be more than one person
In political discussion forum (Abbott et al. 11), 82% of posts with quotes have quotes that can be linked to previous post

Citation information often not in the same sentence as the citation (Teufel et al. 06).
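The most-recent-speaker baseline is simple enough to sketch directly. The version below assumes that "most recent speaker" means the closest earlier turn by a different speaker, which the slides do not spell out; the function names and the toy data are hypothetical.

```python
def most_recent_speaker_baseline(turns):
    """For each turn, guess that its target is the speaker of the closest
    earlier turn by a different speaker. `turns` is a list of
    (speaker, text) pairs. Returns one guess per turn, with None when no
    earlier turn by another speaker exists."""
    guesses = []
    for i, (speaker, _) in enumerate(turns):
        guess = None
        for prev_speaker, _ in reversed(turns[:i]):
            if prev_speaker != speaker:
                guess = prev_speaker
                break
        guesses.append(guess)
    return guesses

def baseline_accuracy(guesses, gold):
    """Fraction of turns with a gold target (not None) guessed correctly."""
    scored = [(g, t) for g, t in zip(guesses, gold) if t is not None]
    return sum(g == t for g, t in scored) / len(scored) if scored else 0.0

# Toy thread: the last turn answers the first speaker, not the latest one.
turns = [("A", "claim"), ("B", "reply"), ("B", "more"), ("C", "reply to A")]
gold = [None, "A", "A", "A"]
guesses = most_recent_speaker_baseline(turns)
print(guesses, baseline_accuracy(guesses, gold))
```

The last turn shows the typical failure: a reply that skips over the latest speaker to address an earlier one.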

Chat: complication of asynchrony

PubCoord Are we agreed on about 60 for soda?
Acct yeah, only ourselves are set apart, I think
Secty They can't take a bottle.
Secty Okay, I agree on 60 for soda
PubCoord Vote
PubCoord agreed
ProjMgr Yeah, agree
Secty How much does ice cost?
PubCoord 2.50 per pack
Acct how about 50, because project manager won't drink that much soda
PubCoord probably
PubCoord What is he a camel?
Acct and some folks won't drink any?
Secty lol
Acct no, some people dont like flavor, carbonation
ProjMgr Shut up! Soda can be harsh
Acct or, OMG calories
Secty please stay on topic
Acct yeah, i don’t like the carbonation
PubCoord Alright, I've identified two of you
ProjMgr I was just going to say that...
Acct me too!
Secty so was that $50 for ice?
Acct actually, I guess I know who everyone is then
PubCoord What?
Acct no, 50 for pop
Secty oh
PubCoord No, 50 for soda is fine I guess
Secty please vote between 50 or 60
PubCoord I think maybe 10 for ice
ProjMgr Yeah :/
Acct and someone already volunteered their cooler?
PubCoord Yessir
Secty *please vote between 50 or 60 for soda
Secty I vote 60
PubCoord 60
ProjMgr 50
Acct i vote 50
ProjMgr TIE!
PubCoord then?
Secty 50 it is
Acct g d it
Acct yeah, 55
Secty okay, 55
Secty so how much is left, accountant?


Computational Modeling -- Review

Standard text classification problem
  Extract feature vector → apply model → score classes
  Choose class with best score
Popular models
  Naïve Bayes
  Decision trees/forests vs. boostexter/icsiboost
  Maximum entropy
  SVMs
  K-nearest neighbor (lazy learning or memory-based)
Feature selection or regularization
Evaluation:
  Classification accuracy or Macro F (mean of F measures)
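The difference between the two evaluation measures matters when one class dominates. The sketch below computes both on a toy label set (invented for illustration, not data from any of the studies): a classifier that always predicts "other" gets high accuracy but a low macro F, because the F measures for the rare classes are zero.

```python
def per_class_f1(gold, pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)

def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Toy data: 8 "other", 1 "agree", 1 "disagree"; classifier predicts "other".
gold = ["other"] * 8 + ["agree", "disagree"]
pred = ["other"] * 10
print(accuracy(gold, pred), macro_f1(gold, pred))
```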

New since Lec 5

Feature Extraction – Noise Issue

Both speech and text have “noise” challenges
  Speech: speech recognition errors (especially when there is overlapping speech)
  Online discussions: typos and funny spellings
    defidentally good
    the exact oppisate

Not a big issue for edited text (e.g. most articles that would have citations)

Challenge: Skewed Priors

Large percentage of sentences are neutral, standard training algorithms emphasize the frequent classes
Some solutions:
  Use development set to tune detection thresholds
  Random sampling using biased priors and bagging (classifier combination)
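As an illustration of the first solution, the sketch below picks a detection threshold for a rare class by maximizing F1 on a development set. The scores, labels, and candidate thresholds are invented, and using F1 as the tuning objective is an assumption; any task-appropriate measure could be substituted.

```python
def tune_threshold(dev_scores, dev_labels, candidates):
    """Return the candidate threshold with the best rare-class F1 on the
    development set. `dev_scores` are classifier scores for the rare class;
    `dev_labels` are True where the rare class is the gold label."""
    def f1_at(threshold):
        pred = [s >= threshold for s in dev_scores]
        tp = sum(p and y for p, y in zip(pred, dev_labels))
        fp = sum(p and not y for p, y in zip(pred, dev_labels))
        fn = sum((not p) and y for p, y in zip(pred, dev_labels))
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0
    return max(candidates, key=f1_at)

# Toy development set: rare-class scores are often below the default 0.5.
dev_scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6]
dev_labels = [False, False, False, True, False, True, True, True]
best = tune_threshold(dev_scores, dev_labels, [0.5, 0.4, 0.3, 0.2])
print(best)
```

With a skewed prior, the rare class often scores below 0.5 even when it is the right answer, so a tuned threshold usually lands lower than the default.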




Detecting (Dis)Agreements in Meetings

Adjacency pair speaker detection (given B, find A)
  Target detection for agreements & disagreements
  Also includes question/answer, offer/acceptance, etc.
Classify B as agreement/disagreement/other
  (Backchannels modeled separately, but included in “other” for scoring.)

A: I could ask my contacts at the LDC what it is they actually use.
B: Oh! Good idea, great idea.

Galley et al. 2004

Meeting Data

ICSI Meeting corpus
  75 1-hour meetings, average of 6.5 participants/meeting
  Hand transcribed, audio automatically time aligned
  Hand labeled for adjacency pairs
  7 meetings pause-segmented into “spurts”
  Class distribution:
    Agree: 12%
    Disagree: 7%
    Other: 81%

Adjacency Pair – Speaker Ranking

Features (B given, A is candidate target)
  Structural: +/- overlap, # of speakers/spurts between A & B, etc.
  Duration: duration of overlap, duration of A, time between A & B, overlap with others, speaking rate
  Lexical: word counts, counts of shared words, cue word indicators, name indicator, …
  Dialog acts (oracle)
Feature selection: incremental
Classifier: Maximum entropy
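A rough sketch of how speaker ranking could look in code: extract a few of the structural, duration, and lexical features for each candidate A, score each candidate with a weighted sum, and normalize with a softmax. The feature names, the hand-set weights, and the toy spurts are all hypothetical; a real maximum entropy ranker would learn the weights from the labeled adjacency pairs, and this is not the implementation from Galley et al.

```python
import math

def pair_features(a, b, spurts_between):
    """Features for candidate A given response B. Each spurt is a dict with
    'speaker', 'start', 'end' (seconds) and 'words' (list of str)."""
    overlap = max(0.0, min(a["end"], b["end"]) - max(a["start"], b["start"]))
    return {
        "overlap": 1.0 if overlap > 0 else 0.0,
        "gap": max(0.0, b["start"] - a["end"]),
        "spurts_between": float(spurts_between),
        "shared_words": float(len(set(a["words"]) & set(b["words"]))),
        "a_duration": a["end"] - a["start"],
    }

def rank_candidates(candidates, b, weights):
    """Return (speaker, probability) pairs sorted by probability, using a
    softmax over weighted feature sums. `candidates` is ordered oldest
    first, so the last candidate has zero spurts between it and B."""
    scores = []
    for i, a in enumerate(candidates):
        feats = pair_features(a, b, len(candidates) - 1 - i)
        scores.append(sum(weights.get(k, 0.0) * v for k, v in feats.items()))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    ranked = [(a["speaker"], e / total) for a, e in zip(candidates, exps)]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hand-set weights stand in for trained parameters (assumption).
WEIGHTS = {"gap": -0.5, "spurts_between": -1.0, "shared_words": 0.8,
           "overlap": 0.3}
candidates = [
    {"speaker": "A", "start": 0.0, "end": 4.0,
     "words": "i could ask my contacts at the ldc".split()},
    {"speaker": "C", "start": 4.2, "end": 5.0, "words": ["mhm"]},
]
b = {"speaker": "B", "start": 5.1, "end": 6.5,
     "words": "oh good idea ask the ldc".split()}
print(rank_candidates(candidates, b, WEIGHTS))
```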

Adjacency Pair Results

Only small gain from oracle DA information: 91.3%

Agreement/Disagreement Classifier

Features
  Structural: previous next spurt same/diff
  Duration: spurt, silence & overlap duration, speech rate
  Lexical: similar to adjacency pairs, plus polarity word counts
  Label dependency: contextual tags (a speaker is likely to disagree with someone who disagrees with them)
Classifier
  Conditional Markov model (Max Entropy Markov Model)

Agreement/Disagreement Results




Detecting (Dis)Agreement in Online Discussions

Abbott et al., 2011

Task: label R in a Q-R (quote-response) pair as agreement/disagreement.

ARGUE Data

110k forum posts (11k discussion threads, 2764 authors) from website 4forums.com
Forums include: evolution, gun control, abortion, gay marriage, healthcare, death penalty, …
Annotations by Mechanical Turkers with [-5,5] scale
  Disagree-agree (Krippendorff’s α = 0.62)
  Other annotations had α < 0.5: attack, fact/emotion, sarcasm, nice/nasty
8k “good” Q-R pairs annotated; sample & use (-1,1) threshold gives 682 pairs for testing
Class distribution: resampled to be balanced
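The (-1,1) threshold can be read as a rule for turning averaged annotator ratings into class labels, with the middle band dropped. The sketch below is one plausible reading: averaging with the mean and treating the boundaries as inclusive are both assumptions, and the function name is hypothetical.

```python
from statistics import mean

def label_pair(ratings, low=-1.0, high=1.0):
    """Map annotator ratings on the [-5, 5] disagree-agree scale to a class
    label. Returns None for pairs in the neutral band, which are dropped.
    Inclusive boundaries are an assumption, not taken from the paper."""
    score = mean(ratings)
    if score <= low:
        return "disagree"
    if score >= high:
        return "agree"
    return None

print(label_pair([-4, -3, -5]), label_pair([2, 3, 1]), label_pair([0, 1, -1]))
```

Dropping the `None` cases yields a two-class task, which is easier than the three-way agree/disagree/neutral decision a deployed system would face.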

(Dis)Agree Classifier

Features
  MetaPost: author info, time between posts, # other quotes
  Unigram & Bigram counts, initial unigram/bigram/trigram
  Repeated punctuation (collapsed to ??, !!, ?!)
  LIWC measures
  Parse dependencies <relation,wi,wj>, POS-polarity opinion dependencies
  Tf-idf cosine distance to previous post
Classifier: Naïve Bayes & JRip (WEKA toolkit)
  Chi-squared feature selection, plus feature selection implicit in JRip (rule learner)
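The repeated-punctuation feature is easy to illustrate. The sketch below collapses any run of two or more question or exclamation marks into one of the three symbols listed above; the exact rule used by Abbott et al. (for example, how mixed runs are handled) is not given in the slides, so the mapping here is an assumption.

```python
import re

def collapse_punctuation(text):
    """Collapse runs of two or more '?'/'!' characters into '??', '!!',
    or '?!' (the last for runs that mix both characters)."""
    def repl(match):
        run = match.group(0)
        if "?" in run and "!" in run:
            return "?!"
        return run[0] * 2
    return re.sub(r"[?!]{2,}", repl, text)

print(collapse_punctuation("How can you say such things?!?!"))
print(collapse_punctuation("No way!!!!"))
```

After collapsing, the three symbols can be counted like any other token.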

Sample (Dis)Agree Classifier

(Dis)Agree Classification Results

• JRip beats NB
• JRip Accuracy:
  Local features: 68%
  Other annotations: 81%

Caveat: optimistic, since neutral cases are removed.




Classification of Citation Function

Teufel et al., 2006
  Agreement, usage, compatibility (6)
  Weakness (4)
  Contrast
  neutral

Citation Study Data
  26 articles w/ 548 citations
  Kappa = 0.72 for 12 categories
  Class distribution: >67% neutral + neutral contrast, 4% negative, 19% usage
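For readers unfamiliar with kappa, the sketch below computes the two-annotator form (Cohen's kappa): observed agreement corrected for the agreement expected by chance from each annotator's label frequencies. The slides do not say which kappa variant or how many annotators were used, so this is an illustration of the idea rather than a reproduction of the reported 0.72; the label sequences are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently with their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy annotations with a dominant "neutral" class.
ann_a = ["neutral"] * 6 + ["usage"] * 2 + ["weak"] * 2
ann_b = ["neutral"] * 5 + ["usage"] * 3 + ["weak", "neutral"]
print(round(cohen_kappa(ann_a, ann_b), 3))
```

Because "neutral" dominates, chance agreement is high, so raw agreement of 80% corresponds to a noticeably lower kappa.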

Citation Classifier

Features
  Grammar of 1762 cue phrases, e.g. “as far as we are aware” from other work + 892 from this corpus
  185 POS patterns for recognizing agents (self-cites vs. others) w/ 20 manually acquired verb clusters
  Verb tense, voice, modality
  Sentence location in paragraph & section
Classifier: K-nearest neighbor (WEKA toolkit)

Citation Classification Results

K=0.75 for humans for these categories




Collected Observations re Features

Phrase patterns and location-based n-grams are useful
Structural features are useful
  Location of turn relative to other authors/speakers
  Location of sentence in turn & document
Broader context (beyond target sentence) is useful
  Sequential patterns of disagreement
  Emotion context
Simple cosine similarity is not so useful
Prosodic features not being taken advantage of

More Challenges

Explicit agreement & disagreement do not capture all the phenomena associated with alignment & distancing
  Implicit (dis)agreement via stating an opposite opinion
    A: The video is still an allegation
    B: The video is hard evidence
  … or a rhetorical question
    A: Such a topic is far more broad than the current article but should certainly contain a link back to this one.
    B: How is the [[Iraq invasion controversy]] suggestion more broad?
  Support vs. attack
    Well, you have proven yoruself [sic] to be a man with no brain
    Steph- vent away – that sucks
These phenomena are hard for human annotators to label consistently (exception: citation labels?)
Different studies may group or distinguish them

The victims were teenagers, not children. Furthermore, the teenagers were throwing rocks and makeshift grenades at the soldiers. Second, the video is still an allegation. We should wait until the investigation is completed before putting it up. – RegisteredUser1

The video is hard evidence. If this was 1945, you'd be telling us not to include any footage of the Nazi concentration camps until the Germans had concluded that they committed war crimes. As for your suggestions that those children *deserved* what happened because they allegedly throw rocks at soldiers carrying assault rifles, I find that as offensive as suggesting that America deserved the 9/11 attack because of its foreign policies. – AnonymousUser1

THEY WEREN'T CHILDREN! The article makes NO mention of children whatsoever. So before you all let your emotions run wild over this: a) they weren't children b) they had hand grenades. – RegisteredUser1

YES THEY WERE CHILDREN! Watch the video. The soldiers are clearly acting in hatred and blood-lust, not self-defense. Defending them is like defending a child molester or serial murderer. The video SHOWS children being assaulted. – AnonymousUser2

A 14 year old is definitely a child. There's a reason we don't let 14 year-olds drink, vote, drive, "consent" to sex with adults, or sign legal agreements without a guardian. – RegisteredUser2

At 14 you are definitely a teenager, not a child. 14 year olds can throw a grenade and shoot a rifle, and know the consequences of their actions. Furthermore 18 isn't the age of majority in Iraq so far as I know. In much of the world the drinking and driving ages are 14 and 16. The world is not centered upon our American beliefs, and it's high time that we started accepting that in ALL situations, not just the ones we deem acceptable. I'm absolutely sickened by the brainwashed vehemence and anti-US hatred expressed by so many so called "liberals" on Wikipedia. - RegisteredUser1

In the English language the word adult is generally not used for people under the age of 18. If you want to use it differently you need to explain it in the article in order not to be misleading. Please calm down and do not personally attack others as "brainwashed" or spreading "hatred". – RegisteredUser4

Example Wikipedia Talk Page

Summary

Why look for (dis)agreement, support, etc?
  Dissecting discussions for influence, subgroups, affiliation, successful problem solving, etc
  Understanding citation impact
These tasks are very related to sentiment detection, except that the target is often part of the problem
Different ways of handling agreement vs. support
The neutral class is huge – don’t ignore it
Computational advice:
  Many better alternatives to Naïve Bayes
  Consider features beyond n-grams
