diskurs - esslli2016esslli2016.unibz.it/wp-content/uploads/2015/10/diskurs-6-on-1.pdf · • from...
TRANSCRIPT
25.08.16
1
DiscourseStructureinTwi6erConversa:ons
ManfredStede,UnivPotsdamESSLLI2016
Overview
• Twi6erconversa:ons??• Fromspeechactsviadialogactstotweetacts• Coherencerela:onsand„rhetoricalstructure“• Derivingrhetoricalstructureautoma:cally• RhetoricalstructureinTwi6erconversa:ons
ManfredStedeESSLLI2016
Twi6erConversa:ons
• reply-to-func:oncreatesconversa:onsonTwi6er
• ~20-25%oftweetsarereplies• ~40%oftweets=partofconversa:ons• treestructure:
ManfredStedeESSLLI2016
Analysis:Whatdopeopledo?
WTF? I have green energy and am supposed to finance nuclear and coal? What nonsense. WHAT NONSENSE!
wtf? Why and in which way? Oh, and I have coal and nuclear energy and have to finance green power thanks to [new law] just so you can have it cheaper? WTF!
…
Agree. But it’s also a fact that …
right, I’m destroying the climate. Put a windmill in front of your house and we’ll talk. …
…
ManfredStedeESSLLI2016
Speechacts• JohnAus:n:Howtodothingswithwords(1962)
• Performa:veu6erancesdonotmerelydescribetheworldbutchangeit(anddeterminingtheirtruthvalueispointless)
• Inamethisshipthe„QueenElizabeth.“• Igiveandbequeathmywatchtomybrother.• Ibetyousixpenceitwillraintomorrow.• Iapologize.
ManfredStedeESSLLI2016
Speechacts
• JohnAus0n:Howtodothingswithwords(1962)
• Levelsofanalyzinganu6erance:– locu0on:performingtheac:onofu6ering– illocu0on:theac:on/inten:onofthespeaker– perlocu0on:theeffectoftheu6eranceontheaddressee
ManfredStedeESSLLI2016
25.08.16
2
SpeechActs• JohnSearle:Speechacts(1969);Ataxonomyofillocu:onary
acts(1975)
• Asser0ves=speechactsthatcommitaspeakertobelievingtheexpressedproposi:on
• Direc0ves=speechactsthataretocausethehearertotakeapar:cularac:on,e.g.requests,commandsandadvice
• Commissives=speechactsthatcommitaspeakertodoingsomefutureac:on,e.g.promisesandoaths
• Expressives=speechactsthatexpressthespeaker'sagtudesandemo:onstowardstheproposi:on,e.g.congratula:ons,excusesandthanks
• Declara0ons=speechactsthatchangethesocialsphere,e.g.bap:smsorpronouncingsomeonehusbandandwife
ManfredStedeESSLLI2016
Speechacts,empirically
Hugeliteratureonspeechacttheory:theirdefini:on;linguis:crealiza:on;indirectspeechacts,...Inprac0ce,thevastmajorityofillocu:onsencounteredintextareasser0ves
ManfredStedeESSLLI2016
Example:User-generatedhotelreviews
IstayedatthisHiltonforthethird:me.Asusual,thestaffwasextremelya6en:veandfriendly.TheconciergeiscalledPierre;healwayswearsabow:e.He‘spar:cularlysweet,evenearlyinthemorning.Iguesshe‘sgegngverygoodcoffeeathisplace....
ManfredStedeESSLLI2016
Aninventoryofillocu:onsforsubjec:vetext
• Report:Thewaiterbroughtusanewcocktail.• Report_author:Ipayedforitrightaway.• Ident:Iwasafraidtoseehimagain.• Evalua:on:Thecocktailturnedouttobelousy.• Es:mate:Itprobablywasmadeinahurry.• Commitment:I‘llneverhaveitagain!• Direc:ve:Andyoushouldavoidit,too.
M.Stede,A.Peldszus:Theroleofillocu:onarystatusintheusagecondi:onsofcausalconnec:vesandincoherencerela:ons.JournalofPragma:cs44(2),2012
Dialogact
• ...capturesthefunc:onalrelevanceofanu6eranceincontext
• ExamplefromVerbmobilappointmentscheduling(Alexanderssonetal.1997)– Canwemeetinthesecondhalfofmay?– Well,that‘snotsogood,becauseI‘llbeonholiday.HowaboutearlyJune,suchasthe3rd?
– Hm,Idon‘tknow,allright,IguessIcandothat.
SUGGEST-DATE
REJECT
GIVE-REASON
SUGGEST-DATE
HESITATE
ACCEPT-DATE
ManfredStedeESSLLI2016
Dialogact
• SomeproposalsofDAtaxonomies– DAMSL(Allen&Core,1997)– DIT++(Bunt2006)h6p://dit.utv.nl
• Automa:cDArecogni:on– (Stolckeetal.00)onminutesofmee:ngs– niceoverview:P.Kral:Dialogueactrecogni:onapproaches.Compu:ngandInforma:cs29:227-250,2010
ManfredStedeESSLLI2016
25.08.16
3
WhystudythisonTwi6er?• Intheconversa:ons,dopeople
– talktoeachotherorpasteachother?– exchangeinforma:on?– exchangeopinions?– exchangearguments?– displaytheiremo:ons?– follow„standard“dialogprotocols?– ...
• Knowingthisisrelevant,interalia,forbuildinggoodTwi6erbots– User:QUESTION
Bot:ANSWER– User:OPINION
Bot:AGREE|DISAGREE– ...
ManfredStedeESSLLI2016
Fromdialogactsto„tweetacts“
• Tweetscollectedonthetopicofrenewableenergy– keywordspogng– re-crawlmissingtweetstogettreesascompleteaspossible
• 1566tweetsin172conversa:ons
E.Zarisheva,T.Scheffler:Dialogactannota:onforTwi6erconversa:ons.Proc.ofSIGDIAL,Prague,2015
ManfredStedeESSLLI2016
DAtaxonomy(part1)
ManfredStedeESSLLI2016
DAtaxonomy(part2)
ManfredStedeESSLLI2016
Annota:on
• Total:51DAlabels• Annotatorshavetochooseonelabelpertweetsegment
• [True,unfortunately.]Agreement[Butwhatabouttherealiza:onofhighsolarac:vityinthe70sand80s?]SetQues:on
ManfredStedeESSLLI2016
Annota:on
• minimally-trainedundergraduatestudents• twosteps– segmenta:on:Fleissmul:-pi0.88– DAlabelling:Fleissmul:-pi0.56
• no:ce:disagreementsonsubcategoriesarepunishedlikedisagreementsonmaincategories
• (with10DAs:mul:-pi0.76)• mostdisagreementisamongtypesofInforma:on-providing
• Finally,allannota:onsmergedintoasinglegold-standard:1213tweets/2936segments
ManfredStedeESSLLI2016
25.08.16
4
Tweet-internalstructure
[@TheBug0815@Luegendetektor@McGeiz]0[Exactly,wedon‘tneedabaseload,it‘sonlyacapitalistconstruct]Agreement–[Wind/PVaresufficient?]PropQues:on[Lol]Disagreement
ManfredStedeESSLLI2016
Automa:crecogni:on
• Assumegoldsegments• Splitconversa:ontreesintosinglestrands
QUESTION
ANSWER1 ANSWER2
DISAGREEMENT AGREE THANK
TatjanaSchefflerandElinaZarisheva:Dialogactrecogni:onforTwi6erconversa:ons.Proc.oftheLRECWorkshoponNormalisa:onandAnalysisofSocialMediaTexts,2016
Features
• userdefined(UD):– segmentlength– author– posi:onofsegmentintweet– presenceofques:onmarks,links,hashtags,etc.
• top50/100wordsforthedialogact(byTF-IDF)
• wordembeddings(pre-calculated,64dimensions)
feature combinationsUD
UD + top50UD + top100
UD + embeddingsUD + top100 +
embeddings
Experiment:FullDAsetHiddenMarkovModel
Mul:nomialdistribu:on(Discretevalues)Gaussiandistribu:on(M-dimensionalvectors)
Condi:onalRandomFieldsmajoritybaseline(INFORM)f=0.09
Results:Reduced/MinimalDAset
previouswork:– (Zhangetal.,2011):F1=0.695for5classes– (ArguelloandShaffer,2015)MOOCforumposts:averageprecision~0.65;ours,0.70
– (VosoughiandRoy,2016):F1=0.70for6classes
F1 Baseline GHMM CRFFull (50 DAs) 0.09 0.21 0.31Reduced (12 DAs) 0.16 0.36 0.51Minimal (8 DAs) 0.34 0.51 0.72
Overview
• Twi6erconversa:ons??• Fromspeechactsviadialogactstotweetacts• Coherencerela:onsand„rhetoricalstructure“• Derivingrhetoricalstructureautoma:cally• RhetoricalstructureinTwi6erconversa:ons
ManfredStedeESSLLI2016
25.08.16
5
Coherence
JohntookatraintoIstanbul.Hehasfamilythere.JohntookatraintoIstanbul.Helikesspinach.(Hobbs76)
Coherence:=Coreference+Coherencerela:ons
ManfredStedeESSLLI2016
Coherencerela:ons
• JohntookatraintoIstanbul.HefirstvisitedtheHagiaSophia.(temporal-sequence)
• JohntookatraintoIstanbul.HissisterwenttoRome.(contrast)
• JohntookatraintoIstanbul.ItwasacomfortableEurocity.(elabora:on)
ManfredStedeESSLLI2016
Claim
• Intext,coherencearisesbecauseeveryclauseislinkedviaacoherencerela:ontoitsle~context.
• OK,but...– howexactlydoyoudeterminetheminimalunits?– whatinventoryofrela:onsdoyouassume?– whatcanthecurrentclausebea6achedto?
ManfredStedeESSLLI2016
Rhetorical Structure Theory (Mann/Thompson 88)
9-14
Reason
In manchenFällen ist ihrEinfluss auf
höchster Ebenewichtig .
10-14
Elaboration
Das giltbesonders beiProjekten , dieder Bund selbst
in denRegionen
vorantreibt ,
11-14
E-elaboration
genannt seienzum Beispiel
dieUmgehungsstra
ßen und dasVerkehrsprojek
t DeutscheEinheit Nr. 17 -
derHavelausbau .
12-14
Elaboration
Letzterer ist imPotsdamer
Umlandumstritten .
13-14
Elaboration
OhneÜbertreibung
lässt sich sagen, dass die
Gegner desProjekts klar inder Mehrheit
sind .
JointFür sie muss
die Haltung derWahlkreiskandi
daten zudiesem Thema
auch einpolitisches
Signal sein .
Joint
2-7
Reason
5-7
6-7Die Einwohnervon Städten
und Gemeindenerfahren meisterst kurz vor
dem Entscheidauf
Bundesebene ,wer ihr direkter
politischerAnsprechpartner in Berlin sein
könnte .
Cause
WelcheHaltungen sie
zu spezifischenörtlichen
Problemeneinnehmen ,
bleibt für vieleWähler
zunächst imDunkeln .
Dann ist es oftschon zu spät ,die inhaltlichenSchwerpunkteder Kandidatenzu bestimmen .
Reason
2-4
Background
Derzeitbestimmen dieParteien , wen
sie in denWahlkreisen
der Region fürdie
Bundestagswahl 2002 in das
Rennenschickenwerden .
3-4
Elaboration
Dies geschiehteher beiläufig ,
zumindestohne
Beteiligungeiner breitenÖffentlichkeit .
Elaboration
Sich frühzeitigmit den
Vorstellungender
Bundestagsaspiranten vertrautzu machen , ist
deshalbsinnvoll .
2-14
ManfredStedeESSLLI2016
Overview
• Twi6erconversa:ons??• Fromspeechactsviadialogactstotweetacts• Coherencerela:onsand„rhetoricalstructure“• Derivingrhetoricalstructureautoma:cally• RhetoricalstructureinTwi6erconversa:ons
ManfredStedeESSLLI2016
Twono:onsofdiscourseparsing
• „Shallow“– RootedinthePennDiscourseTreebankcorpus– Assignargumentstoconnec:ves– [Otherreviewerssaidit‘sagreatplace,]Arg1but[myimpressionwasotherwise.]Arg2
– Iden:fyotherrela:onsandargumentsbetweensentences• „Fullstructure“– Buildacompletestructureforthetext– RhetoricalStructureTheory(Mann/Thompson88)– SegmentedDiscourseRepresenta:onTheory(Asher/Lascarides03)
ManfredStedeESSLLI2016
25.08.16
6
RSTDiscourseTreebank• 385ar:clesfromWallStreetJournal• OverlapwiththePennTreebank
• 78rela:ons,manyofthemarisingfromnuclearityreversals– Evalua:on(nucleus:evaluated/sat:evalua:ng)– Evalua:on(nucleus:evalua:ng/sat:evaluated)
ManfredStedeESSLLI2016
L.Carlsonetal.:Buildingadiscourse-taggedcorpusintheframeworkofRhetoricalStructureTheory.Proc.ofSIGDIAL,2001
HILDA:RST-parsingviaSVMclassifica:on
ManfredStedeESSLLI2016
H.Hernaultetal.LBuildingaDiscourseParserUsingSupportVectorMachineClassifica:on.Dialogue&Discourse1(3),2010
HILDA:RST-parsingviaSVMclassifica:on
• Training&Test:RSTDiscourseTreebank• Reduce78rela:onsto18„families“• AQribu:on,Background,Cause,Comparison,Condi:on,Contrast,Elabora:on,Enablement,Evalua:on,Explana:on,Joint,Manner-Means,Summary,Temporal,Topic-Change,Topic-Comment,Same-unit,Textual-organiza:on
• Anyn-arytreeswithn>2areconvertedtobinarytrees
ManfredStedeESSLLI2016
Idea
• Complexity:linearwithrespecttolengthofinputtext
• Bothsegmenta:onandrela:onlabellingrunassupervisedclassifica:ontasks,usingsupportvectormachines
• (Here,weskipthesegmenta:onstep(F-score0.95)ManfredStedeESSLLI2016
Idea
• STRUCTclassifier:Whatistheprobabilityofsegmentsi,i+1beingconnected?
• LABELclassifier:Whatisthemostlikelyrela:ontoholdbetweensegmentsi,i+1?
ManfredStedeESSLLI2016
STRUCTclassifier
Seg1Seg2Seg3Seg4Seg5Seg6elabNS(Seg1,Seg2)condNS(Seg4,Seg5)
evalSN(Seg3,condNS(Seg4,Seg5)
list(elabNS(Seg1,Seg2),evalSN(Seg3,condNS(Seg4,Seg5)))
REL(list(elabNS(Seg1,Seg2),evalSN(Seg3,condNS(Seg4,Seg5))),Seg6)
ManfredStedeESSLLI2016
25.08.16
7
RELclassifier
Seg1Seg2Seg3Seg4Seg5Seg6elabNS(Seg1,Seg2)condNS(Seg4,Seg5)
evalSN(Seg3,condNS(Seg4,Seg5)
list(elabNS(Seg1,Seg2),evalSN(Seg3,condNS(Seg4,Seg5)
elabNS(list(elabNS(Seg1,Seg2),evalSN(Seg3,condNS(Seg4,Seg5))),Seg6)
ManfredStedeESSLLI2016
Featuresforrela:onlabeling(1)
ManfredStedeESSLLI2016
Featuresforrela:onlabeling(3)
Results:Rela:onlabeling
• STRUCT– Trainedon52.683instances(1/3posi:ve)– Testedon8.558instances– Featurespacedimensionality:136.987– Accuracywithpolyn.Kernel:85.0
• LABEL– Trainedon17.742instances– Testedon2.887instances– Accuracyofmul:-classSVM:66.8
ManfredStedeESSLLI2016
Performanceonindividualrela:ons
25.08.16
8
Overview
• Twi6erconversa:ons??• Fromspeechactsviadialogactstotweetacts• Coherencerela:onsand„rhetoricalstructure“• Derivingrhetoricalstructureautoma:cally• RhetoricalstructureinTwi6erconversa:ons
ManfredStedeESSLLI2016
Inter-Tweet-Rela:ons • UdoSieverding
#Offshore-Ausbau: Warum schweigen Dauer-#EEG-Kritiker @Der_BDI @iw_koeln @insm @DICEHHU @RolandTichy @bdew_ev @igbce? http://t.co/WfZsirxMiC
• mapro67 @UdoSieverding weil sie alle Interessen der großen Stromkonzerne vertreten @Der_BDI @iw_koeln @insm @DICEHHU @RolandTichy @bdew_ev @igbce
• UdoSieverding
#Offshore expansion: Why are the big critics of the renewable energy law so quiet?
• mapro67 @UdoSieverding because they represent the interests of the big energy companies
ManfredStedeESSLLI2016
Elaboration Antithesis
Reason Elaboration
ManfredStedeESSLLI2016
Antithesis
Sequence
Ongoingwork
Elaboration Antithesis
Reason Elaboration
ManfredStedeESSLLI2016
Antithesis
Sequence
Tobeclarified:DA≈CoherenceRel
Inform
Inform
SetQuestion
Inform
Disagree Suggest
Request
Promise
Add:Tweet-internalstructure
ManfredStedeESSLLI2016
• WTF? I have green energy and am supposed to finance nuclear and coal? What nonsense. WHAT NONSENSE!
• Agree. But it’s also a fact that …