Entity and Aspect Extraction for Organizing News Comments
Radityo Eko Prasojo, Mouna Kacimi & Werner Nutt
Melbourne, 19–23 October 2015
Comments in News Website
Typically, comments are listed based on date-time and reply relation.
Problem: difficulty to catch the flow of the discussions and to understand their main points of agreement and disagreement.
Example: why is independence good/bad for the Scots? Will their economy be affected?
CIKM 2015 Prasojo, Kacimi, & Nutt - Entity and Aspect Extraction for Organizing News Comments
There is a need for organizing comments
to help users to:
(1) have a better understanding of the viewpoints related to each topic
(2) facilitate the participation in discussions and thus increase the chance of acquiring new viewpoints
by clustering comments containing similar discussions:
• they talk about the same entities
• they argue about the same aspects of those entities.
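The grouping idea above can be sketched as follows. This is a minimal illustration, assuming each comment has already been annotated with (entity, aspect) pairs; the comment ids, the annotations, and the function name `group_comments` are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def group_comments(annotated):
    """Group comment ids by the (entity, aspect) pairs they discuss."""
    groups = defaultdict(list)
    for comment_id, pairs in annotated.items():
        # De-duplicate pairs so a comment is listed once per group.
        for pair in sorted(set(pairs)):
            groups[pair].append(comment_id)
    return dict(groups)

# Hypothetical annotations: comment id -> (entity, aspect) pairs.
annotated = {
    "c1": [("Scotland", "independence"), ("Scotland", "economy")],
    "c2": [("Scotland", "economy")],
    "c3": [("Tesco", "employer")],
}
clusters = group_comments(annotated)
```

Comments that share a key such as ("Scotland", "economy") end up in the same group, which is the unit a reader would browse.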
Contributions
• Improvements on state-of-the-art unsupervised entity extraction tools (Zemanta, NERD, AIDA YAGO)
  • Addressed issues: noise and low coverage (due to coreferences)
• Introduced aspect extraction in the news domain
  • Previously: aspects only in the product review domain (Zhang & Liu, 2014)
An entity can be…
a person, a location, an organization, or any well-defined concept such as nationalities, languages, or wars.
Entity Extraction Tasks
1. Recognition – through proper names / rigid designators (Coates-Stephens, 1992) (Thielen, 1995) (Nadeau & Sekine, 2006)
2. Disambiguation – by mapping to a representation in a KB
“Scotland can vote however it wants, it's the Scottish peoples right.”
“If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
<http://dbpedia.org/resource/USS_Ronald_Reagan_(CVN-76)>
<http://dbpedia.org/resource/Crimea>
<http://dbpedia.org/resource/Scotland>
Supervised vs Unsupervised Approaches to Entity Extraction
Aspect of EE | Prominent tools                      | Recognition ability         | Disambiguation ability | Running time
Supervised   | Stanford NLP                         | depends on the training set | limited                | fast
Unsupervised | Zemanta, AlchemyAPI, NERD, AIDA YAGO | domain-independent          | provided by KBs        | slow
Entity Extraction Baseline: Unsupervised Tools
• Improved by applying:
  • Entity filtering
  • Name normalization
  • Entity search on KB
  • Coreference resolution
Tools: DBpedia, Stanford CoreNLP
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
<http://dbpedia.org/resource/NATO>
<http://dbpedia.org/resource/Scotland>
<http://dbpedia.org/resource/USS_Ronald_Reagan>
<http://dbpedia.org/resource/Crimea>
<http://dbpedia.org/resource/Aircraft_Carrier>
Entity Filtering
An entity is an instance of some well-defined class.
We remove entities that don't have an rdf:type other than owl:Thing and owl:Class.
[Figure: rdf:type lookups in the KB for the extracted entities (e.g. "rdf:type of NATO?"), returning types such as dbo:PopulatedPlace, dbo:Ship, dbo:Location, and owl:Thing]
Risk: lower recall
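The filtering rule can be sketched in a few lines. This is only an illustration under assumptions: the type sets below are hand-written stand-ins rather than actual DBpedia query results, and `filter_entities` is a hypothetical helper name.

```python
GENERIC_TYPES = {"owl:Thing", "owl:Class"}

def filter_entities(entity_types):
    """Keep entities that have at least one rdf:type beyond the generic ones."""
    return [
        entity
        for entity, types in entity_types.items()
        if set(types) - GENERIC_TYPES
    ]

# Illustrative type sets, not actual DBpedia query results.
entity_types = {
    "dbpedia:Scotland": {"owl:Thing", "dbo:PopulatedPlace"},
    "dbpedia:USS_Ronald_Reagan": {"owl:Thing", "dbo:Ship"},
    "dbpedia:Crimea": {"owl:Thing", "dbo:Location"},
    "dbpedia:Aircraft_Carrier": {"owl:Thing"},
}
kept = filter_entities(entity_types)
```

An entity typed only as owl:Thing is dropped, which removes noise but can also discard a valid entity whose KB entry is poorly typed; that is the recall risk noted on the slide.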
Name Normalization
An entity may appear using non-proper names (aliases).
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
[Figure: mock news article "Scotland Rejects Independence in Referendum" with placeholder body text that mentions "United States" and "Anders Fogh Rasmussen"]
foaf:givenName, foaf:surname? → ← {‘Anders’, ‘Fogh’, ‘Rasmussen’}
dbpedia-owl:wikiPageRedirects of? → ← {‘US’, ‘USA’, ‘U.S.’, ‘U.S.A.’, …}
Risk: lower precision
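A minimal sketch of alias-based matching, assuming the alias sets have already been fetched from the KB (name parts and redirect labels). The function names, the case-insensitive whole-word matching, and the entity identifiers are assumptions made for illustration; the slides do not specify the matching procedure.

```python
import re

def build_alias_index(kb_aliases):
    """Map each lower-cased alias to the set of entities it may denote."""
    index = {}
    for entity, aliases in kb_aliases.items():
        for alias in aliases:
            index.setdefault(alias.lower(), set()).add(entity)
    return index

def find_alias_mentions(text, index):
    """Return (alias, entities) for every alias found as a whole word."""
    found = []
    for alias, entities in index.items():
        pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append((alias, sorted(entities)))
    return sorted(found)

# Alias sets as shown on the slide; the entity ids are illustrative.
kb_aliases = {
    "dbpedia:Anders_Fogh_Rasmussen": ["Anders", "Fogh", "Rasmussen"],
    "dbpedia:United_States": ["US", "USA", "U.S.", "U.S.A."],
}
index = build_alias_index(kb_aliases)
mentions = find_alias_mentions("Don't be afraid of Rasmussen or NATO.", index)
noisy = find_alias_mentions("Tell us more", index)
```

The second call shows the precision risk: with case-insensitive matching, the pronoun "us" is mistaken for the alias "US".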
Context-Related Entity Search
Sometimes, a mapping for an alias cannot be found.
“Don't be afraid of Rasmussen or NATO, …”
All related entities of NATO and their aliases → ← dbprop:leaderName dbpedia:Anders_Fogh_Rasmussen, {‘Anders Fogh’, ‘Rasmussen’}
String comparison
Risk: lower precision
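The search step can be sketched as below, assuming the related entities and their aliases have already been retrieved from the KB. The slide only says "string comparison"; the use of `difflib.SequenceMatcher`, the 0.9 threshold, and the name `search_related` are assumptions chosen for this illustration.

```python
from difflib import SequenceMatcher

def search_related(mention, resolved_entities, related, aliases, threshold=0.9):
    """Look for `mention` among aliases of entities related to resolved ones."""
    best = None
    for entity in resolved_entities:
        for candidate in related.get(entity, []):
            for alias in aliases.get(candidate, []):
                score = SequenceMatcher(None, mention.lower(), alias.lower()).ratio()
                if score >= threshold and (best is None or score > best[1]):
                    best = (candidate, score)
    return best[0] if best else None

# Hand-written stand-ins for KB lookups (e.g. via dbprop:leaderName).
related = {"dbpedia:NATO": ["dbpedia:Anders_Fogh_Rasmussen"]}
aliases = {"dbpedia:Anders_Fogh_Rasmussen": ["Anders Fogh", "Rasmussen"]}
match = search_related("Rasmussen", ["dbpedia:NATO"], related, aliases)
```

Because the candidate set is limited to entities related to ones already found in the comment, an ambiguous surname can be resolved from context, at the cost of occasional wrong matches.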
Coreference Resolution
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
Tool: Stanford CoreNLP
Risk: lower precision
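One way coreference output could raise coverage is by copying entity links along coreference chains. The sketch below assumes the chains come from a coreference resolver; the chains, the mention ids, and the function `propagate_links` are hypothetical, and a real resolver may group these mentions differently.

```python
def propagate_links(chains, links):
    """Copy the entity link of a chain's linked mention to its other mentions."""
    resolved = dict(links)
    for chain in chains:
        targets = {links[m] for m in chain if m in links}
        # Skip chains with no linked mention or with conflicting links.
        if len(targets) == 1:
            (target,) = targets
            for mention in chain:
                resolved.setdefault(mention, target)
    return resolved

# Assumed resolver output: each chain lists mention ids that corefer.
chains = [["s1:NATO", "s1:their"], ["s2:he#1", "s2:he#2"]]
links = {"s1:NATO": "dbpedia:NATO"}
resolved = propagate_links(chains, links)
```

Pronouns in a chain without any linked mention stay unresolved, and a wrong chain propagates a wrong link, which matches the precision risk noted above.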
Entity Extraction – Experiment Setup
• 10 news articles that use DISQUS
• 100 comments having the highest word counts
• 5 students as entity and aspect annotators
• Annotated data as ground truth
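The slides mention recall and precision risks and an annotated ground truth, so the evaluation presumably compares extracted items against the annotations. A minimal sketch of that comparison, using made-up sets rather than the experiment's data:

```python
def precision_recall(extracted, ground_truth):
    """Compute precision and recall of extracted items against annotations."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Made-up example sets, not the experiment's data.
extracted = {"NATO", "Scotland", "Aircraft_Carrier"}
ground_truth = {"NATO", "Scotland", "Crimea", "USS_Ronald_Reagan"}
precision, recall = precision_recall(extracted, ground_truth)
```

Here filtering out "Aircraft_Carrier" would raise precision, while finding "Crimea" through an alias or a pronoun would raise recall.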
Entity Extraction – Experiment Results
Aspects
• of product entities
  • The aspects of an entity e are the components and attributes of e. (Zhang & Liu, 2014)
• of entities in news
  • “More Scots would definitely have voted no than yes” (voting - action)
  • “Orlando Bloom is a good actor” (acting - skill)
  • “…it’s the right of Scottish People” (right - possession)
  • Other: components, attributes, and moods
An aspect is anything that is arguable about an entity.
Types of Aspects
• “…it’s the right of Scottish People.” (explicit) → aspect::right
• “Tesco is large.” (implicit) → aspect::employer
• “Scotland is a very beautiful country.” (semi-implicit) → aspect::beauty
• “The Scots can vote however they want.” (semi-implicit) → aspect::voting
Extraction of Explicit Aspects: Exploiting Dependency
“…it’s the right of Scottish People.”
Stanford CoreNLP annotation: prepositional dependency
Extract all noun phrases that have an nmod:of, nmod:in, nmod:on, or nmod:at relation towards the entity in the sentence.

Extraction of Explicit Aspects, Combined with Entity Extraction
• Specifically, the coreference resolution part
“Don't be afraid of Rasmussen or NATO, this is none of their business.”
Possessive dependency
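The dependency rule can be sketched over (governor, relation, dependent) triples. The triples below are hand-written stand-ins for parser output, the helper `explicit_aspects` is hypothetical, and the sketch returns only the head token rather than the full noun phrase.

```python
PREPOSITIONAL = {"nmod:of", "nmod:in", "nmod:on", "nmod:at"}

def explicit_aspects(dependencies, entity_tokens, relations=PREPOSITIONAL):
    """Return governors linked to an entity token by one of `relations`."""
    return [
        governor
        for governor, relation, dependent in dependencies
        if relation in relations and dependent in entity_tokens
    ]

# Hand-written triples (governor, relation, dependent), not real parser output.
deps = [("right", "nmod:of", "People"), ("People", "amod", "Scottish")]
aspects = explicit_aspects(deps, {"People"})

# Possessive case: the pronoun must first be tied to an entity by coreference.
poss_deps = [("business", "nmod:poss", "their")]
poss_aspects = explicit_aspects(poss_deps, {"their"}, relations={"nmod:poss"})
```

In the possessive example, "their" only counts as an entity token once coreference resolution has linked it to an entity, which is why this step is combined with entity extraction.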
Extraction of Implicit Aspects: Adjective-to-aspect Mapping
Idea from (Zhang & Liu, 2014)
“Tesco is large”
In other comments, we found as aspects of Tesco, qualified as “large”:
• employer (2x)
• back office (1x)
• call center operation (1x)
We conclude: most probably, the employer aspect of Tesco was meant.
Result is further improved by (1) taking into account frequent context words and (2) lexical relations of adjective.
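The frequency-based choice can be sketched with a counter per (entity, adjective) pair. This covers only the basic counting step; the two refinements mentioned on the slide (context words and lexical relations) are not modeled, and the function names are hypothetical.

```python
from collections import Counter

def build_mapping(observations):
    """Count, per (entity, adjective), the explicit aspects it qualified."""
    mapping = {}
    for entity, adjective, aspect in observations:
        mapping.setdefault((entity, adjective), Counter())[aspect] += 1
    return mapping

def infer_aspect(mapping, entity, adjective):
    """Pick the most frequent aspect, or None if the pair was never seen."""
    counts = mapping.get((entity, adjective))
    return counts.most_common(1)[0][0] if counts else None

# Counts mirror the Tesco example above.
observations = [
    ("Tesco", "large", "employer"),
    ("Tesco", "large", "employer"),
    ("Tesco", "large", "back office"),
    ("Tesco", "large", "call center operation"),
]
mapping = build_mapping(observations)
inferred = infer_aspect(mapping, "Tesco", "large")
```

When `infer_aspect` returns None, no mapping exists for the adjective, which is the situation the semi-implicit extraction is meant to handle.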
Extraction of Semi-Implicit Aspects (1)
• Semi-implicit aspects: implicit aspects that don't have a mapping.
• “Scotland is a very beautiful country.”
beautiful → beauty (WordNet: attribute)
We search for a noun phrase that is connected to the entity and the word indicating the aspect using a lexical relation.
Steps: Stanford CoreNLP annotation, then lexical database search.
We consider the following lexical relations: attribute > pertainym > participle of verb > derivationally related form (drf) > see also
Extraction of Semi-Implicit Aspects (2)
• Generally can be used to identify aspects from verb, adjective, or noun.
• Other examples:
  • beautiful → beauty (WordNet: attribute)
  • instrumental → instrument (WordNet: pertainym)
  • vote → voting (WordNet: derivationally related form)
  • inexcusable → excusable (WordNet: antonym) → justifiable (WordNet: similar to) → justification, justify (WordNet: drf, drf)
• If there are multiple possible aspects for a single word, we use WordNet::Similarity to decide for the best one.
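The priority order over lexical relations can be sketched with a toy lexicon standing in for WordNet lookups. The lexicon entries, the relation keys, and `aspect_from_word` are assumptions for illustration; in particular the extra "drf" entry for "beautiful" is invented only to show that a higher-priority relation wins.

```python
PRIORITY = ["attribute", "pertainym", "participle", "drf", "see_also"]

def aspect_from_word(word, lexicon, priority=PRIORITY):
    """Return (relation, candidates) for the highest-priority relation found."""
    relations = lexicon.get(word, {})
    for relation in priority:
        candidates = relations.get(relation)
        if candidates:
            return relation, candidates
    return None

# Toy lexicon, not real WordNet data; entries follow the slide examples.
lexicon = {
    "beautiful": {"attribute": ["beauty"], "drf": ["beautify"]},
    "instrumental": {"pertainym": ["instrument"]},
    "vote": {"drf": ["voting"]},
}
result = aspect_from_word("beautiful", lexicon)
```

When a relation yields several candidates, a further ranking step is needed; the slides use WordNet::Similarity for that, which this sketch leaves out.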
Aspect Extraction – Experiment and Results
Visualization
RE Prasojo, M Kacimi, F Darari. IEEE InfoVis. Chicago, 25-30 October 2015.
Demo is now available at orcaestra.inf.unibz.it
Conclusion
• General contribution: a framework for organizing news comments using entity extraction and aspect extraction.
• Our entity extraction: unsupervised tools + entity filtering, name normalization, entity search, and coreference resolution.
• We extract explicit, implicit, and semi-implicit aspects using grammar analysis and lexical database search.
• Experiments show improvement on both entity and aspect extraction compared to the baseline technique.
Future Work
• Addressing limitations:
  • Concept extractions
  • Idioms and other metaphorical expressions
  • Difficult coreference (e.g. demonstrative pronouns)
  • Experiments on more, various data
• Complete the missing pieces:
  • Sentiment analysis for news comments