intro to information extraction -sentiment lexicon...
![Page 1: Intro to Information Extraction -Sentiment Lexicon ...faculty.cse.tamu.edu/huangrh/Fall18-638/l16_ie-overview_sentiment_lexicon.pdfNamed Entity Recognition (NER) Mars One announced](https://reader034.vdocuments.site/reader034/viewer/2022043018/5f3a55e716373024d953ac3c/html5/thumbnails/1.jpg)
Intro to Information Extraction - Sentiment Lexicon Induction as an example
Many slides adapted from Ellen Riloff and Dan Jurafsky
What is Information Extraction?
• Information extraction (IE) is an umbrella term for NLP tasks that involve extracting pieces of information from text and assigning some meaning to the information.
• Many IE applications aim to turn unstructured text into a “structured” representation.
• IE problems typically involve:
– identifying text snippets to extract
– assigning semantic meaning to entities or concepts
– finding relations between entities or concepts
IE Applications
• Biological Processes (Genomics)
• Clinical Medicine
• Question Answering / Web Search
• Query Expansion / Semantic Sets
• Extracting Entity Profiles
• Tracking Events (Violent, Diseases, Business, etc.)
• Tracking Opinions (Political, Product Reputation, Financial Prediction, On-line Reviews, etc.)
General Techniques
• Syntactic Analysis
– Phrase Identification
– Feature Extraction
• Semantic Analysis
• Statistical Measures
• Machine Learning
– Supervised & Weakly Supervised
• Graph Algorithms
Named Entity Recognition (NER)
Mars One announced Monday that it has picked 1,058 aspiring spaceflyers to move on to the next round in its search for the first humans to live and die on the Red Planet.
The Wall Street Journal reports that Google plans to partner with Toyota to develop Android software for their hybrid cars.
NER typically involves extracting and labeling certain types of entities, such as proper names and dates.
Domain-specific NER
Biomedical systems must recognize genes and proteins:
IL-2 gene expression and NF-kappa B activation through CD28 requires reactive oxygen production by 5-lipoxygenase.
Clinical medical systems must recognize problems and treatments:
Adrenal-sparing surgery is safe and effective, and may become the treatment of choice in patients with hereditary phaeochromocytoma.
Semantic Class Identification
Mars One announced Monday that it has picked 1,058 aspiring spaceflyers to move on to the next round in its search for the first humans to live and die on the Red Planet.
The Wall Street Journal reports that Google plans to partner with Toyota to develop Android software for their hybrid cars.
Semantic Lexicon Induction
• Although some general semantic dictionaries exist (e.g., WordNet), domain-specific applications often have specialized vocabulary.
• Semantic Lexicon Induction techniques learn lists of words that belong to a semantic class.
Vehicles: car, jeep, helicopter, bike, tricycle, scooter, …
Animal: tiger, zebra, wolverine, platypus, echidna, …
Symptoms: cough, sneeze, pain, pu/pd, elevated bp, …
Products: camera, laptop, iPad, tablet, GPS device, …
Domain-specific Vocabulary
A 14yo m/n doxy owned by a reputable breeder is being treated for IBD with pred.
doxy = ANIMAL, breeder = HUMAN, IBD = DISEASE, pred = DRUG
Domain-specific meanings: lab, mix, m/n = ANIMAL
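The labeling on this slide can be sketched as a simple dictionary lookup. This is only an illustration: the `LEXICON` entries are the four word/class pairs as read from the slide's labels, and the function name `tag_sentence` and the whitespace tokenization are assumptions, not part of the lecture.

```python
# Hypothetical hand-built lexicon; a lexicon induction system would learn these lists.
LEXICON = {
    "doxy": "ANIMAL",
    "breeder": "HUMAN",
    "ibd": "DISEASE",
    "pred": "DRUG",
}

def tag_sentence(sentence):
    """Return (word, semantic class) pairs for words found in the lexicon."""
    tagged = []
    for token in sentence.split():
        word = token.strip(".,;:!?")          # drop surrounding punctuation
        label = LEXICON.get(word.lower())     # case-insensitive lookup
        if label is not None:
            tagged.append((word, label))
    return tagged

sentence = ("A 14yo m/n doxy owned by a reputable breeder "
            "is being treated for IBD with pred.")
print(tag_sentence(sentence))
```

A general-purpose dictionary would likely miss all four words, which is the motivation for inducing the lexicon from domain text.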
Semantic Taxonomy Induction
• Ideally, we want semantic concepts to be organized in a taxonomy, to support generalization but to distinguish different subtypes.
Animal
  Mammal
    Feline
      Lion, Panthera Leo
      Tiger, Panthera Tigris, Felis Tigris
      Cougar, Mountain Lion, Puma, Panther, Catamount
    Canine
      Wolf, Canis Lupus
      Coyote, Prairie Wolf, Brush Wolf, American Jackal
      Dog, Puppy, Canis Lupus Familiaris, Mongrel
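One way to see how a taxonomy supports generalization is to store parent links and walk upward. The sketch below is an assumption-laden illustration: the `PARENT` and `SYNONYMS` tables hold only part of the example tree, and `ancestors` is a hypothetical helper name.

```python
# Child -> parent links for part of the example taxonomy.
PARENT = {
    "Mammal": "Animal",
    "Feline": "Mammal",
    "Canine": "Mammal",
    "Lion": "Feline",
    "Tiger": "Feline",
    "Cougar": "Feline",
    "Wolf": "Canine",
    "Coyote": "Canine",
    "Dog": "Canine",
}

# A few alternative names mapped to one canonical concept.
SYNONYMS = {
    "Puma": "Cougar",
    "Mountain Lion": "Cougar",
    "Catamount": "Cougar",
    "Prairie Wolf": "Coyote",
    "Mongrel": "Dog",
}

def ancestors(term):
    """Return the chain of increasingly general concepts above a term."""
    concept = SYNONYMS.get(term, term)
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

print(ancestors("Puma"))     # generalizes to Feline, then Mammal, then Animal
print(ancestors("Mongrel"))  # generalizes to Canine, then Mammal, then Animal
```

A single-parent table like this can only encode a strict tree; a concept with several parents would need a graph instead.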
Challenges in Taxonomy Induction
• But there are often many ways to organize a conceptual space!
• Strict hierarchies are rare in real data: graphs/networks are more realistic than tree structures.
• For example, animals could be subcategorized based on:
– carnivore vs. herbivore
– water-dwelling vs. land-dwelling
– wild vs. pets vs. agricultural
– physical characteristics (e.g., baleen vs. toothed whales)
– habitat (e.g., arctic vs. desert)
Relation Extraction
In Salzburg, little Mozart grew up in a loving middle-class environment.
Birthplace(Mozart, Salzburg)
Steve Ballmer is an American businessman who has been serving as the CEO of Microsoft since January 2000.
Employed-By(Steve Ballmer, Microsoft)
CEO(Steve Ballmer, Microsoft)
Relations for Web Search
Paraphrasing
• Relations can often be expressed with a multitude of different expressions.
• Paraphrasing systems try to explicitly learn phrases that represent the same type of relation.
• Examples:
– X was born in Y
– Y is the birthplace of X
– X’s birthplace is Y
– X’s hometown is Y
– X grew up in Y
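To make the idea concrete, the five example phrasings can be written as surface patterns that all map to one relation. This is a minimal sketch, assuming hand-written regular expressions and a crude capitalized-word notion of a name (`NAME`); a paraphrasing system would learn such patterns rather than list them by hand, and `extract_birthplace` is a hypothetical name.

```python
import re

# Crude stand-in for a named entity: one or more capitalized words.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
X = rf"(?P<x>{NAME})"
Y = rf"(?P<y>{NAME})"

# The five paraphrases from the slide, all treated as Birthplace(X, Y).
BIRTHPLACE_PATTERNS = [
    re.compile(rf"{X} was born in {Y}"),
    re.compile(rf"{Y} is the birthplace of {X}"),
    re.compile(rf"{X}'s birthplace is {Y}"),
    re.compile(rf"{X}'s hometown is {Y}"),
    re.compile(rf"{X} grew up in {Y}"),
]

def extract_birthplace(sentence):
    """Return ("Birthplace", X, Y) for the first matching pattern, else None."""
    for pattern in BIRTHPLACE_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return ("Birthplace", match.group("x"), match.group("y"))
    return None

print(extract_birthplace("Mozart was born in Salzburg."))
print(extract_birthplace("Salzburg is the birthplace of Mozart."))
print(extract_birthplace("Mozart's hometown is Salzburg."))
```

Exact string patterns are brittle: a sentence such as "In Salzburg, little Mozart grew up in a loving middle-class environment." matches none of them, because the place name is not adjacent to "grew up in".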
Event Extraction
Goal: extract facts about events from unstructured documents
Example: extracting information about terrorism events in news articles:
December 29, Pakistan - The U.S. embassy in Islamabad was damaged this morning by a car bomb. Three diplomats were injured in the explosion. Al Qaeda has claimed responsibility for the attack.
EVENT
Type: bombing
Target: U.S. embassy
Location: Islamabad, Pakistan
Date: December 29
Weapon: car bomb
Victim: three diplomats
Perpetrator: Al Qaeda
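The output of event extraction is a filled template. A minimal sketch of that target structure follows, assuming a Python dataclass with one field per slot; the class name `BombingEvent` and the field names are illustrative, and the extraction step itself is not shown.

```python
from dataclasses import dataclass, asdict

@dataclass
class BombingEvent:
    """One slot per role in the example template."""
    event_type: str
    target: str
    location: str
    date: str
    weapon: str
    victim: str
    perpetrator: str

# Slot fillers copied from the example news snippet.
event = BombingEvent(
    event_type="bombing",
    target="U.S. embassy",
    location="Islamabad, Pakistan",
    date="December 29",
    weapon="car bomb",
    victim="three diplomats",
    perpetrator="Al Qaeda",
)

for slot, filler in asdict(event).items():
    print(f"{slot}: {filler}")
```

Each event type would need its own template: a disease outbreak event, for instance, has slots such as disease and status rather than weapon and perpetrator.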
Event Extraction
Another example: extracting information about disease outbreak events.
Document Text
New Jersey, February, 26. An outbreak of swine flu has been confirmed in Mercer County, NJ. Five teenage boys appear to have contracted the deadly virus from an unknown source. The CDC is investigating the cases and is taking measures to prevent the spread. . .
Event
Disease: swine flu
Location: Mercer County, NJ
Victim: Five teenage boys
Date: February 26
Status: confirmed
Large-Scale IE from the Web
• Some researchers have been developing IE systems for large-scale extraction of facts and relations from the Web.
• These systems exploit the massive amount of text and redundancy available on the Web and use weakly supervised, iterative learning to harvest information for automated knowledge base construction.
• The KnowItAll project at UW and NELL project at CMU are well-known research groups pursuing this work.
Opinion Extraction
I just bought a Powershot a few days ago. I took some pictures using the camera. Colors are so beautiful even when flash is used. Also easy to grip since the body has a grip handle. [Kobayashi et al., 2007]
Source: <writer>
Target: Powershot
Aspect: pictures, colors
Evaluation: beautiful, easy to grip
Opinion Extraction from News [Wilson & Wiebe, 2009]
Italian senator Renzo Gubert praised the Chinese Government’s efforts.
Source: Italian senator Renzo Gubert
Target: the Chinese Government
Evaluation: praised POSITIVE
African observers generally approved of his victory while Western governments denounced it.
Source: African observers
Target: his victory
Evaluation: approved POSITIVE
Source: Western governments
Target: it (his victory)
Evaluation: denounced NEGATIVE
Summary
• Information extraction systems frequently rely on low-level NLP tools for basic language analysis, often in a pipeline architecture.
• There are a wide variety of applications for IE, including both broad-coverage and domain-specific applications.
• Some IE tasks are relatively well-understood (e.g., named entity recognition), while others are still quite challenging!
• We’ve only scratched the surface of possible IE tasks … nearly endless possibilities.
Turney Algorithm to learn a Sentiment Lexicon
1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
Extract two-word phrases with adjectives

| First Word | Second Word | Third Word (not extracted) |
| --- | --- | --- |
| JJ | NN or NNS | anything |
| RB, RBR, or RBS | JJ | Not NN nor NNS |
| JJ | JJ | Not NN nor NNS |
| NN or NNS | JJ | Not NN nor NNS |
| RB, RBR, or RBS | VB, VBD, VBN, or VBG | anything |
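The tag patterns above can be applied to text that has already been part-of-speech tagged. The sketch below assumes input as a list of (word, Penn Treebank tag) pairs from some tagger; the function names are hypothetical, and treating a missing third word at the end of the input as "not a noun" is an assumption, not something the slide specifies.

```python
NOUN = {"NN", "NNS"}
ADJECTIVE = {"JJ"}
ADVERB = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

def matches_pattern(first, second, third):
    """Check one (first, second, third) tag triple against the five patterns."""
    third_is_not_noun = third not in NOUN   # True when third is None (end of input)
    if first in ADJECTIVE and second in NOUN:
        return True                         # JJ + NN/NNS, third word is anything
    if first in ADVERB and second in ADJECTIVE:
        return third_is_not_noun
    if first in ADJECTIVE and second in ADJECTIVE:
        return third_is_not_noun
    if first in NOUN and second in ADJECTIVE:
        return third_is_not_noun
    if first in ADVERB and second in VERB:
        return True                         # RB* + VB*, third word is anything
    return False

def extract_phrases(tagged_tokens):
    """Return the two-word phrases whose tags match one of the patterns."""
    phrases = []
    for i in range(len(tagged_tokens) - 1):
        word1, tag1 = tagged_tokens[i]
        word2, tag2 = tagged_tokens[i + 1]
        tag3 = tagged_tokens[i + 2][1] if i + 2 < len(tagged_tokens) else None
        if matches_pattern(tag1, tag2, tag3):
            phrases.append(f"{word1} {word2}")
    return phrases

tagged = [("the", "DT"), ("online", "JJ"), ("service", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("handy", "JJ"), (".", ".")]
print(extract_phrases(tagged))
```

The third-word check keeps the extractor from cutting a longer noun phrase in half: in "very good service", "very good" is skipped because a noun follows, while "good service" is kept.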
How to measure polarity of a phrase?
• Positive phrases co-occur more with “excellent”
• Negative phrases co-occur more with “poor”
• But how to measure co-occurrence?
Pointwise Mutual Information
• Pointwise mutual information:
– How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$
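Pointwise Mutual Information
• Pointwise mutual information:
– How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• PMI between two words:
– How much more do two words co-occur than if they were independent?

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)}$$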
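A small numeric check may help with reading PMI values. The sketch below is a direct transcription of the definition; the function name `pmi` and the probabilities passed to it are made up for illustration.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information in bits: log2 of P(x,y) / (P(x) P(y))."""
    return math.log2(p_xy / (p_x * p_y))

# Independent events: the joint equals the product of the marginals, so PMI is 0.
print(pmi(0.25, 0.5, 0.5))

# Co-occurring twice as often as independence predicts: PMI is +1 bit.
print(pmi(0.5, 0.5, 0.5))

# Co-occurring half as often as independence predicts: PMI is -1 bit.
print(pmi(0.125, 0.5, 0.5))
```

Positive PMI means the two events attract each other, negative PMI means they avoid each other, and zero means no association.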
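How to Estimate Pointwise Mutual Information
– Query search engine (Altavista)
• P(word) estimated by hits(word)/N
• P(word1, word2) by hits(word1 NEAR word2)/N
– (More correctly the bigram denominator should be kN, because there are a total of N consecutive bigrams (word1, word2), but kN bigrams that are k words apart, but we just use N on the rest of this slide and the next.)

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1\ \mathrm{NEAR}\ \mathit{word}_2)}{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1) \cdot \frac{1}{N}\,\mathrm{hits}(\mathit{word}_2)}$$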
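As a worked simplification (not on the original slide), the 1/N factors in the hit-count estimate can be collected: one in the numerator and two in the denominator leave a single factor of N.

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{N \cdot \mathrm{hits}(\mathit{word}_1\ \mathrm{NEAR}\ \mathit{word}_2)}{\mathrm{hits}(\mathit{word}_1)\,\mathrm{hits}(\mathit{word}_2)}$$

So a single PMI value still depends on N, the size of the indexed collection. That dependence disappears once two PMI values are subtracted, which is why the polarity score can be computed from hit counts alone.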
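Does phrase appear more with “poor” or “excellent”?

$$
\begin{aligned}
\mathrm{Polarity}(\mathit{phrase}) &= \mathrm{PMI}(\mathit{phrase}, \text{“excellent”}) - \mathrm{PMI}(\mathit{phrase}, \text{“poor”}) \\
&= \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{“excellent”})}{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase}) \cdot \frac{1}{N}\,\mathrm{hits}(\text{“excellent”})} - \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{“poor”})}{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase}) \cdot \frac{1}{N}\,\mathrm{hits}(\text{“poor”})} \\
&= \log_2 \left( \frac{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{“excellent”})}{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{“excellent”})} \cdot \frac{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{“poor”})}{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{“poor”})} \right) \\
&= \log_2 \left( \frac{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{“excellent”})\,\mathrm{hits}(\text{“poor”})}{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{“poor”})\,\mathrm{hits}(\text{“excellent”})} \right)
\end{aligned}
$$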
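The final form of the polarity score needs only four hit counts, since N and hits(phrase) cancel. A minimal sketch follows, with loud assumptions: the hit counts in the example are invented, the function name `polarity` is hypothetical, and the small `smoothing` constant added to the NEAR counts to avoid division by zero is an addition that does not appear in the formula on the slide.

```python
import math

def polarity(hits_near_excellent, hits_near_poor,
             hits_excellent, hits_poor, smoothing=0.01):
    """Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    Uses the simplified ratio of hit counts. `smoothing` is an assumption
    that keeps the ratio defined when a NEAR query returns zero hits.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

# Invented counts: the phrase appears near "excellent" four times as often
# as near "poor", and the two seed words are equally frequent overall.
print(polarity(80, 20, 1000, 1000, smoothing=0))   # positive polarity
print(polarity(20, 80, 1000, 1000, smoothing=0))   # negative polarity
```

The `hits_poor / hits_excellent` factor corrects for one seed word simply being more common than the other in the collection.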
Phrases from a thumbs-up review

| Phrase | POS tags | Polarity |
| --- | --- | --- |
| online service | JJ NN | 2.8 |
| online experience | JJ NN | 2.3 |
| direct deposit | JJ NN | 1.3 |
| local branch | JJ NN | 0.42 |
| … | | |
| low fees | JJ NNS | 0.33 |
| true service | JJ NN | -0.73 |
| other bank | JJ NN | -0.85 |
| inconveniently located | JJ NN | -1.5 |
| Average | | 0.32 |
Phrases from a thumbs-down review

| Phrase | POS tags | Polarity |
| --- | --- | --- |
| direct deposits | JJ NNS | 5.8 |
| online web | JJ NN | 1.9 |
| very handy | RB JJ | 1.4 |
| … | | |
| virtual monopoly | JJ NN | -2.0 |
| lesser evil | RBR JJ | -2.3 |
| other problems | JJ NNS | -2.8 |
| low funds | JJ NNS | -6.8 |
| unethical practices | JJ NNS | -8.5 |
| Average | | -1.2 |
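The last step of the algorithm, rating a review by the average polarity of its phrases, can be sketched with the values shown in the two tables. Two caveats: the tables elide some phrases (the "…" rows), so averaging only the listed rows will not reproduce the reported averages of 0.32 and -1.2; and the decision threshold of 0 and the name `rate_review` are assumptions made for this sketch.

```python
from statistics import mean

# Only the rows visible in the tables; elided rows are not included.
thumbs_up_phrases = {
    "online service": 2.8, "online experience": 2.3, "direct deposit": 1.3,
    "local branch": 0.42, "low fees": 0.33, "true service": -0.73,
    "other bank": -0.85, "inconveniently located": -1.5,
}
thumbs_down_phrases = {
    "direct deposits": 5.8, "online web": 1.9, "very handy": 1.4,
    "virtual monopoly": -2.0, "lesser evil": -2.3, "other problems": -2.8,
    "low funds": -6.8, "unethical practices": -8.5,
}

def rate_review(phrase_polarities, threshold=0.0):
    """Average the phrase polarities and label the review by the sign."""
    average = mean(list(phrase_polarities.values()))
    label = "thumbs up" if average > threshold else "thumbs down"
    return average, label

for name, phrases in [("first review", thumbs_up_phrases),
                      ("second review", thumbs_down_phrases)]:
    average, label = rate_review(phrases)
    print(f"{name}: average polarity {average:.2f} -> {label}")
```

Individual phrases can point the wrong way (for example "direct deposits" at 5.8 in the negative review); averaging over all phrases is what makes the review-level decision more stable.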
Results of Turney algorithm
• 410 reviews from Epinions
– 170 (41%) negative
– 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information
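For reference, the majority class baseline is the accuracy obtained by always predicting the more frequent label. A short arithmetic sketch using the counts above (the variable names are illustrative):

```python
negative_reviews = 170
positive_reviews = 240
total_reviews = negative_reviews + positive_reviews   # 410

# Always guessing the majority class ("positive") is right 240 times out of 410.
majority_baseline = max(negative_reviews, positive_reviews) / total_reviews
print(f"Majority class baseline: {majority_baseline:.0%}")

turney_accuracy = 0.74   # accuracy reported on the slide
gain = turney_accuracy - majority_baseline
print(f"Gain over baseline: {gain * 100:.1f} percentage points")
```

Comparing against this baseline matters because the data set is not balanced: a 74% accuracy is an improvement of roughly 15 points over a classifier that never looks at the text.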