Tweeting Beyond Facts: The Need for a Linguistic Perspective
Sabine Bergler, CLaC Labs, Sofia 2015

Upload: datasciencesociety

Post on 12-Feb-2017


TRANSCRIPT

Page 1: Tweeting beyond Facts – The Need for a Linguistic Perspective

Tweeting Beyond Facts: The Need for a Linguistic Perspective

Sabine Bergler, CLaC Labs

Sofia 2015

Page 2: Tweeting beyond Facts – The Need for a Linguistic Perspective

CLaC Labs Core Idea

Linguistics (like mathematics) is
• general
• consistent (across domains, corpora, and tasks)
• modular (= compositional)

Domain knowledge is
• specific
• only sometimes compositional
• reasonably well supported for some domains (NLM suite of tools for BioNLP)

Page 3: Tweeting beyond Facts – The Need for a Linguistic Perspective

CLaC Modules and Architecture

• discourse structure
• embedding graph (typed)
• coreference
• semantic annotations
• parse tree, dependencies
• domain ontology
• lexical semantics

Page 4: Tweeting beyond Facts – The Need for a Linguistic Perspective

Archaeological Approach

Theory
• shallow
• slow and careful (small goals)
• attention to context
• analyzed extensively
• iterative

Practice
• linguistically inspired
• modular
• vetted in shared tasks
• extensive ablation studies
• reuse in different pipelines for additional evaluation

Page 5: Tweeting beyond Facts – The Need for a Linguistic Perspective

Pertussis Seroprevalence in Korean Adolescents and Adults Using Anti-Pertussis Toxin Immunoglobulin G
J Korean Med Sci. 2014 May;29(5)

This finding indicates that natural pertussis infection is endemic in older adults and that Tdap booster vaccination rates at 11-12 yr of age may be insufficient. Reports from Israel and the Netherlands have already indicated that the highest pertussis seroprevalence was in older adults (13,18). Because protective immunity against pertussis may last for 4-12 yr after a primary DTaP vaccination series (19,20), natural pertussis infection could occur in older adults even after previous vaccinations.

Legend: report, negation, modal, temporal ordering

Page 6: Tweeting beyond Facts – The Need for a Linguistic Perspective

Existence and Facts

The mean anti-PT IgG titer and pertussis seroprevalence were 35.53 ± 62.91 EU/mL and 41.4%, respectively.

The mean anti-PT IgG titers and seroprevalence were not significantly different between the age groups.

However, the seroprevalence in individuals 51 yr of age or older was significantly higher than in individuals younger than 51 yr (46.5% vs 39.1%, P = 0.017).

Legend: negation, comparison, contrast, irrealis

Page 7: Tweeting beyond Facts – The Need for a Linguistic Perspective

Negation: explicit and implicit

• trigger (lists of different lengths available, domain-specific lists possible)
• linguistic scope (derived from parser information)

We observed no genetic alterations in the IRF-4 promoter, which can account for the lack of IRF-4 expression.

entailment: no alterations? no alterations in the IRF-4 promoter?

Page 8: Tweeting beyond Facts – The Need for a Linguistic Perspective

Stanford Parse Tree

(S
  (NP (PRP We))
  (VP (VBD observed)
    (NP
      (NP (DT no) (JJ genetic) (NNS alterations))
      (PP (IN in)
        (NP
          (NP (DT the) (NN IRF-4) (NN promoter))
          (SBAR
            (WHNP (WDT which))
            (S
              (VP (MD can)
                (VP (VB account)
                  (PP (IN for)
                    (NP
                      (NP (DT the) (NN lack))
                      (PP (IN of)
                        (NP (NN IRF-4) (NN expression)))))))))))))))

Page 9: Tweeting beyond Facts – The Need for a Linguistic Perspective

Collapsed Typed Dependencies

nsubj(observed-2, We-1)
root(ROOT-0, observed-2)
neg(alterations-5, no-3)
amod(alterations-5, genetic-4)
dobj(observed-2, alterations-5)
det(promoter-9, the-7)
nn(promoter-9, IRF-4-8)
prep_in(alterations-5, promoter-9)
nsubj(account-13, promoter-9)
aux(account-13, can-12)
rcmod(promoter-9, account-13)
det(lack-16, the-15)
prep_for(account-13, lack-16)
nn(expression-19, IRF-4-18)
prep_of(lack-16, expression-19)
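The entailment question from the negation slide (does the negation cover only "alterations", or also the relative clause?) can be explored directly on the dependencies above. The following is a minimal sketch, not NEGATOR's actual algorithm: it assumes that the scope of a `neg` trigger is the dependency subtree of the negated head, and that `rcmod` edges may optionally act as scope barriers. The function names and the barrier parameter are hypothetical.

```python
import re
from collections import defaultdict

# Collapsed typed dependencies copied from the slide.
DEPS = """
nsubj(observed-2, We-1)
root(ROOT-0, observed-2)
neg(alterations-5, no-3)
amod(alterations-5, genetic-4)
dobj(observed-2, alterations-5)
det(promoter-9, the-7)
nn(promoter-9, IRF-4-8)
prep_in(alterations-5, promoter-9)
nsubj(account-13, promoter-9)
aux(account-13, can-12)
rcmod(promoter-9, account-13)
det(lack-16, the-15)
prep_for(account-13, lack-16)
nn(expression-19, IRF-4-18)
prep_of(lack-16, expression-19)
"""

DEP_RE = re.compile(r"(\w+)\((.+)-(\d+), (.+)-(\d+)\)")


def parse_deps(text):
    """Return a list of (relation, governor, dependent); nodes are (index, word)."""
    edges = []
    for line in text.strip().splitlines():
        rel, gov, gi, dep, di = DEP_RE.fullmatch(line.strip()).groups()
        edges.append((rel, (int(gi), gov), (int(di), dep)))
    return edges


def negation_scope(edges, barriers=("rcmod",)):
    """Assumed heuristic: scope = subtree of the negated head, cut at barrier edges."""
    children = defaultdict(list)
    for rel, gov, dep in edges:
        children[gov].append((rel, dep))
    scopes = {}
    for rel, head, trigger in edges:
        if rel != "neg":
            continue
        seen, stack = set(), [head]
        while stack:
            node = stack.pop()
            if node in seen:  # collapsed dependencies can contain cycles
                continue
            seen.add(node)
            stack.extend(c for r, c in children[node] if r not in barriers)
        seen.discard(trigger)  # the trigger itself is not part of its scope
        scopes[trigger] = [word for _, word in sorted(seen)]
    return scopes


edges = parse_deps(DEPS)
narrow = negation_scope(edges)             # relative clause excluded
wide = negation_scope(edges, barriers=())  # relative clause included
print(narrow)
print(wide)
```

With the barrier, the scope stops at "the IRF-4 promoter"; without it, the scope also swallows "can account ... lack ... expression", which is the reading the entailment question warns about. Prepositions are missing from both scopes because collapsed dependencies fold them into the relation names.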

Page 10: Tweeting beyond Facts – The Need for a Linguistic Perspective

NEGATOR, developed by Sabine Rosenberg

1. trigger detection
2. linguistic scope determination
3. focus of negation detection
4. negation and modality interaction

Leader in two Shared Task competitions:

*Sem 2012 pilot task on negation focus (sole participant)

CLEF 2012 QA4MRE pilot task on interaction of negation and modality (Rank 1 and 2 of 6 with over 10% advance)

Page 11: Tweeting beyond Facts – The Need for a Linguistic Perspective

ModNegator for CLEF QA4MRE

Assembled from existing modules:
• negation triggers from NEGATOR
• modality triggers from Kilicoglu
• scope from NEGATOR (auxiliary rules added)

Rank 1 with wide margin (Conan Doyle data)

                narrow   greedy
macroaverage     .64      .62
microaverage     .71      .68
accuracy         .71      .67

Page 12: Tweeting beyond Facts – The Need for a Linguistic Perspective

Error Case

Scope barrier relative clause:
Dr Gallo had initially suggested that AIDS was caused by HTLV-I, a virus that no one disputes he discovered.

ModalTrigger: suggested
Modal Scope: Dr Gallo had initially suggested that AIDS was caused by HTLV-I, a virus that no one disputes he discovered.

NegTrigger: no
Negation Scope: Dr Gallo had initially suggested that AIDS was caused by HTLV-I, a virus that no one disputes he discovered.

NEGATOR: disputes : LABEL = NEGMOD
Gold Standard: disputes : LABEL = NEG
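The error above can be reproduced with a small sketch of how scope membership might turn into labels. The scope spans are hand-picked assumptions for illustration (the slide's scope highlighting did not survive extraction), and the helper name and the `MOD` and `NONE` labels are hypothetical; only `NEG` and `NEGMOD` appear on the slide.

```python
# Tokenized error-case sentence (punctuation split off by hand).
TOKENS = ("Dr Gallo had initially suggested that AIDS was caused by HTLV-I , "
          "a virus that no one disputes he discovered .").split()

modal_trigger = TOKENS.index("suggested")
neg_trigger = TOKENS.index("no")
rel_clause_start = TOKENS.index("virus") + 1  # the relative pronoun "that"
sentence_end = TOKENS.index(".")

# Assumed spans: each scope runs from just after its trigger to the sentence end,
# unless the relative clause is treated as a barrier for the modal scope.
neg_scope = set(range(neg_trigger + 1, sentence_end))
modal_scope_no_barrier = set(range(modal_trigger + 1, sentence_end))
modal_scope_with_barrier = set(range(modal_trigger + 1, rel_clause_start))


def label(index, neg_scope, modal_scope):
    """Combine scope membership into a single label for one token."""
    in_neg, in_mod = index in neg_scope, index in modal_scope
    if in_neg and in_mod:
        return "NEGMOD"
    if in_neg:
        return "NEG"
    if in_mod:
        return "MOD"
    return "NONE"


disputes = TOKENS.index("disputes")
print(label(disputes, neg_scope, modal_scope_no_barrier))    # system-style output
print(label(disputes, neg_scope, modal_scope_with_barrier))  # gold-style output
```

Under these assumptions, letting the modal scope of "suggested" run through the relative clause yields NEGMOD for "disputes", while stopping it at the relative clause yields the gold label NEG.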

Page 13: Tweeting beyond Facts – The Need for a Linguistic Perspective

Speculative Language (aka Hedging)

Also we could not find any RAG-like sequences in the recently sequenced sea urchin lancelet hydra.

Caspases can also be activated with the aid of Apaf-1, which in turn appears to be regulated by cytochrome c and dATP.

Phenotypic differences are suggestive of distinct functions for some of these genes in regulating dendrite arborization.

Page 14: Tweeting beyond Facts – The Need for a Linguistic Perspective

Speculative Language Detection
Halil Kilicoglu

• BioNLP 08, BioNLP 09, CoNLL 2011
• same system adapted for subsequent tasks
• based on triggers and parser dependencies
• also incorporates negation, modality, etc.

Page 15: Tweeting beyond Facts – The Need for a Linguistic Perspective

Embedding Predications
Halil Kilicoglu, 2012

Unified account of semantic phenomena beyond categorical assertions

• core notion: semantic embedding
• categorization: comprehensive, domain-independent, consolidated
• embedding graph: compositional semantic interpretation
• genre-independent: news, molecular biology, shared tasks

Page 16: Tweeting beyond Facts – The Need for a Linguistic Perspective

Kilicoglu Processing Pipeline

Page 17: Tweeting beyond Facts – The Need for a Linguistic Perspective

Syntactic Dependency Graph 1

Page 18: Tweeting beyond Facts – The Need for a Linguistic Perspective

Dependency Graph 2

Page 19: Tweeting beyond Facts – The Need for a Linguistic Perspective

Typed Combined Embedding Graph

Page 20: Tweeting beyond Facts – The Need for a Linguistic Perspective

Sentiment Towards Vaccination

The incidence ☹ of pertussis ☹ decreased ☺ with the introduction of the diphtheria-tetanus-whole cell pertussis (DTwP) vaccination ☺ in children around the world (1), and a decrease ☺ in pertussis ☹ was also observed in Korea where the DTwP vaccination ☺ has been universally recommended ☺ for infants and children since 1954 (2).

However, pertussis ☹ began to rise ☹ in the 1990s in Europe and North America, especially in adolescents (1,3,4,5), and it has been also observed since the 2000s in Korea (2).

Summary: §1:☺ §2: ☹

Page 21: Tweeting beyond Facts – The Need for a Linguistic Perspective

Sentiment Inferences

The incidence ☹ of pertussis decreased with the introduction of the diphtheria-tetanus-whole cell pertussis (DTwP) vaccination.

Baseline: count sentiment words, use majority vote: ☹

Lexical semantics + syntactic inferences:
• NP: (The incidence of pertussis ☹) ☹ → pertussis ☹
• Valence shifter verb: decreased(NP) = decreased(☹) = ☺
• Bonus inference: decrease(DTwP, pertussis ☹) ☺ → DTwP ☺
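The contrast between the baseline and the compositional inferences can be sketched as follows. This is an illustration under stated assumptions, not the CLaC implementation: the toy lexicon `PRIOR`, the shifter list `SHIFTERS`, and all function names are hypothetical, and the lexicon covers only the words needed for this one sentence.

```python
# Toy prior polarities (assumed): -1 = negative, +1 = positive.
PRIOR = {"incidence": -1, "pertussis": -1}
# Verbs assumed to flip the polarity of their argument (valence shifters).
SHIFTERS = {"decrease", "decreased"}


def sign(x):
    return (x > 0) - (x < 0)


def majority_vote(tokens):
    """Baseline: sum prior polarities of all sentiment words, ignore structure."""
    return sign(sum(PRIOR.get(t.lower(), 0) for t in tokens))


def clause_polarity(verb, np_tokens):
    """Compositional reading: a valence shifter verb flips its NP's polarity."""
    np_pol = majority_vote(np_tokens)
    return -np_pol if verb.lower() in SHIFTERS else np_pol


def agent_polarity(verb, agent, theme_tokens):
    """Bonus inference: whatever decreases something negative is itself positive."""
    if verb.lower() not in SHIFTERS:
        return {agent: 0}
    return {agent: clause_polarity(verb, theme_tokens)}


sentence = ("The incidence of pertussis decreased with the introduction "
            "of the DTwP vaccination").split()
np = ["The", "incidence", "of", "pertussis"]

print(majority_vote(sentence))                   # baseline verdict
print(clause_polarity("decreased", np))          # after valence shifting
print(agent_polarity("decrease", "DTwP", np))    # inferred stance toward DTwP
```

Under this toy lexicon the baseline comes out negative, while the compositional reading is positive and also assigns a positive polarity to the vaccine.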

Page 22: Tweeting beyond Facts – The Need for a Linguistic Perspective

Sentiment Analysis for Tweets
Canberk Özdemir

SemEval 2015 Task 10B: rank 9 (of 40)

introduces new large semantic lexicon: Gezi

combines 5 sentiment lexica (aFinn smallest, Gezi largest)

uses linguistic scope for negation and modality (NEGATOR)

benefits from 5 point sentiment scale (strong pos, pos, neg, strong neg)

Page 23: Tweeting beyond Facts – The Need for a Linguistic Perspective

Tweets with Figurative Language Canberk Özdemir

SemEval 2015 Task 11: rank 1 (of 35) with wide margin

same system as for Task 10B

no special tailoring for figurative language apart from using training data for decision tree

linguistic notions at the moment equivalent to training for figurative language

Page 24: Tweeting beyond Facts – The Need for a Linguistic Perspective

Negation and Modality in ClacSentipipe

• negation triggers from Rosenberg
• modality triggers: modal auxiliaries
• scope from NEGATOR

He is hurt. -2
Negation flips and dampens (*-.5): He is not hurt. +1
Modality dampens (*.5): He may be hurt. -1

features: negated-negative, modalized-negative, …
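The arithmetic on this slide can be written out as a short sketch. The multipliers -0.5 and 0.5 come from the slide; the function names, the behaviour when a word is both negated and modalized, and the feature names beyond "negated-negative" and "modalized-negative" are assumptions.

```python
def adjust(prior, negated=False, modalized=False):
    """Apply the slide's multipliers to a word's prior sentiment score."""
    score = float(prior)
    if negated:
        score *= -0.5  # negation flips and dampens
    if modalized:
        score *= 0.5   # modality dampens
    return score


def feature_name(prior, negated=False, modalized=False):
    """Build a feature label such as 'negated-negative' (naming scheme assumed)."""
    polarity = "positive" if prior > 0 else "negative" if prior < 0 else "neutral"
    prefix = "negated-" if negated else "modalized-" if modalized else ""
    return prefix + polarity


HURT = -2  # prior score of "hurt" as given on the slide

print(adjust(HURT))                        # He is hurt.
print(adjust(HURT, negated=True))          # He is not hurt.
print(adjust(HURT, modalized=True))        # He may be hurt.
print(feature_name(HURT, negated=True))
print(feature_name(HURT, modalized=True))
```

The three calls reproduce the slide's scores of -2, +1, and -1 for the example sentences.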

Page 25: Tweeting beyond Facts – The Need for a Linguistic Perspective

Sample Tweet Gold Annotations

Need car financing? Toyota of Hollywood has you covered! http://t.co/rMFV0qYNOK

Kobe Bryant is better than the 40th best player. I would say about 25th

@TV_Exposed: Every episode of Friends is coming to Netflix on January 1st http://t.co/OiVJzaTOh9 damn i want netflix heere tooo

Equalizer tomorrow, Alexander and the Terrible Horrible No Good Very Bad Day & Fury Sunday. #lastfreemovieweekend

Page 26: Tweeting beyond Facts – The Need for a Linguistic Perspective

Current Work at CLaC Labs

Extend the trigger scope approach for

✓ negation
• modality
• sentiment annotation
• modification (human monocytes)
• emotion annotation
• causal chain extraction
• vaccine avoidance argument detection in blogs

Page 27: Tweeting beyond Facts – The Need for a Linguistic Perspective

Explicit Negation

Page 28: Tweeting beyond Facts – The Need for a Linguistic Perspective

Noun Phrases

Page 29: Tweeting beyond Facts – The Need for a Linguistic Perspective

Sundries

Page 30: Tweeting beyond Facts – The Need for a Linguistic Perspective

Underappreciated Items

• numbers (IV, twice, 100,00, 100.00, 100,000)
• amounts (57%, 16Gb, 12ml, pH7, 7mph)
• locations
• person
• tense and aspect (this type of research has not been done/was not done/is not done/is not being done)
• modifier semantics (prenominal modifiers: long-term prospective studies, adverbials: virtually no risk)

Page 31: Tweeting beyond Facts – The Need for a Linguistic Perspective

Junk Language?

• there is much information in ignored language
• linguistic treatments are universal and can be adapted to domain-specific usage
• a suite of general, language-oriented modules should be considered a form of preprocessing of the data, followed by domain-specific treatments
• this can significantly improve the downstream specialized processing

Page 32: Tweeting beyond Facts – The Need for a Linguistic Perspective

Conclusion

• linguistic principles form a solid baseline for modular, adaptable NLP modules
• trigger-linguistic scope approach to speculative language, negation, and modality proved effective
• parsing feasible, even for tweets, with preprocessing
• extra-propositional parts of text prove effective in task-oriented evaluation

Page 33: Tweeting beyond Facts – The Need for a Linguistic Perspective

Headnoun, Base NP, MaxNP, PP

<MaxNP> <BaseNP> a 1993 <headnoun> survey </headnoun> </BaseNP> <PP> of pediatricians and family practitioners </PP> </MaxNP>

overly simplistic heuristic:
in <MaxNP> <BaseNP> the news in California </BaseNP> </MaxNP>

ellipsis, coordination, …
<MaxNP> <BaseNP> the health </BaseNP> of <MaxNP> <BaseNP> vaccinated vs unvaccinated children </BaseNP> </MaxNP> </MaxNP>

Page 34: Tweeting beyond Facts – The Need for a Linguistic Perspective

Causal Triggers

Page 35: Tweeting beyond Facts – The Need for a Linguistic Perspective

Causality
Michelle Khalife

• is pervasive in language
• conveys important information
• trigger lists exist for biomedical texts
• triggers require predicate argument structure