Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick-off eHumanities Projects 2014
DESCRIPTION
This presentation was given at the NL eScience Center during the "De Geest Uit De Fles" event for the kick-off of the eHumanities projects in 2014: http://esciencecenter.nl/agenda/703-26-may-de-geest-uit-de-fles/
TRANSCRIPT
Crowds & Niches Teaching Machines to Diagnose
Lora Aroyo
• Open Domain Question-Answering Machine, that given
  – Rich Natural Language Questions
  – Over a Broad Domain of Knowledge
• Won a 2-game Jeopardy match against the all-time winners
  – viewed by over 50,000,000
Watson MD
• Adapt Watson to Medical QA
• Mainly an NLP task
• Cognitive computing systems need human-annotated data for training, testing, evaluation
The human annotation task is one of semantic interpretation
Now answering medical questions!
NLP Tasks
Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis it presents a risk of nephrogenic systemic fibrosis.
• Mention detection: find the spans (begin, end) of relevant medical terms (factors) in a passage.
• Factor Typing (NER): find the type of each mention.
Type labels shown on the slide: substance, disorder, disorder, disorder, treatment.
• Factor (Entity) Identification: find the corresponding ids for a mentioned factor in a knowledge base.
Knowledge-base ids shown on the slide: C0016911, C1408325, C0035078, C1619692, C0019004.
• Relation detection: find the relations between factors that are expressed in a passage.
Relation labels shown on the slide: cause, treats, treats, contra-indicates.
• Coreference: find the mentions in a sentence that refer to the same factor.
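As a rough illustration of the first two tasks above (mention detection and factor typing), the sketch below looks up terms from a small lexicon in the example passage and returns typed (begin, end) spans. The `LEXICON` dictionary, the `Mention` tuple and the `detect_mentions` function are hypothetical names introduced here, and the pairing of each term with a type is an assumption made for illustration; a real system would use a trained model rather than exact string matching.

```python
from typing import NamedTuple

PASSAGE = (
    "Gadolinium agents are useful for patients with renal impairment, "
    "but in patients with severe renal failure requiring dialysis it presents "
    "a risk of nephrogenic systemic fibrosis."
)

# Hypothetical lexicon: the term-to-type pairing is assumed for illustration.
LEXICON = {
    "Gadolinium agents": "substance",
    "renal impairment": "disorder",
    "severe renal failure": "disorder",
    "dialysis": "treatment",
    "nephrogenic systemic fibrosis": "disorder",
}


class Mention(NamedTuple):
    begin: int        # character offset where the mention starts
    end: int          # character offset just past the mention
    text: str
    factor_type: str


def detect_mentions(passage, lexicon):
    """Return typed spans for the first occurrence of each lexicon term."""
    mentions = []
    for term, factor_type in lexicon.items():
        begin = passage.find(term)
        if begin != -1:
            mentions.append(Mention(begin, begin + len(term), term, factor_type))
    return sorted(mentions)  # tuples sort by begin offset first


mentions = detect_mentions(PASSAGE, LEXICON)
for m in mentions:
    print(m.begin, m.end, m.text, m.factor_type)
```

Exact matching is chosen only to keep the example short; it cannot handle spelling variants or overlapping terms.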
Gold Standard Assumption
• Cognitive systems need to be told what is right & what is wrong
• A gold standard or ground truth
• Performance is measured on test sets vetted by human experts → never perfect, always improving against test data
• Historically, gold standards are created assuming that for each annotated instance there is a single right answer
• Gold standard quality is measured in inter-annotator agreement → does not account for perspectives, for reasonable alternative interpretations
but people don’t always agree…
Disagreement
Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis there is a risk of nephrogenic systemic fibrosis.
Relation labels shown on two successive slides: cause, then side-effect.
The human annotation task is one of semantic interpretation
Position
maybe this disagreement is a signal and not noise?
can we harness it?
Key Question
How do we represent & measure disagreement in a way that it can be harnessed?
Crowd Truth
Annotator disagreement is signal, not noise.
It is indicative of the variation in human semantic interpretation of signs, and can indicate ambiguity, vagueness, over-generality, etc.
http://www.freefoto.com/preview/01-47-44/Flock-of-Birds
Position
symbiosis between humans & machines
machines learn from humans & machines help humans
Crowd Truth Framework
Human-Machine Workflows
Relation Extraction
Crowdsourcing Ground Truth Data: CrowdTruth
• Relations overlap in meaning
• Sentences are vague and ambiguous
• Experts have different interpretations
Representation: Worker Vector
Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis there is a risk of nephrogenic systemic fibrosis.
(Figure: one worker's binary vector over the candidate relations, with three relations marked: 1 1 1)
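Representation: Sentence Vector
(Figure: the workers' binary vectors stacked in rows, 15 marked relations in total)
Sentence vector (sum of the worker vectors): 0 1 1 0 0 4 3 0 0 5 1 0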
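The sentence vector shown above can be read as the element-wise sum of the binary worker vectors. The sketch below reproduces that sum in Python. The individual worker selections in `WORKER_SELECTIONS` are hypothetical: the slide residue only preserves how many relations each worker marked, not which ones, so the positions were chosen here so that the totals match the slide's vector. The function names are also introduced only for this example.

```python
NUM_RELATIONS = 12

# Hypothetical selections (0-based relation indices), one list per worker.
# Only the column totals are taken from the slide; the per-worker choices are assumed.
WORKER_SELECTIONS = [
    [1, 5, 9],
    [5, 9],
    [9],
    [5, 6],
    [6, 9],
    [2, 5],
    [6],
    [9],
    [10],
]


def worker_vector(selected, size):
    """Binary vector: 1 where the worker picked the relation, 0 elsewhere."""
    return [1 if i in selected else 0 for i in range(size)]


def sentence_vector(worker_vectors):
    """Element-wise sum of all worker vectors for one sentence."""
    return [sum(column) for column in zip(*worker_vectors)]


worker_vectors = [worker_vector(s, NUM_RELATIONS) for s in WORKER_SELECTIONS]
sent_vec = sentence_vector(worker_vectors)
print(sent_vec)  # [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
```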
Disagreement for Sentence Clarity
Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid.
Is CHEST related to PALPATION?
Relation labels shown on the slide: diagnose, location, associated with, is_a, part_of, other
Sentence vector: 0 0 0 2 3 0 0 0 1 0 0 4 4 1
Unclear relationship between the two arguments, reflected in the disagreement
Disagreement for Sentence Clarity
Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS.
Is HYPERAEMIA related to CONJUNCTIVITIS?
Relation labels shown on the slide: symptom, cause
Sentence vector: 0 0 0 1 0 0 0 0 13 0 0 0 0 0
Clearly expressed relation between the two arguments, reflected in the agreement
Sentence-Relation Score
Measures how clearly a sentence expresses a relation
Sentence vector: 0 1 1 0 0 4 3 0 0 5 1 0
Unit vector for relation R6
Cosine (sentence vector, unit vector for R6) = .55
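The score above can be checked with a few lines of Python. This is a minimal sketch, assuming that relation R6 corresponds to the sixth position of the sentence vector (index 5 when counting from zero); that reading is consistent with the reported value of .55. The helper names `cosine` and `relation_unit_vector` are introduced here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relation_unit_vector(index, size):
    """Vector with a single 1 at the relation's position."""
    return [1 if i == index else 0 for i in range(size)]


SENTENCE_VECTOR = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]

# Assumption: R6 is the sixth entry of the vector, i.e. index 5.
score_r6 = cosine(SENTENCE_VECTOR, relation_unit_vector(5, len(SENTENCE_VECTOR)))
print(round(score_r6, 2))  # 0.55
```

Because the second vector has a single non-zero entry, the score reduces to that relation's count divided by the length of the sentence vector, here 4 / sqrt(53).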
Worker Disagreement
Measured per worker
Worker-sentence disagreement: AVG (Cosine) between the worker's sentence vector and the sentence vector
Sentence vector: 0 1 1 0 0 4 3 0 0 5 1 0
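A sketch of the per-worker measure, under stated assumptions: for each sentence a worker annotated, the worker's vector is compared by cosine with the sentence vector, and the results are averaged. The slide does not say whether the worker's own votes are removed from the sentence vector first, nor whether the final value is reported as the average cosine or as one minus it; the code below subtracts the worker's votes and returns the plain average, and both choices are assumptions. The worker selections are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def avg_worker_cosine(annotations):
    """annotations: (worker_vector, sentence_vector) pairs for ONE worker."""
    scores = []
    for worker_vec, sent_vec in annotations:
        # Assumption: compare against the other workers only.
        rest = [s - w for s, w in zip(sent_vec, worker_vec)]
        scores.append(cosine(worker_vec, rest))
    return sum(scores) / len(scores) if scores else 0.0


SENTENCE_VECTOR = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]

# Hypothetical workers: A picked the two most popular relations, B picked one nobody else chose.
worker_a = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
worker_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

score_a = avg_worker_cosine([(worker_a, SENTENCE_VECTOR)])
score_b = avg_worker_cosine([(worker_b, SENTENCE_VECTOR)])
print(score_a > score_b)  # True: worker A is closer to the rest of the crowd
```

A worker who consistently scores low across many sentences would be a candidate for closer inspection, while a low score on a single sentence may simply reflect an unclear sentence.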
Crowd Truth Metrics: Relation Extraction
Three parts to understand human interpretations:
§ Sentence
  • How good is a sentence for the relation extraction task?
§ Workers
  • How well does a worker understand the sentence?
§ Relations
  • Is the meaning of the relation clear?
  • How ambiguous/confusable is it?
Human-Machine Workflows
Crowdtruth.org
Provenance of Crowdsourcing
Crowd vs. Experts
Watson MD
• Not every task is suitable for the lay crowd; some require domain expertise
• Domain experts are busy
• How to get them motivated to perform annotation tasks?
• How to make it efficient for them and effective for annotations?
Dr. Watson Experts Game
Domain Independent
• Experimenting with:
  • different domains, e.g. art, history, news
  • different formats, e.g. text, images, videos
  • different annotation tasks, e.g.
    • medical factors, relations, synonyms, negation
    • events, event types, participants, locations
    • flowers, birds
• Integrating crowds from mTurk and CrowdFlower with domain experts from Dr. Detective, Waisda? and Accurator
The Crew
• Lora Aroyo (VU)
• Chris Welty (IBM)
• Robert-Jan Sips (IBM)
• Anca Dumitrache (VU)
• Oana Inel (VU)
• Khalid Khamkham (VU)
• Tatiana Cristea (VU)
• Rens v. Honschooten (VU)
• Benjamin Timmermans (VU)
• Harriëtte Smook (VU)
• Arne Rutjes (IBM)
• Jelle van der Ploeg (IBM)
http://crowdtruth.org