building watson a brief overview of the deepqa …...2012/02/15 · © 2012 ibm corporation...
© 2012 IBM Corporation
Building Watson: A Brief Overview of DeepQA and the Jeopardy! Challenge
Eric Brown
Watson Technologies @ IBM Research
Informed Decision Making: Search vs. Expert Q&A
Search engine:
– Decision Maker has a question
– Distills it to 2-3 keywords
– Search Engine finds documents containing the keywords
– Delivers documents based on popularity
– Decision Maker reads documents, finds answers
– Finds & analyzes evidence

Expert:
– Decision Maker asks a natural-language (NL) question
– Expert understands the question
– Produces possible answers & evidence
– Analyzes evidence, computes confidence
– Delivers response, evidence & confidence
– Decision Maker considers answer & evidence
Automatic Open-Domain Question Answering
A Long-Standing Challenge in Artificial Intelligence to emulate human expertise
Given
– Rich Natural Language Questions
– Over a Broad Domain of Knowledge
Deliver
– Precise Answers: Determine what is being asked & give precise response
– Accurate Confidences: Determine likelihood answer is correct
– Consumable Justifications: Explain why the answer is right
– Fast Response Time: Precision & Confidence in <3 seconds
A Grand Challenge Opportunity
Capture the imagination
– The Next Deep Blue
Engage the scientific community
– Envision new ways for computers to impact society & science
– Drive important and measurable scientific advances
Be Relevant to IBM Customers
– Enable better, faster decision making over unstructured and structured content
– Business Intelligence, Knowledge Discovery and Management, Government, Compliance, Publishing, Legal, Healthcare, Business Integrity, Customer Relationship Management, Web Self-Service, Product Support, etc.
Real Language is Real Hard
Chess
– A finite, mathematically well-defined search space
– Limited number of moves and states
– Grounded in explicit, unambiguous mathematical rules
Human Language
– Ambiguous, contextual and implicit
– Grounded only in human cognition
– Seemingly infinite number of ways to express the same meaning
What Computers Find Easier (and Hard)
Arithmetic:
(ln(12,546,798 * π) ^ 2) / 34,567.46 = 0.00885

Structured lookup:
Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”

Owner        | Serial Number
David Jones  | 45322190-AK

Serial Number | Type   | Invoice #
45322190-AK   | LapTop | INV10895

Invoice # | Vendor | Payment
INV10895  | MyBuy  | $104.56

Exact matching:
David Jones = David Jones
Dave Jones ≠ David Jones
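The arithmetic and the table lookup on this slide can be sketched in a few lines of Python. This is only an illustration, not Watson code: the dictionary layout and the function name `payments_for` are assumptions made for the example, while the values are copied from the slide.

```python
import math

# Arithmetic from the slide: exact and immediate for a computer.
value = (math.log(12_546_798 * math.pi) ** 2) / 34_567.46

# The three tables from the slide, stored as lists of dicts (layout is an assumption).
owners = [{"owner": "David Jones", "serial": "45322190-AK"}]
products = [{"serial": "45322190-AK", "type": "LapTop", "invoice": "INV10895"}]
invoices = [{"invoice": "INV10895", "vendor": "MyBuy", "payment": "$104.56"}]


def payments_for(owner_name, product_type):
    """Join owner -> product -> invoice using exact key equality."""
    results = []
    for o in owners:
        if o["owner"] != owner_name:  # exact string match only
            continue
        for p in products:
            # Type is compared case-insensitively because the slide writes
            # both "LapTop" and "Laptop"; names are still matched exactly.
            if p["serial"] == o["serial"] and p["type"].lower() == product_type.lower():
                results.extend(i["payment"] for i in invoices if i["invoice"] == p["invoice"])
    return results


print(round(value, 5))
print(payments_for("David Jones", "Laptop"))
print(payments_for("Dave Jones", "Laptop"))
```

The last call returns an empty list: to an exact-match join, "Dave Jones" is simply a different string from "David Jones", which is the point the slide makes.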
What Computers Find Hard
Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols. But Natural Language is implicit, highly contextual, ambiguous and often imprecise.

Structured:
Person      | Birth Place
A. Einstein | ULM

Person   | Organization
J. Welch | GE

Unstructured:
Where was X born?
One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace.
X ran this?
If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.
The Jeopardy! Challenge
A palpable, compelling and notable way to drive the technology of Question Answering along Key Dimensions
– Broad/Open Domain
– Complex Language
– High Precision
– Accurate Confidence
– High Speed

$200: The juice of this bog fruit is sometimes used to treat urinary tract infections
$400: A deductive system of algebra named after its creator, a 19th Century English mathematician
$600: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus
$800: Grace Murray Hopper is credited with applying this 3-letter term to a mysterious computer problem
$1000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north
Basic Game Play
6 Categories: Technology, Classics, The Great Outdoors, Speak of the Dickens, Mind Your Manners, Before and After
5 Levels of Difficulty: $200, $400, $600, $800, $1000 in each category

– 1 of 3 Players selects a Clue
– Host reads Clue out loud, e.g. TECHNOLOGY: ALL POLICEMEN CAN THANK STEPHANIE KWOLEK FOR HER INVENTION OF THIS POLYMER FIBER, 5 TIMES TOUGHER THAN STEEL
– All Players compete to answer
– 1st to buzz in gets to answer
– If correct: earns $ value, selects Next Clue
– If wrong: loses $ value, other players buzz again (rebounds)

Two Rounds Per Game + Final Question
ONE Daily Double in First Round, TWO in 2nd Round
Some Basic Jeopardy! Clues
The type of thing being asked for is often indicated but can go from specific to very vague

Category: ENDS IN "TH"
Clue: This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938
Answer: coelacanth

Category: General Science
Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form
Answer: light (or photons)

Category: Lincoln Blogs
Clue: Secy. Chase just submitted this to me for the third time--guess what, pal. This time I'm accepting it
Answer: his resignation
Broad Domain
[Figure: bar chart of answer-type frequency, y-axis from 0.00% to 3.00%. Answer types on the x-axis: he, film, group, capital, woman, song, singer, show, composer, title, fruit, planet, there, person, language, holiday, color, place, son, tree, line, product, birds, animals, site, lady, province, dog, substance, insect, way, founder, senator, form, disease, someone, maker, father, words, object, writer, novelist, heroine, dish, post, month, vegetable, sign, countries, hat, bay]
In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurs <3% of the time.
The distribution has a very long tail.
And for each of these types, 1000’s of different things may be asked.
Even going for the head of the tail will barely make a dent.
*13% are non-distinct (e.g., it, this, these or NA)

Our focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text.
We do NOT attempt to anticipate all questions and build databases.
We do NOT try to build a formal model of the world.
Automatic Learning for “Reading”
Volumes of Text -> Syntactic Frames -> Semantic Frames

Learned frames (with confidence):
– Inventors patent inventions (0.8)
– Officials submit resignations (0.7)
– People earn degrees at schools (0.9)
– Fluid is a liquid (0.6)
– Liquid is a fluid (0.5)
– Vessels sink (0.7)
– People sink 8-balls (0.5) (in pool / 0.8)
Evaluating Possibilities and Their Evidence
Clue: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus.

Many candidate answers (CAs) are generated from many different searches:
Organelle, Vacuole, Cytoplasm, Plasma, Mitochondria, Blood, …

Each possibility is evaluated according to different dimensions of evidence.
Just one piece of evidence is whether the CA is of the right type, in this case a “liquid”:
Is(“Cytoplasm”, “liquid”) = 0.2
Is(“organelle”, “liquid”) = 0.1
Is(“vacuole”, “liquid”) = 0.2
Is(“plasma”, “liquid”) = 0.7

Passage: “Cytoplasm is a fluid surrounding the nucleus…”
WordNet: Is_a(Fluid, Liquid)?
Learned: Is_a(Fluid, Liquid) yes.
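A minimal sketch of type evidence, assuming the four Is(…, “liquid”) scores from the slide are kept in a plain dictionary. The function names and the set of learned relations are illustrative assumptions, not DeepQA interfaces.

```python
# Type scores copied from the slide: Is(candidate, "liquid").
TYPE_SCORES = {"Cytoplasm": 0.2, "organelle": 0.1, "vacuole": 0.2, "plasma": 0.7}

# Relation the slide marks as learned: Is_a(Fluid, Liquid) yes.
LEARNED_IS_A = {("fluid", "liquid")}


def rank_by_type(scores):
    """Order candidates by type score alone, best first."""
    return sorted(scores, key=scores.get, reverse=True)


def passage_supports_type(stated_type, wanted_type):
    """True if a passage's stated type equals the wanted type or is a learned subtype of it."""
    stated, wanted = stated_type.lower(), wanted_type.lower()
    return stated == wanted or (stated, wanted) in LEARNED_IS_A


ranking = rank_by_type(TYPE_SCORES)
print(ranking[0])                                # best candidate on type evidence alone
print(passage_supports_type("fluid", "liquid"))  # "Cytoplasm is a fluid ..." counts as support
```

On type evidence alone the top candidate is "plasma", which is why the slide treats type as just one dimension: the passage calling cytoplasm a fluid, together with the learned fluid/liquid relation, is separate evidence that has to be weighed against it.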
Different Types of Evidence: Keyword Evidence
Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.

Keyword Matching
Question terms: In May 1898, Portugal, celebrated, 400th anniversary, arrival in, India, explorer
Passage terms: In May, Gary, arrived in, India, celebrated, anniversary, in Portugal

Evidence suggests “Gary” is the answer BUT the system must learn that keyword matching may be weak relative to other types of evidence
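The weakness of keyword evidence can be made concrete with a small sketch. The tokenizer, the stopword list and the overlap measure below are assumptions chosen for illustration; the slide does not say how Watson scores keyword matches.

```python
import re

# Small, hypothetical stopword list for this example only.
STOPWORDS = {"in", "the", "of", "this", "he", "his", "after", "on", "a", "s"}


def keywords(text):
    """Lowercase word tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def keyword_overlap(question, passage):
    """Fraction of question keywords that also appear in the passage."""
    q = keywords(question)
    return len(q & keywords(passage)) / len(q)


question = "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India."
gary = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
gama = "On 27th May 1498, Vasco da Gama landed in Kappad Beach"

print(keyword_overlap(question, gary))
print(keyword_overlap(question, gama))
```

Under this measure the passage about Gary shares far more keywords with the question than the passage about Vasco da Gama, even though only the latter contains the right answer.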
Different Types of Evidence: Deeper Evidence
Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: On 27th May 1498, Vasco da Gama landed in Kappad Beach
Paraphrase: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach

Matching that needs deeper analysis:
– Temporal Reasoning (Date Math): May 1898, 400th anniversary <-> 27th May 1498
– Statistical Paraphrasing (Paraphrases): arrival in <-> landed in
– GeoSpatial Reasoning (Geo-KB): India <-> Kappad Beach
– explorer <-> Vasco da Gama

Search Far and Wide
Explore many hypotheses
Find and Judge Evidence
Many inference algorithms

Stronger evidence can be much harder to find and score.
The evidence is still not 100% certain.
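The date math behind the temporal match is simple once the anniversary is recognized. A minimal sketch, assuming the question has already been parsed into a year, a month and an anniversary count (the function names are hypothetical):

```python
from datetime import date


def implied_event(celebration_year, celebration_month, nth_anniversary):
    """Return the (year, month) of the original event implied by an anniversary."""
    return celebration_year - nth_anniversary, celebration_month


def temporally_consistent(implied, passage_date):
    """True if the passage date falls in the implied year and month."""
    year, month = implied
    return passage_date.year == year and passage_date.month == month


# "In May 1898 Portugal celebrated the 400th anniversary ..."
implied = implied_event(1898, 5, 400)
# "On 27th May 1498, Vasco da Gama landed in Kappad Beach"
landing = date(1498, 5, 27)

print(implied)
print(temporally_consistent(implied, landing))
```

The hard part in practice is the step this sketch assumes away: extracting "May 1898" and "400th anniversary" from free text and deciding that they refer to the same event as the passage.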
Missing Links
Clue: On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first.
George Mallory -> Mt Everest -> He was first -> Edmund Hillary

Category: Common Bonds
TV remote controls, Shirts, Telephones -> Buttons
What It Takes to Compete against Top Human Jeopardy! Players
Our Analysis Reveals the Winner’s Cloud
[Figure: scatter plot with an axis running from More Confident to Less Confident. Each dot is an actual historical human Jeopardy! game. Labeled regions: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System.]
Top human players are remarkably good.
Computers? Not So Good. (2007 QA Computer System, plotted against the Winner’s Cloud of Winning and Grand Champion Human Performance)
In 2007, we committed to making a Huge Leap!
DeepQA: The architecture underlying Watson
Generates many hypotheses, collects a wide range of evidence and balances the combined confidences of over 100 different analytics that analyze the evidence from different dimensions

Pipeline:
Question
-> Question & Topic Analysis
-> Question Decomposition
-> Hypothesis Generation (Primary Search, Candidate Answer Generation; draws on Answer Sources)
-> Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval, Deep Evidence Scoring; draws on Evidence Sources)
-> Synthesis
-> Final Confidence Merging & Ranking (Learned Models help combine and weigh the Evidence)
-> Answer & Confidence

(Hypothesis Generation and Hypothesis and Evidence Scoring appear more than once in the diagram, followed by ". . .")

Underlying technologies:
– Information Retrieval
– Natural Language Processing
– Knowledge Representation and Reasoning
– Machine Learning
– Parallel and Distributed Computing
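The final merging step, where learned models combine and weigh the evidence, can be sketched as a weighted sum of evidence scores passed through a logistic function. Everything concrete below is an assumption for illustration: the feature names, the weights, the bias and the feature values are made up, and the slide does not specify the model form.

```python
import math

# Hypothetical weights standing in for what a learned model would provide.
WEIGHTS = {"type_match": 1.5, "keyword_overlap": 0.5, "temporal_match": 2.0}
BIAS = -2.0


def confidence(features):
    """Combine per-dimension evidence scores into one confidence in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates):
    """Return (answer, confidence) pairs, most confident first."""
    scored = [(answer, confidence(feats)) for answer, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up evidence scores for two candidate answers.
candidates = {
    "Gary": {"type_match": 0.1, "keyword_overlap": 0.9, "temporal_match": 0.0},
    "Vasco da Gama": {"type_match": 0.9, "keyword_overlap": 0.2, "temporal_match": 1.0},
}

for answer, conf in rank(candidates):
    print(f"{answer}: {conf:.2f}")
```

With these illustrative weights, strong keyword overlap alone is not enough to put "Gary" on top, because the model gives more weight to type and temporal evidence.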
Grouping features to produce Evidence Profiles
Clue: Chile shares its longest land border with this country.
[Figure: bar chart of Positive Evidence and Negative Evidence for Argentina and Bolivia, on a scale from -0.2 to 1.]
Bolivia is more popular due to a commonly discussed border dispute. But Watson learns that Argentina has better evidence.
Evidence: Time, Space, Source, Type etc.
Clue: You’ll find Bethel College and a Seminary in this “holy” Minnesota city.
Candidate answers: Saint Paul, South Bend
There’s a Bethel College and a Seminary in both cities. The system is not weighing location evidence highly enough to give St. Paul the edge.
Evidence: Puns
Clue: You’ll find Bethel College and a Seminary in this “holy” Minnesota city.
Candidate answers: Saint Paul, South Bend
Humans may get this based on the pun, since St. Paul is a “holy” city. We are adding a Pun Scorer that will discover and score pun-like relationships.
DeepQA: Incremental Progress in Precision and Confidence, 6/2007-11/2010
[Figure: Precision (0% to 100%) plotted against % Answered (0% to 100%). Curves labeled: Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, 11/2010. Annotation: Now Playing in the Winners Cloud.]
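One way to read a Precision versus % Answered curve: sort questions by the system's confidence, answer only the most confident fraction, and measure how many of those answers are right. A minimal sketch, with made-up results used purely to exercise the function (they are not Watson measurements):

```python
def precision_at_answered(results, fraction_answered):
    """results: list of (confidence, is_correct). Answer only the most confident fraction."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction_answered))
    answered = ranked[:n]
    return sum(1 for _, ok in answered if ok) / n


# Made-up (confidence, correct?) pairs for illustration only.
results = [(0.95, True), (0.90, True), (0.80, True), (0.60, False),
           (0.55, True), (0.40, False), (0.30, False), (0.20, True),
           (0.10, False), (0.05, False)]

for fraction in (0.3, 0.5, 1.0):
    print(fraction, precision_at_answered(results, fraction))
```

When confidence is well calibrated, precision is high at low % Answered and falls as the system is forced to answer questions it is less sure about, which is the shape the progress curves track over time.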
Speed
Question -> ~100 Search Results -> 100s of Possible Answers -> 1000’s of Pieces of Evidence -> 100,000’s of scores from many simultaneous Text Analysis Algorithms -> Answer & Confidence

Pipeline stages: Question/Topic Analysis, Primary Search, Candidate Answer Generation, Evidence Retrieval, Deep Evidence Scoring, Answer Scoring, Final Merging & Ranking

Stage times as listed on the slide: 0.5s, 300s, 10s, 100s, 1ks, 200s, 0.5s
Total: ~25 min

With the middle stages replicated (Primary Search, Candidate Answer Generation, Answer Scoring, Evidence Retrieval and Deep Evidence Scoring are each shown multiple times):
Total: ~3 sec

A Few Other Issues:
– Leveraging multi-core processors
– Optimizing in-memory data structures
– Reducing framework latencies
– Fine-tuning garbage collection
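The speedup comes from scoring many candidates and pieces of evidence at the same time. The sketch below shows only the fan-out structure, using Python's standard thread pool; `score_candidate` is a hypothetical stand-in, and Watson itself ran on a UIMA-based cluster rather than anything like this.

```python
from concurrent.futures import ThreadPoolExecutor


def score_candidate(candidate):
    """Stand-in scorer (hypothetical): a real scorer would retrieve and analyze evidence."""
    return candidate, len(candidate) / 20.0


def score_all(candidates, workers=4):
    """Score every candidate independently, fanning the work out to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score_candidate, candidates))


candidates = ["Organelle", "Vacuole", "Cytoplasm", "Plasma", "Mitochondria", "Blood"]
scores = score_all(candidates)
print(scores)
```

Because each candidate is scored independently, the work divides cleanly across workers. In CPython, threads mainly help with I/O-bound scoring; CPU-bound analytics would need separate processes or machines to see a real speedup.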
Watson – a Workload Optimized System
– 90 x IBM Power 750 (1) servers
– 2880 POWER7 cores
– POWER7 3.55 GHz chip
– 500 GB per sec on-chip bandwidth
– 10 Gb Ethernet network
– 15 Terabytes of memory
– 20 Terabytes of disk, clustered
– Can operate at 80 Teraflops
– Runs IBM DeepQA software
– Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components
– Linux provides a scalable, open platform, optimized to exploit POWER7 performance
– 10 racks include servers, networking, shared disk system, cluster controllers
(1) Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010
Watson: Precision, Confidence & Speed
Deep Analytics – We achieved champion levels of Precision and Confidence over a huge variety of expression
Speed – By optimizing Watson’s computation for Jeopardy! on 2,880 POWER7 processing cores we went from 2 hours per question on a single CPU to an average of just 3 seconds, fast enough to compete with the best.
Results – In 55 real-time sparring games against former Tournament of Champions players, Watson put on a very competitive performance, winning 71%. In the final Exhibition Match against Ken Jennings and Brad Rutter, Watson won!
Potential Business Applications
Tech Support: Help-desk, Contact Centers
Healthcare / Life Sciences: Diagnostic Assistance, Evidence-Based, Collaborative Medicine
Enterprise Knowledge Management and Business Intelligence
Government: Improved Information Sharing and Security
The Core Technical Team*
Researchers and Engineers in NLP, ML, IR, KR&R and CL at IBM Labs and a growing number of universities
THANK YOU