Building Watson: A Brief Overview of the DeepQA … 2012/02/15 · © 2012 IBM Corporation …


Page 1

© 2012 IBM Corporation

Building Watson: A Brief Overview of DeepQA and the Jeopardy! Challenge

Eric Brown, Watson Technologies @ IBM Research

Page 2

Informed Decision Making: Search vs. Expert Q&A

Search:
Decision Maker has question → distills it to 2-3 keywords → Search Engine finds documents containing the keywords → delivers documents based on popularity → Decision Maker reads documents, finds answers → finds & analyzes evidence

Expert Q&A:
Decision Maker asks NL question → Expert understands question → produces possible answers & evidence → analyzes evidence, computes confidence → delivers response, evidence & confidence → Decision Maker considers answer & evidence

Page 3

Automatic Open-Domain Question Answering: A Long-Standing Challenge in Artificial Intelligence to Emulate Human Expertise

Given
– Rich natural language questions
– Over a broad domain of knowledge

Deliver
– Precise Answers: Determine what is being asked & give a precise response
– Accurate Confidences: Determine the likelihood that the answer is correct
– Consumable Justifications: Explain why the answer is right
– Fast Response Time: Precision & Confidence in <3 seconds


Page 4

A Grand Challenge Opportunity

Capture the imagination
– The next Deep Blue

Engage the scientific community
– Envision new ways for computers to impact society & science
– Drive important and measurable scientific advances

Be relevant to IBM customers
– Enable better, faster decision making over unstructured and structured content
– Business Intelligence, Knowledge Discovery and Management, Government, Compliance, Publishing, Legal, Healthcare, Business Integrity, Customer Relationship Management, Web Self-Service, Product Support, etc.

Page 5

Real Language is Real Hard

Chess
– A finite, mathematically well-defined search space
– Limited number of moves and states
– Grounded in explicit, unambiguous mathematical rules

Human Language
– Ambiguous, contextual and implicit
– Grounded only in human cognition
– Seemingly infinite number of ways to express the same meaning

Page 6

What Computers Find Easier (and Hard)

(ln(12,546,798 * π) ^ 2) / 34,567.46 = 0.00885

Owner       | Serial Number
David Jones | 45322190-AK

Serial Number | Type   | Invoice #
45322190-AK   | LapTop | INV10895

Invoice # | Vendor | Payment
INV10895  | MyBuy  | $104.56

Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”

David Jones = David Jones
Dave Jones ≠ David Jones
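The “easy” half of this slide can be reproduced directly. The sketch below is a minimal illustration that stands in plain Python dicts for the three tables (an assumption; the slide gives no schema). It evaluates the arithmetic and answers the laptop-payment query with an exact-match join; the function name `laptop_payment` is hypothetical. The last call shows the “hard” half: an exact string comparison simply treats “Dave Jones” as a different owner.

```python
import math

# Easy: exact numeric computation.
value = (math.log(12_546_798 * math.pi) ** 2) / 34_567.46
print(round(value, 5))  # 0.00885

# Easy: the slide's three tables as dicts, keyed for exact-match lookups.
owners = {"David Jones": "45322190-AK"}             # Owner -> Serial Number
products = {"45322190-AK": ("LapTop", "INV10895")}  # Serial Number -> (Type, Invoice #)
invoices = {"INV10895": ("MyBuy", 104.56)}          # Invoice # -> (Vendor, Payment)


def laptop_payment(owner):
    """Select Payment where Owner = owner and Type(Product) = "Laptop"."""
    serial = owners.get(owner)  # exact string match on the owner's name
    if serial is None:
        return None
    product_type, invoice = products[serial]
    if product_type.lower() != "laptop":  # the table spells it "LapTop"
        return None
    _vendor, payment = invoices[invoice]
    return payment


print(laptop_payment("David Jones"))  # 104.56
# Hard: a person would suspect "Dave Jones" is the same owner,
# but an exact comparison finds no match at all.
print(laptop_payment("Dave Jones"))   # None
```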

Page 7

What Computers Find Hard

Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols… but natural language is implicit, highly contextual, ambiguous and often imprecise.

Where was X born?
– Structured: Person | Birth Place: A. Einstein | ULM
– Unstructured: “One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace.”

X ran this?
– Structured: Person | Organization: J. Welch | GE
– Unstructured: “If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.”
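A minimal sketch of the contrast, assuming the structured facts are held in Python dicts and using a deliberately literal regular expression: the structured lookup returns the answer at once, while a surface pattern for “was born in” finds nothing in the Ulm passage, because the birthplace is only implied. The variable names are illustrative and not part of DeepQA.

```python
import re

# Structured: the answer is a direct lookup.
birth_place = {"A. Einstein": "ULM"}
organization = {"J. Welch": "GE"}
print(birth_place["A. Einstein"])  # ULM

# Unstructured: the same fact is only implied by the text.
passage = ("One day, from among his city views of Ulm, Otto chose a water color "
           "to send to Albert Einstein as a remembrance of Einstein's birthplace.")

# A literal pattern for "<X> was born in <Y>" finds nothing, although a
# reader infers that Einstein's birthplace is Ulm.
match = re.search(r"(\w+) was born in (\w+)", passage)
print(match)  # None
```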

Page 8

The Jeopardy! Challenge

A palpable, compelling and notable way to drive the technology of question answering along key dimensions: Broad/Open Domain, Complex Language, High Precision, Accurate Confidence, High Speed

$200: The juice of this bog fruit is sometimes used to treat urinary tract infections
$400: A deductive system of algebra named after its creator, a 19th Century English mathematician
$600: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus
$800: Grace Murray Hopper is credited with applying this 3-letter term to a mysterious computer problem
$1000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north

Page 9

Basic Game Play

Categories: Technology, Classics, The Great Outdoors, Speak of the Dickens, Mind Your Manners, Before and After
6 categories × 5 levels of difficulty ($200, $400, $600, $800, $1000)

1 of 3 players selects a clue.
The host reads the clue out loud, e.g. TECHNOLOGY: “ALL POLICEMEN CAN THANK STEPHANIE KWOLEK FOR HER INVENTION OF THIS POLYMER FIBER, 5 TIMES TOUGHER THAN STEEL”
All players compete to answer; the 1st to buzz in gets to answer.
If correct: the player earns the $ value and selects the next clue.
If wrong: the player loses the $ value, and the other players buzz again (rebounds).

Two rounds per game + Final question
ONE Daily Double in the first round, TWO in the 2nd round
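The buzz-in rules above can be sketched as a small scoring function. This is a simplified model under stated assumptions: the hypothetical `play_clue` receives the buzz order as a list, gives each player one attempt per clue, and ignores Daily Doubles and the Final question.

```python
def play_clue(value, buzz_order, is_correct, scores):
    """Resolve one regular clue and return who selects the next clue.

    value:      dollar value of the clue
    buzz_order: players in the order they buzz in (hypothetical input)
    is_correct: player -> whether that player's response is right
    scores:     player -> running total, updated in place
    Returns None if nobody who buzzed answered correctly.
    """
    tried = set()
    for player in buzz_order:
        if player in tried:          # a player gets one attempt per clue
            continue
        tried.add(player)
        if is_correct[player]:
            scores[player] += value  # correct: earn the value, pick the next clue
            return player
        scores[player] -= value      # wrong: lose the value; others may rebound
    return None


scores = {"A": 0, "B": 0, "C": 0}
chooser = play_clue(400, ["A", "B", "C"], {"A": False, "B": True, "C": True}, scores)
print(chooser, scores)  # B {'A': -400, 'B': 400, 'C': 0}
```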

Page 10

Some Basic Jeopardy! Clues

Category: ENDS IN "TH"
Clue: This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938
Answer: coelacanth

Category: General Science
Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form
Answer: light (or photons)

Category: Lincoln Blogs
Clue: Secy. Chase just submitted this to me for the third time--guess what, pal. This time I'm accepting it
Answer: his resignation

The type of thing being asked for is often indicated but can go from specific to very vague.
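One cheap signal for the type being asked for is the word after “this”. The sketch below is a naive regular-expression heuristic, not DeepQA's question analysis; `answer_type` and the small `FUNCTION_WORDS` list are assumptions for illustration. It recovers “fish” and “form” from the first two clues and returns `None` for the vague Lincoln Blogs clue.

```python
import re

# A tiny, hypothetical list of words that can follow "this" without
# naming the kind of answer wanted.
FUNCTION_WORDS = {"to", "is", "was", "in", "of", "and"}


def answer_type(clue):
    """Return the word after the first 'this', or None if it names no type."""
    m = re.search(r"\bthis\s+(\w+)", clue, flags=re.IGNORECASE)
    if m is None:
        return None
    word = m.group(1).lower()
    return None if word in FUNCTION_WORDS else word


clues = [
    "This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938",
    "When hit by electrons, a phosphor gives off electromagnetic energy in this form",
    "Secy. Chase just submitted this to me for the third time--guess what, pal. This time I'm accepting it",
]
for clue in clues:
    print(answer_type(clue))  # prints fish, then form, then None
```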

Page 11

Broad Domain

[Figure: frequency of the most common answer types, y-axis 0.00% to 3.00% of questions. X-axis labels in chart order: he, film, group, capital, woman, song, singer, show, composer, title, fruit, planet, there, person, language, holiday, color, place, son, tree, line, product, birds, animals, site, lady, province, dog, substance, insect, way, founder, senator, form, disease, someone, maker, father, words, object, writer, novelist, heroine, dish, post, month, vegetable, sign, countries, hat, bay.]

Our focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text.

We do NOT attempt to anticipate all questions and build databases.

In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurs <3% of the time.

The distribution has a very long tail.

And for each of these types, 1000s of different things may be asked.

*13% are non-distinct (e.g., it, this, these or NA)

Even going for the head of the tail will barely make a dent.

We do NOT try to build a formal model of the world.

Page 12

Automatic Learning for “Reading”

Volumes of Text → Syntactic Frames → Semantic Frames

Officials submit resignations (0.7)
People earn degrees at schools (0.9)
Inventors patent inventions (0.8)
Vessels sink (0.7)
People sink 8-balls (0.5) (in pool/0.8)
Fluid is a liquid (0.6)
Liquid is a fluid (0.5)
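The slide shows frames with confidence values but not how they are computed. One simple way to attach a confidence to a frame, assumed here purely for illustration, is relative frequency: how often an object type appears with a given subject type and verb in parsed text. The triples below are invented toy data, so the resulting numbers do not reproduce the slide's.

```python
from collections import Counter

# Hypothetical (subject type, verb, object type) triples, as if a parser
# and a type lookup had already been run over a large text collection.
triples = [
    ("person", "earn", "degree"),
    ("person", "earn", "degree"),
    ("person", "earn", "degree"),
    ("person", "earn", "money"),
    ("vessel", "sink", None),  # intransitive use: no object
    ("vessel", "sink", None),
]

frame_counts = Counter(triples)
head_counts = Counter((subj, verb) for subj, verb, _ in triples)


def frame_confidence(subj, verb, obj):
    """Relative frequency of an object type given (subject type, verb)."""
    total = head_counts[(subj, verb)]
    return frame_counts[(subj, verb, obj)] / total if total else 0.0


print(frame_confidence("person", "earn", "degree"))  # 0.75
print(frame_confidence("vessel", "sink", None))      # 1.0
```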

Page 13

Evaluating Possibilities and Their Evidence

Clue: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus.

Candidate answers: Organelle, Vacuole, Cytoplasm, Plasma, Mitochondria, Blood, …

Is(“Cytoplasm”, “liquid”) = 0.2
Is(“organelle”, “liquid”) = 0.1
Is(“vacuole”, “liquid”) = 0.2
Is(“plasma”, “liquid”) = 0.7

Passage: “Cytoplasm is a fluid surrounding the nucleus…”
WordNet: Is_a(Fluid, Liquid)?
Learned: Is_a(Fluid, Liquid): yes.

Many candidate answers (CAs) are generated from many different searches. Each possibility is evaluated according to different dimensions of evidence. Just one piece of evidence is whether the CA is of the right type, in this case a “liquid”.
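The cytoplasm example needs a two-step type check: the passage says cytoplasm is a fluid, and a learned relation says a fluid is a liquid. A minimal sketch, assuming is-a relations are stored as a Python adjacency dict, follows the chain with a depth-first search; `is_a` and the edge dicts are hypothetical names.

```python
def is_a(term, target, edges):
    """True if `target` is reachable from `term` by following is-a edges."""
    seen, stack = set(), [term]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False


# Edge read from the passage "Cytoplasm is a fluid surrounding the nucleus..."
passage_edges = {"cytoplasm": ["fluid"]}
# Edge the slide marks as learned from text.
learned_edges = {"fluid": ["liquid"]}

print(is_a("cytoplasm", "liquid", passage_edges))  # False
combined = {**passage_edges, **learned_edges}
print(is_a("cytoplasm", "liquid", combined))       # True
```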

Page 14

Different Types of Evidence: Keyword Evidence

Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.

Keyword matching (clue ↔ passage):
In May 1898 ↔ In May
celebrated ↔ celebrated
400th anniversary ↔ anniversary
Portugal ↔ in Portugal
India ↔ India
arrival in ↔ arrived in

Answer alignment: explorer ↔ Gary

Evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
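A bag-of-words overlap score shows why the Gary passage looks attractive. This sketch assumes lower-cased alphanumeric tokens and a tiny hand-picked stopword list (both assumptions, not DeepQA's actual keyword scorers). The Gary passage matches 5 of the clue's 9 content terms, while the Vasco da Gama passage matches only “may”.

```python
import re

# Hand-picked stopwords for this example only ("s" comes from "explorer’s").
STOPWORDS = {"in", "the", "of", "this", "his", "he", "after", "on", "s"}


def terms(text):
    """Lower-cased alphanumeric tokens minus a small stopword list."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def keyword_score(clue, passage):
    """Fraction of the clue's content terms that also occur in the passage."""
    clue_terms = terms(clue)
    return len(clue_terms & terms(passage)) / len(clue_terms)


clue = "In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India."
gary = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
vasco = "On 27th May 1498, Vasco da Gama landed in Kappad Beach"

print(keyword_score(clue, gary))   # 5 of 9 clue terms match
print(keyword_score(clue, vasco))  # 1 of 9 clue terms match
```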

Page 15

Different Types of Evidence: Deeper Evidence

Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: On 27th May 1498, Vasco da Gama landed in Kappad Beach
Paraphrase: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach

Evidence alignment (clue ↔ passage):
May 1898, 400th anniversary ↔ 27th May 1498: Temporal Reasoning (DateMath)
arrival in ↔ landed in: Statistical Paraphrasing (Para-phrases)
India ↔ Kappad Beach: GeoSpatial Reasoning (Geo-KB)
explorer ↔ Vasco da Gama

Stronger evidence can be much harder to find and score, and the evidence is still not 100% certain.

Search far and wide
Explore many hypotheses
Find and judge evidence
Many inference algorithms
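Each deeper-evidence step on this slide can be sketched as a small check. The paraphrase table and the one-entry geographic knowledge base below are hypothetical stand-ins for the slide's Para-phrases and Geo-KB resources, and the regular expressions only handle this clue's phrasing.

```python
import re

# Hypothetical stand-ins for the resources named on the slide.
PARAPHRASES = {("arrival in", "landed in")}  # Para-phrases
GEO_KB = {"Kappad Beach": "India"}           # Geo-KB: place -> country


def date_math(clue_year, anniversary, passage_year):
    """Temporal reasoning: an Nth anniversary in clue_year points to clue_year - N."""
    return clue_year - anniversary == passage_year


def paraphrase(clue_phrase, passage_phrase):
    """True if the phrases are identical or listed as paraphrases."""
    return clue_phrase == passage_phrase or (clue_phrase, passage_phrase) in PARAPHRASES


def located_in(place, region):
    """Geospatial reasoning via the (hypothetical) Geo-KB lookup."""
    return GEO_KB.get(place) == region


clue = "In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India."
passage = "On 27th May 1498, Vasco da Gama landed in Kappad Beach"

clue_year = int(re.search(r"\b(1\d{3})\b", clue).group(1))        # 1898
nth = int(re.search(r"(\d+)th anniversary", clue).group(1))        # 400
passage_year = int(re.search(r"\b(1\d{3})\b", passage).group(1))  # 1498

evidence = {
    "temporal": date_math(clue_year, nth, passage_year),
    "paraphrase": paraphrase("arrival in", "landed in"),
    "geospatial": located_in("Kappad Beach", "India"),
}
print(evidence)  # {'temporal': True, 'paraphrase': True, 'geospatial': True}
```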

Page 16

Missing Links

Clue: On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first.
Missing link: Mt Everest (“He was first”)
Answer: Edmund Hillary

Category: Common Bonds
Clue: TV remote controls, Shirts, Telephones
Answer: Buttons
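For the Common Bonds clue, the missing link is whatever every listed item shares. A minimal sketch, assuming a hypothetical hand-written `HAS_PART` table in place of knowledge mined from text, intersects the part sets of all items.

```python
# Hypothetical "has part" facts; a real system would have to mine these from text.
HAS_PART = {
    "TV remote controls": {"buttons", "batteries", "infrared LED"},
    "shirts": {"buttons", "collar", "sleeves"},
    "telephones": {"buttons", "speaker", "microphone"},
}


def common_bond(items):
    """Return the parts shared by every item: the 'missing link'."""
    return set.intersection(*(HAS_PART[item] for item in items))


print(common_bond(["TV remote controls", "shirts", "telephones"]))  # {'buttons'}
```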

Page 17

What It Takes to Compete Against Top Human Jeopardy! Players: Our Analysis Reveals the Winner’s Cloud

[Figure: each dot is an actual historical human Jeopardy! game, on an axis running from more confident to less confident; labeled regions: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System.]

Top human players are remarkably good.

Page 18

What It Takes to Compete Against Top Human Jeopardy! Players: Our Analysis Reveals the Winner’s Cloud

[Figure: each dot is an actual historical human Jeopardy! game, on an axis running from more confident to less confident; labeled regions: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System.]

Computers? Not so good.

In 2007, we committed to making a huge leap!

Page 19

DeepQA: The Architecture Underlying Watson

Generates many hypotheses, collects a wide range of evidence and balances the combined confidences of over 100 different analytics that analyze the evidence from different dimensions.

Question
→ Question & Topic Analysis
→ Question Decomposition
→ Hypothesis Generation: Primary Search (over Answer Sources), Candidate Answer Generation
→ Hypothesis and Evidence Scoring: Answer Scoring, Evidence Retrieval (over Evidence Sources), Deep Evidence Scoring
→ Synthesis
→ Final Confidence Merging & Ranking (learned models help combine and weigh the evidence)
→ Answer & Confidence

Hypothesis Generation and Hypothesis and Evidence Scoring appear several times in parallel (. . .), one path per part produced by question decomposition.

Underlying disciplines: Information Retrieval, Natural Language Processing, Knowledge Representation and Reasoning, Machine Learning, Parallel and Distributed Computing
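The pipeline above can be sketched in a few functions: generate candidates, run every scorer on every candidate, and merge the feature values into one confidence. This is an illustration under assumptions: the logistic merge stands in for the learned models, the weights and bias are hand-set rather than trained, and the `passage_support` scorer is hypothetical. The type scores reuse the Is(X, “liquid”) values from the cytoplasm slide.

```python
import math
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]  # (question, candidate) -> feature value


def generate_candidates(search_results: List[str]) -> List[str]:
    """Hypothesis generation, reduced here to de-duplicating search hits."""
    return list(dict.fromkeys(search_results))


def merge(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Final confidence merging: a logistic model over the evidence features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def answer(question: str, search_results: List[str], scorers: Dict[str, Scorer],
           weights: Dict[str, float], bias: float) -> List[Tuple[float, str]]:
    """Score every candidate with every scorer, merge, and rank by confidence."""
    ranked = []
    for cand in generate_candidates(search_results):
        features = {name: scorer(question, cand) for name, scorer in scorers.items()}
        ranked.append((merge(features, weights, bias), cand))
    ranked.sort(reverse=True)  # highest confidence first
    return ranked


# Toy usage: hand-set (not learned) weights and hypothetical scorers.
IS_LIQUID = {"cytoplasm": 0.2, "organelle": 0.1, "vacuole": 0.2, "plasma": 0.7}
scorers = {
    "type_match": lambda q, c: IS_LIQUID.get(c, 0.0),
    "passage_support": lambda q, c: 1.0 if c == "cytoplasm" else 0.0,
}
weights = {"type_match": 1.0, "passage_support": 2.0}
clue = ("In cell division, mitosis splits the nucleus & cytokinesis "
        "splits this liquid cushioning the nucleus.")
ranked = answer(clue, ["organelle", "cytoplasm", "plasma", "cytoplasm"],
                scorers, weights, bias=-1.5)
print([cand for _, cand in ranked])  # ['cytoplasm', 'plasma', 'organelle']
```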

Page 20

Grouping Features to Produce Evidence Profiles

Clue: Chile shares its longest land border with this country.

[Figure: positive and negative evidence (scale -0.2 to 1) for the candidates Argentina and Bolivia.]

Bolivia is more popular due to a commonly discussed border dispute, but Watson learns that Argentina has better evidence.
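An evidence profile groups many low-level features into a few dimensions so that candidates can be compared dimension by dimension. In this sketch the feature names, the `DIMENSION` mapping and all numbers are invented for illustration and are not read off the slide's chart.

```python
from collections import defaultdict

# Hypothetical grouping of individual scorer features into evidence dimensions.
DIMENSION = {
    "passage_term_match": "passage",
    "passage_alignment": "passage",
    "border_length_relation": "geospatial",
    "popularity": "popularity",
}


def evidence_profile(features):
    """Sum signed feature contributions within each evidence dimension."""
    profile = defaultdict(float)
    for name, value in features.items():
        profile[DIMENSION[name]] += value
    return dict(profile)


# Invented numbers for illustration only.
argentina = {"passage_term_match": 0.25, "passage_alignment": 0.5,
             "border_length_relation": 0.75, "popularity": 0.25}
bolivia = {"passage_term_match": 0.25, "passage_alignment": 0.0,
           "border_length_relation": -0.25, "popularity": 0.75}

print(evidence_profile(argentina))  # {'passage': 0.75, 'geospatial': 0.75, 'popularity': 0.25}
print(evidence_profile(bolivia))    # {'passage': 0.25, 'geospatial': -0.25, 'popularity': 0.75}
```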

Page 21

Evidence: Time, Space, Source, Type, etc.

Clue: You’ll find Bethel College and a Seminary in this “holy” Minnesota city.

Candidates: Saint Paul, South Bend

There’s a Bethel College and a Seminary in both cities. The system is not weighing location evidence highly enough to give St. Paul the edge.

Page 22

Evidence: Puns

Clue: You’ll find Bethel College and a Seminary in this “holy” Minnesota city.

Candidates: Saint Paul, South Bend

Humans may get this from the pun, since St. Paul is a “holy” city. We are adding a Pun Scorer that will discover and score pun-like relationships.
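The two Bethel College slides point at two missing signals: location and the pun on “holy”. A minimal sketch, assuming tiny hypothetical lookup tables (`CITY_STATE`, `PUN_RELATED`) and a linear score with hand-set weights; the slides do not say how the planned Pun Scorer works. With no weight on location or puns the cities tie, and adding either signal gives Saint Paul the edge.

```python
# Hypothetical lookup tables; a real system would consult knowledge bases.
CITY_STATE = {"Saint Paul": "Minnesota", "South Bend": "Indiana"}
PUN_RELATED = {"holy": {"holy", "saint", "sacred", "blessed"}}


def location_score(candidate, state):
    """1.0 if the candidate city lies in the state named by the clue."""
    return 1.0 if CITY_STATE.get(candidate) == state else 0.0


def pun_score(candidate, quoted_word):
    """1.0 if some word of the candidate relates to the quoted clue word."""
    related = PUN_RELATED.get(quoted_word, set())
    return 1.0 if any(w.lower() in related for w in candidate.split()) else 0.0


def score(candidate, w_keyword, w_location, w_pun):
    keyword = 1.0  # both cities have a Bethel College and a Seminary
    return (w_keyword * keyword
            + w_location * location_score(candidate, "Minnesota")
            + w_pun * pun_score(candidate, "holy"))


for weights in [(1.0, 0.0, 0.0), (1.0, 0.5, 0.0), (1.0, 0.5, 0.5)]:
    print(weights, {city: score(city, *weights) for city in CITY_STATE})
```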

Page 23

Deep QA: Incremental Progress in Precision and Confidence, 6/2007-11/2010

[Figure: precision (y-axis, 0% to 100%) vs. % answered (x-axis, 0% to 100%), with one curve each for Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010 and 11/2010; annotation: “Now Playing in the Winners Cloud”.]
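The curves in this chart, like the Winner's Cloud, plot precision against the fraction of questions answered when the system only answers the ones it is most confident about. A minimal sketch of that metric, assuming per-question confidences and correctness labels are available; `precision_at_answered` is a hypothetical name and the toy data are invented.

```python
def precision_at_answered(confidences, correct, fraction):
    """Precision over the `fraction` of questions with the highest confidence."""
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))  # number of questions attempted
    return sum(ok for _, ok in ranked[:k]) / k


# Invented toy data: four questions with confidences and correctness.
confidences = [0.9, 0.6, 0.8, 0.7]
correct = [True, True, True, False]

for fraction in (0.25, 0.5, 0.75, 1.0):
    p = precision_at_answered(confidences, correct, fraction)
    print(f"{fraction:.0%} answered -> precision {p:.2f}")
```

Answering fewer, more confident questions raises precision, which is why better confidence estimation moves the whole curve up and to the right.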

Page 24

Speed

Question → Question/Topic Analysis → Primary Search → Candidate Answer Generation → Answer Scoring → Evidence Retrieval → Deep Evidence Scoring → Final Merging & Ranking → Answer & Confidence

Data volumes along the pipeline: ~100 search results, 100s of possible answers, 1000s of pieces of evidence, and 100,000s of scores from many simultaneous text analysis algorithms.

Single-CPU stage times shown on the slide: 0.5s, 300s, 10s, 100s, 1ks, 200s, 0.5s. Total: ~25 min

With many Primary Search, Candidate Answer Generation, Answer Scoring, Evidence Retrieval and Deep Evidence Scoring instances running in parallel: Total: ~3 sec

A Few Other Issues:
– Leveraging multi-core processors
– Optimizing in-memory data structures
– Reducing framework latencies
– Fine-tuning garbage collection
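Most of the speedup comes from running independent work items in parallel. The sketch below shows the fan-out pattern on one machine with `concurrent.futures.ThreadPoolExecutor`, scoring every (candidate, scorer) pair as a separate task. It is an assumption-laden illustration: the scorers are hypothetical, CPython threads will not speed up CPU-bound scoring, and the deployed system spread work across a cluster (the hardware slide mentions UIMA and Hadoop) rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def term_overlap(question, candidate):
    """Hypothetical scorer: number of shared lower-case words."""
    return len(set(question.lower().split()) & set(candidate.lower().split()))


def length_prior(question, candidate):
    """Hypothetical scorer: prefer shorter answers."""
    return 1.0 / (1 + len(candidate.split()))


SCORERS = {"overlap": term_overlap, "length": length_prior}


def score_all(question, candidates, workers=8):
    """Run every (candidate, scorer) pair as an independent task."""
    pairs = list(product(candidates, SCORERS))  # iterating SCORERS yields its names

    def run(pair):
        candidate, scorer_name = pair
        return SCORERS[scorer_name](question, candidate)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(run, pairs))  # results come back in input order
    return dict(zip(pairs, values))


scores = score_all("this liquid cushioning the nucleus", ["cytoplasm", "blood plasma"])
print(scores[("cytoplasm", "length")])  # 0.5
```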

Page 25

Watson – a Workload Optimized System

– 90 × IBM Power 750¹ servers
– 2880 POWER7 cores
– POWER7 3.55 GHz chip
– 500 GB per sec on-chip bandwidth
– 10 Gb Ethernet network
– 15 Terabytes of memory
– 20 Terabytes of disk, clustered
– Can operate at 80 Teraflops
– Runs IBM DeepQA software
– Scales out and searches vast amounts of unstructured information with UIMA & Hadoop open source components
– Linux provides a scalable, open platform, optimized to exploit POWER7 performance
– 10 racks include servers, networking, shared disk system, cluster controllers

¹ Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010.

Page 26

Watson: Precision, Confidence & Speed

Deep Analytics – We achieved champion levels of precision and confidence over a huge variety of expression.

Speed – By optimizing Watson’s computation for Jeopardy! on 2,880 POWER7 processing cores, we went from 2 hours per question on a single CPU to an average of just 3 seconds – fast enough to compete with the best.

Results – In 55 real-time sparring games against former Tournament of Champions players, Watson put on a very competitive performance, winning 71%. In the final Exhibition Match against Ken Jennings and Brad Rutter, Watson won!

Page 27

Potential Business Applications

Tech Support: Help-desk, Contact Centers

Healthcare / Life Sciences: Diagnostic Assistance, Evidence-Based, Collaborative Medicine

Enterprise Knowledge Management and Business Intelligence

Government: Improved Information Sharing and Security

Page 28

The Core Technical Team*

*Researchers and engineers in NLP, ML, IR, KR&R and CL at IBM Labs and a growing number of universities

Page 29

THANK YOU