in search of answers: watson and a look forward · in search of answers: watson and a look forward...

38
Leveraging Information for Smarter Organizational Outcomes February, 2013 1 © 2009 IBM Corporation In Search of Answers: Watson and a Look Forward Frank Stein Director of Analytics Solution Center Washington, DC [email protected] Leveraging Information for Smarter Organizational Outcomes

Upload: others

Post on 22-Mar-2020

14 views

Category:

Documents


1 download

TRANSCRIPT

1 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

© 2009 IBM CorporationIBM Confidential

February, 2013

1© 2009 IBM Corporation

In Search of Answers: Watson and a Look

Forward

Frank SteinDirector of Analytics Solution CenterWashington, [email protected]

Leveraging Information for Smarter Organizational Outcomes

2 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Agenda

What is Watson?

Double Jeopardy: Where are we going with Watson?

Final Jeopardy: Watson at Work in Health Care

3 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Watson answers a grand challengeCan we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and

retrieving, analyzing and understanding vast amounts of information in real-time?

4 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Real Language is Real Hard

Chess• A finite, mathematically well-defined search space

• Limited number of moves and states

• Grounded in explicit, unambiguous mathematical rules

Human Language• Ambiguous, contextual and implicit

• Grounded only in human cognition• Seemingly infinite number of ways to express the same meaning

5 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

The hard part for Watson is understanding the Natural Language well enough to find and validate the correct answer in the #1 spot…Computing a confidence that it’s right…and doing it fast enough to compete on Jeopardy!

Where was Einstein born?One day, from among his city views of Ulm, Otto chose a watercolor to

send to Albert Einstein as a remembrance of Einstein´s birthplace.

Welch ran this?If leadership is an art then surely Jack Welch has proved himself a master

painter during his tenure at GE.

Structured

Unstructured

Natural Language Processing

6 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions

Broad/Open Broad/Open DomainDomain

Complex Complex LanguageLanguage

High High PrecisionPrecision

Accurate Accurate ConfidenceConfidence

HighHighSpeedSpeed

$800In cell division, mitosis

splits the nucleus & cytokinesis splits this liquid cushioning the

nucleus

$200If you're standing, it's the

direction you should look to check out the

wainscoting. $1000Of the 4 countries in the world that the U.S. does

not have diplomatic relations with, the one that’s farthest north

7 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

7

Broad Domain

Our Focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text.

We do NOT attempt to anticipate all questions and build databases.

In a random sample of 20,000 questions we found2,500 distinct types*. The most frequent occurring <3% of the time.

The distribution has a very long tail.

And for each these types 1000’s of different things may be asked.

*13% are non-distinct (e.g, it, this, these or NA)

Even going for the head of the tail willbarely make a dent

We do NOT try to build a formal model of the world

8 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Automatic Learning for “Reading”

Officials Submit Resignations (.7)People earn degrees at schools (0.9)

Inventors patent inventions (.8)

Volumes of Text Volumes of Text Syntactic FramesSyntactic Frames Semantic FramesSemantic Frames

Vessels Sink (0.7)People sink 8-balls (0.5) (in pool/0.8)

subject verb object

Sentence

Parsing Generalization &

Statistical Aggregation

Fluid is a liquid (.6)Liquid is a fluid (.5)

9 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Medical journal concept annotations

Medications

SymptomsDiseases

Modifiers

10 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Keyword Evidence

IBM Confidential

celebrated

India

In May 1898

400th anniversary

arrival in

Portugal

India

In May

Garyexplorer

celebrated

anniversary

in Portugal

Keyword MatchingKeyword Matching

Keyword MatchingKeyword Matching

Keyword MatchingKeyword Matching

Keyword MatchingKeyword Matching

Keyword MatchingKeyword Matching

10

arrived in

In May, Gary arrived in India after he celebrated his anniversary in Portugal.

In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.

This evidence suggests “Gary” is the answer BUT the system must learn that keyword matching may be weak relative to other types of evidence

11 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

celebrated

May 1898 400th anniversary

arrival in

In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.

Portugallanded in

27th May 1498

Vasco da Gama

Temporal Reasoning

Statistical Paraphrasing

GeoSpatial Reasoning

explorer

On the 27th of May 1498, Vasco da Gama landed in Kappad Beach

Kappad Beach

Para- phrase

s

Geo- KB

DateMath

11

IndiaStronger evidence can be much harder to find and score. The evidence is still not 100% certain.

Search Far and Wide

Explore many hypotheses

Find Judge Evidence

Many inference algorithms

DeeperEvidence

12 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

In 1968

When "60 Minutes" premiered this man was U.S. president.

this man was U.S. president.

Lyndon B Johnson

?

12

Divide and Conquer (Typical in Final Jeopardy!) Must identify and solve

sub-questions from different sources to answer

the top level question

13 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Missing Links and Common Bonds: Identifying Hidden Associations

Cross

Your T’s,Your Legs, Rubicon

Category: Common Bonds: “Your legs, your T’s, the Rubicon”

Answer: Things you cross

Technical Approach:

• Semantic Networks• Lexical Collocation• Syntactic Relationships• Wikipedia Links

14 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Answer with Confidence

Question

. . .

Models

Evidence Sources

Models

Models

Models

Models

ModelsPrimarySearch

CandidateAnswer

Generation

HypothesisGeneration

Final Confidence Merging &

RankingSynthesis

Answer Sources

Question & Topic

Analysis

Deep Evidence Scoring

Learned Modelshelp combine and

weigh the Evidence

HypothesisGeneration

Hypothesis and Evidence Scoring

QuestionDecomposition

Hypothesis and Evidence Scoring

Answer Scoring

EvidenceRetrieval

The technology behind IBM Watson

How it Really Works with Content

15 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

DeepQA: The Technology Behind Watson Massively Parallel Probabilistic Evidence-Based Architecture

Generates and scores many hypotheses using a combination of 1000’s Natural Language Processing, Information Retrieval, Machine Learning and Reasoning Algorithms.

These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support it can find.

. . .

Answer Scoring

Models

Answer & Confidence

Question

Evidence Sources

Models

Models

Models

Models

ModelsPrimarySearch

CandidateAnswer

Generation

HypothesisGeneration

Hypothesis and Evidence Scoring

Final Confidence Merging & Ranking

Synthesis

Answer Sources

Question & Topic

Analysis

EvidenceRetrieval

Deep Evidence Scoring

Learned Modelshelp combine and

weigh the Evidence

HypothesisGeneration

Hypothesis and Evidence Scoring

QuestionDecomposition

1000’s of Pieces of Evidence

Multiple Interpretations

100,000’s Scores from many Deep Analysis

Algorithms

100’s sources

100’s Possible Answers

Balance& Combine

16 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

16

The Best Human Performance: Our Analysis Reveals the Winner’s Cloud

Winning Human Performance

Winning Human Performance

2007 QA Computer System2007 QA Computer System

Grand Champion Human Performance

Grand Champion Human Performance

Each dot represents an actual historical human Jeopardy! game

More ConfidentMore Confident Less ConfidentLess Confident

17 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Baseline

12/2007

8/2008

5/2009

10/2009

11/2010

12/2008

DeepQA: Incremental Progress in Precision and Confidence 6/2007-11/2010

5/2008

4/2010

Prec

isio

n

18 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Key Technologies Used by Watson

Natural Language processing• Computational linguistics, grammar frameworks, lexical structure• Dictionaries (Wordnet) and Ontologies (semantic web)

Information Management• Knowledge representation, Search and information retrieval• Unstructured Information Management Architecture (UIMA)

Machine learning• Predictive Analytics, Logistic Regression• Dynamic learning

Distributed computing, massively parallel processing• At HW level: clusters, generally HPC• At File System level: high performance file system, MapReduce,

Hadoop

19 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Key Technologies Used by Watson

Natural Language processing• Computational linguistics, grammar frameworks, lexical structure• Dictionaries (Wordnet) and Ontologies (semantic web)

Information Management• Knowledge representation, Search and information retrieval• Unstructured Information Management (UIMA)

Machine learning• Predictive Analytics, Logistic Regression• Dynamic learning

Distributed computing, massively parallel processing• At HW level: clusters, generally HPC• At File System level: high performance file system, MapReduce,

Hadoop

IBM Content Analytics

20 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

90 x IBM Power 750 servers 2880 POWER7 coresPOWER7 3.55 GHz chip500 GBps on-chip bandwidth10 Gb Ethernet network15 Terabytes of memory20 Terabytes of Disk (clustered)80 Teraflops (Trillion Operations/sec)10 ‘refrigerator sized’ racks

Watson – a Workload Optimized System

21 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Double Jeopardy: Where are we going with Watson?

22 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Watson Enhancements

1. Multi-Dimensional Input and Pipeline

2. Dialoging with Humans

3. Providing Enhanced Evidence

4. Automatic Learning Technology

5. Decision Support over Big Data

6. Multimedia Analytics

7. Context Sensitive Q&A

23 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

23

Answer SetProblem ScenarioEvidence

Answer j

EvidenceEvidence

EvidenceEvidence

EvidenceEvidence

Question Evidence AnswerEvidenceEvidence

….

….….

….….

….….

….

Specific, short questions produce a narrow, targeted range of hypotheses.

Any subset of factors may provide some indication of an answer (e.g., Diagnosis or Treatment)

CombineCombineCombine

Much Harder – Multi-Dimensional Input and Multi-Dimensional Confidence Merging

Moving to Multi-Dimensional Input and Processing

Factor 1

Factor n

Answer i

Answer k

How to combine confidences across factors and factor sets

Evidence

Evidence

Which subsets of factors represent fruitful hypotheses and which may be sufficiently evidenced by the content? There are ~55K fields in an EMR – any subset may predict one or more of some several 100K diseases or conditions. And the clinicians will want to know WHY.

24 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

24

Medical Record

Q: What diagnosis explains the patient’s condition?

Present FactorsRed, painful eyeBlurred visionFamily history of arthritis

Dialoguing to an answer

The first symptom of Lyme disease (also called Lyme’s disease) for about 50% of people is a small, red bull’s-eye rash, called erythema migrans, at the site of an infected tick bite. … Other early, acute Lyme symptoms are flu-like – fatigue, achy muscles or joints, fever, chills, stiff neck, swollen glands, and a headache.

The first symptom of Lyme disease (also called Lyme’s disease) for about 50% of people is a small, red bull’s-eye rash, called erythema migrans, at the site of an infected tick bite. …Other early, acute Lyme symptoms are flu-like – fatigue, achy muscles or joints, fever, chills, stiff neck, swollen glands, and a headache.

Absent FactorsCircular rashFatigueHeadache

Lyme disease can affect different body systems, such as the nervous system, joints, skin, and heart. Symptoms are often described as happening in three stages (although not everyone experiences all three): 1.A circular rash, typically within 1-2 weeks of infection, often is the first sign of infection. … 2.Along with the rash, a person may have flu-like symptoms such as swollen lymph nodes, fatigue, headache, and muscle aches.

Lyme disease can affect different body systems, such as the nervous system, joints, skin, and heart. Symptoms are often described as happening in three stages (although not everyone experiences all three):1.A circular rash, typically within 1-2 weeks of infection, often is the first sign of infection. …2.Along with the rash, a person may have flu-like symptoms such as swollen lymph nodes, fatigue, headache, and muscle aches.

Lyme disease is caused by the bacterium Borrelia burgdorferi and is transmitted to humans through the bite of infected blacklegged ticks. Typical symptoms include high temperature, headache, fatigue, and a characteristic skin rash called erythema migrans.

Lyme disease is caused by the bacterium Borrelia burgdorferi and is transmitted to humans through the bite of infected blacklegged ticks. Typical symptoms include high temperature, headache, fatigue, and a characteristic skin rash called erythema migrans.

25 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

25

It’s all about the Evidence

26 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

26

Learning Technology generates questions to enhance understanding and acquire new knowledge

Automatically generates Learning Questions…

Does “contraindicates the use of” mean “should not use” in general?

These disorders can stop the nerves and muscles in your esophagus from working right. This can cause food to move slowly or even get stuck in the esophagus.

These disorders can stop the nerves and muscles in your esophagus from working right. This can cause food to move slowly or even get stuck in the esophagus.

Patients with preexisting seizure disorder should not use bupropion due to a higher-than-proportional increase in the possibility of seizure as the dose is increased.

Patients with preexisting seizure disorder should not use bupropion due to a higher-than-proportional increase in the possibility of seizure as the dose is increased.

A

Watson considers…What would have to be

true for seizure disorder to be correct?

What would have to be true for seizure

disorder to be correct?

Does contraindicates the use of bupropion mean should not use bupropion?

Q

What neurological condition contraindicates the use of

bupropion?

27 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Increasing Volume of Information, driven by Multimedia

1990’s 2020’s

Video

Text

Exa

Peta

Tera

Giga

Dat

a Vo

lum

e

2000’s 2010’s

Structured data

Audio

ImageMed

High

Low

Com

puta

tiona

l Nee

ds

Soph

istic

atio

n of

Ana

lysi

s

Expr

essi

vene

ss

Digital Marketing

10+% of video views

Wide Area Imagery

100’s TB per day72 video hrs/minute

Media

Source: IBM Market Insights based on composite sources

Safety / Security

Healthcare

Customer

1B camera phones

1B medical images/yr

10s millions cameras

Enterprise Video

Used by 1/3 of enterprises

Video

28 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

28

Efficient decision support over “big data” unstructured (and structured) content

Unstructured DataBroad, rich in contextRapidly growing, currentInvaluable yet under utilized

SQL/XQuery

Existing BI

Inference/Rules

Structured DataPrecise, explicit Narrow, expensive

Jeopardy! Challenge

Deeper Understanding but BrittleHigh Precision at High CostNarrow Limited Coverage

Shallow UnderstandingLow Precision

Broad Coverage

Deeper Understanding, Higher Precision and Broader,

Timely Coverage at lower costs

Key WordSearch

Relevance Ranking

Open-Domain

Question-Answering

29 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Multimedia is becoming more critical to business decision making today

More than 70% of Big Data is multimedia dataExpand value and insights from multimedia analyticsNeed to push boundaries in difficult technical domains to address needs of emerging users.

Enterprise Social Media

Customer

Huge Volumes of Multimedia Data

Inte

ract

ion

“Traditional” Data

Safety/Security Health Mobile Media

Semantic Understanding

Modeling

Feature Extraction

Data collection

UAVs

EvidenceSources

30 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Spatial-temporal data is analyzed to provide most appropriate insights

Adding Context To Improve Answers (Contextual Computing)

Smart phones and web browsers generate vast amounts of data about users and customers.

People freely share their sentiments, expertise, desires, and intentions via social media.

Open Data and Open APIs start delivering breakthroughs in access and integration.

Semantic technologies like RDF enable automatic data discovery and ingestion.

New databases emerge to store and manage web- scale graphs of contextual information.

Advanced sensors and analytics can be exploited for dramatic new insights and predictions.

Adaptive interactions and visualizations deliver value and insight to Markets of One.

SocialBusiness

SocialMedia

Open DataOpen Data

31 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Final Jeopardy: Watson in Health Care

32 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Healthcare faces some of the most complex information challenges

Medical information is doubling every 5 years, much of which is unstructured

81% of physicians report spending 5 hours or less per month reading medical journals

“Medicine has become too complex. Only about 20% of the knowledge clinicians use today is evidence-base.” Steven Shapiro, Chief Medical & Scientific Officer, UPMC

33 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

IBM Watson goes to work in healthcare

Approach:

Use Case: Utilization Management

• Objective review of requested procedure

• Actionable analysis based on relevant content, policies, and guidelines

• Ingestion of disparate data – clinical/medical, and patient data

• Confidence weighted insights

• Prompt for additional information

• Learn form actions taken

Goals:• Improved patient outcomes

• Evidence based review

• Accelerated processing of requests (e.g. treatment)

• Minimize human error

• Improve consistency in reviews

• Increase effectiveness and efficiency leading to cost savings

Supporting clinicians in the review and authorization of medical procedures and treatment towards achieving better patient care and outcomes

34 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Solution (proposed)

Proposed Use Case: Oncology Diagnosis & Treatment (ODT)

• Clinical support for patient assessment based on objective evidence – patient data, medical info, research studies, articles, best practices, guidelines, etc.

• Evidence panel identifying key information used to support diagnosis, recommendations (e.g. suggested tests) and treatment options

• Systematic applied learning based on action taken and outcome derived

• Initial focus on lung, breast, prostate and colorectal cancers

Goal• Create individualized cancer diagnostic and treatment plans

• Enhance clinical confidence with greater access to and understanding of information

• Speed up time to evidence- based treatment

• Reduce diagnostic and administrative errors

• Accelerate the dissemination of practice-changing research

Assisting physicians with the diagnosis and treatment of cancer

IBM Watson goes to work in healthcare

Note: This solution is not currently in production, and is subject to change in scope, as market conditions evolve

35 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Oncology Diagnosis and Treatment

36 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

IBM Watson Is delivered as a service accessible through the cloud

Questions Answers

Private Cloud

Hybrid IT

Public Cloud

IBM Platform

Metadata (e.g. Query Pattern, Data Models, Ontologies)

User

IBM Watson as a ServiceIBM Watson as a Service

Algorithms

Public Info

Algorithms

3rd Party Data

Algorithms

Client Info

Algorithms

IBM Data

37 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

CognitiveComputing

Knowledge Alchemy

We have just begun to build a new era of Knowledge Alchemy powered by Cognitive Computing

SQL

NoSQL

InformationKnowledge

ESB

A D A PT

Context

Decisions & Actions

Feedback & Learning

Data

Traditional Computing

38 © 2009 IBM Corporation

Leveraging Information for Smarter Organizational Outcomes

Page 38 2/26/2013

[email protected]/ascdc