Development of IBM Watson with UIMA DUCC Eddie Epstein [email protected] Apache UIMA PMC Member and Committer ApacheCon NA 2015


Development of IBM Watson with UIMA DUCC

Eddie Epstein
[email protected]
Apache UIMA PMC Member and Committer

ApacheCon NA 2015

Presentation Outline

What is DUCC
Overview of the IBM-Jeopardy! Question-Answering System
Interesting development problems
Solutions embodied in DUCC
Fast cruise through DUCC's web interface

What is DUCC

A Linux-based cluster controller designed specifically for UIMA

Scales out any UIMA pipeline: for high throughput, or for low latency

Uses CGroups to partition user processes
Flexible Resource Management
Extensive Web, CLI and API interfaces

What DUCC Does

Collection Processing Jobs
  Scale out a UIMA pipeline into multiple threads and processes, distribute collection as work items
Shared Services
  Manage life cycle of services, supporting dependencies with Jobs or other Services
Arbitrary Processes
  Launch arbitrary singleton processes or just provide a container to work in

Motivations for DUCC

Support Ongoing Watson Development
  Take advantage of game playing hardware
  Expanding development team
Bring Functionality to Apache UIMA Community
  Separate implementation from Watson code
  Improve quality by targeting wide audience

Example Jeopardy Question

IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE

[Diagram: the question passes through the stages Question Analysis, Primary Search, Candidate Answer Generation, Evidence Retrieval, Evidence Scoring, and Merging & Ranking]

Question Analysis output shown:
  Keywords: 1698, comet, paramour, pink, …
  AnswerType(comet discoverer)
  Date(1698)
  Took(discoverer, ship)
  Called(ship, Paramour Pink)
  …
Candidate answers shown:
  Wilhelm Tempel, HMS Paramour, Isaac Newton, Halley’s Comet, Pink Panther, Christiaan Huygens, Peter Sellers, Edmond Halley
Evidence feature vectors shown (scoring dimensions: Spatial, Temporal, Lexical, Taxonomic):
  [0.58 0 -1.3 … 0.97]
  [0.71 1 13.4 … 0.72]
  [0.12 0 2.0 … 0.40]
  [0.84 1 10.6 … 0.21]
  [0.33 0 6.3 … 0.83]
  [0.21 1 11.1 … 0.92]
  [0.91 0 -8.2 … 0.61]
  [0.91 0 -1.7 … 0.60]
Final ranked answers:
  1. Edmond Halley (0.85)
  2. Christiaan Huygens (0.20)
  3. Peter Sellers (0.05)

Open Source Software Critical for Watson

Runtime
  Apache UIMA
  Indri Text Search (www.lemurproject.org/indri/)
  Apache Lucene (Text Search)
  Sesame (http://aduna-software.com/technology/sesame)
  Apache ActiveMQ (used by UIMA-AS)
During Development
  Eclipse (https://eclipse.org)
  Weka (http://sourceforge.net/projects/weka/)
  Apache Hadoop

Watson’s Knowledge for Jeopardy!

Watson has analyzed and stored the equivalent of about 1 million books (e.g., encyclopedias, dictionaries, news articles, reference texts, plays, etc.)

Watson also uses structured sources such as WordNet and DBpedia

Watson on UIMA

[Diagram: an Aggregate Analysis Engine with a FlowController and seven component Analysis Engines (Question Analysis, Primary Searches, Candidate Generation, Answer Scoring, Supporting Evidence Search, Deep Evidence Scoring, Final Merger), with CASes passing between them]

Watson on UIMA – Data Flow

[Diagram: the same Aggregate Analysis Engine, FlowController, and seven component Analysis Engines (Question Analysis, Primary Searches, Candidate Generation, Answer Scoring, Supporting Evidence Search, Deep Evidence Scoring, Final Merger), now drawn with many CASes flowing between the intermediate stages]

Problem – One Experiment

Average 2 hours per question
  Wide range of times
28GB Java Heap on 32GB Machines
  Large knowledge bases (e.g. Sesame in-memory store)
~1000 questions each
  To get statistically relevant results
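The figures on this slide imply a large compute bill per experiment. The following is a rough sketch of that arithmetic, not a measurement: it assumes the 2-hour average is single-threaded time, assumes perfect scaling, and uses a hypothetical cluster size of 50 machines with 8 worker threads each.

```python
# Back-of-the-envelope cost of one experiment, using the slide's figures.
QUESTIONS = 1000           # ~1000 questions per experiment (from the slide)
HOURS_PER_QUESTION = 2.0   # average time per question (assumed single-threaded)


def serial_hours(questions=QUESTIONS, hours_per_question=HOURS_PER_QUESTION):
    """Total compute time for one experiment on a single thread, in hours."""
    return questions * hours_per_question


def wall_clock_hours(machines, threads_per_machine=8):
    """Idealized wall-clock time, assuming perfect scaling across all threads.

    `machines` is a hypothetical cluster size, not a figure from the talk.
    """
    return serial_hours() / (machines * threads_per_machine)


print(serial_hours())          # 2000.0 hours, roughly 83 days on one thread
print(wall_clock_hours(50))    # 5.0 hours on a hypothetical 50-machine cluster
```

Real runs would be slower than the idealized figure because question times vary widely, so the last threads to finish dominate the tail.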

Solution – One Experiment

Run parallel pipelines in multiple threads
  Share the large in-memory objects
  Utilize the 8 cores in each machine
Replicate processes across machines
  Dynamically feed idle threads the next question
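The pattern on this slide can be sketched in a few lines: several worker threads share one large read-only object and each pulls the next question as soon as it becomes idle. This is an illustration only, not DUCC or Watson code; `run_experiment`, `lookup`, and the toy knowledge base are hypothetical stand-ins. Note that Watson's pipelines ran on the JVM, where threads use all cores; CPython threads would not give the same CPU parallelism.

```python
import queue
import threading


def run_experiment(questions, knowledge_base, answer_fn, num_threads=8):
    """Answer all questions using worker threads that share one knowledge base."""
    work = queue.Queue()
    for q in questions:
        work.put(q)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                q = work.get_nowait()      # an idle thread pulls the next question
            except queue.Empty:
                return                     # no work left, thread exits
            answer = answer_fn(q, knowledge_base)   # shared object, read only
            with lock:
                results[q] = answer

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Toy stand-ins for the real pipeline and its in-memory knowledge base.
kb = {"comet discoverer": "Edmond Halley"}


def lookup(question, knowledge_base):
    return knowledge_base.get(question, "unknown")


answers = run_experiment(["comet discoverer", "pink panther actor"], kb, lookup)
print(answers)
```

Because work is pulled rather than assigned up front, a question that takes far longer than average ties up only one thread while the others keep draining the queue.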

BLADE Tool (before DUCC)

http://domino.research.ibm.com/library/cyberdig.nsf/papers/152EF31994BDC3DC85257B1F005DE78F/$File/rc25356.pdf

[Diagram: a Scheduler, a Server, and a Question List connected to several Worker Nodes; links are labeled RMI and REST]

UIMA DUCC - Job Model

[Diagram: a Work Item Generator inspects the Collection of Input Data and passes data references (Data Ref's) to several Analytic Pipelines, which read the Raw Data and produce Analysis Results]
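A minimal sketch of the idea in this diagram, under stated assumptions: the generator only inspects the collection and emits small references, and each pipeline fetches the raw data a reference points to. The names `WorkItem`, `generate_work_items`, and `run_pipeline`, and the block-of-records layout, are hypothetical and are not DUCC APIs.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class WorkItem:
    """A small reference to input data, not the data itself."""
    source: str   # hypothetical name of the input collection
    start: int    # index of the first record covered by this work item
    count: int    # number of records covered


def generate_work_items(source: str, total_records: int,
                        block_size: int) -> Iterator[WorkItem]:
    """Inspect the collection size and emit one reference per block."""
    for start in range(0, total_records, block_size):
        yield WorkItem(source, start, min(block_size, total_records - start))


def run_pipeline(item: WorkItem, collection: List[str]) -> List[str]:
    """Stand-in analytic pipeline: fetch the referenced raw data, then process it."""
    raw = collection[item.start:item.start + item.count]
    return [record.upper() for record in raw]


collection = ["q1", "q2", "q3", "q4", "q5"]
items = list(generate_work_items("questions.txt", len(collection), block_size=2))
results = [run_pipeline(item, collection) for item in items]
print(items)
print(results)
```

Keeping work items as references means the generator never becomes a bottleneck for raw data, which each pipeline reads directly.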

Job Model – Core UIMA Job

[Diagram: a Job Driver running the Collection Reader sends question IDs (QIds) over HTTP to Job Processes, each running CM (CAS Multiplier), AE (Analysis Engine), and CC (CAS Consumer) components; a legend distinguishes Application Code from DUCC Code]

Job Model – UIMA-AS Job

[Diagram: a Job Driver running the Collection Reader sends question IDs (QIds) over HTTP to Job Processes, each hosting a UIMA-AS Service; a legend distinguishes Application Code from DUCC Code]

Job Model – Custom Job

[Diagram: a Job Driver running the Collection Reader sends question IDs (QIds) over HTTP to Job Processes, each running a non-UIMA Java App; a legend distinguishes Application Code from DUCC Code]

Job Debugging – all_in_one

[Diagram: the Collection Reader and the Job “processing” Code shown together; a legend distinguishes Application Code from DUCC Code]

All Job code deployed in a single thread in a single process for development & debug

Problem – 15 Researchers

Personnel evaluated by their contribution to overall accuracy
  With exceptions, e.g. reduce “stupid answers”

Wanted their resource “fair share” NOW

Solution – 15 Researchers

Preempt running processes
  Kill processes with least CPU investment
  < 10% overhead for lost investment
Ramp up after successful initialization
  Saved more than preemption loses
Allow processes to be non-preemptable
  Reserve entire machines
  Singleton processes (in CGroup containers)
  Jobs
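The preemption rule on this slide (kill the processes with the least CPU investment) can be illustrated with a small sketch. It is a deliberate simplification and not DUCC's scheduler: it assumes every user has the same fair share, counted in processes, and `choose_victims` and the sample data are hypothetical.

```python
from collections import defaultdict


def choose_victims(processes, fair_share):
    """Pick processes to preempt so that no user exceeds fair_share processes.

    processes: list of (process_id, user, cpu_seconds) tuples.
    For each user above the share, the processes with the least CPU time
    are chosen first, so the least completed work is thrown away.
    """
    by_user = defaultdict(list)
    for pid, user, cpu_seconds in processes:
        by_user[user].append((cpu_seconds, pid))

    victims = []
    for procs in by_user.values():
        excess = len(procs) - fair_share
        if excess > 0:
            procs.sort()                                  # least CPU investment first
            victims.extend(pid for _, pid in procs[:excess])
    return sorted(victims)


running = [
    ("p1", "alice", 5400), ("p2", "alice", 120), ("p3", "alice", 30),
    ("p4", "alice", 900), ("p5", "bob", 60),
]
print(choose_victims(running, fair_share=2))   # ['p2', 'p3']
```

In the example, alice holds four processes against a share of two, so her two youngest processes are preempted while bob, who is under his share, is untouched.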

Watson on a 32GB Machine?

[Diagram: the Aggregate Analysis Engine with its FlowController and seven component Analysis Engines (Question Analysis, Primary Searches, Candidate Generation, Answer Scoring, Supporting Evidence Search, Deep Evidence Scoring, Final Merger)]

No, from the start some UIMA components were shared UIMA-AS services

Performance Bottleneck (Development Mode)

[Diagram: several 32GB machines, each running a ~30 GB JVM (most with JNI) alongside its file system buffers, all reading a 50 GB search index from an NFS filesystem]

Services Improve Performance

[Diagram: the 32GB machines each run a ~30 GB JVM with JNI alongside file system buffers; Indri Search, with its own file system buffers, runs as a shared UIMA-AS service on 48GB machines; the 50 GB search index sits on the NFS filesystem]

Problem – Managing Services

Startup and number of instances manual
Team had ~3 week sprints
  Integrate changes and create new baseline
  New indexes or code meant new services
  Several baselines active concurrently

DUCC Services

Service registry
  UIMA-AS or CUSTOM
Service “pinger” class required
  Built-in pinger for UIMA-AS
Always-on or start-on-demand
  Pinger interface supports autonomous instance management
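To make the pinger idea concrete, here is a hedged sketch of what a custom pinger might do: report whether the service is healthy and suggest adding or removing instances. It does not reproduce DUCC's actual pinger interface; `PingResult`, `QueueDepthPinger`, the backlog-based policy, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PingResult:
    healthy: bool        # is the service answering requests?
    add_instances: int   # instances the pinger asks the manager to start
    del_instances: int   # instances the pinger asks the manager to stop


class QueueDepthPinger:
    """Hypothetical custom pinger that scales on the service's request backlog."""

    def __init__(self, get_queue_depth, high_water=100, low_water=5):
        self.get_queue_depth = get_queue_depth   # callable supplied by the application
        self.high_water = high_water
        self.low_water = low_water

    def ping(self, current_instances):
        try:
            depth = self.get_queue_depth()
        except Exception:
            # The probe failed, so report the service as unhealthy.
            return PingResult(healthy=False, add_instances=0, del_instances=0)
        if depth > self.high_water:
            return PingResult(True, add_instances=1, del_instances=0)
        if depth < self.low_water and current_instances > 1:
            return PingResult(True, add_instances=0, del_instances=1)
        return PingResult(True, 0, 0)


pinger = QueueDepthPinger(lambda: 250)
print(pinger.ping(current_instances=2))
```

A service manager would call `ping` periodically and act on the result, which is how a pinger can manage the instance count without manual intervention.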

DUCC Service

[Diagram: the DUCC Service Manager instantiates and queries the Service Pinger and instantiates the Service Process; the Service Pinger monitors the Service Code running in that process; a legend distinguishes Application Code from DUCC Code]

DUCC – Node Visualization

DUCC Web Demo

Backup if no Demo

DUCC Viz Page

DUCC Job Page

Job Details - Processes

Job Details - Performance

DUCC Service Page

DUCC Reservation Page

Thank You