system combination for hlt

Challenges and Opportunities for HLT System Combination David Murgatroyd @dmurga VP, Engineering, Basis Technology

Upload: david-murgatroyd

Post on 16-Jul-2015


Challenges and Opportunities for HLT System Combination

David Murgatroyd @dmurga

VP, Engineering, Basis Technology

Bottom Line Up Front

We reduce errors by combining systems to benefit from the strengths of each.

Outline

● Why Combine?

● What to Combine?

● When to Combine?

● Where to Combine?

● How to Combine?

Why Combine

● Reduce errors by using newer, different technology

[Diagram: Existing System and New System]

An Example: First, add a name from a list

Existing System

New System

John Jacob Jingleheimer Schmidt

Added a name from a list

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

Querying the systems

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt

Should these match?

● “John” v. “John”: SAME

● “Jacob”: (DELETED)

● “Jingleheimer” v. “Jinglhiemer”: MINOR TYPOS

● “Schmidt” v. “Schmidt”: SAME
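
The token-by-token comparison above can be sketched in code. This is only an illustration, not the matcher used by either system: the names `edit_distance` and `label_tokens`, the greedy left-to-right alignment, and the 30% typo threshold are assumptions made for the example, and the sketch knows nothing about nicknames or cognates.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def label_tokens(list_name: str, query: str, max_typo_ratio: float = 0.3):
    """Greedily align list-name tokens to query tokens, left to right.

    Hypothetical labels mirroring the slide: SAME, MINOR TYPOS, DELETED.
    Returns (list_token, query_token_or_None, label) triples.
    """
    query_tokens = query.split()
    next_query = 0  # index of the next unmatched query token
    labels = []
    for token in list_name.split():
        label = (token, None, "DELETED")
        if next_query < len(query_tokens):
            candidate = query_tokens[next_query]
            distance = edit_distance(token.lower(), candidate.lower())
            ratio = distance / max(len(token), len(candidate))
            if distance == 0:
                label = (token, candidate, "SAME")
            elif ratio <= max_typo_ratio:
                label = (token, candidate, "MINOR TYPOS")
            if label[2] != "DELETED":
                next_query += 1  # this query token is now consumed
        labels.append(label)
    return labels


result = label_tokens("John Jacob Jingleheimer Schmidt",
                      "John Jinglhiemer Schmidt")
```

A real matcher would also need a label such as CONFLICT for query tokens that align with nothing, which this sketch leaves out.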

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt

Desire: Positive

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90

True Positive

Should these match?

● “John” v. “John”: SAME

● “Jacob”: (DELETED)

● “Jingleheimer” v. “Jinglhiemer”: CONFLICT

● “Schmidt” v. “Schmidt”: SAME

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

False Negative

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

John Cobby Schmidt

Should these match?

● “John” v. “John”: SAME

● “Jacob” v. “Cobby”: NICKNAME

● “Jingleheimer”: (DELETED)

● “Schmidt” v. “Schmidt”: SAME

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

John Cobby Schmidt 90

Should these match?

● “John” v. “John”: SAME

● “Jacob” v. “Cobby”: CONFLICT

● “Jingleheimer”: (DELETED)

● “Schmidt” v. “Schmidt”: SAME

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

John Cobby Schmidt 75 90

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

John Cobby Schmidt 75 90

Juan Herrero

Should these match?

● “John” v. “Juan”: CLOSE COGNATE

● “Jacob”: (DELETED)

● “Jingleheimer”: (DELETED)

● “Schmidt” v. “Herrero”: FAR COGNATE

Should these match?

● “John” v. “Juan”: CLOSE COGNATE

● “Jacob”: (DELETED)

● “Jingleheimer”: (DELETED)

● “Schmidt” v. “Herrero”: CONFLICT

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

John Cobby Schmidt 75 90

Juan Herrero 75

Should these match?

● “John” v. “Juan”: COGNATE

● “Jacob”: (DELETED)

● “Jingleheimer”: (DELETED)

● “Schmidt” v. “Herrero”: COGNATE

An Example

Existing System

John Jacob Jingleheimer Schmidt

New System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

John Cobby Schmidt 75 90

Juan Herrero 75 85

False Positive

An Example

Combined System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt 90 75

John Cobby Schmidt 75 90

Juan Herrero 75 85

Combined System: New Fills Holes of Old

Combined System

John Jacob Jingleheimer Schmidt

John Jinglhiemer Schmidt match

John Cobby Schmidt match

Juan Herrero no-match
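
One way to reproduce these combined decisions from the two score columns is to accept a query when either system is confident. This is only a sketch: the deck does not state the rule behind this slide, so the threshold of 90 and the name `combine_either` are assumptions, chosen because they reproduce the three outcomes shown.

```python
# Score pairs as shown on the example slides (scale 0-100), one score per system.
SCORES = {
    "John Jinglhiemer Schmidt": (90, 75),
    "John Cobby Schmidt": (75, 90),
    "Juan Herrero": (75, 85),
}


def combine_either(score_a: int, score_b: int, threshold: int = 90) -> str:
    """Hypothetical rule: match if at least one system reaches the threshold."""
    return "match" if max(score_a, score_b) >= threshold else "no-match"


decisions = {name: combine_either(*pair) for name, pair in SCORES.items()}
```

Under the same assumed threshold, either system on its own would miss one of the first two queries; the combination accepts both and still rejects the third.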

Why Combine

● Reduce errors by using newer technology

● Minimize risk of destabilizing system

What to Combine

● address the same task

● new system should take different approach

● old system improvement not feasible

[Examples: “Cobby” v. “Jacob”; “Jinglhiemer” v. “Jingleheimer”]

What to Combine (cont’d)

● systems with rich output for rich combination

● new adaptable to compensate for existing

● new can be integrated like existing

[Diagram: Existing System and New System; output labels: match / no match, 0 … 100]

When to Combine

● existing has known error types

● new easily turned on/off

● new’s effect can be reviewed without commitment

● budget for integration, resource and license costs

When to Combine (cont’d)

● Balance hits added vs. hits removed

● Keep workload of consumers the same
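
The balance between hits added and hits removed can be measured by running the existing and the combined configuration over the same queries and comparing the results. A minimal sketch follows; the function name `hit_delta` and the example hit sets are hypothetical and are not results from any real system.

```python
def hit_delta(existing_hits: set[str], combined_hits: set[str]) -> dict[str, set[str]]:
    """Compare the hits of the existing system with those of the combined one."""
    return {
        "added": combined_hits - existing_hits,
        "removed": existing_hits - combined_hits,
        "unchanged": existing_hits & combined_hits,
    }


# Illustrative hit sets only.
existing = {"John Jinglhiemer Schmidt", "Juan Herrero"}
combined = {"John Jinglhiemer Schmidt", "John Cobby Schmidt"}

delta = hit_delta(existing, combined)
net_change = len(delta["added"]) - len(delta["removed"])
```

A net change near zero means consumers review roughly the same number of hits as before, even though which hits they see has changed.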

Where to Insert New System?

● In parallel on all inputs

[Diagram: Existing System and New System in parallel; the label JJS, JCS appears at the input, at each system, and at the output]

Where to Combine

● In parallel on all inputs

● In parallel on some inputs

[Diagram: Existing System and New System in parallel; labels: JJS, JCS at the input; JJS at one system and JJS, JCS at the other; JJS, JCS at the output]

Where to Combine

● In parallel on all inputs

● In parallel on some inputs

● In series as a post-filter for false positive suppression

[Diagram: Existing System and New System in series; labels along the flow: JJS, JCS, JH, then JCS, JH, then JCS]
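
The three placements can be sketched as small wrapper functions. Everything here is an assumption for illustration: the scorers are hypothetical callables that take a query and return a score from 0 to 100, and the names `parallel_all`, `parallel_some`, `series_post_filter`, the thresholds, the routing predicate, and the toy scores are invented for the example.

```python
from typing import Callable, Iterable

Scorer = Callable[[str], int]  # hypothetical: query -> score in 0..100


def parallel_all(queries: Iterable[str], existing: Scorer, new: Scorer,
                 threshold: int = 90) -> list[str]:
    """Run both systems on every query; keep a query if either accepts it."""
    return [q for q in queries if max(existing(q), new(q)) >= threshold]


def parallel_some(queries: Iterable[str], existing: Scorer, new: Scorer,
                  use_new: Callable[[str], bool], threshold: int = 90) -> list[str]:
    """Run the existing system on every query, the new one only where use_new(q)."""
    hits = []
    for q in queries:
        score = existing(q)
        if use_new(q):
            score = max(score, new(q))
        if score >= threshold:
            hits.append(q)
    return hits


def series_post_filter(queries: Iterable[str], existing: Scorer, new: Scorer,
                       threshold: int = 90, veto_below: int = 80) -> list[str]:
    """Keep the existing system's hits unless the new system scores them too low."""
    first_pass = [q for q in queries if existing(q) >= threshold]
    return [q for q in first_pass if new(q) >= veto_below]


# Toy scorers backed by lookup tables; the numbers are made up for the demo.
EXISTING_SCORES = {"JJS": 70, "JCS": 95, "JH": 92}
NEW_SCORES = {"JJS": 95, "JCS": 90, "JH": 40}


def existing_scorer(query: str) -> int:
    return EXISTING_SCORES[query]


def new_scorer(query: str) -> int:
    return NEW_SCORES[query]


queries = ["JJS", "JCS", "JH"]
all_hits = parallel_all(queries, existing_scorer, new_scorer)
some_hits = parallel_some(queries, existing_scorer, new_scorer,
                          use_new=lambda q: q.startswith("JC"))
filtered_hits = series_post_filter(queries, existing_scorer, new_scorer)
```

With these toy scores the series arrangement narrows JJS, JCS, JH to JCS, JH and then to JCS, while the parallel arrangement on all inputs keeps every query that either scorer accepts.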

How to Derive a Decision from Results?

● Experimentation to produce hand-tuned rules, e.g.,

  ● if Old or New > 0.95: MATCH

  ● if Old and New > 0.85: MATCH

  ● else: NO-MATCH

● Annotation to produce a machine-learned model.

● Need hand-annotated data and measurement discipline (separate tuning & testing data sets)
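
The hand-tuned rule above translates directly into code. The sketch below assumes scores on a 0 to 1 scale, as in the rule. The function names, the candidate thresholds, and the tiny annotated sets are illustrative assumptions, not data from the talk. It also shows the measurement discipline in miniature: a threshold is chosen on a tuning set and then reported on a separate testing set.

```python
def decide(old: float, new: float, either: float = 0.95, both: float = 0.85) -> str:
    """Rule from the slide: one very confident system, or two fairly confident ones."""
    if old > either or new > either:
        return "MATCH"
    if old > both and new > both:
        return "MATCH"
    return "NO-MATCH"


def accuracy(examples, either: float, both: float) -> float:
    """Fraction of (old, new, gold_label) examples the rule gets right."""
    correct = sum(decide(o, n, either, both) == gold for o, n, gold in examples)
    return correct / len(examples)


def tune_both_threshold(tuning_set, candidates, either: float = 0.95) -> float:
    """Pick the `both` threshold with the best accuracy on the tuning set only."""
    return max(candidates, key=lambda b: accuracy(tuning_set, either, b))


# Made-up annotated pairs: (old score, new score, gold label).
TUNING = [(0.97, 0.40, "MATCH"), (0.88, 0.90, "MATCH"),
          (0.80, 0.82, "MATCH"), (0.70, 0.60, "NO-MATCH")]
TESTING = [(0.86, 0.87, "MATCH"), (0.78, 0.79, "NO-MATCH"), (0.50, 0.96, "MATCH")]

best_both = tune_both_threshold(TUNING, candidates=[0.75, 0.85])
test_accuracy = accuracy(TESTING, 0.95, best_both)
```

On this toy data the threshold that is perfect on the tuning set makes one mistake on the testing set, which is why the two sets are kept separate.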

Any Questions? Some suggestions...

● Have others combined name matchers?

● How much error reduction is targeted?

● What other HLT tasks have used combined systems?

● What if I don’t have lots of annotated data to measure with?

● Is there academic literature on this?

● How can Basis’s name matcher (RNI) be adapted to different use cases?