

Page 1: QUIRK: QUestion Answering = Information Retrieval + Knowledge Cycorp IBM Presenter: Stefano Bertolo (Cycorp)

QUIRK: QUestion Answering = Information Retrieval + Knowledge

Cycorp

IBM

Presenter: Stefano Bertolo (Cycorp)

Page 2

Project Goals

- Break the answer-by-retrieval bottleneck
- Deep (semantic) understanding of queries and answers
- Integration of heterogeneous sources
- Formalized knowledge to integrate state-of-the-art IR components with state-of-the-art knowledge bases

Page 3

Answer by retrieval

Q: Who was the first president of Zambia?

…Kenneth Kaunda, the first president, kept Zambia within the Commonwealth of Nations…

Page 4

Answer by reasoning

Q: Who sponsored Kai’s attack against Pamina?

…On February 13, Kai detonated the truck in front of Pamina’s HQ…

…On January 25, Kai bought a truckload of fertilizer drawing against account 9999 at MegaBank…

…On January 15, Vitas Bayo deposited $50,000 into account 9999 at MegaBank…
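
A minimal sketch of the chaining this slide calls for, in Python. The fact encoding and the rule (whoever deposited money into an account the attacker drew on is a candidate sponsor) are illustrative assumptions, not Cyc's actual representation or rules.

```python
# Facts extracted from the three text fragments on this slide.
# The tuple layout is a hypothetical encoding chosen for this sketch.
FACTS = [
    ("detonated", "Kai", "truck", "Pamina HQ"),
    ("bought_drawing_on", "Kai", "fertilizer", "account 9999"),
    ("deposited_into", "Vitas Bayo", "$50,000", "account 9999"),
]

def candidate_sponsors(attacker, facts):
    """Chain purchase -> account -> depositor (assumed rule)."""
    # Accounts the attacker drew on to buy material.
    accounts = {f[3] for f in facts
                if f[0] == "bought_drawing_on" and f[1] == attacker}
    # Anyone who deposited money into one of those accounts.
    return sorted({f[1] for f in facts
                   if f[0] == "deposited_into" and f[3] in accounts})

print(candidate_sponsors("Kai", FACTS))
```

No single fragment names a sponsor; the answer only appears once the three facts are joined through the shared account.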

Page 5
Page 6
Page 7
Page 8
Page 9

QUIRK strategy

Use formalized knowledge for:
- Semantic understanding of queries
- Justification of answers

Use formalized knowledge as:
- Format for data normalization
- ‘Glue’ for data integration of:
  - information extracted from unstructured data
  - SQL queries against structured DBs
  - Cyc’s knowledge

Page 10

[Architecture diagram; labels: Blackboard, Query Manager, Answer Manager, Inference Agent, IR Agent, Cyc KB, GuruQA (IBM), DB1, DB2, DB-N, Preemptive annotations, Unstructured Documents]

Page 11

[Data-flow diagram; labels: Q-Eng, A-Eng, Q-CycL, A-CycL, Q-Guru, A-Guru, Query Interpreter, GuruQA Assistant, GuruQA (IBM), Cyc English Generator, Cyc Inference Engine, Answer Manager, Query Refiner, Blackboard]

Page 12

Blackboard architecture

- Add/remove agents without disrupting existing architecture
- Test performance/speed with several combinations of agents
- Operate asynchronously
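
A rough Python sketch of the blackboard idea, assuming agents are plain async callables that react to entry kinds. The class, the agent bodies, and the placeholder payloads are hypothetical; only the kind labels (Q-Eng, Q-CycL, A-CycL) come from the slides.

```python
import asyncio

class Blackboard:
    """Shared store: agents read entries and post new ones."""
    def __init__(self):
        self.entries = []   # (kind, payload) pairs, in posting order
        self.agents = []

    def register(self, agent):
        self.agents.append(agent)

    async def post(self, kind, payload):
        self.entries.append((kind, payload))
        # Notify every agent concurrently; each decides whether to react.
        await asyncio.gather(*(a(self, kind, payload) for a in list(self.agents)))

async def query_interpreter(bb, kind, payload):
    if kind == "Q-Eng":   # English question in, CycL query out
        await bb.post("Q-CycL", f"(interpreted {payload!r})")

async def inference_agent(bb, kind, payload):
    if kind == "Q-CycL":  # CycL query in, CycL answer out
        await bb.post("A-CycL", "(placeholder answer)")

def run(question, agents):
    """Post a question and return the kinds of entries that appeared."""
    bb = Blackboard()
    for agent in agents:
        bb.register(agent)
    asyncio.run(bb.post("Q-Eng", question))
    return [kind for kind, _ in bb.entries]

print(run("Who opposes the WTO?", [query_interpreter, inference_agent]))
```

Calling `run` with a different agent list swaps components in or out without touching the others, which is the property the first two bullets describe.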

Page 13

Query Interpreter

Q: “Who opposes the WTO?”

(and (isa ?WHO Person)
     (thereExists ?EVENT
       (and (isa ?EVENT ActOfDissent)
            (performedBy ?EVENT ?WHO)
            (maleficiary ?EVENT WorldTradeOrganization))))

Page 14

GuruQA Assistant

CycL query =>

PERSON$  oppose(s/d)    the WTO
         denounce(s/d)  the World Trade Organization
         attack(s/ed)
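
One way to picture the expansion on this slide is as a cross product of verb forms and organization names. The Python below is only a sketch: the word lists are copied from the slide, while the function name and the flat string output are assumptions (the slides do not show GuruQA's actual query syntax).

```python
from itertools import product

# Verb forms and names taken from the slide; the dict layout is illustrative.
VERB_FORMS = {
    "oppose": ["oppose", "opposes", "opposed"],
    "denounce": ["denounce", "denounces", "denounced"],
    "attack": ["attack", "attacks", "attacked"],
}
OBJECT_NAMES = ["the WTO", "the World Trade Organization"]

def guru_patterns(subject_type="PERSON$"):
    """Enumerate every subject-type / verb-form / name combination."""
    patterns = []
    for forms in VERB_FORMS.values():
        for form, name in product(forms, OBJECT_NAMES):
            patterns.append(f"{subject_type} {form} {name}")
    return patterns

for pattern in guru_patterns()[:3]:
    print(pattern)
```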

Page 15

Cyc Inference Engine

CycL Query =>

[(PersonNamedFn “Kai”) JUSTIFICATION-1]
[(PersonNamedFn “Dr. Chen”) JUSTIFICATION-2]
[(PersonNamedFn “Kai”) JUSTIFICATION-N]

Page 16

Cyc Justifications

A?

A from [B and C] (source 6743)
B from source 67430
C from source 78539
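
The justification on this slide is a small proof tree in which every node carries a source. A possible Python rendering follows; the `Justification` class and helper names are hypothetical, and the source numbers are copied from the slide as-is.

```python
from dataclasses import dataclass, field

@dataclass
class Justification:
    conclusion: str
    source: str
    premises: list = field(default_factory=list)

def collect_sources(j):
    """Every source the conclusion ultimately depends on, top-down."""
    found = [j.source]
    for premise in j.premises:
        found.extend(collect_sources(premise))
    return found

def explain(j, indent=0):
    """Render the tree in the same style as the slide."""
    pad = "  " * indent
    if j.premises:
        names = " and ".join(p.conclusion for p in j.premises)
        lines = [f"{pad}{j.conclusion} from [{names}] (source {j.source})"]
    else:
        lines = [f"{pad}{j.conclusion} from source {j.source}"]
    for premise in j.premises:
        lines.extend(explain(premise, indent + 1))
    return lines

A = Justification("A", "6743",
                  [Justification("B", "67430"), Justification("C", "78539")])
print("\n".join(explain(A)))
```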

Page 17

Sources for Cyc Inference

- 1.4M+ CycL assertions already in Cyc’s Knowledge Base
- Virtual assertions in databases
- Unsupervised Textract / CycL annotation of unstructured documents

Page 18

Data Source Integration

- Data Normalization
- Data Fusion

Page 19

Data Normalization

Interpretation

Search

cat chat Katze gato gatto “felis felis”

cat OR chat OR Katze OR gato OR gatto OR “felis felis”
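
A small Python sketch of the two directions suggested by this slide: interpreting any surface form as one concept, and expanding a concept into an OR query for search. The concept name `Cat` and both function names are assumptions made for illustration; the word list is the slide's.

```python
# Surface forms from the slide, keyed by a hypothetical concept name.
SURFACE_FORMS = {
    "Cat": ["cat", "chat", "Katze", "gato", "gatto", "felis felis"],
}

def interpret(term):
    """Map a surface form in any listed language to its concept, if known."""
    for concept, forms in SURFACE_FORMS.items():
        if term.lower() in (f.lower() for f in forms):
            return concept
    return None

def or_query(concept):
    """Expand a concept into a disjunctive keyword query."""
    terms = [f'"{f}"' if " " in f else f for f in SURFACE_FORMS[concept]]
    return " OR ".join(terms)

print(interpret("Katze"))
print(or_query("Cat"))
```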

Page 20

Data Normalization

…Zhang Mei Li, was born on January 1, 1927…

Name          DOB
Zhang Mei Li  01-01-1927
…             …

(birthDate (PersonNamedFn “Zhang Mei Li”) (DayFn 01 (MonthFn January (YearFn 1927))))
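
To illustrate the point of this slide, the Python sketch below normalizes the sentence and the table row into the same assertion string. The regular expression, the function names, and the reading of the DOB column as DD-MM-YYYY are all assumptions; the slide does not say how QUIRK parses either source.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def birthdate_assertion(name, day, month, year):
    """Build the CycL assertion in the form shown on the slide."""
    return (f'(birthDate (PersonNamedFn "{name}") '
            f'(DayFn {day:02d} (MonthFn {MONTHS[month - 1]} (YearFn {year}))))')

def from_text(sentence):
    """Normalize a 'NAME was born on MONTH DAY, YEAR' sentence."""
    m = re.search(r"(?P<name>[A-Z][\w .]*?),? was born on "
                  r"(?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})",
                  sentence)
    return birthdate_assertion(m["name"], int(m["day"]),
                               MONTHS.index(m["month"]) + 1, int(m["year"]))

def from_row(row):
    """Normalize a table row; DOB is assumed to be DD-MM-YYYY."""
    day, month, year = (int(part) for part in row["DOB"].split("-"))
    return birthdate_assertion(row["Name"], day, month, year)

text = "…Zhang Mei Li, was born on January 1, 1927…"
row = {"Name": "Zhang Mei Li", "DOB": "01-01-1927"}
print(from_text(text) == from_row(row))
```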

Page 21

Data Normalization

Language-independent representation of:
- entities
- concepts
- relationships

CycL contains 100K+ primitives and can compositionally define infinitely many non-atomic terms.

Page 22

Data Fusion

- Dr. Chen lives in Fresno
- Zhang Mei Li lives in Oakland
- Kai lives in Los Angeles
- California is in the Pacific Time Zone

Dr. Chen/Zhang Mei Li/Kai and Dr. Chen/Zhang Mei Li/Kai live in the same time zone
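
The conclusion on this slide needs one link the listed facts do not state: that Fresno, Oakland, and Los Angeles are in California. In the Python sketch below that link is supplied as background knowledge; the dictionary encoding and function names are illustrative assumptions.

```python
# Facts from the slide.
LIVES_IN = {"Dr. Chen": "Fresno", "Zhang Mei Li": "Oakland", "Kai": "Los Angeles"}
STATE_TIME_ZONE = {"California": "Pacific Time Zone"}

# Background knowledge not stated on the slide, needed to connect the facts.
CITY_IN_STATE = {"Fresno": "California", "Oakland": "California",
                 "Los Angeles": "California"}

def time_zone(person):
    """Chain person -> city -> state -> time zone."""
    return STATE_TIME_ZONE[CITY_IN_STATE[LIVES_IN[person]]]

def same_time_zone(a, b):
    return time_zone(a) == time_zone(b)

print(same_time_zone("Dr. Chen", "Kai"))
```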

Page 23

Heterogeneous Sources

Q: How old is Dr. Chen’s mother?

…Zhang Mei Li, mother of Pamina’s Dr. Chen…

Name          DOB
Zhang Mei Li  01-01-1927
…             …
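
A sketch of how the two sources on this slide combine, in Python: the kinship fact comes from the text, the birth date from the table, and the age from simple date arithmetic. The data structures and function names are hypothetical, and the reference date is passed in explicitly because the slides do not give one.

```python
from datetime import date

# From the unstructured text: who is whose mother.
MOTHER_OF = {"Dr. Chen": "Zhang Mei Li"}
# From the structured table: dates of birth.
PERSONAL_DATA = {"Zhang Mei Li": date(1927, 1, 1)}

def age_on(born, today):
    """Whole years elapsed between two dates."""
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - before_birthday

def mothers_age(person, today):
    mother = MOTHER_OF[person]                   # text-derived fact
    return age_on(PERSONAL_DATA[mother], today)  # table-derived fact

print(mothers_age("Dr. Chen", date(2000, 6, 1)))
```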

Page 24

Data Fusion

Requires language-independent connections/inferential links among:
- Entities
- Concepts
- Propositions (Facts, Rules)

Cyc’s Ontology
Cyc’s Knowledge Base

Page 25

Consensus Reality

Formalized knowledge about ‘Consensus Reality’ = inferentially enabled ‘glue’ for Data Fusion

E.g. “Was Kai implicated in the Munich 1972 attack (when he was a toddler of 2)?”
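
The example can be read as a plausibility check: a common-sense rule about age rules out a candidate answer. A minimal Python sketch, assuming a birth year of 1970 (inferred from "a toddler of 2" in 1972) and an arbitrary minimum age threshold chosen only for illustration:

```python
# Assumed threshold; the slides do not state how Cyc encodes this rule.
MIN_PLAUSIBLE_AGE = 10

def could_have_participated(birth_year, event_year, min_age=MIN_PLAUSIBLE_AGE):
    """Reject candidates who were too young at the time of the event."""
    return event_year - birth_year >= min_age

print(could_have_participated(1970, 1972))
```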

Page 26

DBs as ‘virtual assertions’ stores

(birthDate
  (PersonNamedFn “Zhang Mei Li”)
  ?WHEN)

SELECT DOB
FROM PERSONAL_DATA
WHERE NAME = 'Zhang Mei Li'
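
A runnable Python sketch of the idea, using the standard-library `sqlite3` module with an in-memory table. The predicate-to-table mapping and the function names are hypothetical; the table and column names follow the slide.

```python
import sqlite3

# Hypothetical mapping: CycL predicate -> (table, key column, value column).
PREDICATE_MAP = {"birthDate": ("PERSONAL_DATA", "NAME", "DOB")}

def ask(conn, predicate, person_name):
    """Answer (predicate (PersonNamedFn name) ?WHEN) with a SQL lookup."""
    table, key_col, value_col = PREDICATE_MAP[predicate]
    sql = f"SELECT {value_col} FROM {table} WHERE {key_col} = ?"
    return [row[0] for row in conn.execute(sql, (person_name,))]

def demo_connection():
    """In-memory database holding the single row from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PERSONAL_DATA (NAME TEXT, DOB TEXT)")
    conn.execute("INSERT INTO PERSONAL_DATA VALUES (?, ?)",
                 ("Zhang Mei Li", "01-01-1927"))
    return conn

print(ask(demo_connection(), "birthDate", "Zhang Mei Li"))
```

The rows are never copied into the knowledge base; each binding for ?WHEN is fetched at query time, which is what makes the assertions "virtual".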

Page 27

Unsupervised Textract / CycL Annotations

IBM Textract relations:

[Cycorp, Inc. : located-in : Austin, TX]

mapped to CycL assertions:

(objectFoundInLocation Cycorp CityOfAustinTX)
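
The mapping on this slide can be sketched as two lookup tables, one for relation names and one for entity names. The Python below covers only the single example shown; the tables and function names are illustrative and say nothing about how Textract or Cyc implement the mapping.

```python
# Lookup tables holding just the example from the slide.
RELATION_TO_PREDICATE = {"located-in": "objectFoundInLocation"}
ENTITY_TO_CONSTANT = {"Cycorp, Inc.": "Cycorp", "Austin, TX": "CityOfAustinTX"}

def parse_relation(text):
    """Split '[subject : relation : object]' into its three parts."""
    subject, relation, obj = (p.strip() for p in text.strip("[] ").split(" : "))
    return subject, relation, obj

def to_cycl(text):
    subject, relation, obj = parse_relation(text)
    return (f"({RELATION_TO_PREDICATE[relation]} "
            f"{ENTITY_TO_CONSTANT[subject]} {ENTITY_TO_CONSTANT[obj]})")

print(to_cycl("[Cycorp, Inc. : located-in : Austin, TX]"))
```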

Page 28

Augmenting Textract Annotations

Concept Annotation
“Boston” → { CityOfBostonMA, BostonTheBand, … }

Word Sense Disambiguation
“I went to Boston” → CityOfBostonMA

Analysis of nominal compounds
“leather jacket” →
(SubcollectionOfWithRelationToTypeFn Jacket mainConstituent Leather)
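
For the nominal-compound case, a toy Python sketch: if the modifier denotes a material, build the term shown on the slide. The lexicon, the `MATERIALS` set, and the single rule are assumptions for illustration; a real analysis would need many more relations than `mainConstituent`.

```python
# Hypothetical lexicon mapping English nouns to Cyc collection names.
NOUN_TO_COLLECTION = {"jacket": "Jacket", "leather": "Leather"}
# Collections assumed to denote materials.
MATERIALS = {"Leather"}

def compound_to_cycl(modifier, head):
    """Interpret 'modifier head' as 'head made mainly of modifier'."""
    mod = NOUN_TO_COLLECTION[modifier]
    hd = NOUN_TO_COLLECTION[head]
    if mod in MATERIALS:
        return f"(SubcollectionOfWithRelationToTypeFn {hd} mainConstituent {mod})"
    return None  # no rule applies in this sketch

print(compound_to_cycl("leather", "jacket"))
```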

Page 29

Unsupervised CycL Annotations

- Use IBM’s Nominator and parsers to extract named entities and basic syntactic dependencies (SUBJ-VERB, VERB-OBJ)
- Map dependencies to CycL event structures

Page 30

Cyc-to-English generator

(PersonNamedFn “Dr. Chen”) JUSTIFICATION-N

“Dr. Chen opposes the WTO, because people who demonstrate against organizations oppose them (Cyc KB, assertion 99999) and Dr. Chen demonstrated against the WTO in Seattle (document 12345).”
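
A template-based Python sketch that reproduces the sentence above from an answer plus a list of (reason, source) pairs. The function name and the flat input format are assumptions; Cyc's generator works from CycL, which this sketch does not attempt.

```python
def render(answer, claim, reasons):
    """Join an answer, its claim, and sourced reasons into one sentence."""
    parts = [f"{text} ({source})" for text, source in reasons]
    return f"{answer} {claim}, because " + " and ".join(parts) + "."

sentence = render(
    "Dr. Chen",
    "opposes the WTO",
    [("people who demonstrate against organizations oppose them",
      "Cyc KB, assertion 99999"),
     ("Dr. Chen demonstrated against the WTO in Seattle",
      "document 12345")],
)
print(sentence)
```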

Page 31

Year 1 Tasks

- Get the entire system to run robustly, integrating all the IBM and Cycorp components described
- Improve question understanding and refinement
- Broaden coverage of the English-to-CycL mapping to enable annotation of large document collections

Page 32

Year 2 Tasks

- Add new agents to the blackboard to represent the user and session context
- Improve integration of answers obtained from GuruQA and Cyc
- Improve integrated IBM and Cycorp modules for unstructured document annotation