
Page 1: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Achille Fokoue, Mudhakar Srivatsa (IBM-US), Rob Young (dstl-UK)

ITA Bootcamp, July 12, 2010

Page 2: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics


Sources of Uncertainty: (Accuracy, Stochasticity and Beyond)

Page 3: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Decision Making under Uncertainty

• Coalition warfare
– Ephemeral groups (special forces, local militia, Médecins Sans Frontières, etc.) with heterogeneous trust levels respond to emerging threats

• Secure Information Flows
– Can I share this information with an (un)trusted entity?
– Can I trust this piece of information?

[Figure: Information Flow in Yahoo!]

Page 4: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Limitations of traditional approaches

• Coarse-grained and static access control information
– Rich security metadata [QoISN’08]
– Semantic knowledgebase for situation awareness (e.g., need-to-share) [SACMAT’09]

• Fail to treat uncertainty as a first-class citizen
– Scalable algorithms and meaningful query answering semantics (possible worlds model*) to reason over uncertain data [submitted]

• Lack of explanations
– Provide dominant justifications to decision makers [SACMAT’09]
– Use justifications for estimating information credibility [submitted]

[QoISN’08: IBM-US & RHUL] [SACMAT’09: IBM-US, CESG & dstl]
[submitted (https://www.usukitacs.com/?q=node/5401): IBM-US & CESG] [submitted: IBM-US & dstl]

Page 5: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Our approach in a nutshell

• Goal: more flexible and situation-aware decision support mechanisms for information sharing

• Key technical principles
– Perform late binding of decisions (flexibility)
– Shareability of / trust in information is expressed as logical statements over rich security metadata and a semantic KB
• Domain-specific concepts and relationships
• Current state of the world
– Logical framework that supports explanations that
• Allow a sender to intelligently downgrade information (e.g., delete the participant list of a meeting)
• Allow a recipient to judge the credibility of information

Page 6: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Architecture

• A Global Awareness Module continually maintains and updates a knowledge base encoding, in a BDL language, the relevant state of the world for our application (e.g., locations of allied and enemy forces)

• A hybrid reasoner is responsible for making decisions on information flows
– The reasoner provides dominant explanation(s) over uncertain data that justify the decision

• This architecture is replicated at every decision center

[Architecture diagram components: Global Awareness, BDL KB, BDL Reasoner, Rich Metadata, Rules & Policy, Justifications]

Page 7: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

DL: Semantic Knowledgebase [SACMAT’09: IBM-US, CESG, dstl]

• SHIN Description Logics (OWL)
– Very expressive decidable subset of first-order logic
– Reasoning is intractable in the worst case, but SHER (Scalable Highly Expressive Reasoner) has good scalability characteristics in practice
– A DL KB consists of:
• TBox: terminology box. Description of the concepts and relations in the domain of discourse. Extension of the KANI ontology
• ABox: extensional part. Description of instance information

[Diagram: ABox, Extended KANI TBox]

Page 8: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Traditional approaches to deriving trust from data

• Drawbacks of a pure DL-based approach [SACMAT’09]
– Does not account for uncertainty
– Trust in information and sources is given, not derived from data or the history of interactions

• Limitations of traditional approaches to deriving trust in data
– Assume a pair-wise numeric (dis)similarity metric between two entities
• e.g., eBay recommendations, Netflix ratings
– Lack of support for conflicts spanning multiple entities, e.g.:
• 3 sources: S1, S2, S3
• Ax1 = all men are mortal
• Ax2 = Socrates is a man
• Ax3 = Socrates is not mortal
– Lack of support for uncertainty in information

Page 9: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Bayesian Description Logics (BDL)

• Challenge 1: How to scalably reason over an inconsistent and uncertain knowledgebase?
• In our experimental evaluation, BDL on an open-source DL reasoner scaled up to 7.2 million probabilistic axioms
• Pellet (a state-of-the-art DL reasoner) broke down at 0.2 million axioms
• Pronto (a probabilistic reasoner) uses an alternate, richer formulation, but does not scale beyond a few dozen axioms

• Challenge 2: What is a meaningful query answering semantics for an uncertain knowledgebase?
• Possible worlds model* (concrete definition in paper)

Page 10: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Bayesian Description Logics (BDL)

• Challenge 3: How to efficiently compute justifications over uncertain data?
• Sampling

• Challenge 4: How to use justifications?
• Assess the credibility of information sources (trust-based decision making)
• Intelligently transform data to make it shareable [TBD]

Page 11: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Notation: Bayesian Network

• V: set of all random variables in a Bayesian network
• V = {V1, V2}
• D(Vi): set of all values that Vi can take
• D(V1) = D(V2) = {0, 1}
• v: assignment of all random variables to a possible value
• v = {V1 = 0, V2 = 1}
• v|X (for some X ⊆ V): projection of v onto the random variables in X
• v|{V2} = {V2 = 1}
• D(X) (for some X ⊆ V): Cartesian product of the domains D(Xi) for all Xi in X
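To make the notation concrete, here is a minimal Python sketch of these definitions (the variable names and helper functions are illustrative, not from the paper):

```python
from itertools import product

V = ["V1", "V2"]                    # set of random variables
D = {"V1": [0, 1], "V2": [0, 1]}    # domains D(Vi)

v = {"V1": 0, "V2": 1}              # v: an assignment of all variables

def project(v, X):
    """v|X: restriction of the assignment v to the variables in X (X ⊆ V)."""
    return {var: v[var] for var in X}

def domain_product(X):
    """D(X): Cartesian product of the domains D(Xi) for all Xi in X."""
    return [dict(zip(X, values)) for values in product(*(D[x] for x in X))]

print(project(v, ["V2"]))                 # {'V2': 1}
print(len(domain_product(["V1", "V2"])))  # 4 joint assignments
```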

Page 12: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Notation: BDL

• Probabilistic knowledge base K = (A, T, BN)
• BN = Bayesian network over a set V of variables
• T = {φ : X = x}, where φ is a classical TBox axiom annotated with X = x
• X ⊆ V
• x ∈ D(X)
• e.g., Road ⊑ SlipperyRoad : Rain = true
• A = {φ : X = x}, where φ is a classical ABox axiom
• φ : p, where p ∈ [0, 1], assigns a probability value directly to a classical axiom; it is shorthand for φ : Xnew = true, where Xnew is a new independent random boolean variable with PrBN(Xnew = true) = p
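One way to represent annotated axioms in code — a hedged sketch; the axiom strings and the class name are illustrative, and a real implementation would parse DL syntax rather than store text:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnotatedAxiom:
    axiom: str              # a classical TBox/ABox axiom as text
    annotation: tuple = ()  # ((variable, value), ...); empty = always active

# T and A from the running example (slide notation "phi : X = x")
T = (AnnotatedAxiom("Road ⊑ SlipperyRoad", (("Rain", True),)),)
A = (AnnotatedAxiom("Road(route9A)"),
     AnnotatedAxiom("OpenedRoad(route9A)", (("TrustSource", True),)))

def annotate_with_probability(axiom, p, priors):
    """The 'phi : p' shorthand: add a fresh independent boolean variable
    with Pr(X_new = true) = p and annotate the axiom with X_new = true."""
    x_new = f"X_{len(priors)}"  # fresh variable name (illustrative scheme)
    priors[x_new] = p
    return AnnotatedAxiom(axiom, ((x_new, True),))

priors = {}
ax = annotate_with_probability("Sensor(route9A)", 0.9, priors)
```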

Page 13: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

BDL: Simplified Example

• TBox:
• SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition
• Road ⊑ SlipperyRoad : Rain = true

• ABox:
• Road(route9A)
• OpenedRoad(route9A) : TrustSource = true

• BN has three variables: Rain, TrustSource, Source
• PrBN(TrustSource = true | Source = Mary) = 0.8
• PrBN(TrustSource = true | Source = John) = 0.5
• PrBN(Rain = true) = 0.7
• PrBN(Source = John) = 1

• Informally, the probability values computed through the Bayesian network are propagated to the DL side as follows

Page 14: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

BDL: Simplified Example

• Primitive event e: each assignment v of all random variables in BN (e.g., {Rain = true, TrustSource = false, Source = John}) corresponds to a primitive event e (also called a scenario or a possible world)

• Each primitive event e is associated with
• a probability value PrBN(V = v) through BN,
• and a set Ke of classical DL axioms whose annotations are compatible with e (e.g., SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition, Road ⊑ SlipperyRoad, Road(route9A))

• Intuitively, the probability value associated with a statement φ (e.g., HazardousCondition(route9A)) is obtained by summing the probabilities of all primitive events e such that the classical KB Ke entails φ (see full definition in paper)
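A toy end-to-end computation of this semantics for the example above — a sketch assuming the BN factorization on the previous slide, with a hand-coded entailment check standing in for a real DL reasoner:

```python
from itertools import product

def pr_event(rain, trust_source):
    """Joint probability of a primitive event; Source = John with
    probability 1, so Pr(TrustSource = true) = 0.5, Pr(Rain = true) = 0.7."""
    p_rain = 0.7 if rain else 0.3
    p_trust = 0.5  # PrBN(TrustSource = true | Source = John)
    return p_rain * (p_trust if trust_source else 1 - p_trust)

def entails_hazard(rain, trust_source):
    """Ke entails HazardousCondition(route9A) only when both annotated axioms
    (Road ⊑ SlipperyRoad : Rain = true and OpenedRoad(route9A) : TrustSource
    = true) are active alongside Road(route9A) and the unannotated TBox axiom."""
    return rain and trust_source

prob = sum(pr_event(r, t)
           for r, t in product([True, False], repeat=2)
           if entails_hazard(r, t))
print(prob)  # 0.7 * 0.5 = 0.35
```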

Page 15: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Handling Inconsistent KBs

Page 16: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

BDL: Query Answering Semantics

Page 17: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Scalable Query Answering

Page 18: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Experimental Evaluation

• SHER – a highly scalable SOUND and COMPLETE reasoner for large OWL-DL KBs
– Reasons over highly expressive ontologies
– Reasons over data in relational databases
– Highly scalable
• Can scale to more than 60 million triples
• Semantically indexed 300 million triples from the medical literature
– Provides explanations

• PSHER – probabilistic extension to SHER using BDL

Page 19: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Scalability via Summarization (ISWC 2006)

[Figure: An original ABox (individuals C1, C2, P1, P2, M1, M2, H1, H2 with isTaughtBy and likes role assertions) and its summary (C’, P’, M’, H’), where e.g. C’ = {C1, C2}. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby. TBox: Functional(isTaughtBy), Disjoint(Man, Woman)]

• The summary mapping function f satisfies the following constraints (one such mapping is sketched below):
– If an individual a is an explicit member of a concept C in the original ABox, then f(a) is an explicit member of C in the summary ABox.
– If a ≠ b is explicitly in the original ABox, then f(a) ≠ f(b) is explicitly in the summary ABox.
– If a relation R(a, b) exists in the original ABox, then R(f(a), f(b)) exists in the summary.

• If the summary is consistent, then the original ABox is consistent (the converse is not true).
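A minimal sketch of one mapping f that satisfies these constraints: send each individual to a summary node keyed by its explicit concept set. (The individuals and the keying scheme are illustrative; SHER's actual summary function may differ in detail.)

```python
concepts = {"C1": frozenset({"Course"}), "C2": frozenset({"Course"}),
            "M1": frozenset({"Man"}),    "M2": frozenset({"Man"}),
            "H1": frozenset({"Hobby"}),  "H2": frozenset({"Hobby"})}
roles = [("isTaughtBy", "C1", "M1"), ("isTaughtBy", "C2", "M2"),
         ("likes", "M1", "H1"), ("likes", "M2", "H2")]

# f sends each individual to the summary node for its explicit concept set,
# so concept membership (constraint 1) is preserved by construction.
f = {individual: cs for individual, cs in concepts.items()}

# Constraint 3: every original role assertion has an image in the summary.
summary_roles = {(r, f[a], f[b]) for (r, a, b) in roles}
print(summary_roles)
# {('isTaughtBy', frozenset({'Course'}), frozenset({'Man'})),
#  ('likes', frozenset({'Man'}), frozenset({'Hobby'}))}
```

Note how the summary is much smaller than the original ABox (two role assertions instead of four here, and orders of magnitude fewer in the tables later in this deck), which is what makes consistency checking on the summary cheap.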

Page 20: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Results: Scalability

• UOBM benchmark data set (university data set)
• PSHER has sub-linear scalability with the number of axioms
• Exact query answering (computing exact probabilities for ground substitutions) is very expensive
• A state-of-the-art reasoner (Pellet) broke down on UOBM-1

Page 21: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Results: Response Time

• PSHER performs well on threshold queries
• 99.5% of answers were obtained within a few tens of seconds

• Further enhancements
• PSHER is parallelizable

Page 22: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Traditional approaches to deriving trust from data

• Assume a pair-wise numeric (dis)similarity metric between two entities
– e.g., eBay recommendations, Netflix ratings

• Lack of support for conflicts spanning multiple entities, e.g.:
– 3 sources: S1, S2, S3
– Ax1 = all men are mortal
– Ax2 = Socrates is a man
– Ax3 = Socrates is not mortal
– (Ax1, Ax2, Ax3 asserted by S1, S2, S3 respectively)

• Lack of support for uncertainty in information

Page 23: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Can I trust this information?

• At the command and control center, PSHER detects an inconsistency (justifications point to SIGINT vs. agent X). SIGINT is deemed more trusted by the decision maker, so trust in information source X is cautiously reduced.

• The decision maker weighs the severity of a possible biological attack and performs “what if” analysis: What if X is compromised? What if the sensing device (SIGINT) had a minor glitch? Which information should be considered and which should be discarded?

Courtesy: E.J. Wright and K. B. Laskey. Credibility Models for Multi-Source Fusion. In 9th International Conference on Information Fusion, 2006

Page 24: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Overview

• Encode information as axioms in a BDL KB
• Detect inconsistencies and weighted justifications using possible-world reasoning
• Use justifications to assess trust in information sources
• Trust scoring mechanism
– Weighted scheme based on prior trust (belief) in information sources and the weight of each justification

Page 25: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Characteristics of the trust model

• Security:
– robust to shilling
– robust to bad-mouthing

• Scalability:
– scale with the volume of information and the number of information sources

• Security-scalability trade-off
– Cost of an exhaustive justification search
– Cost of a perfectly random uniform sample

Page 26: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Trust Assessment: Degree of Unsatisfiability

• Probabilistic Socrates example:
– Axiom1: p1, Axiom2: p2, Axiom3: p3
– 8 possible worlds (the power set of the axioms)
• Only one inconsistent world: {Axiom1, Axiom2, Axiom3}
– The probability measure of a possible world is derived from the joint probability distribution of BN
• Pr({Axiom1, Axiom2}) = p1 * p2 * (1 − p3)
– Degree of Unsatisfiability (DU) = total probability of the inconsistent worlds
• DU = p1 * p2 * p3
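The degree of unsatisfiability can be computed by brute force over the power set — a sketch with illustrative probabilities, assuming the three axioms are annotated with independent boolean variables:

```python
import math
from itertools import product

p = [0.9, 0.8, 0.7]  # illustrative values for p1, p2, p3

def world_prob(world):
    """Probability of a possible world: axiom i is present with prob p[i]."""
    return math.prod(pi if present else 1 - pi for pi, present in zip(p, world))

def inconsistent(world):
    # Only the world containing all three axioms is contradictory.
    return all(world)

du = sum(world_prob(w) for w in product([True, False], repeat=3)
         if inconsistent(w))
print(du, p[0] * p[1] * p[2])  # both 0.504: DU = p1 * p2 * p3
```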

Page 27: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Trust Assessment: Justification Weight

• Trust value of a source S: Beta(α, β)
– α (reward): function of non-conflicting interesting axioms
– β (penalty): function of conflicting axioms

• Compute justifications of K = (A, T, BN)
– J ⊆ (A, T)
– (J, BN) is consistent to a degree d’ < 1
– For all J’ s.t. J’ ⊂ J, (J’, BN) is consistent to the degree 1

• How to assign penalty to sources involved in a justification?
– Probability measure, weight(J), of a justification J: DU((J, BN))
– Penalty(J) is proportional to weight(J)
– Penalty(J) is distributed across the sources contributing axioms to J, inversely proportionally to their previous trust values
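A hedged sketch of the penalty step: split weight(J) across the sources contributing axioms to J, inversely proportionally to their current trust means. (The sources, priors, and the exact update rule are illustrative; the paper's formulas may differ.)

```python
def trust_mean(alpha, beta):
    """Expected trust of a Beta(alpha, beta) distributed source."""
    return alpha / (alpha + beta)

def penalize(trust, sources_in_J, weight_J):
    """trust: {source: [alpha, beta]}. Add weight_J to the beta (penalty)
    parameters, giving less-trusted sources the larger share."""
    inverse = {s: 1.0 / trust_mean(*trust[s]) for s in sources_in_J}
    total = sum(inverse.values())
    for s in sources_in_J:
        trust[s][1] += weight_J * inverse[s] / total

trust = {"S1": [8.0, 2.0], "S2": [5.0, 5.0], "S3": [2.0, 8.0]}
penalize(trust, ["S1", "S2", "S3"], weight_J=0.504)  # illustrative DU value
print({s: round(trust_mean(a, b), 3) for s, (a, b) in trust.items()})
# S3 (lowest prior trust) absorbs the largest share of the penalty.
```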

Page 28: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Security-Scalability Tradeoff

• Impracticality of computing all justifications
– Exhaustive exploration of the Reiter search tree

• Alternative approach: unbiased sampling
– A malicious source cannot systematically hide conflicts

• Retaining the first K nodes of the Reiter search tree is not a solution:
– The probability π(vd) that the node vd on the path < v0, v1, …, vd > is selected is π(vd) = ∏ 1/|vi| over the nodes v0, …, vd−1 on the path, where |vi| is the number of children of vi

• Tradeoff: select node vi with probability min(β/π(vi), 1), with β > 0
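A sketch of this sampling scheme on a toy search tree: walk random root-to-leaf paths, track π(v) as the product of 1/|children| along the way, and accept a node with probability min(β/π(v), 1) so that deep, low-π nodes are not systematically missed. The tree shape and β are illustrative assumptions, not from the paper:

```python
import random

def sample_justification_node(children, root, beta=0.05):
    """One random descent; returns an accepted node and its path probability
    pi, or None if no node on this path is accepted."""
    node, pi = root, 1.0
    while children.get(node):
        kids = children[node]
        node = random.choice(kids)   # uniform choice among |v_i| children
        pi *= 1.0 / len(kids)        # pi(v_d) = product of 1/|v_i|
        if random.random() < min(beta / pi, 1.0):
            return node, pi          # acceptance boosts low-pi (deep) nodes
    return None

tree = {"v0": ["v1", "v2", "v3"], "v1": ["v4", "v5"], "v2": ["v6"]}
samples = [sample_justification_node(tree, "v0") for _ in range(1000)]
print(sum(s is not None for s in samples), "accepted out of 1000 descents")
```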

Page 29: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Experimental evaluation

Page 30: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Summary

• Decision Support System for Secure Information Flows

– Uncertainty: supports inconsistent KBs and reasons over uncertain information

– Derives trust values from data

– Flexibility: e.g., sensitivity of tactical information decays with space, time and external events

– Situation-awareness: e.g., encodes need-to-know based access control policies

– Support for explanations: enables intelligent information downgrade and provenance data for “what if” analysis

Page 31: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

THANKS!

Contact: Achille Fokoue
Email: [email protected]

Page 32: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Scenario

• Coalition: A & B

• Geo-locations G = {G1, …, G4}

• A’s operations described in the table

Page 33: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Summarization effectiveness

Ontology   Instances   Role Assertions   I     RA
Biopax     261,149     582,655           81    583
UOBM-1     42,585      214,177           410   16,233
UOBM-5     179,871     927,854           598   35,375
UOBM-10    351,422     1,816,153         673   49,176
UOBM-30    1,106,858   6,494,950         765   79,845
NIMD       1,278,540   1,999,787         19    55
ST         874,319     3,595,132         21    183

I – Instances after summarization; RA – Role assertions after summarization

Page 34: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Filtering effectiveness

Ontology   Instances   Role Assertions   I     RA
Biopax     261,149     582,655           38    98
UOBM-1     42,585      214,177           280   284
UOBM-5     179,871     927,854           426   444
UOBM-10    351,422     1,816,153         474   492
UOBM-30    1,106,858   6,494,950         545   574
NIMD       1,278,540   1,999,787         2     1
ST         874,319     3,595,132         18    50

I – Instances after filtering; RA – Role assertions after filtering

Page 35: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Refinement (AAAI 2007)

• What if the summary is inconsistent? Either
• the original ABox has a real inconsistency, or
• the ABox was consistent but the process of summarization introduced a fake inconsistency into the summary

• Therefore, we follow a process of refinement to check for real inconsistency
• Refinement = selectively decompress portions of the summary
• Use justifications for the inconsistency to select the portion of the summary to refine
– Justification = minimal set of assertions responsible for the inconsistency

• Repeat the process iteratively until the refined summary is consistent or the justification is “precise” (a high-level sketch of this loop follows)
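A high-level sketch of the refinement loop, with the reasoner calls left as parameters (the real system uses SHER's consistency check and justification extraction; the function names here are illustrative):

```python
def refine(summary, is_consistent, find_justification, is_precise, split):
    """Iteratively decompress the summary until it is consistent or a precise
    justification pinpoints a real inconsistency in the original ABox."""
    while not is_consistent(summary):
        justification = find_justification(summary)  # minimal conflicting assertions
        if is_precise(justification):                # maps to real ABox assertions
            return summary, justification            # real inconsistency found
        summary = split(summary, justification)      # selectively decompress
    # Consistent summary implies the original ABox is consistent.
    return summary, None
```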

Page 36: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Refinement: Resolving inconsistencies in a summary

[Figure: Original ABox with individuals C1, C2, C3, P1, P2, P3, M1, M2, W1, H1, H2 and isTaughtBy/likes role assertions. The initial summary maps all courses to C’ = {C1, C2, C3} and all persons to P’, and is inconsistent under the TBox Functional(isTaughtBy), Disjoint(Man, Woman). After the 1st refinement, C’ splits into Cx’ = {C1, C2} and Cy’ = {C3}; the summary is still inconsistent. After the 2nd refinement, P’ splits into Px’ = {P1, P2} and Py’ = {P3}, yielding a consistent summary. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby]

Page 37: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Refinement: Solving Membership Query (AAAI 2007)

[Figure: The same original ABox and refinement sequence as the previous slide, now used to answer the membership query Q = PeopleWithHobby by asserting Not(Q) on the summary person nodes and checking consistency. After the 2nd refinement, the query resolves to the solutions P1, P2 (the individuals mapped to Px’ = {P1, P2}; Py’ = {P3}). TBox: Functional(isTaughtBy), Disjoint(Man, Woman)]

Page 38: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Results: Consistency Check

Ontology   Instances   Role Assertions   Time for consistency check (s)
Biopax     261,149     582,655           2.3
UOBM-1     42,585      214,177           2.9
UOBM-5     179,871     927,854           5.4
UOBM-10    351,422     1,816,153         5.1
UOBM-30    1,106,858   6,494,950         7.9
NIMD       1,278,540   1,999,787         0.8
ST         874,319     3,595,132         0.4

Page 39: Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics

Results: Membership Query Answering

Ontology   Type Assertions   Role Assertions
UOBM-1     25,453            214,177
UOBM-10    224,879           1,816,153
UOBM-30    709,159           6,494,950

Reasoner   Dataset   Avg. Time (s)   St. Dev (s)   Range (s)
KAON2      UOBM-1    21              1             18 – 37
KAON2      UOBM-10   448             23            414 – 530
SHER       UOBM-1    4               4             2 – 24
SHER       UOBM-10   15              26            6 – 191
SHER       UOBM-30   35              63            12 – 391