
2018/Sep/27

ICTer 2018, University of Colombo School of Computing 1

Aggregation of Epistemic Uncertainty in Forms of Possibility and Certainty Factors

Koichi Yamada, Nagaoka Univ. of Tech.

IWACIII 2019, Nov. 2, 2019, Chengdu, China

Journal of Advanced Computational Intelligence and Intelligent Informatics (in press)

What is Epistemic Uncertainty?


- uncertainty due to incomplete knowledge / information

- a kind of cognitive uncertainty usually considered as a degree of belief

Aleatoric Uncertainty

- statistical / objective uncertainty, related to frequency

• Examples of Epistemic Uncertainty:
- Question: Which will win today's game, A or B?
- Question: Whether the suspect arrested is the real murderer or not.


• Probability can be used both for frequencies and for degrees of beliefs

- Objective probability (frequencies)

- Subjective probability / Bayesian probability (degrees of belief)

⇒ Uncertainty contained in the answers is not a frequency, but a degree of belief.


Theories for Representing Uncertainty


• Dempster-Shafer theory of Evidence : a generalized theory of uncertainty

• Rough Set theory : indiscernibility due to our limited knowledge

• Fuzzy set theory : vagueness contained in concepts and words

• Possibility Theory : a theory related to the adjective "possible"

• Certainty Factors : the uncertainty representation used for Expert Systems such as MYCIN (1974)

• Multi-valued logics : truth between "complete truth" and "complete falsehood"

Note: These are all theories to deal with Epistemic Uncertainty, which suggests there are many aspects of Epistemic Uncertainty.

→ In the following, we focus on the degrees of belief.


What is Important for Dealing with Degrees of Belief?


1) Capability to deal with "ignorance" / "unknown situation"

2) Aggregation of multiple pieces of uncertain information derived from various information sources.


Capability to deal with Ignorance / Unknown Situations


Example: Suppose there is a gang group in a small town, and your old friend Tom has joined it. One day, a murder happened in the town. There was perfect evidence that one of the gang members did it. No other information is given.

How do you represent the uncertainty that "Tom is the murderer" ?

Probability theory: let n be the number of the gang members; then

P(Tom) = 1/n

but we do not know the exact value of n.

• Probability cannot give the exact value in this situation.


Representation in Other Theories

Possibility theory: uncertainty is represented by a pair of possibilities;
π(Tom) = 1.0 : the possibility that Tom is the murderer is 1.0.
π(¬Tom) = 1.0 : the possibility that Tom is NOT the murderer is 1.0.

Dempster-Shafer theory of Evidence: uncertainty is represented by two measures;
Pl(Tom) = 1.0 : the plausibility that Tom is the murderer is 1.0.
Bel(Tom) = 0.0 : the belief that Tom is the murderer is 0.0.

Certainty Factor Model: uncertainty is represented by a value in [−1.0, +1.0];
CF(Tom) = 0.0
+1 : perfect affirmation; −1 : perfect negation; 0 : unknown or no information


Aggregation of Epistemic Uncertainty


• In everyday decision-making, we frequently gather multiple pieces of uncertain information and aggregate them.

- Who will win the next tennis tournament ?

- Which job should I choose among the multiple offers ?

• We need to gather and aggregate much uncertain information to answer these questions.

- Aggregation is one of the most important kinds of information processing for Epistemic Uncertainty.

• There are only a few theories that provide a standard aggregation function:

- Dempster-Shafer Theory of Evidence
- Certainty Factor Model
- Possibility theory

Information Source Model - Dempster-Shafer Theory -


• Shafer's IS Model

S : set of information sources (exclusive and exhaustive discrete set)
H : set of possible hypotheses (exclusive and exhaustive discrete set)
P(s) : probability of source s ∈ S
Γ : S → 2^H, a multivalued function

The true hypothesis is in Γ(s) with probability P(s).

mass function : m(A) = Σ_{s : Γ(s) = A} P(s)
Belief function : Bel(A) = Σ_{s : Γ(s) ⊆ A} P(s)
Plausibility function : Pl(A) = Σ_{s : Γ(s) ∩ A ≠ ∅} P(s)


Dempster’s Rule of Combination


m1 : a mass function on H defined by S1, P1, and Γ1.

m2 : a mass function on H defined by S2, P2, and Γ2.

When probability distributions P1 and P2 are independent of each other,

P12(s1, s2) = P1(s1)·P2(s2),

the two mass functions can be aggregated / combined by

m12(A) = Σ_{B∩C=A} m1(B)·m2(C) / (1 − K), where K = Σ_{B∩C=∅} m1(B)·m2(C).

[Diagram: two source sets S1 and S2 with probabilities P1, P2 and multivalued mappings Γ1, Γ2 into the hypothesis set H.]
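Under that independence assumption, Dempster's rule can be sketched as follows (a minimal implementation over frozenset focal elements; the conflicting mass K is discarded by the 1 − K normalization):

```python
def dempster(m1, m2):
    """Combine two mass functions by Dempster's rule of combination."""
    combined, conflict = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:                  # non-empty intersection keeps the mass
                combined[A] = combined.get(A, 0.0) + v1 * v2
            else:                  # mass committed to the empty set: K
                conflict += v1 * v2
    if conflict >= 1.0:
        raise ValueError("total conflict: combination undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Hypothetical example: one source leans toward h, the other toward k.
m1 = {frozenset({'h'}): 0.6, frozenset({'h', 'k'}): 0.4}
m2 = {frozenset({'k'}): 0.5, frozenset({'h', 'k'}): 0.5}
m12 = dempster(m1, m2)   # conflict K = 0.3; remaining masses renormalized by 0.7
```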

Information Source Model- Certainty Factors -


s : an information source; h : the hypothesis in question

• CF(h, s) : CF of hypothesis h only given an information source s

+1 : perfect affirmation
−1 : perfect negation
0 : neither is supported (unknown, no evidence)

MB(h, s) : degree that belief in h is revised by s toward affirmation
MD(h, s) : degree that belief in h is revised by s toward negation


Combination Rule of CFs


When two information sources sx and sy are probabilistically independent of each other, the two CFs, x and y, are combined into the CF of h only given sx and sy:

CF(h ! sx, sy) = x + y − xy (x, y ≥ 0)
CF(h ! sx, sy) = x + y + xy (x, y ≤ 0)
CF(h ! sx, sy) = (x + y) / (1 − min(|x|, |y|)) (otherwise)
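The combination function can be sketched as below. The slide's own equation is an image in this transcript, so the code uses the classical CF-model rule, the one the later slides identify with MYCIN's combination:

```python
def combine_cf(x, y):
    """Classical CF combination of x, y in [-1, 1] (MYCIN-style)."""
    if x >= 0 and y >= 0:
        return x + y - x * y                 # both affirmative
    if x <= 0 and y <= 0:
        return x + y + x * y                 # both negative
    # mixed signs; undefined at total contradiction x*y == -1
    return (x + y) / (1 - min(abs(x), abs(y)))

print(combine_cf(0.3, -0.5))   # -0.2/0.7, about -0.286
```

Note the mixed-sign branch divides by 1 − min(|x|, |y|), which is why the rule is discontinuous at xy = −1 (total contradiction), as the later property table remarks.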

Disputes over the Combination Rules


• Both aggregation rules, Dempster-Shafer theory's and the CF model's, are based on their Information Source models, and probabilistic independence between the sources is assumed in both combination rules.

• Both models suffer from criticisms claiming that the rules cannot be derived from probabilistic independence alone.

• As to the CF model, it has been claimed that the combination rule is wrong from the viewpoint of probability theory.


Aggregation of Possibility Distributions


• A few aggregation rules have been proposed so far (e.g. Dubois and Prade).

• They are not based on Information Source models. Instead, they adopt an intuitive idea called a set-theoretic approach.

Basic idea: the combination should be the intersection of the sets representing the given information, if both are reliable. If the intersection is empty, the union should be taken, assuming one of them is wrong.

If the information is represented by fuzzy sets (= possibility distributions):

Consistency index: h(π1, π2) = sup_u min(π1(u), π2(u))

Adaptive rule: π(u) = max( min(π1(u), π2(u)) / h(π1, π2), min(max(π1(u), π2(u)), 1 − h(π1, π2)) )

A variation of the above has also been proposed.
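The adaptive rule can be sketched over a discrete universe. The slide's formulas are images in this transcript, so the code follows the standard published Dubois-Prade form, with h(π1, π2) = sup min(π1, π2) as the consistency index:

```python
def adaptive_combine(p1, p2):
    """Dubois-Prade adaptive combination of two possibility
    distributions given as dicts over the same discrete universe."""
    h = max(min(p1[u], p2[u]) for u in p1)              # consistency index
    out = {}
    for u in p1:
        conj = min(p1[u], p2[u]) / h if h > 0 else 0.0  # normalized intersection
        disj = min(max(p1[u], p2[u]), 1 - h)            # union, discounted by 1-h
        out[u] = max(conj, disj)
    return out

# Fully consistent sources (h = 1): behaves like the plain intersection.
p1 = {'a': 1.0, 'b': 0.4}
p2 = {'a': 1.0, 'b': 0.8}
print(adaptive_combine(p1, p2))   # {'a': 1.0, 'b': 0.4}
```

When the two distributions totally conflict (h = 0), the intersection term vanishes and the discounted union returns total ignorance, matching the "one of them is wrong" intuition.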

In the Following,


1) Develop a sound Information Source model for possibilistic aggregation.

2) Develop combination rules to aggregate possibilities with sound theoretical assumptions.

3) Propose a new interpretation of CFs using possibility, where CFs are another representation of possibility.

4) Develop several combination rules of CFs, and compare them mathematically and with numerical examples.

A Possibility Distribution and a Certainty Factor are transformable to each other; both represent the same uncertainty.


Hypothesis and Opposite Hypothesis


• We introduce the Opposite Hypothesis (O-hypothesis) k to a hypothesis h, which satisfies the following logical formulae:

- h ∨ k is not a tautology.
- ¬h ∧ ¬k is not a contradiction; ¬h ∧ ¬k represents "unknown" because of no evidence.

Example: Which is the murderer, male or female? (h = male, k = female; the scale runs from +1 for male through 0, "unknown", to −1 for female.)

Note: Assuming the closed-world assumption, the assertion of male (female) should be rejected if there is no evidence for male (female). If we have no evidence both for male and female, we have to reject both assertions of male and female.


Possibilistic Information Source Model


h : hypothesis (e.g. male)
k : opposite hypothesis (e.g. female)

sx : an information source; sy : another information source

"h:s" is called a causation event, meaning "s supports h".

"!" is read as "only given", meaning the given information sources are present, but the other possible sources are absent.

sx supports h with π(h:sx ! sx); sx doesn't support h with π(¬(h:sx) ! sx).
sy supports k with π(k:sy ! sy); sy doesn't support k with π(¬(k:sy) ! sy).


Aggregation of Possibilities


Possibilities in the IS model:

When sx is a source that possibly supports h:
π(h:sx ! sx) = 1, π(¬(h:sx) ! sx) = a, π(k:sx ! sx) = 0, π(¬(k:sx) ! sx) = 1

When sy is a source that possibly supports k:
π(h:sy ! sy) = 0, π(¬(h:sy) ! sy) = 1, π(k:sy ! sy) = 1, π(¬(k:sy) ! sy) = b

where 0 ≤ a, b ≤ 1.

Aggregated possibilities:
π(h ! sx, sy) : possibility that h is present only given sx and sy
π(¬h ! sx, sy) : possibility that h is absent only given sx and sy
π(k ! sx, sy) and π(¬k ! sx, sy) : likewise for k

The problem of aggregation is to derive the aggregated possibilities from the possibilities in the IS model.

Assumptions (1)


Sh : the set of all sources that possibly support h. Sk : the set of all sources that possibly support k.

Hypothesis h is present ⇔ at least one of the possible information sources supports h, and no source supports k.

If we define h and k in this way, they satisfy the logical formulae required of a hypothesis and its O-hypothesis.


Assumptions (2)


Causation Independence: conditional possibilities of causation events depend only on the presence or absence of the information source.

π(h:s ! s) = π(h:s | s) = π(h:s | s, Δ)
π(h:s ! s̄) = π(h:s | s̄) = π(h:s | s̄, Δ)
π(¬(h:s) ! s) = π(¬(h:s) | s) = π(¬(h:s) | s, Δ)
π(¬(h:s) ! s̄) = π(¬(h:s) | s̄) = π(¬(h:s) | s̄, Δ)

Δ : a consistent conjunction of s′, h:s′, k:s′ and their negations, for s′ ≠ s

Propositions



Aggregation rule of Possibilities


π(k ! sx, sy) and π(¬k ! sx, sy) are obtained in the same way.

New Possibilistic Interpretation of CFs


CFs are defined by

Inversely, possibilities can be derived from CFs;
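The defining formulas are images in this transcript, so the following sketch uses an assumed mapping that is consistent with the rest of the slides (CF = +1 ⇔ (π(h), π(k)) = (1, 0); CF = 0 ⇔ (1, 1), "unknown"; CF = −1 ⇔ (0, 1)): CF = π(h) − π(k), with max(π(h), π(k)) = 1. Treat it as a reconstruction, not the paper's exact definition:

```python
def cf_to_pi(cf):
    """CF in [-1, 1] -> (pi_h, pi_k); assumed mapping, see lead-in text."""
    if cf >= 0:
        return 1.0, 1.0 - cf   # h fully possible, k less possible
    return 1.0 + cf, 1.0       # k fully possible, h less possible

def pi_to_cf(pi_h, pi_k):
    """Inverse transform, assuming max(pi_h, pi_k) = 1."""
    return pi_h - pi_k

print(cf_to_pi(0.0))    # (1.0, 1.0): total ignorance, both fully possible
print(cf_to_pi(-1.0))   # (0.0, 1.0): perfect negation
```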


Aggregation Procedure of CFs


1) Two CFs x, y are transformed into possibility distributions πx and πy.

2) The possibilities πx, πy are aggregated using the aggregation rules.

3) The aggregated possibility πxy is transformed back to a CF.

• Four aggregation rules are obtained depending on 2 × 2 alternatives:

- which operations are used: Max/Min or Probabilistic Sum/Product;

- whether normalization is done or not when the aggregated possibility distribution is not normal (the two sources are contradictory).

Note: probabilistic independence is a stronger condition than possibilistic independence (max/min independence).
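The procedure can be sketched end-to-end. Since the transformation formulas are images in this transcript, the sketch assumes the mapping CF = π(h) − π(k) with max(π(h), π(k)) = 1; under that assumption, the probabilistic-product rule with normalization reproduces MYCIN's combination exactly:

```python
import operator

def cf_to_pi(cf):
    # assumed CF <-> possibility mapping, see lead-in text
    return (1.0, 1.0 - cf) if cf >= 0 else (1.0 + cf, 1.0)

def combine(x, y, op=min, normalize=True):
    """Aggregate two CFs via possibilities.
    op=min gives the Max/Min family; op=operator.mul the probabilistic family."""
    (hx, kx), (hy, ky) = cf_to_pi(x), cf_to_pi(y)
    h, k = op(hx, hy), op(kx, ky)
    if normalize:
        top = max(h, k)
        if top == 0.0:
            raise ValueError("total contradiction: nothing is possible")
        h, k = h / top, k / top
    return h - k

print(combine(0.2, 0.3, op=min))           # ~0.3 (idempotent family)
print(combine(0.2, 0.3, op=operator.mul))  # ~0.44, same as MYCIN's rule
```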

Aggregation Rules of CFs


- Max/Min, no normalization

- Max/Min, with normalization

- Probabilistic sum/product, no normalization

- Probabilistic sum/product, with normalization

The last rule is exactly the same as MYCIN's combination rule.


Aggregation of CFs using Dempster’s Rule


Procedure of Aggregation

(1) Transform the CFs to possibility distributions, then to mass functions, respectively.

(2) Combine the two mass functions using Dempster's rule.

(3) Approximate the combined mass function by a possibility distribution (i.e. make the mass function consonant so that it can be transformed into a possibility distribution), then transform it to a CF.

** Approximation is necessary because mass functions cannot be transformed into possibilities in general.


Two ways of Aggregating Multiple CFs


• When multiple CFs are aggregated, there are two ways to aggregate them.

Incremental Aggregation: the first two CFs are aggregated, then the result is aggregated with the third CF, then that result with the fourth, and so on.

Batch Aggregation: all CFs are transformed into possibilities and then into mass functions; all the mass functions are aggregated using Dempster's rule; the aggregated mass function is approximated by a possibility distribution, which is then transformed into a CF.

The batch aggregation is proved to be associative, because the Dempster rule is associative.

• The incremental and batch aggregations are denoted fes and feb, respectively.


Mathematical Properties of the Aggregation Functions


[Table: mathematical properties of the aggregation functions fm, fmN, fa, faN=fM, fd1m, fd1a, fad, fes, feb (columns: idempotent, commutative, associative, continuous, monotonicity); each function is either monotonically increasing or monotonically non-decreasing. * it is not continuous at x and y where xy = −1.]

Numerical Example (1)


• Five CFs are aggregated sequentially.

Case 1: positive and negative values are given almost alternately: +0.3, −0.5, +0.8, +0.4, −0.7

Case 2: given in the descending order: +0.8, +0.4, +0.3, −0.5, −0.7

Case 3: given in the ascending order: −0.7, −0.5, +0.3, +0.4, +0.8
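Tracing the three cases with MYCIN's rule faN = fM is a quick check of its associativity; the sketch below is a minimal implementation of that rule:

```python
from functools import reduce

def f_M(x, y):
    """MYCIN's combination rule (probabilistic sum/product with normalization)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x <= 0 and y <= 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))

case1 = [0.3, -0.5, 0.8, 0.4, -0.7]   # almost alternately
case2 = [0.8, 0.4, 0.3, -0.5, -0.7]   # descending order
case3 = [-0.7, -0.5, 0.3, 0.4, 0.8]   # ascending order

for cfs in (case1, case2, case3):
    print(round(reduce(f_M, cfs), 2))  # 0.44 for every ordering
```

All three orderings end at 0.44, the value the charts below mark for faN = fM.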


Developments of CFs in Case 1 : almost alternately


[Chart: development of the aggregated CF (vertical axis −1.0 to +1.0) over steps 1-5 for the rules fm, fmN, fa, faN=fM, fd1m, fd1a, fad, fes, feb; the inputs 0.3, −0.5, 0.8, 0.4, −0.7 are marked, a callout highlights the results of aggregating 0.3 and −0.5, and one curve ends at 0.44.]

Developments of CFs – Case 1 : almost alternately


• Five CFs for a hypothesis h are given sequentially.


Legend:
- min-max w/o Norm.
- min-max w. Norm.
- algebraic w/o Norm.
- algebraic w. Norm.
- min-max: zero for contradiction
- algebraic: zero for contradiction
- adaptive rule
- Dempster-Incremental
- Dempster-Batch

• Positive and negative values come almost alternately. So, many of the final results are almost zero.


Developments of CFs in Case 2 : in descending order


[Chart: development of the aggregated CF in Case 2 (vertical axis −1.00 to +1.00, steps 1-5) for fm, fmN, fa, faN=fM, fd1m, fd1a, fad, fes, feb; the inputs 0.8, 0.4, 0.3, −0.5, −0.7 are marked, and final values include 0.44 and −0.7.]

Developments of CFs – Case 2 : in descending order


• Aggregation results when CFs are given in the descending order.

- The results are affected strongly by the recent negative information.
- Annotation "Associativity" marks the rules whose final value does not depend on the order.


Developments of CFs in Case 3 : in ascending order


[Chart: development of the aggregated CF in Case 3 (vertical axis −1.00 to +1.00, steps 1-5) for fm, fmN, fa, faN=fM, fd1m, fd1a, fad, fes, feb; the inputs −0.7, −0.5, 0.3, 0.4, 0.8 are marked, and final values include 0.30 and 0.88.]

Developments of CFs – Case 3 : in ascending order


• Aggregation results when CFs are given in the ascending order.

- The results are affected strongly by the recent large values.
- Annotation "Associativity" marks the rules whose final value does not depend on the order.


Developments of CFs in All Cases by Aggregation with Max/Min & Normalization: fmN


[Chart: fmN (Max/Min with normalization) in all three cases (vertical axis −1.0 to +1.0, steps 1-5): descending order (0.8, 0.4, 0.3, −0.5, −0.7), almost alternately (0.3, −0.5, 0.8, 0.4, −0.7), and ascending order (−0.7, −0.5, 0.3, 0.4, 0.8).]

Development of CFs in All Cases by Aggregation with MYCIN's Rule: faN = fM


[Chart: faN = fM (MYCIN's rule) in all three cases (vertical axis −1.0 to +1.0, steps 1-5); all three orderings reach the same final value 0.44, annotated "Associativity".]


Development of CFs in All Cases by Batch Aggregation of Dempster's Rule: feb


[Chart: feb (batch aggregation with Dempster's rule) in all three cases (vertical axis −1.0 to +1.0, steps 1-5); all three orderings reach the same final value 0.30, annotated "Associativity".]

Numerical Example (2)


• Suppose we have hypothesis h with a high CF (0.9) at first; then the O-hypothesis k with a low CF (−0.1) is given repeatedly.

+0.9, −0.1, −0.1, −0.1, ..., −0.1 (−0.1 given 20 times)

- h : true news from the reliable source (CF = 0.9)
- k : fake news from unreliable SNS (CF = −0.1)

Note: multiple pieces of fake news given repeatedly might have the same source.
→ In that case, they do not satisfy the Causation Independence.
→ But our combination rules assume Causation Independence.


Effects of Repetitive O-hypothesis with Low CF


• Simulation Results

- Aggregations without normalization are sensitive to fake news, and the value decreases rapidly.

- In the case of Max/Min, the result is bounded by the low CF (−0.1), so it might be robust against repetitive fake news compared with probabilistic sum/product.

- Set-theoretic approaches produce "unknown" (0.0) in the case of contradiction.

Development of CFs in Case of Repetitive Fake News


[Chart: development of the aggregated CF over 21 steps (vertical axis −1.00 to +1.00) for fm, fmN, fa, faN=fM, fd1m, fd1a, fad, fes, feb under the repetitive fake-news input; the curve labels fd1a, fd1m, fm, fa, fmN, faN, fad, fes, feb appear at different levels.]


Lessons from Examples


• There is no aggregation rule that shows the best results in all situations.

• Aggregations having Max/Min operations could be used in applications even when the given pieces of information have the same causation event (the same information source), thanks to their idempotency.

e.g. aggregation of rumor information: many rumors seem to have the same origin, and conditional independence cannot be assumed.

0.2, 0.3, −0.1, 0.2, 0.3 → 0.30 (aggregation using Max/Min operations with normalization)
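This example can be traced in code. The sketch assumes the CF↔possibility mapping CF = π(h) − π(k) with max(π(h), π(k)) = 1 (the slide formulas are images); under that assumption, the Max/Min rule with normalization yields the 0.30 above:

```python
from functools import reduce

def cf_to_pi(cf):
    # assumed CF <-> possibility mapping, see lead-in text
    return (1.0, 1.0 - cf) if cf >= 0 else (1.0 + cf, 1.0)

def f_mN(x, y):
    """Max/Min aggregation of two CFs, with normalization."""
    (hx, kx), (hy, ky) = cf_to_pi(x), cf_to_pi(y)
    h, k = min(hx, hy), min(kx, ky)
    top = max(h, k)          # renormalize so the larger possibility is 1
    return h / top - k / top

print(round(reduce(f_mN, [0.2, 0.3, -0.1, 0.2, 0.3]), 2))  # 0.3
```

Note that repeating the same positive CF leaves the result unchanged (idempotency), which is why the final value stays at the largest input, 0.30, rather than accumulating.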

Lessons from Examples (2)


• Aggregations having probabilistic operations should be used only when the conditional independence of causation events can really be assumed in the probabilistic sense.

• If it can be assumed, the aggregation is effective for obtaining a conclusion with a high CF from several sources with low CFs.

e.g. clinical diagnosis: symptoms obtained through interviews with patients would have low CFs.

0.2, 0.3, −0.1, 0.2, 0.3 → 0.65 (aggregation using probabilistic operations with normalization = MYCIN's aggregation)
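The same input traced with MYCIN's aggregation (a minimal sketch of the rule) reproduces the 0.65 above; unlike the idempotent Max/Min family, repeated weak support accumulates:

```python
from functools import reduce

def f_M(x, y):
    """MYCIN's combination rule."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x <= 0 and y <= 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))

print(round(reduce(f_M, [0.2, 0.3, -0.1, 0.2, 0.3]), 2))  # 0.65
```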


Lessons from Examples (3)


• Aggregation with the Dempster rule should be used in situations where probabilistic independence among information sources can be assumed.

• The results of incremental aggregation are proved theoretically to be more moderate (closer to zero) than those of MYCIN's aggregation.

0.2, 0.3, −0.1, 0.2, 0.3 → 0.55 < 0.65 (incremental aggregation with the Dempster rule)

Conclusion


• Aggregation of epistemic uncertainties in forms of Possibility and CFs was discussed.

• Aggregation rules should be chosen considering their mathematical properties and assumptions, as well as the independence of the information sources.

• We intend to conduct cognitive experiments to check how humans revise epistemic uncertainty.