conditional probability distributions eran segal weizmann institute

39
Conditional Probability Distributions Eran Segal Weizmann Institute

Upload: shanna-fisher

Post on 14-Jan-2016

229 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Conditional Probability Distributions Eran Segal Weizmann Institute

Conditional Probability Distributions

Eran Segal Weizmann Institute

Page 2: Conditional Probability Distributions Eran Segal Weizmann Institute

Last Time Local Markov assumptions – basic BN independencies d-separation – all independencies via graph structure G is an I-Map of P if and only if P factorizes over G I-equivalence – graphs with identical independencies Minimal I-Map

All distributions have I-Maps (sometimes more than one) Minimal I-Map does not capture all independencies in P

Perfect Map – not every distribution P has one PDAGs

Compact representation of I-equivalence graphs Algorithm for finding PDAGs

Page 3: Conditional Probability Distributions Eran Segal Weizmann Institute

CPDs Thus far we ignored the representation of CPDs Today we will cover the range of CPD

representations Discrete Continuous Sparse Deterministic Linear

Page 4: Conditional Probability Distributions Eran Segal Weizmann Institute

Table CPDs Entry for each joint assignment of X and Pa(X) For each pax: Most general representation Represents every discrete CPD

Limitations Cannot model continuous RVs Number of parameters exponential in |Pa(X)| Cannot model large in-degree dependencies Ignores structure within the CPD

S

I s0 s1

i0 0.95

0.05

i1 0.2 0.8

I

i0 i1

0.7 0.3

P(S|I)P(I)

I

S

)(

1)|(XValx

xpaxP

Page 5: Conditional Probability Distributions Eran Segal Weizmann Institute

Structured CPDs Key idea: reduce parameters by modeling P(X|

PaX) without explicitly modeling all entries of the joint

Lose expressive power (cannot represent every CPD)

Page 6: Conditional Probability Distributions Eran Segal Weizmann Institute

Deterministic CPDs There is a function f: Val(PaX) Val(X) such

that

Examples OR, AND, NAND functions Z = Y+X (continuous variables)

otherwise0

)(1)|( xx

pafxpaxP

Page 7: Conditional Probability Distributions Eran Segal Weizmann Institute

Deterministic CPDs

S

T1 T2 s0 s1

t0 t0 0.95 0.05

t0 t1 0.2 0.8

t1 t0 0.2 0.8

t1 t1 0.2 0.8

T1

S

T2 T1

T

T2

S

S

T s0 s1

t0 0.95

0.05

t1 0.2 0.8

T

T1 T2 t0 t1

t0 t0 1 0

t0 t1 0 1

t1 t0 0 1

t1 t1 0 1

Replace spurious dependencies with deterministic CPDs

Need to make sure that deterministic CPD is compactly stored

Page 8: Conditional Probability Distributions Eran Segal Weizmann Institute

Deterministic CPDs Induce additional conditional independencies Example: T is any deterministic function of

T1,T2

T1

T

T2

S1

Ind(S1;S2 | T1,T2)

S2

Page 9: Conditional Probability Distributions Eran Segal Weizmann Institute

Deterministic CPDs Induce additional conditional independencies Example: C is an XOR deterministic function of

A,B

D

C

A

E

BInd(D;E | B,C)

Page 10: Conditional Probability Distributions Eran Segal Weizmann Institute

Deterministic CPDs Induce additional conditional independencies Example: T is an OR deterministic function of

T1,T2

T1

T

T2

S1

Ind(S1;S2 | T1=t1)

S2

Context specific independencies

Page 11: Conditional Probability Distributions Eran Segal Weizmann Institute

Context Specific Independencies

Let X,Y,Z be pairwise disjoint RV sets Let C be a set of variables and cVal(C) X and Y are contextually independent given Z and c,

denoted (XcY | Z,c) if: 0),,(whenever),|(),,|( cPcPcP ZYZXZYX

Page 12: Conditional Probability Distributions Eran Segal Weizmann Institute

Tree CPDs

D

A B C d0 d1

a0 b0 c0 0.2 0.8

a0 b0 c1 0.2 0.8

a0 b1 c0 0.2 0.8

a0 b1 c1 0.2 0.8

a1 b0 c0 0.9 0.1

a1 b0 c1 0.7 0.3

a1 b1 c0 0.4 0.6

A1 b1 C1 0.4 0.6

A

D

B C A

D

B C

A

B

C

(0.2,0.8)

(0.4,0.6)

(0.7,0.3)(0.9,0.1)

a0 a1

b1b0

c1c0

4 parameters8 parameters

Page 13: Conditional Probability Distributions Eran Segal Weizmann Institute

Activator Repressor

Regulated gene

Activator Repressor

Regulated gene

Activator

Regulated gene

Repressor

State 1

Act

ivat

or

State 2

Act

ivat

or

Repressor

State 3

Gene Regulation: Simple Example

Regulated gene

DNA Microarray

Regulators

DNA Microarray

Regulators

Page 14: Conditional Probability Distributions Eran Segal Weizmann Institute

truefalse

truefalse

Regulation Tree

Activator?

Repressor?

State 1 State 2 State 3

true Regulation

program

Module

genes

Activator expressio

n

Repressor expressio

n

Segal et al., Nature Genetics ’03

Page 15: Conditional Probability Distributions Eran Segal Weizmann Institute

Respiration Module

Regulation

program

Module genes

Hap4+Msn4 known to regulate module genes

Module genes known targets of predicted regulators?Predicted regulator

Segal et al., Nature Genetics ’03

Page 16: Conditional Probability Distributions Eran Segal Weizmann Institute

Rule CPDs A rule r is a pair (c;p) where c is an assignment

to a subset of variables C and p[0,1]. Let Scope[r]=C

A rule-based CPD P(X|Pa(X)) is a set of rules R s.t. For each rule rR Scope[r]{X}Pa(X) For each assignment (x,u) to {X}Pa(X) we have

exactly one rule (c;p)R such that c is compatible with (x,u).Then, we have P(X=x | Pa(X)=u) = p

Page 17: Conditional Probability Distributions Eran Segal Weizmann Institute

Rule CPDs Example

Let X be a variable with Pa(X) = {A,B,C} r1: (a1, b1, x0; 0.1) r2: (a0, c1, x0; 0.2) r3: (b0, c0, x0; 0.3) r4: (a1, b0, c1, x0; 0.4) r5: (a0, b1, c0; 0.5) r6: (a1, b1, x1; 0.9) r7: (a0, c1, x1; 0.8) r8: (b0, c0, x1; 0.7) r9: (a1, b0, c1, x1; 0.6)

Note: each assignment maps to exactly one rule Rules cannot always be represented compactly

within tree CPDs

Page 18: Conditional Probability Distributions Eran Segal Weizmann Institute

Tree CPDs and Rule CPDs Can represent every discrete function Can be easily learned and dealt with in

inference But, some functions are not represented

compactly XOR in tree CPDs: cannot split in one step on a0,b1

and a1,b0

Alternative representations exist Complex logical rules

Page 19: Conditional Probability Distributions Eran Segal Weizmann Institute

Context Specific Independencies

A

D

B C

C

(0.2,0.8) (0.4,0.6)(0.7,0.3)(0.9,0.1)

a0 a1

b1b0c1c0

B

A

A

D

B C

A=a1 Ind(D,C | A=a1)

A

D

B C

A=a0 Ind(D,B | A=a0)

Reasoning by cases implies that Ind(B,C | A,D)

Page 20: Conditional Probability Distributions Eran Segal Weizmann Institute

Independence of Causal Influence

X1

Y

X2 Xn... Causes: X1,…Xn

Effect: Y

General case: Y has a complex dependency on X1,…Xn

Common case Each Xi influences Y separately Influence of X1,…Xn is combined to an overall

influence on Y

Page 21: Conditional Probability Distributions Eran Segal Weizmann Institute

Example 1: Noisy OR Two independent effects X1, X2

Y=y1 cannot happen unless one of X1, X2 occurs P(Y=y0 | X1=x1

0 , X2=x20) = P(X1=x1

0)P(X2=x20)

Y

X1 X2 y0 y1

x10 x2

0 1 0

x10 x2

1 0.2 0.8

x11 x2

0 0.1 0.9

x11 x2

1 0.02 0.98

X1

Y

X2

Page 22: Conditional Probability Distributions Eran Segal Weizmann Institute

Noisy OR: Elaborate Representation

Y

X’1 X’2 y0 y1

x10 x2

0 1 0

x10 x2

1 0 1

x11 x2

0 0 1

x11 x2

1 0 1

X’1

Y

X’2

X1 X2

Deterministic OR

X’1

X1 x10 x1

1

x10 1 0

x11 0.1 0.9

Noisy CPD 1

X’2

X2 x20 x2

1

x20 1 0

x21 0.2 0.8

Noisy CPD 2Noise parameter X1

=0.9 Noise parameter X1=0.8

Page 23: Conditional Probability Distributions Eran Segal Weizmann Institute

Noisy OR: Elaborate Representation

Decomposition results in the same distribution

),|( 122

111

0 xXxXyYP

02.0

08.09.002.09.008.01.012.01.0

)','|()|'()|'(

),,','|(),,'|'(),|'(

),|',',(

21

21

21

','21

01222

1111

','

122

11121

0122

11112

122

1111

','

122

11121

0

XX

XX

XX

XXyYPxXXPxXXP

xXxXXXyYPxXxXXXPxXxXXP

xXxXXXyYP

Page 24: Conditional Probability Distributions Eran Segal Weizmann Institute

Noisy OR: General Case Y is a binary variable with k binary parents

X1,...Xn

CPD P(Y | X1,...Xn) is a noisy OR if there are k+1 noise parameters 0,1,...,n such that

)1()1(),...,|(1:

010

ixXi

k

ii

XXyYP

)1()1(1),...,|(1:

011

ixXi

k

ii

XXyYP

Page 25: Conditional Probability Distributions Eran Segal Weizmann Institute

Noisy OR Independencies

X’1

Y

X’2

X1 X2

X’n

Xn

...

ij: Ind(Xi Xj | Y=yo)

Page 26: Conditional Probability Distributions Eran Segal Weizmann Institute

Generalized Linear Models Model is a soft version of a linear threshold

function Example: logistic function

Binary variables X1,...Xn,Y

k

iiio

k

Xww

XXyYP

1

11

)1(exp1

1),...,|(

1

Page 27: Conditional Probability Distributions Eran Segal Weizmann Institute

General Formulation Let Y be a random variable with parents X1,...Xn

The CPD P(Y | X1,...Xn) exhibits independence of causal influence (ICI) if it can be described by

The CPD P(Z | Z1,...Zn) is deterministic

Z1

Z

Z2

X1 X2

Y

Zn

Xn...

... Noisy OR

Zi has noise model

Z is an OR function

Y is the identity CPD

Logistic

Zi = wi1(Xi=1)

Z = Zi

Y = logit (Z)

Page 28: Conditional Probability Distributions Eran Segal Weizmann Institute

General Formulation

Key advantage: O(n) parameters

As stated, not all that useful as any complex CPD can be represented through a complex deterministic CPD

Page 29: Conditional Probability Distributions Eran Segal Weizmann Institute

Continuous Variables One solution: Discretize

Often requires too many value states Loses domain structure

Other solution: use continuous function for P(X|Pa(X)) Can combine continuous and discrete variables,

resulting in hybrid networks Inference and learning may become more difficult

Page 30: Conditional Probability Distributions Eran Segal Weizmann Institute

Gaussian Density Functions Among the most common continuous

representations Univariate case: 2

2

12

2

1)(),(~)(

x

eXpNXP if

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

-4 -2 0 2 4

Page 31: Conditional Probability Distributions Eran Segal Weizmann Institute

Gaussian Density Functions A multivariate Gaussian distribution over

X1,...Xn has Mean vector nxn positive definite covariance matrix

positive definite: Joint density function:

i=E[Xi] ii=Var[Xi] ij=Cov[Xi,Xj]=E[XiXj]-E[Xi]E[Xj] (ij)

0: xxx Tn

)()(

2

1exp

||)2(

1)( 1

2/12/

xxxp T

n

Page 32: Conditional Probability Distributions Eran Segal Weizmann Institute

Gaussian Density Functions Marginal distributions are easy to compute

Independencies can be determined from parameters If X=X1,...Xn have a joint normal distribution N(;)

then Ind(Xi Xj) iff ij=0 Does not hold in general for non-Gaussian distributions

YYYX

XYXX

Y

XYX ;),(

NP

XXXX ;)( NP

Page 33: Conditional Probability Distributions Eran Segal Weizmann Institute

Linear Gaussian CPDs Y is a continuous variable with parents X1,...Xn

Y has a linear Gaussian model if it can be described using parameters 0,...,n and 2 such that Vector notation:

Pros Simple Captures many interesting dependencies

Cons Fixed variance (variance cannot depend on parents

values)

);...(),...,|( 21101 nnn xxNxxYP

);()|( 20 xx TNYP

Page 34: Conditional Probability Distributions Eran Segal Weizmann Institute

Linear Gaussian Bayesian Network

A linear Gaussian Bayesian network is a Bayesian network where All variables are continuous All of the CPDs are linear Gaussians

Key result: linear Gaussian models are equivalent to multivariate Gaussian density functions

Page 35: Conditional Probability Distributions Eran Segal Weizmann Institute

Equivalence Theorem Y is a linear Gaussian of its parents X1,...Xn:

Assume that X1,...Xn are jointly Gaussian with N(;)

Then: The marginal distribution of Y is Gaussian with

N(Y;Y2)

The joint distribution over {X,Y} is Gaussian where

);()|( 20 xx TNYP

TY

TY

220 μ

n

j ijji YXCov1

];[

Linear Gaussian BNs define a joint Gaussian distribution

Page 36: Conditional Probability Distributions Eran Segal Weizmann Institute

Converse Equivalence Theorem If {X,Y} have a joint Gaussian distribution then

Implications of equivalence Joint distribution has compact representation: O(n2) We can easily transform back and forth between

Gaussian distributions and linear Gaussian Bayesian networks

Representations may differ in parameters Example:

Gaussian distribution has full covariance matrix Linear Gaussian

);()|( 20 XX TNYP

XnX2X1 ...

Page 37: Conditional Probability Distributions Eran Segal Weizmann Institute

Hybrid Models Models of continuous and discrete variables

Continuous variables with discrete parents Discrete variables with continuous parents

Conditional Linear Gaussians Y continuous variable X = {X1,...,Xn} continuous parents U = {U1,...,Um} discrete parents

A Conditional Linear Bayesian network is one where Discrete variables have only discrete parents Continuous variables have only CLG CPDs

21 ,0, ;),|(: uuu xxuUu

n

i iiaaNYP

Page 38: Conditional Probability Distributions Eran Segal Weizmann Institute

Hybrid Models Continuous parents for discrete children

Threshold models

Linear sigmoid

otherwise05.0

109.0)|( 1 xxyYP

k

iiio

k

xww

xxyYP

1

11

)exp1

1),...,|(

Page 39: Conditional Probability Distributions Eran Segal Weizmann Institute

Summary: CPD Models Deterministic functions Context specific dependencies Independence of causal influence

Noisy OR Logistic function

CPDs capture additional domain structure