2014-09-22

26 slides

The Chow-Liu algorithm based on the MDL with discrete and continuous variables
Joe Suzuki (Osaka University)
AIGM 2014, Paris

DESCRIPTION

AIGM 2014

TRANSCRIPT

Page 1

The Chow-Liu algorithm based on the MDL with discrete and continuous variables

Joe Suzuki

Osaka University

AIGM 2014, Paris

Page 2

The Chow-Liu Algorithm

Chow-Liu

$P_{1,\dots,N}$: distribution of $X^{(1)},\dots,X^{(N)}$ ($N \ge 1$)
$G = (V, E)$: undirected graph
$E := \{\}$, $V := \{1,\dots,N\}$, $\bar{E} := \{\{i,j\} \mid i \ne j,\ i,j \in V\}$
while $\bar{E} \ne \{\}$:
  1. choose $\{i,j\} \in \bar{E}$ that maximizes $I(i,j)$
  2. remove $\{i,j\}$ from $\bar{E}$
  3. if no loop is generated, add $\{i,j\}$ to $E$

Mutual information of $X^{(i)}, X^{(j)}$:

$$I(i,j) := \sum_{x^{(i)}} \sum_{x^{(j)}} P_{i,j}(x^{(i)}, x^{(j)}) \log \frac{P_{i,j}(x^{(i)}, x^{(j)})}{P_i(x^{(i)})\, P_j(x^{(j)})}$$

The tree $E$ that attains $\sum_{\{i,j\} \in E} I(i,j) \to \max$ also attains $D(P_{1,\dots,N} \,\|\, Q) \to \min$.
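To make the procedure on this page concrete, here is a minimal Python sketch of the maximum-likelihood variant. The names mutual_information and chow_liu are hypothetical (not the author's code), the data are assumed to be a small array of discrete values, and a Kruskal-style union-find rejects loop-forming edges.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Plug-in (ML) estimate of I(i, j) from two discrete columns."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def chow_liu(data):
    """data: (n, N) array of discrete values; returns the edge set of the tree."""
    N = data.shape[1]
    parent = list(range(N))                      # union-find for loop detection
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    weights = sorted(((mutual_information(data[:, i], data[:, j]), i, j)
                      for i, j in combinations(range(N), 2)), reverse=True)
    E = []
    for w, i, j in weights:                      # greedy: highest I(i, j) first
        ri, rj = find(i), find(j)
        if ri != rj:                             # adding {i, j} creates no loop
            parent[ri] = rj
            E.append((i, j))
    return E
```

Because every pair is eventually considered and only loop-forming edges are skipped, the output always has N-1 edges, i.e., a spanning tree.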

Page 3

The Chow-Liu Algorithm

Example

$$Q(x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}) = \frac{P_{1,2}(x^{(1)}, x^{(2)})\, P_{1,3}(x^{(1)}, x^{(3)})\, P_{1,4}(x^{(1)}, x^{(4)})}{P_1(x^{(1)}) P_2(x^{(2)}) \cdot P_1(x^{(1)}) P_3(x^{(3)}) \cdot P_1(x^{(1)}) P_4(x^{(4)})} \cdot P_1(x^{(1)}) P_2(x^{(2)}) P_3(x^{(3)}) P_4(x^{(4)})$$

$$= P(x^{(1)})\, P(x^{(2)} \mid x^{(1)})\, P(x^{(3)} \mid x^{(1)})\, P(x^{(4)} \mid x^{(1)})$$

  i        1    1    2    1    2    3
  j        2    3    3    4    4    4
  I(i,j)   12   10   8    6    4    2

[Figure: the four steps on vertices 1-4. Edges {1,2} and {1,3} are added, {2,3} is rejected because it would create a loop, and {1,4} is added.]

Page 4

The Chow-Liu Algorithm

Dendroid Distribution

$X^{(1)},\dots,X^{(N)}$: discrete random variables
$V := \{1,\dots,N\}$
$E \subseteq \{\{i,j\} \mid i \ne j,\ i,j \in V\}$

$$Q(x^{(1)},\dots,x^{(N)} \mid E) = \prod_{\{i,j\} \in E} \frac{P_{i,j}(x^{(i)}, x^{(j)})}{P_i(x^{(i)})\, P_j(x^{(j)})} \prod_{i \in V} P_i(x^{(i)}),$$

where $\{P_i(x^{(i)})\}_{i \in V}$ and $\{P_{i,j}(x^{(i)}, x^{(j)})\}_{i \ne j}$ are the marginals of $P_{1,\dots,N}(x^{(1)},\dots,x^{(N)})$.
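As a small companion to the formula above, a sketch of how log Q(x | E) could be evaluated from the one-dimensional and pairwise marginals. The container layout (P1, P2) is hypothetical, chosen only for illustration.

```python
import math

def log_dendroid(x, E, P1, P2):
    """log Q(x | E) for an assignment x = (x[0], ..., x[N-1]),
    given marginals P1[i][xi] and pairwise marginals P2[(i, j)][(xi, xj)]."""
    logq = sum(math.log(P1[i][xi]) for i, xi in enumerate(x))   # prod_i P_i(x^(i))
    for i, j in E:                                              # product over the edges of E
        logq += math.log(P2[(i, j)][(x[i], x[j])]
                         / (P1[i][x[i]] * P1[j][x[j]]))
    return logq
```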

Page 5

The Chow-Liu Algorithm

Contribution

Starting from data: learning rather than approximation
  distribution $P_{1,\dots,N}$
  data $x^n = \{(x_i^{(1)},\dots,x_i^{(N)})\}_{i=1}^n$

In any database, some fields are discrete and others continuous.

Joe Suzuki: A Construction of Bayesian Networks from Databases Based on an MDL Principle, UAI 1993
David Edwards et al.: Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests, BMC Bioinformatics 2010
Joe Suzuki: Learning Bayesian network structures when discrete and continuous variables are present, PGM 2014

Page 6

The Chow-Liu Algorithm

Maximum Likelihood (ML)

$\{P_i(x^{(i)})\}_{i \in V}$ and $\{P_{i,j}(x^{(i)}, x^{(j)})\}_{i \ne j}$ are estimated from $x^n$.

ML estimate of MI:

$$I(i,j) := \sum_{x^{(i)}} \sum_{x^{(j)}} P_{i,j}(x^{(i)}, x^{(j)}) \log \frac{P_{i,j}(x^{(i)}, x^{(j)})}{P_i(x^{(i)})\, P_j(x^{(j)})}$$

Empirical entropy given $E$ (minus the log-likelihood given $E$):

$$H^n(x^n \mid E) := n \sum_{i \in V} H(i) - n \sum_{\{i,j\} \in E} I(i,j)$$

ML seeks a tree even if $X^{(1)},\dots,X^{(N)}$ are independent: the true graph is not obtained even as $n \to \infty$.
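A quick numerical illustration of the last point (illustrative only, with an arbitrary seed): even when two binary variables are generated independently, the plug-in estimate of I(i, j) is almost always strictly positive, so the ML criterion still prefers to connect them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.integers(0, 2, n)            # X and Y are independent fair coins
y = rng.integers(0, 2, n)

# plug-in mutual information from the 2x2 contingency table
pxy = np.zeros((2, 2))
for a, b in zip(x, y):
    pxy[a, b] += 1.0 / n
px, py = pxy.sum(1), pxy.sum(0)
I = sum(pxy[a, b] * np.log(pxy[a, b] / (px[a] * py[b]))
        for a in range(2) for b in range(2) if pxy[a, b] > 0)
print(I)                             # small but > 0 with probability close to one
```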

Page 7

The Chow-Liu Algorithm

Prior Distribution over Forests (V, E)

$p_{ij}$: the prior probability of $X^{(i)} \perp\!\!\!\perp X^{(j)}$

$$\pi(E) := \frac{1}{K} \prod_{\{i,j\} \in E} \frac{1 - p_{ij}}{p_{ij}}, \qquad K := \sum_{E} \prod_{\{i,j\} \in E} \frac{1 - p_{ij}}{p_{ij}}$$

Page 8

The Chow-Liu Algorithm

Minimum Description Length (Suzuki, UAI 1993)

$$R(i) = \int P(\{x_k^{(i)}\}_{k=1}^n \mid \theta)\, w(\theta)\, d\theta, \qquad R(i,j) = \int P(\{x_k^{(i)}, x_k^{(j)}\}_{k=1}^n \mid \theta)\, w(\theta)\, d\theta$$

$$R^n(x^n \mid E) := \prod_{\{i,j\} \in E} \frac{R(i,j)}{R(i)\, R(j)} \prod_{i \in V} R(i)$$

$L(x^n \mid E) := -\log R^n(x^n \mid E)$

Description length:

$$\ell(x^n) = -\log \pi(E) + L(x^n \mid E) \to \min$$

Bayesian estimate of MI:

$$J(i,j) := \frac{1}{n} \log \frac{R(i,j)}{R(i)\, R(j)}$$
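For discrete variables, one concrete choice of the weight w(θ) is Dirichlet(1/2), which gives R the Krichevsky-Trofimov product form that also appears on page 20. A sketch under that assumption (log_kt and bayes_mi are hypothetical helper names, not necessarily the prior used in the talk); the prior term from page 22 is included with a default p_ij = 0.5, which makes it vanish.

```python
import math
from collections import Counter

def log_kt(seq, alphabet_size):
    """log of the Krichevsky-Trofimov (Dirichlet(1/2)) marginal likelihood,
    R = prod_t (c_t(a_t) + 1/2) / (t - 1 + |A|/2), computed sequentially."""
    counts, logr = Counter(), 0.0
    for t, a in enumerate(seq, start=1):
        logr += math.log((counts[a] + 0.5) / (t - 1 + alphabet_size / 2))
        counts[a] += 1
    return logr

def bayes_mi(x, y, alpha_x, alpha_y, p_ij=0.5):
    """J(i, j) = (1/n) log R(i, j) / (R(i) R(j)), minus the prior term
    (1/n) log (1 - p_ij)/p_ij used on page 22 (zero when p_ij = 0.5)."""
    n = len(x)
    lr_xy = log_kt(list(zip(x, y)), alpha_x * alpha_y)
    return (lr_xy - log_kt(x, alpha_x) - log_kt(y, alpha_y)) / n \
           - math.log((1 - p_ij) / p_ij) / n
```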

Page 9

The Chow-Liu Algorithm

Expanding via an asymptotic approximation, we find:

$k(E)$: number of parameters in $E$
$\alpha(i)$: number of values $X^{(i)}$ takes

$$L(x^n \mid E) \approx H^n(x^n \mid E) + \frac{1}{2} k(E) \log n$$

$$\ell(x^n) \approx H^n(x^n \mid E) + \frac{1}{2} k(E) \log n - \log \pi(E)$$

$$J(i,j) \approx I(i,j) - \frac{1}{2n} (\alpha(i) - 1)(\alpha(j) - 1) \log n - \frac{1}{n} \log \frac{1 - p_{ij}}{p_{ij}}$$

The orders in which the edges are chosen differ.

$J(i,j)$ can be negative, which yields a forest, whereas $I(i,j)$ always yields a tree.

Page 10

The Chow-Liu Algorithm

Universality

Universal measure w.r.t. a finite set $A$:

There exists $R^n$ such that

$$\frac{1}{n} \log \frac{P^n(x^n)}{R^n(x^n)} \to 0$$

($x^n \in A^n$) with $P^n$-probability one as $n \to \infty$, for any $P^n$.

$$P(i) = \prod_{k=1}^{n} P(x_k^{(i)}), \qquad P(i,j) = \prod_{k=1}^{n} P(x_k^{(i)}, x_k^{(j)})$$

$$\frac{1}{n} \log \frac{P(i)}{R(i)} \to 0, \qquad \frac{1}{n} \log \frac{P(i,j)}{R(i,j)} \to 0$$

Page 11

The Chow-Liu Algorithm

Consistency

$$Q^n(x^n \mid E) := \prod_{\{i,j\} \in E} \frac{P(i,j)}{P(i)\, P(j)} \prod_{i \in V} P(i)$$

With probability one as $n \to \infty$, for any $Q^n(\cdot \mid E)$,

$$\frac{1}{n} \log \frac{Q^n(x^n \mid E)}{R^n(x^n \mid E)} \to 0$$

For large $n$,

$$\pi(E_1)\, Q(x^n \mid E_1) \le \pi(E_2)\, Q(x^n \mid E_2) \iff \pi(E_1)\, R(x^n \mid E_1) \le \pi(E_2)\, R(x^n \mid E_2)$$

A maximum posterior probability forest is obtained for large $n$.

Page 12

The Chow-Liu Algorithm

ML vs MDL

                    ML                        MDL
Choice of E         Minimize H^n(x^n|E)       Minimize H^n(x^n|E) + (1/2) k(E) log n - log π(E)
Choice of {i,j}     Maximize I(i,j)           Maximize J(i,j)
Criterion           Fitness of x^n to E       Fitness of x^n to E and simplicity of E
Consistency         No                        Yes
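The MDL column can be turned into code by a small change to the Chow-Liu sketch after page 2: score edges by J(i, j) instead of I(i, j) and never add edges whose score is not positive, so the output may be a forest rather than a spanning tree. A sketch (mdl_forest is a hypothetical name; weight is any edge score):

```python
from itertools import combinations

def mdl_forest(data, weight):
    """Greedy forest selection: like the Chow-Liu sketch, but edges are scored
    by weight(i, j) and non-positive scores are never added."""
    N = data.shape[1]
    parent = list(range(N))                      # union-find for loop detection
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    scores = sorted(((weight(i, j), i, j) for i, j in combinations(range(N), 2)),
                    reverse=True)
    E = []
    for w, i, j in scores:
        if w <= 0:                               # J(i, j) <= 0: independence is cheaper to describe
            break                                # remaining candidates score even lower
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            E.append((i, j))
    return E
```

For example, weight = lambda i, j: bayes_mi(data[:, i], data[:, j], alphas[i], alphas[j]) with the bayes_mi sketch from page 8 gives the MDL column, while weight = mutual_information and no early break gives the ML column.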

Page 13

When Density Exists

When a density f exists for X (Ryabko, 2009)

$A_0 := \{A\}$; $A_{j+1}$ is a refinement of $A_j$.

For each $j$, $x^n = (x_1,\dots,x_n) \in \mathbb{R}^n \mapsto (a_1^{(j)},\dots,a_n^{(j)}) \in A_j^n$

[Figure: a nested sequence of interval partitions $A_1, A_2, \dots, A_j$, each refining the previous one.]

$$g_j^n(x^n) = \frac{R_j^n(a_1^{(j)},\dots,a_n^{(j)})}{\lambda(a_1^{(j)}) \cdots \lambda(a_n^{(j)})}, \qquad j = 1, 2, \dots$$

$\lambda$: Lebesgue measure (width of an interval); $R_j^n$: universal measure w.r.t. $A_j$

Page 14

When Density Exists

$\sum_j w_j = 1$, $w_j > 0$

$$g^n(x^n) := \sum_{j=1}^{\infty} w_j\, g_j^n(x^n)$$

$f$: density function
$f_j$: density function at level $j$
$f^n(x^n) := f(x_1) \cdots f(x_n)$

Ryabko 2009: for any $f$ such that $D(f \| f_j) \to 0$ ($j \to \infty$),

$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0$$

as $n \to \infty$.
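Why the weighted mixture inherits universality (a sketch of the standard argument; the details in Ryabko's paper may differ): since $g^n \ge w_j\, g_j^n$ for every level $j$, and $f_j^n(x^n) = P_j^n(a_1^{(j)},\dots,a_n^{(j)}) / \prod_i \lambda(a_i^{(j)})$ with $P_j$ the probability that $f$ assigns to each cell of $A_j$,

$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \le \frac{1}{n} \log \frac{1}{w_j} + \frac{1}{n} \log \frac{f^n(x^n)}{f_j^n(x^n)} + \frac{1}{n} \log \frac{P_j^n(a_1^{(j)},\dots,a_n^{(j)})}{R_j^n(a_1^{(j)},\dots,a_n^{(j)})}.$$

The first term vanishes as $n \to \infty$, the second tends to $D(f \| f_j)$ almost surely by the strong law of large numbers, and the third vanishes by the universality of $R_j^n$ over the finite set $A_j$; choosing $j$ large then gives the theorem under $D(f \| f_j) \to 0$.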

Page 15

When Density Does Not Exist

Extensions of Ryabko 2009

Remove the assumption that a density exists.

Remove the restriction on the density class: "for any f s.t. D(f || f_j) → 0 (j → ∞)" → "for any f".

Page 16

When Density Does Not Exist

When no density exists for X (Suzuki 2011)

$B_1 := \{\{1\}, \{2, 3, \dots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \dots\}\}$
$\dots$
$B_k := \{\{1\}, \{2\}, \dots, \{k\}, \{k+1, k+2, \dots\}\}$
$\dots$

For each level $k$, $x^n = (x_1,\dots,x_n) \in \mathbb{N}^n \mapsto (b_1^{(k)},\dots,b_n^{(k)}) \in B_k^n$

$$\eta(\{k\}) = \frac{1}{k} - \frac{1}{k+1}$$

$$g_k^n(x^n) := \frac{R_k^n(b_1^{(k)},\dots,b_n^{(k)})}{\eta(b_1^{(k)}) \cdots \eta(b_n^{(k)})}$$

$$\sum_k \omega_k = 1, \quad \omega_k > 0, \qquad g^n(x^n) := \sum_{k=1}^{\infty} \omega_k\, g_k^n(x^n)$$

Page 17

When Density Does Not Exist

$D(f \| f_j) \to 0$ as $j \to \infty$ (1)

$$\int_{1/2}^{1} f(x)\, dx > 0$$

[Figure: partitions $C_0, C_1, C_2, C_3, \dots$ over the axis, with 0 and 1 marked.]

Page 18

When Density Does Not Exist

$D(f \| f_j) \to 0$ as $j \to \infty$ (2)

$$\int_{1}^{\infty} f(x)\, dx > 0$$

[Figure: partitions $C_0, C_1, C_2, C_3, \dots$ over the axis, with 0 and 1 marked.]

Page 19

When Density Does Not Exist

$D(f \| f_j) \to 0$ as $j \to \infty$

Universal Histogram Sequence $\{C_k\}_{k=0}^{\infty}$

[Figure: the universal histogram sequence $C_0, C_1, C_2, C_3, \dots$ over the real line, with $\mu$, $\sigma$, and $-\sigma$ marked on the axis.]

Suzuki 2013: for any (generalized) density $f$, as $n \to \infty$, with probability 1,

$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0$$

Page 20

When Density Does Not Exist

Computing $g^n(x^n)$

Input $x^n \in A^n$, output $g^n(x^n)$:
1. For each $k = 1,\dots,K$: $g_k^n(x^n) := 0$
2. For each $k = 1,\dots,K$ and each $a \in A_k$: $c_k(a) := 0$
3. For each $i = 1,\dots,n$ and each $k = 1,\dots,K$:
   1. find $a_i \in A_k$ from $x_i \in A$
   2. $g_k^n(x^n) := g_k^n(x^n) - \log\dfrac{c_k(a_i) + 1/2}{i - 1 + |A_k|/2} + \log \eta_X(a_i)$
   3. $c_k(a_i) := c_k(a_i) + 1$
4. $g^n(x^n) := \dfrac{1}{K} \sum_{k=1}^{K} g_k^n(x^n)$

Universal measure w.r.t. $A_k$:

$$R_k^n(x^n) = \prod_{i=1}^{n} \frac{c(a_i^{(k)}) + 1/2}{i - 1 + |A_k|/2}$$
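A minimal Python rendering of the procedure above, under several assumptions not stated on the slide: the data lie in A = [0, 1), level k uses 2^(k+1) equal-width bins (so η_X(a) is the bin width), the mixture weights are ω_k = 1/K, and the quantity accumulated is log g_k^n (rather than the description length), with the K levels combined by a log-sum-exp; the bin shapes and weights used in the talk may differ.

```python
import numpy as np

def log_g(x, K):
    """log g^n(x^n) for data x in [0, 1), mixing K dyadic quantization levels
    with weights 1/K; each level uses the KT (Dirichlet(1/2)) universal measure."""
    log_gk = np.zeros(K)                           # running log g_k^n(x^n)
    counts = [np.zeros(2 ** (k + 1)) for k in range(K)]
    for i, xi in enumerate(x, start=1):            # one sequential update per observation
        for k in range(K):
            m = 2 ** (k + 1)                       # number of bins at level k
            a = min(int(xi * m), m - 1)            # bin index of x_i at level k
            # KT predictive probability of the bin, divided by its width 1/m
            log_gk[k] += np.log((counts[k][a] + 0.5) / (i - 1 + m / 2)) - np.log(1.0 / m)
            counts[k][a] += 1
    # g^n = (1/K) * sum_k g_k^n, combined stably in the log domain
    return np.log(np.mean(np.exp(log_gk - log_gk.max()))) + log_gk.max()

# e.g. log_g(np.random.default_rng(1).uniform(size=500), K=8)
```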

Page 21

When Density Does Not Exist

Computation: $O(nN^2K)$

Computing $g^n(x^n)$ and $g^n(x^n, y^n)$: $O(nN^2K)$ ($O(nN^2)$ in the discrete case)

Proportional to $n$ and to $N + N(N-1)/2$

$a_i^{(1)} \mapsto a_i^{(2)} \mapsto \cdots \mapsto a_i^{(K)}$: binary search

Proportional to $K$

$g^n(x^n, y^n)$ can be obtained from $\sum_{k=1}^{K} \omega_k\, g_{k,k}^n(x^n, y^n)$ rather than $\sum_{j=1}^{J} \sum_{k=1}^{K} \omega_{jk}\, g_{j,k}^n(x^n, y^n)$.

Computing MI and finding the forest: $N(N-1)/2$ pairs

Page 22

When Density Does Not Exist

Bayesian Estimator of Mutual Information

$$J(i,j) = \frac{1}{n} \log \frac{g^n(i,j)}{g^n(i)\, g^n(j)} - \frac{1}{n} \log \frac{1 - p_{ij}}{p_{ij}}$$

Pairwise estimates for the variables of the juul2 data set (see the next page); the upper triangle is shown.

            age         height      menarche    sex         igf1        tanner      testvol     weight
age         NA          0.7627465   0.8521553   0.01010264  0.5138440   0.52534862  0.1997714   0.6091554
height      NA          NA          0.6706380   0.26225428  0.4132932   0.68547041  0.3105466   0.9269808
menarche    NA          NA          NA          0.68786102  0.4919746   0.84283639  0.0000000   0.6456718
sex         NA          NA          NA          NA          0.2778511   0.08923994  0.1083901   0.1925525
igf1        NA          NA          NA          NA          NA          0.47529101  0.2272998   0.3722551
tanner      NA          NA          NA          NA          NA          NA          0.3796768   0.6420483
testvol     NA          NA          NA          NA          NA          NA          NA          0.2409487
weight      NA          NA          NA          NA          NA          NA          NA          NA

Page 23

When Density Does Not Exist

R ISwR package: juul2

The juul data frame has 1339 rows and 6 columns. It contains a reference sample of the distribution of insulin-like growth factor (IGF-I), one observation per subject in various ages, with the bulk of the data collected in connection with school physical examinations.

[Figure: the estimated forest over the variables weight, height, sex, age, tanner, igf1, menarche, and testvol.]

Page 24

When Density Does Not Exist

Experiments

n             100     500     1000     2000
J^n(i,j)      0.90    0.99    1.86     3.15
HSIC          0.50    9.51    40.28    185.53

(a) N = 4

n                          100      500       1000      2000
perfectly matching rate    0.52     0.60      0.72      0.79
K-L divergence loss        0.0169   0.00303   0.00152   0.000405
execution time (sec)       1.64     12.71     22.45     51.24

(b) N = 4

n                          100      500       1000      2000
perfectly matching rate    0.18     0.31      0.38      0.59
K-L divergence loss        0.0652   0.00800   0.00575   0.00298
execution time (sec)       4.27     24.44     52.5      116.1

Page 25

When Density Does Not Exist

Experiments

data.frame         n     N   discrete/continuous           time (sec)
airquality         153   6   (d,d,c,d,d,d)                 10.47
anscombe           51    4   (d,c,c,d)                     3.32
attenu             182   5   (d,c,d,c,c)                   9.64
attitude           30    7   (d,d,d,d,d,d,d)               4.26
beaver1            114   4   (d,d,c,d)                     2.54
beaver2            100   4   (d,d,c,d)                     2.73
BOD                6     2   (d,c)                         0.11
cars               50    2   (d,d)                         0.80
ChickWeight        578   4   (d,d,d,d)                     13.01
chickwts           71    2   (d,d)                         0.98
CO2                84    5   (d,d,d,d,c)                   3.33
DNase              176   3   (d,c,c)                       2.36
esoph              88    5   (d,d,d,d,d)                   2.12
faithful           272   2   (c,d)                         1.52
Formaldehyde       6     2   (c,c)                         0.18
freeny             39    5   (c,c,c,c,c)                   2.57
Indometh           66    3   (d,c,c)                       0.97
Infert             248   8   (d,d,d,d,d,d,d,d)             13.91
InsectSprays       72    2   (d,d)                         0.23
iris               150   5   (c,c,c,c,d)                   6.94
LifeCycleSavings   50    5   (c,c,c,c,c)                   3.1
Loblolly           84    3   (c,d,d)                       1.01
longley            16    7   (c,c,c,c,c,d,c)               2.26
morley             100   3   (d,d,d)                       1.21
mtcars             32    11  (c,c,c,c,c,c,c,c,c,c,c)       6.73
Orange             35    3   (d,d,d)                       0.5
OrchardSprays      64    4   (d,d,d,d)                     1.09
PlantGrowth        30    2   (c,d)                         0.16
pressure           19    2   (d,c)                         0.22
Puromycin          23    3   (c,d,d)                       0.34
quakes             1000  5   (c,c,c,c,d)                   56.12
sleep              20    3   (c,c,d)                       0.48
stackloss          21    4   (d,d,d,d)                     0.53
swiss              47    6   (c,c,d,d,c,c)                 4.18
Theoph             132   5   (d,c,c,c,c)                   6.94
ToothGrowth        60    4   (d,c,d,c)                     1.11
trees              31    3   (c,d,c)                       0.58
USArrests          50    4   (c,d,d,c)                     1.87
USJudgeRatings     43    12  (c,c,c,c,c,c,c,c,c,c,c,c)     13.66
warpbreaks         54    3   (d,d,d)                       0.27
women              15    2   (d,d)                         0.9

Page 26

Conclusion

Established Chow-Liu learning based on the MDL without assuming that the variables are either discrete or continuous.

Theoretical analysis w.r.t. n, N, K (K: quantization depth)

Realistic computation using R

Insights:
  The implementation is not hard.
  The computation is proportional to K.

Future work:
  Optimal K w.r.t. n, N
  Exponential memory w.r.t. K
  R package publication