Bayesian Networks
C. Learning / C.3 Structure Learning / PC Algorithm

Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of Hildesheim
http://www.ismll.uni-hildesheim.de

Course on Bayesian Networks, winter term 2016/2017




Syllabus

Mon. 24.10. (1)  0. Introduction
                 A. Fundamentals & A.1. Probability Calculus revisited
Mon. 31.10. (2)  A.2. Separation in Graphs
Mon. 7.11.  (3)  A.3. Bayesian and Markov Networks
Mon. 14.11. (4)  (ctd.)

B. Inference
Mon. 21.11. (5)  B.1. Exact Inference
Mon. 28.11. (6)  B.2. Exact Inference / Join Tree
Mon. 5.12.  (7)  (ctd.)
Mon. 12.12. (8)  B.3. Approximate Inference / Sampling
Mon. 19.12. (9)  B.4. Approximate Inference / Propagation

C. Learning
Mon. 9.1.  (10)  C.1. Parameter Learning
Mon. 16.1. (11)  C.2. Parameter Learning with Missing Values
Mon. 23.1. (12)  C.3. Structure Learning / PC Algorithm
Mon. 30.1. (13)  C.4. Structure Learning / Search Methods


1. PC Algorithm


Types of Methods for Structure Learning

There are three types of structure learning algorithms for Bayesian networks:

1. constraint-based learning (e.g., PC),
2. searching with a target function (e.g., K2),
3. hybrid methods (e.g., sparse candidate).


Computing the Skeleton

Lemma 1 (Edge Criterion). Let G := (V, E) be a DAG and X, Y ∈ V. Then the following are equivalent:

(i) X and Y cannot be separated by any Z, i.e.,

    ¬I_G(X, Y | Z)  for all Z ⊆ V \ {X, Y}

(ii) there is an edge between X and Y, i.e.,

    (X, Y) ∈ E or (Y, X) ∈ E

Definition 1. Any Z ⊆ V \ {X, Y} with I_G(X, Y | Z) is called a separator of X and Y:

    Sep(X, Y) := {Z ⊆ V \ {X, Y} | I_G(X, Y | Z)}


Computing the Skeleton / Separators

separators-basic(set of variables V, independency relation I) :
  allocate S : P₂(V) → P(V) ∪ {none}
  for {X, Y} ⊆ V do
    S({X, Y}) := none
    for T ⊆ V \ {X, Y} do
      if I(X, Y | T)
        S({X, Y}) := T
        break
      fi
    od
  od
  return S

Figure 1: Compute a separator for each pair of variables.
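As a concrete illustration, here is a minimal Python sketch of separators-basic. It is our own rendering, not the course's reference code: the independency relation is assumed to be an oracle I(X, Y, Z) -> bool (in practice, a conditional independence test on data), and the helper subsets enumerates candidate separators smallest first.

from itertools import chain, combinations

def subsets(s):
    # All subsets of s as tuples, smallest first.
    s = sorted(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def separators_basic(V, I):
    # For each pair {X, Y}, store the first separator found, else None.
    # I(X, Y, Z) is an assumed independence oracle: True iff X and Y are
    # independent given the set Z.
    S = {}
    for X, Y in combinations(sorted(V), 2):
        S[frozenset({X, Y})] = None
        for T in subsets(set(V) - {X, Y}):
            if I(X, Y, frozenset(T)):
                S[frozenset({X, Y})] = frozenset(T)
                break
    return S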


Example / 1/3 – Computing the Skeleton

Let I be the following independency structure:

I(A, D | C), I(A, D | {C, B}), I(B, D)

Then we can compute the following separators:

S(A, B) := none
S(A, C) := none
S(A, D) := {C}
S(B, C) := none
S(B, D) := ∅
S(C, D) := none

Thus, the skeleton of the Bayesian network representing I looks like:

[Figure: skeleton with edges A–B, A–C, B–C, C–D]
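Running the sketch above on this toy relation (with the oracle defined by exactly the three listed statements, plus symmetry) reproduces the separator table and hence the skeleton:

# The three statements of the example, used as a symmetric oracle.
facts = {
    ("A", "D", frozenset({"C"})),
    ("A", "D", frozenset({"B", "C"})),
    ("B", "D", frozenset()),
}

def I(X, Y, Z):
    return (X, Y, Z) in facts or (Y, X, Z) in facts

S = separators_basic({"A", "B", "C", "D"}, I)
# S[frozenset({"A", "D"})] == frozenset({"C"}),
# S[frozenset({"B", "D"})] == frozenset(),
# all other pairs map to None, so the skeleton keeps the
# edges A-B, A-C, B-C, C-D.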


Computing the V-structure

Lemma 2 (Uncoupled Head-to-Head Meeting Criterion). Let G := (V, E) be a DAG and X, Y, Z ∈ V such that X–Z–Y is an uncoupled meeting (X and Z adjacent, Z and Y adjacent, X and Y not adjacent). Then the following are equivalent:

(i) X → Z ← Y is an uncoupled head-to-head meeting, i.e.,

    (X, Z), (Y, Z) ∈ E and (X, Y), (Y, X) ∉ E

(ii) Z is not contained in any separator of X and Y, i.e.,

    Z ∉ S  for all S ∈ Sep(X, Y)

(iii) Z is not contained in at least one separator of X and Y, i.e.,

    Z ∉ S  for at least one S ∈ Sep(X, Y)


Computing Skeleton and V-structure

vstructure(set of variables V, independency relation I) :
  S := separators(V, I)
  G := (V, E) with E := {{X, Y} | S({X, Y}) = none}
  for X, Y, Z ∈ V with X–Z–Y and X, Y not adjacent do
    if Z ∉ S({X, Y})
      orient X–Z–Y as X → Z ← Y
    fi
  od
  return G

Figure 2: Compute skeleton and v-structure.

learn-structure-pc(set of variables V, independency relation I) :
  G := vstructure(V, I)
  saturate(G)
  return G

Figure 3: Learn the structure of a Bayesian network (SGS/PC algorithm, [SGS00]).
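A Python sketch of the same two steps, building on separators_basic from above. The pattern is returned as undirected edges plus oriented arcs; the final saturate step is left as an assumed callable, since the orientation rules are not defined on this slide.

from itertools import combinations

def vstructure(V, I):
    # Skeleton plus oriented uncoupled head-to-head meetings.
    S = separators_basic(V, I)                 # sketch from Figure 1
    skeleton = {p for p, sep in S.items() if sep is None}
    arcs = set()
    for Z in sorted(V):
        for X, Y in combinations(sorted(set(V) - {Z}), 2):
            uncoupled = (frozenset({X, Z}) in skeleton
                         and frozenset({Y, Z}) in skeleton
                         and frozenset({X, Y}) not in skeleton)
            if uncoupled and Z not in S[frozenset({X, Y})]:
                arcs |= {(X, Z), (Y, Z)}       # orient X -> Z <- Y
    edges = {e for e in skeleton
             if not any(set(a) == set(e) for a in arcs)}
    return edges, arcs

def learn_structure_pc(V, I, saturate):
    # SGS/PC: v-structures first, then the orientation rules.
    # saturate is assumed given; it is defined elsewhere in the course.
    edges, arcs = vstructure(V, I)
    saturate(edges, arcs)
    return edges, arcs

On the toy relation above, this orients B → C ← D and leaves A–B, A–C unoriented, matching the example on the next slides.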


Example / 2/3 – Computing the V-Structure

Separators:

S(A, B) := none
S(A, C) := none
S(A, D) := {C}
S(B, C) := none
S(B, D) := ∅
S(C, D) := none

Skeleton:

[Figure: skeleton with edges A–B, A–C, B–C, C–D]

Checking A–C–D: C ∈ S(A, D) = {C}, so the meeting stays unoriented.

[Figure: skeleton unchanged]

Checking B–C–D: C ∉ S(B, D) = ∅, so orient B → C ← D.

[Figure: skeleton with B → C and D → C oriented]


Example / 3/3 – Saturating

Skeleton and v-structure:

[Figure: edges A–B and A–C unoriented, B → C ← D oriented]

Saturating:

[Figure: rule 1 orients C → A (from D → C–A with D, A non-adjacent), then rule 2 orients B → A (from B → C → A with B–A present)]


Number of Independency Tests

Let there be n variables. For each of the C(n, 2) pairs of variables, there are 2^(n−2) candidates for possible separators:

    number of I-tests = C(n, 2) · 2^(n−2)

Example, n = 4:

    C(4, 2) · 2^2 = 6 · 4 = 24

If we start with small separators and stop once a separator has been found, we still have to check

    4 · (1 + 2 + 1) + 1 · (1 + 2) + 1 · 1 = 20

tests: each of the four pairs without a separator needs all 1 + 2 + 1 subsets, the pair {A, D} needs 1 + 2 tests before {C} is found, and {B, D} needs only 1.
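A two-line check of the worst-case count (a sketch, using Python's math.comb):

from math import comb

n = 4
print(comb(n, 2) * 2 ** (n - 2))    # -> 24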


Number of Independency Tests (ctd.)

Can we reduce the number of tests for a given pair of variables by reusing results for other pairs of variables?

Lemma 3. Let G := (V, E) be a DAG and X, Y ∈ V separated. Then

    I(X, Y | pa(X)) or I(X, Y | pa(Y))

As we do not know the directions of the edges at the skeleton recovery step, we use the weaker result

    I(X, Y | fan(X)) or I(X, Y | fan(Y))

where fan(X) denotes the set of neighbors of X.


Computing the Skeleton / Separators(1) separators-remove-edges(separator map S, skeleton graph G, independency relation I) :(2) i := 0(3) while ∃X ∈ V : | fanG(X)| > i do(4) for {X, Y } ∈ E with | fanG(X)| > i or | fanG(Y )| > i do(5) for T ∈ P i(fanG(X) \ {Y }) ∪ P i(fanG(Y ) \ {X}) do(6) if I(X, Y | T )(7) S({X, Y }) := T(8) E := E \ {{X, Y }}(9) break

(10) fi(11) od(12) od(13) i := i+ 1(14) od(15) return S

1 separators-interlaced(set of variables V, independency relation I) :2 Allocate S : P2(V ) → P(V ) ∪ {none}3 S({X, Y }) := none ∀{X, Y } ⊆ V4 G := (V,E) with E := P2(V )5 separators-remove-edges(S,G, I)6 return S

Figure 4: Compute a separator for each pair of variables.Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of HildesheimCourse on Bayesian Networks, winter term 2016/2017 11/31
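The interlaced variant in the same Python setting, again only a sketch: fan(E, X) computes the current neighbourhood, and the oracle I is as before.

from itertools import combinations

def fan(E, X):
    # Neighbours of X in the current skeleton E (a set of frozensets).
    return {Y for e in E if X in e for Y in e if Y != X}

def separators_interlaced(V, I):
    # Only conditioning sets of size i drawn from the current
    # neighbourhoods are tested; removing an edge immediately shrinks
    # the candidate sets of the remaining pairs.
    S = {frozenset(p): None for p in combinations(sorted(V), 2)}
    E = set(S)                            # start from the complete graph
    i = 0
    while any(len(fan(E, X)) > i for X in V):
        for XY in sorted(E, key=sorted):  # snapshot; E shrinks below
            X, Y = sorted(XY)
            if len(fan(E, X)) <= i and len(fan(E, Y)) <= i:
                continue
            for T in (set(combinations(sorted(fan(E, X) - {Y}), i))
                      | set(combinations(sorted(fan(E, Y) - {X}), i))):
                if I(X, Y, frozenset(T)):
                    S[XY] = frozenset(T)
                    E.discard(XY)
                    break
        i += 1
    return S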


Example / Computing the Separators (1/3)

I(A, D | C), I(A, D | {C, B}), I(B, D)

i = 0:
  A, B: T = ∅: —
  A, C: T = ∅: —
  A, D: T = ∅: —
  B, C: T = ∅: —
  B, D: T = ∅: S({B, D}) = ∅
  C, D: T = ∅: —

initial graph:

[Figure: complete graph on A, B, C, D]

after update for B, D:

[Figure: complete graph minus the edge B–D]


Example / Computing the Separators (2/3)

I(A, D | C), I(A, D | {C, B}), I(B, D)

i = 1:
  A, B: T = {C}, {D}: —
  A, C: T = {B}, {D}: —
  A, D: T = {B}, {C}: S({A, D}) = {C}
  B, C: T = {A}, {D}: —
  C, D: T = {A}, {B}: —

after update for B, D:

[Figure: complete graph minus the edge B–D]

after update for A, D:

[Figure: graph with edges A–B, A–C, B–C, C–D]


Example / Computing the Separators (3/3)

I(A, D | C), I(A, D | {C, B}), I(B, D)

i = 2:
  A, C: T = {B, D}: —
  B, C: T = {A, D}: —
  C, D: T = {A, B}: —

total: 19 I-tests.

after update for A, D:

[Figure: skeleton with edges A–B, A–C, B–C, C–D, unchanged]


Algorithms – SGS vs. PC

SGS/PC with separators-basic is called the SGS algorithm ([SGS00], 1990).

SGS/PC with separators-interlaced is called the PC algorithm ([SGS00], 1991).

Implementations are available:

• in Tetrad, http://www.phil.cmu.edu/projects/tetrad/ (class files & javadocs, no sources),
• in Hugin (commercial).


Another Example

Let I be the following independency structure:

    I(A, D | B), I(B, D)

PC computes the following DAG pattern:

[Figure: the single edge A–B, with D isolated]

But this is not even a representation of I, as it implies

    I_G(A, D | ∅)

which does not hold in I.


Representation and Faithfulness Tests

PC computes the DAG pattern of an independency relation

    if there exists one at all!

Remember: not every independency relation has a faithful DAG representation.

But how do we know whether an independency relation has a faithful DAG representation?


Representation and Faithfulness Tests (ctd.)

There is no easy way to decide whether a given independency relation has a faithful DAG representation.

So we just check ex post whether the DAG pattern we have found is a faithful representation.

Check:

1. compute all I_G(X, Y | Z) and check if I(X, Y | Z) (representation),
2. check for each I(X, Y | Z) if I_G(X, Y | Z) (faithfulness).

As there is no easy way to enumerate I_G(X, Y | Z) for a DAG pattern G directly, we draw a representative H and then enumerate I_H(X, Y | Z) (remember that I_H = I_G).


Draw a Representative of a DAG pattern

draw-representative(DAG pattern G) :
  saturate(G)
  while G has unoriented edges do
    draw an edge from G and orient it arbitrarily
    saturate(G)
  od
  return G

Figure 5: Draw a representative of a DAG pattern.
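In Python this is a short loop, sketched here with saturate assumed to be given as a callable that applies the orientation rules in place (it is not defined in this section):

def draw_representative(edges, arcs, saturate):
    # Turn a DAG pattern (undirected edges, directed arcs) into one DAG
    # representing it. saturate is assumed given: it applies the
    # orientation rules (1-4) in place.
    saturate(edges, arcs)
    while edges:                       # some edge is still unoriented
        X, Y = sorted(edges.pop())     # draw any unoriented edge ...
        arcs.add((X, Y))               # ... and orient it arbitrarily
        saturate(edges, arcs)
    return arcs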


Draw a Representative of a DAG pattern (ctd.)

Here, rule 4 is necessary in saturate.

Start with a DAG pattern on the variables X, Y, Z, W that is already saturated:

[Figure: the initial, already saturated DAG pattern]

Choose as next edge X–W and orient it as X → W. Then rule 4 has to be applied to saturate the resulting DAG pattern:

[Figure: the pattern after orienting X → W, where rule 4 applies]


Draw a Representative of a DAG pattern / Example

step 1a: saturate.

[Figure: pattern unchanged]

step 1b: orient any unoriented edge.

[Figure: the edge X–W oriented as X → W]

step 2a: saturate (rule 4 applies).

[Figure: pattern after rule 4]

step 2b: orient any unoriented edge.

[Figure: another edge oriented]

step 3a: saturate (rules 1 and 4 apply).

[Figure: the fully oriented DAG]

done.


Representation and Faithfulness Tests

If I = I_p is the probabilistic independency relation of a JPD p, then for check (1) it suffices to verify the Markov property, i.e.,

    for all X check if I_p(X, nondesc(X) \ pa(X) | pa(X))

It even suffices to check, for any one topological ordering σ = (X_1, ..., X_n), whether

    I_p(σ(i), σ({1, ..., i−1}) \ pa(σ(i)) | pa(σ(i)))  for i = 1, ..., n
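As a sketch, the ordered check in Python, with Ip assumed to be an oracle taking a variable, a set of variables, and a conditioning set:

def markov_check(order, pa, Ip):
    # order: topological ordering of the learned DAG,
    # pa:    dict mapping each variable to its parent set,
    # Ip:    assumed oracle, Ip(X, S, Z) true iff X is independent of
    #        the set of variables S given Z under the JPD p.
    for i, X in enumerate(order):
        rest = set(order[:i]) - pa[X]
        if rest and not Ip(X, frozenset(rest), frozenset(pa[X])):
            return False
    return True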


A Third Method to Compute Separators

separators-interlaced computes separators top-down by

• starting with the complete graph and then
• successively thinning the graph.

Therefore, we have to start checking with many candidates for possible separators.


A Third Method to Compute Separators (ctd.)

separators-add-edges(separator map S, skeleton graph G, independency relation I) :
  S({X, Y}) := argmin_{T ⊆ fan_G(X), T ⊆ fan_G(Y)} g(X, Y, T)  for all {X, Y} ∈ P₂(V)
  {X₀, Y₀} := argmax_{{X, Y} ∈ P₂(V)} g(X, Y, S({X, Y}))
  while ¬I(X₀, Y₀ | S({X₀, Y₀})) do
    E := E ∪ {{X₀, Y₀}}
    S({X₀, Y₀}) := none
    S({X, Y}) := argmin_{T ⊆ fan_G(X), T ⊆ fan_G(Y)} g(X, Y, T)  for all {X, Y} ∈ P₂(V) \ E with X ∈ {X₀, Y₀}
    {X₀, Y₀} := argmax_{{X, Y} ∈ P₂(V) \ E} g(X, Y, S({X, Y}))
  od
  return S

separators-bottomup(set of variables V, independency relation I) :
  allocate S : P₂(V) → P(V) ∪ {none}
  G := (V, E) with E := ∅
  separators-add-edges(S, G, I)
  separators-remove-edges(S, G, I)
  return S

Figure 6: Compute a separator for each pair of variables [TASB03]. Here g(X, Y, T) is a measure of the dependence of X and Y given T: each pair's most promising separator candidate is the T minimizing g, and edges are added for the most dependent pairs until all remaining pairs are separable.
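A compact Python sketch of the bottom-up strategy, reusing subsets and fan from the earlier sketches. g is a hypothetical dependence score of our choosing ([TASB03] use statistical association measures), and for brevity we re-score all open pairs instead of only those touching {X0, Y0}:

from itertools import combinations

def separators_bottomup(V, I, g):
    # Grow edges bottom-up; afterwards one would thin the result with
    # separators-remove-edges (omitted here). Larger g(X, Y, T) means
    # X and Y look more dependent given T.
    S = {frozenset(p): None for p in combinations(sorted(V), 2)}
    E = set()

    def best_separator(XY):
        X, Y = sorted(XY)
        cands = [frozenset(T) for T in subsets(fan(E, X) & fan(E, Y))]
        return min(cands, key=lambda T: g(X, Y, T))

    while True:
        open_pairs = [p for p in S if p not in E]
        if not open_pairs:
            break                            # graph is complete
        for p in open_pairs:
            S[p] = best_separator(p)
        X0Y0 = max(open_pairs, key=lambda p: g(*sorted(p), S[p]))
        X0, Y0 = sorted(X0Y0)
        if I(X0, Y0, S[X0Y0]):
            break        # the most dependent open pair separates: stop
        E.add(X0Y0)      # otherwise connect it
        S[X0Y0] = None
    return S, E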


Embedded Faithfulness

Let I be the following independency structure:

    I(A, D), I(A, E), I(A, E | B), I(A, E | D), I(B, E)

[Figure: a DAG over A, B, C, D, E]

Assume C to be hidden.

The d-separations between the variables A, B, D, and E are exactly:

    I(A, D), I(A, E), I(A, E | B), I(A, E | D), I(B, E)


Embedded Faithfulness

Definition 2. Let I be an independency relation on the variables V and G be a DAG with vertices W ⊇ V.

I is embedded in G if all independency statements entailed by G between variables from V hold in I:

    I_G(X, Y | Z) ⇒ I(X, Y | Z)  for all sets X, Y, Z ⊆ V

I is embedded faithfully in G if the independency statements entailed by G are exactly those in I:

    I_G(X, Y | Z) ⇔ I(X, Y | Z)  for all sets X, Y, Z ⊆ V

Many independency relations without a faithful DAG representation can be embedded faithfully.

But not every independency relation can be embedded faithfully.


Embedded Faithfulness / Example

Let I be the independency relation [Nea03, ex. 2.13, p. 103]

    I(X, Y), I(X, Y | Z)

Assume I can be embedded faithfully in a DAG G.

• As ¬I(X, Z), there must be a chain X ∼ Z without head-to-head meetings.
• As ¬I(Z, Y), there must be a chain Y ∼ Z without head-to-head meetings.

Now concatenate both chains to X ∼ Z ∼ Y:

• either X ∼→ Z ←∼ Y, and then the chain is not blocked by Z, i.e., not I(X, Y | Z),
• or not X ∼→ Z ←∼ Y, and then the chain is not blocked by ∅, i.e., not I(X, Y).

Either way we contradict I, so no faithful embedding exists.


There are variants of the PC algorithm for finding faithful embeddings of a given independency relation:

• the Causal Inference (CI) algorithm and
• the Fast Causal Inference (FCI) algorithm [SGS00].

See also [Nea03, ch. 10.2].


Outline of Part 1: Learning Bayesian Networks


Sections and their key concepts:

I. Probabilistic Independence and Separation in Graphs: probabilistic independence, separation in graphs, Bayesian network
II. Learning Parameters: maximum likelihood estimation, maximum a posteriori estimation
III. Learning Parameters with Missing Values: EM algorithm
IV. Learning Structure by Constraint-based Learning: Markov equivalence, graph patterns, PC algorithm
V. Learning Structure by Local Search: Bayesian Information Criterion (BIC), K2 algorithm, GES algorithm


References

[Nea03] Richard E. Neapolitan. Learning Bayesian Networks. Prentice Hall, 2003.

[SGS00] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000.

[TASB03] I. Tsamardinos, C. F. Aliferis, A. Statnikov, and L. E. Brown. Scaling-up Bayesian network learning to thousands of variables using local learning techniques. Technical Report DSL TR-03-02, Vanderbilt University, Nashville, TN, USA, March 2003.
