12735: Urban Systems Modeling, lec. 09: Bayesian Networks, applications (8 Apr)
TRANSCRIPT
12735: Urban Systems Modeling
Lec. 09: Bayesian Networks
instructor: Matteo Pozzi
[Figure: example Bayesian network with nodes x1 through x9]
outline
‐ example of applications
‐ how to shape a problem as a BN
‐ complexity of the inference problem
‐ inference via variable elimination
‐ inference via junction tree
‐ MCMC approximate inference
intro on Bayesian Networks

magnitude (M) -> seismic intensity (I) -> damage (D): discrete variables, N possible values for each var.

JOINT PROBABILITY, chain rule (product rule):
P(M, I, D) = P(M) P(I | M) P(D | I)
P(M): 1-d table; P(I | M): 2-d table; P(D | I): 2-d table.

Each variable is defined by a table with a number of dimensions equal to the number of parents plus one.
Random variables are nodes; links define conditional dependence/independence.
example of Bayesian network

[Figure: Bayesian network with nodes x1 through x9, labeled scenario, material, load, stiffness, strength, stress, demand, damage, loss]

Set of random variables, defined by conditional independence.
example of Bayesian network

[Same figure; the nodes with no parents are marked as roots]

Set of random variables, defined by conditional independence.
roots defined by their marginal probability: P(x_r)
example of Bayesian network

[Same figure, with a parent-child pair highlighted]

Set of random variables, defined by conditional independence.
roots defined by their marginal probability: P(x_r)
children defined by their conditional probability given the parents: P(x_c | pa(x_c))
example of Bayesian network

[Same figure]

Set of random variables, defined by conditional independence.
roots defined by their marginal probability: P(x_r)
children defined by their conditional probability given the parents: P(x_c | pa(x_c))
joint probability: P(x_1, ..., x_9) = prod_i P(x_i | pa(x_i))
task: prediction and conditional prediction
applications
‐ integrated risk analysis
‐ predicting global warming
‐ predicting effects of natural hazards
‐ road construction
‐ time models: degrading systems, e.g. due to fatigue (HMM)
‐ time models: vibration of structures (Kalman Filter)
example of 2 vars. BN

magnitude (M) -> seismic intensity (I): discrete variables, N possible values for each variable.

Joint probability P(M, I): N x N table: N^2 - 1 degrees of freedom (dofs).
Chain rule (product rule): P(M, I) = P(M) P(I | M)
P(M): 1-d table: N - 1 dofs; P(I | M): 2-d table: N (N - 1) dofs.
(N - 1) + N (N - 1) = N^2 - 1
normalization: for all m: sum_i P(i | m) = 1
fully connected, or complete graph.

if M and I are independent: P(M, I) = P(M) P(I): 2 (N - 1) dofs.
This reduced graph is less powerful than the complete one: it can represent only joint probabilities satisfying P(M, I) = P(M) P(I). However, inference is much easier for this graph.
Independence [from lec. 2]

dependent case:

P (X,Y )    Y 1    Y 2    Y 3    P (X )
X 1          2%     5%     3%     10%
X 2          3%    30%    27%     60%
X 3         15%    15%     0%     30%
P (Y )      20%    50%    30%    100%

P(X, Y) != P(X) P(Y): the joint prob. is "richer" than the set of marginal prob.

independent case:

P (X,Y )    Y 1    Y 2    Y 3    P (X )
X 1          2%     5%     3%     10%
X 2         12%    30%    18%     60%
X 3          6%    15%     9%     30%
P (Y )      20%    50%    30%    100%

P(X, Y) = P(X) P(Y): the joint prob. is no "richer" than the set of marginal prob.

[Figure: 3-d bar plots of P(X, Y) for the two cases]
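The two tables can be checked numerically. A minimal Python sketch (the helper name `is_independent` is illustrative), building the independent joint as the product of the slide's marginals and testing both cases:

```python
# Marginals from the slide: P(X) = (10%, 60%, 30%), P(Y) = (20%, 50%, 30%).
p_x = [0.10, 0.60, 0.30]
p_y = [0.20, 0.50, 0.30]

# Dependent joint from the first table (rows: X, columns: Y).
p_xy_dep = [[0.02, 0.05, 0.03],
            [0.03, 0.30, 0.27],
            [0.15, 0.15, 0.00]]

def is_independent(joint, px, py, tol=1e-9):
    """True if joint[i][j] == px[i] * py[j] in every cell."""
    return all(abs(joint[i][j] - px[i] * py[j]) <= tol
               for i in range(len(px)) for j in range(len(py)))

# The product of the marginals reproduces the second table on the slide.
p_xy_ind = [[px_i * py_j for py_j in p_y] for px_i in p_x]

print(is_independent(p_xy_dep, p_x, p_y))  # False: the joint is "richer"
print(is_independent(p_xy_ind, p_x, p_y))  # True: no extra information
```

Note that both joints have the same marginals; only the dependent one carries information beyond them.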
example of 3 vars. BN

magnitude (M) -> seismic intensity (I) -> damage (D): discrete variables, N possible values for each var.

Joint probability P(M, I, D): N x N x N table: N^3 - 1 dofs.
Chain rule (product rule): P(M, I, D) = P(M) P(I | M) P(D | M, I)
complete graph.

if D is independent of M given I: P(D | M, I) = P(D | I)
P(M, I, D) = P(M) P(I | M) P(D | I): (N - 1) + 2 N (N - 1) dofs.

After observing intensity I, any additional information on magnitude M is irrelevant for inferring the damage D: conditional independence.
chain graph for n vars

Complete graph: P(x_1, ..., x_n): N^n table: N^n - 1 dofs.

If, for all k, x_k is independent of {x_1, ..., x_{k-2}} given x_{k-1}:
P(x_k | x_1, ..., x_{k-1}) = P(x_k | x_{k-1})
P(x_1, ..., x_n) = P(x_1) prod_{k=2}^{n} P(x_k | x_{k-1}): (N - 1) + (n - 1) N (N - 1), i.e. approximately n N^2 dofs.
Chain graph: x1 -> x2 -> ... -> xn.

[Figure: number of dofs vs. n for the complete and chain graphs, N = 10, log scale]

the chain graph is less powerful, but much easier to handle.
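The two dof counts are easy to compare directly; a minimal sketch (function names are illustrative):

```python
def dofs_complete(n, N):
    """Free parameters of the full joint table over n variables with N states."""
    return N**n - 1

def dofs_chain(n, N):
    """Root: N-1 dofs; each of the (n-1) links P(x_k|x_{k-1}): N*(N-1) dofs."""
    return (N - 1) + (n - 1) * N * (N - 1)

# N = 10, as in the slide's plot: the gap grows exponentially with n.
print(dofs_complete(10, 10))  # 10**10 - 1 = 9999999999
print(dofs_chain(10, 10))     # 9 + 9*10*9 = 819
```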
prediction by variable elimination

M -> I -> D: magnitude, seismic intensity, damage.

build joint probability: P(M, I, D) = P(M) P(I | M) P(D | I)
you can derive everything from the joint prob.: P(D) = sum_M sum_I P(M, I, D)
derive marginal probability by marginalization, eliminating one variable at a time:
P(I) = sum_M P(M) P(I | M): a vector-matrix product, giving a 1-d table.
we can derive P(D) without handling any 3-d table: only handling 1-d and 2-d tables.
prediction by variable elimination [cont.]

M -> I -> D: magnitude, seismic intensity, damage.

build joint probability: P(M, I, D) = P(M) P(I | M) P(D | I)
derive marginal probability by marginalization:
eliminate M: P(I) = sum_M P(M) P(I | M): 1-d table.
prediction by variable elimination [cont.]

M -> I -> D: magnitude, seismic intensity, damage.

build joint probability: P(M, I, D) = P(M) P(I | M) P(D | I)
derive marginal probability by marginalization:
eliminate M: P(I) = sum_M P(M) P(I | M)
eliminate I: P(D) = sum_I P(I) P(D | I): 1-d table.
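The two elimination steps amount to vector-matrix products. A minimal sketch with illustrative CPTs (the numbers are made up; only the structure matches the slides, here with N = 2 states per variable):

```python
# Illustrative CPTs for a 2-state chain M -> I -> D (numbers are made up).
p_m = [0.7, 0.3]                      # P(M), 1-d table
p_i_given_m = [[0.9, 0.1],            # P(I|M): rows M, columns I
               [0.2, 0.8]]
p_d_given_i = [[0.95, 0.05],          # P(D|I): rows I, columns D
               [0.30, 0.70]]

def eliminate(prior, cpt):
    """Vector-matrix product: sum_x prior(x) * cpt(y|x), a marginal over y."""
    return [sum(prior[x] * cpt[x][y] for x in range(len(prior)))
            for y in range(len(cpt[0]))]

p_i = eliminate(p_m, p_i_given_m)     # eliminate M: P(I)
p_d = eliminate(p_i, p_d_given_i)     # eliminate I: P(D)
print(p_i)   # close to [0.69, 0.31]
print(p_d)   # no 3-d table was ever built
```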
inference by variable elimination

M -> I -> D: magnitude, seismic intensity, damage.
task: infer M after observing damage D = d.

build joint probability: P(M, I, D) = P(M) P(I | M) P(D | I)
you can derive everything from the joint prob.: P(M | d) = sum_I P(M, I, d) / P(d)
derive conditional probability by marginalization and normalization:
eliminate I: phi(M) = sum_I P(I | M) P(d | I): 1-d table.
normalization: P(M | d) = P(M) phi(M) / P(d), with P(d) = sum_M P(M) phi(M).
inference by variable elimination [cont.]

M -> I -> D: magnitude, seismic intensity, damage.

build joint probability: P(M, I, D) = P(M) P(I | M) P(D | I)
derive conditional probability by marginalization and normalization:
P(M | d) = sum_I P(M, I, d) / P(d)
eliminate I, then normalize over M: again, only 1-d and 2-d tables are handled.
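The eliminate-then-normalize scheme can be sketched the same way; the CPT numbers below are again made up for illustration:

```python
# Illustrative 2-state chain M -> I -> D (numbers made up).
p_m = [0.7, 0.3]
p_i_given_m = [[0.9, 0.1], [0.2, 0.8]]
p_d_given_i = [[0.95, 0.05], [0.30, 0.70]]

d_obs = 1  # observed damage state

# Eliminate I: phi(M) = sum_I P(I|M) P(d_obs|I) -- only 1-d/2-d tables used.
phi_m = [sum(p_i_given_m[m][i] * p_d_given_i[i][d_obs]
             for i in range(2)) for m in range(2)]

# Unnormalized posterior P(M, d_obs) = P(M) phi(M); normalize by P(d_obs).
unnorm = [p_m[m] * phi_m[m] for m in range(2)]
p_d_obs = sum(unnorm)
p_m_given_d = [u / p_d_obs for u in unnorm]
print(p_m_given_d)  # posterior over M given the observed damage
```

With these numbers, observing the high damage state shifts the posterior toward the high magnitude state, as expected.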
best order of elimination

[Figure: subgraph with load (x3), stiffness (x4) and strength (x5) feeding stress, and stress feeding damage (x8); eliminating stress first creates a 4-d table relating damage to {load, stiffness, strength}]

The efficiency of the algorithm depends on the order in which variables are eliminated. By selecting an inappropriate order, you may increase the dimension of the Conditional Probability Tables (CPTs).
E.g., for predicting damage, it is not efficient to eliminate stress first: this relates damage to {load, stiffness, strength}, i.e. a 4-d table.
branching graph

task: modeling
I -> D1, I -> D2: seismic intensity, damage on building 1, damage on building 2.

build joint probability: P(I, D1, D2) = P(I) P(D1 | I) P(D2 | I): no 3-d table is used.

prediction: P(D1) = sum_I P(I) P(D1 | I): 1-d table; D2 is irrelevant.
after observing D2 = d2: P(D1 | d2) is proportional to sum_I P(I) P(D1 | I) P(d2 | I)
D1 and D2 are NOT independent while I is not fixed.
after observing I = i and D2 = d2: P(D1 | i, d2) = P(D1 | i); d2 is irrelevant after observing I.
conditional independence: D1 and D2 are independent given I.
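The marginal dependence and conditional independence of D1 and D2 can be verified numerically on a tiny branching graph; all numbers below are made up for illustration:

```python
# Illustrative branching graph I -> {D1, D2}, all binary (numbers made up).
p_i = [0.8, 0.2]                        # P(I)
p_d_given_i = [[0.9, 0.1], [0.3, 0.7]]  # P(Dk|I), same CPT for both buildings

# Full joint P(I, D1, D2) = P(I) P(D1|I) P(D2|I)
joint = [[[p_i[i] * p_d_given_i[i][a] * p_d_given_i[i][b]
           for b in range(2)] for a in range(2)] for i in range(2)]

# Marginal joint P(D1, D2): D1 and D2 are NOT independent while I is unknown.
p_d1d2 = [[sum(joint[i][a][b] for i in range(2))
           for b in range(2)] for a in range(2)]
p_d1 = [sum(p_d1d2[a]) for a in range(2)]
p_d2 = [sum(p_d1d2[a][b] for a in range(2)) for b in range(2)]
print(p_d1d2[1][1] - p_d1[1] * p_d2[1])  # nonzero: marginally dependent

# Conditionally on I they are independent: P(D1 | I=i, D2=d2) = P(D1 | I=i).
i = 0
p_d1_given_i_d2 = joint[i][1][1] / sum(joint[i][a][1] for a in range(2))
print(p_d1_given_i_d2, p_d_given_i[i][1])  # equal, as the slide states
```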
V graph

task: modeling
L1 -> D <- L2: load 1, load 2, damage.

build joint probability: P(L1, L2, D) = P(L1) P(L2) P(D | L1, L2)
L1 and L2 are independent: P(L1, L2) = P(L1) P(L2).

prediction of L1: P(L1); D and L2 are irrelevant, as L1 and L2 are independent.
prediction of D: P(D) = sum_L1 sum_L2 P(L1) P(L2) P(D | L1, L2)
after observing D = d: P(L1 | d) is proportional to sum_L2 P(L1) P(L2) P(d | L1, L2)
V graph [cont.]

task: modeling
L1 -> D <- L2: load 1, load 2, damage.

build joint probability: P(L1, L2, D) = P(L1) P(L2) P(D | L1, L2)

after observing D = d: P(L1 | d) is proportional to P(L1) sum_L2 P(L2) P(d | L1, L2)
knowledge of L1 is used for building the likelihood of the observation d.
after observing D = d and L2 = l2: P(L1 | d, l2) is proportional to P(L1) P(d | L1, l2)
conditionally to (having observed) D, variables L1 and L2 are NOT independent.
this is an example of INDUCED DEPENDENCE (induced correlation).
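Induced dependence ("explaining away") is easy to see on a tiny V graph; the CPT numbers and helper name below are made up for illustration:

```python
# Illustrative V graph {L1, L2} -> D, all binary (numbers made up).
# Damage (D = 1) is likely when either load is high.
p_l1 = [0.9, 0.1]
p_l2 = [0.9, 0.1]
# P(D=1 | L1, L2):
p_d1_given_l = {(0, 0): 0.01, (0, 1): 0.80, (1, 0): 0.80, (1, 1): 0.95}

def posterior_l1(d=1, l2=None):
    """P(L1=1 | D=d, [L2=l2]) by enumeration over the small joint."""
    def weight(l1, l2v):
        q = p_d1_given_l[(l1, l2v)]
        return p_l1[l1] * p_l2[l2v] * (q if d == 1 else 1 - q)
    l2_vals = [l2] if l2 is not None else [0, 1]
    num = sum(weight(1, v) for v in l2_vals)
    den = sum(weight(l1, v) for l1 in (0, 1) for v in l2_vals)
    return num / den

# L1 and L2 are marginally independent; after observing D = 1 they are not:
print(posterior_l1(d=1))        # P(L1=1 | d): raised by the damage
print(posterior_l1(d=1, l2=1))  # drops: L2 = 1 already "explains" the damage
```

The second query is lower than the first: once a high second load is observed, it accounts for the damage and the belief in a high first load falls back.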
inference via variable elimination and junction tree

M -> I -> D: magnitude, seismic intensity, damage.
target: the marginals P(M), P(I), P(D).

method:
‐ eliminate M to get P(I): P(I) = sum_M P(M, I)
‐ eliminate I to get P(D): P(D) = sum_I P(I, D)

The variables to be eliminated depend on the specific query. If we are interested in more than one query, we may repeat some operations across queries.

cliques {M, I} and {I, D}; separator {I}.
The Junction Tree is an algorithm to answer all possible queries without repeating operations.
HHM revised
23
Sk Sk+1
yk yk+1
S1
y1
Sn
yn
S0 … … task:compute
:
eliminate , process , eliminate , process , … , eliminate , process .(eliminate) (eliminate) (eliminate)
The prediction‐correction algorithm is an application of a best elimination order.
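The prediction-correction pass can be sketched for a tiny 2-state HMM; the transition and emission numbers are made up, only the structure follows the slide:

```python
# Prediction-correction (forward) pass for a tiny 2-state HMM.
# Numbers are illustrative; only the elimination order matches the slide.
p_s0 = [0.5, 0.5]                       # P(S0)
trans = [[0.9, 0.1], [0.2, 0.8]]        # P(S_k | S_{k-1})
emit = [[0.8, 0.2], [0.3, 0.7]]         # P(y_k | S_k)

def forward(obs):
    """Eliminate S_{k-1} (predict), process y_k (correct), for k = 1..n."""
    belief = p_s0
    for y in obs:
        # predict: eliminate the previous state
        pred = [sum(belief[i] * trans[i][j] for i in range(2))
                for j in range(2)]
        # correct: multiply by the likelihood of y, then normalize
        unnorm = [pred[j] * emit[j][y] for j in range(2)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
    return belief

print(forward([0, 0, 1]))  # P(S_n | y_{1:n}), never building a big table
```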
conditions for exact inference

‐ Discrete variables: exact inference is possible, up to the "curse of dimensionality".
‐ Continuous variables: integrals instead of sums. Generally the integrals cannot be solved in closed form, but they can be solved for Gaussian Linear Models (GLMs).

[Figure: Bayesian network with nodes x1 through x7]

condition for a GLM: if vector pa(x_i) lists all parents of x_i, then x_i is a linear function of its parents plus Gaussian noise.

‐ Other problems can also be mapped into a GLM. For example, log-normal models can be mapped into GLMs by taking the log.
‐ Hybrid graphs have also been proposed, mixing discrete and continuous variables by imposing some rules.
‐ A GLM can be seen as a special case of a Gaussian process, with special independence relations (while Gaussian processes are complete graphs).

GLMs are used for dynamic systems (Kalman filters).
approximate inference

‐ MC: sequential sampling. We start by sampling the roots from their marginals, then each other variable conditional on its (sampled) parents. After observing any variable, we can reject the samples that are not compatible with the observations, or use importance sampling.
‐ MCMC: Gibbs sampling. We repeatedly sample each variable conditional on the other variables in its Markov blanket (kept fixed). It is an application of the Metropolis algorithm with a special proposal distribution.

[Figure: Markov blanket; Gibbs sampling]

Russell, S. and P. Norvig (2010). Artificial Intelligence: A Modern Approach. Pearson Education.
Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge UP.
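A minimal Gibbs-sampling sketch on the earlier chain M -> I -> D with the damage observed; the CPT numbers are made up, and exact elimination on the same numbers gives a posterior P(M=1 | D=1) of about 0.68:

```python
import random

# Gibbs sampling on an illustrative chain M -> I -> D with D observed.
# CPT numbers are made up; only the sampling scheme follows the slide.
p_m = [0.7, 0.3]
p_i_given_m = [[0.9, 0.1], [0.2, 0.8]]
p_d_given_i = [[0.95, 0.05], [0.30, 0.70]]
d_obs = 1

def sample_bernoulli(p1):
    return 1 if random.random() < p1 else 0

def gibbs(n_samples, seed=0):
    random.seed(seed)
    m, i = 0, 0                  # arbitrary initial state
    count_m1 = 0
    for _ in range(n_samples):
        # resample M given its Markov blanket {I}: proportional to P(M) P(i|M)
        w = [p_m[v] * p_i_given_m[v][i] for v in (0, 1)]
        m = sample_bernoulli(w[1] / (w[0] + w[1]))
        # resample I given its blanket {M, D}: proportional to P(I|m) P(d|I)
        w = [p_i_given_m[m][v] * p_d_given_i[v][d_obs] for v in (0, 1)]
        i = sample_bernoulli(w[1] / (w[0] + w[1]))
        count_m1 += m
    return count_m1 / n_samples

print(gibbs(20000))  # Monte Carlo estimate of P(M=1 | D=1)
```

Each update only looks at the variable's Markov blanket, so the cost per sweep stays small even in large networks.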
summary

Inference and prediction in a Bayesian Network can be done in three steps:
i) compute the joint probability: P(x_1, ..., x_n) = prod_i P(x_i | pa(x_i))
ii) compute the conditional distribution given the observed variables x_obs: P(x_rest | x_obs) proportional to P(x_rest, x_obs)
iii) marginalize on the variables of interest x_q: P(x_q | x_obs) = sum over the remaining variables of P(x_rest | x_obs)

All exact and approximate methods are used to overcome the computational difficulties related to this approach.
HMM with dummy algorithm

S0 -> S1 -> ... -> Sn, with observations y_1, ..., y_n.
task: compute P(S_n | y_1, ..., y_n)

i) compute the joint probability: P(S_{0:n}, y_{1:n}) = P(S_0) prod_k P(S_k | S_{k-1}) P(y_k | S_k)
ii) compute the conditional distribution: P(S_{0:n} | y_{1:n}) proportional to P(S_{0:n}, y_{1:n})
iii) marginalize on the variable of interest: P(S_n | y_{1:n}) = sum over S_{0:n-1} of P(S_{0:n} | y_{1:n})

huge table/function: it is not an effective path.
references

Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge UP. Downloadable from http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage
Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.
Russell, S. and P. Norvig (2010). Artificial Intelligence: A Modern Approach. Pearson Education.