adaptive importance sampling for estimation in structured domains l.e. ortiz and l.p. kaelbling
TRANSCRIPT
Adaptive Importance Sampling for Estimation in Structured Domains
L.E. Ortiz and L.P. Kaelbling
2
Contents
Notations
Importance Sampling
Adaptive Importance Sampling
Empirical Results
3
Notations
Bayesian network (BN) and influence diagram (ID) (A: decision node, U: utility node)
4
Probabilities of interest (O: variables of interest, Z: remaining ones):
$$P(O=o) = \sum_{Z} P(Z, O=o)$$
Best strategy: the strategy with the highest expected utility, i.e., the action a maximizing the value associated with the evidence o (the observed values of the parents of A):
$$V_o(a) = \sum_{Z} P(Z, O=o \mid A=a)\, U(Z, O=o, A=a)$$
Importance sampling is used to estimate the above summations.
5
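Importance Sampling
Quantity of interest:
$$G = \sum_{Z} g(Z)$$
Importance sampling distribution f(Z):
$$G = \sum_{Z} f(Z)\, \frac{g(Z)}{f(Z)} = \sum_{Z} f(Z)\, w(Z), \qquad w = g/f$$
Estimation of G (sampling $z^{(1)}, \ldots, z^{(N)}$ from f and averaging their weights):
$$\hat{G} = \frac{1}{N} \sum_{l=1}^{N} w(z^{(l)})$$
Cf. estimation of $E(Z)$, $Z \sim g(Z)$.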
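The estimator on this slide can be illustrated with a minimal Python sketch. The table `g_table`, the uniform sampling distribution `uniform_f`, and the function name are hypothetical choices made only for illustration; they do not come from the slides.

```python
import random

def importance_sampling_estimate(g, f_probs, n_samples, rng):
    """Estimate G = sum_z g(z) by averaging w(z) = g(z) / f(z) over draws z ~ f."""
    values = list(f_probs)
    probs = [f_probs[z] for z in values]
    total = 0.0
    for _ in range(n_samples):
        z = rng.choices(values, weights=probs, k=1)[0]  # draw z from f
        total += g(z) / f_probs[z]                       # weight w(z) = g(z) / f(z)
    return total / n_samples

# Hypothetical toy target: an unnormalized table g over four states.
g_table = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.05}
uniform_f = {z: 0.25 for z in g_table}

rng = random.Random(0)
estimate = importance_sampling_estimate(g_table.get, uniform_f, 20000, rng)
exact = sum(g_table.values())  # the sum the estimator approximates
print(estimate, exact)
```

With a fixed seed the estimate lands close to the exact sum; how close depends on the variance of the weights under f.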
6
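BN: likelihood weighting
$$g(Z) = P(Z, O=o) = \prod_{i=1}^{n_1} P(Z_i \mid Pa(Z_i)) \prod_{j=1}^{n_2} P(O_j \mid Pa(O_j)) \Big|_{O=o}$$
$$f(Z) = \prod_{i=1}^{n_1} P(Z_i \mid Pa(Z_i)) \Big|_{O=o} \quad \text{(prior)}$$
$$w(Z) = \prod_{j=1}^{n_2} P(O_j \mid Pa(O_j)) \Big|_{O=o} \quad \text{(likelihood)}$$
ID:
$$g(Z) = P(Z, O=o \mid A=a)\, U(Z, O=o, A=a), \qquad V_o(a) = \sum_{Z} g(Z)$$
$$f(Z) = \prod_{i=1}^{n_1} P(Z_i \mid Pa(Z_i)) \Big|_{O=o,\, A=a}$$
$$w(Z) = \prod_{j=1}^{n_2} P(O_j \mid Pa(O_j))\, U(Z, O, A) \Big|_{O=o,\, A=a}$$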
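As a rough illustration of likelihood weighting in the BN case, the sketch below uses a hypothetical two-node network Z -> O (the probability tables are invented for the example). The unobserved variable is sampled from its prior and each sample is weighted by the likelihood of the evidence.

```python
import random

# Hypothetical two-node network: Z (unobserved) -> O (observed).
P_Z = {0: 0.7, 1: 0.3}                      # P(Z)
P_O_GIVEN_Z = {0: {0: 0.9, 1: 0.1},         # P(O | Z=0)
               1: {0: 0.2, 1: 0.8}}         # P(O | Z=1)

def likelihood_weighting(o, n_samples, rng):
    """Estimate P(O=o): sample Z from the prior f(Z) = P(Z), weight by w(Z) = P(O=o | Z)."""
    total = 0.0
    for _ in range(n_samples):
        z = 1 if rng.random() < P_Z[1] else 0   # draw from the prior
        total += P_O_GIVEN_Z[z][o]              # likelihood of the evidence
    return total / n_samples

rng = random.Random(1)
estimate = likelihood_weighting(o=1, n_samples=20000, rng=rng)
exact = sum(P_Z[z] * P_O_GIVEN_Z[z][1] for z in P_Z)  # exact sum over Z
print(estimate, exact)
```

In the ID case the same loop would also multiply each weight by the utility of the sampled configuration.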
7
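E.g.
$$g(Z) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1)\, P(X_6 \mid X_2, A=a)\, P(X_7 \mid X_3, X_6)\, P(X_4=x_4 \mid X_2)\, P(X_5=x_5 \mid X_2, X_3)\, U(X_7, A=a)$$
$$f(Z) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1)\, P(X_6 \mid X_2, A=a)\, P(X_7 \mid X_3, X_6)$$
$$w(Z) = P(X_4=x_4 \mid X_2)\, P(X_5=x_5 \mid X_2, X_3)\, U(X_7, A=a)$$
G can be estimated by sampling Z from f and averaging the weights w. Cf.
$$G = \sum_{x_1, x_2, x_3, x_6, x_7} P(X_1, X_2, X_3, X_6, X_7, X_4=x_4, X_5=x_5 \mid A=a)\, U(X_1, X_2, X_3, X_6, X_7, X_4=x_4, X_5=x_5, A=a)$$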
8
Variance of the weights:
$$Var(w(Z)) = \sum_{Z} f(Z)\, w(Z)^2 - G^2$$
Minimum variance importance sampling distribution (obtained by taking the derivative of the variance above):
$$f^*(Z) = g(Z) \Big/ \sum_{Z} g(Z)$$
The weights have zero variance in this case (w = G).
f(Z) must have a "fat tail":
$$Var(w(Z)) \to \infty \quad \text{as} \quad f(Z) \to 0$$
for at least one value of Z with g(Z) ≠ 0.
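The two claims on this slide (zero variance under the optimal distribution, exploding variance when f has a thin tail) can be checked exactly on a small table. The sketch below assumes a hypothetical non-negative `g` over four states; the names and numbers are illustrative only.

```python
def weight_variance(g, f):
    """Var(w(Z)) = sum_z f(z) w(z)^2 - G^2, with w = g / f, computed exactly."""
    G = sum(g.values())
    second_moment = sum(g[z] ** 2 / f[z] for z in g if g[z] != 0)  # f * (g/f)^2
    return second_moment - G ** 2

# Hypothetical non-negative target table.
g = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.05}
G = sum(g.values())

uniform = {z: 0.25 for z in g}
optimal = {z: g[z] / G for z in g}                  # f*(z) = g(z) / sum g
thin_tail = {0: 0.33, 1: 0.001, 2: 0.339, 3: 0.33}  # almost no mass where g is largest

var_uniform = weight_variance(g, uniform)
var_optimal = weight_variance(g, optimal)
var_thin = weight_variance(g, thin_tail)
print(var_uniform, var_optimal, var_thin)
```

Under `optimal` every weight equals G, so the variance vanishes up to rounding; under `thin_tail` the single state with tiny f dominates the second moment.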
9
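Adaptive Importance Sampling
Parameterizing the importance sampling distribution (tabular form), where j ranges over the configurations of $Pa(Z_i)$ and k over the values of $Z_i$:
$$f(Z \mid \theta) = \prod_{i=1}^{n} \prod_{j} \prod_{k} \theta_{ijk}^{\, I(Z_i = k,\, Pa(Z_i) = j \mid Z)}$$
Update rule based on gradient descent, with step size $\alpha(t)$:
$$\theta^{(t+1)} = \theta^{(t)} - \alpha(t)\, \nabla^{p} e(\theta^{(t)})$$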
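A minimal sketch of the tabular parameterization follows, assuming a hypothetical binary network Z1 -> Z2. The `gradient_step` helper clips and renormalizes each table row after the step; that is only one assumed way of keeping the rows valid probability distributions, not necessarily the scheme used by the authors.

```python
# Hypothetical network Z1 -> Z2 with binary variables.
# theta[var][j][k] plays the role of theta_ijk = f(Z_i = k | Pa(Z_i) = j).
theta = {
    "Z1": {(): [0.5, 0.5]},                       # root: single empty parent configuration
    "Z2": {(0,): [0.5, 0.5], (1,): [0.5, 0.5]},
}
PARENTS = {"Z1": (), "Z2": ("Z1",)}
ORDER = ["Z1", "Z2"]  # topological order

def density(z, theta):
    """f(z | theta): product of the table entries selected by the assignment z."""
    p = 1.0
    for var in ORDER:
        j = tuple(z[pa] for pa in PARENTS[var])
        p *= theta[var][j][z[var]]
    return p

def gradient_step(theta, grad, alpha):
    """theta <- theta - alpha * grad, then clip and renormalize each row (assumption)."""
    new_theta = {}
    for var, table in theta.items():
        new_theta[var] = {}
        for j, row in table.items():
            stepped = [max(t - alpha * g, 1e-6) for t, g in zip(row, grad[var][j])]
            total = sum(stepped)
            new_theta[var][j] = [t / total for t in stepped]
    return new_theta

# Illustrative gradient that favours Z1 = 0 and leaves Z2 untouched.
grad = {"Z1": {(): [-1.0, 1.0]}, "Z2": {(0,): [0.0, 0.0], (1,): [0.0, 0.0]}}
new_theta = gradient_step(theta, grad, alpha=0.1)
print(density({"Z1": 0, "Z2": 1}, theta), new_theta["Z1"][()])
```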
10
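Three different forms of gradient:
minimize the variance directly
minimize the distance between the current sampling distribution and the approximate optimal sampling distribution
minimize the distance between the current sampling distribution and the empirical optimal distribution
Gradient of the error function e:
$$\nabla e(\theta)_{ijk} = -\sum_{Z} f(Z \mid \theta)\, \frac{I(Z_i = k,\, Pa(Z_i) = j \mid Z)}{\theta_{ijk}}\, \varphi(Z, \theta) = -E_f\!\left[ \frac{I(Z_i = k,\, Pa(Z_i) = j \mid Z)}{\theta_{ijk}}\, \varphi(Z, \theta) \right]$$
Estimate at iteration t from N(t) samples:
$$\hat{\nabla} e(\theta^{(t)})_{ijk} = -\frac{1}{N(t)} \sum_{l=1}^{N(t)} \frac{I(Z_i = k,\, Pa(Z_i) = j \mid Z = z^{(t,l)})}{\theta^{(t)}_{ijk}}\, \varphi(z^{(t,l)}, \theta^{(t)})$$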
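The sample-based gradient estimate can be sketched for the simplest case of a single variable without parents, so that the indicator reduces to "sample l took value k". The function name and the example numbers are hypothetical.

```python
def estimate_gradient(samples, phi_values, theta_row):
    """grad[k] = -(1/N) * sum_l I(z_l == k) / theta_row[k] * phi_l for one table row.

    samples    : sampled values z_l (integer indices into theta_row)
    phi_values : phi(z_l, theta) evaluated for each sample
    theta_row  : current probabilities theta_k of the single variable
    """
    n = len(samples)
    grad = [0.0] * len(theta_row)
    for z, phi in zip(samples, phi_values):
        grad[z] -= phi / (theta_row[z] * n)
    return grad

# Illustrative values: samples with z = 1 carry larger phi than those with z = 0.
theta_row = [0.5, 0.5]
samples = [0, 0, 1, 1]
phi_values = [1.0, 1.0, 3.0, 3.0]
grad = estimate_gradient(samples, phi_values, theta_row)
print(grad)
```

Because the step is taken along the negative gradient, the entry with the more negative gradient (here k = 1) gains probability mass.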
11
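Minimizing variance, with $w(Z \mid \theta) = g(Z) / f(Z \mid \theta)$:
$$e_{Var}(\theta) = Var(w(Z \mid \theta)) = \sum_{Z} f(Z \mid \theta)\, w(Z \mid \theta)^2 - G^2$$
$$\varphi_{Var}(Z, \theta) = w(Z \mid \theta)^2$$
via approximate optimal distribution:
$$\hat{f}^{*(t)}(Z) = g(Z) / \hat{G}^{(t)}$$
$$e_{L_2}(\theta) = \frac{1}{2} \sum_{Z} \left( f^*(Z) - f(Z \mid \theta) \right)^2$$
$$\varphi_{L_2}(Z, \theta) = f^*(Z) - f(Z \mid \theta), \qquad \varphi_{L_2}(z^{(t,l)}, \theta^{(t)}) \approx f(z^{(t,l)} \mid \theta^{(t)}) \left( w(z^{(t,l)} \mid \theta^{(t)}) / \hat{G}^{(t)} - 1 \right)$$
$$e_{KL_1}(\theta) = \sum_{Z} f^*(Z) \log\!\left( f^*(Z) / f(Z \mid \theta) \right)$$
$$\varphi_{KL_1}(Z, \theta) = f^*(Z) / f(Z \mid \theta), \qquad \varphi_{KL_1}(z^{(t,l)}, \theta^{(t)}) \approx w(z^{(t,l)} \mid \theta^{(t)}) / \hat{G}^{(t)}$$
$$e_{KL_2}(\theta) = \sum_{Z} f(Z \mid \theta) \log\!\left( f(Z \mid \theta) / f^*(Z) \right)$$
$$\varphi_{KL_2}(Z, \theta) = \log\!\left( f^*(Z) / f(Z \mid \theta) \right) - 1, \qquad \varphi_{KL_2}(z^{(t,l)}, \theta^{(t)}) \approx \log\!\left( w(z^{(t,l)} \mid \theta^{(t)}) / \hat{G}^{(t)} \right) - 1$$
$$e_{KL_s} = \frac{1}{2} e_{KL_1} + \frac{1}{2} e_{KL_2}$$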
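The sample-level forms of the φ functions on this slide translate directly into small helpers. In the sketch below the argument names (`w` for the weight of a sample, `g_hat` for the current estimate of G, `f_z` for the sampling probability of the sample) are hypothetical, and `phi_kl2` assumes a strictly positive weight.

```python
import math

def phi_var(w):
    """Variance criterion: square of the weight."""
    return w ** 2

def phi_l2(w, g_hat, f_z):
    """L2 criterion: f(z | theta) * (w / G_hat - 1)."""
    return f_z * (w / g_hat - 1.0)

def phi_kl1(w, g_hat):
    """KL1 criterion: w / G_hat."""
    return w / g_hat

def phi_kl2(w, g_hat):
    """KL2 criterion: log(w / G_hat) - 1 (requires w > 0)."""
    return math.log(w / g_hat) - 1.0

# Illustrative sample whose weight is twice the current estimate of G.
print(phi_var(2.0), phi_l2(2.0, 1.0, 0.5), phi_kl1(2.0, 1.0), phi_kl2(2.0, 1.0))
```

The sign of `phi_l2` flips at w = G_hat, which is what lets the update raise the probability of under-sampled configurations and lower that of over-sampled ones.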
12
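via parameterized empirical distribution:
$$\hat{\theta}^{(t)}_{ijk} = \frac{\sum_{l=1}^{N(t)} I(Z_i = k,\, Pa(Z_i) = j \mid Z = z^{(t,l)})\, w(z^{(t,l)} \mid \theta^{(t)})}{\sum_{l=1}^{N(t)} I(Pa(Z_i) = j \mid Z = z^{(t,l)})\, w(z^{(t,l)} \mid \theta^{(t)})}$$
($\hat{\theta}^{(t)}_{ijk} = \theta^{(t)}_{ijk}$, if RHS = 0)
$$\nabla e(\theta)_{ijk} = -\varphi(\hat{\theta}_{ijk}, \theta_{ijk}), \qquad \hat{\nabla} e(\theta^{(t)})_{ijk} = -\varphi(\hat{\theta}^{(t)}_{ijk}, \theta^{(t)}_{ijk})$$
$$e_{L_2}(\theta) = \frac{1}{2} \sum_{i,j,k} (\hat{\theta}_{ijk} - \theta_{ijk})^2, \qquad \varphi_{L_2}(\hat{\theta}_{ijk}, \theta_{ijk}) = \hat{\theta}_{ijk} - \theta_{ijk}$$
$$e_{KL_1}(\theta) = \sum_{i,j,k} \hat{\theta}_{ijk} \log(\hat{\theta}_{ijk} / \theta_{ijk}), \qquad \varphi_{KL_1}(\hat{\theta}_{ijk}, \theta_{ijk}) = \hat{\theta}_{ijk} / \theta_{ijk}$$
$$e_{KL_2}(\theta) = \sum_{i,j,k} \theta_{ijk} \log(\theta_{ijk} / \hat{\theta}_{ijk}), \qquad \varphi_{KL_2}(\hat{\theta}_{ijk}, \theta_{ijk}) = \log(\hat{\theta}_{ijk} / \theta_{ijk}) - 1$$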
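The empirical parameter estimate is a ratio of weighted counts. The sketch below restricts it to a single variable without parents (so the denominator is simply the total weight) and follows it with one L2-style gradient step; the function name, the step size, and the numbers are hypothetical.

```python
def empirical_theta(samples, weights, theta_row):
    """theta_hat[k] = sum_l I(z_l == k) * w_l / sum_l w_l for one table row.

    Falls back to the current theta_row when the total weight is zero.
    """
    total = sum(weights)
    if total == 0:
        return list(theta_row)
    counts = [0.0] * len(theta_row)
    for z, w in zip(samples, weights):
        counts[z] += w
    return [c / total for c in counts]

# Illustrative samples: configurations with z = 1 carry three times the weight.
theta_row = [0.5, 0.5]
samples = [0, 0, 1, 1]
weights = [1.0, 1.0, 3.0, 3.0]
theta_hat = empirical_theta(samples, weights, theta_row)

# L2 form: gradient is -(theta_hat - theta), so a step moves theta toward theta_hat.
alpha = 0.5
grad = [-(th_hat - th) for th_hat, th in zip(theta_hat, theta_row)]
new_theta = [th - alpha * g for th, g in zip(theta_row, grad)]
print(theta_hat, new_theta)
```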
13
Remarks
The φ's are proportional to the square (Var), a linear function (L2, KL1), or the logarithm (KL2) of the weights.
φ_L2 is positive if w/G > 1 (underestimation of g).
The size and sign of φ are related to under- or overestimation of g.
14
15
Empirical Results
Problem: calculate $V_{MP(t)}(A)$ for A=2, MP(t)=1 in the computer mouse problem.
Evaluation: MSE between the true value and the estimate from each sampling method.
Var and L2 are better than LW (likelihood weighting, the traditional method).
L2 is more stable than the other methods.
16