TRANSCRIPT
Michael S. Eldred, Eric T. Phipps, Keith R. Dalbey
Optimization and Uncertainty Quantification Dept.
Sandia National Laboratories, Albuquerque, NM
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of
Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Adjoint Enhancement within Global Stochastic Methods
Scalability of Global UQ Methods
For production UQ analyses, we prefer fast-converging global methods:
• local approximate methods (reliability methods, moment-based methods) exhibit significant errors in the presence of multimodal/nonsmooth/highly nonlinear responses
• MC/LHS are robust with dimension-independent convergence rates, but those rates can be unacceptably slow
Spectral methods (e.g., PCE) provide a more effective balance of robustness and efficiency, especially when solution smoothness can be exploited.
However, expansion size grows exponentially with dimension n and order p, and the simulation requirements of collocation methods are at least equal to the number of expansion terms.
To mitigate the curse of dimensionality:
• A priori model reduction methods (e.g., Karhunen-Loève)
• Adaptive refinement methods to reduce effective dimension
• Adjoint techniques [given n (random dimension) > m (response QoI)]
• Sparsity detection methods
Extend Input Scalability through
Adjoint Derivative-Enhancement
Polynomial chaos expansions (PCE):
• Linear regression with derivatives
• gradients, Hessians
• Semi-intrusive collocation
Stochastic collocation (SC):
• Gradient-enhanced interpolants
• local: cubic Hermite splines
• global: Hermite interpolation polynomials
• Local refinement with value/gradient surpluses
Gradient-enhanced kriging (GEK):
• Efficient global reliability analysis (EGRA)
• Gaussian process adaptive importance sampling (GPAIS)
Polynomial Chaos Expansions (PCE)
Approximate the response with a spectral projection using orthogonal polynomial basis functions, i.e., $R(\xi) \approx \sum_{j=0}^{P} \alpha_j \Psi_j(\xi)$
• Nonintrusive: estimate the coefficients $\alpha_j$ using sampling, regression, tensor-product quadrature, sparse grids, or cubature (convergence is super-algebraic for numerical integration and regression, versus $1/\sqrt{N}$ for LHS sampling)
Generalized PCE (Wiener-Askey + numerically-generated)
• Tailor basis: selection of basis orthogonal to input PDF avoids additional nonlinearity
Additional bases generated numerically (discretized Stieltjes + Golub-Welsch)
• Tailor expansion form: – Dimension p-refinement: anisotropic TPQ/SSG, generalized SSG
– Dimension & region h-refinement: local bases with global & local refinement
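To make the nonintrusive route concrete, here is a minimal Python sketch of coefficient estimation by regression, assuming numpy; the 1D model, order, and oversampling are illustrative stand-ins, not the test problems from these slides:

```python
import numpy as np
from numpy.polynomial import legendre

def model(xi):                      # illustrative smooth response, xi ~ U(-1, 1)
    return np.exp(0.7 * xi) + 0.3 * xi**2

p = 8                               # expansion order (p + 1 = 9 terms in 1D)
rng = np.random.default_rng(0)
xi = rng.uniform(-1.0, 1.0, 2 * (p + 1))   # stand-in for an LHS design, ratio 2

Psi = legendre.legvander(xi, p)     # Vandermonde-like matrix Psi[i, j] = P_j(xi_i)
a, *_ = np.linalg.lstsq(Psi, model(xi), rcond=None)

# for a basis orthogonal to the input PDF, the mean is the 0th coefficient
print("PCE mean:", a[0])
print("MC check:", model(rng.uniform(-1.0, 1.0, 200_000)).mean())
```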
Gradient-Enhanced PCE
Straightforward regression approach: augment the value-matching rows $\Psi_j(\xi_i)$ with gradient-matching rows $\partial\Psi_j/\partial\xi_k(\xi_i)$ and solve for the coefficients in a least-squares sense (see the sketch below).
Vandermonde-like systems are known to suffer from ill-conditioning:
• unweighted linear least squares by SVD (LAPACK GELSS)
• equality-constrained linear least squares by QR (LAPACK GGLSE) when underdetermined by values alone
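A minimal sketch of the gradient-enhanced system, assuming numpy and an illustrative 1D response with an analytic gradient: derivative rows of the Legendre basis are stacked under the value rows, and the SVD-based numpy solve plays the role of LAPACK GELSS.

```python
import numpy as np
from numpy.polynomial import legendre

def model(xi):                      # illustrative response with analytic gradient
    return np.sin(2.0 * xi) + xi**3
def dmodel(xi):
    return 2.0 * np.cos(2.0 * xi) + 3.0 * xi**2

p = 10
rng = np.random.default_rng(1)
xi = rng.uniform(-1.0, 1.0, p + 1)  # values + gradients give 2(p+1) equations

Psi = legendre.legvander(xi, p)     # value-matching rows Psi_j(xi_i)
dPsi = np.stack([legendre.legval(xi, legendre.legder(np.eye(p + 1)[j]))
                 for j in range(p + 1)], axis=1)   # gradient-matching rows

A = np.vstack([Psi, dPsi])
b = np.concatenate([model(xi), dmodel(xi)])
a, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based, as in LAPACK GELSS

print("condition number:", np.linalg.cond(A))
print("max matching residual:", np.abs(A @ a - b).max())
```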
Gradient-Enhanced PCE: “Point Collocation” LHS with & without gradients, oversample ratio = 1 or 2
[Figure: SVD condition number vs. expansion order for point collocation (LHS, oversample ratio 1x and 2x) and probabilistic collocation (TPQ, 1x and 2x), with and without gradients, for the Rosenbrock, short column, and cantilever beam problems.]
Conditioning issues are evident as we over-resolve exact solutions.
[Figure: moment error vs. expansion order for gradient-enhanced PCE on the Rosenbrock, short column, modified cantilever stress, and modified cantilever displacement problems, comparing no-gradient regression with gradient-enhanced GELSS and GGLSE solves at collocation ratios 1 and 2.]
Gradient-Enhanced PCE: Rosenbrock “Point Collocation” (LHS), with & without gradients
[Figure: GELSS condition numbers and moment errors vs. expansion order for Rosenbrock with standard uniform inputs (oversample ratios 2·terms and 2·terms²) and standard normal inputs (oversample ratios 2·terms, 2·terms², and 2·terms³).]
Current Direction: Semi-intrusive collocation
Adjoints + Krylov basis hot starting
Approach: semi-intrusive acceleration of gradient-enhanced PCE using hot starting of Krylov bases.
Results: hot starting yields 2-3x speedups, and adjoints were also reasonably effective. Additional work is needed to tune the combination to maximize basis reuse (a minimal warm-start sketch follows the bullets below):
• Value/gradient batching
• Space-filling curves, point clustering
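A toy illustration of the hot-starting idea, assuming scipy and a shifted 1D Laplacian as a stand-in for the parameterized simulations; full Krylov-basis recycling needs more machinery than this, and warm-starting the solution vector shown here is only its simplest form:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 500
# shifted 1D Laplacian: a stand-in for a sequence of nearby simulation systems
L = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x_prev = np.zeros(n)  # cold start for the first collocation point
for eps in np.linspace(0.0, 0.2, 5):
    A = L + eps * sp.eye(n, format="csr")
    its = []
    # hot start: seed CG with the previous collocation point's solution
    x_prev, info = cg(A, b, x0=x_prev, callback=lambda xk: its.append(1))
    print(f"shift {eps:.2f}: {len(its)} CG iterations (info={info})")
```

After the first cold solve, each warm-started solve typically needs far fewer iterations because successive systems and solutions are close.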
Stochastic Collocation (SC)
Advantages relative to PCE:
• Somewhat simpler (no expansion order to manage separately)
• Often less expensive (no integration for coefficients)
• The expansion only needs to be formed when sampling for probabilities (estimating moments of any order is straightforward)
• Adaptive h-refinement with hierarchical surpluses; explicit gradient-enhancement
Disadvantages relative to PCE:
• Less flexible/fault-tolerant: requires structured data sets (tensor/sparse grids)
• Expansion variance not guaranteed positive (important in opt./interval est.)
• No direct inference of spectral decay rates
With sufficient care on PCE form, PCE/SC performance is essentially identical
for many cases of interest (tensor/sparse grids with standard Gauss rules)
Instead of estimating coefficients for known basis functions,
form interpolants for known coefficients
• Global: Lagrange (values) or Hermite (values+derivatives)
• Local: linear (values) or cubic (values+gradients) splines
Sparse interpolants are formed using a sum (Smolyak combination) of tensor interpolants
For example, a three-dimensional gradient-enhanced tensor interpolant combines type 1 (value) and type 2 (gradient) basis functions:

$$ f(x) \approx \sum_{i=1}^{N} \left[ f_i\,H_i^{(1)}(x_1)H_i^{(1)}(x_2)H_i^{(1)}(x_3) + \frac{df}{dx_1}\Big|_i H_i^{(2)}(x_1)H_i^{(1)}(x_2)H_i^{(1)}(x_3) + \frac{df}{dx_2}\Big|_i H_i^{(1)}(x_1)H_i^{(2)}(x_2)H_i^{(1)}(x_3) + \frac{df}{dx_3}\Big|_i H_i^{(1)}(x_1)H_i^{(1)}(x_2)H_i^{(2)}(x_3) \right] $$

with the corresponding moment (integration) formula in terms of type 1 and type 2 weights:

$$ \mathbb{E}[f] \approx \sum_{i=1}^{N} \left[ f_i\,w_i^{(1)}w_i^{(1)}w_i^{(1)} + \frac{df}{dx_1}\Big|_i w_i^{(2)}w_i^{(1)}w_i^{(1)} + \frac{df}{dx_2}\Big|_i w_i^{(1)}w_i^{(2)}w_i^{(1)} + \frac{df}{dx_3}\Big|_i w_i^{(1)}w_i^{(1)}w_i^{(2)} \right] $$
Dimension-adaptive h-refinement for SC:
• Local spline interpolants: linear Lagrange (value-based),
cubic Hermite (gradient-enhanced)
• Global grids: iso/aniso tensor, iso/aniso/generalized sparse
• h-refinement: uniform, adaptive, goal-oriented adaptive
• Basis formulations: nodal, hierarchical
Gradient-enhanced interpolants: global h-refinement
Cubic shape functions: type 1 (value) & type 2 (gradient); the multivariate tensor product extends to arbitrary derivative order (Lalescu), and similar formulas hold for higher-order moments.
Sparse grid gradient-enhanced interpolation of Rosenbrock (w=3, uniform over [-2,2])
Colloc pt 1: truth value = 3.6090000000e+03 interpolant = 3.6090000000e+03 error = 0.0000000000e+00
truth grad_1 = -9.6120000000e+03 interpolant = -9.6120000000e+03 error = 0.0000000000e+00
truth grad_2 = -2.4000000000e+03 interpolant = -2.4000000000e+03 error = 0.0000000000e+00
Colloc pt 2: truth value = 2.5090000000e+03 interpolant = 2.5090000000e+03 error = 0.0000000000e+00
truth grad_1 = -8.0120000000e+03 interpolant = -8.0120000000e+03 error = 0.0000000000e+00
truth grad_2 = -2.0000000000e+03 interpolant = -2.0000000000e+03 error = 0.0000000000e+00
Colloc pt 3: truth value = 1.6090000000e+03 interpolant = 1.6090000000e+03 error = 0.0000000000e+00
truth grad_1 = -6.4120000000e+03 interpolant = -6.4120000000e+03 error = 0.0000000000e+00
truth grad_2 = -1.6000000000e+03 interpolant = -1.6000000000e+03 error = 0.0000000000e+00
Colloc pt 4: truth value = 9.0900000000e+02 interpolant = 9.0900000000e+02 error = 0.0000000000e+00
truth grad_1 = -4.8120000000e+03 interpolant = -4.8120000000e+03 error = 0.0000000000e+00
truth grad_2 = -1.2000000000e+03 interpolant = -1.2000000000e+03 error = 0.0000000000e+00
Colloc pt 5: truth value = 4.0900000000e+02 interpolant = 4.0900000000e+02 error = 0.0000000000e+00
truth grad_1 = -3.2120000000e+03 interpolant = -3.2120000000e+03 error = 0.0000000000e+00
truth grad_2 = -8.0000000000e+02 interpolant = -8.0000000000e+02 error = 0.0000000000e+00
...
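The local gradient-enhanced spline behavior is easy to reproduce in one dimension with scipy's CubicHermiteSpline (the test function and node count here are illustrative): values and gradients are matched exactly at the nodes, echoing the zero collocation-point errors above.

```python
import numpy as np
from scipy.interpolate import CubicHermiteSpline

def f(x):  return np.sin(3.0 * x) + x**2          # illustrative truth function
def df(x): return 3.0 * np.cos(3.0 * x) + 2.0 * x

nodes = np.linspace(-2.0, 2.0, 7)                  # "collocation points"
interp = CubicHermiteSpline(nodes, f(nodes), df(nodes))

# values AND gradients matched exactly at the nodes (type 1 + type 2 data)
print("node value error:", np.abs(interp(nodes) - f(nodes)).max())
print("node grad error :", np.abs(interp(nodes, 1) - df(nodes)).max())

# off-node error decays algebraically as the grid is h-refined
xq = np.linspace(-2.0, 2.0, 1001)
print("global error    :", np.abs(interp(xq) - f(xq)).max())
```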
[Figure: error in reliability index vs. number of simulations for sparse grids under uniform refinement, comparing PCE global Legendre, SC global Lagrange, SC piecewise-linear Newton-Cotes, and SC piecewise-cubic Newton-Cotes. Left: smooth Gerstner aniso3 problem; right: nonsmooth Sobol g-function.]
Gradient-enhanced interpolants: global h-refinement
Test function: $e^{-10x^2 - 5y^2}$
Gradient-enhanced interpolants: global p-refinement
(The tensor-product moment formula with type 1 and type 2 weights is as shown above.)
Interpolation polynomials computed from divided
difference tables with 2m matching conditions:
• Type 1 basis: 0/…/1/0/… for values, 0/…/0 for gradients
• Type 2 basis: 0/…/0 for values, 0/…/1/0/… for gradients
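A sketch of these matching conditions using scipy's KroghInterpolator, which performs Hermite interpolation via divided differences over repeated nodes (the node locations are illustrative): a type 1 basis function is 1 at one node and 0 at the others with all nodal derivatives zero, and a type 2 basis function reverses the roles.

```python
import numpy as np
from scipy.interpolate import KroghInterpolator

nodes = np.array([0.0, 0.5, 1.0])
m = len(nodes)
xi = np.repeat(nodes, 2)    # each node repeated: value condition, then derivative

def basis(kind, k):
    """Type 1 (value) or type 2 (gradient) basis for node k: 0/.../1/0/... data."""
    data = np.zeros(2 * m)
    data[2 * k + (0 if kind == 1 else 1)] = 1.0
    return KroghInterpolator(xi, data)

H1, H2 = basis(1, 1), basis(2, 1)        # bases attached to the middle node

print("type 1 values:", H1(nodes))                    # ~ [0, 1, 0]
print("type 1 derivs:", H1.derivative(nodes, 1))      # ~ [0, 0, 0]
print("type 2 values:", H2(nodes))                    # ~ [0, 0, 0]
print("type 2 derivs:", H2.derivative(nodes, 1))      # ~ [0, 1, 0]
```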
I X W(F(X)) W(F'(X))
1 0.000000 -0.260271E+10 -0.461959E+08
2 0.100000 -0.129730E+12 -0.374681E+10
3 0.200000 -0.143117E+13 -0.631919E+11
4 0.300000 -0.524034E+13 -0.382830E+12
5 0.400000 -0.611016E+13 -0.101787E+13
6 0.500000 0.153665E+13 -0.129255E+13
7 0.600000 0.673925E+13 -0.801827E+12
8 0.700000 0.382127E+13 -0.236405E+12
9 0.800000 0.764733E+12 -0.302886E+11
10 0.900000 0.513565E+11 -0.137316E+10
11 1.000000 0.752948E+09 -0.126888E+08
F(X) Integral N = 5 N = 9 N = 13 N = 17 N = 21
X0 2 2 2 2 1.98227 465.228
X2 0.666667 0.666667 0.666667 0.666662 0.649341 445.483
X4 0.4 0.4 0.4 0.399995 0.383488 431.798
X6 0.285714 2.7e-14 0.285714 0.28571 0.270158 420.806
X8 0.222222 2.8e-14 0.222222 0.222218 0.207667 411.458
X10 0.181818 0.433333 -8.94147e-11 0.181814 0.168265 403.239
X12 0.153846 1.2 -1.17325e-10 0.153842 0.141271 395.842
X14 0.133333 2.19167 -1.41711e-10 -3.96212e-06 0.121696 389.064
From J. Burkardt, 2011
Hierarchical basis:
• improved precision in QoI increments
• Surpluses provide error estimates for
local refinement using local/global
hierarchical interpolants
Hierarchical linear splines; from Xiang Ma, Ph.D. dissertation, Cornell Univ., 2010
From J. Jakeman, July 2010
Current Direction: Local Error Estimation with
Hierarchical Value/Gradient Surpluses
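A minimal numpy sketch of the surplus-driven idea for 1D linear splines (the function and tolerance are illustrative): the hierarchical surplus at a candidate node is the truth value minus the coarser interpolant's prediction, and its magnitude decides whether that node is kept.

```python
import numpy as np

def f(x):
    return np.exp(-10.0 * x**2)          # illustrative peaked response

x = np.array([-1.0, 0.0, 1.0])           # coarse starting grid
y = f(x)
for level in range(1, 7):
    mid = 0.5 * (x[:-1] + x[1:])         # candidate nodes: interval midpoints
    surplus = f(mid) - np.interp(mid, x, y)   # value minus coarse prediction
    print(f"level {level}: max |surplus| = {np.abs(surplus).max():.3e}")
    keep = np.abs(surplus) > 1e-4        # refine only where the estimate is big
    x = np.sort(np.concatenate([x, mid[keep]]))
    y = f(x)
```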
Gradient-Enhanced Kriging
Pivoted Cholesky for Ill-Conditioning
• Pivoted Cholesky (applied to the N×N value correlation matrix R, not the full derivative-augmented correlation matrix) sorts points into decreasing order of new information [this avoids a factor-of-(1+M)³ cost for pivoted Cholesky on the full matrix and doesn't favor derivatives over function values]
• Apply the new order to whole points (function value immediately followed by derivatives) in the derivative-augmented correlation matrix, then do a fast unpivoted LAPACK Cholesky on the reordered matrix
• Use a LAPACK reciprocal-condition-number ("rcond") estimate in a bisection search to efficiently determine the number of equations that need to be dropped
• Discarded points are the ones that contain the least new information and are therefore the easiest to predict
• A different "optimal" subset results for each set of correlation parameters θ tried, so, if selecting by maximum likelihood, one needs to minimize the per-equation negative log-likelihood
• The subset selection only depends on θ and the inputs X, so it can be used for sample design and to detect discontinuities (check predictions at discarded points); the per-equation objective is
$$ \mathrm{obj}(\theta) = \log\hat{\sigma}^2(\theta) + \frac{\log\det R(\theta)}{N}, \qquad \log\det R = 2\log\det L_{\mathrm{Chol}} $$
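A minimal numpy sketch of the pivoting idea (the correlation model, points, and tolerance are illustrative, and this greedy factorization is a simplification of the production approach): pivoting orders points by remaining new information, and truncating the nearly dependent tail restores conditioning.

```python
import numpy as np

def pivoted_cholesky(R, tol=1e-10):
    """Greedy pivoted Cholesky; returns pivot order and retained rank."""
    n = len(R)
    d = np.diag(R).astype(float).copy()  # residual variances (Schur diagonal)
    piv = np.arange(n)
    L = np.zeros((n, n))
    for k in range(n):
        j = k + int(np.argmax(d[piv[k:]]))   # point contributing most new info
        piv[k], piv[j] = piv[j], piv[k]
        p = piv[k]
        if d[p] <= tol:                      # remainder is nearly dependent
            return piv, k
        L[piv[k:], k] = (R[piv[k:], p] - L[piv[k:], :k] @ L[p, :k]) / np.sqrt(d[p])
        d[piv[k:]] -= L[piv[k:], k] ** 2
    return piv, n

base = np.linspace(0.0, 1.0, 10)
pts = np.concatenate([base, base[3:5] + 1e-8])   # two nearly duplicated points
R = np.exp(-40.0 * (pts[:, None] - pts[None, :]) ** 2)  # Gaussian correlation

piv, rank = pivoted_cholesky(R)
kept = piv[:rank]
print("cond(full R)    :", np.linalg.cond(R))
print("cond(kept block):", np.linalg.cond(R[np.ix_(kept, kept)]))
print("discarded points:", pts[piv[rank:]])
```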
Calculating Probability of Failure with GPAIS
• Importance sampling reduces Monte Carlo's error variance by drawing more samples from "important" regions & appropriately down-weighting them:
$$ P_{MC} = \frac{1}{N}\sum_{i=1}^{N} I(x_i), \ x_i \sim \rho(x) \qquad\qquad P_{IS} = \frac{1}{N}\sum_{i=1}^{N} I(x_i)\,\frac{\rho(x_i)}{\tilde\rho(x_i)}, \ x_i \sim \tilde\rho(x) $$
• The indicator $I(x)$ is unknown, so the optimal importance density $\rho^*(x) \propto I(x)\,\rho(x)$ is unknown
• Gaussian Process Adaptive Importance Sampling uses a series of improving GP approximations of $I(x)$ in a mixture approximation of $\rho^*(x)$: $\tilde\rho(x) = \sum_{j=0}^{M} w_j\,\tilde\rho_j(x)$
• Mixture importance sampling "is … not much worse than importance sampling from the best of the mixture components" [Owen & Zhou 2000]
• The j-th component GP approximation is $\tilde\rho_j(x) \propto \mathbb{E}[I_j(x)]\,\rho(x)$
• The real-valued expected indicator, $0 \le \mathbb{E}[I_j(x)] \le 1$, is the point-wise portion of the GP's Gaussian CDF past the failure threshold
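A toy numpy/scipy sketch of the mixture importance-sampling mechanics (a known 1D limit state stands in for the GP approximations): samples are drawn from a defensive two-component mixture and down-weighted by ρ(x)/ρ̃(x).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z, N = 3.0, 10_000                      # failure threshold: "fail" if x > z

# plain Monte Carlo: P ~ (1/N) sum I(x_i), x_i ~ rho = N(0, 1)
x = rng.standard_normal(N)
p_mc = np.mean(x > z)

# defensive two-component mixture importance density:
# half nominal, half shifted toward the failure region
use_shift = rng.random(N) < 0.5
x_is = np.where(use_shift, rng.normal(z, 1.0, N), rng.standard_normal(N))
q = 0.5 * norm.pdf(x_is, loc=z) + 0.5 * norm.pdf(x_is)   # mixture density
w = norm.pdf(x_is) / q                                   # rho / rho-tilde
p_is = np.mean((x_is > z) * w)

print(f"exact P    = {norm.sf(z):.6e}")
print(f"plain MC P = {p_mc:.6e}")
print(f"mixture IS = {p_is:.6e}")
```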
Closing Remarks
(Adjoint) gradient-enhanced UQ enhances scalability in global methods
• PCE regression
– Gradient-matching equations are competitive in utility with value-matching equations
– Ill-conditioning of Vandermonde-like systems raises accuracy concerns for higher-order bases
• Unstructured grids not based on quadrature points
• Oversampling: condition numbers asymptote for standard uniforms when oversampling with terms²
• Discretize using fixed-order bases; solver techniques (Leja ordering, SVD truncation, etc.)
– Semi-intrusive collocation
• Given adjoint intrusion, investigate other intrusion opportunities (short of stochastic Galerkin)
• Krylov basis reuse among sets of collocation evaluations; better clustering needed
• SC with gradient-enhanced interpolants
– Local cubic splines:
• Explicit approach has effectively zero interpolation error for values & gradients
• Algebraic convergence for smooth problems; improved robustness for nonsmooth
– Global Hermite interpolation polynomials:
• Unstable at higher orders (see the weight/integral tables above)
– Gradient-enhanced local adaptive refinement
• Gradient-enhanced kriging for EGRA, GPAIS, Bayesian emulation
– Solver techniques: pivoted Cholesky
– Ensemble emulation: hierarchical approximation, discretization (TGP)
Future direction: adjoint enhancement in sparsity-detecting PCE methods