
Michael S. Eldred, Eric T. Phipps, Keith R. Dalbey

Optimization and Uncertainty Quantification Dept.

Sandia National Laboratories, Albuquerque, NM

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Adjoint Enhancement within Global Stochastic Methods

Scalability of Global UQ Methods

For production UQ analyses, we prefer fast-converging global methods:
• Local approximate methods (reliability methods, moment-based methods) exhibit significant errors in the presence of multimodal/nonsmooth/highly nonlinear responses
• MC/LHS are robust with dimension-independent convergence rates, but those rates can be unacceptably slow

Spectral methods (e.g., PCE) provide a more effective balance of robustness and efficiency, especially when solution smoothness can be exploited.

Expansion size grows exponentially with n and p, and simulation requirements in collocation methods are at least equal to the number of terms.
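For scale (a sketch I am adding, not from the slides): a total-order expansion of degree p in n random dimensions has C(n+p, p) terms, while a full tensor-product expansion has (p+1)^n.

    from math import comb

    def total_order_terms(n, p):
        # Number of terms in a total-order PCE: (n + p)! / (n! p!)
        return comb(n + p, p)

    def tensor_product_terms(n, p):
        # Number of terms in an isotropic tensor-product expansion
        return (p + 1) ** n

    for n in (2, 5, 10, 20):
        print(n, total_order_terms(n, 4), tensor_product_terms(n, 4))

Already at n = 20, p = 4, the total-order count is 10,626 terms, and each term implies at least one simulation in collocation approaches.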

To mitigate the curse of dimensionality:
• A priori model reduction methods (e.g., Karhunen-Loeve)
• Adaptive refinement methods to reduce effective dimension
• Adjoint techniques [given n (random dimensions) > m (response QoIs)]
• Sparsity detection methods

Extend Input Scalability through Adjoint Derivative-Enhancement

Polynomial chaos expansions (PCE):
• Linear regression with derivatives (gradients, Hessians)
• Semi-intrusive collocation

Stochastic collocation (SC):
• Gradient-enhanced interpolants
  – local: cubic Hermite splines
  – global: Hermite interpolation polynomials
• Local refinement with value/gradient surpluses

Gradient-enhanced kriging (GEK):
• Efficient global reliability analysis (EGRA)
• Gaussian process adaptive importance sampling (GPAIS)

Polynomial Chaos Expansions (PCE)

Approximate the response with a spectral projection using orthogonal polynomial basis functions, i.e., $R \approx \sum_{j=0}^{P} \alpha_j \Psi_j(\boldsymbol{\xi})$.

• Nonintrusive: estimate the $\alpha_j$ using sampling, regression, tensor-product quadrature, sparse grids, or cubature

Generalized PCE (Wiener-Askey + numerically generated):
• Tailor basis: selecting a basis orthogonal to the input PDF avoids additional nonlinearity; additional bases are generated numerically (discretized Stieltjes + Golub-Welsch)
• Tailor expansion form:
  – Dimension p-refinement: anisotropic TPQ/SSG, generalized SSG
  – Dimension & region h-refinement: local bases with global & local refinement

Convergence: super-algebraic for numerical integration & regression, vs. $1/\sqrt{N}$ for LHS.
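To make the nonintrusive estimation concrete, here is a minimal sketch I am adding (1-D uniform input, Legendre basis, regression for the coefficients); the toy response and function names are mine:

    import numpy as np
    from numpy.polynomial import legendre as L

    rng = np.random.default_rng(0)
    f = lambda x: np.exp(x)               # toy response, xi ~ U[-1, 1]

    p = 8                                 # expansion order
    x = rng.uniform(-1.0, 1.0, 200)       # "point collocation" sample
    V = L.legvander(x, p)                 # Vandermonde-like basis matrix
    alpha, *_ = np.linalg.lstsq(V, f(x), rcond=None)  # SVD-based LLS

    # Moments from coefficients: under U[-1,1], <P_j^2> = 1/(2j+1),
    # so mean = alpha_0 and variance = sum_j alpha_j^2 / (2j+1)
    mean = alpha[0]
    var = sum(alpha[j]**2 / (2*j + 1) for j in range(1, p + 1))
    print(mean, var)                      # exact mean is sinh(1) ~ 1.1752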

Gradient-Enhanced PCE

Straightforward regression approach: augment the value-matching equations with gradient-matching equations. Vandermonde-like systems are known to suffer from ill-conditioning.
• Unweighted LLS by SVD (LAPACK GELSS)
• Equality-constrained LLS by QR (LAPACK GGLSE) when underdetermined by values alone
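A minimal sketch (mine) of the augmented system: value rows stacked on gradient rows in one Vandermonde-like matrix, solved by SVD-based least squares, with numpy's lstsq standing in for LAPACK GELSS:

    import numpy as np
    from numpy.polynomial import legendre as L

    rng = np.random.default_rng(1)
    f  = lambda x: np.exp(x)
    df = lambda x: np.exp(x)              # adjoint/AD would supply this

    p, n_pts = 8, 40
    x = rng.uniform(-1.0, 1.0, n_pts)

    # Stack value rows and gradient rows into one least-squares system
    V  = L.legvander(x, p)                # rows: Psi_j(x_i)
    dV = np.stack([L.legval(x, L.legder(c)) for c in np.eye(p + 1)], axis=1)
    A  = np.vstack([V, dV])               # rows: dPsi_j/dx at x_i
    b  = np.concatenate([f(x), df(x)])

    print("condition number:", np.linalg.cond(A))
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)

Printing the condition number at increasing expansion order reproduces the qualitative trend in the plots below: conditioning degrades as the basis over-resolves the data.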

Gradient-Enhanced PCE: "Point Collocation" LHS with & without gradients, oversample ratio = 1 or 2

[Four figures: SVD condition number vs. expansion order for point collocation (LHS, oversample ratio 1x and 2x) and probabilistic collocation (TPQ, 1x and 2x); curves for Rosenbrock, short column, and cantilever beam, each with and without gradients.]

Conditioning issues are evident as we over-resolve exact solutions.

0 2 4 6 8 10 12 1410

-14

10-12

10-10

10-8

10-6

10-4

10-2

100

102

104

Expansion Order

Mom

en

t e

rro

r

Gradient-Enhanced PCE: Rosenbrock Moments

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr1

grads GELSS cr1

grads GGLSE cr1

no grads cr1

grads GELSS cr1

grads GGLSE cr1

0 2 4 6 8 10 12 14

10-12

10-10

10-8

10-6

10-4

10-2

100

Expansion Order

Mom

en

t e

rro

r

Gradient-Enhanced PCE: Short Column Moments

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr1

grads GELSS cr1

grads GGLSE cr1

no grads cr1

grads GELSS cr1

grads GGLSE cr1

0 2 4 6 8 10 12 1410

-12

10-10

10-8

10-6

10-4

10-2

100

102

104

106

Expansion Order

Mom

en

t e

rro

r

Gradient-Enhanced PCE: Mod Cantilever Stress Moments

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr1

grads GELSS cr1

grads GGLSE cr1

no grads cr1

grads GELSS cr1

grads GGLSE cr1

0 2 4 6 8 10 12 1410

-12

10-10

10-8

10-6

10-4

10-2

100

Expansion Order

Mom

en

t e

rro

r

Gradient-Enhanced PCE: Mod Cantilever Displacement Moments

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr2

grads GELSS cr2

grads GGLSE cr2

no grads cr1

grads GELSS cr1

grads GGLSE cr1

no grads cr1

grads GELSS cr1

grads GGLSE cr1

Gradient-Enhanced PCE: Rosenbrock “Point Collocation” (LHS), with & without gradients

[Four figures: for Rosenbrock under point collocation, condition numbers (from GELSS) and moment errors vs. expansion order. Std uniforms: oversample ratio = 2*terms and 2*terms^2; std normals: oversample ratio = 2*terms, 2*terms^2, and 2*terms^3.]

Current Direction: Semi-Intrusive Collocation

Adjoints + Krylov-basis hot starting.

Approach: semi-intrusive acceleration of gradient-enhanced PCE using hot starting of Krylov bases (a sketch of the idea follows below).

Results: hot starting yields speedups of 2-3x, and adjoints were also reasonably effective. Additional work is needed to tune the combination to maximize basis reuse:
• Value/gradient batching
• Space-filling curves, point clustering
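A loose illustration of the hot-starting idea (my sketch, not the authors' implementation): when solving the forward model at a sequence of nearby collocation points, seed each Krylov solve with the previous point's solution, so the solver starts close to the answer.

    import numpy as np
    from scipy.sparse.linalg import cg

    class Counter:
        """Counts CG iterations via the callback hook."""
        def __init__(self): self.n = 0
        def __call__(self, xk): self.n += 1

    rng = np.random.default_rng(2)
    n = 500
    B = rng.standard_normal((n, n)) / np.sqrt(n)
    A0 = B @ B.T + 2.0 * np.eye(n)        # SPD base operator
    b = rng.standard_normal(n)

    x_prev = np.zeros(n)                  # cold start for the first point
    for xi in np.linspace(0.0, 1.0, 5):   # ordered sweep of collocation points
        A = A0 + xi * np.eye(n)           # smoothly parameter-dependent operator
        c = Counter()
        x, info = cg(A, b, x0=x_prev, callback=c)
        print(f"xi={xi:.2f}: {c.n} iterations")
        x_prev = x                        # hot start the next solve

Because the solution varies smoothly with the collocation parameter, each warm start lands near the new solution; ordering/clustering the points (space-filling curves above) is what makes the previous solve a good seed.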

Stochastic Collocation (SC)

Advantages relative to PCE:
• Somewhat simpler (no expansion order to manage separately)
• Often less expensive (no integration for coefficients)
• Expansion only formed for sampling probabilities (estimating moments of any order is straightforward)
• Adaptive h-refinement with hierarchical surpluses; explicit gradient enhancement

Disadvantages relative to PCE:
• Less flexible/fault-tolerant structured data sets (tensor/sparse grids)
• Expansion variance not guaranteed positive (important in optimization/interval estimation)
• No direct inference of spectral decay rates

With sufficient care on PCE form, PCE/SC performance is essentially identical for many cases of interest (tensor/sparse grids with standard Gauss rules).

Instead of estimating coefficients for known basis functions, form interpolants for known coefficients:
• Global: Lagrange (values) or Hermite (values + derivatives) interpolation polynomials
• Local: linear (values) or cubic (values + gradients) splines
Sparse interpolants are formed using a sum (Σ) of tensor interpolants.

Gradient-enhanced tensor interpolant (3-D example), built from type-1 shape functions $H^{(1)}$ (matching values) and type-2 shape functions $H^{(2)}$ (matching first derivatives):

$\hat{f}(x) = \sum_{i=1}^{N} \Big[ f_i\, H^{(1)}_{i_1}(x_1) H^{(1)}_{i_2}(x_2) H^{(1)}_{i_3}(x_3) + \tfrac{df}{dx_1}\big|_i\, H^{(2)}_{i_1}(x_1) H^{(1)}_{i_2}(x_2) H^{(1)}_{i_3}(x_3) + \tfrac{df}{dx_2}\big|_i\, H^{(1)}_{i_1}(x_1) H^{(2)}_{i_2}(x_2) H^{(1)}_{i_3}(x_3) + \tfrac{df}{dx_3}\big|_i\, H^{(1)}_{i_1}(x_1) H^{(1)}_{i_2}(x_2) H^{(2)}_{i_3}(x_3) \Big]$

Integrating the basis yields type-1 weights $w^{(1)}$ and type-2 weights $w^{(2)}$, so the mean follows directly:

$\bar{f} = \sum_{i=1}^{N} \Big[ f_i\, w^{(1)}_{i_1} w^{(1)}_{i_2} w^{(1)}_{i_3} + \tfrac{df}{dx_1}\big|_i\, w^{(2)}_{i_1} w^{(1)}_{i_2} w^{(1)}_{i_3} + \tfrac{df}{dx_2}\big|_i\, w^{(1)}_{i_1} w^{(2)}_{i_2} w^{(1)}_{i_3} + \tfrac{df}{dx_3}\big|_i\, w^{(1)}_{i_1} w^{(1)}_{i_2} w^{(2)}_{i_3} \Big]$

Dimension-adaptive h-refinement for SC:
• Local spline interpolants: linear Lagrange (value-based), cubic Hermite (gradient-enhanced)
• Global grids: iso/aniso tensor; iso/aniso/generalized sparse
• h-refinement: uniform, adaptive, goal-oriented adaptive
• Basis formulations: nodal, hierarchical

Gradient-enhanced interpolants: global h-refinement (and similarly for higher-order moments).

Cubic shape functions: type 1 (value) & type 2 (gradient); the multivariate tensor product extends to arbitrary derivative order (Lalescu). A sketch of the type-1/type-2 construction follows below.
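A minimal 1-D illustration I am adding of the type-1/type-2 cubic Hermite shape functions and a gradient-enhanced interpolant built from them (notation mine; the slides' multivariate version tensor-products these 1-D functions):

    import numpy as np

    def hermite_shape(t, h):
        """Type-1 (value) and type-2 (gradient) cubic shape functions
        on one cell of width h, with local coordinate t in [0, 1]."""
        H1_left  = 2*t**3 - 3*t**2 + 1        # value 1 at left node
        H1_right = -2*t**3 + 3*t**2           # value 1 at right node
        H2_left  = h * (t**3 - 2*t**2 + t)    # slope 1 at left node
        H2_right = h * (t**3 - t**2)          # slope 1 at right node
        return H1_left, H1_right, H2_left, H2_right

    def interp(x, nodes, f, df):
        """Gradient-enhanced piecewise-cubic interpolant on a 1-D grid."""
        k = np.clip(np.searchsorted(nodes, x) - 1, 0, len(nodes) - 2)
        h = nodes[k + 1] - nodes[k]
        t = (x - nodes[k]) / h
        a, b, c, d = hermite_shape(t, h)
        return f[k]*a + f[k+1]*b + df[k]*c + df[k+1]*d

    nodes = np.linspace(-2.0, 2.0, 5)
    f, df = nodes**4, 4*nodes**3              # values and exact gradients
    x = np.linspace(-2.0, 2.0, 9)
    print(interp(x, nodes, f, df))

At the collocation nodes the interpolant reproduces values and gradients exactly, which is what the Rosenbrock verification output below demonstrates.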

Sparse grid gradient-enhanced interpolation of Rosenbrock (w=3, uniform over [-2,2])

Colloc pt 1: truth value = 3.6090000000e+03 interpolant = 3.6090000000e+03 error = 0.0000000000e+00

truth grad_1 = -9.6120000000e+03 interpolant = -9.6120000000e+03 error = 0.0000000000e+00

truth grad_2 = -2.4000000000e+03 interpolant = -2.4000000000e+03 error = 0.0000000000e+00

Colloc pt 2: truth value = 2.5090000000e+03 interpolant = 2.5090000000e+03 error = 0.0000000000e+00

truth grad_1 = -8.0120000000e+03 interpolant = -8.0120000000e+03 error = 0.0000000000e+00

truth grad_2 = -2.0000000000e+03 interpolant = -2.0000000000e+03 error = 0.0000000000e+00

Colloc pt 3: truth value = 1.6090000000e+03 interpolant = 1.6090000000e+03 error = 0.0000000000e+00

truth grad_1 = -6.4120000000e+03 interpolant = -6.4120000000e+03 error = 0.0000000000e+00

truth grad_2 = -1.6000000000e+03 interpolant = -1.6000000000e+03 error = 0.0000000000e+00

Colloc pt 4: truth value = 9.0900000000e+02 interpolant = 9.0900000000e+02 error = 0.0000000000e+00

truth grad_1 = -4.8120000000e+03 interpolant = -4.8120000000e+03 error = 0.0000000000e+00

truth grad_2 = -1.2000000000e+03 interpolant = -1.2000000000e+03 error = 0.0000000000e+00

Colloc pt 5: truth value = 4.0900000000e+02 interpolant = 4.0900000000e+02 error = 0.0000000000e+00

truth grad_1 = -3.2120000000e+03 interpolant = -3.2120000000e+03 error = 0.0000000000e+00

truth grad_2 = -8.0000000000e+02 interpolant = -8.0000000000e+02 error = 0.0000000000e+00

...

[Two figures: error in reliability index vs. number of simulations for sparse grids under uniform refinement, comparing PCE Global Legendre, SC Global Lagrange, SC PWLinear Newton-Cotes, and SC PWCubic Newton-Cotes. Smooth case: Gerstner aniso3. Nonsmooth case: Sobol's g-function.]

Gradient-enhanced interpolants: global h-refinement (test function $e^{-10x^2 - 5y^2}$)

Gradient-enhanced interpolants: global p-refinement

(Mean from type-1/type-2 weights, as in the formula above.)

Interpolation polynomials are computed from divided difference tables with 2m matching conditions:
• Type 1 basis: 0/…/1/0/… for values, 0/…/0 for gradients
• Type 2 basis: 0/…/0 for values, 0/…/1/0/… for gradients
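One way to see the 2m matching conditions in action (my illustration, using SciPy's divided-difference-based Krogh interpolator, where repeating an abscissa imposes a derivative condition):

    import numpy as np
    from scipy.interpolate import KroghInterpolator

    # m = 4 nodes, each carrying a value and a gradient: 2m = 8 conditions
    x = np.linspace(0.0, 1.0, 4)
    f, df = np.sin(3*x), 3*np.cos(3*x)

    xi = np.repeat(x, 2)                       # repeated abscissa => match f'
    yi = np.ravel(np.column_stack([f, df]))    # value, then derivative, per node
    p = KroghInterpolator(xi, yi)

    t = np.linspace(0.0, 1.0, 7)
    print(np.max(np.abs(p(t) - np.sin(3*t))))  # small on this short interval

SciPy's own documentation warns that this global construction becomes numerically unstable as the polynomial degree grows, which is exactly the pathology tabulated below.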

Global Hermite interpolation weights at 11 equally spaced points; the ~1e13 magnitudes signal severe cancellation:

 I    X         W(F(X))         W(F'(X))
 1    0.000000  -0.260271E+10   -0.461959E+08
 2    0.100000  -0.129730E+12   -0.374681E+10
 3    0.200000  -0.143117E+13   -0.631919E+11
 4    0.300000  -0.524034E+13   -0.382830E+12
 5    0.400000  -0.611016E+13   -0.101787E+13
 6    0.500000   0.153665E+13   -0.129255E+13
 7    0.600000   0.673925E+13   -0.801827E+12
 8    0.700000   0.382127E+13   -0.236405E+12
 9    0.800000   0.764733E+12   -0.302886E+11
 10   0.900000   0.513565E+11   -0.137316E+10
 11   1.000000   0.752948E+09   -0.126888E+08

Monomial integrals computed from the Hermite interpolation weights for increasing N; breakdown is evident by N = 21:

 F(X)  Exact      N = 5      N = 9         N = 13        N = 17    N = 21
 X^0   2          2          2             2             1.98227   465.228
 X^2   0.666667   0.666667   0.666667      0.666662      0.649341  445.483
 X^4   0.4        0.4        0.4           0.399995      0.383488  431.798
 X^6   0.285714   2.7e-14    0.285714      0.28571       0.270158  420.806
 X^8   0.222222   2.8e-14    0.222222      0.222218      0.207667  411.458
 X^10  0.181818   0.433333   -8.94147e-11  0.181814      0.168265  403.239
 X^12  0.153846   1.2        -1.17325e-10  0.153842      0.141271  395.842
 X^14  0.133333   2.19167    -1.41711e-10  -3.96212e-06  0.121696  389.064

(Gradient-enhanced tensor-product Hermite interpolant, as in the formula above.)

From J. Burkardt, 2011.

Hierarchical basis:
• Improved precision in QoI increments
• Surpluses provide error estimates for local refinement using local/global hierarchical interpolants

[Figures: hierarchical linear splines, from Xiang Ma, Ph.D. dissertation, Cornell Univ., 2010; from J. Jakeman, July 2010.]

Current Direction: Local Error Estimation with Hierarchical Value/Gradient Surpluses
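As a toy illustration of surplus-driven refinement (my sketch, value surpluses only; gradient surpluses are handled analogously): the hierarchical surplus at a candidate node is the mismatch between the function and the coarser level's interpolant, and it doubles as the local error estimate.

    import numpy as np

    f = lambda x: np.exp(-10 * x**2)      # 1-D cousin of the slides' e^{-10x^2-5y^2}

    nodes = [-1.0, 0.0, 1.0]              # level-0 grid on [-1, 1]
    for level in range(4):
        refined = []
        for a, b in zip(nodes[:-1], nodes[1:]):
            mid = 0.5 * (a + b)
            # Hierarchical surplus: value minus the coarse linear interpolant
            surplus = f(mid) - 0.5 * (f(a) + f(b))
            if abs(surplus) > 1e-3:       # refine only where the estimate is large
                refined.append(mid)
        if not refined:
            break
        nodes = sorted(nodes + refined)
        print(f"level {level}: {len(nodes)} nodes")

The grid clusters automatically near the peak at x = 0, where the coarse interpolant is worst, and leaves the flat tails alone.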

Gradient-Enhanced Kriging: Pivoted Cholesky for Ill-Conditioning

• Pivoted Cholesky (on the value-only correlation matrix, not the full gradient-augmented one) sorts points into decreasing order of new info [a factor of (1+M)^3 cheaper for the pivoted Cholesky, and it doesn't favor derivatives over function values]
• Apply the new order to whole points (function value immediately followed by derivatives) in the full matrix, then do a fast LAPACK Cholesky on it
• Use a LAPACK reciprocal-condition estimate ("rcond") in a bisection search to efficiently determine the number of equations that need to be dropped
• Discarded points are the ones that contain the least new info and are therefore the easiest to predict
• A different "optimal" subset arises for each set of correlation parameters tried, so, if selecting by maximum likelihood, one needs to minimize the per-equation negative log-likelihood
• Only depends on the correlation parameters and the inputs, so it can be used for sample design and to detect discontinuities (check predictions at discarded points)

[Equation: the per-equation negative log-likelihood objective, expressed in terms of $\log\hat{\sigma}^2$ and log-determinants of the (pivoted) correlation matrix R.]
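A rough sketch (mine) of the subset-selection mechanics on a value-only correlation matrix; real GEK applies the resulting point order to the full value+gradient system. SciPy's dpstrf wraps LAPACK's pivoted Cholesky:

    import numpy as np
    from scipy.linalg.lapack import dpstrf

    rng = np.random.default_rng(3)
    X = rng.uniform(size=(40, 2))                  # sample inputs
    theta = 0.5                                    # smooth correlation => near-singular R

    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    R = np.exp(-theta * d2)                        # Gaussian correlation matrix

    # Pivoted Cholesky orders points by decreasing new information
    c, piv, rank, info = dpstrf(R, lower=1)
    order = piv - 1                                # LAPACK pivots are 1-based

    def rcond_ok(k, tol=1e-10):
        """Keep the k best-ranked points; accept if still well-conditioned."""
        sub = R[np.ix_(order[:k], order[:k])]
        return 1.0 / np.linalg.cond(sub) > tol

    # Bisection for the largest acceptable subset size
    # (assumes conditioning degrades monotonically down the pivot order)
    lo, hi = 1, len(order)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if rcond_ok(mid) else (lo, mid - 1)
    print(f"keeping {lo} of {len(order)} points")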

Gradient-Enhanced Kriging (GEK): Enabler for EGRA, GPAIS, Bayesian emulation

Efficient Global Reliability Analysis (EGRA)

[Figure sequence: surrogate refined from 10 samples to 28 samples via alternating "explore" and "exploit" sample selections.]

Calculating Probability of Failure with GPAIS

• Importance sampling reduces Monte Carlo's error variance by drawing more samples from "important" regions & appropriately down-weighting them:

  $P \approx \frac{1}{N}\sum_{i=1}^{N} I(x_i),\ x_i \sim \phi(x)$ (MC)  vs.  $P \approx \frac{1}{N}\sum_{i=1}^{N} I(x_i)\,\frac{\phi(x_i)}{\tilde{x}^*(x_i)},\ x_i \sim \tilde{x}^*(x)$ (IS)

• $I(x)$ is unknown, so the optimal importance density $\tilde{x}^*(x) \propto I(x)\,\phi(x)$ is unknown
• Gaussian Process Adaptive Importance Sampling uses a series of improving GP approximations of $I(x)$, $j = 0, 1, \ldots, M$, in a mixture approximation of $\tilde{x}^*(x)$: $\tilde{x}(x) = \sum_{j=0}^{M} w_j\, \tilde{x}_j(x)$
• Mixture importance sampling "is … not much worse than importance sampling from the best of the mixture components" [Owen & Zhou 2000]
• The j-th component GP approximation is $\tilde{x}_j(x) \propto \mathrm{E}[I_j(x)]\,\phi(x)$
• The real-valued expected indicator, $0 \le \mathrm{E}[I_j(x)] \le 1$, is the point-wise portion of the GP's Gaussian CDF past the failure threshold (a sketch follows below)
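A small sketch I am adding of the expected indicator and the importance-weighted estimate (notation mine: GP posterior mean mu and standard deviation sigma at a point, failure defined as the response exceeding z_bar):

    import numpy as np
    from scipy.stats import norm

    def expected_indicator(mu, sigma, z_bar):
        """Point-wise probability mass of the GP's Gaussian predictive
        distribution past the failure threshold (failure: g > z_bar)."""
        return norm.sf((z_bar - mu) / sigma)        # = 1 - CDF

    def p_fail(indicator_vals, phi_vals, q_mix_vals):
        """Importance-sampling estimate: indicator at samples drawn from the
        mixture density q_mix, re-weighted back to the input density phi."""
        w = phi_vals / q_mix_vals                   # importance weights
        return np.mean(indicator_vals * w)

    print(expected_indicator(mu=np.array([0.0, 2.0]),
                             sigma=np.array([1.0, 1.0]),
                             z_bar=1.0))            # [0.1587, 0.8413]

Here q_mix plays the role of the mixture $\tilde{x}(x)$ above; as the GPs improve, the expected indicator sharpens toward the true 0/1 indicator and the weights concentrate near the failure boundary.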

Closing Remarks

(Adjoint) gradient-enhanced UQ enhances scalability in global methods.

• PCE regression
  – Gradient-matching equations are competitive in utility with value-matching equations
  – Ill-conditioning of Vandermonde-like systems → accuracy concerns for higher-order bases
    • Unstructured grids not based on quadrature points
    • Oversampling: condition number asymptotes for std uniforms using terms^2
    • Discretize using fixed-order bases; solver techniques (Leja ordering, SVD truncation, etc.)
  – Semi-intrusive collocation
    • Given adjoint intrusion, investigate other intrusion opportunities (short of stochastic Galerkin)
    • Krylov basis reuse among collocation evaluation sets → better clustering needed
• SC with gradient-enhanced interpolants
  – Local cubic splines:
    • Explicit approach has effectively zero interpolation error for values & gradients
    • Algebraic convergence for smooth problems; improved robustness for nonsmooth ones
  – Global Hermite interpolation polynomials: instability
  – Gradient-enhanced local adaptive refinement
• Gradient-enhanced kriging → EGRA, GPAIS, Bayesian emulation
  – Solver techniques: pivoted Cholesky
  – Ensemble emulation: hierarchical approximation, discretization (TGP)

Future direction: adjoint enhancement in sparsity-detecting PCE methods.