introduction to design and analysis of computer experimentsother disciplinary examples of computer...

Empirical ExperimentationPrediction

GaSP ModelsConclusions

Introduction to Design and Analysis ofComputer Experiments

Thomas Santner

Department of StatisticsThe Ohio State University

October 2010

TJ Santner Computer Experiments



Outline

Empirical Experimentation

Prediction

GaSP Models

Conclusions





• Physical Experiments (agriculture, industrial, medicine) arethe gold-standard for determining the relationship between aendpoint of scientific interest and potential factors that affect it.





• Physical Experiments (agriculture, industrial, medicine) arethe gold-standard for determining the relationship between aendpoint of scientific interest and potential factors that affect it.• ViewpointExperimental output = true I/O relationship + “noise”

Y p(x) = ηtrue(x) + ǫ(x)

wherex → ηtrue(x)

is the unknown true I/O relationship.





• Some Challenges for Physical Experiments◮ There are scientifically unimportant nuisance (blocking )

factors- model these factors via xi







◮ Some factors affecting the outcome are not recognized–randomize Rxs to experimental units to preventunrecognized factors from systematically affectingtreatment comparisons








◮ Choice of sample size determine “right-sizedexperiments” to detect “important” treatment differences








◮ Choice of sample size determine “right-sizedexperiments” to detect “important” treatment differences

◮ Stochastic Models relating the inputs x to the outputY p(x)





• Stochastic Simulation Experiments popular tool forstudying complex physical systems each of whose partsbehave in a known stochastic manner but whose ensemblebehavior is not understood analytically. (Heavily used inIndustrial Engineering–e.g., compare job shop set ups)




Computer Experiments and Physical Experiments

• Computer Experiment use of (deterministic) computersimulators that implements a detailed mathematical model tostudy a input/output relationship of scientific interest (micromodel vs macro model used in SS Experiments)





• Computer Experiment use of (deterministic) computersimulators that implements a detailed mathematical model tostudy a input/output relationship of scientific interest (micromodel vs macro model used in SS Experiments)• Mathematical model is typically a PDE/ODE. Implementationvia FE, CFD, . . .





• Computer Experiment use of (deterministic) computersimulators that implements a detailed mathematical model tostudy a input/output relationship of scientific interest (micromodel vs macro model used in SS Experiments)• Mathematical model is typically a PDE/ODE. Implementationvia FE, CFD, . . .• Some applications have data from both (one or more)computer experiment(s) and (one or more) physicalexperiment(s) to investigate the true input/output relationship ofscientific interest. Thus computer experiments are used asadjuncts or surrogates for physical experiments for studyinginput/output relationships




Designing an Acetabular Cup





Ong et al (2006) conducted anuncertainty analysis of the ef-fects of engineering cup de-sign, surgical, and patientvariables on the stability of un-cemented accetabular cups(“porous ingrowth” cups whichrely on growth of the bone intothe prosthesis)





• Engineering Cup design Inputs Length stem, Stem crosssection, sphericity of acetabular cup, etc





• Engineering Cup design Inputs Length stem, Stem crosssection, sphericity of acetabular cup, etc• Patient Inputs elastic modulus of the bone, friction betweenprosthesis s tem and the bone





• Engineering Cup design Inputs Length stem, Stem crosssection, sphericity of acetabular cup, etc• Patient Inputs elastic modulus of the bone, friction betweenprosthesis s tem and the bone• Surgical Inputs accuracy of reamed acetabular cup in pelvis(gouges area between rear of acetabular cup insert and boneprovide space for later debris buildup)





• Engineering Cup design Inputs Length stem, Stem crosssection, sphericity of acetabular cup, etc• Patient Inputs elastic modulus of the bone, friction betweenprosthesis s tem and the bone• Surgical Inputs accuracy of reamed acetabular cup in pelvis(gouges area between rear of acetabular cup insert and boneprovide space for later debris buildup)• Output is one of three measures of the amount of ingrowthof the bone into the acetabular cup




Other Disciplinary Examples of Computer Experiments

◮ Biomechanics Rawlinson, et al. (2006) simulate thestresses and strains experienced by knee prostheses ofdifferent engineering designs.






◮ Aeronautical Engineering






◮ Aeronautical Engineering◮ Drug Development






◮ Aeronautical Engineering◮ Drug Development◮ Economics Lempert, et al. (2000) introduce the

Wonderland model which simulates world economicdevelopment under given scenarios of innovation andpollution






◮ Aeronautical Engineering◮ Drug Development◮ Economics Lempert, et al. (2000) introduce the

Wonderland model which simulates world economicdevelopment under given scenarios of innovation andpollution

◮ Tissue Engineering Spilker (2009) modeled the kinetics offluid flow in two-phase model of cartilage under sinusoidalloading




Features of Computer Simulators

• x → Code → y(x ) is a proxy for the physical process(based on simplified physics or biology)






• y(x) is presumed to be a biased representation of the trueinput-response relationship (the true relationship would beobtainable by a physical experiment)







• y(x) need not be replicated (the code is deterministic) (incontrast to stochastic computer simulations)








• x can often be high-dimensional (often functional)









• When x is high-dimensional, screening is important









• When x is high-dimensional, screening is important

• Computer codes used in practice have running times thatcan range from minutes to days .




Prediction Based on Computer Output

• Suppose we have only computer output and no data froman associated physical experiment (momentarily forget aboutthe possibility of data from an associated physical experiment)





• Suppose we have only computer output and no data froman associated physical experiment (momentarily forget aboutthe possibility of data from an associated physical experiment)• Warning Can’t determine code bias in such a setting





• Suppose we have only computer output and no data froman associated physical experiment (momentarily forget aboutthe possibility of data from an associated physical experiment)• Warning Can’t determine code bias in such a setting• Notation Let the inputs and outputs for our training data be

(x tr1 , yc(x tr

1 )), . . . , (x trn , yc(x tr

n )),





• Suppose we have only computer output and no data froman associated physical experiment (momentarily forget aboutthe possibility of data from an associated physical experiment)• Warning Can’t determine code bias in such a setting• Notation Let the inputs and outputs for our training data be

(x tr1 , yc(x tr

1 )), . . . , (x trn , yc(x tr

n )),

• Goal Predict yc(x0) where x0 is an untried (“new”) input





• Idea Regard yc(x) as a realization (a “draw”) from arandom function Y c(x).

• This is a Bayesian model; the desired “smoothness”properties of yc(x) must be built into all functions drawn fromY c(x).





• Provides flexibility –draws from three urns (Y c(x)processes)

x

Y

0.0 0.2 0.4 0.6 0.8 1.0

−3

−1

01

23

h

R(h

)

−1.5 −0.5 0.5 1.5

0.0

0.4

0.8

Y

0.0 0.2 0.4 0.6 0.8 1.0

−3

−1

01

23

R(h

)

−1.5 −0.5 0.5 1.5

0.0

0.4

0.8

Y

0.0 0.2 0.4 0.6 0.8 1.0

−3

−1

01

23

R(h

)

−1.5 −0.5 0.5 1.5

0.0

0.4

0.8





• We use the training data to pick the exact Y c(x) urn we drawfrom.





• We use the training data to pick the exact Y c(x) urn we drawfrom.• A naive predictor

yc(x0) = E{Y c(x0)|training data}





• We use the training data to pick the exact Y c(x) urn we drawfrom.• A naive predictor

yc(x0) = E{Y c(x0)|training data}

• A measure of prediction uncertainty is

Var(Y c(x0)|training data)

(due to uncertainty about the model)





0.0 0.2 0.4 0.6 0.8 1.0

−2

−1

01

2

Index 1 to 50

tmp




GaSP Models

• The simplest possible model for Y (x) is the regression-likemodel

Y (x) =

k∑

j=1

βj fj(x) + Z (x) = β⊤f (x) + Z (x)

(Y (x) = Large Scale Trends + Smooth Deviations )




GaSP Models


Y (x) =

k∑

j=1

βj fj(x) + Z (x) = β⊤f (x) + Z (x)


◮ f1(x), . . . , fk (x) are known regression functions,




GaSP Models


Y (x) =

k∑

j=1

βj fj(x) + Z (x) = β⊤f (x) + Z (x)


◮ f1(x), . . . , fk (x) are known regression functions,◮ β1, . . . , βk are unknown regression coefficients, and




GaSP Models


Y (x) =

k∑

j=1

βj fj(x) + Z (x) = β⊤f (x) + Z (x)


◮ f1(x), . . . , fk (x) are known regression functions,◮ β1, . . . , βk are unknown regression coefficients, and◮ Usually, β⊤f (x) = β0 in computer experiments




GaSP Models

Y (x) =∑k

j=1 βj fj(x) + Z (x) = β⊤f (x) + Z (x)

• In regression we regard deviations from the regressionmean as “measurement errors” meaning that Z (x1) and Z (x2)are unrelated (“white noise”).• In computer experiments we want a Z (x) model thatexhibits smooth deviations (from the Large Scale Trends).The simplest such model for Z (x) is a stationary GaussianStochastic Process (“GaSP”)




Stationary GaSPs

• Z (x) is stationarity if for any x , the random variable Z (x)has constant mean and constant variance , andCor(Z (x1), Z (x2) depends only on x1 − x2, i.e.,




Stationary GaSPs


◮ E {Z (x)} = 0(=⇒ E {Y (x)} = β⊤f (x)(= β0)

)




Stationary GaSPs


◮ E {Z (x)} = 0(=⇒ E {Y (x)} = β⊤f (x)(= β0)

)

◮ Var(Z (x)) = σ2Z

(=⇒ Var(Y (x)) = σ2

Z

)




Stationary GaSPs


◮ E {Z (x)} = 0(=⇒ E {Y (x)} = β⊤f (x)(= β0)

)

◮ Var(Z (x)) = σ2Z

(=⇒ Var(Y (x)) = σ2

Z

)

◮ Cor (Z (x1), Z (x2)) is determined by a correlationfunction , i.e., an R(·) that satisfies R(0) = 1 and

Cor (Z (x1), Z (x2)) = R(x1 − x2)

(=⇒ Cor (Y (x1), Y (x2)) = R(x1 − x2))

(and Cov(Y (x1), Y (x2)) = σ2Z × R(x1 − x2)

)




Stationary GaSPs

◮ For any s ≥ 1 and x1, . . . ,xs

Z ≡ (Z1, . . . , Zs) ≡ (Z (x1), . . . , Z (xs)) ∼ MultVar Normal




Stationary GaSPs

◮ For any s ≥ 1 and x1, . . . ,xs


• The process Y (·) is completely determined once we identify(β0, σ

2Z , R(·)), i.e.,

Y ≡ (Y1, . . . , Ys) ≡ (Y (x1), . . . , Y (xs)) ∼ Ns

(Fβ, σ2

Z × R)

where F = F (x1, . . . , xs) is s × k and R i ,j = R(x i − x j) is s × s




Stationary GaSPs

◮ For any s ≥ 1 and x1, . . . ,xs


• The process Y (·) is completely determined once we identify(β0, σ

2Z , R(·)), i.e.,

Y ≡ (Y1, . . . , Ys) ≡ (Y (x1), . . . , Y (xs)) ∼ Ns

(Fβ, σ2

Z × R)

where F = F (x1, . . . , xs) is s × k and R i ,j = R(x i − x j) is s × s

• Typically assume R(·) = R(·| ξ) is known up to a finite vectorof parameters ξ, i.e., R(·) is parametric




Popular Correlation Functions

Let z ≡ x1 − x2 = (z1, . . . , zd )





Let z ≡ x1 − x2 = (z1, . . . , zd )

• White Noise

R(z) =

{1, z = 00, z 6= 0

=⇒ R = Is





Let z ≡ x1 − x2 = (z1, . . . , zd )

• White Noise

R(z) =

{1, z = 00, z 6= 0

=⇒ R = Is

• Product Power Gaussian Correlation Function

R(z) =

d∏

j=1

exp(−ξjz2j ) = exp(−

d∑

j=1

ξjz2j ), z ∈ IRd





• Product Cubic Correlation Function

R(z| ξ) =d∏

j=1

R(zj | ξj)

where ξ = (ξ1, . . . , ξd ) > 0 and

R(z| ξ) =

1 − 6(

zξ

)2+ 6

(|z|ξ

)3, |z| ≤ ξ/2

2(

1 − |z|ξ

)3, ξ/2 < |z| ≤ ξ

0, ξ < |z|

for ξ > 0 and z ∈ IR.




The Multivariate Normal Distribution

• Definition X = (X1, . . . , Xd )⊤ has the multivariate normaldistribution with mean µ and covariance matrix Σ, denotedX ∼ Nd(µ,Σ), if X joint pdf

1

(2π)d2√

det(Σ)exp{−

12

(x − btµ)⊤Σ−1(x − btµ)⊤}




The Multivariate Normal Distribution

• Definition X = (X1, . . . , Xd )⊤ has the multivariate normaldistribution with mean µ and covariance matrix Σ, denotedX ∼ Nd(µ,Σ), if X joint pdf

1

(2π)d2√

det(Σ)exp{−

12

(x − btµ)⊤Σ−1(x − btµ)⊤}

• Recall that if

W = (W 1, W 2) ∼ Nd

((µ1µ2

),

(Σ11 Σ12

Σ21 Σ22

))

then given that W 2 = w2 the conditional density of W 1 givenW 2 = w2, denoted [W 1|W 2 = w2] is

Nd1

(µ1 + Σ12Σ

−122 (x2 − µ2) ,Σ11 − Σ12Σ

−122 Σ21

)




Application

• Naive Predictor

yc(x0) = E{Y c(x0)|Y tr} =

k∑

j=1

βj fj(x0) + r⊤0 R−1 (Y tr − Fβ

)




Application

• Naive Predictor

yc(x0) = E{Y c(x0)|Y tr} =

k∑

j=1


)

• Prediction Uncertainty is

Var(Y c(x0)|Y tr ) = σ2Z

(1 − r⊤0 R−1r0

)




Application

• Naive Predictor

yc(x0) = E{Y c(x0)|Y tr} =

k∑

j=1


)

• Prediction Uncertainty is

Var(Y c(x0)|Y tr ) = σ2Z

(1 − r⊤0 R−1r0

)

• Usually don’t know (β, σ2Z , ξ) :-(




Inference

• Frequentist Inference Choice of urn equivalent to selectionof (β0, σ

2Z , ξ). Use training data to estimate (β0, σ

2Z , ξ) (the

“urn”) which is the basis for our predictor

yc(x0) = E{Y c(x0)|training data, β0, σ2Z , ξ}




Inference

• Frequentist Inference Choice of urn equivalent to selectionof (β0, σ

2Z , ξ). Use training data to estimate (β0, σ

2Z , ξ) (the

“urn”) which is the basis for our predictor

yc(x0) = E{Y c(x0)|training data, β0, σ2Z , ξ}

• Bayesian Inference Places a prior distribution on (β0, σ2Z , ξ)

and uses the posterior [β0, σ2Z , ξ|training data] to select a

collection of urns and averages the predictors from each urn

yc(x0) = E{E{Y c(x0)|training data, β0, σ2Z , ξ}}

where the outer E{·} is wrt [β0, σ2Z , ξ|training data].




Illustration

True Curve (solid); n = 5 training data (diamonds);REML-EBLUP with Gaussian correlation function (dotted)

x

Tru

e an

d P

redi

cted

y()

0.0 0.2 0.4 0.6 0.8 1.0

-0.5

0.0

0.5

1.0




Conclusions

• The frequentist “plug-in” y(x0) is the basic off-the-shelfpredictor of y(x0) based on training data




Conclusions


• When (β0, σ2Z , ξ) are known, y(x0) has many theoretical

optimality properties although when these parameters areestimated from the data, the list of optimality properties is muchshorter




Conclusions




• How to use the data to estimate (β0, σ2Z , ξ)? MLE? REML?

PMLE? XV? No plug-in method overcomes the “smoothing”problem seen in the example, although Bayesian predictors can




Conclusions




• How to use the data to estimate (β0, σ2Z , ξ)? MLE? REML?

PMLE? XV? No plug-in method overcomes the “smoothing”problem seen in the example, although Bayesian predictors can

• Bayesian methodology can be used to make more legitimatepredictors and honest estimates of prediction uncertainty




Conclusions

• Other limitations: there are many cases where the y(·) doesnot “look” like a draw from a GaSP. Need predictors based onnon-stationary models.




Conclusions


• Two approaches to the Design of a computer experiment,i.e., choice of {x tr

i }ni=1.




Conclusions



i }ni=1.

• Approach 1 geometric-choose v to be “space-filling.”




Conclusions



i }ni=1.

• Approach 1 geometric-choose v to be “space-filling.”

• Approach 2 criteria-based designs for computerexperiments, eg, find an input which achieves

arg minx

yc(x)}




Questions?


introduction to design and analysis of computer experimentsother disciplinary examples of computer...

Documents