![Page 1: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/1.jpg)
Emulation of a Stochastic Forest Simulator Using Kernel Stick-Breaking Processes
(Work in Progress)
James L. Crooks (SAMSI, Duke University)
![Page 2: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/2.jpg)
Background
● We desire to predict the distribution of tree species in the North Carolina forest under a variety of future climate change scenarios.
● Toward this end we can use the forest simulator developed by J. Clark and P. Agarwal’s joint research group.
● This simulator models the life-cycle of individual trees within a tree stand of pre-specified area.
● Growth and fecundity are in part mediated by the climate-influenced variables temperature and soil moisture.
![Page 3: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/3.jpg)
Motivation
● The forest simulator has the following properties that make emulation both important and difficult:
– Its speed limits the physical area that can be simulated in reasonable time (the current standard is 128 m x 128 m)
– Its output is stochastic
– Its output distribution can be non-Gaussian
– Its output distribution can vary over the input space.
● Thus there is a need for a local, nonparametric statistical method to emulate the entire output distribution across the input space.
![Page 4: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/4.jpg)
Objectives
● Run simulator with 3 species under “standard” climatic conditions for 1000+ years to establish equilibrium initial conditions.
● Run simulator for a further 100 years at each of various points in the climate input space (temperature and soil moisture increase rates).
● Emulate the output over this input space using the Kernel Stick-Breaking Processes idea of Dunson and Park (2006).
![Page 5: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/5.jpg)
Summary of Input and Output Variables
i indexes the run of the simulator.
Simulator Climate Input Variables
● $x_{i1}$ = Mean Temperature Increase / Century
● $x_{i2}$ = Mean Soil Moisture Increase / Century
Design Matrix
$$X_i^T = \left(1,\; x_{i1},\; x_{i2},\; x_{i1}x_{i2},\; x_{i1}^2,\; x_{i2}^2\right)$$
Simulator Output Variables
● $y_{i1}$ = Final Number of Adult Trees of Species 1
● $y_{i2}$ = Final Number of Adult Trees of Species 2
● $y_{i3}$ = Final Number of Adult Trees of Species 3
![Page 6: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/6.jpg)
[Figure: Forest Simulator output for the 1001 year initialization run. Legend: Total, Species 1, Species 2, Species 3.]
We will focus on the number of adult trees.
![Page 7: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/7.jpg)
Justifying the Choice of Model
● We expect that the mean response will be suppressed at extreme values of climate variables.
→ Therefore we model the mean response as:
$$\mu_{ij} = \exp\left(\sum_{k=1}^{6} X_{ik}\,\beta_{ijk}\right), \quad i \in \{1,\ldots,N\},\; j \in \{1,2,3\},\; k \in \{1,\ldots,6\}$$
with a design matrix having up to quadratic terms, where
– i indexes simulator run
– j indexes the tree species
– k indexes the regression coef.
[Figure: Single Regression Surface. x-axis: Climate Variable (Temperature or Soil Moisture Increase Rate); y-axis: Number of Trees.]
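The log-linear mean with quadratic terms can be sketched in a few lines of Python. This is only an illustration for a single species: the ordering of the six design terms is an assumption, and the coefficient values in `beta` are hypothetical (chosen with negative quadratic coefficients so that the mean is suppressed at extreme climate values), not fitted values.

```python
import math

def design_row(x1, x2):
    """Quadratic design row for one run: (1, x1, x2, x1*x2, x1^2, x2^2).
    The term ordering is an assumption made for this sketch."""
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]

def mean_response(x1, x2, beta):
    """mu = exp(sum_k X_k * beta_k) for one species at climate input (x1, x2)."""
    row = design_row(x1, x2)
    return math.exp(sum(xk * bk for xk, bk in zip(row, beta)))

# Hypothetical coefficients: negative quadratic terms pull the mean down
# as either climate rate moves away from zero.
beta = [3.0, 0.2, -0.1, 0.0, -0.5, -0.5]

for x1 in (-2.0, 0.0, 2.0):
    print(x1, round(mean_response(x1, 0.0, beta), 3))
```

Because the exponent is a quadratic in the inputs, the surface stays positive everywhere and has a single hump when the squared-term coefficients are negative.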
![Page 8: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/8.jpg)
● We do not a priori expect the output distribution to be Gaussian anywhere on the input space.
→ Use a non-parametric (Dirichlet Process) infinite mixture of regression surfaces instead of a single surface.
● We do not a priori expect the shape of the output distribution to be constant over the input space.
→ Use the Kernel Stick-Breaking Process of Dunson and Park (2006) to allow the DP mixture to be predictor-dependent.
[Figure: Finite (Truncated) Mixture of Regression Surfaces. x-axis: Climate Variable (Temperature or Soil Moisture Increase Rate); y-axis: Number of Trees.]
![Page 9: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/9.jpg)
Negative Binomial Likelihood
● The output variable of interest is the number of adult trees of each species. Why not use a Poisson likelihood?
● Preliminary data show Var[y] scales roughly like E[y]^2, not E[y], and Var[y] is also inversely dependent on the forest area.
→ Use the negative binomial distribution, which has pmf:
$$f(y \mid \mu, \nu) = \frac{\Gamma(y+\nu)}{\Gamma(y+1)\,\Gamma(\nu)} \left(\frac{\nu}{\nu+\mu}\right)^{\nu} \left(\frac{\mu}{\nu+\mu}\right)^{y}$$
and moments:
$$E[y \mid \mu, \nu] = \mu, \qquad \mathrm{Var}[y \mid \mu, \nu] = \mu + \frac{\mu^2}{\nu}$$
where the prior range of $\nu$ can be increased with area.
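As a quick numerical check of the mean/dispersion form of the negative binomial, the sketch below evaluates the pmf on the log scale and sums it over a long range of counts. The values `mu = 20` and `nu = 4` are arbitrary illustrations, not estimates from the simulator.

```python
import math

def negbin_pmf(y, mu, nu):
    """Negative binomial pmf with mean mu and dispersion nu,
    computed via log-gamma for numerical stability."""
    log_p = (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
             + nu * math.log(nu / (nu + mu))
             + y * math.log(mu / (nu + mu)))
    return math.exp(log_p)

mu, nu = 20.0, 4.0            # illustrative values only
ys = range(2000)              # the tail beyond this range is negligible here
probs = [negbin_pmf(y, mu, nu) for y in ys]

total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))

print(total, mean, var)       # var should match mu + mu**2 / nu
```

For fixed `nu` the variance grows like the square of the mean once `mu` is large, which is the scaling noted in the preliminary data; letting `nu` grow recovers the Poisson limit.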
![Page 10: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/10.jpg)
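The Full Model
$$f(y_i \mid \nu) = \int f(y_i \mid \beta_i, \nu)\, dG_{x_i}(\beta_i), \quad i = 1, \ldots, N$$
$$f(y_i \mid \beta_i, \nu) = \prod_{j=1}^{3} \mathrm{NegBin}(y_{ij} \mid \nu_j, \mu_{ij}), \quad \mu_{ij} = \exp\left(\sum_{k=1}^{6} X_{ik}\,\beta_{ijk}\right)$$
Kernel Stick-Breaking Process:
$$G_{x_i}(\beta_i) = \sum_{l=1}^{\infty} W(x_i; V_l, \Gamma_l) \prod_{m<l} \left[1 - W(x_i; V_m, \Gamma_m)\right] G_l(\beta_i)$$
$$W(x_i; V_l, \Gamma_l) = V_l\, K(x_i, \Gamma_l)$$
$$K(x_i, \Gamma_l) = \exp\left(-\psi_1 (x_{i1} - \Gamma_{l1})^2 - \psi_2 (x_{i2} - \Gamma_{l2})^2\right)$$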
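The predictor-dependent mixture weights can be illustrated with a truncated stick-breaking computation. The sketch below assumes a Gaussian-type kernel with negative exponents (so that K lies in (0, 1] and equals 1 at the kernel location), truncates the infinite sum at 25 components, and draws the locations uniformly from a box rather than from a discrete grid; all numerical values are hypothetical.

```python
import math
import random

def kernel(x, gamma, psi):
    """K(x, Gamma) = exp(-sum_d psi_d * (x_d - Gamma_d)^2); equals 1 when x == Gamma."""
    return math.exp(-sum(p * (xd - gd) ** 2 for p, xd, gd in zip(psi, x, gamma)))

def ksbp_weights(x, V, Gamma, psi):
    """Truncated kernel stick-breaking weights at input x.
    Returns (weights, leftover), where leftover is the mass beyond the truncation."""
    weights = []
    remaining = 1.0                     # product of (1 - W_m) over earlier components
    for v, g in zip(V, Gamma):
        w = v * kernel(x, g, psi)       # W(x; V_l, Gamma_l) = V_l * K(x, Gamma_l)
        weights.append(w * remaining)
        remaining *= 1.0 - w
    return weights, remaining

rng = random.Random(0)
alpha = 1.0                             # hypothetical Beta(1, alpha) parameter
n_components = 25                       # truncation level chosen for illustration
V = [rng.betavariate(1.0, alpha) for _ in range(n_components)]
Gamma = [(rng.uniform(-1.0, 2.0), rng.uniform(-2.0, 1.0)) for _ in range(n_components)]
psi = (1.0, 1.0)                        # hypothetical kernel precisions

weights, leftover = ksbp_weights((0.5, -0.5), V, Gamma, psi)
print(sum(weights) + leftover)          # weights plus leftover mass account for all of the stick
```

Components whose locations lie far from `x` receive small `W`, so they take little of the stick at that input; this is what lets the mixture change shape across the input space.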
![Page 11: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/11.jpg)
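$$\Gamma_l \sim \mathrm{DiscreteGrid}$$
$$V_l \sim \mathrm{Beta}(1, \alpha)$$
$$G_l \sim \mathrm{DP}(\eta\, G_0)$$
$$G_0 \sim \mathrm{MatrixNormal}(\beta_0, \Sigma_0, \Phi_0)$$
$$\beta_0 \sim \mathrm{MatrixNormal}$$
$$\Sigma_0^{-1},\, \Phi_0^{-1} \sim \mathrm{Wishart}$$
$$\psi_{1,2} \sim \mathrm{LogNormal}$$
$$\nu_j \sim \mathrm{DiscreteGrid}, \quad j \in \{1,2,3\}$$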
![Page 12: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/12.jpg)
Comments on the Model
● This model, unlike Dunson and Park’s original, lacks conjugacy between $f$ and $G_0$; thus two changes must be made to their algorithm:
– We no longer have the full conditional for $\beta_i$, so we must use a Metropolis-Hastings step to update it.
– The integral $\int f(y_i \mid \beta_i, \nu)\, dG_0(\beta_i)$ cannot be evaluated exactly, so we must approximate it numerically using (e.g.) Monte Carlo integration.
● The original MATLAB code is itself not fast, but once a posterior sample has been generated it is cheap to predict the output pmf at new points in the input space.
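A plain Monte Carlo approximation of the non-conjugate integral can be sketched as follows. This is a simplified illustration for one species, and it swaps in independent normal draws for the base measure (the model above uses a matrix normal); `beta0`, `sd0`, the design row, and the observed count are all hypothetical.

```python
import math
import random

def negbin_logpmf(y, mu, nu):
    """Log pmf of the negative binomial with mean mu and dispersion nu."""
    return (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
            + nu * math.log(nu / (nu + mu))
            + y * math.log(mu / (nu + mu)))

def mc_marginal(y, row, nu, beta0, sd0, n_draws, rng):
    """Approximate the integral of f(y | beta, nu) dG0(beta) by averaging the
    likelihood over draws of beta from G0 (independent normals in this sketch)."""
    total = 0.0
    for _ in range(n_draws):
        beta = [rng.gauss(m, sd0) for m in beta0]
        mu = math.exp(sum(xk * bk for xk, bk in zip(row, beta)))
        total += math.exp(negbin_logpmf(y, mu, nu))
    return total / n_draws

rng = random.Random(42)
row = [1.0, 0.5, -0.5, -0.25, 0.25, 0.25]    # hypothetical quadratic design row
beta0 = [3.0, 0.2, -0.1, 0.0, -0.5, -0.5]    # hypothetical base-measure mean
estimate = mc_marginal(18, row, 4.0, beta0, 0.1, 5000, rng)
print(estimate)
```

The estimate is an average of probabilities, so it always lies in [0, 1]; its Monte Carlo error shrinks like one over the square root of `n_draws`.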
![Page 13: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/13.jpg)
Generating Simple Climate Change Scenarios
● The ballpark estimates of today’s (soil moisture, temperature) mean and covariance are:
$$\text{mean} = \begin{pmatrix} 16.61 & 18.94 \end{pmatrix}, \qquad \text{cov} = \begin{pmatrix} 1.78 & 0.19 \\ 0.19 & 14.38 \end{pmatrix}$$
● The 1000+ year initialization run has temperature and soil moisture generated by a MVN with this mean and covariance.
● Temperature is measured in °C and soil moisture in %.
![Page 14: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/14.jpg)
● Future 100-year scenarios are generated assuming the means change linearly in time, with rates given by the points on the plot below:
● GCMs generally predict hotter, drier conditions for the Southeastern US.
● Accordingly, ranges were: [-1,+2]*SD/century for Temperature and [-2,+1]*SD/century for Soil Moisture.
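The scenario generator described above (yearly draws from a bivariate normal whose mean drifts linearly) might be sketched like this. The mean and covariance are the ballpark values quoted earlier; treating component 0 as soil moisture and component 1 as temperature follows the "(soil moisture, temperature)" wording and is an assumption, as is holding the covariance fixed while the mean drifts.

```python
import math
import random

# Ballpark values from the slides; the component order is an assumption.
MEAN = (16.61, 18.94)
COV = ((1.78, 0.19), (0.19, 14.38))

def cholesky_2x2(cov):
    """Lower-triangular factor (a, b, c) with cov = [[a, 0], [b, c]] times its transpose."""
    a = math.sqrt(cov[0][0])
    b = cov[1][0] / a
    c = math.sqrt(cov[1][1] - b * b)
    return a, b, c

def draw_year(mean, cov, rng):
    """One bivariate normal draw using the 2x2 Cholesky factor."""
    a, b, c = cholesky_2x2(cov)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + a * z0, mean[1] + b * z0 + c * z1)

def scenario(rates, years, rng):
    """Yearly climate draws with means drifting linearly at `rates`,
    given in units of SD per century for each component."""
    sds = (math.sqrt(COV[0][0]), math.sqrt(COV[1][1]))
    path = []
    for t in range(years):
        shifted = tuple(m + r * sd * t / 100.0
                        for m, r, sd in zip(MEAN, rates, sds))
        path.append(draw_year(shifted, COV, rng))
    return path

rng = random.Random(1)
stable = scenario((0.0, 0.0), 100, rng)      # no trend in either mean
changed = scenario((-2.0, 2.0), 100, rng)    # hypothetical corner of the rate ranges
print(stable[0], changed[-1])
```

Each design point in the rate plot corresponds to one `rates` pair, so a full experiment would call `scenario` once per point and feed the resulting path to the simulator.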
![Page 15: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/15.jpg)
[Figure: Generated soil moisture and temperature used in the initialization run, and three generated future scenarios. Climate change begins at year 1052. Legend: Stable Climate, Hotter/Drier, Cooler/Wetter.]
![Page 16: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/16.jpg)
Results
● I just got the initialization run back last week, so ask me in 3 months.
Other Thoughts
● May need to continue the initialization run another 500-1000 years to get a better equilibrium.
● Need a lot more runs when using nonparametrics anyway, so the benefits of using a Latin hypercube design are less obvious (in 2-D anyway).
![Page 17: James L. Crooks (SAMSI, Duke University)](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816063550346895dcf8c8c/html5/thumbnails/17.jpg)
Acknowledgements
● Jim Clark’s group for use of their simulator, and especially Sean McMahon for his invaluable assistance.
● David Dunson and Ju-Hyun Park for explaining their paper to me and letting me use their algorithm.
● The SAMSI Methodology and Terrestrial Models Working Groups for fruitful discussions.
References
Dunson, D. B., and J.-H. Park, “Kernel Stick-Breaking Processes”, ISDS Discussion Paper 22 (2006) and Biometrika (accepted)
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable simulator for forest dynamics”, Symposium on Computational Geometry 2004: 106-115
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable algorithm for dispersing populations”, Journal of Intelligent Information Systems 2004 (online)