Parameter estimation for forest modeling
Jeremy Lichstein
[email protected]@ufl.edu
September 17, 2010
Princeton University; US Forest Service
Parameter estimation for forest modeling
Ryan et al. 2010
Parameter estimation for forest modeling
• Stand-level C pools/fluxes vs. stand age, soil, topography, climate, etc.
– Non-linear functions.
• Demographic parameters for dynamic models (SORTIE, FVS, many others)
– Growth, mortality, and reproductive rates are non-linear functions of environmental conditions (light, temperature, etc.).
• Individual allometry: parameters relating dbh to biomass (total, litter, roots, etc.)
– How to estimate parameters for rare species?
Goals of today’s talk
• Introduce advanced methods for estimating
parameters.
• Introduce free software to execute the
methods.
• This is NOT a statistics course, so relax and
enjoy!
• Implementing the methods will require a
significant time investment.
Outline
• Maximum Likelihood Estimation
• Non-linear regression with R
• Bayesian Hierarchical Models to
estimate parameters for rare species
Example: linear regression (σ² = 1.5; two unknown parameters)
• “Best fit” line minimizes Sum of Squared Residuals
• Equivalent to maximizing the Likelihood = Probability of the
Data given the Model (intercept and slope) = P(y1)P(y2)…P(yN)
Example: linear regression (σ² = 1.5; two unknown parameters)
$$L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right]$$
$$\mu_i = \beta_0 + \beta_1 x_i$$
• Many classical statistical analyses (ANOVA,
t-test, linear regression, etc.) are special
cases of Maximum Likelihood that can be
solved analytically using Calculus.
• In other cases, we use numerical methods
to obtain the Maximum Likelihood
Estimates, confidence intervals, etc.
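The link between least squares and Maximum Likelihood can be checked numerically. The sketch below is only an illustration: the data are simulated, the "true" intercept and slope (2 and 0.5) are arbitrary assumptions, and σ² is treated as known and equal to 1.5 as in the regression example. It fits the same line analytically with lm() and numerically by minimizing the negative log likelihood with optim().

```r
## Simulated data (hypothetical values, chosen only for illustration)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
sigma2 <- 1.5                                  # known variance, as in the example
y <- 2 + 0.5 * x + rnorm(n, sd = sqrt(sigma2))

## Analytical solution: ordinary least squares
fit_ls <- lm(y ~ x)

## Numerical solution: minimize the negative log likelihood
nll_lin <- function(beta) {
  mu <- beta[1] + beta[2] * x                  # intercept + slope * x
  -sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}
fit_ml <- optim(c(0, 0), nll_lin, method = "BFGS")

## The two rows should agree to several decimal places
rbind(least_squares = coef(fit_ls), max_likelihood = fit_ml$par)
```

With Normal errors of constant variance, minimizing the negative log likelihood and minimizing the sum of squared residuals select the same intercept and slope, so the two rows differ only by the tolerance of the optimizer.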
Example: one-parameter likelihood function
10 trials per
replicate
• Analytical method: p = (# successes)/(# trials) = 30/40 = 0.75.
• Numerical method: try different values of p between 0 and 1 to see which value maximizes the likelihood.
Bolker 2008
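The two approaches to the one-parameter example can be sketched in a few lines of R. This is a minimal illustration that assumes the replicates are pooled into 30 successes out of 40 trials (pooling does not change the estimate of p); the object names are made up for the example.

```r
successes <- 30
trials <- 40

## Analytical method
p_analytical <- successes / trials

## Negative log likelihood of p for the pooled binomial data
nll_binom <- function(p) -dbinom(successes, size = trials, prob = p, log = TRUE)

## Numerical method 1: try a grid of p values and keep the best one
p_grid <- seq(0.01, 0.99, by = 0.01)
p_grid_best <- p_grid[which.min(sapply(p_grid, nll_binom))]

## Numerical method 2: one-dimensional optimizer
p_numeric <- optimize(nll_binom, interval = c(0.001, 0.999))$minimum

c(analytical = p_analytical, grid = p_grid_best, optimizer = p_numeric)
```

All three values should be at or very near 0.75; the grid search is limited by its step size and the optimizer by its tolerance.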
Example: two-parameter likelihood surface
Bolker 2008
[Figure panels: Nelder-Mead simplex; Metropolis]
Maximum Likelihood example
Basal area vs. stand age (FIA data: FL 1990s)
Mean
• y-intercept
• asymptote
• curvature
• trend
Variance
• Power law of mean: var = v1*mean^v2
[Figure panels: 1970s, 1980s, 1990s, 2000s]
$$\mu = \left(1 + \delta\,\frac{\text{year} - 1990}{100}\right)\left[p\,b_{\max} + (1 - p)\,\frac{b_{\max}\,\text{age}}{k + \text{age}}\right]$$
$$\sigma^2 = v_1\,\mu^{v_2}$$
R code, example
## mle2() is provided by the bbmle package
library(bbmle)
## negative log likelihood function
## mean: saturating function of stand age, scaled by a linear trend in year
## variance: power law of the mean
nll = function(biomass,age,yr,p=0.1,bmax=30,k=50,delta=0,v1=1,v2=1){
  mu = (1 + yr*delta/100)*(p*bmax + (1-p)*bmax*age/(k + age))
  sigma2 = v1*mu^v2 # variance as a power law of the mean
  -sum(dnorm(biomass,mean=mu,sd=sqrt(sigma2),log=TRUE)) # sum of minus logLik
}
## read FIA data
D = read.csv("c:/Documents/FIA/biomass_data.csv",header=TRUE,sep=",")
D = D[D$state=="FL",] # just Florida
rel.year = D$year - 1990 # measurement year relative to 1990
data.list=list(biomass=D$biomass,age=D$age,yr=rel.year) # data
start.list=list(p=0.1, bmax=30, k=100, delta=0, v1=1, v2=1) # initial values
## fit the model by numerical maximum likelihood
mle.biomass.age = mle2(nll, data=data.list, start=start.list)
mle = coef(mle.biomass.age) # MLEs
ci = confint(mle.biomass.age) # confidence intervals
plot(D$age,D$biomass) # plot data points
lines(lowess(D$age,D$biomass),col='green') # locally weighted regression
## plot curves using MLEs …
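To see what each parameter of the mean function in the code above controls, the sketch below evaluates that function at a few stand ages. It copies the mean and variance expressions and the default parameter values from the negative log likelihood function; the helper names mean_curve and var_curve are hypothetical and not part of the original analysis.

```r
## Mean as a function of stand age and year (relative to 1990)
mean_curve <- function(age, yr = 0, p = 0.1, bmax = 30, k = 50, delta = 0) {
  (1 + yr * delta / 100) * (p * bmax + (1 - p) * bmax * age / (k + age))
}

## Variance as a power law of the mean
var_curve <- function(mu, v1 = 1, v2 = 1) v1 * mu^v2

y_intercept <- mean_curve(age = 0)      # p * bmax
half_way    <- mean_curve(age = 50)     # at age = k: halfway from intercept to asymptote
near_asymp  <- mean_curve(age = 1e6)    # approaches bmax at very old ages
with_trend  <- mean_curve(age = 50, yr = 10, delta = 1)  # 10 years after 1990, delta = 1

c(y_intercept, half_way, near_asymp, with_trend)
var_curve(half_way)                     # variance implied at the mean for age = k
```

Under these defaults, p sets the y-intercept as a fraction of bmax, bmax is the asymptote, k is the age at which the curve is halfway between intercept and asymptote (curvature), and delta rescales the whole curve by delta percent per year relative to 1990 (trend).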
• Free software for Unix, Windows, Mac
• Linear regression, ANOVA, etc.
• Multivariate, spatial, time-series analysis
• Likelihood and Bayesian estimation
• Error propagation (e.g., Monte Carlo)
• Publication-quality graphics
www.r-project.org
Main disadvantage:
Not user friendly.
Comparison of commonly used statistical software
[Figure: flexibility vs. time investment (training) for Excel, JMP, SPSS, SAS, and R]
Free resources in Spanish and Portuguese
www.r-project.org
Spanish
• “R para Principiantes”
• A Spanish translation of “An Introduction to R”
• “Gráficos Estadísticos con R”
• “Cartas sobre Estadística de la Revista Argentina de Bioingeniería”
• “Introducción al uso y programación del sistema estadístico R”
• “Generacion automatica de reportes con R y LaTeX”
• “Metodos Estadisticos con R y R Commander”
Portuguese
• “Bioestatística usando R”
• “Introdução à Biometria utilizando R”
• “Introdução à Programação em R”
• “Tópicos de Estatística utilizando R”
• “Guia de instalação do R”
“R Commander” Graphical User Interface for basic statistics
http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/
Bolker 2008
• Exploratory data analysis
(graphics)
• Likelihood and Bayesian
statistics
• Download text and
example R code for free
from Ben Bolker’s website
http://www.math.mcmaster.ca/~bolker/emdbook/index.html
• Or buy from amazon.com
for $50
Estimating parameters for rare species:
Bayesian Hierarchical Modeling
• Would like to have parameter estimates (e.g., allometric
equations) for each species in each region, site, soil type, etc.
• Many species, particularly in diverse tropical forests, are rare,
so sample sizes are small.
• Some alternatives:
– Group rare species into functional types.
– Assign rare species a mean value from common species.
– Hierarchical estimation: each parameter comes from a
probability distribution that is informed by entire dataset
(all species).
• All three alternatives combine data from multiple species.
Hierarchical analysis provides a non-arbitrary way to do this.
Hierarchical models are often fit in a Bayesian context.
Bayesian vs. Maximum Likelihood
• Prior probability distribution of parameters:
What we believe before looking at our data.
P(Model | Data) ∝ P(Data | Model) × P(Model)
Posterior ∝ Likelihood × Prior
• Posterior probability distribution of parameters:
What we believe after looking at our data.
• If priors are non-informative:
– Posterior distribution depends primarily on the data.
– Posterior means ≈ MLEs
Bolker 2008
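The claim that posterior means are close to MLEs under a non-informative prior can be illustrated with a case that has an exact answer. The sketch below assumes binomial data with 30 successes in 40 trials (the counts from the one-parameter likelihood example) and a uniform Beta(1, 1) prior on p; the choice of prior is an assumption made for illustration.

```r
successes <- 30
trials <- 40

## Maximum likelihood estimate
p_mle <- successes / trials

## Uniform Beta(1, 1) prior + binomial likelihood gives a Beta posterior
a_post <- 1 + successes
b_post <- 1 + (trials - successes)

post_mean <- a_post / (a_post + b_post)              # posterior mean of p
post_ci <- qbeta(c(0.025, 0.975), a_post, b_post)    # 95% credible interval

c(mle = p_mle, posterior_mean = post_mean)
post_ci
```

The posterior mean is pulled slightly toward the prior mean of 0.5, and the difference from the MLE shrinks as the number of trials grows.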
Bayesian statistics: Why Bother?
• Bayesian and Maximum Likelihood analyses yield similar inferences in many cases, but hierarchical models and other complex models are easier to fit in a Bayesian context.
• Markov chain Monte Carlo (MCMC): method for drawing random samples from a probability distribution. Can describe Bayesian posterior distribution (mean, percentiles, etc.) even if no analytical solutions are available.
• Can execute with R or WinBUGS (also free):
http://www.mrc-bsu.cam.ac.uk/bugs/
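A minimal sketch of MCMC in base R, using a random-walk Metropolis sampler. It assumes the same binomial data as the one-parameter likelihood example (30 successes in 40 trials) and a uniform prior on p; the proposal standard deviation, chain length, and burn-in are arbitrary choices for illustration, not recommendations.

```r
set.seed(42)
successes <- 30
trials <- 40

## Log posterior (up to a constant): log likelihood + log of a uniform prior
log_post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)           # outside the prior's support
  dbinom(successes, size = trials, prob = p, log = TRUE)
}

n_iter <- 20000
chain <- numeric(n_iter)
chain[1] <- 0.5                                # starting value

for (i in 2:n_iter) {
  proposal <- chain[i - 1] + rnorm(1, sd = 0.1)          # random-walk proposal
  log_ratio <- log_post(proposal) - log_post(chain[i - 1])
  if (log(runif(1)) < log_ratio) {
    chain[i] <- proposal                       # accept the proposal
  } else {
    chain[i] <- chain[i - 1]                   # reject: stay at the current value
  }
}

draws <- chain[-(1:2000)]                      # discard burn-in
post_mean <- mean(draws)
post_ci <- quantile(draws, c(0.025, 0.975))
post_mean
post_ci
```

The sampler only ever evaluates the likelihood times the prior, so the same loop applies when the posterior has no analytical form. For this simple case the exact posterior is Beta(31, 11), which gives a check on the sampled mean and percentiles.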
Hierarchical Estimation
Each parameter is a sample from a probability
distribution, whose parameters we must also
estimate.
Parameter estimates for species X are a
compromise between data for species X and the
parameter sampling distributions.
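The "compromise" can be made concrete with the simplest hierarchical model. The sketch below assumes Normal observations with known within-species variance and a Normal distribution of species-level parameters with known mean and variance; in a real analysis these would be estimated along with everything else. All numbers and the function name shrink are hypothetical.

```r
## Posterior mean of one species' parameter under a Normal-Normal model:
## a precision-weighted average of the species' own sample mean and the
## mean of the distribution that species parameters are drawn from.
shrink <- function(ybar, n, sigma2, mu, tau2) {
  w <- (n / sigma2) / (n / sigma2 + 1 / tau2)   # weight on the species' own data
  w * ybar + (1 - w) * mu
}

mu <- 0.60       # mean of the species-level distribution (hypothetical)
tau2 <- 0.01     # variance among species (hypothetical)
sigma2 <- 0.04   # variance among observations within a species (hypothetical)

## Two species with the same sample mean but very different sample sizes
common <- shrink(ybar = 0.80, n = 200, sigma2, mu, tau2)
rare   <- shrink(ybar = 0.80, n = 2,   sigma2, mu, tau2)

c(common = common, rare = rare)
```

The common species keeps an estimate close to its own sample mean, while the rare species is pulled most of the way toward the overall mean because its two observations carry little information.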
Hierarchical Estimation:
Example using taxonomic hierarchy
Each parameter is a sample from a
probability distribution that may be
informed by:
• Covariates
– Soil, topography, local climate,
etc.
• Taxonomy: Rare species parameter
estimate ≈
– genus-level mean if taxonomic
signal is strong
– overall mean if taxonomic signal
is weak
[Diagram: taxonomic hierarchy, from top to bottom]
division
order 1, order 2, order 3, …
family 1, family 2, family 3, …
genus 1, genus 2, genus 3, …
species 1, species 2, species 3, …
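How the strength of the taxonomic signal moves a rare species' estimate can be sketched with a two-level Normal model. This is a simplified illustration, not the model used in the talk: it assumes a rare species with no usable data of its own, a genus with several well-sampled species whose parameters are treated as known, and known variances among genera and among species within a genus. All names and numbers are hypothetical.

```r
## Predicted parameter for a data-poor species in a given genus:
## a weighted average of the genus-level mean and the overall mean.
rare_species_estimate <- function(genus_mean, n_species, overall_mean,
                                  var_among_genera, var_within_genus) {
  ## weight on the genus mean grows with the among-genus variance
  w <- var_among_genera / (var_among_genera + var_within_genus / n_species)
  w * genus_mean + (1 - w) * overall_mean
}

## Strong taxonomic signal: genera differ a lot, species within a genus are alike
strong <- rare_species_estimate(genus_mean = 0.9, n_species = 5, overall_mean = 0.6,
                                var_among_genera = 0.05, var_within_genus = 0.005)

## Weak taxonomic signal: genera barely differ, species within a genus vary widely
weak <- rare_species_estimate(genus_mean = 0.9, n_species = 5, overall_mean = 0.6,
                              var_among_genera = 0.0005, var_within_genus = 0.05)

c(strong_signal = strong, weak_signal = weak)
```

With a strong signal the estimate sits near the genus-level mean (0.9 here); with a weak signal it falls back to nearly the overall mean (0.6 here).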
Shade-tolerance index for 315 U.S. tree species:
proportion of saplings in understory (FIA data)
rare species (n < 10): data not informative
Lichstein et al. 2010
Ecological Applications
Height allometry for 315 U.S. tree species
[Figure: log10[height (m)] vs. log10[diameter (cm)]; panels: Magnolia sp., Magnolia grandiflora, Magnolia macrophylla, Magnolia acuminata, Magnolia virginiana, Magnolia fraseri]
Height allometry for 315 U.S. tree species
[Figure: log10[height (m)] vs. log10[diameter (cm)]; panels: Sabal sp., Quercus rubra, Quercus durandii, Sorbus americana, Quercus arizonica, Quercus lobata]
Hierarchical modeling allows you to use all
available data without making arbitrary
decisions about how to group rare species.
Should you worry about all of this?
• Complex statistical tools are not always
necessary or desirable.
• If you can answer the questions that you want
using the methods you are familiar with…
don’t worry, be happy.