TRANSCRIPT
5.7.2005 Tutorial: Modelling of Data by J. Schaber
Modelling of Data: Parameter Confidence Limits in Nonlinear Regression

Outline
• Maximum Likelihood
• Linear Regression Theory
• Nonlinear Regression
• Asymptotic Theory
• Distributions and Confidence Intervals of Estimated Parameters
• Examples
• Numerics
• Optimal Experimental Design
• Model Discrimination
Least Square Estimation: The Problem

We fit M adjustable parameters p_j, j = 1,…,M of a model f to N data points (x_i, y_i), i = 1,…,N, where f predicts y:

y = f(x, p_1,…,p_M)

General idea: Least Square Fit. Minimize over p_1,…,p_M:

L(y|p,f) = Σ_{i=1}^{N} [y_i − f(x_i, p_1,…,p_M)]²

Other ideas: absolute residuals (more robust), but LS has some nice properties, e.g. it is differentiable.
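The objective above can be written down directly. A minimal sketch, assuming the Michaelis-Menten model f(S) = vmax·S/(Km + S) that is used in the examples later in the tutorial:

```python
import numpy as np

def f(x, vmax, Km):
    # Michaelis-Menten model: the f(x, p1, ..., pM) of the slide, p = (vmax, Km)
    return vmax * x / (Km + x)

def L(p, x, y):
    # Least-squares objective: sum of squared residuals over the N data points
    vmax, Km = p
    return np.sum((y - f(x, vmax, Km)) ** 2)

x = np.array([1.0, 2.0, 5.0, 10.0])
y = f(x, 2.0, 1.0)           # noise-free data generated with vmax=2, Km=1
print(L([2.0, 1.0], x, y))   # 0.0 at the true parameters
```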
Intuitive Maximum Likelihood: Intuitive ideas

Question: "Given a model, what is the probability of the parameters being correct for certain data?", i.e. P(p|y,f).

Search for the model f where P(p|y,f) is maximal.

⇒ IMPOSSIBLE: "there is no statistical universe of models from which f can be drawn."
Intuitive Maximum Likelihood: Intuitive ideas

Easier: "Given a particular set of parameters, what is the probability of the data having occurred under a certain model?", i.e. P(y|p,f).

⇒ Assumption: there is only one model, the correct one.
Intuitive Maximum Likelihood: Intuitive ideas

If P(y|p,f) is small, p is intuitively "unlikely".

⇒ Identify P(y|p,f) with the likelihood of p given y.
⇒ Maximize the likelihood of p given y (for the one and only model); thus we estimate p by a Maximum Likelihood Estimator.
The Maximum Likelihood: How do we get an ML estimator?

Assumptions:
• Each data point y_i has a measurement error ε_i that is independent of the other ε_j.

⇒ P(y|p,f) = ∏_{i=1}^{N} P(y_i|p,f) ∝ ∏_{i=1}^{N} P(ε_i)

• ε is normally distributed with variance σ².

⇒ L(y|p,f) ∝ ∏_{i=1}^{N} exp(−(1/2) ((y_i − f(x_i,p))/σ)²)
Maximum Likelihood Estimator

Maximizing L(y|p,f) is the same as maximizing ln L(y|p,f):

ln L(y|p,f) ∝ ln ∏_{i=1}^{N} exp(−(1/2)((y_i − f(x_i,p))/σ)²) = −Σ_{i=1}^{N} (1/2) ((y_i − f(x_i,p))/σ)²

⇒ With independent normal errors and the correct model, minimizing the Sum of Squared Residuals gives a Maximum Likelihood estimator of p.
Linear Regression Theory

If each X_i is a normal random variable with unit variance and mean zero, then Σ_{i=1}^{N} X_i² has a chi-square distribution with N degrees of freedom.

⇒ L(y|p,f)/σ² = Σ_{i=1}^{N} ((y_i − f(x_i,p))/σ)²

is chi-square distributed with N − M degrees of freedom. (Because p is estimated, not all terms are statistically independent.)
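A quick simulation illustrates the claim that L/σ² follows χ²(N − M) for a fitted linear model. This is a sketch with an arbitrary design matrix, noise level, and seed, none of which come from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 3                         # data points and parameters
sigma = 0.5
X = rng.normal(size=(N, M))          # arbitrary design matrix
p_true = np.array([1.0, -2.0, 0.5])

scaled_ssr = []
for _ in range(2000):                # repeat the "experiment" many times
    y = X @ p_true + rng.normal(scale=sigma, size=N)
    p_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ p_hat) ** 2)
    scaled_ssr.append(ssr / sigma**2)

# A chi-square(N - M) variable has mean N - M = 47
print(np.mean(scaled_ssr))
```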
Linear Regression Theory

The multivariate linear model

y_i = p_0 + p_1 x_{1i} + p_2 x_{2i} + … + p_M x_{Mi} + ε_i

can be written as y = Xp + ε, with cov(ε) = σ²I. Then

L(y|p̂) = Σ_{i=1}^{N} (y_i − x_i'p̂)² = (y − Xp̂)'(y − Xp̂)
Linear Regression Theory

L(y|p̂_X) = (y − Xp̂)'(y − Xp̂) = y'y − 2p̂'X'y + p̂'X'Xp̂ = ε̂'ε̂

This becomes minimal in p̂ when ∂(ε̂'ε̂)/∂p̂ = 0, the "normal equation":

∂(ε̂'ε̂)/∂p̂ = −2X'y + 2X'Xp̂ = 0 ⇔ p̂ = (X'X)^{-1} X'y
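A sketch of the normal-equation solution in NumPy (the design matrix and noise level are made up for illustration). In practice np.linalg.lstsq or a QR factorization is preferred over forming X'X explicitly, for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = rng.uniform(0, 10, size=N)
X = np.column_stack([np.ones(N), x, x**2])   # design matrix: intercept, x, x^2
p_true = np.array([0.5, 2.0, -0.1])
y = X @ p_true + rng.normal(scale=0.1, size=N)

# Normal equation p_hat = (X'X)^{-1} X'y, solved as a linear system
p_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(p_hat)  # close to p_true
```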
Linear Regression Theory: Some properties of p̂ = (X'X)^{-1}X'y

If E(y) = Xp (i.e. the model is correct) and cov(y) = σ²I (i.e. the errors are independent), then p̂ is nicely behaved:

E(p̂) = p, i.e. p̂ is unbiased, and cov(p̂) = σ²(X'X)^{-1}, so its quality can be estimated.

Note that both the error structure and the model structure contribute to the covariance.
Linear Regression Theory: The Gauss-Markov Theorem

If E(y) = Xp and cov(y) = σ²I, and p̂ is a LS estimator, then p̂ has minimum variance. This holds for any distribution of y; normality is not required.

σ² can be estimated by L(y|p̂_X)/(N − M − 1).
Linear Regression Theory

If y is normally distributed N(Xp, σ²I), then:
• p̂ is N(p, σ²(X'X)^{-1})
• L(y|p_X)/σ² is χ²(N − M)
• p̂ and s² = L(p̂)/(N − M − 1) are independent.

Thus confidence intervals for p and all kinds of hypothesis tests for the model, data and parameters can be developed.
Nonlinear Regression Theory

Now consider N observations (y_i, x_i) with known functional relationship f, thus

y_i = f(x_i, p*) + ε_i,  i = 1, 2,…, N

We assume that E(ε_i) = 0. The least-square estimator p̂ of p* minimizes

L(y|p,f) = L(p) = Σ_{i=1}^{N} (y_i − f(x_i, p))²
Nonlinear Regression Theory

The LS estimator p̂ therefore satisfies

∂L(p)/∂p |_{p=p̂} = 0

When f is differentiable, f can be approximated (Taylor) in a small neighbourhood of p* by

f_i(p) ≈ f_i(p*) + Σ_{r=1}^{M} ∂f_i(p)/∂p_r |_{p*} (p_r − p*_r)

⇔ f(p) ≈ f(p*) + J(p*)(p − p*)

with J(p) the Jacobian.
Nonlinear Regression Theory

Hence,

L(p) = |y − f(p)|² ≈ |y − f(p*) − J(p*)(p − p*)|² = |z − Xb|²

where z = y − f(p*) = ε, X = J(p*) and b = p − p*.

From linear theory, b̂ = (X'X)^{-1}X'z. When N is large, p̂ is almost certainly in a small neighbourhood of p*. Therefore

p̂ − p* ≈ b̂ and p̂ − p* ≈ (J'J)^{-1}J'ε
Nonlinear Regression Theory

The Jacobian plays the same role as X in linear regression. If ε is N(0, σ²I) then, asymptotically:
• p̂ is N(p, σ²(J'J)^{-1})
• L(y|p,f)/σ² is χ²(N − M)
• p̂ − p* and σ̂² ≈ L(p̂)/(N − M) are independent

and

(p̂ − p*)' Ĵ'Ĵ (p̂ − p*) ∝ M s² F_{M,N−M}

(L(p*) − L(p̂))/L(p̂) ∝ M/(N − M) · F_{M,N−M}
Nonlinear Regression Theory: Summarizing so far

Assumptions:
• We know the correct model f(x,p)
• ε is N(0, σ²I)

⇒ Parameters p can be estimated and confidence intervals can be calculated.
Numerical Approximations

Until now we have not said how the parameters are estimated. There is no general solution; the method depends on the problem. However, the theory suggests two methods.

1. Suppose p_a is an approximation of p̂. Then

f(p) ≈ f(p_a) + J(p_a)(p − p_a)

and

r(p) = y − f(p) ≈ r(p_a) − J(p_a)(p − p_a)

Substituting into L(p) = r'(p) r(p) leads to:
Numerical Approximations

L(p) ≈ r'(p_a)r(p_a) − 2 r'(p_a)J(p_a)(p − p_a) + (p − p_a)' J'(p_a)J(p_a) (p − p_a)

This is minimized with respect to p when

p − p_a = (J'(p_a)J(p_a))^{-1} J'(p_a) r(p_a) =: δ_a

⇒ Given a current approximation p_a, the next approximation should be

p_{a+1} = p_a + δ_a

This is called the Gauss-Newton Method.
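The update rule can be sketched for the Michaelis-Menten model of the examples. This is a bare-bones version with a fixed iteration count and no step-size control, which production code (e.g. scipy.optimize.least_squares) would add:

```python
import numpy as np

def f(x, p):
    vmax, Km = p
    return vmax * x / (Km + x)

def jac(x, p):
    # Jacobian J(p): one row per data point, columns df/dvmax and df/dKm
    vmax, Km = p
    return np.column_stack([x / (Km + x), -vmax * x / (Km + x) ** 2])

def gauss_newton(x, y, p0, n_iter=20):
    p = np.array(p0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, p)                            # residuals r(p_a)
        J = jac(x, p)
        delta = np.linalg.solve(J.T @ J, J.T @ r)  # delta_a = (J'J)^{-1} J' r(p_a)
        p = p + delta                              # p_{a+1} = p_a + delta_a
    return p

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = f(x, np.array([2.0, 1.0]))        # noise-free data with vmax=2, Km=1
p_fit = gauss_newton(x, y, [1.0, 0.5])
print(p_fit)                          # converges to vmax=2, Km=1
```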
Numerical Approximations

2. We can also expand L(p) directly:

L(p) ≈ L(p_a) + ∂L(p)/∂p |_{p_a} (p − p_a) + (1/2)(p − p_a)' ∂²L(p)/∂p∂p' |_{p_a} (p − p_a)

⇒ Given a current approximation p_a, the next approximation should be

p_{a+1} = p_a − H(p_a)^{-1} ∂L(p)/∂p |_{p_a}

This is called the Newton Method. The Gauss-Newton method above differs from it in that the Hessian H(p_a) is approximated by 2J'J.
Example

Assume we have a Michaelis-Menten kinetic and want to estimate vmax and Km from noisy measurements.

[Figure: M-M kinetics with noisy measurements, estimated vs. true curve, v against S]
Example

L(vmax, Km) is well-behaved in this case.

[Figure: Sum of Squared Residues, ChiSq surface over vmax and Km]
Example

95% confidence intervals: for a large number of repetitions of the same experiment, the true value lies within the CI in 95% of the cases.

N = 3, Out = 4

[Figure: confidence intervals for vmax over 100 experiments]
Example

N = 10, Out = 7

[Figure: confidence intervals for vmax over 100 experiments]
Example

Testing for normality, 100 repetitions. K-S test: normally distributed (P < 0.05).

[Figure: distribution of the vmax estimates with fitted normal density and cumulative curve]
Example

Increasing N.

[Figure: confidence intervals for vmax as a function of N]
Example

Testing for least squares, 100 repetitions. K-S test: chi-square distributed (P < 0.05).

[Figure: distribution of the least-square values with fitted chi-square density and cumulative curve]
• With small N we can get large confidence intervals, but the variance of the CIs is also increased.
• Larger N gives us smaller and more uniform CIs.
• In the case of a known model and parameter estimates close to the true ones, the parameter estimates are normally distributed.
• With increasing N the CIs become smaller, but may still not cover the true value: a 95% CI stays a 95% CI no matter how small it is.
Confidence Interval and Region

Asymptotically p̂ ~ N(p, σ²(J'J)^{-1}). Estimate σ² independently by

s² = L(p̂)/(N − M)

⇒ An approximate α-CI for p̂_r:

p̂_r ± t_{N−M}^{α/2} s √[(Ĵ'Ĵ)^{-1}]_{rr}

⇒ An approximate α-confidence region:

{p : (p̂ − p)' Ĵ'Ĵ (p̂ − p) ≤ M s² F_{M,N−M}^{α}}

Drawback: the CI is symmetric and the CR is approximated by a quadratic. This might not be appropriate.
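A sketch of the approximate CI using SciPy; for unweighted errors the covariance returned by curve_fit corresponds to the s²(J'J)^{-1} above. Data and noise level are invented for illustration:

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def f(x, vmax, Km):
    # Michaelis-Menten model from the running example
    return vmax * x / (Km + x)

rng = np.random.default_rng(2)
x = np.linspace(0.5, 10, 20)
y = f(x, 2.0, 1.0) + rng.normal(scale=0.05, size=x.size)

p_hat, pcov = curve_fit(f, x, y, p0=[1.0, 0.5])  # pcov plays the role of s^2 (J'J)^{-1}
N, M = x.size, 2
t = stats.t.ppf(0.975, df=N - M)                 # two-sided 95% t quantile
half = t * np.sqrt(np.diag(pcov))                # t * s * sqrt([(J'J)^{-1}]_rr)
ci = np.column_stack([p_hat - half, p_hat + half])
print(ci)                                        # one [low, high] row per parameter
```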
Confidence Interval and Region

To estimate an α-CR, we can use the shape of L(p):

{p : (L(p) − L(p̂))/L(p̂) ≤ M/(N − M) · F_{M,N−M}^{α}}

Advantage: accounts for intrinsic curvature effects.
Drawback: we must find contours where L(p) has a certain value; this might not be feasible or computationally difficult.
Confidence Contours

Consider CIs and contours of L(p). N = 3. L(p) is not symmetric.

[Figure: Sum of Squared Residues, ChiSq surface over vmax and Km]
Example

[Figure: likelihood contours for N = 3]
Example

[Figure: likelihood contours for N = 5]
Example

[Figure: likelihood contours for N = 7]
Example

[Figure: likelihood contours for N = 11]
Example

[Figure: likelihood contours for N = 17]
Example

[Figure: likelihood contours for N = 21]
Example

[Figure: likelihood contours for N = 27]
Example

[Figure: likelihood contours for N = 39]
Fisher Information

The "true" CI is always larger than the approximated CI; true and approximate CIs are asymptotically equal.

Remember Gauss-Markov for linear regression: the LS estimator is best (minimum variance). In general, the Cramer-Rao bound holds:

Var(p̂) ≥ F(p̂)^{-1}

with F the Fisher Information Matrix; here F = Cov^{-1} = σ^{-2}(J'J).
Summary: Asymptotic Theory

• We can locally linearize nonlinear systems.
• We can apply linear theory with X = J.
• With normal errors, the LS estimator is the ML estimator and we can approximate CIs.
• The quality of the approximation depends on the curvature of the system.
• All of the above only applies when we have the correct model.
Optimal Experimental Design

We have the correct model and want to estimate the parameters.
Problem: we only have money for a few measurements.
Question: where do we most effectively measure?
Mathematically: where do I measure to minimize the confidence intervals, i.e. the covariance matrix s²(J'J)^{-1}?
Typical: D-optimal design, minimize det((J'J)^{-1}).
Optimal Experimental Design

Example: Michaelis-Menten kinetics. We have two measurements x1 and x2 between 1 and 10, with x1 < x2. Calculate

J(x, p̂) = ∂f(x, p)/∂p |_{p=p̂}

Minimize det((J'J)^{-1})(x) with respect to x = (x1, x2).
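A brute-force sketch of this minimization (a simple grid search, assuming the parameter values vmax = 2, Km = 1 from the earlier example):

```python
import numpy as np

def jac_row(x, vmax=2.0, Km=1.0):
    # One row of the Jacobian of the M-M model: df/dvmax and df/dKm at point x
    return np.array([x / (Km + x), -vmax * x / (Km + x) ** 2])

def det_crit(x1, x2):
    # det((J'J)^{-1}) = 1 / det(J'J) for the two-point design (x1, x2)
    J = np.vstack([jac_row(x1), jac_row(x2)])
    return 1.0 / np.linalg.det(J.T @ J)

grid = np.linspace(1, 10, 91)   # candidate measurement points, step 0.1
best = min((det_crit(a, b), a, b) for a in grid for b in grid if a < b)
print(best[1], best[2])         # the optimum lies at the ends of the interval
```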
Optimal Experimental Design

Best measurements are as far apart as possible.

[Figure: design criterion Det as a surface over the measurement points x1 and x2]
Optimal Experimental Design

[Figure: M-M kinetics with noisy measurements, estimated vs. true curves and SDs for two experimental designs]
Model Discrimination

[Figure: fits of M-M kinetics, substrate inhibition kinetics, and Hill kinetics, v against S]

      Np   LS fit
M-M   2    0.011
SI    3    0.001
HK    3    0.002
Model Discrimination

LS is distributed according to χ²(N − M).
⇒ We can calculate confidence limits, i.e. P-values: P(LS ≤ χ²(N − M)).
⇒ However, the complexity of the model is not accounted for.

[Figure: χ² densities with 1 and 2 degrees of freedom]

      Np   LS fit   Chi2
M-M   2    0.011    0.022
SI    3    0.001    0.001
HK    3    0.002    0.002
Model Discrimination

For nested models (M1 is a simplification of M2 with p1 < p2 parameters) we can apply the Likelihood Ratio Test.
H0: M1 is a simplification of M2
⇒ 2(L(M1) − L(M2)) ~ χ²(p2 − p1).
M-M is a simplification of both SI and HK.

      Np   LS fit   LR      Chi2
M-M   2    0.011
SI    3    0.001    0.020   0.001
HK    3    0.002    0.018   0.001
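Taking the table's numbers at face value (treating the tabulated LS values as the L entering the statistic 2(L(M1) − L(M2)), as the LR column suggests), the test can be sketched as:

```python
from scipy.stats import chi2

def lr_test(L_simple, L_complex, df):
    # Likelihood ratio statistic as on the slide: 2*(L(M1) - L(M2)),
    # compared against a chi-square with p2 - p1 degrees of freedom
    stat = 2.0 * (L_simple - L_complex)
    return stat, chi2.sf(stat, df)   # survival function gives the P-value

# M-M (2 parameters) against SI (3 parameters), values from the table
stat, p = lr_test(0.011, 0.001, df=3 - 2)
print(stat, p)   # 0.02 with a large P-value: the simpler M-M model suffices
```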
Model Discrimination

Other measures take both fit and complexity into account: AIC, MDL.

AIC = SSR + 2NM/(N − M − 1)

MDL = (N/2) ln(SSR·e/N) + ((p + 1)/2) ln N

      Np   LS fit   AIC        MDL
M-M   2    0.011    4.011178   0.704325
SI    3    0.001    6.001262   1.040983
HK    3    0.002    6.002177   1.041898
Take Home Messages

• Under certain regularity conditions the Least Square Estimator is a Maximum Likelihood Estimator, which is an entirely intuitive concept.
• Under certain regularity conditions we can obtain quality measures for fitted parameters, i.e. the Asymptotic Covariance Matrix (ASM).
• We can use the ASM to optimize experimental design.
• The theory also gives us methods to discriminate models (LRT).