TRANSCRIPT
5.7.2005 Tutorial: Modelling of Data by J. Schaber
Modelling of Data: Parameter Confidence Limits in Nonlinear Regression

Outline
• Maximum Likelihood
• Linear Regression Theory
• Nonlinear Regression
• Asymptotic Theory
• Distributions and Confidence Intervals of Estimated Parameters
• Examples
• Numerics
• Optimal Experimental Design
• Model Discrimination
Least Square Estimation: The Problem

We fit M adjustable parameters p_j, j = 1,…,M of a model f to N data points (x_i, y_i), i = 1,…,N, where f predicts y:

y = f(x, p_1,…,p_M)

General idea: Least Square Fit. Minimize over p_1,…,p_M:

L(y|p,f) = Σ_{i=1}^{N} [y_i − f(x_i, p_1,…,p_M)]²

Other ideas: absolute residuals (more robust), but LS has some nice properties, e.g. it is differentiable.
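The objective above can be written down directly. A minimal sketch, assuming the Michaelis-Menten model f(S) = vmax·S/(Km + S) that is used in the examples later in the tutorial:

```python
import numpy as np

def f(x, vmax, Km):
    # Michaelis-Menten model: the f(x, p1, ..., pM) of the slide, p = (vmax, Km)
    return vmax * x / (Km + x)

def L(p, x, y):
    # Least-squares objective: sum of squared residuals over the N data points
    vmax, Km = p
    return np.sum((y - f(x, vmax, Km)) ** 2)

x = np.array([1.0, 2.0, 5.0, 10.0])
y = f(x, 2.0, 1.0)           # noise-free data generated with vmax=2, Km=1
print(L([2.0, 1.0], x, y))   # 0.0 at the true parameters
```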
Intuitive Maximum Likelihood: Intuitive ideas

Question: "Given a model, what is the probability of the parameters being correct for certain data?", i.e. P(p|y,f).

Search for the model f where P(p|y,f) is maximal.

⇒ IMPOSSIBLE: "there is no statistical universe of models from which f can be drawn."
Intuitive Maximum Likelihood: Intuitive ideas

Easier: "Given a particular set of parameters, what is the probability of the data having occurred under a certain model?", i.e. P(y|p,f).

⇒ Assumption: there is only one model, the correct one.
Intuitive Maximum Likelihood: Intuitive ideas

If P(y|p,f) is small, p is intuitively "unlikely".

⇒ Identify P(y|p,f) with the likelihood of p given y.
⇒ Maximize the likelihood of p given y (for the one and only model); thus we estimate p by a Maximum Likelihood Estimator.
The Maximum Likelihood: How do we get an ML estimator?

Assumptions:
• Each data point y_i has a measurement error ε_i that is independent of the other ε_j.

⇒ P(y|p,f) = ∏_{i=1}^{N} P(y_i|p,f) ∝ ∏_{i=1}^{N} P(ε_i)

• ε is normally distributed with variance σ².

⇒ L(y|p,f) ∝ ∏_{i=1}^{N} exp(−(1/2) ((y_i − f(x_i,p))/σ)²)
Maximum Likelihood Estimator

Maximizing L(y|p,f) is the same as maximizing ln L(y|p,f):

ln L(y|p,f) ∝ ln ∏_{i=1}^{N} exp(−(1/2)((y_i − f(x_i,p))/σ)²) = −Σ_{i=1}^{N} (1/2) ((y_i − f(x_i,p))/σ)²

⇒ With independent normal errors and the correct model, minimizing the Sum of Squared Residuals gives a Maximum Likelihood estimator of p.
Linear Regression Theory

If each X_i is a normal random variable with unit variance and mean zero, then Σ_{i=1}^{N} X_i² has a chi-square distribution with N degrees of freedom.

⇒ L(y|p,f)/σ² = Σ_{i=1}^{N} ((y_i − f(x_i,p))/σ)²

is chi-square distributed with N − M degrees of freedom. (Because p is estimated, not all terms are statistically independent.)
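A quick simulation illustrates the claim that L/σ² follows χ²(N − M) for a fitted linear model. This is a sketch with an arbitrary design matrix, noise level, and seed, none of which come from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 3                         # data points and parameters
sigma = 0.5
X = rng.normal(size=(N, M))          # arbitrary design matrix
p_true = np.array([1.0, -2.0, 0.5])

scaled_ssr = []
for _ in range(2000):                # repeat the "experiment" many times
    y = X @ p_true + rng.normal(scale=sigma, size=N)
    p_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ p_hat) ** 2)
    scaled_ssr.append(ssr / sigma**2)

# A chi-square(N - M) variable has mean N - M = 47
print(np.mean(scaled_ssr))
```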
Linear Regression Theory

The multivariate linear model

y_i = p_0 + p_1 x_{1i} + p_2 x_{2i} + … + p_M x_{Mi} + ε_i

can be written as y = Xp + ε, with cov(ε) = σ²I. Then

L(y|p̂) = Σ_{i=1}^{N} (y_i − x_i'p̂)² = (y − Xp̂)'(y − Xp̂)
Linear Regression Theory

L(y|p̂_X) = (y − Xp̂)'(y − Xp̂) = y'y − 2p̂'X'y + p̂'X'Xp̂ = ε̂'ε̂

This becomes minimal in p̂ when ∂(ε̂'ε̂)/∂p̂ = 0, the "normal equation":

∂(ε̂'ε̂)/∂p̂ = −2X'y + 2X'Xp̂ = 0 ⇔ p̂ = (X'X)^{-1} X'y
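A sketch of the normal-equation solution in NumPy (the design matrix and noise level are made up for illustration). In practice np.linalg.lstsq or a QR factorization is preferred over forming X'X explicitly, for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = rng.uniform(0, 10, size=N)
X = np.column_stack([np.ones(N), x, x**2])   # design matrix: intercept, x, x^2
p_true = np.array([0.5, 2.0, -0.1])
y = X @ p_true + rng.normal(scale=0.1, size=N)

# Normal equation p_hat = (X'X)^{-1} X'y, solved as a linear system
p_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(p_hat)  # close to p_true
```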
Linear Regression Theory: Some properties of p̂ = (X'X)^{-1}X'y

If E(y) = Xp (i.e. the model is correct) and cov(y) = σ²I (i.e. the errors are independent), then p̂ is nicely behaved:

E(p̂) = p, i.e. p̂ is unbiased, and cov(p̂) = σ²(X'X)^{-1}, so its quality can be estimated.

Note that both the error structure and the model structure contribute to the covariance.
Linear Regression Theory: The Gauss-Markov Theorem

If E(y) = Xp and cov(y) = σ²I, and p̂ is a LS estimator, then p̂ has minimum variance. This holds for any distribution of y; normality is not required.

σ² can be estimated by L(y|p̂_X)/(N − M − 1).
Linear Regression Theory

If y is normally distributed N(Xp, σ²I), then:
• p̂ is N(p, σ²(X'X)^{-1})
• L(y|p_X)/σ² is χ²(N − M)
• p̂ and s² = L(p̂)/(N − M − 1) are independent.

Thus confidence intervals for p and all kinds of hypothesis tests for the model, data and parameters can be developed.
Nonlinear Regression Theory

Now consider N observations (y_i, x_i) with known functional relationship f, thus

y_i = f(x_i, p*) + ε_i,  i = 1, 2,…, N

We assume that E(ε_i) = 0. The least-square estimator p̂ of p* minimizes

L(y|p,f) = L(p) = Σ_{i=1}^{N} (y_i − f(x_i, p))²
Nonlinear Regression Theory

The LS estimator p̂ therefore satisfies

∂L(p)/∂p |_{p=p̂} = 0

When f is differentiable, f can be approximated (Taylor) in a small neighbourhood of p* by

f_i(p) ≈ f_i(p*) + Σ_{r=1}^{M} ∂f_i(p)/∂p_r |_{p*} (p_r − p*_r)

⇔ f(p) ≈ f(p*) + J(p*)(p − p*)

with J(p) the Jacobian.
Nonlinear Regression Theory

Hence,

L(p) = |y − f(p)|² ≈ |y − f(p*) − J(p*)(p − p*)|² = |z − Xb|²

where z = y − f(p*) = ε, X = J(p*) and b = p − p*.

From linear theory, b̂ = (X'X)^{-1}X'z. When N is large, p̂ is almost certainly in a small neighbourhood of p*. Therefore

p̂ − p* ≈ b̂ and p̂ − p* ≈ (J'J)^{-1}J'ε
Nonlinear Regression Theory

The Jacobian plays the same role as X in linear regression. If ε is N(0, σ²I) then, asymptotically:
• p̂ is N(p, σ²(J'J)^{-1})
• L(y|p,f)/σ² is χ²(N − M)
• p̂ − p* and σ̂² ≈ L(p̂)/(N − M) are independent

and

(p̂ − p*)' Ĵ'Ĵ (p̂ − p*) ∝ M s² F_{M,N−M}

(L(p*) − L(p̂))/L(p̂) ∝ M/(N − M) · F_{M,N−M}
Nonlinear Regression Theory: Summarizing so far

Assumptions:
• We know the correct model f(x,p)
• ε is N(0, σ²I)

⇒ Parameters p can be estimated and confidence intervals can be calculated.
Numerical Approximations

Until now we have not said how the parameters are estimated. There is no general solution; the method depends on the problem. However, the theory suggests two methods.

1. Suppose p_a is an approximation of p̂. Then

f(p) ≈ f(p_a) + J(p_a)(p − p_a)

and

r(p) = y − f(p) ≈ r(p_a) − J(p_a)(p − p_a)

Substituting into L(p) = r'(p) r(p) leads to:
Numerical Approximations

L(p) ≈ r'(p_a)r(p_a) − 2 r'(p_a)J(p_a)(p − p_a) + (p − p_a)' J'(p_a)J(p_a) (p − p_a)

This is minimized with respect to p when

p − p_a = (J'(p_a)J(p_a))^{-1} J'(p_a) r(p_a) =: δ_a

⇒ Given a current approximation p_a, the next approximation should be

p_{a+1} = p_a + δ_a

This is called the Gauss-Newton Method.
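The update rule can be sketched for the Michaelis-Menten model of the examples. This is a bare-bones version with a fixed iteration count and no step-size control, which production code (e.g. scipy.optimize.least_squares) would add:

```python
import numpy as np

def f(x, p):
    vmax, Km = p
    return vmax * x / (Km + x)

def jac(x, p):
    # Jacobian J(p): one row per data point, columns df/dvmax and df/dKm
    vmax, Km = p
    return np.column_stack([x / (Km + x), -vmax * x / (Km + x) ** 2])

def gauss_newton(x, y, p0, n_iter=20):
    p = np.array(p0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, p)                            # residuals r(p_a)
        J = jac(x, p)
        delta = np.linalg.solve(J.T @ J, J.T @ r)  # delta_a = (J'J)^{-1} J' r(p_a)
        p = p + delta                              # p_{a+1} = p_a + delta_a
    return p

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = f(x, np.array([2.0, 1.0]))        # noise-free data with vmax=2, Km=1
p_fit = gauss_newton(x, y, [1.0, 0.5])
print(p_fit)                          # converges to vmax=2, Km=1
```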
Numerical Approximations

2. We can also expand L(p) directly:

L(p) ≈ L(p_a) + ∂L(p)/∂p |_{p_a} (p − p_a) + (1/2)(p − p_a)' ∂²L(p)/∂p∂p' |_{p_a} (p − p_a)

⇒ Given a current approximation p_a, the next approximation should be

p_{a+1} = p_a − H(p_a)^{-1} ∂L(p)/∂p |_{p_a}

This is called the Newton Method. The Gauss-Newton method above differs from it in that the Hessian H(p_a) is approximated by 2J'J.
Example

Assume we have a Michaelis-Menten kinetic and want to estimate vmax and Km from noisy measurements.

[Figure: M-M kinetics with noisy measurements, estimated vs. true curve, v against S]
Example

L(vmax, Km) is well-behaved in this case.

[Figure: Sum of Squared Residues, ChiSq surface over vmax and Km]
Example

95% confidence intervals: for a large number of repetitions of the same experiment, the true value lies within the CI in 95% of the cases.

N = 3, Out = 4

[Figure: confidence intervals for vmax over 100 experiments]
Example

N = 10, Out = 7

[Figure: confidence intervals for vmax over 100 experiments]
Example

Testing for normality, 100 repetitions. K-S test: normally distributed (P < 0.05).

[Figure: distribution of the vmax estimates with fitted normal density and cumulative curve]
Example

Increasing N.

[Figure: confidence intervals for vmax as a function of N]
Example

Testing for least squares, 100 repetitions. K-S test: chi-square distributed (P < 0.05).

[Figure: distribution of the least-square values with fitted chi-square density and cumulative curve]
• With small N we can get large confidence intervals, but the variance of the CIs is also increased.
• Larger N gives us smaller and more uniform CIs.
• In the case of a known model and parameter estimates close to the true ones, the parameter estimates are normally distributed.
• With increasing N the CIs become smaller, but may still not cover the true value: a 95% CI stays a 95% CI no matter how small it is.
Confidence Interval and Region

Asymptotically p̂ ~ N(p, σ²(J'J)^{-1}). Estimate σ² independently by

s² = L(p̂)/(N − M)

⇒ An approximate α-CI for p̂_r:

p̂_r ± t_{N−M}^{α/2} s √[(Ĵ'Ĵ)^{-1}]_{rr}

⇒ An approximate α-confidence region:

{p : (p̂ − p)' Ĵ'Ĵ (p̂ − p) ≤ M s² F_{M,N−M}^{α}}

Drawback: the CI is symmetric and the CR is approximated by a quadratic. This might not be appropriate.
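A sketch of the approximate CI using SciPy; for unweighted errors the covariance returned by curve_fit corresponds to the s²(J'J)^{-1} above. Data and noise level are invented for illustration:

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def f(x, vmax, Km):
    # Michaelis-Menten model from the running example
    return vmax * x / (Km + x)

rng = np.random.default_rng(2)
x = np.linspace(0.5, 10, 20)
y = f(x, 2.0, 1.0) + rng.normal(scale=0.05, size=x.size)

p_hat, pcov = curve_fit(f, x, y, p0=[1.0, 0.5])  # pcov plays the role of s^2 (J'J)^{-1}
N, M = x.size, 2
t = stats.t.ppf(0.975, df=N - M)                 # two-sided 95% t quantile
half = t * np.sqrt(np.diag(pcov))                # t * s * sqrt([(J'J)^{-1}]_rr)
ci = np.column_stack([p_hat - half, p_hat + half])
print(ci)                                        # one [low, high] row per parameter
```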
Confidence Interval and Region

To estimate an α-CR, we can use the shape of L(p):

{p : (L(p) − L(p̂))/L(p̂) ≤ M/(N − M) · F_{M,N−M}^{α}}

Advantage: accounts for intrinsic curvature effects.
Drawback: we must find contours where L(p) has a certain value; this might not be feasible or computationally difficult.
Confidence Contours

Consider CIs and contours of L(p). N = 3. L(p) is not symmetric.

[Figure: Sum of Squared Residues, ChiSq surface over vmax and Km]
Example

[Figure: likelihood contours for N = 3]
Example

[Figure: likelihood contours for N = 5]
Example

[Figure: likelihood contours for N = 7]
Example

[Figure: likelihood contours for N = 11]
Example

[Figure: likelihood contours for N = 17]
Example

[Figure: likelihood contours for N = 21]
Example

[Figure: likelihood contours for N = 27]
Example

[Figure: likelihood contours for N = 39]
Fisher Information

The "true" CI is always larger than the approximated CI; true and approximate CIs are asymptotically equal.

Remember Gauss-Markov for linear regression: the LS estimator is best (minimum variance). In general, the Cramer-Rao bound holds:

Var(p̂) ≥ F(p̂)^{-1}

with F the Fisher Information Matrix; here F = Cov^{-1} = σ^{-2}(J'J).
Summary: Asymptotic Theory

• We can locally linearize nonlinear systems.
• We can apply linear theory with X = J.
• With normal errors, the LS estimator is the ML estimator and we can approximate CIs.
• The quality of the approximation depends on the curvature of the system.
• All of the above only applies when we have the correct model.
Optimal Experimental Design

We have the correct model and want to estimate the parameters.
Problem: we only have money for a few measurements.
Question: where do we most effectively measure?
Mathematically: where do I measure to minimize the confidence intervals, i.e. the covariance matrix s²(J'J)^{-1}?
Typical: D-optimal design, minimize det((J'J)^{-1}).
Optimal Experimental Design

Example: Michaelis-Menten kinetics. We have two measurements x1 and x2 between 1 and 10, with x1 < x2. Calculate

J(x, p̂) = ∂f(x, p)/∂p |_{p=p̂}

Minimize det((J'J)^{-1})(x) with respect to x = (x1, x2).
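A brute-force sketch of this minimization (a simple grid search, assuming the parameter values vmax = 2, Km = 1 from the earlier example):

```python
import numpy as np

def jac_row(x, vmax=2.0, Km=1.0):
    # One row of the Jacobian of the M-M model: df/dvmax and df/dKm at point x
    return np.array([x / (Km + x), -vmax * x / (Km + x) ** 2])

def det_crit(x1, x2):
    # det((J'J)^{-1}) = 1 / det(J'J) for the two-point design (x1, x2)
    J = np.vstack([jac_row(x1), jac_row(x2)])
    return 1.0 / np.linalg.det(J.T @ J)

grid = np.linspace(1, 10, 91)   # candidate measurement points, step 0.1
best = min((det_crit(a, b), a, b) for a in grid for b in grid if a < b)
print(best[1], best[2])         # the optimum lies at the ends of the interval
```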
Optimal Experimental Design

Best measurements are as far apart as possible.

[Figure: design criterion Det as a surface over the measurement points x1 and x2]
Optimal Experimental Design

[Figure: M-M kinetics with noisy measurements, estimated vs. true curves and SDs for two experimental designs]
Model Discrimination

[Figure: fits of M-M kinetics, substrate inhibition kinetics, and Hill kinetics, v against S]

      Np   LS fit
M-M   2    0.011
SI    3    0.001
HK    3    0.002
Model Discrimination

LS is distributed according to χ²(N − M).
⇒ We can calculate confidence limits, i.e. P-values: P(LS ≤ χ²(N − M)).
⇒ However, the complexity of the model is not accounted for.

[Figure: χ² densities with 1 and 2 degrees of freedom]

      Np   LS fit   Chi2
M-M   2    0.011    0.022
SI    3    0.001    0.001
HK    3    0.002    0.002
Model Discrimination

For nested models (M1 is a simplification of M2 with p1 < p2 parameters) we can apply the Likelihood Ratio Test.
H0: M1 is a simplification of M2
⇒ 2(L(M1) − L(M2)) ~ χ²(p2 − p1).
M-M is a simplification of both SI and HK.

      Np   LS fit   LR      Chi2
M-M   2    0.011
SI    3    0.001    0.020   0.001
HK    3    0.002    0.018   0.001
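Taking the table's numbers at face value (treating the tabulated LS values as the L entering the statistic 2(L(M1) − L(M2)), as the LR column suggests), the test can be sketched as:

```python
from scipy.stats import chi2

def lr_test(L_simple, L_complex, df):
    # Likelihood ratio statistic as on the slide: 2*(L(M1) - L(M2)),
    # compared against a chi-square with p2 - p1 degrees of freedom
    stat = 2.0 * (L_simple - L_complex)
    return stat, chi2.sf(stat, df)   # survival function gives the P-value

# M-M (2 parameters) against SI (3 parameters), values from the table
stat, p = lr_test(0.011, 0.001, df=3 - 2)
print(stat, p)   # 0.02 with a large P-value: the simpler M-M model suffices
```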
Model Discrimination

Other measures take both fit and complexity into account: AIC, MDL.

AIC = SSR + 2NM/(N − M − 1)

MDL = (N/2) ln(SSR·e/N) + ((p + 1)/2) ln N

      Np   LS fit   AIC        MDL
M-M   2    0.011    4.011178   0.704325
SI    3    0.001    6.001262   1.040983
HK    3    0.002    6.002177   1.041898
Take Home Messages

• Under certain regularity conditions the Least Square Estimator is a Maximum Likelihood Estimator, which is an entirely intuitive concept.
• Under certain regularity conditions we can obtain quality measures for fitted parameters, i.e. the Asymptotic Covariance Matrix (ASM).
• We can use the ASM to optimize experimental design.
• The theory also gives us methods to discriminate models (LRT).