Modelling of Data. Tutorial by J. Schaber, Universitätsklinikum Magdeburg, 5.7.2005


Modelling of Data: Parameter Confidence Limits in Nonlinear Regression

Outline
• Maximum Likelihood
• Linear Regression Theory
• Nonlinear Regression
  • Asymptotic Theory
  • Distributions and Confidence Intervals of Estimated Parameters
• Examples
• Numerics
• Optimal Experimental Design
• Model Discrimination

Least Square Estimation: The Problem

We fit M adjustable parameters p_j, j = 1, …, M of a model f to N data points (x_i, y_i), i = 1, …, N, where f predicts y:

$$y = f(x, p_1, \ldots, p_M).$$

General idea: least square fit. Minimize over p_1, …, p_M:

$$L(f, p) = \sum_{i=1}^{N} \left[ y_i - f(x_i, p_1, \ldots, p_M) \right]^2$$

Other ideas: absolute residuals (more robust), but LS has some nice properties, e.g. it is differentiable.
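A minimal sketch of such a least square fit in Python, using scipy.optimize.least_squares; the model f and the data are made-up placeholders:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical model f(x, p) with M = 2 parameters (placeholder choice).
def f(x, p):
    return p[0] * np.exp(-p[1] * x)

# Made-up data (x_i, y_i), i = 1..N.
x = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = f(x, [2.0, 0.7]) + rng.normal(scale=0.05, size=x.size)

# Residuals r_i(p) = y_i - f(x_i, p); least_squares minimizes sum r_i^2.
def residuals(p):
    return y - f(x, p)

fit = least_squares(residuals, x0=[1.0, 1.0])
print("estimated p:", fit.x)
print("L(f, p) =", np.sum(fit.fun ** 2))  # sum of squared residuals
```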

Intuitive Maximum Likelihood: Intuitive Ideas

Question: "Given a model, what is the probability that the parameters are correct for certain data?", i.e. P(p | f, y).

Search for the model f where P(p | f, y) is maximal.

⇒ IMPOSSIBLE, "there is no statistical universe of models from which f can be drawn."

Intuitive Maximum Likelihood: Intuitive Ideas

Easier: "Given a particular set of parameters, what is the probability that the data occurred under a certain model?", i.e. P(y | p, f).

⇒ Assumption: There is only one model, the correct one.

Intuitive Maximum Likelihood: Intuitive Ideas

If P(y | p, f) is small, p is intuitively "unlikely".

⇒ Identify P(y | p, f) with the likelihood of p given y.

⇒ Maximize the likelihood of p given y (for the one and only model); thus, we estimate p by a Maximum Likelihood Estimator.

The Maximum Likelihood: How do we get an ML estimator?

Assumptions:

Each data point y_i has a measurement error ε_i that is independent of the other ε_j:

$$P(y \mid p, f) = \prod_{i=1}^{N} P(y_i \mid p, f) \propto \prod_{i=1}^{N} P(\varepsilon_i)$$

ε is normally distributed with variance σ²:

$$P(y \mid p, f) \propto \prod_{i=1}^{N} e^{-\frac{1}{2}\left(\frac{y_i - f(x_i, p)}{\sigma}\right)^2} \propto L(y \mid p, f)$$

Maximum Likelihood Estimator

Maximizing L(y | p, f) = maximizing ln(L(y | p, f)):

$$\ln(L(y \mid p, f)) \propto \ln\left( \prod_{i=1}^{N} e^{-\frac{1}{2}\left(\frac{y_i - f(x_i, p)}{\sigma}\right)^2} \right) = -\frac{1}{2} \sum_{i=1}^{N} \left( \frac{y_i - f(x_i, p)}{\sigma} \right)^2$$

⇒ With independent normal errors and the correct model, minimizing the Sum of Squared Residuals yields a Maximum Likelihood estimator of p.

Linear Regression Theory

If each X_i is a normal random variable with unit variance and mean zero, then $\sum_{i=1}^{n} X_i^2$ has a chi-square distribution with n degrees of freedom.

⇒

$$\frac{L(y \mid p, f)}{\sigma^2} = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i, p)}{\sigma} \right)^2$$

is chi-square distributed with N − M degrees of freedom. (Because p is estimated, not all terms are statistically independent.)

Linear Regression Theory

The multivariate linear model

$$y_i = p_0 + p_1 x_{1i} + p_2 x_{2i} + \ldots + p_M x_{Mi} + \varepsilon_i$$

can be written as $y = Xp + \varepsilon$ with $\operatorname{cov}(\varepsilon) = \sigma^2 I$.

Then

$$L(y \mid \hat p) = \sum_{i=1}^{N} (y_i - x_i' \hat p)^2 = (y - X\hat p)'(y - X\hat p)$$

Linear Regression Theory

$$L(y \mid \hat p, X) = (y - X\hat p)'(y - X\hat p) = y'y - 2\hat p'X'y + \hat p'X'X\hat p = \hat\varepsilon'\hat\varepsilon$$

This becomes minimal in $\hat p$ when

$$\frac{\partial \hat\varepsilon'\hat\varepsilon}{\partial \hat p} = 0:$$

$$\frac{\partial \hat\varepsilon'\hat\varepsilon}{\partial \hat p} = -2X'y + 2X'X\hat p = 0 \quad\Leftrightarrow\quad \hat p = (X'X)^{-1}X'y$$

The "normal equation".
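A quick numeric check of the normal equation (made-up design matrix and data); the closed form (X'X)⁻¹X'y agrees with the standard solver:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 3
# Design matrix with intercept column plus M regressors.
X = np.column_stack([np.ones(N)] + [rng.normal(size=N) for _ in range(M)])
p_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ p_true + rng.normal(scale=0.1, size=N)

# Normal equation: p_hat = (X'X)^{-1} X'y
p_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same result from the numerically preferred solver.
p_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(p_hat, p_lstsq))  # True
```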

Linear Regression Theory

Some properties of $\hat p = (X'X)^{-1}X'y$:

If $E(y) = Xp$, i.e. the model is correct, and $\operatorname{cov}(y) = \sigma^2 I$, i.e. the errors are independent, then $\hat p$ is nicely behaved:

$$E(\hat p) = p,$$ i.e. $\hat p$ is unbiased, and

$$\operatorname{cov}(\hat p) = \sigma^2 (X'X)^{-1},$$ i.e. the quality can be estimated.

Note that both the error structure and the model structure contribute to the covariance.

Linear Regression Theory

The Gauss-Markov Theorem: If $E(y) = Xp$ and $\operatorname{cov}(y) = \sigma^2 I$, and $\hat p$ is an LS estimator, then $\hat p$ has minimum variance.

This holds for any distribution of y. Normality is not required.

σ² can be estimated by $L(y \mid \hat p, X)/(N - M - 1)$.

Linear Regression Theory

If y is normally distributed N(Xp, σ²I), then:

• $\hat p$ is $N(p, \sigma^2 (X'X)^{-1})$,
• $L(y \mid \hat p, X)/\sigma^2$ is $\chi^2(N - M)$, and
• $\hat p$ and $s^2 = L(\hat p)/(N - M - 1)$ are independent.

Thus, confidence intervals for p and all kinds of hypothesis tests for the model, data and parameters can be developed.
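A sketch of these quantities for a made-up linear model: s², cov(p̂) = s²(X'X)⁻¹ and component-wise t-based confidence intervals (here with N minus the number of fitted coefficients as degrees of freedom):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
N = 50
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
p_true = np.array([1.0, 0.5])
y = X @ p_true + rng.normal(scale=0.3, size=N)

p_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ p_hat
dof = N - X.shape[1]                  # N minus number of fitted parameters
s2 = resid @ resid / dof              # estimate of sigma^2
cov = s2 * np.linalg.inv(X.T @ X)     # cov(p_hat) = s^2 (X'X)^{-1}

# Component-wise 95% confidence intervals.
q = t.ppf(0.975, dof)
for j, (pj, se) in enumerate(zip(p_hat, np.sqrt(np.diag(cov)))):
    print(f"p[{j}] = {pj:.3f} +/- {q * se:.3f}")
```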

Nonlinear Regression Theory

Now consider N observations (y_i, x_i) with known functional relationship f, thus

$$y_i = f(x_i, p^*) + \varepsilon_i, \qquad i = 1, 2, \ldots, N.$$

We assume that E(ε_i) = 0. The least square estimator of p* minimizes

$$L(y \mid p, f) = L(p) = \sum_{i=1}^{N} \left( y_i - f(x_i, p) \right)^2$$

Nonlinear Regression Theory

The LS estimator therefore satisfies

$$\left. \frac{\partial L(p)}{\partial p} \right|_{\hat p} = 0$$

When f is differentiable, in a small neighbourhood of p*, f can be approximated (Taylor) by

$$f_i(p) \approx f_i(p^*) + \sum_{r=1}^{M} \left. \frac{\partial f_i(p)}{\partial p_r} \right|_{p^*} (p_r - p_r^*) \quad\Leftrightarrow\quad f(p) \approx f(p^*) + J(p^*)(p - p^*),$$

with J(p) the Jacobian.

Nonlinear Regression Theory

Hence,

$$L(p) = \lVert y - f(p) \rVert^2 \approx \lVert y - f(p^*) - J(p^*)(p - p^*) \rVert^2 = \lVert z - Xb \rVert^2$$

where $z = y - f(p^*) = \varepsilon$, $X = J(p^*)$ and $b = p - p^*$.

From linear theory, $\hat b = (X'X)^{-1}X'z$. When N is large, $\hat p$ is almost certainly in a small neighbourhood of p*. Therefore $\hat p - p^* \approx \hat b$ and

$$\hat p - p^* \approx (J'J)^{-1} J' \varepsilon$$

Nonlinear Regression Theory

The Jacobian plays the same role as X in linear regression.

If ε is N(0, σ²I) then, asymptotically,

• $\hat p$ is $N(p, \sigma^2 (J'J)^{-1})$,
• $L(y \mid p, f)/\sigma^2$ is $\chi^2(N - M)$,
• $\hat p - p^*$ and $\sigma^2$ are independent, with $\sigma^2 \approx L(\hat p)/(N - M)$, and

$$(\hat p - p^*)' \hat J'\hat J (\hat p - p^*) \propto M s^2 F_{M, N-M}, \qquad \frac{L(p^*) - L(\hat p)}{L(\hat p)} \propto \frac{M}{N - M} F_{M, N-M}$$

Nonlinear Regression Theory: Summarizing so far

Assumptions:
• We know the correct model f(x, p)
• ε is N(0, σ²I)

=> Parameters p can be estimated and confidence intervals can be calculated.

Numerical Approximations

Until now we have not said how the parameters are estimated. There is no general solution; the method depends on the problem. However, the theory suggests two methods:

1. Suppose p_a is an approximation of $\hat p$. Then

$$f(p) \approx f(p_a) + J(p_a)(p - p_a)$$

and

$$r(p) = y - f(p) \approx r(p_a) - J(p_a)(p - p_a).$$

Substituting in $L(p) = r'(p)\, r(p)$ leads to

Numerical Approximations

$$L(p) \approx r'(p_a)\, r(p_a) - 2 r'(p_a) J(p_a)(p - p_a) + (p - p_a)' J'J (p - p_a)$$

This is minimized with respect to p when

$$p - p_a = (J'J)^{-1} J' r(p_a) =: \delta_a$$

⇒ Given a current approximation p_a, the next approximation should be

$$p_{a+1} = p_a + \delta_a$$

This is called the Gauss-Newton method.
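A minimal sketch of the Gauss-Newton iteration with a finite-difference Jacobian (a fixed number of iterations and no damping; a robust implementation would add step control, e.g. Levenberg-Marquardt):

```python
import numpy as np

def gauss_newton(f, x, y, p0, n_iter=20, h=1e-6):
    """Minimize L(p) = sum (y - f(x, p))^2 by Gauss-Newton steps."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, p)                    # residuals r(p_a)
        J = np.empty((x.size, p.size))     # finite-difference Jacobian of f
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = h
            J[:, j] = (f(x, p + dp) - f(x, p)) / h
        delta = np.linalg.solve(J.T @ J, J.T @ r)  # delta_a = (J'J)^{-1} J' r
        p = p + delta                      # p_{a+1} = p_a + delta_a
    return p

# Made-up example: Michaelis-Menten v = vmax*S/(Km+S).
mm = lambda S, p: p[0] * S / (p[1] + S)
rng = np.random.default_rng(3)
S = np.linspace(0.5, 10.0, 15)
v = mm(S, [2.0, 1.0]) + rng.normal(scale=0.05, size=S.size)
print(gauss_newton(mm, S, v, p0=[1.0, 2.0]))
```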

Numerical Approximations

We can also expand L(p) directly:

$$L(p) \approx L(p_a) + \left. \frac{\partial L(p)}{\partial p} \right|_{p_a} (p - p_a) + \frac{1}{2} (p - p_a)' \left. \frac{\partial^2 L(p)}{\partial p\, \partial p'} \right|_{p_a} (p - p_a)$$

This differs from the above in that there the Hessian H(p_a) is approximated by 2J'J.

⇒ Given a current approximation p_a, the next approximation should be

$$p_{a+1} = p_a - H(p_a)^{-1} \left. \frac{\partial L(p)}{\partial p} \right|_{p_a}$$

This is called the Newton method.

Example

Assume we have a Michaelis-Menten kinetics, $v = \frac{v_{\max} S}{K_m + S}$, and want to estimate v_max and K_m from noisy measurements.

[Figure: M-M kinetics with noisy measurements; estimated vs. true curve.]
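A sketch of such an experiment with invented true values (v_max = 2, K_m = 1) and noise level:

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(S, vmax, Km):
    return vmax * S / (Km + S)

rng = np.random.default_rng(4)
S = np.linspace(1, 10, 10)                 # substrate concentrations
v_true = michaelis_menten(S, 2.0, 1.0)     # assumed true vmax = 2, Km = 1
v_obs = v_true + rng.normal(scale=0.1, size=S.size)

p_hat, p_cov = curve_fit(michaelis_menten, S, v_obs, p0=[1.0, 1.0])
print("vmax, Km estimates:", p_hat)
```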

Example

L(v_max, K_m) is well-behaved in this case.

[Figure: surface of the sum of squared residuals (ChiSq) over (v_max, K_m).]

Example

95% confidence intervals: for a large number of repetitions of the same experiment, the true value lies within the CI in 95% of the cases.

N = 3, Out = 4

[Figure: 95% confidence intervals for v_max over 100 repeated experiments.]

Example

N = 10, Out = 7

[Figure: 95% confidence intervals for v_max over 100 repeated experiments.]

Example

Testing for normality, 100 repetitions. K-S test: normally distributed (P > 0.05).

[Figure: histogram and empirical CDF of the v_max estimates.]
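A sketch of this normality check: fit 100 simulated data sets and apply the K-S test to the standardized v_max estimates (all settings invented):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import kstest

def mm(S, vmax, Km):
    return vmax * S / (Km + S)

rng = np.random.default_rng(5)
S = np.linspace(1, 10, 10)
vmax_hats = []
for _ in range(100):                   # 100 repetitions of the experiment
    v = mm(S, 2.0, 1.0) + rng.normal(scale=0.1, size=S.size)
    popt, _ = curve_fit(mm, S, v, p0=[1.0, 1.0])
    vmax_hats.append(popt[0])

z = (np.array(vmax_hats) - np.mean(vmax_hats)) / np.std(vmax_hats)
print(kstest(z, "norm"))               # large p-value: normality not rejected
```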

Example

Increasing N.

[Figure: v_max confidence intervals as N increases from 10 to 50.]

Example

Testing for least squares, 100 repetitions. K-S test: chi-square distributed (P > 0.05).

[Figure: histogram and empirical CDF of the least square values.]

Example

• With small N we can get large confidence intervals, but the variance of the CI is also increased.
• Larger N gives us smaller and more uniform CIs.
• In the case of a known model and parameter estimates close to the true values, the parameter estimates are normally distributed.
• With increasing N the CIs become smaller, but may still not cover the true value: a 95% CI stays a 95% CI no matter how small it is.

Confidence Interval and Region

Asymptotically $\hat p \sim N(p, \sigma^2 (J'J)^{-1})$. Estimate σ² independently by

$$s^2 = L(\hat p)/(N - M)$$

=> An approximate α-CI for $\hat p_r$:

$$\hat p_r \pm t^{\alpha/2}_{N-M}\, s\, \sqrt{\left[ (\hat J'\hat J)^{-1} \right]_{rr}}$$

=> An approximate α-confidence region:

$$\left\{ p : (p - \hat p)' \hat J'\hat J (p - \hat p) \leq M s^2 F^{\alpha}_{M, N-M} \right\}$$

Drawback: the CI is symmetric and the CR is approximated by a quadratic. This might not be appropriate.
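A sketch of these approximate CIs for the M-M fit, with s² from the residuals and the Jacobian returned by the optimizer (data invented):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import t

def mm(S, p):
    return p[0] * S / (p[1] + S)

rng = np.random.default_rng(6)
S = np.linspace(1, 10, 10)
v = mm(S, [2.0, 1.0]) + rng.normal(scale=0.1, size=S.size)

fit = least_squares(lambda p: v - mm(S, p), x0=[1.0, 1.0])
N, M = S.size, fit.x.size
s2 = np.sum(fit.fun ** 2) / (N - M)    # s^2 = L(p_hat)/(N - M)
J = fit.jac                            # Jacobian of the residuals at p_hat
cov = s2 * np.linalg.inv(J.T @ J)      # asymptotic covariance s^2 (J'J)^{-1}
q = t.ppf(0.975, N - M)
for name, pr, se in zip(["vmax", "Km"], fit.x, np.sqrt(np.diag(cov))):
    print(f"{name}: {pr:.3f} +/- {q * se:.3f}")
```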

Confidence Interval and Region

To estimate an α-CR, we can use the shape of L(p):

$$\left\{ p : \frac{L(p) - L(\hat p)}{L(\hat p)} \leq \frac{M}{N - M} F^{\alpha}_{M, N-M} \right\}$$

Advantage: accounts for intrinsic curvature effects.
Drawback: we must find contours where L(p) has a certain value; this might not be feasible or may be computationally difficult.
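A brute-force sketch of this likelihood-based region: evaluate L(p) on a grid and keep the points that satisfy the F-based inequality (grid limits and data invented):

```python
import numpy as np
from scipy.stats import f as f_dist

mm = lambda S, p: p[0] * S / (p[1] + S)
rng = np.random.default_rng(7)
S = np.linspace(1.0, 10.0, 10)
v = mm(S, [2.0, 1.0]) + rng.normal(scale=0.1, size=S.size)

def L(p):
    r = v - mm(S, p)
    return r @ r

# Evaluate L on a (vmax, Km) grid; use the grid minimum as L(p_hat).
vmax_grid = np.linspace(0.5, 4.0, 200)
Km_grid = np.linspace(0.1, 4.0, 200)
Ls = np.array([[L((a, b)) for b in Km_grid] for a in vmax_grid])
L_hat = Ls.min()

N, M, alpha = S.size, 2, 0.05
bound = M / (N - M) * f_dist.ppf(1 - alpha, M, N - M)
in_region = (Ls - L_hat) / L_hat <= bound   # mask of the approximate 95% CR
print("grid points inside the confidence region:", int(in_region.sum()))
```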

Confidence Contours

Consider CIs and contours of L(p). N = 3. L(p) is not symmetric.

[Figure: surface of the sum of squared residuals (ChiSq) over (v_max, K_m).]

Example

[Figures: likelihood contours for N = 3, 5, 7, 11, 17, 21, 27 and 39; the contours tighten around the estimate as N increases.]

Fisher Information

The "true" CI is always larger than the approximated CI. True and approximate CIs are asymptotically equal.

Remember Gauss-Markov for linear regression: the LS estimator is best (minimum variance). In general, the Cramér-Rao bound:

$$\operatorname{Var}(\hat p) \geq F^{-1}(\hat p)$$

with F the Fisher information matrix, here $\operatorname{Cov}^{-1} = \sigma^{-2}(J'J)$.

Summary: Asymptotic Theory

• We can locally linearize nonlinear systems.
• We can apply linear theory with X = J.
• With normal errors, the LS estimator is the ML estimator and we can approximate CIs.
• The quality of the approximation depends on the curvature of the system.
• All of the above only applies when we have the correct model.

Optimal Experimental Design

• We have the correct model and we want to estimate the parameters.
• Problem: we only have money for few measurements.
• Question: where do we most effectively measure?
• Mathematically: where do we measure to minimize the confidence intervals, i.e. the covariance matrix $s^2 (J'J)^{-1}$?
• Typical: D-optimal design, minimize $\det(J'J)^{-1}$.

Optimal Experimental Design

Example: Michaelis-Menten kinetics. We have two measurements x1 and x2 between 1 and 10, with x1 < x2. Calculate

$$J(x, \hat p) = \left. \frac{\partial f(x, p)}{\partial p} \right|_{\hat p}$$

Minimize $\det(J'J)^{-1}(x)$ with respect to x = (x1, x2).
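A sketch of this D-optimal search with an analytic Jacobian for the M-M model; maximizing det(J'J) is equivalent to minimizing det(J'J)⁻¹ (parameter values assumed):

```python
import numpy as np

vmax, Km = 2.0, 1.0                    # assumed current parameter estimates

def jac_row(x):
    # dv/dvmax = x/(Km+x);  dv/dKm = -vmax*x/(Km+x)^2
    return np.array([x / (Km + x), -vmax * x / (Km + x) ** 2])

def detJtJ(x1, x2):
    J = np.vstack([jac_row(x1), jac_row(x2)])
    return np.linalg.det(J.T @ J)

grid = np.linspace(1.0, 10.0, 91)
best = max(((x1, x2) for x1 in grid for x2 in grid if x1 < x2),
           key=lambda d: detJtJ(*d))
print("D-optimal design points:", best)   # here the endpoints (1, 10)
```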

Optimal Experimental Design

The best measurements are as far apart as possible.

[Figure: surface of Det over (x1, x2).]

Optimal Experimental Design

[Figures: M-M kinetics with noisy measurements at the design points; estimated vs. true curves with standard deviations.]

Model Discrimination

[Figures: fits of M-M kinetics, substrate inhibition kinetics and Hill kinetics (v against S) to the same data.]

Model  NP  LS fit
M-M    2   0.011
SI     3   0.001
HK     3   0.002

Model Discrimination

LS is distributed according to χ²(N − M).

⇒ We can calculate confidence limits, i.e. P-values, P(LS ≤ χ²(N − M)).

⇒ However, the complexity of the model is not accounted for.

[Figures: χ² densities with 2 and 1 degrees of freedom.]

Model  NP  LS fit  Chi2
M-M    2   0.011   0.022
SI     3   0.001   0.001
HK     3   0.002   0.002

Model Discrimination

For nested models, where M1 is a simplification of M2 with p1 < p2, we can apply the Likelihood Ratio Test. H0: M1 is an adequate simplification of M2.

⇒ 2(L(M1) − L(M2)) ~ χ²(p2 − p1). M-M is a simplification of both SI and HK.

Model  NP  LS fit  LR     Chi2
M-M    2   0.011
SI     3   0.001   0.020  0.001
HK     3   0.002   0.018  0.001
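A sketch of the LRT under the normal-error assumption, writing the log-likelihood via the SSR (ln L = −(N/2) ln(SSR/N) up to constants); the SSR values are taken from the table above, N is assumed:

```python
import math
from scipy.stats import chi2

N = 10                                  # assumed sample size (not on the slide)
ssr_mm, ssr_si = 0.011, 0.001           # M-M (p1 = 2) nested in SI (p2 = 3)
p1, p2 = 2, 3

# Concentrated Gaussian log-likelihood: ln L = -(N/2) ln(SSR/N) + const,
# so the LR statistic is 2(ln L2 - ln L1) = N * ln(SSR1 / SSR2).
lr = N * math.log(ssr_mm / ssr_si)
p_value = chi2.sf(lr, df=p2 - p1)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")  # small p: reject the simplification
```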

Model Discrimination

Other measures that take both fit and complexity into account: AIC and MDL,

$$\mathrm{AIC} = \mathrm{SSR} + \frac{2MN}{N - M - 1},$$

and MDL (minimum description length), which combines the SSR with a description-length penalty that grows with the number of parameters p and with ln N.

Model  NP  LS fit  AIC       MDL
M-M    2   0.011   4.011178  0.704325
SI     3   0.001   6.001262  1.040983
HK     3   0.002   6.002177  1.041898
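A sketch reproducing the AIC column with the penalty reduced to SSR + 2M, which is what the table's numbers correspond to when −2 ln L is identified with the SSR (known σ assumed):

```python
models = {"M-M": (2, 0.011), "SI": (3, 0.001), "HK": (3, 0.002)}
for name, (M, ssr) in models.items():
    aic = ssr + 2 * M      # -2 ln L taken proportional to SSR (known sigma)
    print(f"{name}: AIC = {aic:.3f}")
# Smallest AIC wins: here M-M, because the better fits of SI and HK
# do not pay for their extra parameter.
```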

Take Home Messages

• Under certain regularity conditions, the Least Square Estimator is a Maximum Likelihood Estimator, which is an entirely intuitive concept.
• Under certain regularity conditions we can obtain quality measures for fitted parameters, i.e. the asymptotic covariance matrix (ASM).
• We can use the ASM to optimize experimental design.
• The theory also gives us methods to discriminate models (LRT).
