Modelling of Data. Tutorial by J. Schaber, Universitätsklinikum Magdeburg, 5.7.2005


Modelling of Data: Parameter Confidence Limits in Nonlinear Regression

Outline
• Maximum Likelihood
• Linear Regression Theory
• Nonlinear Regression
  • Asymptotic Theory
  • Distributions and Confidence Intervals of Estimated Parameters
• Examples
• Numerics
• Optimal Experimental Design
• Model Discrimination

Least Square Estimation: The Problem

We fit M adjustable parameters p_j, j = 1, …, M of a model f to N data points (x_i, y_i), i = 1, …, N, where f predicts y:

$$y = f(x, p_1, \ldots, p_M).$$

General idea: least square fit. Minimize over p_1, …, p_M:

$$L(f, p) = \sum_{i=1}^{N} \left[ y_i - f(x_i, p_1, \ldots, p_M) \right]^2$$

Other ideas: absolute residuals (more robust), but LS has some nice properties, e.g. it is differentiable.
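A minimal sketch of such a least square fit in Python, using scipy.optimize.least_squares; the model f and the data are made-up placeholders:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical model f(x, p) with M = 2 parameters (placeholder choice).
def f(x, p):
    return p[0] * np.exp(-p[1] * x)

# Made-up data (x_i, y_i), i = 1..N.
x = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = f(x, [2.0, 0.7]) + rng.normal(scale=0.05, size=x.size)

# Residuals r_i(p) = y_i - f(x_i, p); least_squares minimizes sum r_i^2.
def residuals(p):
    return y - f(x, p)

fit = least_squares(residuals, x0=[1.0, 1.0])
print("estimated p:", fit.x)
print("L(f, p) =", np.sum(fit.fun ** 2))  # sum of squared residuals
```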

Intuitive Maximum Likelihood: Intuitive Ideas

Question: "Given a model, what is the probability that the parameters are correct for certain data?", i.e. P(p | f, y).

Search for the model f where P(p | f, y) is maximal.

⇒ IMPOSSIBLE, "there is no statistical universe of models from which f can be drawn."

Intuitive Maximum Likelihood: Intuitive Ideas

Easier: "Given a particular set of parameters, what is the probability that the data occurred under a certain model?", i.e. P(y | p, f).

⇒ Assumption: There is only one model, the correct one.

Intuitive Maximum Likelihood: Intuitive Ideas

If P(y | p, f) is small, p is intuitively "unlikely".

⇒ Identify P(y | p, f) with the likelihood of p given y.

⇒ Maximize the likelihood of p given y (for the one and only model); thus, we estimate p by a Maximum Likelihood Estimator.

The Maximum Likelihood: How do we get an ML estimator?

Assumptions:

Each data point y_i has a measurement error ε_i that is independent of the other ε_j:

$$P(y \mid p, f) = \prod_{i=1}^{N} P(y_i \mid p, f) \propto \prod_{i=1}^{N} P(\varepsilon_i)$$

ε is normally distributed with variance σ²:

$$P(y \mid p, f) \propto \prod_{i=1}^{N} e^{-\frac{1}{2}\left(\frac{y_i - f(x_i, p)}{\sigma}\right)^2} \propto L(y \mid p, f)$$

Maximum Likelihood Estimator

Maximizing L(y | p, f) = maximizing ln(L(y | p, f)):

$$\ln(L(y \mid p, f)) \propto \ln\left( \prod_{i=1}^{N} e^{-\frac{1}{2}\left(\frac{y_i - f(x_i, p)}{\sigma}\right)^2} \right) = -\frac{1}{2} \sum_{i=1}^{N} \left( \frac{y_i - f(x_i, p)}{\sigma} \right)^2$$

⇒ With independent normal errors and the correct model, minimizing the Sum of Squared Residuals yields a Maximum Likelihood estimator of p.

Linear Regression Theory

If each X_i is a normal random variable with unit variance and mean zero, then $\sum_{i=1}^{n} X_i^2$ has a chi-square distribution with n degrees of freedom.

⇒

$$\frac{L(y \mid p, f)}{\sigma^2} = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i, p)}{\sigma} \right)^2$$

is chi-square distributed with N − M degrees of freedom. (Because p is estimated, not all terms are statistically independent.)

Linear Regression Theory

The multivariate linear model

$$y_i = p_0 + p_1 x_{1i} + p_2 x_{2i} + \ldots + p_M x_{Mi} + \varepsilon_i$$

can be written as $y = Xp + \varepsilon$ with $\operatorname{cov}(\varepsilon) = \sigma^2 I$.

Then

$$L(y \mid \hat p) = \sum_{i=1}^{N} (y_i - x_i' \hat p)^2 = (y - X\hat p)'(y - X\hat p)$$

Linear Regression Theory

$$L(y \mid \hat p, X) = (y - X\hat p)'(y - X\hat p) = y'y - 2\hat p'X'y + \hat p'X'X\hat p = \hat\varepsilon'\hat\varepsilon$$

This becomes minimal in $\hat p$ when

$$\frac{\partial \hat\varepsilon'\hat\varepsilon}{\partial \hat p} = 0:$$

$$\frac{\partial \hat\varepsilon'\hat\varepsilon}{\partial \hat p} = -2X'y + 2X'X\hat p = 0 \quad\Leftrightarrow\quad \hat p = (X'X)^{-1}X'y$$

The "normal equation".
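A quick numeric check of the normal equation (made-up design matrix and data); the closed form (X'X)⁻¹X'y agrees with the standard solver:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 3
# Design matrix with intercept column plus M regressors.
X = np.column_stack([np.ones(N)] + [rng.normal(size=N) for _ in range(M)])
p_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ p_true + rng.normal(scale=0.1, size=N)

# Normal equation: p_hat = (X'X)^{-1} X'y
p_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same result from the numerically preferred solver.
p_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(p_hat, p_lstsq))  # True
```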

Linear Regression Theory

Some properties of $\hat p = (X'X)^{-1}X'y$:

If $E(y) = Xp$, i.e. the model is correct, and $\operatorname{cov}(y) = \sigma^2 I$, i.e. the errors are independent, then $\hat p$ is nicely behaved:

$$E(\hat p) = p,$$ i.e. $\hat p$ is unbiased, and

$$\operatorname{cov}(\hat p) = \sigma^2 (X'X)^{-1},$$ i.e. the quality can be estimated.

Note that both the error structure and the model structure contribute to the covariance.

Linear Regression Theory

The Gauss-Markov Theorem: If $E(y) = Xp$ and $\operatorname{cov}(y) = \sigma^2 I$, and $\hat p$ is an LS estimator, then $\hat p$ has minimum variance.

This holds for any distribution of y. Normality is not required.

σ² can be estimated by $L(y \mid \hat p, X)/(N - M - 1)$.

Linear Regression Theory

If y is normally distributed N(Xp, σ²I), then:

• $\hat p$ is $N(p, \sigma^2 (X'X)^{-1})$,
• $L(y \mid \hat p, X)/\sigma^2$ is $\chi^2(N - M)$, and
• $\hat p$ and $s^2 = L(\hat p)/(N - M - 1)$ are independent.

Thus, confidence intervals for p and all kinds of hypothesis tests for the model, data and parameters can be developed.
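A sketch of these quantities for a made-up linear model: s², cov(p̂) = s²(X'X)⁻¹ and component-wise t-based confidence intervals (here with N minus the number of fitted coefficients as degrees of freedom):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
N = 50
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
p_true = np.array([1.0, 0.5])
y = X @ p_true + rng.normal(scale=0.3, size=N)

p_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ p_hat
dof = N - X.shape[1]                  # N minus number of fitted parameters
s2 = resid @ resid / dof              # estimate of sigma^2
cov = s2 * np.linalg.inv(X.T @ X)     # cov(p_hat) = s^2 (X'X)^{-1}

# Component-wise 95% confidence intervals.
q = t.ppf(0.975, dof)
for j, (pj, se) in enumerate(zip(p_hat, np.sqrt(np.diag(cov)))):
    print(f"p[{j}] = {pj:.3f} +/- {q * se:.3f}")
```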

Nonlinear Regression Theory

Now consider N observations (y_i, x_i) with known functional relationship f, thus

$$y_i = f(x_i, p^*) + \varepsilon_i, \qquad i = 1, 2, \ldots, N.$$

We assume that E(ε_i) = 0. The least square estimator of p* minimizes

$$L(y \mid p, f) = L(p) = \sum_{i=1}^{N} \left( y_i - f(x_i, p) \right)^2$$

Nonlinear Regression Theory

The LS estimator therefore satisfies

$$\left. \frac{\partial L(p)}{\partial p} \right|_{\hat p} = 0$$

When f is differentiable, in a small neighbourhood of p*, f can be approximated (Taylor) by

$$f_i(p) \approx f_i(p^*) + \sum_{r=1}^{M} \left. \frac{\partial f_i(p)}{\partial p_r} \right|_{p^*} (p_r - p_r^*) \quad\Leftrightarrow\quad f(p) \approx f(p^*) + J(p^*)(p - p^*),$$

with J(p) the Jacobian.

Nonlinear Regression Theory

Hence,

$$L(p) = \lVert y - f(p) \rVert^2 \approx \lVert y - f(p^*) - J(p^*)(p - p^*) \rVert^2 = \lVert z - Xb \rVert^2$$

where $z = y - f(p^*) = \varepsilon$, $X = J(p^*)$ and $b = p - p^*$.

From linear theory, $\hat b = (X'X)^{-1}X'z$. When N is large, $\hat p$ is almost certainly in a small neighbourhood of p*. Therefore $\hat p - p^* \approx \hat b$ and

$$\hat p - p^* \approx (J'J)^{-1} J' \varepsilon$$

Nonlinear Regression Theory

The Jacobian plays the same role as X in linear regression.

If ε is N(0, σ²I) then, asymptotically,

• $\hat p$ is $N(p, \sigma^2 (J'J)^{-1})$,
• $L(y \mid p, f)/\sigma^2$ is $\chi^2(N - M)$,
• $\hat p - p^*$ and $\sigma^2$ are independent, with $\sigma^2 \approx L(\hat p)/(N - M)$, and

$$(\hat p - p^*)' \hat J'\hat J (\hat p - p^*) \propto M s^2 F_{M, N-M}, \qquad \frac{L(p^*) - L(\hat p)}{L(\hat p)} \propto \frac{M}{N - M} F_{M, N-M}$$

Nonlinear Regression Theory: Summarizing so far

Assumptions:
• We know the correct model f(x, p)
• ε is N(0, σ²I)

=> Parameters p can be estimated and confidence intervals can be calculated.

Numerical Approximations

Until now we have not said how the parameters are estimated. There is no general solution; the method depends on the problem. However, the theory suggests two methods:

1. Suppose p_a is an approximation of $\hat p$. Then

$$f(p) \approx f(p_a) + J(p_a)(p - p_a)$$

and

$$r(p) = y - f(p) \approx r(p_a) - J(p_a)(p - p_a).$$

Substituting in $L(p) = r'(p)\, r(p)$ leads to

Numerical Approximations

$$L(p) \approx r'(p_a)\, r(p_a) - 2 r'(p_a) J(p_a)(p - p_a) + (p - p_a)' J'J (p - p_a)$$

This is minimized with respect to p when

$$p - p_a = (J'J)^{-1} J' r(p_a) =: \delta_a$$

⇒ Given a current approximation p_a, the next approximation should be

$$p_{a+1} = p_a + \delta_a$$

This is called the Gauss-Newton method.
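A minimal sketch of the Gauss-Newton iteration with a finite-difference Jacobian (a fixed number of iterations and no damping; a robust implementation would add step control, e.g. Levenberg-Marquardt):

```python
import numpy as np

def gauss_newton(f, x, y, p0, n_iter=20, h=1e-6):
    """Minimize L(p) = sum (y - f(x, p))^2 by Gauss-Newton steps."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, p)                    # residuals r(p_a)
        J = np.empty((x.size, p.size))     # finite-difference Jacobian of f
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = h
            J[:, j] = (f(x, p + dp) - f(x, p)) / h
        delta = np.linalg.solve(J.T @ J, J.T @ r)  # delta_a = (J'J)^{-1} J' r
        p = p + delta                      # p_{a+1} = p_a + delta_a
    return p

# Made-up example: Michaelis-Menten v = vmax*S/(Km+S).
mm = lambda S, p: p[0] * S / (p[1] + S)
rng = np.random.default_rng(3)
S = np.linspace(0.5, 10.0, 15)
v = mm(S, [2.0, 1.0]) + rng.normal(scale=0.05, size=S.size)
print(gauss_newton(mm, S, v, p0=[1.0, 2.0]))
```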

Numerical Approximations

We can also expand L(p) directly:

$$L(p) \approx L(p_a) + \left. \frac{\partial L(p)}{\partial p} \right|_{p_a} (p - p_a) + \frac{1}{2} (p - p_a)' \left. \frac{\partial^2 L(p)}{\partial p\, \partial p'} \right|_{p_a} (p - p_a)$$

This differs from the above in that there the Hessian H(p_a) is approximated by 2J'J.

⇒ Given a current approximation p_a, the next approximation should be

$$p_{a+1} = p_a - H(p_a)^{-1} \left. \frac{\partial L(p)}{\partial p} \right|_{p_a}$$

This is called the Newton method.

Example

Assume we have a Michaelis-Menten kinetics, $v = \frac{v_{\max} S}{K_m + S}$, and want to estimate v_max and K_m from noisy measurements.

[Figure: M-M kinetics with noisy measurements; estimated vs. true curve.]
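A sketch of such an experiment with invented true values (v_max = 2, K_m = 1) and noise level:

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(S, vmax, Km):
    return vmax * S / (Km + S)

rng = np.random.default_rng(4)
S = np.linspace(1, 10, 10)                 # substrate concentrations
v_true = michaelis_menten(S, 2.0, 1.0)     # assumed true vmax = 2, Km = 1
v_obs = v_true + rng.normal(scale=0.1, size=S.size)

p_hat, p_cov = curve_fit(michaelis_menten, S, v_obs, p0=[1.0, 1.0])
print("vmax, Km estimates:", p_hat)
```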

Example

L(v_max, K_m) is well-behaved in this case.

[Figure: surface of the sum of squared residuals (ChiSq) over (v_max, K_m).]

Example

95% confidence intervals: for a large number of repetitions of the same experiment, the true value lies within the CI in 95% of the cases.

N = 3, Out = 4

[Figure: 95% confidence intervals for v_max over 100 repeated experiments.]

Example

N = 10, Out = 7

[Figure: 95% confidence intervals for v_max over 100 repeated experiments.]

Example

Testing for normality, 100 repetitions. K-S test: normally distributed (P > 0.05).

[Figure: histogram and empirical CDF of the v_max estimates.]
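A sketch of this normality check: fit 100 simulated data sets and apply the K-S test to the standardized v_max estimates (all settings invented):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import kstest

def mm(S, vmax, Km):
    return vmax * S / (Km + S)

rng = np.random.default_rng(5)
S = np.linspace(1, 10, 10)
vmax_hats = []
for _ in range(100):                   # 100 repetitions of the experiment
    v = mm(S, 2.0, 1.0) + rng.normal(scale=0.1, size=S.size)
    popt, _ = curve_fit(mm, S, v, p0=[1.0, 1.0])
    vmax_hats.append(popt[0])

z = (np.array(vmax_hats) - np.mean(vmax_hats)) / np.std(vmax_hats)
print(kstest(z, "norm"))               # large p-value: normality not rejected
```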

Example

Increasing N.

[Figure: v_max confidence intervals as N increases from 10 to 50.]

Example

Testing for least squares, 100 repetitions. K-S test: chi-square distributed (P > 0.05).

[Figure: histogram and empirical CDF of the least square values.]

Example

• With small N we can get large confidence intervals, but the variance of the CI is also increased.
• Larger N gives us smaller and more uniform CIs.
• In the case of a known model and parameter estimates close to the true values, the parameter estimates are normally distributed.
• With increasing N the CIs become smaller, but may still not cover the true value: a 95% CI stays a 95% CI no matter how small it is.

Confidence Interval and Region

Asymptotically $\hat p \sim N(p, \sigma^2 (J'J)^{-1})$. Estimate σ² independently by

$$s^2 = L(\hat p)/(N - M)$$

=> An approximate α-CI for $\hat p_r$:

$$\hat p_r \pm t^{\alpha/2}_{N-M}\, s\, \sqrt{\left[ (\hat J'\hat J)^{-1} \right]_{rr}}$$

=> An approximate α-confidence region:

$$\left\{ p : (p - \hat p)' \hat J'\hat J (p - \hat p) \leq M s^2 F^{\alpha}_{M, N-M} \right\}$$

Drawback: the CI is symmetric and the CR is approximated by a quadratic. This might not be appropriate.
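A sketch of these approximate CIs for the M-M fit, with s² from the residuals and the Jacobian returned by the optimizer (data invented):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import t

def mm(S, p):
    return p[0] * S / (p[1] + S)

rng = np.random.default_rng(6)
S = np.linspace(1, 10, 10)
v = mm(S, [2.0, 1.0]) + rng.normal(scale=0.1, size=S.size)

fit = least_squares(lambda p: v - mm(S, p), x0=[1.0, 1.0])
N, M = S.size, fit.x.size
s2 = np.sum(fit.fun ** 2) / (N - M)    # s^2 = L(p_hat)/(N - M)
J = fit.jac                            # Jacobian of the residuals at p_hat
cov = s2 * np.linalg.inv(J.T @ J)      # asymptotic covariance s^2 (J'J)^{-1}
q = t.ppf(0.975, N - M)
for name, pr, se in zip(["vmax", "Km"], fit.x, np.sqrt(np.diag(cov))):
    print(f"{name}: {pr:.3f} +/- {q * se:.3f}")
```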

Confidence Interval and Region

To estimate an α-CR, we can use the shape of L(p):

$$\left\{ p : \frac{L(p) - L(\hat p)}{L(\hat p)} \leq \frac{M}{N - M} F^{\alpha}_{M, N-M} \right\}$$

Advantage: accounts for intrinsic curvature effects.
Drawback: we must find contours where L(p) has a certain value; this might not be feasible or may be computationally difficult.
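A brute-force sketch of this likelihood-based region: evaluate L(p) on a grid and keep the points that satisfy the F-based inequality (grid limits and data invented):

```python
import numpy as np
from scipy.stats import f as f_dist

mm = lambda S, p: p[0] * S / (p[1] + S)
rng = np.random.default_rng(7)
S = np.linspace(1.0, 10.0, 10)
v = mm(S, [2.0, 1.0]) + rng.normal(scale=0.1, size=S.size)

def L(p):
    r = v - mm(S, p)
    return r @ r

# Evaluate L on a (vmax, Km) grid; use the grid minimum as L(p_hat).
vmax_grid = np.linspace(0.5, 4.0, 200)
Km_grid = np.linspace(0.1, 4.0, 200)
Ls = np.array([[L((a, b)) for b in Km_grid] for a in vmax_grid])
L_hat = Ls.min()

N, M, alpha = S.size, 2, 0.05
bound = M / (N - M) * f_dist.ppf(1 - alpha, M, N - M)
in_region = (Ls - L_hat) / L_hat <= bound   # mask of the approximate 95% CR
print("grid points inside the confidence region:", int(in_region.sum()))
```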

Confidence Contours

Consider CIs and contours of L(p). N = 3. L(p) is not symmetric.

[Figure: surface of the sum of squared residuals (ChiSq) over (v_max, K_m).]

Example

[Figures: likelihood contours for N = 3, 5, 7, 11, 17, 21, 27 and 39; the contours tighten around the estimate as N increases.]

Fisher Information

The "true" CI is always larger than the approximated CI. True and approximate CIs are asymptotically equal.

Remember Gauss-Markov for linear regression: the LS estimator is best (minimum variance). In general, the Cramér-Rao bound:

$$\operatorname{Var}(\hat p) \geq F^{-1}(\hat p)$$

with F the Fisher information matrix, here $\operatorname{Cov}^{-1} = \sigma^{-2}(J'J)$.

Summary: Asymptotic Theory

• We can locally linearize nonlinear systems.
• We can apply linear theory with X = J.
• With normal errors, the LS estimator is the ML estimator and we can approximate CIs.
• The quality of the approximation depends on the curvature of the system.
• All of the above only applies when we have the correct model.

Optimal Experimental Design

• We have the correct model and we want to estimate the parameters.
• Problem: we only have money for few measurements.
• Question: where do we most effectively measure?
• Mathematically: where do we measure to minimize the confidence intervals, i.e. the covariance matrix $s^2 (J'J)^{-1}$?
• Typical: D-optimal design, minimize $\det(J'J)^{-1}$.

Optimal Experimental Design

Example: Michaelis-Menten kinetics. We have two measurements x1 and x2 between 1 and 10, with x1 < x2. Calculate

$$J(x, \hat p) = \left. \frac{\partial f(x, p)}{\partial p} \right|_{\hat p}$$

Minimize $\det(J'J)^{-1}(x)$ with respect to x = (x1, x2).
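A sketch of this D-optimal search with an analytic Jacobian for the M-M model; maximizing det(J'J) is equivalent to minimizing det(J'J)⁻¹ (parameter values assumed):

```python
import numpy as np

vmax, Km = 2.0, 1.0                    # assumed current parameter estimates

def jac_row(x):
    # dv/dvmax = x/(Km+x);  dv/dKm = -vmax*x/(Km+x)^2
    return np.array([x / (Km + x), -vmax * x / (Km + x) ** 2])

def detJtJ(x1, x2):
    J = np.vstack([jac_row(x1), jac_row(x2)])
    return np.linalg.det(J.T @ J)

grid = np.linspace(1.0, 10.0, 91)
best = max(((x1, x2) for x1 in grid for x2 in grid if x1 < x2),
           key=lambda d: detJtJ(*d))
print("D-optimal design points:", best)   # here the endpoints (1, 10)
```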

Optimal Experimental Design

The best measurements are as far apart as possible.

[Figure: surface of Det over (x1, x2).]

Optimal Experimental Design

[Figures: M-M kinetics with noisy measurements at the design points; estimated vs. true curves with standard deviations.]

Model Discrimination

[Figures: fits of M-M kinetics, substrate inhibition kinetics and Hill kinetics (v against S) to the same data.]

Model  NP  LS fit
M-M    2   0.011
SI     3   0.001
HK     3   0.002

Model Discrimination

LS is distributed according to χ²(N − M).

⇒ We can calculate confidence limits, i.e. P-values, P(LS ≤ χ²(N − M)).

⇒ However, the complexity of the model is not accounted for.

[Figures: χ² densities with 2 and 1 degrees of freedom.]

Model  NP  LS fit  Chi2
M-M    2   0.011   0.022
SI     3   0.001   0.001
HK     3   0.002   0.002

Model Discrimination

For nested models, where M1 is a simplification of M2 with p1 < p2, we can apply the Likelihood Ratio Test. H0: M1 is an adequate simplification of M2.

⇒ 2(L(M1) − L(M2)) ~ χ²(p2 − p1). M-M is a simplification of both SI and HK.

Model  NP  LS fit  LR     Chi2
M-M    2   0.011
SI     3   0.001   0.020  0.001
HK     3   0.002   0.018  0.001
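A sketch of the LRT under the normal-error assumption, writing the log-likelihood via the SSR (ln L = −(N/2) ln(SSR/N) up to constants); the SSR values are taken from the table above, N is assumed:

```python
import math
from scipy.stats import chi2

N = 10                                  # assumed sample size (not on the slide)
ssr_mm, ssr_si = 0.011, 0.001           # M-M (p1 = 2) nested in SI (p2 = 3)
p1, p2 = 2, 3

# Concentrated Gaussian log-likelihood: ln L = -(N/2) ln(SSR/N) + const,
# so the LR statistic is 2(ln L2 - ln L1) = N * ln(SSR1 / SSR2).
lr = N * math.log(ssr_mm / ssr_si)
p_value = chi2.sf(lr, df=p2 - p1)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")  # small p: reject the simplification
```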

Model Discrimination

Other measures that take both fit and complexity into account: AIC and MDL,

$$\mathrm{AIC} = \mathrm{SSR} + \frac{2MN}{N - M - 1},$$

and MDL (minimum description length), which combines the SSR with a description-length penalty that grows with the number of parameters p and with ln N.

Model  NP  LS fit  AIC       MDL
M-M    2   0.011   4.011178  0.704325
SI     3   0.001   6.001262  1.040983
HK     3   0.002   6.002177  1.041898
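A sketch reproducing the AIC column with the penalty reduced to SSR + 2M, which is what the table's numbers correspond to when −2 ln L is identified with the SSR (known σ assumed):

```python
models = {"M-M": (2, 0.011), "SI": (3, 0.001), "HK": (3, 0.002)}
for name, (M, ssr) in models.items():
    aic = ssr + 2 * M      # -2 ln L taken proportional to SSR (known sigma)
    print(f"{name}: AIC = {aic:.3f}")
# Smallest AIC wins: here M-M, because the better fits of SI and HK
# do not pay for their extra parameter.
```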

Take Home Messages

• Under certain regularity conditions, the Least Square Estimator is a Maximum Likelihood Estimator, which is an entirely intuitive concept.
• Under certain regularity conditions we can obtain quality measures for fitted parameters, i.e. the asymptotic covariance matrix (ASM).
• We can use the ASM to optimize experimental design.
• The theory also gives us methods to discriminate models (LRT).
