
Page 1: Nonlinear Regression

Nonlinear Regression

Didier Concordet

NATIONAL VETERINARY SCHOOL Toulouse

Page 2: Nonlinear Regression

An example

[Figure: semi-log plot of Concentration (Y) vs Time (t); log-scaled Y axis up to 1000, t axis from 0 to 400]

Time   Conc
  0    112.0
  5     69.1
 10     50.4
 20     22.3
 30     12.8
 60      6.3
 90      4.0
120      3.5
150      2.2
180      1.7
210      1.2
300      0.4
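For readers who want to follow along, here is a minimal Python sketch (assuming NumPy and Matplotlib are available) that reproduces the semi-log concentration-time plot of this slide:

    import numpy as np
    import matplotlib.pyplot as plt

    # Concentration-time data from the slide
    t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300])
    conc = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

    plt.semilogy(t, conc, "o")      # log scale on the concentration axis
    plt.xlabel("Time (t)")
    plt.ylabel("Concentration (Y)")
    plt.show()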

Page 3: Nonlinear Regression

Questions

• What does nonlinear mean ?
  – What is a nonlinear kinetics ?
  – What is a nonlinear statistical model ?

• For a given model, how to fit the data ?

• Is this model relevant ?

Page 4: Nonlinear Regression

What does nonlinear mean ?

Definition : An operator P is linear if :

• for all objects x, y on which it operates, P(x + y) = P(x) + P(y)
• for all numbers λ and all objects x, P(λx) = λP(x)

When an operator is not linear, it is nonlinear

Page 5: Nonlinear Regression

Examples

Among the operators below, which ones are nonlinear ?

• P(t) = a t
• P(t) = a
• P(t) = a + b t
• P(t) = a t + b t²
• P(a, b) = a t + b t²
• P(A, λ) = A exp(-λ t)
• P(A) = A exp(-0.1 t)
• P(t) = A exp(-λ t)
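The definition can also be checked numerically. The sketch below (illustrative only, not part of the original slides; the helper is_linear is hypothetical) tests additivity and homogeneity for some of these operators:

    import numpy as np

    def is_linear(P, trials=100, tol=1e-9):
        """Numerically test P(x + y) = P(x) + P(y) and P(lam*x) = lam*P(x)."""
        rng = np.random.default_rng(0)
        for _ in range(trials):
            x, y, lam = rng.normal(size=3)
            if abs(P(x + y) - (P(x) + P(y))) > tol:
                return False
            if abs(P(lam * x) - lam * P(x)) > tol:
                return False
        return True

    a, b, A, lam = 2.0, 3.0, 100.0, 0.5
    print(is_linear(lambda t: a * t))                   # True  : linear in t
    print(is_linear(lambda t: a + b * t))               # False : affine, not linear
    print(is_linear(lambda A_: A_ * np.exp(-0.1 * 5)))  # True  : linear in A (t fixed)
    print(is_linear(lambda t: A * np.exp(-lam * t)))    # False : nonlinear in t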

Page 6: Nonlinear Regression

What is a nonlinear kinetics ?

For a given dose D, the concentration at time t is C(t, D).

The kinetics is linear when the operator :

P(D) = C(t, D)

is linear. When P(D) is not linear, the kinetics is nonlinear.

Page 7: Nonlinear Regression

What is a nonlinear kinetics ?

Examples :

P(D) = C(t, D) = (D / V) exp(-(Cl / V) t)

P(D) = C(t, D) = D / (V (K + t))

Page 8: Nonlinear Regression

What is a nonlinear statistical model ?

A statistical model :

Y = f(x1, x2, …, xq, θ1, θ2, …, θp) + ε

Y is the observation (dependent variable), f is a function, x1, …, xq are the covariates (independent variables), θ1, …, θp are the parameters, and ε is the error (residual).

Page 9: Nonlinear Regression

What is a nonlinear statistical model ?

A statistical model is linear when the operator :

P(θ1, θ2, …, θp) = f(x1, x2, …, xq, θ1, θ2, …, θp)

is linear. When P(θ1, θ2, …, θp) is not linear, the model is nonlinear.

Page 10: Nonlinear Regression

What is a nonlinear statistical model ?

Example : the model

Y = θ1 + θ2 t + ε

with Y = concentration and t = time, is linear.

Page 11: Nonlinear Regression

Examples

Among the statistical models below, which ones are nonlinear ?

• Y = θ1 + θ2 t + ε
• Y = θ1 + θ2 t + θ3 t² + ε
• Y = θ1 exp(-θ2 t) + ε
• Y = θ1 exp(-θ2 t) + θ3 exp(-θ4 t) + ε
• Y = θ1 + θ2 x² / (θ3 + x²) + ε
• Y = θ1 exp(-0.1 t) + ε

Page 12: Nonlinear Regression

Questions

• What does nonlinear mean ?
  – What is a nonlinear kinetics ?
  – What is a nonlinear statistical model ?

• For a given model, how to fit the data ?

• Is this model relevant ?

Page 13: Nonlinear Regression

How to fit the data ?

Proceed in three main steps :

• Write a (statistical) model
• Choose a criterion
• Minimize the criterion

Page 14: Nonlinear Regression

Write a (statistical) model

• Find a function of covariate(s) to describe the mean variation of the dependent variable (mean model).

• Find a function of covariate(s) to describe the dispersion of the dependent variable about the mean (variance model).

Page 15: Nonlinear Regression

Example

[Figure: semi-log plot of Concentration (Y) vs Time (t); Y axis from 0.1 to 1000, t axis from 0 to 400]

Y = θ1 exp(-θ2 t) + ε

ε is assumed gaussian with a constant variance : homoscedastic model.

Page 16: Nonlinear Regression

How to choose the criterion to optimize ?

Homoscedasticity : Ordinary Least Squares (OLS). Under normality, OLS is equivalent to maximum likelihood.

Heteroscedasticity : Weighted Least Squares (WLS) or Extended Least Squares (ELS).

Page 17: Nonlinear Regression

Homoscedastic models

For the model

Yi = f(x1,i, x2,i, …, xq,i, θ1, θ2, …, θp) + εi

define :

SS(θ1, θ2, …, θp) = Σi [ Yi - f(x1,i, …, xq,i, θ1, …, θp) ]²

The Ordinary Least-Squares criterion : find θ1, θ2, …, θp that make SS(θ1, θ2, …, θp) minimum.
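As a concrete illustration (a minimal sketch, not from the slides), the OLS criterion for the mono-exponential model of the running example can be coded directly:

    import numpy as np

    def ss(theta, t, y):
        """OLS criterion for the model Y = theta1 * exp(-theta2 * t) + eps."""
        theta1, theta2 = theta
        residuals = y - theta1 * np.exp(-theta2 * t)
        return np.sum(residuals ** 2)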

Page 18: Nonlinear Regression

Heteroscedastic models : the Weighted Least-Squares criterion

For the model

Yi = f(x1,i, x2,i, …, xq,i, θ1, θ2, …, θp) + εi

define, with weights wi :

WSS(θ1, θ2, …, θp) = Σi wi [ Yi - f(x1,i, …, xq,i, θ1, …, θp) ]²

The Weighted Least-Squares criterion : find θ1, θ2, …, θp that make WSS(θ1, θ2, …, θp) minimum.

Page 19: Nonlinear Regression

How to choose the weights ?

When the model

Yi = f(x1,i, x2,i, …, xq,i, θ1, θ2, …, θp) + εi

is heteroscedastic (i.e. Var(εi) is not constant with i), it is possible to rewrite it as

Yi = f(x1,i, …, xq,i, θ1, …, θp) + g(x1,i, …, xq,i, θ1, …, θp) εi

where Var(εi) does not depend on i. The weights are chosen as

wi = 1 / g(x1,i, …, xq,i, θ1, …, θp)²
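Continuing the sketch, a WLS criterion for a constant-coefficient-of-variation error model (so g = f and wi = 1/f², matching the example on the next slide; this choice is an assumption for illustration) could look like this:

    import numpy as np

    def wss(theta, t, y):
        """WLS criterion for Y = f(t, theta) * (1 + eps): g = f, weights w = 1/f**2."""
        theta1, theta2 = theta
        f = theta1 * np.exp(-theta2 * t)
        w = 1.0 / f ** 2                 # weights chosen as 1 / g(...)**2
        return np.sum(w * (y - f) ** 2)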

Page 20: Nonlinear Regression

Example

Yi = θ1 exp(-θ2 ti) + εi   with   Var(εi) = σ² θ1² exp(-2 θ2 ti),   i.e. CV(Yi) = cste

The model can be rewritten as

Yi = θ1 exp(-θ2 ti) + θ1 exp(-θ2 ti) εi   with   Var(εi) = cste

The weights are chosen as

wi = 1 / [ θ1 exp(-θ2 ti) ]²

Page 21: Nonlinear Regression

Extended (Weighted) Least Squares

For the model

Yi = f(x1,i, x2,i, …, xq,i, θ1, θ2, …, θp) + εi

define, with weights wi :

EWSS(θ1, θ2, …, θp) = Σi wi [ Yi - f(x1,i, …, xq,i, θ1, …, θp) ]² - Σi ln wi

The Extended Least-Squares criterion : find θ1, θ2, …, θp that make EWSS(θ1, θ2, …, θp) minimum.
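A sketch of the ELS criterion for the same constant-CV model; unlike WLS, the weights depend on the parameters and are recomputed inside the criterion:

    import numpy as np

    def ewss(theta, t, y):
        """ELS criterion: weighted SS minus the sum of log-weights."""
        theta1, theta2 = theta
        f = theta1 * np.exp(-theta2 * t)
        w = 1.0 / f ** 2                              # parameter-dependent weights
        return np.sum(w * (y - f) ** 2) - np.sum(np.log(w))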

Page 22: Nonlinear Regression

Balance sheet

Criterion | When                   | Advantages                                                     | Drawbacks
OLS       | Homoscedastic models   | Easy to use                                                    |
WLS       | Heteroscedastic models | Robust to variance misspecification                            | Estimator with large variance
ELS       | Heteroscedastic models | Unbiased estimate, small variance (efficiency under normality) | Not robust to variance misspecification

Page 23: Nonlinear Regression

The criterion properties

• It converges
• It leads to consistent (unbiased) estimates
• It leads to efficient estimates
• It has several minima

Page 24: Nonlinear Regression

It converges

When the sample size increases, the criterion concentrates about a value of the parameter.

Example : Consider the homoscedastic model

Yi = θ1 exp(-θ2 ti) + εi

The criterion to use is the Least Squares criterion :

SS(θ1, θ2) = (1/n) Σi=1..n [ Yi - θ1 exp(-θ2 ti) ]²

Page 25: Nonlinear Regression

It converges

[Figure: surfaces of SS(θ1, θ2) for a small and a large sample size; the surface concentrates as the sample size grows]

Page 26: Nonlinear Regression

It leads to consistent estimates

[Figure: isocontours of SS(θ1, θ2) concentrating about the true value (θ1⁰, θ2⁰)]

The criterion concentrates about the true value.

Page 27: Nonlinear Regression

It leads to efficient estimates

For a fixed n, the variance of a consistent estimator is always greater than a limit (the Cramér-Rao lower bound). In other words, for a fixed n, the "precision" of a consistent estimator is bounded.

An estimator is efficient when its variance equals this lower bound.

Page 28: Nonlinear Regression

Geometric interpretation

[Figure: isocontours of the criterion in the (θ1, θ2) plane; the ellipsoid is a confidence region of the parameter]

Page 29: Nonlinear Regression

It leads to efficient estimates

[Figure: isocontours of a criterion and of -2 ln(likelihood) about (θ1⁰, θ2⁰)]

For a given large n, there is no criterion giving consistent estimates more "convex" than -2 ln(likelihood).

Page 30: Nonlinear Regression

It has several minima

[Figure: criterion surface over (θ1, θ2) with several local minima]

Page 31: Nonlinear Regression

Minimize the criterion

Suppose that the criterion to optimize has been chosen. We are looking for the value of (θ1, θ2), denoted (θ̂1, θ̂2), which achieves the minimum of the criterion.

We need an algorithm to minimize such a criterion.

Page 32: Nonlinear Regression

Example

Consider the homoscedastic model

Yi = θ1 exp(-θ2 ti) + εi

We are looking for the value of (θ1, θ2), denoted (θ̂1, θ̂2), which achieves the minimum of the criterion

SS(θ1, θ2) = Σi [ Yi - θ1 exp(-θ2 ti) ]²
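A minimal sketch of this minimization on the example data (assuming SciPy is available; Nelder-Mead is one possible choice of algorithm, see the slides below):

    import numpy as np
    from scipy.optimize import minimize

    t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300])
    y = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

    def ss(theta):
        return np.sum((y - theta[0] * np.exp(-theta[1] * t)) ** 2)

    # The starting point matters: the criterion may have several minima
    result = minimize(ss, x0=[100.0, 0.05], method="Nelder-Mead")
    print(result.x)   # (theta1_hat, theta2_hat)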

Page 33: Nonlinear Regression

Isocontours

[Figure: isocontours of SS(θ1, θ2) in the (θ1, θ2) plane]

Page 34: Nonlinear Regression

Different families of algorithms

• Zero order algorithms : computation of the criterion only
• First order algorithms : computation of the first derivative of the criterion
• Second order algorithms : computation of the second derivative of the criterion

The three families map naturally onto standard optimizers, as the sketch below illustrates.
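A sketch with SciPy (the method names are SciPy's, not the slides'): Nelder-Mead uses only criterion values, conjugate gradient uses the gradient, and Newton-CG also uses the Hessian:

    import numpy as np
    from scipy.optimize import minimize

    # A toy quadratic criterion standing in for a least-squares surface
    def ss(theta):
        return (theta[0] - 1.0) ** 2 + 10.0 * (theta[1] - 2.0) ** 2

    def grad(theta):
        return np.array([2.0 * (theta[0] - 1.0), 20.0 * (theta[1] - 2.0)])

    def hess(theta):
        return np.array([[2.0, 0.0], [0.0, 20.0]])

    x0 = np.array([0.0, 0.0])
    print(minimize(ss, x0, method="Nelder-Mead").x)                     # zero order
    print(minimize(ss, x0, method="CG", jac=grad).x)                    # first order
    print(minimize(ss, x0, method="Newton-CG", jac=grad, hess=hess).x)  # second order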

Page 35: Nonlinear Regression


Zero order algorithms

• Simplex algorithm

• Grid search and Monte-Carlo methods

Page 36: Nonlinear Regression

Simplex algorithm

[Figure: simplex moving across the isocontours of the criterion in the (θ1, θ2) plane]

Page 37: Nonlinear Regression

Monte-Carlo algorithm

[Figure: randomly sampled evaluation points scattered over the isocontours in the (θ1, θ2) plane]

Page 38: Nonlinear Regression


First order algorithms

• Line search algorithm

• Conjugate gradient

Page 39: Nonlinear Regression

First order algorithms

The derivatives of the criterion cancel at its optima.

Suppose that there is only one parameter θ to estimate, so the criterion (e.g. SS) depends only on θ. How to find the value(s) of θ where the derivative of the criterion cancels ?

Page 40: Nonlinear Regression

Line search algorithm

[Figure: derivative of the criterion plotted against θ; successive search points θ1 and θ2 bracket the zero of the derivative]

Page 41: Nonlinear Regression

Second order algorithms

• Gauss-Newton
• Marquardt

Page 42: Nonlinear Regression

Second order algorithms

The derivatives of the criterion cancel at its optima. When the criterion is (locally) convex, there is a path to reach the minimum : the steepest-descent direction.

Page 43: Nonlinear Regression

Gauss-Newton (one dimension)

[Figure: derivative of the criterion vs θ, convex case; successive iterates θ1, θ2, θ3 converge to the zero of the derivative]

The criterion is convex.

Page 44: Nonlinear Regression

Gauss-Newton (one dimension)

[Figure: derivative of the criterion vs θ, non-convex case; the iterates θ1, θ2 fail to approach the minimum]

The criterion is not convex.

Page 45: Nonlinear Regression

Gauss-Newton

[Figure: Gauss-Newton steps across the isocontours in the (θ1, θ2) plane]

Page 46: Nonlinear Regression

Marquardt

[Figure: derivative of the criterion vs θ, non-convex case, with iterates θ1, θ2, θ3]

Allows one to deal with the case where the criterion is not convex : when the second derivative is < 0 (the first derivative decreases), it is set to a positive value.

Page 47: Nonlinear Regression

Balance sheet

Order | Algorithm          | When                                                | Robustness | Speed
0     | Monte-Carlo        | To start the optimisation                           | +++        | 0
0     | Simplex            | To start the optimisation                           | ++         | +
1     | Conjugate gradient | When the second derivative is difficult to compute  | +          | ++
1     | Line search        | When the second derivative is difficult to compute  | ++         | +
2     | Gauss-Newton       | To finish the optimisation                          | 0          | +++
2     | Marquardt          | With a reasonable starting point                    | +          | ++

Page 48: Nonlinear Regression

Questions

• What does nonlinear mean ?
  – What is a nonlinear kinetics ?
  – What is a nonlinear statistical model ?

• For a given model, how to fit the data ?

• Is this model relevant ?

Page 49: Nonlinear Regression

Is this model relevant ?

• Graphical inspection of the residuals
  – mean model ( f )
  – variance model ( g )

• Inspection of numerical results
  – variance-correlation matrix of the estimator
  – Akaike criterion (AIC)

Page 50: Nonlinear Regression

Graphical inspection of the residuals

For the model

Yi = f(x1,i, …, xq,i, θ1, …, θp) + g(x1,i, …, xq,i, θ1, …, θp) εi

calculate the weighted residuals :

ε̂i = [ Yi - f(x1,i, …, xq,i, θ̂1, …, θ̂p) ] / g(x1,i, …, xq,i, θ̂1, …, θ̂p)

and draw ε̂i vs the fitted values Ŷi = f(x1,i, …, xq,i, θ̂1, …, θ̂p).
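A sketch of this diagnostic for the constant-CV model (g = f), with the fit recomputed from the example data:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import minimize

    t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300])
    y = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

    def f(t, theta):
        return theta[0] * np.exp(-theta[1] * t)

    theta_hat = minimize(lambda th: np.sum((y - f(t, th)) ** 2),
                         x0=[100.0, 0.05], method="Nelder-Mead").x
    fitted = f(t, theta_hat)
    weighted_resid = (y - fitted) / fitted      # g = f for a constant-CV model

    plt.scatter(fitted, weighted_resid)
    plt.axhline(0.0, linestyle="--")
    plt.xlabel("fitted values")
    plt.ylabel("weighted residuals")
    plt.show()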

Page 51: Nonlinear Regression

Check the mean model

Scatterplot of the weighted residuals ε̂i vs the fitted values Ŷi (or vs a covariate xj,i) :

• No structure in the residuals : OK
• Structure in the residuals : change the mean model (f function)

[Figure: two residual scatterplots about zero, one without structure, one with a visible structure]

Page 52: Nonlinear Regression

Check the variance model : homoscedasticity

Scatterplot of the weighted residuals ε̂i vs the fitted values Ŷi (or vs a covariate xj,i) :

• Homoscedasticity : OK
• No structure in the residuals but heteroscedasticity : change the variance model (g function)

[Figure: two residual scatterplots about zero, one with constant spread, one with spread varying with the fitted values]

Page 53: Nonlinear Regression

Example

[Figure: semi-log plot of Concentration (Y) vs Time (t); same data as slide 2]

Time   Conc
  0    112.0
  5     69.1
 10     50.4
 20     22.3
 30     12.8
 60      6.3
 90      4.0
120      3.5
150      2.2
180      1.7
210      1.2
300      0.4

Model : Yi = θ1 exp(-θ2 ti) + εi (homoscedastic model)

Criterion : OLS
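This OLS fit can be reproduced with a short sketch (curve_fit from SciPy minimizes the sum of squares):

    import numpy as np
    from scipy.optimize import curve_fit

    t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300])
    y = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

    def f(t, theta1, theta2):
        return theta1 * np.exp(-theta2 * t)

    theta_hat, cov = curve_fit(f, t, y, p0=[100.0, 0.05])
    print(theta_hat)    # OLS estimates of (theta1, theta2)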

Page 54: Nonlinear Regression

Example

Model : Yi = θ1 exp(-θ2 ti) + εi (homoscedastic model)

[Figure: residuals vs time ti, ranging from -6 to 6; a clear structure appears]

Structure in the residuals : change the mean model.

New model : Yi = θ1 exp(-θ2 ti) + θ3 exp(-θ4 ti) + εi

Page 55: Nonlinear Regression

Example

Model : Yi = θ1 exp(-θ2 ti) + θ3 exp(-θ4 ti) + εi

[Figure: residuals vs time ti, ranging from -3 to 4; no structure, but the spread shrinks with time]

Heteroscedasticity : change the variance model.

New model : Yi = [ θ1 exp(-θ2 ti) + θ3 exp(-θ4 ti) ] (1 + εi)

This needs WLS.
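A sketch of the corresponding WLS fit; one simple scheme (an assumption for illustration, not prescribed by the slides) is to iterate curve_fit, passing the current fitted values through sigma so that the weights are 1/f²:

    import numpy as np
    from scipy.optimize import curve_fit

    t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300])
    y = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

    def f(t, th1, th2, th3, th4):
        return th1 * np.exp(-th2 * t) + th3 * np.exp(-th4 * t)

    # Iteratively reweighted least squares: sigma_i proportional to f_i
    # corresponds to the constant-CV model Y = f * (1 + eps)
    theta = np.array([100.0, 0.1, 10.0, 0.01])
    for _ in range(5):
        sigma = f(t, *theta)
        theta, _ = curve_fit(f, t, y, p0=theta, sigma=sigma)
    print(theta)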

Page 56: Nonlinear Regression

Example

Model : Yi = [ θ1 exp(-θ2 ti) + θ3 exp(-θ4 ti) ] (1 + εi)

[Figure: weighted residuals vs time ti, ranging from -0.15 to 0.1; no structure]

No structure, and the weighted residuals are homoscedastic : OK.

Page 57: Nonlinear Regression

Inspection of numerical results

Correlation matrix of the estimator :

        ( 1            r(θ̂1, θ̂2)   …   r(θ̂1, θ̂p) )
C =     ( r(θ̂2, θ̂1)   1            …   r(θ̂2, θ̂p) )
        ( …            …            …   …          )
        ( r(θ̂p, θ̂1)   r(θ̂p, θ̂2)   …   1          )

Strong correlations between estimators indicate that :
• the model is over-parametrized, or
• the parametrization is not good, or
• the model is not identifiable

Page 58: Nonlinear Regression

The model is over-parametrized

Change the mean and/or variance model (f and/or g).

Example : the appropriate model is

Yi = θ1 exp(-θ2 ti) + εi

and you fitted

Yi = [ θ1 exp(-θ2 ti) + θ3 exp(-θ4 ti) ] (1 + εi)

Perform a test or check the AIC.

Page 59: Nonlinear Regression

The parametrization is not good

Change the parametrization of your model.

Example : you fitted

Yi = θ1 exp(-θ2 ti) + εi

Try instead

Yi = (D / V) exp(-(Cl / V) ti) + εi

Two useful indices : the parametric curvature and the intrinsic curvature.

Page 60: Nonlinear Regression

The model is not identifiable

The model has too many parameters compared to the number of data : there are many solutions to the optimisation.

[Figure: two criterion surfaces over (θ1, θ2) with flat valleys of minima]

Look at the eigenvalues of the correlation matrix : if λmax/λmin is too large and/or λmin is too small, simplify the model.

Page 61: Nonlinear Regression

The Akaike criterion (AIC)

The Akaike criterion allows one to select a model among several models in "competition". It is nothing else but a penalized log-likelihood : it chooses the model which is the most likely. The penalty is chosen so that the criterion is convergent : when the sample size increases, it selects the "true" model.

AIC = n ln(SS) + 2 p

where n = sample size, SS = (weighted or ordinary) sum of squares, and p = number of parameters that have been estimated.

The model with the smaller AIC is the best among the compared models.
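A sketch comparing the mono- and bi-exponential fits of the example data with this definition of AIC:

    import numpy as np
    from scipy.optimize import curve_fit

    t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300])
    y = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

    def mono(t, a, b):
        return a * np.exp(-b * t)

    def bi(t, a, b, c, d):
        return a * np.exp(-b * t) + c * np.exp(-d * t)

    def aic(model, p0):
        theta, _ = curve_fit(model, t, y, p0=p0)
        ss = np.sum((y - model(t, *theta)) ** 2)
        return len(y) * np.log(ss) + 2 * len(theta)   # AIC = n ln(SS) + 2 p

    print(aic(mono, [100.0, 0.05]), aic(bi, [100.0, 0.1, 10.0, 0.01]))  # smaller is better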

Page 62: Nonlinear Regression

Example

Iteration | Loss    | θ1      | θ2     | θ3      | θ4
0         | 0.06979 | 100.665 | 0.0976 | 10.616  | 0.01019
1         | 0.04349 | 101.738 | 0.1048 | 12.4846 | 0.01148
2         | 0.03713 | 101.589 | 0.1037 | 12.1725 | 0.01121
3         | 0.03707 | 101.596 | 0.1035 | 12.1057 | 0.01117
4         | 0.03707 | 101.595 | 0.1035 | 12.1051 | 0.01117
5         | 0.03707 | 101.595 | 0.1035 | 12.1051 | 0.01117

Final value of loss function is 0.037.

Page 63: Nonlinear Regression

Example

Parameter | Estimate | A.S.E.   | Param/ASE
Theta1    | 101.595  | 6.104    | 16.645
Theta2    | 0.104    | 0.006    | 17.366
Theta3    | 12.105   | 0.784    | 15.431
Theta4    | 0.011    | 3.66E-04 | 30.067

Correlation matrix of the estimates :

          θ1       θ2       θ3      θ4
R =  (    1
          0.632    1
          0.006    0.449    1
         -0.003    0.393    0.916   1 )

The strong correlation between θ3 and θ4 (0.916) is essentially intrinsic curvature.

Page 64: Nonlinear Regression

About the ellipsoid

• It is linked to the convexity of the criterion
• It is linked to the variance of the estimator

The convexity of the criterion is linked to the variance of the estimator.

Page 65: Nonlinear Regression

Different degrees of convexity

[Figure: three criterion surfaces; a flat, weakly convex criterion; a convex criterion; and a locally convex criterion, convex in some directions only]

Page 66: Nonlinear Regression

How to measure convexity ?

One parameter : when the second derivative is positive, the criterion is convex at the point where the second derivative is evaluated.

Several parameters : calculate the Hessian matrix, the matrix of partial second derivatives :

H(θ1, θ2) = ( ∂²SS(θ1, θ2)/∂θ1²      ∂²SS(θ1, θ2)/∂θ1∂θ2 )
            ( ∂²SS(θ1, θ2)/∂θ2∂θ1    ∂²SS(θ1, θ2)/∂θ2²   )
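A sketch of a numerical convexity check (the hessian helper below is hypothetical, built on central finite differences): the criterion is convex at a point when all eigenvalues of H are positive there:

    import numpy as np

    def hessian(crit, theta, h=1e-4):
        """Numerical Hessian of a criterion at theta (central differences)."""
        p = len(theta)
        H = np.zeros((p, p))
        for i in range(p):
            for j in range(p):
                ei, ej = np.eye(p)[i] * h, np.eye(p)[j] * h
                H[i, j] = (crit(theta + ei + ej) - crit(theta + ei - ej)
                           - crit(theta - ei + ej) + crit(theta - ei - ej)) / (4 * h * h)
        return H

    crit = lambda th: (th[0] - 1.0) ** 2 + (th[0] * th[1]) ** 2
    print(np.linalg.eigvalsh(hessian(crit, np.array([1.0, 0.5]))))  # all > 0 : locally convex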

Page 67: Nonlinear Regression

How to measure convexity ?

It is possible to find a linear transformation of the parameters such that the Hessian matrix is

H(θ1, θ2) = ( λ1(θ1, θ2)    0          )
            ( 0             λ2(θ1, θ2) )

λ1(θ1, θ2) and λ2(θ1, θ2) are the eigenvalues of the Hessian matrix.

When λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0 for all (θ1, θ2), the criterion is convex.

Page 68: Nonlinear Regression

How to measure convexity ?

When λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0 for some (θ1, θ2) only, the criterion is locally convex. What is the point (θ1, θ2) for which λ1(θ1, θ2) > 0 and λ2(θ1, θ2) > 0 ?

When λ1(θ1, θ2) and λ2(θ1, θ2) are low (but > 0), the criterion is flat.

Page 69: Nonlinear Regression

The variance-covariance matrix

The variance-covariance matrix of the estimator (denoted V)

V = ( Var(θ̂1)        Cov(θ̂1, θ̂2) )
    ( Cov(θ̂1, θ̂2)   Var(θ̂2)     )

is proportional to H(θ1⁰, θ2⁰)⁻¹.

It is possible to find a linear transformation of the parameters such that V is

V = ( 1/λ1(θ1⁰, θ2⁰)    0               )
    ( 0                 1/λ2(θ1⁰, θ2⁰)  )

Page 70: Nonlinear Regression

The variance-covariance matrix

1/λ1(θ1⁰, θ2⁰) and 1/λ2(θ1⁰, θ2⁰) are the eigenvalues of the variance-covariance matrix V.

Page 71: Nonlinear Regression

The correlation matrix

The correlation matrix of the estimator (denoted C) is obtained from

V = ( Var(θ̂1)        Cov(θ̂1, θ̂2) )
    ( Cov(θ̂1, θ̂2)   Var(θ̂2)     )

as

C = ( 1   r )    with   r = Cov(θ̂1, θ̂2) / sqrt( Var(θ̂1) Var(θ̂2) )
    ( r   1 )
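A sketch of that conversion in code, usable on the covariance matrix returned by curve_fit:

    import numpy as np

    def correlation_from_covariance(V):
        """Correlation matrix C obtained from a variance-covariance matrix V."""
        sd = np.sqrt(np.diag(V))        # standard errors of the estimates
        return V / np.outer(sd, sd)     # C[i, j] = V[i, j] / (sd_i * sd_j)

    V = np.array([[4.0, 1.2],
                  [1.2, 0.9]])          # illustrative covariance matrix
    print(correlation_from_covariance(V))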

Page 72: Nonlinear Regression

Geometric interpretation

[Figure: confidence ellipsoid on the isocontours of the criterion in the (θ1, θ2) plane; the axis lengths correspond to 1/λ1(θ1⁰, θ2⁰) ∝ Var(θ̂1) and 1/λ2(θ1⁰, θ2⁰) ∝ Var(θ̂2)]

r = 0 : the axes of the ellipsoid are parallel to the parameter axes.