nonlinear regression

Post on 10-Jan-2016


Nonlinear Regression

Didier Concordet

NATIONAL VETERINARY SCHOOL Toulouse

2

An example

[Figure: semilog plot of concentration (Y) versus time (t); y-axis 0.1 to 1000, x-axis 0 to 400]

Time  Conc
  0   112.0
  5    69.1
 10    50.4
 20    22.3
 30    12.8
 60     6.3
 90     4.0
120     3.5
150     2.2
180     1.7
210     1.2
300     0.4

3

Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?

• For a given model, how to fit the data?

• Is this model relevant?

4

What does nonlinear mean ?

Definition: An operator P is linear if:

• for all objects x, y on which it operates: P(x + y) = P(x) + P(y)

• for all numbers λ and all objects x: P(λx) = λ P(x)

When an operator is not linear, it is nonlinear.

5

Examples

Among the operators below, which ones are nonlinear?

• P(t) = a t

• P(t) = a

• P(t) = a + b t

• P(t) = a t + b t²

• P(a, b) = a t + b t²

• P(A, λ) = A exp(−λ t)

• P(A) = A exp(−0.1 t)

• P(t) = A exp(−λ t)
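The two defining properties (additivity and homogeneity) can be checked numerically. A minimal sketch, in which the sample points, scalars and tolerance are illustrative assumptions, not from the slides:

```python
# Numerical check of the linearity definition: an operator P is linear iff
# P(x + y) = P(x) + P(y) and P(a*x) = a*P(x) for all x, y and scalars a.
# Testing on a handful of points can only disprove linearity, but it is a
# useful sanity check.

def is_linear(P, samples=((1.0, 2.0), (0.5, -3.0)), scalars=(2.0, -1.5), tol=1e-9):
    """Test additivity and homogeneity of P on a few sample points."""
    for x, y in samples:
        if abs(P(x + y) - (P(x) + P(y))) > tol:
            return False
    for x, _ in samples:
        for s in scalars:
            if abs(P(s * x) - s * P(x)) > tol:
                return False
    return True

a, b = 3.0, 0.7
P1 = lambda t: a * t        # linear in t
P2 = lambda t: a + b * t    # affine: fails homogeneity, hence nonlinear
print(is_linear(P1), is_linear(P2))
```

Note that P2, although its graph is a straight line, is not a linear operator: P2(0) = a ≠ 0.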

6

What is a nonlinear kinetics ?

For a given dose D, the concentration at time t is C(t, D).

The kinetics is linear when the operator P: D ↦ C(t, D) is linear.

When P is not linear, the kinetics is nonlinear.

7

What is a nonlinear kinetics ?

Examples:

P(D) = C(t, D) = (D / V) exp(−(Cl / V) t)

P(D) = C(t, D) = D / (V (K + t))

8

What is a nonlinear statistical model ?

A statistical model:

Y = f(x_1, …, x_q, θ_1, …, θ_p) + ε

where Y is the observation (dependent variable), f is a function, x_1, …, x_q are the covariates (independent variables), θ_1, …, θ_p are the parameters, and ε is the error (residual).

9

What is a nonlinear statistical model ?

A statistical model is linear when the operator

P(θ_1, …, θ_p) = f(x_1, …, x_q, θ_1, …, θ_p)

is linear. When P(θ_1, …, θ_p) is not linear, the model is nonlinear.

10

What is a nonlinear statistical model ?

Example:

The model Y = θ_1 + θ_2 t + ε, with Y = concentration and t = time, is linear.

11

Examples

Among the statistical models below, which ones are nonlinear?

Y = θ_1 + θ_2 t + ε

Y = θ_1 + θ_2 t + θ_3 t² + ε

Y = θ_1 exp(−θ_2 t) + ε

Y = θ_1 exp(−θ_2 t) + θ_3 exp(−θ_4 t) + ε

Y = θ_1 x^θ_2 / (θ_3^θ_2 + x^θ_2) + ε

Y = θ_1 exp(−0.1 t) + ε

12

Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?

• For a given model, how to fit the data?

• Is this model relevant?

13

How to fit the data ?

Proceed in three main steps:

• Write a (statistical) model

• Choose a criterion

• Minimize the criterion

14

Write a (statistical) model

• Find a function of covariate(s) to describe the mean variation of the dependent variable (mean model).

• Find a function of covariate(s) to describe the dispersion of the dependent variable about the mean (variance model).

15

[Figure: semilog plot of concentration (Y) versus time (t); y-axis 0.1 to 1000, x-axis 0 to 400]

Example

Y = θ_1 exp(−θ_2 t) + ε

ε is assumed gaussian with a constant variance: homoscedastic model.

16

How to choose the criterion to optimize ?

Homoscedasticity: Ordinary Least Squares (OLS). Under normality, OLS is equivalent to maximum likelihood.

Heteroscedasticity: Weighted Least Squares (WLS) or Extended Least Squares (ELS).

17

Homoscedastic models

The Ordinary Least-Squares criterion

For the model Y_i = f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) + ε_i, define:

SS(θ_1, …, θ_p) = Σ_i [Y_i − f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p)]²

The estimates (θ̂_1, …, θ̂_p) are the values that make SS(θ_1, …, θ_p) minimum.
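As an illustration, the OLS criterion can be minimized directly on the example data from slide 2. A sketch using scipy's Nelder-Mead simplex; the starting point (100, 0.05) is an assumed guess based on the data's scale:

```python
# OLS fit of the mono-exponential model Y = θ1·exp(−θ2·t) + ε to the
# example data, by direct minimization of the SS criterion.
import math
from scipy.optimize import minimize

t = [0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300]
y = [112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4]

def ss(theta):
    """Ordinary least-squares criterion SS(θ1, θ2)."""
    th1, th2 = theta
    return sum((yi - th1 * math.exp(-th2 * ti)) ** 2 for ti, yi in zip(t, y))

res = minimize(ss, x0=[100.0, 0.05], method="Nelder-Mead")
theta_hat = res.x
print(theta_hat, res.fun)
```

The same pattern works for any mean model f: only the body of `ss` changes.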

18

Heteroscedastic models: the Weighted Least-Squares criterion

For the model Y_i = f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) + ε_i, define:

WSS(θ_1, …, θ_p) = Σ_i w_i [Y_i − f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p)]²

where w_i is the weight of observation i. The estimates are the values that make WSS(θ_1, …, θ_p) minimum.

19

How to choose the weights ?

When the model Y_i = f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) + ε_i is heteroscedastic (i.e. Var(ε_i) is not constant with i), it is possible to rewrite it as

Y_i = f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) + g(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) ε_i

where Var(ε_i) does not depend on i. The weights are then chosen as

w_i = 1 / g(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p)²

20

Example

Y_i = θ_1 exp(−θ_2 t_i) + ε_i, with CV(Y_i) = cste, i.e. Var(ε_i) = σ² θ_1² exp(−2 θ_2 t_i).

The model can be rewritten as

Y_i = θ_1 exp(−θ_2 t_i) + θ_1 exp(−θ_2 t_i) ε_i, with Var(ε_i) = cste.

The weights are chosen as w_i = 1 / [θ_1 exp(−θ_2 t_i)]².
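Since the weights depend on θ, one simple way to use them is an iteratively reweighted scheme: evaluate w_i at the current estimate, minimize the resulting WSS, and repeat. A sketch on the example data; the starting values and number of passes are assumptions:

```python
# Iteratively reweighted WLS for Y = θ1·exp(−θ2·t)(1 + error), with
# weights w_i = 1/f(t_i; θ)² re-evaluated at the current estimate.
import math
from scipy.optimize import minimize

t = [0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300]
y = [112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4]

def model(th, ti):
    return th[0] * math.exp(-th[1] * ti)

theta = [100.0, 0.05]                    # starting point (assumed)
for _ in range(5):                       # a few reweighting passes
    w = [1.0 / model(theta, ti) ** 2 for ti in t]   # w_i = 1/f(t_i; θ)²
    wss = lambda th: sum(wi * (yi - model(th, ti)) ** 2
                         for wi, ti, yi in zip(w, t, y))
    theta = minimize(wss, theta, method="Nelder-Mead").x
print(theta)
```

Because the weights favor the small late concentrations, the WLS fit trades absolute accuracy at early times for relative accuracy everywhere.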

21

Extended (Weighted) Least Squares

For the model Y_i = f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) + ε_i, define:

EWSS(θ_1, …, θ_p) = Σ_i w_i [Y_i − f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p)]² − Σ_i ln w_i

where w_i is the weight of observation i. The estimates are the values that make EWSS(θ_1, …, θ_p) minimum.
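With the variance model g = f of the running example, the weights depend on θ and EWSS can be minimized over θ directly. A sketch; the starting point is an assumption:

```python
# ELS sketch: weights w_i = 1/f(t_i; θ)² depend on θ, so
# EWSS(θ) = Σ w_i (Y_i − f_i)² − Σ ln w_i is minimized over θ as a whole.
import math
from scipy.optimize import minimize

t = [0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300]
y = [112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4]

def ewss(th):
    th1, th2 = th
    total = 0.0
    for ti, yi in zip(t, y):
        f = th1 * math.exp(-th2 * ti)
        w = 1.0 / f ** 2                 # weight from the variance model
        total += w * (yi - f) ** 2 - math.log(w)
    return total

res = minimize(ewss, x0=[100.0, 0.02], method="Nelder-Mead")
print(res.x, res.fun)
```

The extra −Σ ln w_i term is what penalizes the criterion when the variance model itself is made to fit too aggressively; this is also why ELS is not robust to variance misspecification.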

22

Balance sheet

Criterion | When                   | Advantages                                                     | Drawbacks
OLS       | Homoscedastic models   | Easy to use                                                    |
WLS       | Heteroscedastic models | Robust to variance misspecification                            | Estimator with large variance
ELS       | Heteroscedastic models | Unbiased estimate, small variance (efficiency under normality) | Not robust to variance misspecification

23

The criterion properties

• It converges

• It leads to consistent (unbiased) estimates

• It leads to efficient estimates

• It has several minima

24

It converges

When the sample size increases, the criterion concentrates about a value of the parameter.

Example: Consider the homoscedastic model Y_i = θ_1 exp(−θ_2 t_i) + ε_i.

The criterion to use is the Least Squares criterion:

SS(θ_1, θ_2) = Σ_{i=1}^{n} [Y_i − θ_1 exp(−θ_2 t_i)]²

25

It converges

[Figure: isocontours of SS(θ_1, θ_2) in the (θ_1, θ_2) plane for a small and a large sample size; the contours concentrate as the sample size increases]

26

It leads to consistent estimates

[Figure: isocontours of SS(θ_1, θ_2) concentrating about the true value (θ_1⁰, θ_2⁰)]

The criterion concentrates about the true value.

27

It leads to efficient estimates

For a fixed n, the variance of a consistent estimator is always greater than a limit (the Cramér-Rao lower bound): the "precision" of a consistent estimator is bounded.

An estimator is efficient when its variance equals this lower bound.

28

Geometric interpretation

[Figure: isocontours of the criterion in the (θ_1, θ_2) plane]

This ellipsoid is a confidence region of the parameter.

29

It leads to efficient estimates

[Figure: isocontours of a generic criterion and of −2 ln(likelihood) about (θ_1⁰, θ_2⁰)]

For a given large n, there is no criterion giving consistent estimates more "convex" than −2 ln(likelihood).

30

It has several minima

[Figure: a criterion surface over the (θ_1, θ_2) plane with several local minima]

31

Minimize the criterion

Suppose that the criterion to optimize has been chosen. We are looking for the value of (θ_1, θ_2), denoted (θ̂_1, θ̂_2), which achieves the minimum of the criterion.

We need an algorithm to minimize such a criterion.

32

Example

Consider the homoscedastic model Y_i = θ_1 exp(−θ_2 t_i) + ε_i.

We are looking for the value of (θ_1, θ_2), denoted (θ̂_1, θ̂_2), which achieves the minimum of the criterion

SS(θ_1, θ_2) = Σ_i [Y_i − θ_1 exp(−θ_2 t_i)]²

33

Isocontours

[Figure: isocontours of SS(θ_1, θ_2) in the (θ_1, θ_2) plane]

34

Different families of algorithms

• Zero order algorithms: computation of the criterion

• First order algorithms: computation of the first derivative of the criterion

• Second order algorithms: computation of the second derivative of the criterion

35

Zero order algorithms

• Simplex algorithm

• Grid search and Monte-Carlo methods
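A minimal Monte-Carlo (random search) sketch: draw candidate (θ_1, θ_2) values uniformly in a box and keep the best SS value. The search box and the number of draws are assumptions for illustration:

```python
# Zero-order Monte-Carlo search: no derivatives, only criterion evaluations.
import math, random

t = [0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300]
y = [112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4]

def ss(th1, th2):
    return sum((yi - th1 * math.exp(-th2 * ti)) ** 2 for ti, yi in zip(t, y))

random.seed(0)
best = (float("inf"), None)
for _ in range(5000):
    cand = (random.uniform(50, 150), random.uniform(0.01, 0.2))
    val = ss(*cand)
    if val < best[0]:
        best = (val, cand)
print(best)
```

This is slow and imprecise, which is why such methods are best used only to find a starting point for faster, derivative-based algorithms.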

36

Simplex algorithm

[Figure: successive simplexes moving across the isocontours in the (θ_1, θ_2) plane]

37

Monte-Carlo algorithm

[Figure: random candidate points scattered over the isocontours in the (θ_1, θ_2) plane]

38

First order algorithms

• Line search algorithm

• Conjugate gradient

39

First order algorithms

The derivatives of the criterion cancel at its optima. Suppose that there is only one parameter θ to estimate: the criterion (e.g. SS) then depends only on θ.

How to find the value(s) of θ where the derivative of the criterion cancels ?
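In one dimension this can be done by bracketing the zero of the derivative and bisecting. A sketch on the example data, with θ_1 fixed at 100 (an assumption) and a finite-difference derivative:

```python
# Find θ2 where dSS/dθ2 = 0 by bisection on the derivative's sign.
import math

t = [0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300]
y = [112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4]

def ss(th2, th1=100.0):
    return sum((yi - th1 * math.exp(-th2 * ti)) ** 2 for ti, yi in zip(t, y))

def dss(th2, h=1e-6):
    """Central finite-difference derivative of SS with respect to θ2."""
    return (ss(th2 + h) - ss(th2 - h)) / (2 * h)

lo, hi = 0.01, 0.5           # bracket: dSS changes sign in this interval
assert dss(lo) < 0 < dss(hi)
for _ in range(60):          # bisection on the derivative
    mid = 0.5 * (lo + hi)
    if dss(mid) < 0:
        lo = mid
    else:
        hi = mid
print(lo)                    # θ2 at which dSS/dθ2 ≈ 0
```

This finds a stationary point inside the bracket; whether it is the global minimum still depends on the shape of the criterion, as the slides on convexity discuss.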

40

Line search algorithm

[Figure: derivative of the criterion as a function of θ; successive line-search iterates 1, 2 bracket the zero of the derivative]

41

Second order algorithms

• Gauss-Newton

• Marquardt

42

Second order algorithms

The derivatives of the criterion cancel at its optima. When the criterion is (locally) convex, there is a path to reach the minimum: the steepest-descent direction.

43

Gauss Newton (one dimension)

[Figure: derivative of the criterion as a function of θ; successive Gauss-Newton iterates 1, 2, 3 converge to the zero of the derivative]

The criterion is convex.

44

Gauss Newton (one dimension)

[Figure: derivative of the criterion as a function of θ; Gauss-Newton iterates 1, 2 diverge]

The criterion is not convex.

45

Gauss Newton

[Figure: Gauss-Newton iterates on the isocontours in the (θ_1, θ_2) plane]

46

Marquardt

[Figure: derivative of the criterion as a function of θ; iterates 1, 2, 3]

Allows dealing with the case where the criterion is not convex: when the second derivative is < 0 (the first derivative decreases), it is set to a positive value.
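A compact sketch of the Marquardt idea for the mono-exponential model: take the step from the damped normal equations (JᵀJ + λI)δ = −Jᵀr, and increase the damping λ whenever a step fails to decrease SS. The starting point and damping schedule are assumptions:

```python
# Levenberg-Marquardt-style damping of the Gauss-Newton step for
# residuals r(θ) = y − θ1·exp(−θ2·t).
import numpy as np

t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300.0])
y = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

def resid(th):
    return y - th[0] * np.exp(-th[1] * t)

def jac(th):
    e = np.exp(-th[1] * t)
    # ∂r/∂θ1 = −e,  ∂r/∂θ2 = θ1·t·e
    return np.column_stack([-e, th[0] * t * e])

th = np.array([100.0, 0.05])
lam = 1e-3
for _ in range(50):
    r, J = resid(th), jac(th)
    A = J.T @ J + lam * np.eye(2)          # damped normal equations
    step = np.linalg.solve(A, -J.T @ r)
    if np.sum(resid(th + step) ** 2) < np.sum(r ** 2):
        th, lam = th + step, lam * 0.5     # accept step, relax damping
    else:
        lam *= 10.0                        # reject step, increase damping
print(th)
```

With small λ the step is close to Gauss-Newton (fast near the optimum); with large λ it shrinks toward a short gradient step (robust far from it), which matches the slide's balance-sheet entry.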

47

Balance sheet

Order | Algorithm          | When                                               | Robustness | Speed
0     | Monte-Carlo        | To start the optimisation                          | +++        | 0
0     | Simplex            | To start the optimisation                          | ++         | +
1     | Conjugate gradient | When the second derivative is difficult to compute | +          | ++
1     | Line search        | When the second derivative is difficult to compute | ++         | +
2     | Gauss-Newton       | To finish the optimisation                         | 0          | +++
2     | Marquardt          | With a reasonable starting point                   | +          | ++

48

Questions

• What does nonlinear mean?
  – What is a nonlinear kinetics?
  – What is a nonlinear statistical model?

• For a given model, how to fit the data?

• Is this model relevant?

49

Is this model relevant ?

• Graphical inspection of the residuals
  – mean model (f)
  – variance model (g)

• Inspection of numerical results
  – variance-correlation matrix of the estimator
  – Akaike index (AIC)

50

Graphical inspection of the residuals

For the model

Y_i = f(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) + g(x_{1,i}, …, x_{q,i}, θ_1, …, θ_p) ε_i

calculate the weighted residuals:

ε̂_i = [Y_i − f(x_{1,i}, …, x_{q,i}, θ̂_1, …, θ̂_p)] / g(x_{1,i}, …, x_{q,i}, θ̂_1, …, θ̂_p)

and draw ε̂_i vs Ŷ_i = f(x_{1,i}, …, x_{q,i}, θ̂_1, …, θ̂_p).
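For the running example with the proportional-error variance model g = f, the weighted residuals can be computed directly from the parameter estimates reported later in the transcript's output table (slide 63):

```python
# Weighted residuals ε̂_i = (Y_i − f(t_i; θ̂)) / g(t_i; θ̂) with g = f,
# using the bi-exponential fit θ̂ = (101.595, 0.104, 12.105, 0.011).
import math

t = [0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300]
y = [112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4]
th = (101.595, 0.104, 12.105, 0.011)   # estimates from the transcript

def f(ti):
    return th[0] * math.exp(-th[1] * ti) + th[2] * math.exp(-th[3] * ti)

eps_hat = [(yi - f(ti)) / f(ti) for ti, yi in zip(t, y)]   # g = f here
print([round(e, 3) for e in eps_hat])
```

Plotting these values against the fitted values (or time) is what the diagnostic slides that follow do by eye: one looks for structure and for non-constant spread.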

51

Check the mean model

Scatterplot of weighted residuals ε̂_i vs fitted values Ŷ_i:

[Figure: two panels of ε̂_i vs Ŷ_i. Left: no structure in the residuals, OK. Right: structure in the residuals, change the mean model (f function)]

52

Check the variance model : homoscedasticity

Scatterplot of weighted residuals ε̂_i vs fitted values Ŷ_i:

[Figure: two panels of ε̂_i vs Ŷ_i. Left: homoscedasticity, OK. Right: no structure in the residuals but heteroscedasticity, change the variance model (g function)]

53

[Figure: semilog plot of concentration (Y) versus time (t); y-axis 0.1 to 1000, x-axis 0 to 400]

Example

Time  Conc
  0   112.0
  5    69.1
 10    50.4
 20    22.3
 30    12.8
 60     6.3
 90     4.0
120     3.5
150     2.2
180     1.7
210     1.2
300     0.4

Homoscedastic model: Y_i = θ_1 exp(−θ_2 t_i) + ε_i

Criterion: OLS

54

Example

[Figure: residuals ε̂_i vs t_i over 0 to 400; y-axis from −6 to 6]

Homoscedastic model: Y_i = θ_1 exp(−θ_2 t_i) + ε_i

Structure in the residuals: change the mean model.

New model: Y_i = θ_1 exp(−θ_2 t_i) + θ_3 exp(−θ_4 t_i) + ε_i

55

Example

[Figure: residuals ε̂_i vs t_i over 0 to 400; y-axis from −3 to 4]

Old model: Y_i = θ_1 exp(−θ_2 t_i) + θ_3 exp(−θ_4 t_i) + ε_i

Heteroscedasticity: change the variance model.

New model: Y_i = [θ_1 exp(−θ_2 t_i) + θ_3 exp(−θ_4 t_i)] (1 + ε_i)

Need WLS.

56

Example

Y_i = [θ_1 exp(−θ_2 t_i) + θ_3 exp(−θ_4 t_i)] (1 + ε_i)

[Figure: weighted residuals ε̂_i vs t_i over 0 to 400; y-axis from −0.15 to 0.1]

No structure, weighted residuals homoscedastic: OK.

57

Inspection of numerical results

Correlation matrix of the estimator:

C = [ 1              r(θ̂_1, θ̂_2)  …  r(θ̂_1, θ̂_p)
      r(θ̂_2, θ̂_1)  1              …  r(θ̂_2, θ̂_p)
      …
      r(θ̂_p, θ̂_1)  r(θ̂_p, θ̂_2)  …  1              ]

Strong correlations between estimators mean that:

• the model is over-parametrized

• the parametrization is not good

• the model is not identifiable

58

The model is over-parametrized

Change the mean and/or variance model (f and/or g).

Example: the appropriate model is Y_i = θ_1 exp(−θ_2 t_i) + ε_i and you fitted Y_i = [θ_1 exp(−θ_2 t_i) + θ_3 exp(−θ_4 t_i)] (1 + ε_i).

Perform a test or check the AIC.

59

The parametrization is not good

Change the parametrization of your model.

Example: you fitted Y_i = θ_1 exp(−θ_2 t_i) + ε_i; try Y_i = (D / V) exp(−(Cl / V) t_i) + ε_i.

Two useful indices: the parametric curvature and the intrinsic curvature.

60

The model is not identifiable

The model has too many parameters compared to the number of data: there are lots of solutions to the optimisation.

[Figure: two criterion surfaces over the (θ_1, θ_2) plane with flat valleys of equivalent minimizers]

Look at the eigenvalues of the correlation matrix: if λ_max / λ_min is too large and/or λ_min is too small, simplify the model.

61

The Akaike index

The Akaike index (AIC) allows one to select a model among several models in "competition".

The AIC is nothing else but a penalized log-likelihood: it chooses the model which is the most likely. The penalty is chosen such that the index is convergent: when the sample size increases, the index selects the "true" model.

AIC = n ln(SS) + 2 p, where n = sample size, SS = (weighted or ordinary) sum of squares, and p = number of parameters that have been estimated.

The model with the smaller AIC is the best among the compared models.
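Applied to the example data, the AIC comparison of the mono- and bi-exponential mean models looks like this (OLS fits; the starting points are assumed guesses):

```python
# AIC = n·ln(SS) + 2p for two competing mean models on the example data.
import math
from scipy.optimize import minimize

t = [0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300]
y = [112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4]

def ss_mono(th):
    return sum((yi - th[0] * math.exp(-th[1] * ti)) ** 2
               for ti, yi in zip(t, y))

def ss_bi(th):
    return sum((yi - th[0] * math.exp(-th[1] * ti)
                   - th[2] * math.exp(-th[3] * ti)) ** 2
               for ti, yi in zip(t, y))

n = len(y)
fit1 = minimize(ss_mono, [100.0, 0.05], method="Nelder-Mead")
fit2 = minimize(ss_bi, [100.0, 0.1, 10.0, 0.01], method="Nelder-Mead")
aic1 = n * math.log(fit1.fun) + 2 * 2    # p = 2 parameters
aic2 = n * math.log(fit2.fun) + 2 * 4    # p = 4 parameters
print(aic1, aic2)
```

The bi-exponential model pays a larger penalty (2p = 8 vs 4) but reduces SS enough to win, consistent with the residual diagnostics on the earlier slides.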

62

Example

Iteration  Loss     θ_1      θ_2     θ_3      θ_4
0          0.06979  100.665  0.0976  10.6160  0.01019
1          0.04349  101.738  0.1048  12.4846  0.01148
2          0.03713  101.589  0.1037  12.1725  0.01121
3          0.03707  101.596  0.1035  12.1057  0.01117
4          0.03707  101.595  0.1035  12.1051  0.01117
5          0.03707  101.595  0.1035  12.1051  0.01117

Final value of loss function is 0.037.

63

Example

Parameter  Estimate  A.S.E.    Param/ASE
Theta1     101.595   6.104     16.645
Theta2     0.104     0.006     17.366
Theta3     12.105    0.784     15.431
Theta4     0.011     3.66E-04  30.067

        1      2      3      4
R = [   1
    0.632      1
    0.006  0.449      1
   -0.003  0.393  0.916      1 ]

essentially intrinsic curvature

64

About the ellipsoid

It is linked to the convexity of the criterion. It is linked to the variance of the estimator.

The convexity of the criterion is linked to the variance of the estimator

65

Different degrees of convexity

[Figure: criterion surfaces illustrating a flat, weakly convex criterion; a convex criterion; and a locally convex criterion (convex in some directions only)]

66

How to measure convexity ?

One parameter: when the second derivative is positive, the criterion is convex at the point where the second derivative is evaluated.

Several parameters: calculate the hessian matrix (the matrix of partial second derivatives):

H(θ_1, θ_2) = [ ∂²SS(θ_1, θ_2)/∂θ_1²      ∂²SS(θ_1, θ_2)/∂θ_1∂θ_2
                ∂²SS(θ_1, θ_2)/∂θ_2∂θ_1   ∂²SS(θ_1, θ_2)/∂θ_2²     ]
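A finite-difference sketch of this check for SS(θ_1, θ_2) on the example data: build the 2×2 hessian numerically and inspect its eigenvalues. The evaluation point, near the OLS optimum, is an assumption:

```python
# Finite-difference hessian of SS(θ1, θ2) and its eigenvalues:
# all eigenvalues > 0 at a point means the criterion is convex there.
import numpy as np

t = np.array([0, 5, 10, 20, 30, 60, 90, 120, 150, 180, 210, 300.0])
y = np.array([112.0, 69.1, 50.4, 22.3, 12.8, 6.3, 4.0, 3.5, 2.2, 1.7, 1.2, 0.4])

def ss(th):
    return float(np.sum((y - th[0] * np.exp(-th[1] * t)) ** 2))

def hessian(th, h=1e-4):
    """Central-difference approximation of the 2x2 hessian of ss at th."""
    H = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i] * h, np.eye(2)[j] * h
            H[i, j] = (ss(th + ei + ej) - ss(th + ei - ej)
                       - ss(th - ei + ej) + ss(th - ei - ej)) / (4 * h * h)
    return H

H = hessian(np.array([110.0, 0.08]))   # point near the OLS optimum (assumed)
lam = np.linalg.eigvalsh(H)            # eigenvalues in ascending order
print(lam)
```

Note the very different magnitudes of the two eigenvalues: the criterion is much flatter along θ_1 than along θ_2, which is exactly the ellipsoid-shape information the next slides exploit.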

67

How to measure convexity ?

It is possible to find a linear transformation of the parameters such that the hessian matrix is

H = [ λ_1(θ_1, θ_2)   0
      0               λ_2(θ_1, θ_2) ]

λ_1(θ_1, θ_2) and λ_2(θ_1, θ_2) are the eigenvalues of the hessian matrix.

When λ_1(θ_1, θ_2) > 0 and λ_2(θ_1, θ_2) > 0 for all (θ_1, θ_2), the criterion is convex.

68

How to measure convexity ?

When λ_1(θ_1, θ_2) > 0 and λ_2(θ_1, θ_2) > 0 only for some (θ_1, θ_2), the criterion is locally convex.

What is the set of points (θ_1, θ_2) for which λ_1(θ_1, θ_2) > 0 and λ_2(θ_1, θ_2) > 0 ?

When λ_1(θ_1, θ_2) and λ_2(θ_1, θ_2) are low (but > 0), the criterion is flat.

69

The variance-covariance matrix

The variance-covariance matrix of the estimator (denoted V)

V = [ Var θ̂_1          Cov(θ̂_1, θ̂_2)
      Cov(θ̂_1, θ̂_2)   Var θ̂_2        ]

is proportional to H(θ_1⁰, θ_2⁰)⁻¹.

It is possible to find a linear transformation of the parameters such that V is

V = [ 1/λ_1(θ_1⁰, θ_2⁰)   0
      0                   1/λ_2(θ_1⁰, θ_2⁰) ]

70

The variance-covariance matrix

1/λ_1(θ_1⁰, θ_2⁰) and 1/λ_2(θ_1⁰, θ_2⁰) are the eigenvalues of the variance-covariance matrix V.

71

The correlation matrix

The correlation matrix of the estimator (denoted C) is obtained from

V = [ Var θ̂_1          Cov(θ̂_1, θ̂_2)
      Cov(θ̂_1, θ̂_2)   Var θ̂_2        ]

as

C = [ 1   r
      r   1 ]   with   r = Cov(θ̂_1, θ̂_2) / sqrt(Var θ̂_1 · Var θ̂_2)
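The computation of C from V is a one-liner; the numeric V below is a hypothetical illustration, not taken from the transcript:

```python
# Correlation matrix C from a variance-covariance matrix V:
# C[i, j] = V[i, j] / (sigma_i * sigma_j), with sigma_i = sqrt(V[i, i]).
import numpy as np

V = np.array([[37.3,  0.024],
              [0.024, 3.6e-5]])   # hypothetical Var/Cov of (theta1_hat, theta2_hat)
d = np.sqrt(np.diag(V))           # standard errors sigma_i
C = V / np.outer(d, d)
print(C)
```

The diagonal of C is 1 by construction, and the off-diagonal entry is the correlation r whose magnitude the earlier slides use to flag over-parametrization.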

72

Geometric interpretation

[Figure: criterion surface over the (θ_1, θ_2) plane and the associated confidence ellipsoid; the half-axes of the ellipsoid are proportional to sqrt(1/λ_1(θ_1⁰, θ_2⁰)) and sqrt(1/λ_2(θ_1⁰, θ_2⁰)), i.e. to sqrt(Var θ̂_1) and sqrt(Var θ̂_2)]

When r = 0, the axes of the ellipsoid are parallel to the parameter axes.
