An Introduction to Splines Trinity River Restoration Program Workshop on Outmigration: Population Estimation October 6–8, 2009


An Introduction to Splines - Simon Fraser University, people.stat.sfu.ca/~cschwarz/Consulting/Trinity/Phase2/Trinity...

An Introduction to Splines

Trinity River Restoration Program
Workshop on Outmigration: Population Estimation

October 6–8, 2009

An Introduction to Splines

1 Linear Regression
  - Simple Regression and the Least Squares Method
  - Least Squares Fitting in R
  - Polynomial Regression

2 Smoothing Splines
  - Simple Splines
  - B-splines
  - Overfitting and Smoothness

1 Linear Regression: Simple Regression and the Least Squares Method

Simple Linear Regression

Daily temperatures in Montreal from April 1 (Day 91) to June 30 (Day 181), 1961.

[Figure: scatter plot titled "Montreal Temp. -- April 1 to June 30, 1961"; x-axis: Day of Year (100 to 180), y-axis: Temperature (0 to 20).]

Simple Linear Regression
The Model

Assumptions

Mean: On average, the change in the response is proportional to the change in the predictor.

Errors:
1. The deviation in the response for any observation does not depend on any other observation.
2. The average magnitude of the deviation is the same for all values of the predictor.

Mathematically

For $i = 1, \ldots, n$:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where $\varepsilon_1, \ldots, \varepsilon_n$ are independent with mean 0 and variance $\sigma^2$.
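The simple linear regression model can be illustrated by simulating from it. The sketch below is only an illustration: the parameter values, the sample size, and the use of normally distributed errors are all assumptions (the model itself only requires independent errors with mean 0 and constant variance).

```r
# Simulate n observations from y_i = beta0 + beta1 * x_i + eps_i.
# All numeric values below are illustrative assumptions, not estimates
# from the Montreal data.
set.seed(1)
n     <- 92
x     <- seq(90, 181, length.out = n)    # day of year
beta0 <- -15                             # intercept
beta1 <- 0.2                             # slope
sigma <- 3                               # error standard deviation

# Independent errors with mean 0 and constant variance sigma^2
# (normality is an extra assumption made only for the simulation).
eps <- rnorm(n, mean = 0, sd = sigma)

y   <- beta0 + beta1 * x + eps
sim <- data.frame(x = x, y = y)
```

Plotting `sim$y` against `sim$x` gives a scatter of points around a straight line, with roughly the same spread at every value of `x`.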

The Least Squares Method
Example: The Montreal Data

[Figure: two panels, each a scatter plot titled "Montreal Temp. -- April 1 to June 30, 1961"; x-axis: Day of Year (100 to 180), y-axis: Temperature (0 to 20).]

The Least Squares Method
The Residuals

Definition: Given values for $\beta_0$ and $\beta_1$, the residual for the $i$th observation is the difference between the observed and the predicted response:

$$e_i = y_i - \hat{y}_i$$

where $\hat{y}_i = \beta_0 + \beta_1 x_i$.

The Least Squares Method
The Least Squares Criterion

The least squares method defines the best values of $\beta_0$ and $\beta_1$ to be those that minimize the sum of the squared residuals:

$$SS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
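The criterion can be checked numerically. The sketch below uses simulated stand-in data (an assumption, since the Montreal data file is not reproduced here) and the standard closed-form least squares estimates for simple regression, $\hat\beta_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$. The helper name `ss` is hypothetical.

```r
# Sum of squared residuals for candidate values of beta0 and beta1.
ss <- function(beta0, beta1, x, y) {
  yhat <- beta0 + beta1 * x    # predicted responses
  sum((y - yhat)^2)            # sum of squared residuals
}

# Simulated stand-in data (assumed values, not the Montreal data).
set.seed(1)
x <- 90:181
y <- -15 + 0.2 * x + rnorm(length(x), sd = 3)

# Closed-form least squares estimates for simple regression.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

ss_ls    <- ss(b0, b1, x, y)       # SS at the least squares solution
ss_other <- ss(b0 + 1, b1, x, y)   # SS for a line shifted up by 1

# The estimates agree with those returned by lm().
lm_coef <- unname(coef(lm(y ~ x)))
```

Any other line, such as the shifted one, gives a larger value of SS than the least squares line.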

The Least Squares Method
Example: The Montreal Data

[Figure: two panels, each a scatter plot titled "Montreal Temp. -- April 1 to June 30, 1961"; x-axis: Day of Year (100 to 180), y-axis: Temperature (0 to 20). The panels are labelled SS=1549.37 and SS=1148.56.]

1 Linear Regression: Least Squares Fitting in R

Least Squares Fitting in R
The Data

Suppose that the data is a data frame with elements:

- x: the days from 90 to 181
- y: the observed temperatures

    > data = read.table("MontrealTemp1.txt")
    > summary(data)
           x               y
     Min.   : 90.0   Min.   :-0.90
     1st Qu.:112.8   1st Qu.: 5.60
     Median :135.5   Median :11.55
     Mean   :135.5   Mean   :11.46
     3rd Qu.:158.2   3rd Qu.:16.70
     Max.   :181.0   Max.   :23.60
    >

Least Squares Fitting in R
Fitting the Model

Fitting the model with lm:

    > lm(y~x,data)

    Call:
    lm(formula = y ~ x, data = data)

    Coefficients:
    (Intercept)            x
       -15.3996       0.1982
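As a worked example of reading the output, the printed coefficients can be plugged back into the fitted line. The function name `predict_temp` is a hypothetical helper introduced here; only the two coefficients come from the slide.

```r
# Coefficients as printed by lm() on the slide.
b0 <- -15.3996   # intercept
b1 <- 0.1982     # slope: change in temperature per day

# Predicted temperature for a given day of the year.
predict_temp <- function(day) b0 + b1 * day

pred <- predict_temp(c(100, 150))   # about 4.42 and 14.33
```

The slope says that, over this period, the fitted temperature rises by roughly 0.2 per day, or about 2 over ten days.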

Least Squares Fitting in R
Fitting the Model

Fitting the model with lm:

    > lmfit = lm(y~x,data)
    > attributes(lmfit)
    $names
     [1] "coefficients"  "residuals"
     [3] "effects"       "rank"
     [5] "fitted.values" "assign"
     [7] "qr"            "df.residual"
     [9] "xlevels"       "call"
    [11] "terms"         "model"

    $class
    [1] "lm"

    >

Least Squares Fitting in R
The Fitted Line

Plotting the fitted line over the raw data:

    # Plot the raw data
    > plot(data$x,data$y,main="Montreal Temp. ...",xlab="Day of Year",ylab="Temperature")

    # Add the fitted line (fitted.values written out in full rather than
    # relying on partial matching of $fit)
    > lines(data$x,lmfit$fitted.values,col="red",lwd=3)

Least Squares Fitting in R
The Fitted Line

[Figure: scatter plot titled "Montreal Temp. -- April 1 to June 30, 1961" with the fitted line; x-axis: Day of Year (100 to 180), y-axis: Temperature (0 to 20).]

The Least Squares Method
Goodness-of-Fit Testing

Residual Diagnostics

The value of the residuals should not depend on x or y in any systematic way.

- Common indications of lack of fit:
  - trends with x or y (curves or clusters of high/low values)
  - constant increase/decrease in spread (funnel shape)
  - increase followed by decrease in spread (football shape)
  - very large (+ or -) values (outliers)
- Assessed by plotting e versus x and y.

Least Squares Fitting in R
Residual Plots

Plotting the residuals versus the predictor and response:

    ## Plot the residuals versus day
    > plot(data$x,lmfit$residuals,xlab="Day of Year",ylab="Residual")
    > abline(h=0)

    ## Plot the residuals versus temperature
    > plot(data$y,lmfit$residuals,xlab="Temperature",ylab="Residual")
    > abline(h=0)

Least Squares Fitting in R
The Fitted Line

[Figure: two panels. Left, "Residuals vs. Day": Residual (-10 to 10) against Day of Year (100 to 180). Right, "Residuals vs. Temperature": Residual (-10 to 10) against Temperature (0 to 20).]

Exercises

1. Montreal Temperature Data – April 1 to June 30, 1961
   File: Intro to splines\Exercises\montreal temp 1.R
   Use the provided code to fit the simple linear regression model to the Montreal temperature data from the spring of 1961, plot the fitted line, and produce the residual plots.

2. Montreal Temperature Data – Jan. 1 to Dec. 31, 1961
   File: Intro to splines\Exercises\montreal temp 2.R
   Repeat exercise 1 with the data from all of 1961.

1 Linear Regression: Polynomial Regression

Polynomial Regression
Motivation

[Figure: scatter plot titled "Montreal Temp. -- January 1 to December 31, 1961"; x-axis: Day of Year (0 to 300), y-axis: Temperature (-20 to 20).]

Polynomial Regression
Motivation

[Figure: two panels. Left, "Residuals vs. Day": Residual (-20 to 20) against Day of Year (0 to 300). Right, "Residuals vs. Temperature": Residual (-20 to 20) against Temperature (-20 to 20).]

Polynomial Regression
Polynomials

Definition: A polynomial of degree $D$ is a function formed by linear combinations of the powers of its argument up to $D$:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_D x^D$$

Specific Polynomials

Linear: $y = \beta_0 + \beta_1 x$

Quadratic: $y = \beta_0 + \beta_1 x + \beta_2 x^2$

Cubic: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

Quartic: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4$

Quintic: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \beta_5 x^5$

Polynomial Regression
The Design Matrix

Definition: The design matrix for a regression model with $n$ observations and $p$ predictors is the matrix with $n$ rows and $p$ columns such that the value of the $j$th predictor for the $i$th observation is located in column $j$ of row $i$.

Design matrix for a polynomial of degree $D$

$$
\begin{array}{c}
1 \\ 2 \\ 3 \\ \vdots \\ n
\end{array}
\begin{pmatrix}
1 & x_1 & x_1^2 & x_1^3 & \cdots & x_1^D \\
1 & x_2 & x_2^2 & x_2^3 & \cdots & x_2^D \\
1 & x_3 & x_3^2 & x_3^3 & \cdots & x_3^D \\
\vdots & & & & & \vdots \\
1 & x_n & x_n^2 & x_n^3 & \cdots & x_n^D
\end{pmatrix}
$$

Polynomial Regression in R
Constructing the Design Matrix – Quadratic

The design matrix for polynomial regression can be generated with the function outer():

    > D = 2
    > X = outer(data$x,1:D,"^")
    > X[1:5,]
         [,1] [,2]
    [1,]    1    1
    [2,]    2    4
    [3,]    3    9
    [4,]    4   16
    [5,]    5   25
    >

Note: we do not need to include the intercept column, because lm adds an intercept by default.
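As a cross-check on the construction with `outer()`, the same raw powers can be obtained from `poly()` with `raw = TRUE`. This is offered as an alternative sketch, not as the method used in the workshop code; the small vector `x` is an illustrative assumption standing in for `data$x`.

```r
x <- 1:5    # stand-in for data$x (illustrative values)
D <- 2

X_outer <- outer(x, 1:D, "^")                  # columns x^1, ..., x^D
X_poly  <- poly(x, degree = D, raw = TRUE)     # the same raw powers

# Compare the numeric entries only; poly() attaches extra attributes.
same <- all.equal(X_outer, unclass(X_poly), check.attributes = FALSE)
```

With `raw = FALSE` (the default), `poly()` returns orthogonal polynomials instead, which span the same space but have different columns.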

Polynomial Regression in R
Least Squares Fitting – Quadratic

    > lmfit = lm(y~X,data)
    > attributes(lmfit)
    $names
    [1] "coefficients" "residuals" ...

    $class
    [1] "lm"

    > lmfit$coefficients
      (Intercept)            X1            X2
    -23.715358962   0.413901580  -0.001014625

Polynomial Regression in R
Fitted Model – Quadratic

[Figure: two panels. First, titled "Montreal Temp. -- January 1 to December 31, 1961": Temperature (-20 to 20) against Day of Year (0 to 300). Second: Residual (-15 to 15) against Day of Year (0 to 300).]

Exercises

1. Montreal Temperature Data – Jan. 1 to Dec. 31, 1961
   File: Intro to splines\Exercises\montreal temp 3.R
   Use the provided code to fit polynomial regression models of varying degree to the data for all of 1961. Models of different degree are constructed by setting the variable D (e.g., D=2 produces a quadratic model). What is the minimal degree required for the model to fit well?

2. Montreal Temperature Data – Jan. 1, 1961, to Dec. 31, 1962
   File: Intro to splines\Exercises\montreal temp 4.R
   Repeat this exercise using the data from both 1961 and 1962.

2 Smoothing Splines: Simple Splines

Splines
Motivation

[Figure: two panels. First, titled "Montreal Temp. -- January 1 to December 31, 1962": Temperature (-20 to 20) against Day of Year (0 to 600). Second: Residual (-15 to 10) against Day of Year (0 to 600).]

How is the temperature changing in the spring of 1962?

$$
\begin{aligned}
y = {} & -7.6 - 8.3x - 0.3x^2 - 5.2 \times 10^{-4} x^3 + 4.4 \times 10^{-6} x^4 \\
& - 2.1 \times 10^{-8} x^5 + 6.0 \times 10^{-11} x^6 - 8.9 \times 10^{-14} x^7 + 5.5 \times 10^{-17} x^8
\end{aligned}
$$

Splines
A Linear Spline for the Montreal Temperature Data

[Figure: two panels. First, titled "Montreal Temp. -- January 1 to December 31, 1962": Temperature (-20 to 20) against Day of Year (0 to 600). Second: Residual (-15 to 10) against Day of Year (0 to 600).]

How is the temperature changing in the spring of 1962?

$$y = -144.5 + 0.3x$$

Splines
Linear Splines

Definition: A linear spline is a continuous function formed by connecting linear segments. The points where the segments connect are called the knots of the spline.
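A minimal way to see a linear spline in R is to join a few points with straight segments using `approxfun()`. The knot locations and heights below are illustrative assumptions, not values from the Montreal fit.

```r
# Points to connect; the interior x values (2 and 5) act as the knots.
px <- c(0, 2, 5, 7)
py <- c(1, 3, 0, 4)

# approxfun() returns a function that interpolates linearly between points,
# i.e. a continuous function made of connected linear segments.
f <- approxfun(px, py)

vals <- f(c(1, 3.5, 6))   # one point inside each segment: 2, 1.5, 2
```

The slope is 1 on the first segment, -1 on the second, and 2 on the third, and the function is continuous at the knots where the slope changes.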

Splines
Higher Order Splines

Definition: A spline of degree $D$ is a function formed by connecting polynomial segments of degree $D$ so that:

- the function is continuous,
- the function has $D - 1$ continuous derivatives, and
- the $D$th derivative is constant between knots.

Simple Splines
The Truncated Polynomials

Definition: The truncated polynomial of degree $D$ associated with a knot $\xi_k$ is the function which is equal to 0 to the left of $\xi_k$ and equal to $(x - \xi_k)^D$ to the right of $\xi_k$.

$$
(x - \xi_k)_+^D =
\begin{cases}
0 & x < \xi_k \\
(x - \xi_k)^D & x \ge \xi_k
\end{cases}
$$

The equation for a spline of degree $D$ with $K$ knots is:

$$y = \beta_0 + \sum_{d=1}^{D} \beta_d x^d + \sum_{k=1}^{K} b_k (x - \xi_k)_+^D$$
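The definition translates directly into R. The sketch below is illustrative: the function names `trunc_poly` and `spline_eval` are hypothetical helpers, and the knots and coefficients are assumed values chosen to make the arithmetic easy to follow.

```r
# Truncated polynomial (x - knot)_+^D: 0 left of the knot,
# (x - knot)^D at and to the right of it.
trunc_poly <- function(x, knot, D) {
  (x >= knot) * (x - knot)^D
}

# Evaluate y = beta0 + sum_d beta_d x^d + sum_k b_k (x - xi_k)_+^D.
# beta has length D + 1 (beta[1] is the intercept); b has one entry per knot.
spline_eval <- function(x, beta, b, knots, D) {
  y <- beta[1]
  for (d in 1:D) y <- y + beta[d + 1] * x^d
  for (k in seq_along(knots)) y <- y + b[k] * trunc_poly(x, knots[k], D)
  y
}

tp <- trunc_poly(c(1, 3, 5), knot = 3, D = 3)   # 0, 0, 8

# A linear spline (D = 1) with knots at 2 and 5.
y <- spline_eval(c(0, 2, 5, 7), beta = c(1, 1), b = c(-2, 3),
                 knots = c(2, 5), D = 1)        # 1, 3, 0, 4
```

In the linear case each coefficient `b[k]` is the change in slope at knot `k`: the slope here is 1, then 1 - 2 = -1, then -1 + 3 = 2.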

Simple Splines
The Design Matrix

The design matrix for a spline of degree $D$ with $K$ knots is the $n$ by $1 + D + K$ matrix with entries:

$$
\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^D & (x_1 - \xi_1)_+^D & \cdots & (x_1 - \xi_K)_+^D \\
1 & x_2 & x_2^2 & \cdots & x_2^D & (x_2 - \xi_1)_+^D & \cdots & (x_2 - \xi_K)_+^D \\
1 & x_3 & x_3^2 & \cdots & x_3^D & (x_3 - \xi_1)_+^D & \cdots & (x_3 - \xi_K)_+^D \\
\vdots & & & & & & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^D & (x_n - \xi_1)_+^D & \cdots & (x_n - \xi_K)_+^D
\end{pmatrix}
$$

Simple Splines in R
The Design Matrix

After defining the degree and the locations of the knots, the design matrix can be generated with the functions outer and cbind:

    > D = 3
    > K = 5
    > knots = 730 * (1:K)/(K+1)
    > X1 = outer(data$x,1:D,"^")
    > X2 = outer(data$x,knots,">") * outer(data$x,knots,"-")^D
    > X = cbind(X1,X2)
    > round(X[c(1,150,300),1:5],1)
         [,1]  [,2]     [,3]      [,4]   [,5]
    [1,]    1     1        1       0.0      0
    [2,]  150 22500  3375000   22745.4      0
    [3,]  300 90000 27000000 5671495.4 181963
    >

Simple Splines in R
Fitting the Spline Model

    > lmfit = lm(y~X,data=data)

Simple Splines in R
Fitted Cubic Spline

[Figure: two panels. First, titled "Montreal Temp. -- January 1 to December 31, 1962": Temperature (-20 to 20) against Day of Year (0 to 600). Second: Residual (-15 to 10) against Day of Year (0 to 600).]

Exercises

1. Montreal Temperature Data – Jan. 1 to Dec. 31, 1961
   File: Intro to splines\Exercises\montreal temp 5.R
   Use the code provided to fit splines of varying degree and with different numbers of knots to the data from 1961 and 1962.

2 Smoothing Splines: B-splines

The B-spline Basis
Troubles with Truncated Polynomials

Splines computed from the truncated polynomials may be numerically unstable because:

- the values in the design matrix may be very large, and
- the columns of the design matrix may be highly correlated.
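Both problems can be made concrete by comparing the truncated polynomial design matrix with a B-spline design matrix for the same knots. The sketch below assumes days 1 to 730 as the predictor and uses the ratio of the largest to the smallest singular value as a rough measure of instability; the helper name `cond` is hypothetical.

```r
library(splines)

x <- 1:730                         # assumed predictor: two years of days
D <- 3
K <- 5
knots <- 730 * (1:K) / (K + 1)

# Truncated polynomial design matrix, including the intercept column.
X_tp <- cbind(1,
              outer(x, 1:D, "^"),
              outer(x, knots, ">") * outer(x, knots, "-")^D)

# B-spline design matrix for the same degree and knots.
X_bs <- bs(x, knots = knots, degree = D, intercept = TRUE)

# Condition number: largest singular value over the smallest.
cond <- function(X) {
  s <- svd(X)$d
  max(s) / min(s)
}

max_tp  <- max(abs(X_tp))   # 730^3, about 3.9e8
max_bs  <- max(abs(X_bs))   # B-spline values never exceed 1
cond_tp <- cond(X_tp)
cond_bs <- cond(X_bs)
```

The truncated polynomial matrix has entries in the hundreds of millions and a far larger condition number than the B-spline matrix, whose entries all lie between 0 and 1.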

The B-spline Basis in R
Generating the Design Matrix and Fitting the Model

The B-spline design matrix can be constructed via the function bs provided by the splines library:

    > library(splines)
    > D = 3
    > K = 5
    > knots = 730 * (1:K)/(K+1)
    > X = bs(data$x,knots=knots,degree=D,intercept=TRUE)
    > lmfit = lm(y~X-1,data=data)
    >
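The `-1` in the model formula removes the intercept that `lm` would otherwise add. A short check, sketched here with days 1 to 730 standing in for `data$x` (an assumption), suggests why: with `intercept=TRUE` the B-spline columns already sum to 1 in every row, so a separate intercept column would be redundant.

```r
library(splines)

x <- 1:730                         # stand-in for data$x
D <- 3
K <- 5
knots <- 730 * (1:K) / (K + 1)

X <- bs(x, knots = knots, degree = D, intercept = TRUE)

n_col    <- ncol(X)                # K + D + 1 = 9 basis functions
row_sums <- rowSums(X)             # each row sums to 1 (up to rounding)
```

The number of columns, `K + D + 1`, matches the number of columns in the truncated polynomial design matrix, so both bases describe splines of the same degree with the same knots.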

The B-spline Basis in R
Fitted Cubic B-spline Model

[Figure: two panels. First, titled "Montreal Temp. -- January 1, 1961, to December 31, 1962": Temperature (-20 to 20) against Day of Year (0 to 600). Second: Residual (-15 to 10) against Day of Year (0 to 600).]

Exercises

1. Montreal Temperature Data – Jan. 1 to Dec. 31, 1961
   File: Intro to splines\Exercises\montreal temp 6.R
   Fit B-splines to the data from 1961 and 1962 using the code in the file. Increase the number of knots to see how this affects the fit of the curve. What happens when the number of knots is very large, say K = 50?

2 Smoothing Splines: Overfitting and Smoothness

Page 48: An Introduction to Splines - Simon Fraser Universitypeople.stat.sfu.ca/~cschwarz/Consulting/Trinity/Phase2/Trinity... · An Introduction to Splines 1 Linear Regression Simple Regression

Overfitting and SmoothnessMotivation

A cubic spline with 50 knots:

[Figure: scatter plot, "Montreal Temp. -- January 1, 1961, to December 31, 1962". x-axis: Day of Year (ticks at 0, 200, 400, 600); y-axis: Temperature (ticks from −20 to 20).]
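A fit like the one plotted above could be produced along the following lines. This is a minimal sketch, not the workshop's exercise code: the data are simulated (a seasonal curve plus noise standing in for the Montreal temperatures), and the variable names, the helper `fit_bspline`, and the equally spaced interior knots are all assumptions. It uses `bs()` from the `splines` package that ships with R, together with `lm()`.

```r
library(splines)

# Simulated stand-in for two years of daily temperatures (NOT the Montreal data)
set.seed(1)
day  <- 1:730
temp <- -15 * cos(2 * pi * day / 365) + 5 + rnorm(length(day), sd = 4)

# Hypothetical helper: fit a cubic B-spline by ordinary least squares
# with K equally spaced interior knots
fit_bspline <- function(K) {
  # K + 2 equally spaced points, then drop the two end points
  knots <- seq(min(day), max(day), length.out = K + 2)[-c(1, K + 2)]
  lm(temp ~ bs(day, knots = knots, degree = 3))
}

fit_5  <- fit_bspline(5)
fit_50 <- fit_bspline(50)

# The residual sum of squares typically drops as knots are added ...
rss <- c(K5 = sum(resid(fit_5)^2), K50 = sum(resid(fit_50)^2))

# ... but the fitted curve gets rougher (larger squared second differences)
roughness <- c(K5  = sum(diff(fitted(fit_5),  differences = 2)^2),
               K50 = sum(diff(fitted(fit_50), differences = 2)^2))
print(rss)
print(roughness)
```

With many knots the curve starts to follow the noise in the data rather than the seasonal trend, which is the overfitting this slide illustrates.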


Overfitting and Smoothness: Knot Selection

Concept

The shape of a spline can be controlled by carefully choosing the number of knots and their exact locations in order to:

1. allow flexibility where the trend changes quickly, and

2. avoid overfitting where the trend changes little.

Challenge

Choosing the number of knots and their location is a very difficult problem to solve.
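When hand-tuning is impractical, two simple default rules are often used instead: equally spaced knots, or knots at quantiles of x. The sketch below illustrates both. The helper names `knots_equal` and `knots_quantile` are hypothetical, and neither rule is taken from the workshop material.

```r
# Hypothetical helpers: two simple rules for placing K interior knots

# K knots equally spaced over the range of x (end points excluded)
knots_equal <- function(x, K) {
  seq(min(x), max(x), length.out = K + 2)[-c(1, K + 2)]
}

# K knots at equally spaced quantiles of x, so that each interval
# between knots holds roughly the same number of observations
knots_quantile <- function(x, K) {
  probs <- seq(0, 1, length.out = K + 2)[-c(1, K + 2)]
  unname(quantile(x, probs = probs))
}

# Example: observations crowded between 0 and 10, sparse up to 100
x <- c(seq(0, 10, by = 0.1), seq(20, 100, by = 10))
print(knots_equal(x, 3))     # 25, 50, 75
print(knots_quantile(x, 3))  # all three fall where the data are dense
```

Neither rule adapts to where the trend changes quickly, so they do not solve the problem described above; they only give a reasonable starting point.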


Overfitting and Smoothness: Penalization

Concept

We can also balance overfitting and smoothness by controlling the size of the spline coefficients.


Overfitting and Smoothness: Penalization for Truncated Polynomials

Penalization for the Linear Spline

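• Consider the equation for each segment of the spline:

\[
\begin{aligned}
(0, \xi_1) &: \quad y = \beta_0 + \beta_1 x \\
(\xi_1, \xi_2) &: \quad y = (\beta_0 - b_1 \xi_1) + (\beta_1 + b_1)\, x \\
(\xi_2, \xi_3) &: \quad y = (\beta_0 - b_1 \xi_1 - b_2 \xi_2) + (\beta_1 + b_1 + b_2)\, x
\end{aligned}
\]

• The spline is smooth if $b_1, b_2, \ldots, b_K$ are all close to 0.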
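The segment equations above follow if the linear spline is written in truncated-polynomial form. This form is an assumption here: the slide title points to this basis, but the formula itself does not appear on this slide. With $(u)_+ = \max(0, u)$:

\[
y = \beta_0 + \beta_1 x + \sum_{k=1}^{K} b_k (x - \xi_k)_+
\]

On $(\xi_1, \xi_2)$ only the first truncated term is nonzero, so

\[
y = \beta_0 + \beta_1 x + b_1 (x - \xi_1) = (\beta_0 - b_1 \xi_1) + (\beta_1 + b_1)\, x .
\]

Each $b_k$ is therefore the change in slope at knot $\xi_k$, which is why small values of $b_k$ give a smooth curve.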

Penalized Least Squares

\[
\mathrm{PSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{k=1}^{K} b_k^2
\]
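The penalized sum of squares above has a closed-form minimizer, which the following sketch computes. It assumes the truncated-line basis $(x - \xi_k)_+$ for the linear spline, uses simulated data rather than the Montreal temperatures, and the names `Z`, `D` and `fit_penalized` are hypothetical. It is not the workshop's exercise code.

```r
# Simulated data (an assumption for illustration, not the Montreal data)
set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(length(x), sd = 0.3)

# Design matrix: intercept, x, and one truncated line (x - xi_k)_+ per knot
K  <- 20
xi <- seq(0, 1, length.out = K + 2)[-c(1, K + 2)]
Z  <- cbind(1, x, outer(x, xi, function(x, k) pmax(x - k, 0)))

# Penalty matrix: penalize b_1, ..., b_K but not beta_0 and beta_1
D <- diag(c(0, 0, rep(1, K)))

# Minimize PSS by solving (Z'Z + lambda * D) theta = Z'y
fit_penalized <- function(lambda) {
  drop(solve(crossprod(Z) + lambda * D, crossprod(Z, y)))
}

theta_ols <- fit_penalized(0)    # lambda = 0: ordinary least squares
theta_pen <- fit_penalized(10)   # lambda > 0: slope changes shrunk toward 0

# Size of the penalty term sum(b_k^2) under each fit
print(c(ols       = sum(theta_ols[-(1:2)]^2),
        penalized = sum(theta_pen[-(1:2)]^2)))
```

The intercept and overall slope are left unpenalized, so as λ grows the fit approaches a straight line rather than a flat one.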


Overfitting and Smoothness: Penalization for the B-spline Basis

Penalization for the B-spline

The spline is smooth if $b_1, b_2, \ldots, b_K$ are all close to each other (but not necessarily close to 0).

Penalized Least Squares

\[
\mathrm{PSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{k=3}^{K} \bigl( (b_k - b_{k-1}) - (b_{k-1} - b_{k-2}) \bigr)^2
\]
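The second-difference penalty above can be written as a matrix product, which again gives a closed-form minimizer. The sketch below is one way to set this up, under stated assumptions: the data are simulated, `K` is taken to be the number of B-spline coefficients, and the names `B`, `D`, `fit_pspline` and `penalty` are hypothetical. It is not the code from the exercise file.

```r
library(splines)

# Simulated data (an assumption for illustration, not the Montreal data)
set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(length(x), sd = 0.3)

# Cubic B-spline basis with equally spaced interior knots
n_knots <- 20
knots <- seq(0, 1, length.out = n_knots + 2)[-c(1, n_knots + 2)]
B <- bs(x, knots = knots, degree = 3, intercept = TRUE)
K <- ncol(B)                      # number of coefficients b_1, ..., b_K

# Second-order difference matrix: each entry of D %*% b has the form
# (b_k - b_{k-1}) - (b_{k-1} - b_{k-2}), for k = 3, ..., K
D <- diff(diag(K), differences = 2)

# Minimize PSS by solving (B'B + lambda * D'D) b = B'y
fit_pspline <- function(lambda) {
  drop(solve(crossprod(B) + lambda * crossprod(D), crossprod(B, y)))
}

b_unpen <- fit_pspline(0)   # lambda = 0: ordinary least squares
b_pen   <- fit_pspline(5)   # lambda = 5: neighbouring coefficients pulled together

# Value of the penalty term under each fit
penalty <- function(b) sum(diff(b, differences = 2)^2)
print(c(unpenalized = penalty(b_unpen), penalized = penalty(b_pen)))
```

Because only second differences are penalized, a coefficient sequence that changes linearly incurs no penalty, so large values of λ pull the fit toward a straight line rather than toward zero.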


Overfitting and Smoothness: A Penalized Cubic B-spline

A penalized cubic B-spline with 50 knots and λ = 5:

[Figure: scatter plot, "Montreal Temp. -- January 1, 1961, to December 31, 1962". x-axis: Day of Year (ticks at 0, 200, 400, 600); y-axis: Temperature (ticks from −20 to 20).]


Exercises

1. Montreal Temperature Data – Jan. 1 to Dec. 31, 1961
   File: Intro to splines\Exercises\montreal temp 7.R
   Fit penalized cubic B-splines to the Montreal temperature data for 1961 and 1962 using the provided code.
