doubly cyclic smoothing splines and analysis of seasonal...

37
Boston-Keio Workshop 2016 . . . Doubly Cyclic Smoothing Splines and Analysis of Seasonal Daily Pattern of CO2 Concentration in Antarctica Mihoko Minami Keio University, Japan August 15, 2016 Joint work with Ryo Kiguchi CO2 Data were provided by National Institute of Polar Research 1 / 37

Upload: others

Post on 24-Oct-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

  • Boston-Keio Workshop 2016

    .

    .

    . ..

    .

    .

    Doubly Cyclic Smoothing Splines andAnalysis of Seasonal Daily Pattern of CO2 Concentration

    in Antarctica

    Mihoko Minami

    Keio University, Japan

    August 15, 2016

    Joint work with Ryo Kiguchi

    CO2 Data were provided by National Institute of Polar Research

    1 / 37

  • Hourly observations of CO2 concentration at Syowastation in Antarctica (1984/ 2/ 3 - 2009/12/31)

    25+ years

    (ppm)

    1year 5days

    2 / 37

  • Hourly observations of CO2 concentration at Syowastation in Antarctica

    The data clearly show

    strong temporal trend

    strong seasonal variation

    We are also interested to know if there is

    a daily pattern ? If so, does it vary seasonally ?

    an effect due to wind speed? If so, does it vary seasonally ?

    an effect due to wind direction?

    In this talk, we introduce the method to analyze a seasonal daily pattern,doubly cyclic smoothing splines, and show the results of analysis thatgive answers to the above questions.

    3 / 37

  • Outline

    .. .1 Cyclic cubic smoothing splines

    A cyclic cubic spline functionCyclic cubic smoothing splinesSmoothing mechanism of cyclic cubic smoothing splines

    .. .2 A tensor product method: an extenstion to a multivariate smoothing

    methodRoughness penalty for the tensor product method

    .. .3 Doubly cyclic cubic smoothing splines

    What do doubly cyclic cubic smoothing splines do?Wiggly components are shrunk more

    .. .4 Analysis of CO2 concentration at Syowa station in Antarctica

    Seasonal variation of daily PatternConfidence interval curves for daily patternModel selectionSeasonal change of wind speed effect

    .. .5 Conclusions

    4 / 37

  • 1. Cyclic cubic smoothing splines

    5 / 37

  • Cyclic cubic smoothing splines

    The cyclic cubic smoothing spline is a smoothing method toestimate periodic variation such as daily or annual pattern of timeseries observations.

    Day 1 Day2 Day 3 Day 4

    It fits a cyclic cubic spline function which is a periodic piece-wisecubic function with continuity up to the second derivative.

    6 / 37

  • A cyclic cubic spline function

    A cyclic cubic spline function g(t) is

    periodic When the period is T ,

    g(t+ kT ) = g(t) for k = 0,±1,±2, · · ·

    piece-wise cubic polynomialGiven knots t(0) < t(1) < · · · < t(K−1) < t(K) with t(K) − t(0) = T ,

    g(t) = fj(t) for t ∈ [t(j−1), t(j)), j = 1, 2, · · · ,K

    where fj(t)s are cubic polynomial functions.

    That is, for t ∈ [t(0), t(K)], it can be expressed as

    g(t) =

    K∑j=1

    I[t(j−1),t(j))fj(t)

    7 / 37

  • A cyclic cubic spline function

    A cyclic cubic spline function

    g(t) =

    K∑j=1

    I[t(j−1),t(j))fj(t)

    is also continuous up to the second derivativeFor j = 1, 2, · · · ,K − 1,

    fj(t(j)) = fj+1(t

    (j)), f ′j(t(j)) = f ′j+1(t

    (j)), f ′′j (t(j)) = f ′′j+1(t

    (j))

    The values at the both endpoints t(0) and t(K) are equal up tothe second derivative.

    f1(t(0)) = fK(t

    (K)), f ′1(t(0)) = f ′K(t

    (K)), f ′′1 (t(0)) = f ′′K(t

    (K))

    8 / 37

  • Cyclic cubic smoothing splines

    A cyclic cubic spline function is flexible.To avoid overfitting, we impose a roughness penalty.

    In a most simplified case, the model and object function are defined as:

    Model� �yi = g(ti) + ϵi, ϵi ∼ N(0, σ2), i = 1, · · · , n, i .i .d .

    where g(t) is a cyclic cubic spline function.� �Penalized squared errors� �

    Ωλ(g) =

    n∑i=1

    {yi − g(ti)}2 + λ∫ t(K)t(0)

    g′′(t)2dt, λ > 0 (1)

    � �λ is called a smoothing parameter.

    9 / 37

  • Example: Daily pattern of PM2.5 1/3

    Hourly observations of PM2.5 1/3 in air for 28 days at Fukuoka

    5 10 15 20

    2.70

    2.75

    2.80

    2.85

    2.90

    2.95

    3.00

    lambda/m = 8lambda/m=1.2

    In the left plot,

    black dots with solid lines depicthourly averages of PM2.5 1/3

    the red dashed curve is thefitted cyclic cubic splinefunction with λ = 8× 28

    the green dotted curve is thefitted cyclic cubic splinefunction with λ = 1.2× 28

    10 / 37

  • Smoothing mechanism of cyclic cubic smoothing splines

    In the following, we assume knots are evenly spaced with

    t(j) − t(j−1) = h for j = 1, 2, · · · ,K

    Function value parameterization

    We employ the function value parameterization to express cyclic cubicfunctions. For j = 1, 2, · · · ,K, let

    βj be the function value of g(t) at t(j), that is, βj = g(t

    (j)), and

    bj(t) be the corresponding cyclic cubic spline basis function withbj(t

    (i)) = δij for i = 1, 2, · · · ,K,so that a cyclic cubic spline function g(t) can be expressed as

    g(t) =K∑j=1

    βjbj(t) (2)

    11 / 37

  • Penalty term with function value parameterization

    The penalty term can be expressed as∫ t(K)t(0)

    g′′(t)2dx = βTDTB−1Dβ (3)

    where

    β = (β1, β2, · · · , βK)TB and D are cyclic band matrices,

    B =h

    6G(4, 1) and D =

    1

    hG(−2, 1)

    where G(a, b) denotes a cyclic band matrix

    G(a, b) =

    a b bb a b

    . . .. . .

    . . .

    b a bb b a

    12 / 37

  • Least penalized squared error estimate β̂

    Suppose now that

    all observations were made at knots

    at each knot, m observations were obtained.

    Let y denote the sample average vector at the knots. Then, we have

    n∑i=1

    {yi − g(ti)}2 = ∥y − 1m ⊗ y∥2 +m∥y − β∥2

    so that the minimization of penalized squared errors is equivalent to theminimization of

    S(β) = m∥y − β∥2 + λβTDTB−1Dβ (4)

    Least penalized squared error estimate� �β̂ = Hy where H =

    (IK +

    λ

    mDTB−1D

    )−1� �

    13 / 37

  • Eigenvalues and eigenvectors of G(a, b)

    For the even number of knots (K = 2q), eigenvalues of a cyclic bandmatrix G(a, b) with b > 0 are in descending order

    l1 = a+ 2b, l2j = l2j+1 = a+ 2b cos2πjk , l2q = a− 2b

    (j = 1, · · · , q − 1)and the corresponding eigenvectors are

    u1 =1√k(1, 1, · · · , 1, 1)T ,

    u2j =

    √2

    k

    cos(2πj 1k

    )...

    cos(2πj ik

    )...

    cos (2πj)

    ,u2j+1 =√

    2

    k

    sin(2πj 1k

    )...

    sin(2πj ik

    )...

    sin (2πj)

    , j = 1, 2, · · · , q − 1

    u2q =1√k(1,−1, · · · , 1,−1)T .

    14 / 37

  • Eigenvalues and eigenvectors of influence matrix H

    Recall that the estimate β̂ of the function values at knots is given by

    β̂ = Hy where

    (IK +

    λ

    mDTB−1D

    )−1Matrices D,B and IK share the same eigenvectors, so does H.

    The eigenvalues of the influence matrix H are given in descending order by

    γ1 = 1, γ2j = γ2j+1 =

    1 + λm · 12h3 ·(1− cos 2πj

    k

    )2(2 + cos

    2πj

    k

    )

    −1

    j = 1, · · · , q − 1,

    and γ2q =

    (1 +

    λ

    m· 48h3

    )−1

    15 / 37

  • Eigenvalues and eigenvectors of influence matrix Hγj

    5 10 15 20

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Eigen Values of Influence Matrix

    j

    Eig

    en V

    alue

    s

    lambda/m = 8lambda/m = 1.2

    β̂ =

    (Ik +

    λ

    mDTB−1D

    )−1y

    =

    k∑j=1

    γj(uj ,y) uj

    where γjs are eigenvalues and

    uj are eigenvectors of H.

    uj

    0 5 10 15 20

    0.15

    0.25

    i= 1

    0 5 10 15 20

    −0.

    3−

    0.1

    0.1

    0.3 i= 2

    0 5 10 15 20

    −0.

    3−

    0.1

    0.1

    0.3 i= 4

    0 5 10 15 20

    −0.

    3−

    0.1

    0.1

    0.3 i= 6

    0 5 10 15 20

    −0.

    3−

    0.1

    0.1

    0.3 i= 8

    0 5 10 15 20

    −0.

    3−

    0.1

    0.1

    0.3 i= 10

    The black solid curvesare cyclic cubic splinebasis functionscorresponding to uj .

    The red dashed curvesare cyclic cubic splinebasis functionsmultiplied by γj (=γjuj ).

    16 / 37

  • Smoothing mechanism of cyclic cubic smoothing splines

    5 10 15 20

    2.70

    2.75

    2.80

    2.85

    2.90

    2.95

    3.00

    lambda/m = 8lambda/m=1.2

    β̂ =

    k∑j=1

    γjujuTj

    y=

    k∑j=1

    γj(uj ,y) uj

    y =k∑

    j=1

    (uj ,y) uj

    The smoothing mechanism can be understood as followsIt decomposes the average observation vector y into

    the constant component (overall mean), andsin and cos components with frequencies 1 to q(= m/2).

    sin and cos components are shrunk. The higher the frequency is, themore the component is shrunk.The overall mean and shrunk components are summed up toproduce β̂

    17 / 37

  • 2. A tensor product method:

    an extenstion toa multivariate smoothing method

    18 / 37

  • A tensor product method :an extenstion to a bivariate smoothing method

    Suppose we have

    basis functions for a function space Ω1: a1(s), a2(s), · · · , aK1(s)

    basis functions for a function space Ω2: b1(t), b2(t), · · · , bK2(t).

    A tensor product method uses products of basis functions on Ω1 × Ω2

    ai(s)bj(t), i = 1, 2, · · · ,K1, j = 1, 2, · · · ,K2

    as its basis functions. Thus, a bivariate function for the tensor productmethod can be expressed as

    fst(s, t) =

    K1∑i=1

    K2∑j=1

    βijai(s)bj(t) (5)

    19 / 37

  • Roughness penalty for the tensor product method

    Roughness penality for a tensor product smoothing function is defined as

    J(fst) =

    ∫Ωs×Ωt

    λs

    (∂2fst∂s2

    )2+ λt

    (∂2fst∂t2

    )2ds dt.

    When knots are evenly spaced, the penalty term can be approximated as

    Penalty term for a tensor product smoothing function� �J(fst) ≈ λsβT (Ss ⊗ IKt)β + λtβT (IKs ⊗ St)β (6)

    where β is a vector of appropriately rearranged function values at grids.� �(Wood, 2006)

    20 / 37

  • 3. Doubly cyclic cubic smoothing splines

    21 / 37

  • Doubly cyclic cubic smoothing splines

    Doubly cyclic cubic smoothing splines are generated using a tensor productmethod with:

    - basis functions for a function space with a yearly period:fa1 , fa2 ,· · · , faKa

    - basis functions for a function space with a daily period:fd1 , fd2 ,· · · , fdKd .

    We start with a univariate function of time t,defined on a coil, that winds around a torus:

    f(t) =

    Ka∑i=1

    Kd∑j=1

    βijfai (t)f

    dj (t)

    Then, to have a function that is smooth intwo directions, we re-express this as:

    f̃(s, t) =

    Ka∑i=1

    Kd∑j=1

    βijfai (s)f

    dj (t)

    and consider penalty to this function.22 / 37

  • What does doubly cyclic cubic smoothing spline do?

    When

    knots are evenly spaced, and

    the numbers of observations are equal for all knots,

    then, original basis functions can be linearly transformed into

    Oorthogonal basis functions

    f cosij : cyclic cubic spline function whose values at knots are equal to

    cos (2πt(iha + jhd)) for i = 0, · · · q∗a, j = 0,±1, · · · ± q∗dfsinij : cyclic cubic spline function whose values at knots are equal to

    sin (2πt(iha + jhd)) for i = 0, · · · q∗a, j = 0,±1, · · · ± q∗d

    where q∗d ≤ Kd/2− 1, q∗a ≪ Ka/2− 1 and for ha = 1/Ka, hd = 1/Kd,

    These are the eigenvectors of the influence matrix for doubly cyclic cubicsmoothing splines.

    23 / 37

  • What do doubly cyclic cubic smoothing splines do?

    Estimated function� �f̂(t) =

    q∗a∑i=0

    q∗d∑j=−q∗d

    (1 + λa

    (1− cos 2πiha)2

    2 + cos 2πiha+ λd

    (1− cos 2πjhd)2

    2 + cos 2πjhd

    )−1

    ×

    (< ucosij ,y >

    |ucosij |2f cosij +

    < usinij ,y >

    |usinij |2f sinij

    )� �where q∗d ≤ Kd/2− 1, q∗a ≪ Ka/2− 1 and for ha = 1/Ka, hd = 1/Kd,

    ucosij : vectors of values of cos (2πt(iha + jhd)) at knots

    usinij : vectors of values of sin (2πt(iha + jhd)) at knots

    f cosij : cyclic cubic spline function with cos (2πt(iha + jhd)) as values at knots

    f sinij : cyclic cubic spline function with sin (2πt(iha + jhd)) as values at knots

    y : vector of the averages at knots

    24 / 37

  • Wiggly components are shrunk more

    The doubly cyclic cubic smoothing spline shrinks the components of

    basis function with values at knots cos (2πt(iha ± jhd)), andbasis function with values at knots sin (2πt(iha ± jhd))

    by multiplying them by the shrinkage rate(1 + λa

    (1− cos 2πiha)2

    2 + cos 2πiha+ λd

    (1− cos 2πjhd)2

    2 + cos 2πjhd

    )−1and sum them up.

    0 5 10 15 200.0

    0.2

    0.4

    0.6

    0.8

    1.0

    −15−10

    −5 0

    5 10

    15

    i(Annual)

    j(Dai

    ly)

    Shr

    inka

    ge r

    ate

    The larger i or j is, the morewiggly the basis function is ineither direction.

    The more wiggly the basis func-tion is in either direction, themore its coefficient is shrunk.

    25 / 37

  • 4. Analysis of CO2 concentrationat Syowa station in Antarctica

    26 / 37

  • A model with temporal trend and seasonal daily pattern

    We start with a linear additive model for CO2 concentration with temporaltrend and seasonal daily pattern as explanatory terms.� �

    Model 1: Y = ftr(t) + fday,year(t) + ϵ� �where

    Y : CO2 concentration

    ftr(t) : a cubic spline function of time t for temporal trend

    fday,year(t) : a doubly cyclic cubic spline function of time t with daily

    and annual cyclesϵ : random error with variance σ2

    We used R package mgcv by Simon Wood for analysis.27 / 37

  • Temporal trend and annual variation

    Temporal trend is almostlinear

    CO2 concentration hasincreased 40ppm in 25 years

    The range of annualvariation is 1.1ppm

    CO2 concentration is low insummer and high in winter

    28 / 37

  • Seasonal variation of daily Pattern in CO2 concentration

    Daily pattern of CO2 concentration has a seasonal variationIt has the largest daily variation (0.017ppm) in summer (January 4th)

    29 / 37

  • Confidence interval curves for daily pattern

    95% confidence interval curves for January 4th and July 4th.

    0 5 10 15 20 25 30

    −0.

    015

    −0.

    010

    −0.

    005

    0.00

    00.

    005

    0.01

    00.

    015

    Time

    CO

    2 co

    ncen

    trat

    ion

    January 4July 4

    Hourly variation is significant in summer (January 4th),but not significant in winter (July 4th).

    30 / 37

  • Effects of wind speed and direction

    Wind speed might have an effect on CO2 concentration.The effect of wind speed might depend on the wind direction.� �

    Model 2: y = ftr(t) + fday,year(t) + fws,wd(s, d) + ϵ� �where

    Y : Co2 concentration

    ftr(t) : a cubic spline function of time t for temporal trend

    fday,year(t) : a doubly cyclic cubic spline function of time t with daily

    and annual cyclesfws,wd(s, d) : tensor product of a cubic spline function of wind speed s

    and a cyclic spline function of wind direction dϵ : random error with variance σ2

    31 / 37

  • Seasonal effect of wind speed

    The effect of wind speed might differ by season.� �Model 3: y = ftr(t) + fday,year(t) + fws,year(s, t) + ϵ� �

    where

    Y : Co2 concentration

    ftr(t) : a cubic spline function of time t for temporal trend

    fday,year(t) : a doubly cyclic cubic spline function of time t with daily

    and annual cyclesfws,year(s, t) : tensor product of a cubic spline function of wind speed s

    and a cyclic cubic spline function with annual cycle of tϵ : random error with variance σ2

    32 / 37

  • Model selection for CO2 concentration

    We fitted the following three models and compared AIC

    model formula AIC

    model 1 y = ftr(t) + fday,year(t) + ϵ 19942.5

    model 2 y = ftr(t) + fday,year(t) + fws,wd(s, d) + ϵ 18725.6

    model 3 y = ftr(t) + fday,year(t) + fws,year(s, t) + ϵ 16189.5

    where

    ftr(t) : a cubic spline function of time tfday,year(t) : a doubly cyclic cubic spline function of time t with daily

    and annual cyclesfws,wd(s, d) : tensor product of a cubic spline function of wind speed s

    and a cyclic spline function of wind direction dfws,year(s, t) : tensor product of a cubic spline function of wind speed s

    and a cyclic cubic spline function with annual cycle of t33 / 37

  • Seasonal change ofwind speed effect

    January - March

    April - June

    July - September

    October - December

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    January 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    February 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    March 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    April 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    May 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    June 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    July 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    August 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    September 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    October 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    November 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    0 100 200 300 400 500

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    December 4

    Wind Speed

    CO

    2 co

    ncen

    trat

    ion

    −0.

    2−

    0.1

    0.0

    0.1

    0.2

    34 / 37

  • Conclusions

    We proposed the doubly cyclic cubic smoothing spline method.

    For a simple model, the eigenvalues and eigenvectors of the influencematrix can be explicitly expressed with the values of trigonometricfunctions with different frequencies.

    This expression shows that the more wiggly the basis function is, themore its coefficient is shrunk.

    We analyzed CO2 concentration at Syowa station in Antarctica usingthis method. CO2 concentration has a strong temporal trend andannual variation.

    Daily pattern of CO2 concentration has a seasonal variation. Hourlyvariation is significant in summer (January), but not significant inwinter (July).

    The effect of wind speed also has annual variation.

    Flexible regression models using nonparametric smoothing methodsenable us to analyze the data of interest more precisely.

    35 / 37

  • Reference

    ...1 Green, P.J. and Silverman, B.W. (1993) Nonparametric Regressionand Generalized Linear Models: A roughness penalty approach,Chapman&Hall/CRC.

    ...2 Wood, S.N. (2003) Thin plate regression splines. J. R. Statist. Soc.B 65, 95-114

    ...3 Wood, S.N. (2006a) Low-Rank Scale-Invariant Tensor ProductSmooths for Generalized Additive Mixed Models. Biometrics 62(4):1025-1036

    ...4 Wood, S.N. (2006b) Generalized Additive Models: An Introductionwith R, Chapman Hall/CRC.

    ...5 Kiguchi, R. and Minami, M. (2012) Cyclic Cubic Regression SplineSmoothing and Analysis of CO2 Data at Showa Station in Antarctica,Proceedings of International Biometric Conference 2012.

    ...6 Kiguchi, R. (2014) Doubly Cyclic Smoothing Splines and Analysis ofCO2 Data at Syowa Station in Antarctica, Master’s thesis, KeioUniversity.

    36 / 37

  • Thank you for your attention!

    37 / 37

    OutlineCyclic cubic smoothing splinesA cyclic cubic spline functionCyclic cubic smoothing splinesSmoothing mechanism of cyclic cubic smoothing splines

    A tensor product method: an extenstion to a multivariate smoothing methodRoughness penalty for the tensor product method

    Doubly cyclic cubic smoothing splinesWhat do doubly cyclic cubic smoothing splines do?Wiggly components are shrunk more

    Analysis of CO2 concentration at Syowa station in AntarcticaSeasonal variation of daily PatternConfidence interval curves for daily patternModel selectionSeasonal change of wind speed effect

    Conclusions