Hiba Nassar - DiVA portal

Hiba Nassar

Functional Hodrick-Prescott Filter

Licentiate Thesis

Mathematics


A thesis for the Degree of Licentiate of Philosophy in Mathematics.

Functional Hodrick-Prescott Filter

Hiba Nassar

Linnæus University
School of Computer Science, Physics and Mathematics
SE-351 95 Växjö, Sweden
http://www.lnu.se


To Rani, and my parents


Abstract

The study of functional data analysis is motivated by its applications in various fields of statistical estimation and statistical inverse problems. In this thesis we propose a functional Hodrick-Prescott filter. This filter is applied to functional data which take values in an infinite-dimensional separable Hilbert space. The filter depends on a smoothing parameter. In this study we characterize the associated optimal smoothing parameter when the underlying distribution of the data is Gaussian. Furthermore, we extend this characterization to the case when the underlying distribution of the data is white noise.

Keywords: Inverse problems, adaptive estimation, Hodrick-Prescott filter, smoothing, trend extraction, Gaussian measures on a Hilbert space.


Acknowledgments

I would like to express my gratitude to all those who gave me the possibility to complete this licentiate. I want to thank the Department of Mathematics at Linnaeus University, which gave me the opportunity to continue my studies. Furthermore, I am grateful to my supervisor, Docent Astrid Hilbert, whose generous guidance and support made the work a real pleasure. I would also like to thank my second supervisor, Professor Boualem Djehiche, for giving his time and kind advice. He made it possible for me to work on a topic that is of great interest to me. Many friends have helped me stay sane through these two and a half years. Their support and care helped me overcome setbacks and stay focused on my study. I would like to express my special thanks to Haidar Al-Talibi. I am also grateful to my family in Syria, whose love, support and warmth still reach me even though they are far away. My sincere thanks to my beloved husband, Rani Basna, who provided the love, support, encouragement and help that kept me going.


Contents

Abstract

Acknowledgments

1 Theoretical background
   1.1 Singular value decomposition of a compact operator
   1.2 The Moore-Penrose Pseudo Inverse
   1.3 Functional Data Analysis
   1.4 Hilbert Space-Valued Gaussian Random Variable
   1.5 Hilbert Scales
   1.6 Generalized Gaussian Random Variables
   1.7 Hodrick-Prescott Filter History
   References

2 Results

3 Papers
   3.1 Functional Hodrick-Prescott Filter


Chapter 1

Theoretical background

In this chapter we shall introduce the main mathematical tools that are used in the thesis.

1.1 Singular value decomposition of a compact operator

Let K : X → Y be a compact operator, where X, Y are separable Hilbert spaces (i.e., Hilbert spaces with countable bases). Let K∗ denote the adjoint operator of K (i.e., (Kx, y) = (x, K∗y) for all x ∈ X and y ∈ Y). A system (λn, en, dn), n ∈ N, with λn > 0, en ∈ X, dn ∈ Y, is called a singular value decomposition (SVD) of the compact operator K if the following conditions hold: the sequence (λn) consists of positive numbers such that the λn² are the non-zero eigenvalues of the self-adjoint operator K∗K, arranged in non-increasing order and repeated according to their multiplicity; the en ∈ X are the eigenvectors of the operator K∗K corresponding to λn²; and the sequence (dn) is defined in terms of the en as

dn = Ken / ‖Ken‖.

The sequence (dn) is a complete orthonormal system of eigenvectors of the operator KK∗. Moreover,

Ken = λndn, K∗dn = λnen, n ∈ N,

and the following decompositions hold:

Kx = ∑_{n=1}^∞ λn⟨x, en⟩dn,   x ∈ X,

and

K∗y = ∑_{n=1}^∞ λn⟨y, dn⟩en,   y ∈ Y.


The infinite series in these decompositions converge in the norm of the Hilbert spaces X and Y, respectively.
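In finite dimensions the singular system can be computed directly; the sketch below (an illustration added here, not part of the thesis) identifies (λn, en, dn) with the SVD factors of a matrix K and verifies the expansion Kx = ∑ λn⟨x, en⟩dn.

```python
import numpy as np

# Finite-dimensional illustration of the singular value decomposition.
# numpy.linalg.svd returns K = U @ diag(s) @ Vt with s in non-increasing
# order, so the e_n are the rows of Vt and the d_n are the columns of U.
rng = np.random.default_rng(0)
K = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(K, full_matrices=False)

x = rng.standard_normal(4)

# K x = sum_n lambda_n <x, e_n> d_n
Kx_series = sum(s[n] * (x @ Vt[n]) * U[:, n] for n in range(len(s)))
assert np.allclose(K @ x, Kx_series)

# K e_n = lambda_n d_n  and  K* d_n = lambda_n e_n
for n in range(len(s)):
    assert np.allclose(K @ Vt[n], s[n] * U[:, n])
    assert np.allclose(K.T @ U[:, n], s[n] * Vt[n])
```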

1.2 The Moore-Penrose Pseudo Inverse

The Moore-Penrose pseudo-inverse is a general way to find the solution of the following linear equation

Kx = y,

where K : X → Y is a bounded linear operator.

Definition 1.2.1.

• An element x ∈ X that minimizes the functional

J(x) = ‖Kx − y‖²

is called a least-squares solution of Kx = y.

• x ∈ X is called a best-approximate solution of Kx = y if x is a least-squares solution of Kx = y and

‖x‖ = inf{‖z‖ : z is a least-squares solution of Kx = y}

holds.

To this end we introduce the notations D(A) and ker(A) for the domain of definition of the operator A and the kernel of A, respectively. Now the definition of the Moore-Penrose pseudo-inverse is provided; see [1].

Definition 1.2.2. The Moore-Penrose pseudo-inverse of K is the operator K† : D(K†) → ker(K)⊥, which associates with each y the unique minimum-norm least-squares solution of Kx = y, and satisfies all of the following four criteria:

• KK†K = K,

• K†KK† = K†,

• (KK†)∗ = KK†,

• (K†K)∗ = K†K.

If K is a compact linear operator with SVD (λn, en, dn) and y ∈ D(K†), then the vector

∑_{n=1}^∞ (1/λn)⟨y, dn⟩en


is well defined and is a least-squares solution. Therefore

x = K†y = ∑_{n=1}^∞ (1/λn)⟨y, dn⟩en.

Special case: Let y ∈ D(K†). Multiplying both sides of the equation Kx = y by K∗ gives

K∗Kx = K∗y.

If K∗K is invertible (which is the case in this thesis), then the Moore-Penrose pseudo-inverse of K is K† = (K∗K)−1K∗.
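The three descriptions of the best-approximate solution can be compared numerically. The following sketch (an addition for illustration, not from the thesis) checks the pseudo-inverse, the special-case formula (K∗K)−1K∗, and the SVD series against each other for a full-column-rank matrix:

```python
import numpy as np

# Three routes to the best-approximate solution of K x = y (illustration).
rng = np.random.default_rng(1)
K = rng.standard_normal((6, 3))   # full column rank, so K*K is invertible
y = rng.standard_normal(6)

# 1) NumPy's Moore-Penrose pseudo-inverse
x_pinv = np.linalg.pinv(K) @ y

# 2) Special case from the text: K-dagger = (K*K)^(-1) K*
x_normal = np.linalg.solve(K.T @ K, K.T @ y)

# 3) SVD series: sum_n (1/lambda_n) <y, d_n> e_n
U, s, Vt = np.linalg.svd(K, full_matrices=False)
x_svd = sum((y @ U[:, n]) / s[n] * Vt[n] for n in range(len(s)))

assert np.allclose(x_pinv, x_normal)
assert np.allclose(x_pinv, x_svd)
```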

1.3 Functional Data Analysis

Functional data analysis is a relatively new direction of statistics that started in the early 1990s, but its foundations go back to much earlier times, when the theory of operators in Hilbert space and functional analysis had already achieved considerable development. A functional datum is not a single observation but rather a set of measurements depending on a parameter set in a continuous way, which is to be regarded as a single curve or image. This is due to the ability of modern instruments to perform tightly spaced measurements, so that these data can be seen as samples of curves. In the simplest setting, the sample consists of N curves X1(t), X2(t), ..., XN(t), t ∈ τ. The set τ is typically an interval of the real line. However, in many applications τ might be a subset of the plane or the sphere. In those cases, the data are surfaces over a region, or more generally functions over some domain, hence the term functional data.

1.4 Hilbert Space-Valued Gaussian Random Variable

Definition 1.4.1. Let (Ω,F ,P) be a probability space and H a Hilbert spacewith inner product 〈·, ·〉. A measurable function

X : Ω→ H

is called a Hilbert space-valued random variable. A random variable X taking values in a separable Hilbert space H is said to be Gaussian if for every y ∈ H the law of ⟨y, X⟩ is Gaussian.


Gaussian random variables are determined by their mean m = E(X) ∈ H and their covariance operator Σ : H → H defined by

⟨y, Σz⟩ = E[⟨y, X − m⟩⟨X − m, z⟩].

The mean and the covariance operator are uniquely determined, and the Hilbert space-valued Gaussian random variable with mean m and covariance operator Σ is denoted by N(m, Σ). The covariance operator of a Gaussian random variable has special properties, which are characterized by the following lemma.

Lemma 1.4.1. Let X be a Gaussian random variable on a separable Hilbert space. Then the covariance operator Σ of X is self-adjoint, positive and trace class.

For more details see e.g. [19], [11].
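A numerical sketch of this setting (added for illustration; the eigenvalue choice is an assumption): a Gaussian variable in a separable Hilbert space can be simulated by truncating an expansion X = ∑k √σk ξk φk in an orthonormal basis, with summable eigenvalues σk so that the covariance operator is trace class.

```python
import numpy as np

# Simulation sketch: a Hilbert space-valued Gaussian via a truncated
# expansion X = sum_k sqrt(sigma_k) xi_k phi_k in an orthonormal basis.
# The eigenvalues sigma_k = 1/k^2 are an illustrative choice; they are
# summable, so the covariance operator Sigma is trace class.
rng = np.random.default_rng(2)
K_trunc, N = 200, 20000
sigma = 1.0 / np.arange(1, K_trunc + 1) ** 2

xi = rng.standard_normal((N, K_trunc))
X = xi * np.sqrt(sigma)            # basis coordinates of N samples of X

# The empirical variances recover the eigenvalues of Sigma ...
emp = X.var(axis=0)
assert np.allclose(emp, sigma, atol=0.05)

# ... and the trace is finite (it approaches pi^2/6 as K_trunc grows).
assert sigma.sum() < np.pi ** 2 / 6
```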

1.5 Hilbert Scales.

Suppose H is a Hilbert space with inner product ⟨·, ·⟩H. Let L : H → H be a densely defined, unbounded, self-adjoint, positive operator, so that L is closed and satisfies ⟨Lx, y⟩ = ⟨x, Ly⟩ for every x, y ∈ D(L), and there exists a positive constant γ such that ⟨Lx, x⟩ ≥ γ‖x‖² for all x ∈ D(L). Let

M = ⋂_{k=0}^∞ D(Lᵏ)

be the set of all elements x for which all powers of L are defined. It is proved that this set is dense in H; see e.g. [5]. For all s ∈ R and all x, y ∈ M, define the inner products

〈x, y〉s := 〈Lsx, Lsy〉H ,

and the corresponding norms

‖x‖s := ‖Lsx‖H .

The Hilbert space Hs denotes the completion of H with respect to the norm ‖·‖s. The family (Hs)s∈R is called the Hilbert scale induced by L. A Hilbert scale (Hs)s∈R possesses the following properties (see [2], [5] and [18]):

• Hs is compactly embedded in Hr if s ≥ r and is dense there.

• If s ≥ 0 then H−s is the dual space of Hs. H−s is equipped with the following inner product:

〈x, y〉−s := 〈L−sx, L−sy〉H .


• Let −∞ < q < r < s < ∞ and x ∈ Hs. Then the interpolation inequality

‖x‖r ≤ ‖x‖q^((s−r)/(s−q)) ‖x‖s^((r−q)/(s−q))

holds.
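The interpolation inequality can be sanity-checked in a finite-dimensional model of a Hilbert scale; the toy operator L = diag(1, ..., d) on R^d below is an assumption for illustration, not from the thesis.

```python
import numpy as np

# Finite-dimensional model of a Hilbert scale: L = diag(1, ..., d) on R^d,
# with ||x||_s := ||L^s x||. L is self-adjoint, positive and satisfies
# <Lx, x> >= ||x||^2 (gamma = 1).
rng = np.random.default_rng(3)
d = 10
ell = np.arange(1, d + 1, dtype=float)

def scale_norm(x, s):
    return np.linalg.norm(ell ** s * x)

q, r, s = -1.0, 0.5, 2.0           # any q < r < s
for _ in range(100):
    x = rng.standard_normal(d)
    lhs = scale_norm(x, r)
    rhs = (scale_norm(x, q) ** ((s - r) / (s - q))
           * scale_norm(x, s) ** ((r - q) / (s - q)))
    assert lhs <= rhs * (1 + 1e-12)   # interpolation inequality
```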

1.6 Generalized Gaussian Random Variables.

Let N be the intersection of separable Hilbert spaces (Hs, ‖·‖s), i.e.

N = ⋂_{k=0}^∞ Hk,

where Hs+1 ⊂ Hs and N is dense in each Hs. The topology of N is the projective limit topology. Let N∗ be the dual of N equipped with the weak topology. The space N∗ can be viewed as the union of the duals of the Hn, i.e.

N∗ = ⋃_{k=0}^∞ H−k.

Note: The family (Hs)s∈R can be chosen as a Hilbert scale.

Definition 1.6.1. Let (Ω,F ,P) be a probability space. A continuous mapping

η : N → L2(Ω,F ,P)

is called a generalized Gaussian random variable if the random variables η(u) are Gaussian for all u ∈ N.

The mean value and the covariance operator admit the following representations, respectively:

Eη(u) = m(u) = ⟨u, m⟩,   m ∈ N∗,
E((η(u) − m(u))(η(v) − m(v))) = ⟨u, Σv⟩,   Σ : N → N∗.

Proposition 1.6.1. Let η be a generalized Gaussian random variable. Then there exists n0 ∈ N such that η belongs to H−n ⊂ N∗ with probability one for all n ≥ n0. The covariance operator Σ extends continuously to a nuclear mapping

Σ : Hn → H−n, n ≥ n0.

For more details see [15] and [19].


1.7 Hodrick-Prescott Filter History.

In this section we give a brief history of the Hodrick-Prescott filter. The procedure for filtering a trend as a smooth curve from claims data has a long history in actuarial science. The real starting point of this filter was the proposal of Leser (1961), building on the graduation method developed by Whittaker (1923) and Henderson (1924); later the filter was introduced into economics by Hodrick and Prescott in 1980 and 1997. Since then the Hodrick-Prescott filter has been widely used in economics and finance to estimate and predict business cycles and trends in macroeconomic time series. For a comparison of different filters as regards their applications in data analysis, see [16] and the references therein.

Hodrick and Prescott define a trend y = (y1, y2, ..., yT) of a time series x = (x1, x2, ..., xT) as the minimizer of

∑_{t=1}^T (xt − yt)² + α ∑_{t=3}^T (yt − 2yt−1 + yt−2)²,   (1.1)

with respect to y, for an appropriately chosen positive parameter α, called the smoothing parameter. The first term in (1.1) measures goodness-of-fit, and the second term is a measure of the degree of smoothness, which penalizes changes in the growth rate of the trend component. The program EViews recommends α = 1600 when the time series x represents the log of U.S. real GNP. This value is also widely popular in the analysis of quarterly macroeconomic data. Assume the components of the residual (noise) u = x − y and the components of the signal v, defined by

vt = yt − 2yt−1 + yt−2 := D²yt,   t = 3, 4, ..., T,

to be independent and Gaussian, where D² stands for the second-order difference operator built from backward shifts. The action of D² can be written in matrix form as (Py)(t) = D²yt, t = 3, 4, ..., T, where P is the following (T−2) × T matrix:

    ( 1 −2  1  0  ...  ...  0 )
    ( 0  1 −2  1  ...  ...  0 )
P = ( 0  0  1 −2   1  ...  0 )
    ( ...                 ... )
    ( 0  0  0  0   1  −2   1 )

Hence the time series of observations x satisfies the following linear mixed model:


x = y + u,
Py = v,   (1.2)

where u ∼ N(0, σu²IT) and v ∼ N(0, σv²IT−2).

To find an explicit expression for the trend y(α, x) which minimizes (1.1), we introduce the following well-known result (see e.g. [4]):

Proposition 1.7.1. Let M1, M2 be two non-negative matrices such that M1 + M2 is invertible. Then, for each fixed x, the minimizer

arg min_y [(x − y)′M1(x − y) + y′M2y]

is given by y(M1, M2, x) = (M1 + M2)−1M1x.

Since P is of rank T − 2, the matrix PP′ is invertible. Applying Proposition 1.7.1 (the matrix IT + αP′P is positive definite) we obtain the unique solution

y(α, x) = (IT + αP′P)−1x.   (1.3)
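The closed form (1.3) translates directly into code. The sketch below (illustrative, not from the thesis; the synthetic quadratic trend and noise level are assumptions) builds the second-difference matrix P and solves the linear system:

```python
import numpy as np

# Hodrick-Prescott trend via the closed form (1.3): y = (I_T + alpha P'P)^(-1) x.
def hp_trend(x, alpha=1600.0):
    T = len(x)
    P = np.zeros((T - 2, T))       # second-difference matrix, rows (1, -2, 1)
    for t in range(T - 2):
        P[t, t:t + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(T) + alpha * (P.T @ P), x)

# Synthetic example: the filtered series should be closer to the smooth
# component than the raw observations are.
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 120)
trend = 3.0 * t ** 2
x = trend + 0.05 * rng.standard_normal(t.size)
y = hp_trend(x, alpha=1600.0)
assert np.mean((y - trend) ** 2) < np.mean((x - trend) ** 2)
```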

Hodrick and Prescott (1980) and later Schlicht (2005) introduced a way to estimate the smoothing parameter α. The optimal α is the one for which the optimal solution y(α, x) in (1.3) is the best predictor of y given the time series x, i.e.

y(α, x) = E[y|x].   (1.4)

In order to estimate the smoothing parameter, the joint distribution of (x, y) is obviously needed. That explains why Hodrick and Prescott assume Gaussian distributions for both the noise and the signal. Moreover, in this case the conditional expectation can be computed explicitly, which is not always the case. Schlicht (2005) gave a characterization of the optimal smoothing parameter α as the so-called noise-to-signal ratio, i.e. the ratio of the variance of the noise to the variance of the signal.

Based on the criterion (1.4), Dermoune et al. (2008) derived a consistent estimator of the noise-to-signal ratio α, and non-asymptotic confidence intervals for α given a confidence level. Furthermore, Dermoune et al. (2009) suggest an optimality criterion for choosing the smoothing parameter for the HP filter. The smoothing parameter α is chosen to be the one which minimizes the difference between the optimal solution y(α, x) and the best predictor of the trend y given the time series x, i.e.

α∗ = arg min_{α>0} ‖E[y|x] − y(α, x)‖².   (1.5)


Moreover, the optimality of this smoothing parameter has been studied in two different cases:

• under singular-value parametrization of the trend.

• under the initial-value parametrization of the trend.

In addition, Dermoune et al. suggested an extension of the HP filter to the multivariate case, where they show, using the singular-value parametrization of the trend, that the optimal smoothing parameters satisfying (1.5) constitute a whole set of pairs of positive definite matrices.


Bibliography

[1] Albert, A. (1972): Regression and the Moore-Penrose Pseudoinverse. Volume 94 of Mathematics in Science and Engineering, Academic Press.

[2] Bonic, R. (1967): Some Properties of Hilbert Scales. Proceedings of the American Mathematical Society, Vol. 18, No. 6.

[3] Dermoune, A., Djehiche, B. and Rahmania, N. (2008): Consistent Estimator of the Smoothing Parameter in the Hodrick-Prescott Filter. J. Japan Statist. Soc., Vol. 38, No. 2, pp. 225-241.

[4] Dermoune, A., Djehiche, B. and Rahmania, N. (2009): Multivariate Extension of the Hodrick-Prescott Filter - Optimality and Characterization. Studies in Nonlinear Dynamics & Econometrics, 13, pp. 1-33.

[5] Engl, H. W., Hanke, M. and Neubauer, A. (1996): Regularization of Inverse Problems. Mathematics and its Applications, Vol. 375, Kluwer Academic Publishers, Dordrecht.

[6] Ferraty, F. and Vieu, P. (2006): Nonparametric Functional Data Analysis: Methods, Theory, Applications and Implementations. Springer-Verlag, London.

[7] Hida, T. (1980): Brownian Motion. Springer-Verlag, New York-Berlin.

[8] Hodrick, R. and Prescott, E. C. (1997): Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit and Banking, 29(1), 1-16.

[9] Kaipio, J. and Somersalo, E. (2004): Statistical and Computational Inverse Problems. Applied Mathematical Sciences, Vol. 160, Springer, Berlin.

[10] Kokoszka, P. (2012): ISRN Probability and Statistics, Volume 2012, Article ID 958254, 30 pages.

[11] Kuo, H.-H. (1975): Gaussian Measures in Banach Spaces. Lecture Notes in Mathematics, Vol. 463, Springer-Verlag, Berlin.

[12] Lehtinen, M. S., Päivärinta, L. and Somersalo, E. (1989): Linear inverse problems for generalized random variables. Inverse Problems, 5, 599-612.


[13] Levitin, D. J., Nuzzo, R. L., Vines, B. W. and Ramsay, J. O. (2007): Introduction to functional data analysis. Canadian Psychology, 48(3), 135-155.

[14] Martens, J. B. (2003): Image Technology Design: A Perceptual Approach. Volume 735 of the Kluwer International Series in Engineering and Computer Science, Springer.

[15] Müller, H.-G. and Stadtmüller, U. (2005): Generalized functional linear models. The Annals of Statistics, 33, pp. 774-805.

[16] Pollock, D. S. G. (2000): Trend estimation and de-trending via rational square-wave filters. Journal of Econometrics, 99, 317-334.

[17] Ramsay, J. O. and Silverman, B. W. (1997): Functional Data Analysis. Springer-Verlag, New York.

[18] Reed, M. and Simon, B. (1980): Methods of Modern Mathematical Physics, I, revised edition. Academic Press.

[19] Rozanov, Ju. A. (1968): Infinite-dimensional Gaussian distributions. Proc. Steklov Inst. Math., 108 (Engl. transl. 1971, Providence, RI: AMS).

[20] Scherzer, O. (2010): Handbook of Mathematical Methods in Imaging. Springer Reference.

[21] Schlicht, E. (2005): Estimating the Smoothing Parameter in the So-Called Hodrick-Prescott Filter. J. Japan Statist. Soc., Vol. 35, No. 1, 99-119.

[22] Skorohod, A. V. (1974): Integration in Hilbert Spaces. Springer-Verlag, Berlin.

[23] Stuart, A. M. (2010): Inverse problems: a Bayesian perspective. Acta Numerica, 19, 451-559.


Chapter 2

Results

In this chapter we present a summary of the results of this thesis. In this thesis an extension of the Hodrick-Prescott filter to functional data is proposed. The functional Hodrick-Prescott filter is defined through a mixed model of the same form as (1.2), where the operator P is replaced by a compact operator A between two Hilbert spaces:

x = y + u,
Ay = v,   (2.1)

where x is a functional time series of observables, and the noise u as well as the signal v are Gaussian with trace-class covariance operators, i.e. u ∼ N(0, Σu) and v ∼ N(0, Σv). The optimal smooth trend associated with x is the minimizer of

‖x − y‖²H1 + ⟨Ay, BAy⟩H2,   (2.2)

with respect to y, where B is a regularization (smoothing) operator playing the role of the smoothing parameter α in (1.1). The main result of this study is the characterization of the optimal smoothing operator B, which can be formalized in the following theorem, the infinite-dimensional extension of the result of Dermoune et al. (2009).

Theorem 2.0.2. Under some necessary assumptions, for all x ∈ H1, the smoothing operator (which is linear, compact and injective)

Bh := (AA∗)−1AΣuA∗Σv−1h,   (2.3)

is the unique operator which satisfies

B = arg min_B ‖y(B, x) − E[y|x]‖H1,

where the minimum is taken with respect to all linear bounded operators whichsatisfy a positivity condition.

Furthermore, we have

y(B, x)− E[y|x] = (IH1 −Π)(x− E[x]), (2.4)


and its covariance operator is

cov(y(B, x) − E[y|x]) = (IH1 − Π)Σu.   (2.5)

In particular,

E(‖y(B, x) − E[y|x]‖²H1) = trace((IH1 − Π)Σu),   (2.6)

where Π := A∗(AA∗)−1A.
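A finite-dimensional sanity check of the theorem is possible under simplifying assumptions that are not made in the thesis itself: take H1 = H2 = R^n, A square and invertible (so that Π = A∗(AA∗)−1A = I), and Σu a multiple of the identity (so that A∗BA is symmetric positive definite). Then the regularized trend y(B, x) = (I + A∗BA)−1x minimizing (2.2) should coincide with the conditional mean E[y|x] of the Gaussian model (2.1):

```python
import numpy as np

# Sanity check of the optimal smoothing operator (2.3), under simplifying
# assumptions: H1 = H2 = R^n, A square invertible, Sigma_u = s_u^2 * I.
rng = np.random.default_rng(5)
n = 6
A = rng.standard_normal((n, n)) + 4.0 * np.eye(n)   # invertible in practice
Sigma_u = 0.5 * np.eye(n)
M = rng.standard_normal((n, n))
Sigma_v = M @ M.T + np.eye(n)                       # symmetric positive definite

# B = (AA*)^{-1} A Sigma_u A* Sigma_v^{-1}, the operator of equation (2.3)
B = np.linalg.solve(A @ A.T, A @ Sigma_u @ A.T) @ np.linalg.inv(Sigma_v)

# Regularized trend y(B, x) = (I + A* B A)^{-1} x
x = rng.standard_normal(n)
y_B = np.linalg.solve(np.eye(n) + A.T @ B @ A, x)

# Conditional mean of the zero-mean Gaussian model x = y + u, A y = v:
# y = A^{-1} v, so Sigma_y = A^{-1} Sigma_v A^{-T} and
# E[y|x] = Sigma_y (Sigma_y + Sigma_u)^{-1} x.
Ainv = np.linalg.inv(A)
Sigma_y = Ainv @ Sigma_v @ Ainv.T
cond_mean = Sigma_y @ np.linalg.solve(Sigma_y + Sigma_u, x)

assert np.allclose(y_B, cond_mean)
```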

The second main result is the characterization of the optimal smoothing operator B when the covariance operators of the signal and the noise are not trace class; this includes the important case where the signal and the noise are white noise. Viewing these Gaussian variables as generalized random variables, and under some assumptions, we find an appropriate Hilbert scale on which the covariance operators can be maximally extended to trace-class operators. An explicit expression for the smoothing operator B is given, of a form similar to (2.3) but on the appropriate domain.

The third result of this thesis shows that the optimal smoothing operator B reduces to the noise-to-signal ratio when the noise and the signal are white noise, which coincides with the previous studies.


Chapter 3

Papers


Paper I

3.1 Functional Hodrick-Prescott Filter

Boualem Djehiche and Hiba Nassar
