using a hilbert-schmidt svd for stable kernel...

Using a Hilbert-Schmidt SVD forStable Kernel Computations

Greg Fasshauer Mike McCourt

Department of Applied MathematicsIllinois Institute of Technology

Partially supported by NSF Grant DMS–1115392

Midwest Numerical Analysis DayUniversity of Chicago

May 25, 2013

Greg Fasshauer Hilbert-Schmidt SVD 1

Outline

1 Fundamental Problem

2 Hilbert-Schmidt SVD and General RBF-QR Algorithm

3 Implementation for Compact Matérn Kernels

4 Application 1: Basic Function Approximation

5 Application 2: Choosing a Good Shape Parameter via LOOCV

6 Application 3: Covariance Matrix Determinant

7 Summary


Fundamental Problem

Kernel-based Interpolation

Given data (x i , yi)Ni=1, use a data-dependent linear function space

sf (x) =N∑

j=1

cjK (x ,x j), x ∈ Ω ⊆ Rd

with K : Ω× Ω→ R a positive definite reproducing kernel.

To find cj solve the interpolation equations

sf (x i) = f (x i) = yi , i = 1, . . . ,N.

Leads to linear system Kc = y with symmetric positive definite – oftenill-conditioned – system matrix

Kij = K (x i ,x j), i , j = 1, . . . ,N.


Fundamental Problem

RBFs (positive definite kernels) are attractive to work with because

Gaussians (and certain other RBFs) provide arbitrarily high &dimension-independent convergence rates for sufficiently “nice”data (i.e., with low effective dimension and coming from a smoothfunction) [F., Hickernell & Wozniakowski (2012)],

numerical instability of interpolation/approximation algorithms canbe overcome by using a “better” basis[F. & McCourt (2012), Fornberg & Wright (2004)],

interpolants with certain “flat” RBFs converge to polynomial(spline) interpolants[Driscoll & Fornberg (2002), Song, Riddle, F. & Hickernell (2012)].


Fundamental Problem

Using a Better Basis to Ensure Stability

Well-known idea in approximation theory (e.g., B-splines as stablebasis for piecewise polynomial splines [Schumaker (1981)])

Idea appears also in the RBF literature [Beatson et al. (2001)]

New idea: use eigenfunction expansions of positive definitekernels

[F. & McCourt (2012), F. (2011a), F. (2011b)], appeared implicitly in[Fornberg & Piret (2008)]

Use connection to Green’s kernels [F. & Ye (2011), F. & Ye (2013)],their eigenfunction expansions and classical Sturm-Liouvilletheory


Hilbert-Schmidt SVD and General RBF-QR Algorithm

Hilbert-Schmidt Theory

We assume that the kernel K has a Hilbert-Schmidt expansion (orMercer series expansion) of the form

K (x , z) =∞∑

n=1

λnϕn(x)ϕn(z),

where (λn, ϕn) are the L2-orthonormal eigenpairs of a Hilbert-Schmidtintegral operator TK : L2(Ω, ρ)→ L2(Ω, ρ) defined as

(TK f )(x) =

∫Ω

K (x , z)f (z)ρ(z)dz ,

where Ω ⊂ Rd and ‖K‖L2(Ω×Ω,ρ×ρ) <∞.



But we can’t compute with an infinite matrix, so we choose a truncationvalue M (aided by λn → 0 as n→∞, more later) and rewrite

K =

K (x1,x1) . . . K (x1,xN)...

...K (xN ,x1) . . . K (xN ,xN)

=

ϕ1(x1) . . . ϕM(x1)...

...ϕ1(xN) . . . ϕM(xN)

︸︷︷︸

=Φ

λ1

. . .λM

︸︷︷︸

=Λ

ϕ1(x1) . . . ϕ1(xN)...

...ϕM(x1) . . . ϕM(xN)

︸︷︷︸

=ΦT

Since

K (x i ,x j) =∞∑

n=1

λnϕn(x i)ϕn(x j) ≈M∑

n=1

λnϕn(x i)ϕn(x j)

accurate reconstruction of all entries of K will likely require M > N.



The matrix K is often ill-conditioned, so forming K and computing withit is not a good idea.

The eigen-decompositionK = ΦΛΦT

provides an accurate (elementwise) approximation of K without everforming it.

However, it is not recommended to use this decomposition either sinceall of the ill-conditioning associated with K is still present – sitting in thematrix Λ.

We now use mostly standard numerical linear algebra to develop theHilbert-Schmidt SVD and a general RBF-QR algorithm.



There are at least two ways to interpret our algorithm:

We find an invertible M such that Ψ = KM−1 is better conditionedthan K.We form a Hilbert-Schmidt SVD of the matrix K, i.e.,

K = ΨΛ1ΦT1

whereΛ1 is a diagonal matrix of Hilbert-Schmidt singular values,Ψ and Φ1 are matrices generated by orthogonal eigenfunctions (butnot orthogonal matrices).

RemarkThe matrix Ψ is the same for both interpretations.

It can be computed stably.The notation above implies M = Λ1ΦT

1 .We get a well-conditioned linear system Ψb = y (where b = Mc).



RemarkWe can think of the n-th column of Φ as a sample of the n-theigenfunction obtained at the interpolation locations x1, . . . ,xN .Even though the Hilbert-Schmidt SVD

K = ΨΛ1ΦT1

looks very much like a regular SVD

K = UΣVT

the two are fundamentally different since for the Hilbert-SchmidtSVD we never form and factor the matrix K, and since Ψ and Φ1are not orthogonal.The notation Λ1 might suggest a regularization of the matrix K, butthat is not the case. Such a regularization will occur only forsufficiently small truncation length M.



Details of the Hilbert-Schmidt SVD

Assume M > N, so that Φ is “short and fat” and partition Φ:ϕ1(x1) . . . ϕN(x1) ϕN+1(x1) . . . ϕM(x1)...

......

...ϕ1(xN) . . . ϕN(xN) ϕN+1(xN) . . . ϕM(xN)

=

Φ1︸︷︷︸N×N

Φ2︸︷︷︸N×(M−N)

.

Then

K = ΦΛΦT

= Φ

(Λ1

Λ2

)(ΦT

1ΦT

2

)= Φ

(IN

Λ2ΦT2 Φ−T

1 Λ−11

)︸︷︷︸

= Ψ

Λ1ΦT1︸︷︷︸

=M



Taking a closer look at the matrix Ψ, we see that

Ψ = (Φ1 Φ2)

(IN

Λ2ΦT2 Φ−T

1 Λ−11

)= Φ1 + Φ2

[Λ2ΦT

2 Φ−T1 Λ−1

1

],

which we recognize asthe matrix Φ1 corresponding to samples of the first Neigenfunctionsplus a correction term formed by linear combinations of samplesof higher eigenfunctions.

We can interpret this as having a new basis ψ1(·), . . . , ψN(·) for theinterpolation space span K (·,x1), . . . ,K (·,xN consisting of theappropriately corrected first N eigenfunctions.The data dependence of this new basis is reflected in the terms insidethe brackets, i.e., the correction matrix is data dependent.



The particular structure of the correction matrix[Λ2ΦT

2 Φ−T1 Λ−1

1

]is

important for the success of the method:

Since λn → 0 as n→∞ the eigenvalues in Λ2 are smaller thanthose in Λ1 and so ΦT

2 Φ−T1 does not blow up.

Moreover, the multiplications by Λ2 and Λ−11 can be done

simultaneously. This avoids stability issues associated withpossible

underflow (for the Gaussian kernel, entries in Λ2 are as small asε2M−2) oroverflow (for the Gaussian, entries in Λ−1

1 are as large as ε−2N−2).



The QR in RBF-QR

Additional stability in the computation of the correction matrix[Λ2ΦT

2 Φ−T1 Λ−1

1

],

in particular, in the formation of ΦT2 Φ−T

1 , is achieved via a QRdecomposition of Φ, i.e.,

(Φ1 Φ2

)= Q

(R1︸︷︷︸

N×N

R2︸︷︷︸N×(M−N)

)

with orthogonal N × N matrix Q and upper triangular matrix R1.Then we have

ΦT2 Φ−T

1 = RT2 QT QR−T

1 = RT2 R−T

1 .

This idea appeared in [Fornberg & Piret (2008)].



Summary of Method

Instead of solving the “original” interpolation problem withill-conditioned matrix K

Kc = y ,

leading to inaccurate coefficients which then need to be multipliedagainst poorly conditioned basis functions, we now solve

Ψb = y

for a new set of coefficients which we then evaluate via

sf (x) =N∑

j=1

bjψj(x),

i.e., using the new basis.


Implementation for Compact Matérn Kernels

General Implementation

It is crucial to know the Hilbert-Schmidt expansion of K :

K (x , z) =∞∑

n=1

λnϕn(x)ϕn(z)

An expansion for multivariate Gaussian kernels was used in[F., Hickernell & Wozniakowski (2012)] to provedimension-independent convergence rates[F. & McCourt (2012)] to obtain and implement a stable GaussQRalgorithm.

We now discuss the implementation for generalizations of theBrownian bridge kernel

K (x , z) = min(x , z)− xz, x , z ∈ [0,1],

which we call compact Matérn kernels.Greg Fasshauer Hilbert-Schmidt SVD 16


We define compact Matérn kernels as Green’s kernels of(− d2

dx2 + ε2I

)βK (x , z) = δ(x − z), x , z ∈ [0,1], β ∈ N, ε ≥ 0,

subject to

d2ν

dx2ν K (0, z) =d2ν

dx2ν K (1, z) = 0, ν = 0, . . . , β − 1.

If this operator were considered on R, with appropriate decayconditions at ±∞, the associated Green’s kernel would be the Matérnkernel, e.g., e−ε|x−z| for β = 1.

We choose the name compact Matérn because we restrict thisoperator to the compact domain [0,1], or [0,L] in a more general case.



We define compact Matérn kernels as Green’s kernels of(− d2

dx2 + ε2I

)βK (x , z) = δ(x − z), x , z ∈ [0,1], β ∈ N, ε ≥ 0,

subject to

d2ν

dx2ν K (0, z) =d2ν

dx2ν K (1, z) = 0, ν = 0, . . . , β − 1.

The Hilbert-Schmidt expansion for compact Matérn kernels is of theform

Kβ,ε(x , z) =∞∑

n=1

2(n2π2 + ε2

)β sin (nπx) sin (nπz) ,

i.e., the eigenvalues and eigenfunctions are

λn =1(

n2π2 + ε2)β , ϕn(x) =

√2 sin (nπx) . (1)



Clearly,the eigenfunctions are bounded by

√2,

and, for a fixed value of ε, the eigenvalues decay as n−2β.

Therefore the truncation length M needed for accurate representationof the entries of K can be easily determined as a function of β and ε:

To ensure that we keep the first M significant terms we take M suchthat

λM < εmachλN , M > N.

Using the explicit representation of the eigenvalues, we solve for M:

M(β, ε; εmach) =

⌈1π

√ε−1/βmach (N2π2 + ε2)− ε2

⌉.



Program (MaternQRSolve.m)function yy = MaternQRSolve(x,y,ep,beta,xx)phifunc = @(n,x) sqrt(2)*sin(pi*x*n);N = length(x)+2;M = ceil(1/pi*sqrt(eps^(-1/beta)*(N^2*pi^2+ep^2)-ep^2));n = 1:M;Lambda = diag(((n*pi).^2+ep^2).^(-beta));Phi = phifunc(n,x);[Q,R] = qr(Phi);R1 = R(:,1:N); R2 = R(:,N+1:end);Rhat = R1\R2;Lambda1 = Lambda(1:N,1:N);Lambda2 = Lambda(N+1:M,N+1:M);Rbar = Lambda2*Rhat’/Lambda1;Psi = Phi*[eye(N);Rbar];b = Psi\y;Phi_eval = phifunc(n,xx);yy = Phi_eval*[eye(N);Rbar]*b;

end


Application 1: Basic Function Approximation

Standard RBF vs. MatérnQR InterpolationWe use

Kβ,ε with β = 7 and ε = 1N = 21 uniform samples of f (x) = (1− 4x)14

+ (4x − 3)14+


Application 1: Basic Function Approximation

Error vs. Shape ParameterWe use

Kβ,ε with β = 2 and variable ε (Note: ε = 0→ cubic natural spline)N uniform samples off (x) = (1

2 − δ)−4 (x − δ)2+ (1− δ − x)2

+ e−36(x−0.4)2, δ = 0.0567


Application 2: Determinant of Covariance Matrix

Likelihood Functions for Gaussian Processes

The kernel interpolation problem has an analog in statistics calledkriging.

If, instead of trying to recover a function, we treat our scattered data asone realization of a Gaussian Process, we can prescribe a positivedefinite kernel K as the presumed covariance between realizations ofthe Gaussian Process.

The likelihood function of a zero-mean Gaussian Process (theprobability of the data (xi , yi)

Ni=1 given the kernel parameters θ = (ε, β))

is

P(y |ε) = (2π det(K))−1/2 exp(−1

2yT K−1y

)The matrix K is the kernel interpolation matrix from before.



Maximum Likelihood Estimation (MLE)

We can use MLE to determine an “optimal” shape parameter.Maximizing

P(y |ε) = (2π det(K))−1/2 exp(−1

2yT K−1y

)requires evaluating det(K), which, given its ill-conditioning, is a diceyproposition.

We use Hilbert-Schmidt SVD to compute this determinant:

det(K) = det(ΨΛ1ΦT1 ) = det(Ψ) det(Λ1) det(Φ1)

The values det(Ψ) and det(Φ1) can be computed stably (we presume)using standard techniques.

Evaluating det(Λ1) can be done analytically because it is a diagonalmatrix populated by our Hilbert-Schmidt eigenvalues.



Stable Determinant Evaluation

Figure: N = 30 equally spaced points on [0,1], β = 8


Application 3: Choosing a Good Shape Parameter via LOOCV

LOOCV: How it works

Let s[k ]f be the kernel interpolant to the training data

f1, . . . , fk−1, fk+1, . . . , fN, i.e.,

s[k ]f (x) =

N∑j=1j 6=k

c[k ]j K (x ,x j),

such that

s[k ]f (x i) = fi , i = 1, . . . , k − 1, k + 1, . . . ,N,

and let ek (ε) be the error

ek (ε) = fk − s[k ]f (xk )

at the one validation point xk not used to determine the interpolant.Find

εopt = argminε‖e(ε)‖, e = [e1, . . . ,eN ]T



Efficient formula for the LOOCV criterion:

ek (ε) =ck

K−1kk

, k = 1, . . . ,N, (2)

withck : k th coefficient of full interpolant sf

K−1kk : k th diagonal element of inverse of corresponding

interpolation matrixwas given by [Rippa (1999)] (and also [Wahba (1990)]).

Remark

Since both ck and K−1 need to be computed only once for eachvalue of ε this results in O(N3) computational complexity.All entries in the error vector e can be computed in a singlestatement in MATLAB if we vectorize the component formula (2):

errorvector = (invIM*rhs)./diag(invIM);



LOOCV via Hilbert-Schmidt SVDSince the LOOCV criterion requires the inverse of K (which may bevery ill-conditioned), we use the Hilbert-Schmidt SVD

K = ΨΛ1ΦT1 ,

accurate to within machine precision (i.e., truncation length M > N).Then

K−1 = Φ−T1 Λ−1

1 Ψ−1.

The matrices Ψ and Φ1 are usually “well-behaved”.But Λ1 by itself still contains the basic ill-conditioning, i.e.,potentially very small eigenvalues.We therefore use the pseudoinverse Λ†1 instead of Λ−1

1 , i.e., wedrop some of the smallest eigenvalues of the kernel K .

RemarkNote that truncating the Hilbert-Schmidt SVD is fundamentally differentfrom performing a standard SVD of K and then truncating that.



Example (Optimal ε-β surfaces using MaternQR)

Determine the optimal ε and β for interpolation with compact Matérnkernels

Kβ,ε(x , z) =∞∑

n=1

2(

n2π2 + ε2)−β

sin (nπx) sin (nπz)

on [0,1] using N = 24 evenly spaced samples from

fγ(x) =1(1

2 − γ)2 (x − γ)2

+(1− γ − x)2+e−36(x−0.4)2

,

with γ = 0.0567.



Optimal ε-β surfaces using MaternQR (cont.)

Error of true solution with εopt = 18.047, βopt = 4 (left)Third out CV with εopt = 14.251, βopt = 6 (right)



Optimal ε-β surfaces using MaternQR (cont.)

Half out CV with εopt = 0.1, βopt = 8 (left)LOOCV with εopt = 10.0, βopt = 7 (right).

For a smooth kernel (large β) we see instabilities for LOOCV for smallε – even though the Hilbert-Schmidt SVD is used.


Summary

Hilbert-Schmidt/Mercer expansion and Hilbert-Schmidt SVDprovide a general and transparent framework for stable kernelcomputationImplementation depends on availability of Mercer series forspecific kernels.

some eigenfunctions are easier to handle than othersVast applications

function interpolation/approximationnumerical solution of PDEs (collocation, MFS, MPS)parameter estimation (MLE, GCV)...


Appendix References

References I

Schumaker, L. L. (1981).Spline Functions: Basic Theory.John Wiley & Sons, New York (reprinted by Krieger Publishing 1993).

Wahba, G. (1990).Spline Models for Observational Data.CBMS-NSF Regional Conference Series in Applied Mathematics 59, SIAM(Philadelphia).

Beatson, R. K., Light, W. A. and Billings, S. (2001).Fast solution of the radial basis function interpolation equations: domaindecomposition methods.SIAM J. Sci. Comput. 22/5, pp. 1717–1740.

Driscoll, T. A., and Fornberg, B.,Interpolation in the limit of increasingly flat radial basis functions.Comput. Math. Appl. 43/3-5 (2002), 413–422.


Appendix References

References II

Fasshauer, G. E. (2011).Positive definite kernels: Past, present and future.Dolomites Research Notes on Approximation 4 (2011), 21–63.

Fasshauer, G. E. (2011).Green’s functions: taking another look at kernel approximation, radial basisfunctions and splines.In Approximation Theory XIII: San Antonio 2010, M. Neamtu and L. Schumaker(eds.), Springer Proceedings in Mathematics 13, pp. 37–63.

Fasshauer, G. E., Hickernell, F. J. and Wozniakowski, H.Rate of convergence and tractability of the radial function approximation problem.SIAM J. Numer. Anal. 50/1 (2012), 247–271.

Fasshauer, G. E. and McCourt, M. J. (2012).Stable evaluation of Gaussian RBF interpolants.SIAM J. Scient. Comput. 34/2, pp. A737–A762.


Appendix References

References III

Fasshauer, G. E. and Ye, Q. (2011).Reproducing kernels of generalized Sobolev spaces via a Green functionapproach with distributional operators,Numer. Math. 119/3, pp. 585–611.

Fasshauer, G. E. and Ye, Q. (2013).Reproducing kernels of Sobolev spaces via a Green function approach withdifferential operators & boundary operators,Adv. Comp. Math. 38/4 (2013), pp. 891–921.

Fornberg, B. and Piret, C. (2008).A stable algorithm for flat radial basis functions on a sphere.SIAM J. Scient. Comput. 30/1, pp. 60–80.

Fornberg, B. and Wright, G. (2004).Stable computation of multiquadric interpolants for all values of the shapeparameter.Comput. Math. Appl. 48/5-6, pp. 853–867.


Appendix References

References IV

Rippa, S. (1999).An algorithm for selecting a good value for the parameter c in radial basisfunction interpolation.Adv. in Comput. Math. 11, pp. 193–210.

Song, G., Riddle, J., Fasshauer, G. E. and Hickernell, F. J.Multivariate interpolation with increasingly flat radial basis functions of finitesmoothness.Adv. in Comput. Math. 36/3 (2012), 485–501.


using a hilbert-schmidt svd for stable kernel...

Documents