using a hilbert-schmidt svd for stable kernel...
TRANSCRIPT
Using a Hilbert-Schmidt SVD forStable Kernel Computations
Greg Fasshauer Mike McCourt
Department of Applied MathematicsIllinois Institute of Technology
Partially supported by NSF Grant DMS–1115392
Midwest Numerical Analysis DayUniversity of Chicago
May 25, 2013
Greg Fasshauer Hilbert-Schmidt SVD 1
Outline
1 Fundamental Problem
2 Hilbert-Schmidt SVD and General RBF-QR Algorithm
3 Implementation for Compact Matérn Kernels
4 Application 1: Basic Function Approximation
5 Application 2: Choosing a Good Shape Parameter via LOOCV
6 Application 3: Covariance Matrix Determinant
7 Summary
Greg Fasshauer Hilbert-Schmidt SVD 2
Fundamental Problem
Kernel-based Interpolation
Given data (x i , yi)Ni=1, use a data-dependent linear function space
sf (x) =N∑
j=1
cjK (x ,x j), x ∈ Ω ⊆ Rd
with K : Ω× Ω→ R a positive definite reproducing kernel.
To find cj solve the interpolation equations
sf (x i) = f (x i) = yi , i = 1, . . . ,N.
Leads to linear system Kc = y with symmetric positive definite – oftenill-conditioned – system matrix
Kij = K (x i ,x j), i , j = 1, . . . ,N.
Greg Fasshauer Hilbert-Schmidt SVD 3
Fundamental Problem
RBFs (positive definite kernels) are attractive to work with because
Gaussians (and certain other RBFs) provide arbitrarily high &dimension-independent convergence rates for sufficiently “nice”data (i.e., with low effective dimension and coming from a smoothfunction) [F., Hickernell & Wozniakowski (2012)],
numerical instability of interpolation/approximation algorithms canbe overcome by using a “better” basis[F. & McCourt (2012), Fornberg & Wright (2004)],
interpolants with certain “flat” RBFs converge to polynomial(spline) interpolants[Driscoll & Fornberg (2002), Song, Riddle, F. & Hickernell (2012)].
Greg Fasshauer Hilbert-Schmidt SVD 4
Fundamental Problem
Using a Better Basis to Ensure Stability
Well-known idea in approximation theory (e.g., B-splines as stablebasis for piecewise polynomial splines [Schumaker (1981)])
Idea appears also in the RBF literature [Beatson et al. (2001)]
New idea: use eigenfunction expansions of positive definitekernels
[F. & McCourt (2012), F. (2011a), F. (2011b)], appeared implicitly in[Fornberg & Piret (2008)]
Use connection to Green’s kernels [F. & Ye (2011), F. & Ye (2013)],their eigenfunction expansions and classical Sturm-Liouvilletheory
Greg Fasshauer Hilbert-Schmidt SVD 5
Hilbert-Schmidt SVD and General RBF-QR Algorithm
Hilbert-Schmidt Theory
We assume that the kernel K has a Hilbert-Schmidt expansion (orMercer series expansion) of the form
K (x , z) =∞∑
n=1
λnϕn(x)ϕn(z),
where (λn, ϕn) are the L2-orthonormal eigenpairs of a Hilbert-Schmidtintegral operator TK : L2(Ω, ρ)→ L2(Ω, ρ) defined as
(TK f )(x) =
∫Ω
K (x , z)f (z)ρ(z)dz ,
where Ω ⊂ Rd and ‖K‖L2(Ω×Ω,ρ×ρ) <∞.
Greg Fasshauer Hilbert-Schmidt SVD 6
Hilbert-Schmidt SVD and General RBF-QR Algorithm
But we can’t compute with an infinite matrix, so we choose a truncationvalue M (aided by λn → 0 as n→∞, more later) and rewrite
K =
K (x1,x1) . . . K (x1,xN)...
...K (xN ,x1) . . . K (xN ,xN)
=
ϕ1(x1) . . . ϕM(x1)...
...ϕ1(xN) . . . ϕM(xN)
︸ ︷︷ ︸
=Φ
λ1
. . .λM
︸ ︷︷ ︸
=Λ
ϕ1(x1) . . . ϕ1(xN)...
...ϕM(x1) . . . ϕM(xN)
︸ ︷︷ ︸
=ΦT
Since
K (x i ,x j) =∞∑
n=1
λnϕn(x i)ϕn(x j) ≈M∑
n=1
λnϕn(x i)ϕn(x j)
accurate reconstruction of all entries of K will likely require M > N.
Greg Fasshauer Hilbert-Schmidt SVD 7
Hilbert-Schmidt SVD and General RBF-QR Algorithm
The matrix K is often ill-conditioned, so forming K and computing withit is not a good idea.
The eigen-decompositionK = ΦΛΦT
provides an accurate (elementwise) approximation of K without everforming it.
However, it is not recommended to use this decomposition either sinceall of the ill-conditioning associated with K is still present – sitting in thematrix Λ.
We now use mostly standard numerical linear algebra to develop theHilbert-Schmidt SVD and a general RBF-QR algorithm.
Greg Fasshauer Hilbert-Schmidt SVD 8
Hilbert-Schmidt SVD and General RBF-QR Algorithm
There are at least two ways to interpret our algorithm:
We find an invertible M such that Ψ = KM−1 is better conditionedthan K.We form a Hilbert-Schmidt SVD of the matrix K, i.e.,
K = ΨΛ1ΦT1
whereΛ1 is a diagonal matrix of Hilbert-Schmidt singular values,Ψ and Φ1 are matrices generated by orthogonal eigenfunctions (butnot orthogonal matrices).
RemarkThe matrix Ψ is the same for both interpretations.
It can be computed stably.The notation above implies M = Λ1ΦT
1 .We get a well-conditioned linear system Ψb = y (where b = Mc).
Greg Fasshauer Hilbert-Schmidt SVD 9
Hilbert-Schmidt SVD and General RBF-QR Algorithm
RemarkWe can think of the n-th column of Φ as a sample of the n-theigenfunction obtained at the interpolation locations x1, . . . ,xN .Even though the Hilbert-Schmidt SVD
K = ΨΛ1ΦT1
looks very much like a regular SVD
K = UΣVT
the two are fundamentally different since for the Hilbert-SchmidtSVD we never form and factor the matrix K, and since Ψ and Φ1are not orthogonal.The notation Λ1 might suggest a regularization of the matrix K, butthat is not the case. Such a regularization will occur only forsufficiently small truncation length M.
Greg Fasshauer Hilbert-Schmidt SVD 10
Hilbert-Schmidt SVD and General RBF-QR Algorithm
Details of the Hilbert-Schmidt SVD
Assume M > N, so that Φ is “short and fat” and partition Φ:ϕ1(x1) . . . ϕN(x1) ϕN+1(x1) . . . ϕM(x1)...
......
...ϕ1(xN) . . . ϕN(xN) ϕN+1(xN) . . . ϕM(xN)
=
Φ1︸︷︷︸N×N
Φ2︸︷︷︸N×(M−N)
.
Then
K = ΦΛΦT
= Φ
(Λ1
Λ2
)(ΦT
1ΦT
2
)= Φ
(IN
Λ2ΦT2 Φ−T
1 Λ−11
)︸ ︷︷ ︸
= Ψ
Λ1ΦT1︸ ︷︷ ︸
=M
Greg Fasshauer Hilbert-Schmidt SVD 11
Hilbert-Schmidt SVD and General RBF-QR Algorithm
Taking a closer look at the matrix Ψ, we see that
Ψ = (Φ1 Φ2)
(IN
Λ2ΦT2 Φ−T
1 Λ−11
)= Φ1 + Φ2
[Λ2ΦT
2 Φ−T1 Λ−1
1
],
which we recognize asthe matrix Φ1 corresponding to samples of the first Neigenfunctionsplus a correction term formed by linear combinations of samplesof higher eigenfunctions.
We can interpret this as having a new basis ψ1(·), . . . , ψN(·) for theinterpolation space span K (·,x1), . . . ,K (·,xN consisting of theappropriately corrected first N eigenfunctions.The data dependence of this new basis is reflected in the terms insidethe brackets, i.e., the correction matrix is data dependent.
Greg Fasshauer Hilbert-Schmidt SVD 12
Hilbert-Schmidt SVD and General RBF-QR Algorithm
The particular structure of the correction matrix[Λ2ΦT
2 Φ−T1 Λ−1
1
]is
important for the success of the method:
Since λn → 0 as n→∞ the eigenvalues in Λ2 are smaller thanthose in Λ1 and so ΦT
2 Φ−T1 does not blow up.
Moreover, the multiplications by Λ2 and Λ−11 can be done
simultaneously. This avoids stability issues associated withpossible
underflow (for the Gaussian kernel, entries in Λ2 are as small asε2M−2) oroverflow (for the Gaussian, entries in Λ−1
1 are as large as ε−2N−2).
Greg Fasshauer Hilbert-Schmidt SVD 13
Hilbert-Schmidt SVD and General RBF-QR Algorithm
The QR in RBF-QR
Additional stability in the computation of the correction matrix[Λ2ΦT
2 Φ−T1 Λ−1
1
],
in particular, in the formation of ΦT2 Φ−T
1 , is achieved via a QRdecomposition of Φ, i.e.,
(Φ1 Φ2
)= Q
(R1︸︷︷︸
N×N
R2︸︷︷︸N×(M−N)
)
with orthogonal N × N matrix Q and upper triangular matrix R1.Then we have
ΦT2 Φ−T
1 = RT2 QT QR−T
1 = RT2 R−T
1 .
This idea appeared in [Fornberg & Piret (2008)].
Greg Fasshauer Hilbert-Schmidt SVD 14
Hilbert-Schmidt SVD and General RBF-QR Algorithm
Summary of Method
Instead of solving the “original” interpolation problem withill-conditioned matrix K
Kc = y ,
leading to inaccurate coefficients which then need to be multipliedagainst poorly conditioned basis functions, we now solve
Ψb = y
for a new set of coefficients which we then evaluate via
sf (x) =N∑
j=1
bjψj(x),
i.e., using the new basis.
Greg Fasshauer Hilbert-Schmidt SVD 15
Implementation for Compact Matérn Kernels
General Implementation
It is crucial to know the Hilbert-Schmidt expansion of K :
K (x , z) =∞∑
n=1
λnϕn(x)ϕn(z)
An expansion for multivariate Gaussian kernels was used in[F., Hickernell & Wozniakowski (2012)] to provedimension-independent convergence rates[F. & McCourt (2012)] to obtain and implement a stable GaussQRalgorithm.
We now discuss the implementation for generalizations of theBrownian bridge kernel
K (x , z) = min(x , z)− xz, x , z ∈ [0,1],
which we call compact Matérn kernels.Greg Fasshauer Hilbert-Schmidt SVD 16
Implementation for Compact Matérn Kernels
We define compact Matérn kernels as Green’s kernels of(− d2
dx2 + ε2I
)βK (x , z) = δ(x − z), x , z ∈ [0,1], β ∈ N, ε ≥ 0,
subject to
d2ν
dx2ν K (0, z) =d2ν
dx2ν K (1, z) = 0, ν = 0, . . . , β − 1.
If this operator were considered on R, with appropriate decayconditions at ±∞, the associated Green’s kernel would be the Matérnkernel, e.g., e−ε|x−z| for β = 1.
We choose the name compact Matérn because we restrict thisoperator to the compact domain [0,1], or [0,L] in a more general case.
Greg Fasshauer Hilbert-Schmidt SVD 17
Implementation for Compact Matérn Kernels
We define compact Matérn kernels as Green’s kernels of(− d2
dx2 + ε2I
)βK (x , z) = δ(x − z), x , z ∈ [0,1], β ∈ N, ε ≥ 0,
subject to
d2ν
dx2ν K (0, z) =d2ν
dx2ν K (1, z) = 0, ν = 0, . . . , β − 1.
The Hilbert-Schmidt expansion for compact Matérn kernels is of theform
Kβ,ε(x , z) =∞∑
n=1
2(n2π2 + ε2
)β sin (nπx) sin (nπz) ,
i.e., the eigenvalues and eigenfunctions are
λn =1(
n2π2 + ε2)β , ϕn(x) =
√2 sin (nπx) . (1)
Greg Fasshauer Hilbert-Schmidt SVD 18
Implementation for Compact Matérn Kernels
Clearly,the eigenfunctions are bounded by
√2,
and, for a fixed value of ε, the eigenvalues decay as n−2β.
Therefore the truncation length M needed for accurate representationof the entries of K can be easily determined as a function of β and ε:
To ensure that we keep the first M significant terms we take M suchthat
λM < εmachλN , M > N.
Using the explicit representation of the eigenvalues, we solve for M:
M(β, ε; εmach) =
⌈1π
√ε−1/βmach (N2π2 + ε2)− ε2
⌉.
Greg Fasshauer Hilbert-Schmidt SVD 19
Implementation for Compact Matérn Kernels
Program (MaternQRSolve.m)function yy = MaternQRSolve(x,y,ep,beta,xx)phifunc = @(n,x) sqrt(2)*sin(pi*x*n);N = length(x)+2;M = ceil(1/pi*sqrt(eps^(-1/beta)*(N^2*pi^2+ep^2)-ep^2));n = 1:M;Lambda = diag(((n*pi).^2+ep^2).^(-beta));Phi = phifunc(n,x);[Q,R] = qr(Phi);R1 = R(:,1:N); R2 = R(:,N+1:end);Rhat = R1\R2;Lambda1 = Lambda(1:N,1:N);Lambda2 = Lambda(N+1:M,N+1:M);Rbar = Lambda2*Rhat’/Lambda1;Psi = Phi*[eye(N);Rbar];b = Psi\y;Phi_eval = phifunc(n,xx);yy = Phi_eval*[eye(N);Rbar]*b;
end
Greg Fasshauer Hilbert-Schmidt SVD 20
Application 1: Basic Function Approximation
Standard RBF vs. MatérnQR InterpolationWe use
Kβ,ε with β = 7 and ε = 1N = 21 uniform samples of f (x) = (1− 4x)14
+ (4x − 3)14+
Greg Fasshauer Hilbert-Schmidt SVD 21
Application 1: Basic Function Approximation
Error vs. Shape ParameterWe use
Kβ,ε with β = 2 and variable ε (Note: ε = 0→ cubic natural spline)N uniform samples off (x) = (1
2 − δ)−4 (x − δ)2+ (1− δ − x)2
+ e−36(x−0.4)2, δ = 0.0567
Greg Fasshauer Hilbert-Schmidt SVD 22
Application 2: Determinant of Covariance Matrix
Likelihood Functions for Gaussian Processes
The kernel interpolation problem has an analog in statistics calledkriging.
If, instead of trying to recover a function, we treat our scattered data asone realization of a Gaussian Process, we can prescribe a positivedefinite kernel K as the presumed covariance between realizations ofthe Gaussian Process.
The likelihood function of a zero-mean Gaussian Process (theprobability of the data (xi , yi)
Ni=1 given the kernel parameters θ = (ε, β))
is
P(y |ε) = (2π det(K))−1/2 exp(−1
2yT K−1y
)The matrix K is the kernel interpolation matrix from before.
Greg Fasshauer Hilbert-Schmidt SVD 23
Application 2: Determinant of Covariance Matrix
Maximum Likelihood Estimation (MLE)
We can use MLE to determine an “optimal” shape parameter.Maximizing
P(y |ε) = (2π det(K))−1/2 exp(−1
2yT K−1y
)requires evaluating det(K), which, given its ill-conditioning, is a diceyproposition.
We use Hilbert-Schmidt SVD to compute this determinant:
det(K) = det(ΨΛ1ΦT1 ) = det(Ψ) det(Λ1) det(Φ1)
The values det(Ψ) and det(Φ1) can be computed stably (we presume)using standard techniques.
Evaluating det(Λ1) can be done analytically because it is a diagonalmatrix populated by our Hilbert-Schmidt eigenvalues.
Greg Fasshauer Hilbert-Schmidt SVD 24
Application 2: Determinant of Covariance Matrix
Stable Determinant Evaluation
Figure: N = 30 equally spaced points on [0,1], β = 8
Greg Fasshauer Hilbert-Schmidt SVD 25
Application 3: Choosing a Good Shape Parameter via LOOCV
LOOCV: How it works
Let s[k ]f be the kernel interpolant to the training data
f1, . . . , fk−1, fk+1, . . . , fN, i.e.,
s[k ]f (x) =
N∑j=1j 6=k
c[k ]j K (x ,x j),
such that
s[k ]f (x i) = fi , i = 1, . . . , k − 1, k + 1, . . . ,N,
and let ek (ε) be the error
ek (ε) = fk − s[k ]f (xk )
at the one validation point xk not used to determine the interpolant.Find
εopt = argminε‖e(ε)‖, e = [e1, . . . ,eN ]T
Greg Fasshauer Hilbert-Schmidt SVD 26
Application 3: Choosing a Good Shape Parameter via LOOCV
Efficient formula for the LOOCV criterion:
ek (ε) =ck
K−1kk
, k = 1, . . . ,N, (2)
withck : k th coefficient of full interpolant sf
K−1kk : k th diagonal element of inverse of corresponding
interpolation matrixwas given by [Rippa (1999)] (and also [Wahba (1990)]).
Remark
Since both ck and K−1 need to be computed only once for eachvalue of ε this results in O(N3) computational complexity.All entries in the error vector e can be computed in a singlestatement in MATLAB if we vectorize the component formula (2):
errorvector = (invIM*rhs)./diag(invIM);
Greg Fasshauer Hilbert-Schmidt SVD 27
Application 3: Choosing a Good Shape Parameter via LOOCV
LOOCV via Hilbert-Schmidt SVDSince the LOOCV criterion requires the inverse of K (which may bevery ill-conditioned), we use the Hilbert-Schmidt SVD
K = ΨΛ1ΦT1 ,
accurate to within machine precision (i.e., truncation length M > N).Then
K−1 = Φ−T1 Λ−1
1 Ψ−1.
The matrices Ψ and Φ1 are usually “well-behaved”.But Λ1 by itself still contains the basic ill-conditioning, i.e.,potentially very small eigenvalues.We therefore use the pseudoinverse Λ†1 instead of Λ−1
1 , i.e., wedrop some of the smallest eigenvalues of the kernel K .
RemarkNote that truncating the Hilbert-Schmidt SVD is fundamentally differentfrom performing a standard SVD of K and then truncating that.
Greg Fasshauer Hilbert-Schmidt SVD 28
Application 3: Choosing a Good Shape Parameter via LOOCV
Example (Optimal ε-β surfaces using MaternQR)
Determine the optimal ε and β for interpolation with compact Matérnkernels
Kβ,ε(x , z) =∞∑
n=1
2(
n2π2 + ε2)−β
sin (nπx) sin (nπz)
on [0,1] using N = 24 evenly spaced samples from
fγ(x) =1(1
2 − γ)2 (x − γ)2
+(1− γ − x)2+e−36(x−0.4)2
,
with γ = 0.0567.
Greg Fasshauer Hilbert-Schmidt SVD 29
Application 3: Choosing a Good Shape Parameter via LOOCV
Optimal ε-β surfaces using MaternQR (cont.)
Error of true solution with εopt = 18.047, βopt = 4 (left)Third out CV with εopt = 14.251, βopt = 6 (right)
Greg Fasshauer Hilbert-Schmidt SVD 30
Application 3: Choosing a Good Shape Parameter via LOOCV
Optimal ε-β surfaces using MaternQR (cont.)
Half out CV with εopt = 0.1, βopt = 8 (left)LOOCV with εopt = 10.0, βopt = 7 (right).
For a smooth kernel (large β) we see instabilities for LOOCV for smallε – even though the Hilbert-Schmidt SVD is used.
Greg Fasshauer Hilbert-Schmidt SVD 31
Summary
Hilbert-Schmidt/Mercer expansion and Hilbert-Schmidt SVDprovide a general and transparent framework for stable kernelcomputationImplementation depends on availability of Mercer series forspecific kernels.
some eigenfunctions are easier to handle than othersVast applications
function interpolation/approximationnumerical solution of PDEs (collocation, MFS, MPS)parameter estimation (MLE, GCV)...
Greg Fasshauer Hilbert-Schmidt SVD 32
Appendix References
References I
Schumaker, L. L. (1981).Spline Functions: Basic Theory.John Wiley & Sons, New York (reprinted by Krieger Publishing 1993).
Wahba, G. (1990).Spline Models for Observational Data.CBMS-NSF Regional Conference Series in Applied Mathematics 59, SIAM(Philadelphia).
Beatson, R. K., Light, W. A. and Billings, S. (2001).Fast solution of the radial basis function interpolation equations: domaindecomposition methods.SIAM J. Sci. Comput. 22/5, pp. 1717–1740.
Driscoll, T. A., and Fornberg, B.,Interpolation in the limit of increasingly flat radial basis functions.Comput. Math. Appl. 43/3-5 (2002), 413–422.
Greg Fasshauer Hilbert-Schmidt SVD 33
Appendix References
References II
Fasshauer, G. E. (2011).Positive definite kernels: Past, present and future.Dolomites Research Notes on Approximation 4 (2011), 21–63.
Fasshauer, G. E. (2011).Green’s functions: taking another look at kernel approximation, radial basisfunctions and splines.In Approximation Theory XIII: San Antonio 2010, M. Neamtu and L. Schumaker(eds.), Springer Proceedings in Mathematics 13, pp. 37–63.
Fasshauer, G. E., Hickernell, F. J. and Wozniakowski, H.Rate of convergence and tractability of the radial function approximation problem.SIAM J. Numer. Anal. 50/1 (2012), 247–271.
Fasshauer, G. E. and McCourt, M. J. (2012).Stable evaluation of Gaussian RBF interpolants.SIAM J. Scient. Comput. 34/2, pp. A737–A762.
Greg Fasshauer Hilbert-Schmidt SVD 34
Appendix References
References III
Fasshauer, G. E. and Ye, Q. (2011).Reproducing kernels of generalized Sobolev spaces via a Green functionapproach with distributional operators,Numer. Math. 119/3, pp. 585–611.
Fasshauer, G. E. and Ye, Q. (2013).Reproducing kernels of Sobolev spaces via a Green function approach withdifferential operators & boundary operators,Adv. Comp. Math. 38/4 (2013), pp. 891–921.
Fornberg, B. and Piret, C. (2008).A stable algorithm for flat radial basis functions on a sphere.SIAM J. Scient. Comput. 30/1, pp. 60–80.
Fornberg, B. and Wright, G. (2004).Stable computation of multiquadric interpolants for all values of the shapeparameter.Comput. Math. Appl. 48/5-6, pp. 853–867.
Greg Fasshauer Hilbert-Schmidt SVD 35
Appendix References
References IV
Rippa, S. (1999).An algorithm for selecting a good value for the parameter c in radial basisfunction interpolation.Adv. in Comput. Math. 11, pp. 193–210.
Song, G., Riddle, J., Fasshauer, G. E. and Hickernell, F. J.Multivariate interpolation with increasingly flat radial basis functions of finitesmoothness.Adv. in Comput. Math. 36/3 (2012), 485–501.
Greg Fasshauer Hilbert-Schmidt SVD 36