
These notes are based on a lecture course given by the author in the summer semester of 2005 for postgraduate students at the University of Leipzig/Max-Planck-Institute for Mathematics in the Sciences. The purpose of this course was to provide an introduction to modern methods of data-sparse representation of integral and more general nonlocal operators based on the use of Kronecker tensor-product decompositions.

In recent years, multifactor analysis has been recognised as a powerful (and indeed indispensable) tool for representing multi-dimensional data arising in various applications. Well known for three decades in chemometrics, psychometrics, statistics, signal processing, data mining and complexity theory, this tool has nowadays also become attractive in numerical PDEs, many-particle calculations, and the solution of integral equations.

Our goal is to introduce the main mathematical ideas and principles which allow an effective representation of some classes of high-dimensional operators in Kronecker tensor-product form, as well as a rigorous analysis of the arising approximations. A low Kronecker-rank representation of operators not only relaxes the "curse of dimensionality", but also provides efficient numerical methods of sub-linear complexity for approximating 2D and 3D problems.

Leipzig, July 2005.

1

Everything should be made as simple as possible, but not simpler.

A. Einstein (1879-1955)

An Introduction to Structured Tensor-Product

Representation of Discrete Nonlocal Operators

Part I: Approximation Tools

Boris N. Khoromskij

University of Leipzig/MPI MIS, summer 2005

http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij


Outline of the Lecture Course B. Khoromskij, Leipzig 2005(L1) 2

1. Ubiquitous data-sparse matrix arithmetic; a look at the Fourier kingdom.

2. Celebrated sampling theorem; Sinc interpolation and quadratures.

3. Introduction to wavelet techniques.

4. Separable approximation to multi-variate functions in Rd.

5. Kronecker-product decomposition of high-dimensional tensors.

Combination with H-matrix, FFT- and FWT-based formats.

6. Hierarchical Kronecker-product (HKT) representation of multi-dimensional integral operators Au = ∫_{R^d} g(·, y) u(y) dy.

7. Structured representation of matrix-valued functions with application to A^{-1}, A^{1/2}, sign(A).

8. Truncated iteration: convergence and truncation error analysis.

9. HKT approximation to matrix-valued functions A^{-1}, A^{1/2}, sign(A).

10. Application to the Hartree-Fock and Boltzmann equations.

Lect. 1. Ubiquitous data-sparse matrix arithm.; Fourier kingdom. B. Khoromskij, Leipzig 2005 3

Basic physical models are described by nonlocal data transfer.

In large scale applications the algebraic operations on high-dimensional,

densely populated matrices/tensors require huge computational resources.

Standard methods suffer from the “curse of dimensionality” (R. Bellman).

Examples of (discrete) nonlocal operators:

1. Multi-dimensional integral operators in Rd

2. Elliptic/parabolic solution operators (e.g., financial PDEs)

3. Lyapunov/Riccati matrix equations in control theory

4. Density matrix calculation for many-particle systems

5. Deterministic Boltzmann equation in R3 (dilute gas).

6. Ornstein-Zernike integral equation in R^3 (theory of disordered matter)

7. Chemometric, psychometric, stochastic models ...


Huge problems: special methods vs. super-computers B. Khoromskij, Leipzig 2005(L1) 4

Complexity of standard matrix operations:

N_Stor ≈ N_{A·v} = O(N^2) for the storage/MVM of a fully populated matrix A ∈ R^{N×N}; moreover N_{A^{-1}} ≈ N_{A·B} ≈ N_{L·U} = O(N^3).

A paradigm of up-to-date numerical simulations: the faster the computer, the better the asymptotic complexity required of the algorithm (speed grows in proportion to memory).

In low dimensions (d ≤ 3) the goal is O(N)-methods.

Basic principles: making use of hierarchical structures,

low-rank pattern and recursive algorithms.

In the multi-dimensional setting, O(N) is not enough because of the "curse of dimensionality": N = n^d (about 3·10^22 molecules in 1 cm^3 of water).

The challenge is to develop O(n)-algorithms !

Main ideas: tensor-product data-struct. + H-matrix formats.
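As a rough illustration of the storage gain behind these ideas (not part of the original lectures), the following Python sketch compares the number of entries of a full d-dimensional array with a rank-1 Kronecker (tensor-product) representation; the sizes n, d and the random factors are arbitrary assumptions.

  import numpy as np

  # Storage of a full d-dimensional tensor vs. a rank-1 Kronecker representation
  # A = a1 (x) a2 (x) ... (x) ad, which never forms the N = n^d entries explicitly.
  n, d = 100, 3
  full_entries = n ** d                  # N = n^d numbers for the full tensor
  kron_entries = d * n                   # only d*n numbers for the rank-1 factors

  factors = [np.random.rand(n) for _ in range(d)]

  def rank1_entry(index):
      """Evaluate one entry of the rank-1 tensor on demand."""
      return np.prod([f[i] for f, i in zip(factors, index)])

  print(full_entries, kron_entries, rank1_entry((1, 2, 3)))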

Old and new ideas or what we are going to discuss B. Khoromskij, Leipzig 2005(L1) 5

Based on recursions via hierarchical structures:

Classical Fourier (1768-1830) methods, FFT in O(N log N) op.

Circulant convolution, Toeplitz, Hankel matrices.

Multiresolution representation via wavelets, FWT in O(N) op.

Data and matrix compression in O(N) op.

Multigrid methods: O(N) - elliptic problem solvers.

Domain decomposition: O(N/p) - parallel algorithms.

Panel clustering, fast multipole, H-matrix in O(q^d N log^β N) op.

Well suited for integral (nonlocal) operators in FEM/BEM.

Based on tensor-product data organization:

Kronecker tensor-product (KT) representation in R^N, N = n^d (multiway decomposition): O(n q log^β n), q = q(d) fixed.

Combination of KT formats with H-matrix, wavelet or

FFT-based structures: O(n log^β n) op.


Alternative directions: Compress the input data B. Khoromskij, Leipzig 2005(L1) 6

• High order methods: hp-FEM/BEM, spectral methods,

bcFEM (Khoromskij, Melenk), Richardson extrapolation.

• Adaptive mesh refinement: a priori/a posteriori strategies.

• Best N-term nonlinear approximation (wavelet/FEM)

• Dimension reduction: boundary/interface equations,

Schur complement methods.

• Combination of tensor-product basis with anisotropic

adaptivity: hyperbolic cross approximation by

FEM/wavelets, sparse grids.

• Model reduction: multi-scale, homogenization, genetic

algorithms, neural networks.

• Monte-Carlo methods (e.g., random walk dynamics).

Fourier kingdom. Fourier transform in L1(R) B. Khoromskij, Leipzig 2005(L1) 7

Continuous Fourier transform (S.G. Mallat)

f̂(ω) := ∫_R f(t) e^{−iωt} dt.

If f ∈ L^1(R) then f̂ ∈ C_0(R) and |f̂(ω)| ≤ ∫_R |f(t)| dt < +∞.

If f, f̂ ∈ L^1(R) then the inverse Fourier transform is given by

f(t) := (1/2π) ∫_R f̂(ω) e^{iωt} dω.

Let f, h ∈ L^1(R). The convolution

g(t) = (f ∗ h)(t) := ∫_R f(t − u) h(u) du

then satisfies

g(t) = (1/2π) ∫_R ĝ(ω) e^{iωt} dω ∈ L^1(R)   with   ĝ(ω) = ĥ(ω) f̂(ω).
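A quick numerical sanity check of the convolution theorem can be made in its discrete (circular) form; the following Python sketch is an illustration only, and the signal length and random data are arbitrary assumptions.

  import numpy as np

  # Discrete analogue: the DFT of a circular convolution equals the pointwise
  # product of the DFTs.
  rng = np.random.default_rng(0)
  N = 256
  f = rng.standard_normal(N)
  h = rng.standard_normal(N)

  # circular convolution computed directly ...
  g = np.array([sum(f[(n - k) % N] * h[k] for k in range(N)) for n in range(N)])
  # ... and via the FFT
  g_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real

  print(np.allclose(g, g_fft))   # True up to rounding errors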


Important features of the Fourier transform B. Khoromskij, Leipzig 2005(L1) 8

Each frequency component e^{iωt} is amplified by the factor ĥ(ω). Hence convolution is also called frequency filtering, and ĥ is the transfer function of the filter h.

Important relations between f(t) and its FT f̂(ω):

Inverse: f̂(t) ⇐⇒ 2π f(−ω)
Convolution: (h ∗ f)(t) ⇐⇒ ĥ(ω) f̂(ω)
Multiplication: h(t) f(t) ⇐⇒ (1/2π)(ĥ ∗ f̂)(ω)
Translation: f(t − u) ⇐⇒ e^{−iuω} f̂(ω)
Modulation: e^{iνt} f(t) ⇐⇒ f̂(ω − ν)
Scaling: f(t/s) ⇐⇒ |s| f̂(sω)
Time derivatives: f^{(p)}(t) ⇐⇒ (iω)^p f̂(ω)
Frequency derivatives: (−it)^p f(t) ⇐⇒ f̂^{(p)}(ω)
Complex conjugate: f^*(t) ⇐⇒ f̂^*(−ω)
Hermitian symmetry: f(t) ∈ R ⇐⇒ f̂(−ω) = f̂^*(ω).

Fourier transform in L2(R) B. Khoromskij, Leipzig 2005(L1) 9

The inner product of f, h ∈ L^2(R) and the L^2(R)-norm:

⟨f, h⟩ = ∫_R f(t) h^*(t) dt,   ||f||^2 = ⟨f, f⟩ = ∫_R |f(t)|^2 dt.

Let f, h ∈ L^1(R) ∩ L^2(R). The Parseval and Plancherel formulas read, respectively,

⟨f, h⟩ = (1/2π) ∫_R f̂(ω) ĥ^*(ω) dω,   ||f||^2 = (1/2π) ∫_R |f̂(ω)|^2 dω.

The global regularity of f(t) can be controlled by the decay rate of |f̂(ω)|, i.e.,

|f^{(k)}(t)| ≤ (1/2π) ∫_R |f̂(ω)| |ω|^k dω,   k = 0, 1, ...,

and f^{(k)} is continuous if the corresponding integrals converge.


Examples of FT (I) B. Khoromskij, Leipzig 2005(L1) 10

Example 1.1. For a Dirac δ (tempered distribution) concentrated at the origin t = 0, i.e., ∫_R δ(t) f(t) dt = f(0),

δ̂(ω) = ∫_R δ(t) e^{−iωt} dt = 1   (formal representation).

Example 1.2. The FT of the characteristic (indicator, step) function f(t) = χ_{[−T,T]}(t), equal to 1 if t ∈ [−T, T] and 0 otherwise:

f̂(ω) = ∫_{−T}^{T} e^{−iωt} dt = 2 sin(Tω)/ω ∉ L^1(R)   (not integrable).

Example 1.3. An ideal low-pass filter has the transfer function ĥ = χ_{[−ξ,ξ]}(ω); thus its inverse FT (impulse response) is

h(t) = (1/2π) ∫_{−ξ}^{ξ} e^{iωt} dω = sin(ξt)/(πt).

With ξ = π, we obtain the classical sinc-function.

Examples of FT (I) B. Khoromskij, Leipzig 2005(L1) 11

Figure 1: Haar (indicator) and Sinc scaling functions.

The functions χ_{[−π,π]}(t) (cf. the Haar scaling function) and sinc(t) have complementary (in fact, opposite) features in the time and frequency (Fourier) domains. Numerous wavelet families realise a certain compromise between these two "extreme cases".


Examples of FT (II) B. Khoromskij, Leipzig 2005(L1) 12

Example 1.4. The FT of a translated Dirac δ_τ(t) = δ(t − τ) is calculated by evaluating e^{−iωt} at t = τ:

δ̂_τ(ω) = ∫_R δ(t − τ) e^{−iωt} dt = e^{−iωτ}.

For the Dirac comb c(t) = Σ_{n=−∞}^{∞} δ(t − nT) we have

ĉ(ω) = Σ_{n=−∞}^{∞} e^{−inTω}.

Example 1.5. The FT of the Gaussian f(t) = exp(−t^2) ∈ C^∞ is again a Gaussian:

f̂(ω) = √π exp(−ω^2/4).

Indeed, one readily checks that 2 f̂'(ω) + ω f̂(ω) = 0, which proves the statement.

Fourier series of 2π-periodic functions B. Khoromskij, Leipzig 2005(L1) 13

Denote by L^2[−π, π] the Hilbert space of 2π-periodic functions with the inner product and norm

⟨f, h⟩ = (1/2π) ∫_{−π}^{π} f(ω) h^*(ω) dω,   ||f||^2 = (1/2π) ∫_{−π}^{π} |f(ω)|^2 dω.

Thm. 1.1. The family of functions {e^{−ikω}}_{k∈Z} is an orthonormal basis of L^2[−π, π].

Let l^p(Z) be the space of complex-valued sequences {f[k]}_{k∈Z} such that Σ_{k=−∞}^{∞} |f[k]|^p < +∞. Thm. 1.1 proves that if f ∈ l^2(Z), the Fourier series

f(ω) = Σ_{k=−∞}^{∞} f[k] e^{−iωk},   with   f[k] = (1/2π) ∫_{−π}^{π} f(ω) e^{iωk} dω,

is the decomposition of f ∈ L^2[−π, π] in the orthogonal Fourier basis.


Discrete Fourier transform B. Khoromskij, Leipzig 2005(L1) 14

Let S_N be the space of finite sequences {f[n]}_{0≤n<N} of period N. S_N is a Euclidean space with the inner product

⟨f, g⟩ = Σ_{n=0}^{N−1} f[n] g^*[n].

Thm. 1.2. The family {e_k[n] = exp(2iπkn/N)}_{0≤k<N} is an orthogonal basis of S_N with ||e_k||^2 = N. Any f ∈ S_N can be represented by

f = Σ_{k=0}^{N−1} (⟨f, e_k⟩ / ||e_k||^2) e_k.   (1)

Def. 1.1. The discrete Fourier transform (DFT) of f is

f̂[k] := ⟨f, e_k⟩ = Σ_{n=0}^{N−1} f[n] exp(−2iπkn/N)   (N^2 complex multiplications).

Due to (1), the inverse DFT is given by

f[n] := (1/N) Σ_{k=0}^{N−1} f̂[k] exp(2iπkn/N).
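The definition can be checked numerically; the sketch below implements the O(N^2) DFT of Def. 1.1 directly and compares it against a library FFT (illustration only, assuming NumPy is available).

  import numpy as np

  def dft(f):
      """Naive O(N^2) DFT following Def. 1.1."""
      N = len(f)
      n = np.arange(N)
      F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # matrix exp(-2*pi*i*k*n/N)
      return F @ f

  f = np.random.rand(64)
  print(np.allclose(dft(f), np.fft.fft(f)))          # forward transform agrees
  print(np.allclose(f, np.fft.ifft(np.fft.fft(f))))  # inverse DFT recovers f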

Fast Fourier transform: Outlook B. Khoromskij, Leipzig 2005(L1) 15

FFT: hierarchical recursive algorithm

The fast Fourier transform (FFT) can be traced back (1805) to Gauss (1777-1855). First computer program: Cooley/Tukey (1965).

The idea of the FFT is to split the unknown Fourier coefficients f̂[k], k = 0, ..., N − 1, into even- and odd-indexed parts.

Let N = 2^q. This allows the use of recursion: a problem of dimension N = 2^q (level q) is transformed into two problems of dimension N/2 = 2^{q−1} (level q − 1) plus O(N) operations, and so on, until it is reduced to N = 2^q problems of dimension 1 (level 0).

Since the cost per step is O(N) and the number of levels is q = log_2 N, this results in the linear-logarithmic complexity O(N log_2 N) ∼ C_F N log_2 N with a small constant C_F ∼ 4.


FFT: sketch of the algorithm B. Khoromskij, Leipzig 2005(L1) 16

When the frequency index is even, we group the terms n and n + N/2:

f̂[2k] = Σ_{n=0}^{N/2−1} (f[n] + f[n + N/2]) exp(−2iπkn/(N/2)).

When the frequency index is odd, we have

f̂[2k+1] = Σ_{n=0}^{N/2−1} exp(−2iπn/N) (f[n] − f[n + N/2]) exp(−2iπkn/(N/2)).

The first equation shows that the even frequencies are obtained by calculating the DFT of the N/2-periodic signal

f_e[n] = f[n] + f[n + N/2];

the second equation implies that the odd frequencies can be computed by the DFT of the diagonally scaled N/2-periodic signal

f_o[n] = exp(−2iπn/N) (f[n] − f[n + N/2]).
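For illustration, a minimal recursive implementation of this even/odd splitting might look as follows; it is a sketch only, assumes the length is a power of two, and is not optimized.

  import numpy as np

  def fft_radix2(f):
      """Recursive radix-2 FFT following the even/odd splitting above."""
      f = np.asarray(f, dtype=complex)
      N = len(f)
      if N == 1:
          return f
      half = N // 2
      fe = f[:half] + f[half:]                                  # even frequencies
      fo = np.exp(-2j * np.pi * np.arange(half) / N) * (f[:half] - f[half:])
      Fe, Fo = fft_radix2(fe), fft_radix2(fo)
      out = np.empty(N, dtype=complex)
      out[0::2], out[1::2] = Fe, Fo                             # interleave back
      return out

  x = np.random.rand(128)
  print(np.allclose(fft_radix2(x), np.fft.fft(x)))   # True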

FFT: Matrix representation B. Khoromskij, Leipzig 2005(L1) 17

The FT matrix F_N = {f_{k,n}}_{k,n=1}^{N} is given by

f_{k,n} := exp(−2iπkn/N) = W^{−nk},   W = e^{2iπ/N}.

The FFT recursion connects the N-point transform to two copies of the N/2-point transform:

F_N = ( I_{N/2}   D_{N/2} ) ( F_{N/2}     0     ) ( even )
      ( I_{N/2}  −D_{N/2} ) (    0     F_{N/2}  ) ( odd  ).

Here I_{N/2} is the identity matrix and D_{N/2} is the diagonal matrix with diagonal entries 1, W^{−1}, ..., W^{−(N/2−1)}. The permutation matrix at the end sorts the input vector into its "even" and "odd" parts.

Finally, the FFT algorithm keeps going, recursively:

F_N → F_{N/2} → ... → F_1.


FFT: complexity, inverse transform B. Khoromskij, Leipzig 2005(L1) 18

The DFT(N) may be calculated with two DFT(N/2) plus C_F N operations to compute f_e[n] and f_o[n], n = 0, ..., N/2 − 1. We obtain the recursion

N_FFT(N) = 2 N_FFT(N/2) + C_F N   with   N_FFT(1) = 0.

Setting N = 2^q, q ∈ N, and introducing Q(q) = N_FFT(N)/N, we get

Q(q) = Q(q − 1) + C_F   with   Q(0) = 0,

which implies Q(q) = C_F q. Hence N_FFT(N) = C_F N log_2 N.

The inverse FFT of f̂ can be derived from the forward FFT of its complex conjugate f̂^*, due to

f^*[n] = (1/N) Σ_{k=0}^{N−1} f̂^*[k] exp(−2iπkn/N).
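A short sketch of this conjugation trick (assuming NumPy's FFT as the forward transform; the test data are arbitrary):

  import numpy as np

  # Inverse DFT via the forward transform of the conjugated spectrum:
  # f = conj( FFT( conj(f_hat) ) ) / N
  f = np.random.rand(64) + 1j * np.random.rand(64)
  f_hat = np.fft.fft(f)
  f_rec = np.conj(np.fft.fft(np.conj(f_hat))) / len(f)
  print(np.allclose(f, f_rec))   # True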

FFT: fast discrete convolution B. Khoromskij, Leipzig 2005(L1) 19

Let g be the discrete convolution of two signals f, h supported only on the indices 0 ≤ n ≤ M − 1,

g[n] = (f ∗ h)[n] = Σ_{k=−∞}^{∞} f[k] h[n − k].

The naive implementation requires M(M + 1) operations. It can be represented as a matrix-by-vector product (MVP) with the Toeplitz matrix

T = {h[n − k]}_{0≤n,k<M} ∈ R^{M×M},   g = T f.

Extending f and h by a further M samples,

h[M] = 0,   h[2M − i] = h[i],   i = 1, ..., M − 1,
f[n] = 0,   n = M, ..., 2M − 1,

we reduce the problem to the MVP with a circulant matrix C ∈ R^{2M×2M} specified by the first row h ∈ R^{2M}.
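A minimal sketch of the resulting fast convolution, using zero-padding to length 2M and the FFT diagonalization of the circulant matrix derived on the next slides (illustration only; sizes and data are arbitrary):

  import numpy as np

  M = 200
  f = np.random.rand(M)
  h = np.random.rand(M)

  fp = np.concatenate([f, np.zeros(M)])       # zero-padded signals of length 2M
  hp = np.concatenate([h, np.zeros(M)])
  g_fft = np.fft.ifft(np.fft.fft(fp) * np.fft.fft(hp)).real[: 2 * M - 1]

  g_direct = np.convolve(f, h)                # O(M^2) reference
  print(np.allclose(g_fft, g_direct))         # True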


FFT: circulant convolution B. Khoromskij, Leipzig 2005(L1) 20

An n × n matrix C is called circulant if it has the form

C = circ{c_1, ..., c_n} :=
    ( c_1   c_2   ...   c_n     )
    ( c_n   c_1   ...   c_{n−1} )
    ( ...         ...   ...     )
    ( c_2   ...   c_n   c_1     ),   c_i ∈ C.

The set of all n × n circulant matrices is closed with respect to addition and multiplication by a constant. Any circulant matrix C is associated with the polynomial

p_c(z) := c_1 + c_2 z + ... + c_n z^{n−1},   z ∈ C.

FFT: circulant convolution B. Khoromskij, Leipzig 2005(L1) 21

The matrix C has a diagonal representation in the Fourier basis,

C = F_n^T Λ_c F_n   with   Λ_c = diag{p_c(1), ..., p_c(ω^{n−1})},   ω = e^{2iπ/n}.

The eigenvector corresponding to the eigenvalue p_c(ω^{j−1}) is given by the j-th column of F_n, i.e.,

ω_j = (1/√n) (ω^{(k−1)(j−1)})_{k=1,...,n}.

The matrix-vector product with C costs 2 C_F n log_2 n + O(n) operations.

The multi-dimensional FFT can be performed by a tensorization process with the linear-logarithmic cost O(N log_2 N), N = n^d.
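A small numerical illustration (using the first-column convention for the circulant, chosen here for brevity): the eigenvalues are the DFT of the defining vector, so the MVP with C reduces to one forward and one inverse FFT.

  import numpy as np

  n = 128
  c = np.random.rand(n)                                   # first column of C
  C = np.array([np.roll(c, k) for k in range(n)]).T       # full circulant (reference)

  x = np.random.rand(n)
  y_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real # C @ x via the FFT
  print(np.allclose(C @ x, y_fft))                        # True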


Literature to Lecture 1 B. Khoromskij, Leipzig 2005(L1) 22

1. S.G. Mallat: A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1999.

2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Vorlesungsmanuskript, Leipzig 2004.

3. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class

of Nonlocal Operators in High Dimensions. Preprint 16, MPI MIS, Leipzig 2004.

4. B.N. Khoromskij: Data-sparse approximation of nonlocal operators. Lecture notes 17, MPI MIS,

Leipzig 2003.

URL: http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor1.ps

Lecture 2. Sampling Theorem, Sinc Approximation B. Khoromskij, Leipzig 2005 23

How to discretise analog signals?

The class of functions f(t), t ∈ R (analog signals), can be discretized by recording their sample values {f(nh)}_{n∈Z} at intervals h > 0. V.A. Kotelnikov (1933) and J. Whittaker (1935) proved a celebrated theorem: band-limited signals can be exactly reconstructed from their sample values.

The sinc function (also called the cardinal function) is given by

sinc(x) := sin(πx)/(πx)   with the convention sinc(0) = 1.

Thm. 2.1. (Kotelnikov, Shannon, Whittaker) If the support of f̂ is included in [−π/h, π/h] then

f(t) = Σ_{n=−∞}^{∞} f(nh) S_{n,h}(t),   t ∈ R,   (2)

where S_{n,h}(t) = sinc(t/h − n).
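A small numerical illustration of formula (2) with a truncated sampling series; the band-limited test signal, the step h and the truncation range are arbitrary assumptions.

  import numpy as np

  # Shannon sampling / sinc reconstruction (Thm. 2.1).  Test signal:
  # f(t) = sinc(t)^4 is band-limited to [-4*pi, 4*pi]; with h = 0.2 the band
  # limit pi/h = 5*pi is large enough, so (2) holds up to series truncation.
  f = lambda t: np.sinc(t) ** 4          # np.sinc(x) = sin(pi x)/(pi x)
  h = 0.2

  n = np.arange(-500, 501)               # truncated index range of the series
  t = np.linspace(-3.0, 3.0, 301)
  rec = np.sum(f(n * h)[:, None] * np.sinc(t[None, :] / h - n[:, None]), axis=0)

  print(np.max(np.abs(rec - f(t))))      # small error, dominated by truncation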


Sampling Theorem B. Khoromskij, Leipzig 2005(L2) 24

Figure 2: Haar (cf. the FT of f = sinc) and Sinc scaling functions.

The sampling theorem plays an important role in telecommunications and radio, signal processing, stochastic models, etc. The class of band-limited functions has a direct characterisation: it is the Paley-Wiener space W(π/h) of entire functions of exponential type (see below).

Proof of Sampling Theorem (I) B. Khoromskij, Leipzig 2005(L2) 25

Preliminaries to the proof.

(a) The Poisson formula reads (in the sense of distributions)

ĉ(ω) = Σ_{n=−∞}^{∞} e^{−inhω} = (2π/h) Σ_{k=−∞}^{∞} δ(ω − 2kπ/h).   (3)

Recall that ĉ(ω) = Σ_{n=−∞}^{∞} e^{−inhω} is the FT of the Dirac comb c(t) = Σ_{n=−∞}^{∞} δ(t − nh) (cf. Ex. 1.4).

Since ĉ is 2π/h-periodic, it suffices to prove that ĉ|_{[−π/h,π/h]} = (2π/h) δ.

h δ.

(b) To each sample f(nh) we associate a Dirac and introduce the weighted Dirac sum f_d(t) := Σ_{n=−∞}^{∞} f(nh) δ(t − nh). Since the FT of δ(t − nh) is e^{−inhω}, we obtain f̂_d(ω) = Σ_{n=−∞}^{∞} f(nh) e^{−inhω}.


Proof of Sampling Theorem (II) B. Khoromskij, Leipzig 2005(L2) 26

(c) Now f(t) can be computed from the sample values f(nh) via the following simple relation between the FTs f̂_d and f̂.

Lem. 2.2. The FT of f_d is given by

f̂_d(ω) = (1/h) Σ_{k=−∞}^{∞} f̂(ω − 2kπ/h).

Proof. f(nh) δ(t − nh) = f(t) δ(t − nh) implies

f_d(t) := f(t) Σ_{n=−∞}^{∞} δ(t − nh) ≡ f(t) c(t).

Computing the FTs,

f̂_d = (1/2π) (f̂ ∗ ĉ)(ω),   (4)

we apply the Poisson formula (3) to represent ĉ(ω). Since (f̂ ∗ δ(· − ξ))(ω) = f̂(ω − ξ), inserting the above formula into (4) proves Lem. 2.2.

Proof of Sampling Theorem (III) B. Khoromskij, Leipzig 2005(L2) 27

Proof of the Sampling Theorem.

If n ≠ 0, the support of f̂(ω − 2nπ/h) does not intersect the support of f̂(ω), since f̂(ω) = 0 for |ω| > π/h. Thus Lem. 2.2 implies

f̂_d(ω) = f̂(ω)/h   if |ω| ≤ π/h.

Recall that the FT of S_{0,h} = sinc(t/h) is Ŝ_{0,h} = h χ_{[−π/h,π/h]}. Since supp(f̂) ⊂ [−π/h, π/h], the previous relation results in

f̂(ω) = Ŝ_{0,h}(ω) f̂_d(ω).

The inverse FT of this equation, i.e. f(t) = (S_{0,h} ∗ f_d)(t), leads to the required result (since S_{n,h}(t) = S_{0,h}(t − nh)):

f(t) = S_{0,h} ∗ Σ_{n=−∞}^{∞} f(nh) δ(t − nh) = Σ_{n=−∞}^{∞} f(nh) S_{0,h}(t − nh).


Generalised sampling theorem B. Khoromskij, Leipzig 2005(L2) 28

The Sampling Theorem as a decomposition in an orthogonal basis.

Define the space U_h as the set of functions whose FTs have support included in [−π/h, π/h].

Lem. 2.2. The set of functions {S_{n,h}(t)}_{n∈Z} is an orthogonal basis of the space U_h. If f ∈ U_h then

f(nh) = (1/h) ⟨f(t), S_{n,h}(t)⟩.

Cor. 2.3. The sinc-interpolation formula of Thm. 2.1 can be interpreted as a decomposition of f ∈ U_h in an orthogonal basis of U_h:

f(t) = (1/h) Σ_{n=−∞}^{∞} ⟨f(·), S_{n,h}(·)⟩ S_{n,h}(t).

If f ∉ U_h, this formula yields the orthogonal projection of f onto U_h.

Proof of Lemma 2.2 B. Khoromskij, Leipzig 2005(L2) 29

Use the Sampling Theorem and the Parseval formula.

Recall that Ŝ_{0,h} = h χ_{[−π/h,π/h]} and apply the Parseval formula:

⟨S_{n,h}(t), S_{m,h}(t)⟩ = (1/2π) ∫_R h^2 χ_{[−π/h,π/h]} e^{−inhω} e^{imhω} dω
  = (h^2/2π) ∫_{−π/h}^{π/h} e^{−i(n−m)hω} dω = h δ[n − m].

Hence {S_{n,h}(t)}_{n∈Z} is an orthogonal family. Since S_{n,h}(t) ∈ U_h, Thm. 2.1 implies that any f ∈ U_h can be represented as a linear combination of {S_{n,h}(t)}_{n∈Z}, i.e., the latter is an orthogonal basis of U_h.

To verify the second assertion, we again apply the Parseval formula to obtain

⟨f(t), S_{n,h}(t)⟩ = (h/2π) ∫_{−π/h}^{π/h} f̂(ω) e^{inhω} dω = h f(nh).


Sinc-interpolation of entire functions B. Khoromskij, Leipzig 2005(L2) 30

When does the Sinc interpolant represent a function exactly?

C(f, h)(x) = Σ_{k=−∞}^{∞} f(kh) S_{k,h}(x).

Def. 2.5. Let h > 0, and let W(π/h) denote the family of entire functions f such that ∫_R |f(t)|^2 dt < ∞ and such that for all z ∈ C,

|f(z)| ≤ C e^{π|z|/h}   with a constant C > 0.

Thm. 2.4. (Stenger) {h^{−1/2} S_{k,h}(x)}_{k∈Z} is a complete L^2(R)-orthonormal sequence in W(π/h). Every f ∈ W(π/h) has the cardinal series representation

f(x) = C(f, h)(x),   x ∈ R.

Proof: a consequence of the classical Paley-Wiener theorem.

Sinc-approximation of analytic functions B. Khoromskij, Leipzig 2005(L2) 31

The interpolant C(f, h) provides a remarkably accurate approximation on R for functions that are analytic and uniformly bounded on the strip

D_δ := {z ∈ C : |Im z| ≤ δ},   0 < δ < π/2,

such that

N(f, D_δ) := ∫_R (|f(x + iδ)| + |f(x − iδ)|) dx < ∞.

This defines the Hardy space H^1(D_δ). For functions f ∈ H^1(D_δ),

sup_{x∈R} |f(x) − C(f, h)(x)| = O(e^{−πδ/h}),   h → 0.   (5)


Sinc-approximation of analytic functions B. Khoromskij, Leipzig 2005(L2) 32

Likewise, if f ∈ H^1(D_δ), the integral

I(f) = ∫_Ω f(x) dx   (Ω = R or Ω = R_+)   (6)

can be approximated with exponential convergence by the Sinc quadrature

T(f, h) := h Σ_{k=−∞}^{∞} f(kh)   ( = ∫_R C(f, h)(x) dx ≈ I(f) ),

|I(f) − T(f, h)| = O(e^{−πδ/h}),   h → 0.   (7)

Analogous estimates hold for the (computable) truncated sums

C_M(f, h) = Σ_{k=−M}^{M} f(kh) S_{k,h}(x),   T_M(f, h) = h Σ_{k=−M}^{M} f(kh).
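As an illustration of the truncated quadrature T_M(f, h), the sketch below integrates f(x) = 1/cosh(x) (analytic on a strip, exponentially decaying, exact value π) with the step choice h = sqrt(2πδ/(bM)) discussed on the following slides; the values of δ and b are assumptions made for this particular f.

  import numpy as np

  f = lambda x: 1.0 / np.cosh(x)         # poles at +/- i*pi/2, decay rate b = 1
  exact = np.pi
  delta, b = 0.9 * np.pi / 2, 1.0        # assumed strip half-width and decay rate

  for M in (4, 8, 16, 32, 64):
      h = np.sqrt(2 * np.pi * delta / (b * M))      # quadrature step choice
      k = np.arange(-M, M + 1)
      TM = h * np.sum(f(k * h))
      print(M, abs(TM - exact))          # error ~ exp(-sqrt(2*pi*delta*b*M))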

Standard error estimates B. Khoromskij, Leipzig 2005(L2) 33

Thm. 2.5. (Stenger) If f ∈ H^1(D_δ) and |f(x)| ≤ C exp(−b|x|) for all x ∈ R, with b, C > 0, then

‖f − C_M(f, h)‖_∞ ≤ C [ e^{−πδ/h}/(2πδ) · N(f, D_δ) + 1/(bh) · e^{−bhM} ],   (8)

|I(f) − T_M(f, h)| ≤ C [ e^{−2πδ/h}/(1 − e^{−2πδ/h}) · N(f, D_δ) + 1/b · e^{−bhM} ].   (9)

Proof: The first term on the right-hand side of (8) represents the approximation error (5),

‖f(x) − C(f, h)(x)‖_∞ ≤ N(f, D_δ) / (2πδ sinh(πδ/h)),

while the second term gives the truncation error

‖C(f, h)(x) − C_M(f, h)(x)‖_∞ ≤ Σ_{|k|≥M+1} |f(kh)| ≤ 2C Σ_{k=M+1}^{∞} e^{−bkh} ≤ (2C/(bh)) e^{−bhM}.


Exponential convergence rate B. Khoromskij, Leipzig 2005(L2) 34

Similar arguments apply to (9).

For the interpolation error (8), the choice

h = sqrt(πδ/(bM))

implies the exponential convergence rate

‖f − C_M(f, h)‖_∞ ≤ C M^{1/2} e^{−sqrt(πδbM)}.   (10)

In fact, for the chosen h the first term on the right-hand side of (8) dominates, hence (10) follows. Usually we set δ = π/2.

For the quadrature error (9), the choice

h = sqrt(2πδ/(bM))

yields

|I(f) − T_M(f, h)| ≤ C e^{−sqrt(2πδbM)}.   (11)

Error bounds in the case of double-exponential decay B. Khoromskij, Leipzig 2005(L2) 35

If f has double-exponential decay as |x| → ∞, i.e.,

|f(x)| ≤ C exp(−b e^{a|x|})   for all x ∈ R with a, b, C > 0,   (12)

the convergence rate of Sinc interpolation and quadrature can be improved up to O(e^{−cM/log M}) (cf. Thm. 2.5).

Thm. 2.6. (Gavrilyuk, Hackbusch, Khoromskij) Let f ∈ H^1(D_δ) with some δ < π/2, and let (12) hold. Then the choice h = log(2πaM/b)/(aM) leads to the quadrature error bound

|I − T_M(f, h)| ≤ C N(f, D_δ) e^{−2πδaM/log(2πaM/b)}.   (13)

The choice h = log(πaM/b)/(aM) implies for the interpolation error

‖f − C_M(f, h)‖_∞ ≤ C N(f, D_δ)/(2πδ) · e^{−πδaM/log(πaM/b)}.   (14)


Error bounds in the case of double-exponential decay B. Khoromskij, Leipzig 2005(L2) 36

Proof. The quadrature error has the bound

|I − T_M(f, h)| ≤ C [ e^{−2πδ/h}/(1 − e^{−2πδ/h}) · N(f, D_δ) + (e^{−ahM}/(ab)) exp(−b e^{ahM}) ].

In fact, the bound for |I − T(f, h)| is the same as in Thm. 2.5. For the remainder sum we use the simple estimate

Σ_{k: |k|>M} exp(−b e^{a|kh|}) = 2 Σ_{k=M+1}^{∞} exp(−b e^{akh}) ≤ 2 ∫_M^∞ exp(−b e^{axh}) dx ≤ (2 e^{−ahM}/(abh)) exp(−b e^{ahM}).

Now (13) follows.

Error bounds in the case of double-exponential decay B. Khoromskij, Leipzig 2005(L2) 37

The interpolation error of C_M(f, h) satisfies

‖f − C_M(f, h)‖_∞ ≤ C [ e^{−πδ/h}/(2πδ) · N(f, D_δ) + (e^{−ahM}/(abh)) exp(−b e^{ahM}) ].

Again, the approximation error admits the same estimate as in the standard case. The truncation error bound is determined by the decay rate of f as |x| → ∞,

‖C(f, h)(x) − C_M(f, h)(x)‖_∞ ≤ Σ_{|k|≥M+1} |f(kh)| ≤ 2C Σ_{k=M+1}^{∞} e^{−b e^{akh}} ≤ (2C e^{−ahM}/(abh)) e^{−b e^{ahM}},

which proves (14).


Sinc-interpolation on (a, b) via Thm. 2.5 B. Khoromskij, Leipzig 2005(L2) 38

To apply Thm. 2.5 in the case Ω = (a, b) (or, say, Ω = R_+), one substitutes the variable x ∈ Ω by x = ϕ(ζ), where ϕ : R → (a, b) is a bijection. This changes f : (a, b) → R into

f_1 := ϕ' · (f ∘ ϕ) : R → R   (quadrature),
f_1 := f ∘ ϕ   (interpolation).

Assuming f_1 ∈ H^1(D_δ), one can apply (10)-(11) to the transformed function.

Ex. 2.1. In the case of an interval (a, b):

ϕ^{−1}(z) = log[(z − a)/(b − z)],   Re z = x.

Ex. 2.2. In the case of the semi-axis R_+ := (0, ∞):

ϕ^{−1}(z) = log[sinh(z)]   or   ϕ^{−1}(z) = log(z).
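A short sketch of Ex. 2.2 for the quadrature case, with ϕ(ζ) = e^ζ and the test integrand f(x) = e^{−x} (so that f_1(ζ) = e^ζ e^{−e^ζ}); the strip width δ and decay rate b used below are assumptions for this particular example.

  import numpy as np

  # Sinc quadrature on R_+ after the substitution x = phi(zeta) = exp(zeta):
  # I(f) = int_0^inf f(x) dx = int_R phi'(z) f(phi(z)) dz.  Exact value: 1.
  f = lambda x: np.exp(-x)
  phi = np.exp                              # phi(z) = e^z = phi'(z)
  f1 = lambda z: phi(z) * f(phi(z))         # transformed integrand on R

  delta, b = 1.0, 1.0                       # assumed strip width / decay rate
  for M in (4, 8, 16, 32, 64):
      h = np.sqrt(2 * np.pi * delta / (b * M))
      k = np.arange(-M, M + 1)
      TM = h * np.sum(f1(k * h))
      print(M, abs(TM - 1.0))               # exponential convergence in sqrt(M)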

Sinc quadratures on R+ B. Khoromskij, Leipzig 2005(L2) 39

Polynomial decay. Let us set Ω = R_+ and assume:

(i) f can be analytically extended from R_+ into the sector

D_δ^{(1)} = {z ∈ C : |arg(z)| < δ}   for some 0 < δ < π/2,   (15)

(in fact, ϕ^{−1} : D_δ^{(1)} → D_δ is the corresponding conformal map),

(ii) f satisfies the inequality

|f(z)| ≤ c |z|^{α−1} (1 + |z|)^{−α−β}   for some 0 < α, β ≤ 1 and all z ∈ D_δ^{(1)}.

Let α = 1. Choosing any M ∈ N and taking

h^{(1)} = sqrt(2πδ/(βM)),   (16)

we define the corresponding quadrature rule (with ϕ(ζ) = e^ζ)

I_M^{(1)} = h^{(1)} Σ_{k=−βM}^{M} κ_k^{(1)} f(z_k^{(1)}),   z_k^{(1)} = e^{k h^{(1)}},   κ_k^{(1)} = e^{k h^{(1)}},


Sinc quadratures on R+ B. Khoromskij, Leipzig 2005(L2) 40

possessing the exponential convergence rate∣∣∣I − I(1)M

∣∣∣ ≤ Ce−√

2πδβM (17)

with a positive constant C independent of M .

Figure 3: The analyticity sector D_δ^{(1)} (left) and the "bullet-shaped" domain D_δ^{(2)} (right).

Sinc quadratures on R+ B. Khoromskij, Leipzig 2005(L2) 41

Exponential decay. Assume that the integrand f can be analytically extended into the "bullet-shaped" domain

D_δ^{(2)} = {z ∈ C : |arg(sinh z)| < δ},   0 < δ < π/2,

and that f satisfies

|f(z)| ≤ C (|z|/(1 + |z|))^{α−1} e^{−β Re z}   in D_δ^{(2)},   α, β ∈ (0, 1].   (18)

Setting α = 1 and choosing h^{(2)} = h^{(1)}, κ_k^{(2)} = (1 + e^{−2k h^{(2)}})^{−1/2} and M ∈ N, we obtain the quadrature

I_M^{(2)} = h^{(2)} Σ_{k=−βM}^{M} κ_k^{(2)} f(z_k^{(2)}),   z_k^{(2)} = log[e^{k h^{(2)}} + sqrt(1 + e^{2k h^{(2)}})],

which again possesses an exponential convergence rate.


Improved Sinc-interpolation on (a, b) via Thm. 2.6 B. Khoromskij, Leipzig 2005(L2) 42

For applications in FEM/BEM, we reformulate the result of Thm. 2.6 for parameter-dependent functions g(x, y), y ∈ Y ⊂ R^m, defined on the reference interval x ∈ (0, 1]. Introduce the mapping

ζ ∈ D_δ → φ(ζ) = 1/cosh(sinh(ζ)),   δ < π/2.   (19)

Clearly, (0, 1] = φ(R) and, moreover, φ(ζ) decays doubly exponentially,

|φ(ζ)| ≤ 2 exp(−(cos δ / 2) e^{|Re ζ|}),   ζ ∈ D_δ.

In particular, we have |φ(ζ)| ≤ 2 exp(−(1/2) e^{|ζ|}), ζ ∈ R. Let D_φ(δ) := {φ(ζ) : ζ ∈ D_δ} ⊃ (0, 1] be the image of D_δ. One checks easily that D_φ(δ) ⊂ S_r(0)\{0}, where S_r(0) is the disc around zero with radius r > 1.

Improved Sinc-interpolation on (a, b) via Thm. 2.6 B. Khoromskij, Leipzig 2005(L2) 43

Hence, if a function g is holomorphic on D_φ(δ), then

f(ζ) := φ^α(ζ) g(φ(ζ))   for any α > 0

is also holomorphic on D_δ. Now the Sinc interpolation

C_M(f(·, y), h)(ζ) = Σ_{k=−M}^{M} f(kh, y) S_{k,h}(ζ),

together with the back-transformation ζ = φ^{−1}(x) = arsinh(arcosh(1/x)) and multiplication by x^{−α}, yields the separable approximation

g_M(x, y) := Σ_{k=−M}^{M} (φ(kh)^α / x^α) g(φ(kh), y) S_{k,h}(φ^{−1}(x)) ≈ g(x, y)   (20)

of g(x, y) for x ∈ (0, 1] = φ(R) and y ∈ Y. Since φ(ζ) is an even function, the separation rank in (20) is reduced to r = M + 1.


Improved Sinc-interpolation on (a, b) via Thm. 2.6 B. Khoromskij, Leipzig 2005(L2) 44

Cor. 2.7. Assume that for all y ∈ Y the functions g(·, y) and f(ζ, y) := φ^α(ζ) g(φ(ζ), y) satisfy:

(a) g(·, y) is holomorphic on D_φ(δ), and sup_{y∈Y} N(f, D_δ) < ∞;
(b) f(·, y) satisfies (12) with a = 1 and with certain C, b for all y ∈ Y.

Then, for all y ∈ Y, the optimal choice h := log M / M yields

E_M(ζ) := |f(ζ, y) − C_M(f(·, y), h)(ζ)| ≤ C N(f, D_δ)/(2πδ) · e^{−πδM/log M},   (21)

|g(x, y) − g_M(x, y)| ≤ |x|^{−α} |E_M(f(·, y), h)(φ^{−1}(x))|.   (22)

Proof: Due to the properties of φ : D_δ → D_φ(δ), condition (a) implies f ∈ H^1(D_δ); hence, in view of (b), we can apply Thm. 2.6. Now N(f, D_δ)/(2πδ) · e^{−πδM/log M} corresponds to the approximation error, while the evaluation of the truncation error yields the bound (2C/(b log M)) e^{−bM}, which decays asymptotically faster as M → ∞. Now (21) follows. Transforming back to the approximant (20) implies the bound (22) for g − g_M.

Numerics for the Sinc interpolation B. Khoromskij, Leipzig 2005(L2) 45

Ex. 2.3. Separable approximation to the function

g(x, y) = |x|λ sinc(|x| |y|), λ ∈ (−3, 1],

arising from the Boltzmann equation.

Figure 4: L∞-error of the sinc-interpolation to |x|^λ sinc(|x| y), x ∈ [−1, 1], y ∈ [1, 36], λ = 1, versus the number of interpolation points M.

Numerics for the Sinc interpolation B. Khoromskij, Leipzig 2005(L2) 46

Ex. 2.4. Sinc interpolation for g(x, y) = exp(−xy), x, y ≥ 0.

Consider the auxiliary function f(x, y) = (x/(1+x)) exp(−xy), x ∈ R_+, y ∈ [1, R], which satisfies all the conditions above with α = β = 1 (exponential decay). With the choice of interpolation points x_k := log[e^{kh} + sqrt(1 + e^{2kh})] ∈ R_+, it can be approximated with exponential convergence.

Figure 5: L∞-error of the sinc-interpolation of exp(−|x|y), x ∈ [−1, 1], y ∈ [1, 100], versus the number of interpolation points M.

Numerics for the Sinc interpolation B. Khoromskij, Leipzig 2005(L2) 47

Ex. 2.5. Mexican hat scaling function

Figure 6: Mexican hat f(x) = (1 − x^2) exp(−αx^2), α > 0.

Sinc interpolation to the Mexican hat, r = M + 1.

α\M    4      9      16     25      36      49      64      81       100
1      0.05   6e-4   7e-7   1e-10   2e-15   1e-15   -       -        -
10     0.17   0.13   0.12   0.04    0.01    0.004   9e-4    1.7e-4   2.6e-5
0.1    3.8    2.6    0.6    0.08    0.006   1.6e-5  2e-7    2.5e-9   2e-11


Numerics for the Sinc interpolation B. Khoromskij, Leipzig 2005(L2) 48

Ex. 2.6. (Helmholtz kernel in R^d). Define

f(ζ, η, ν) := e^{iκ|x−y|}/|x − y|,   ζ = |x_1 − y_1|, η = |x_2 − y_2|, ν = |x_3 − y_3|.

For (ζ, η) ∈ [0, 1] × [0, b], consider

F(ζ, η) := f(ζ, η, 0) = e^{iκ sqrt(ζ^2 + η^2)} / sqrt(ζ^2 + η^2).

We approximate the modified function

F_0(ζ, η) := ζ^{α_0} (F(ζ, η) − F(0, η)),   0 < α_0 < 1,   (23)

on the domain Ω_1 := [δ, 1] × [0, b], where δ > 0 is a small parameter. The considerations for the remaining domain Ω_2 := [0, δ] × [δ, b] are completely similar.

Numerics for the Sinc interpolation B. Khoromskij, Leipzig 2005(L2) 49

Figure 7: Error (depending on κ!) of the Sinc-interpolation to F_0 with κ = 0.01, 1.0, 10, respectively, from left to right.

Numerics for the Sinc interpolation B. Khoromskij, Leipzig 2005(L2) 50

Figure 8: Pointwise error of the Sinc-interpolation to F_0 with κ = 0.01 for r = 25 (left), r = 37 (middle) and r = 49 (right).

Literature to Lecture 2 B. Khoromskij, Leipzig 2005(L2) 51

1. S.G. Mallat: A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1999.

2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Vorlesungsmanuskript, Leipzig 2004.

3. I.P. Gavrilyuk, W. Hackbusch, and B.N. Khoromskij: Tensor-product approximation to elliptic and parabolic

solution operators in higher dimensions. Preprint 83, MPI MIS, Leipzig 2003; Computing (to appear).

4. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class

of Nonlocal Operators in High Dimensions. Preprint 16, MPI MIS, Leipzig 2004.

5. F. Stenger: Numerical methods based on Sinc and analytic functions. Springer-Verlag, 1993.

http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor2.ps


Lecture 3. Introduction to wavelet methods B. Khoromskij, Leipzig 2005 52

The wavelet is a mathematical microscope (B. Hubbard)

Purposes:

• Audio/video compression, radar processing

• Surface identification/analysis

• Image analysis (e.g., “finger prints”, medical imaging)

• Communications (radio, TV)

• Numerical PDEs and IEs, many-particle systems, ...

The fundamental theory behind wavelets is known as the

multi–resolution analysis (MRA).

The MRA provides a great variety of possibilities for multi-level data and signal processing and has gained widespread popularity.

Basic ideas and history B. Khoromskij, Leipzig 2005(L3) 53

The multiresolution approach is based on the idea that the

wavelet functions generate a hierarchical sequence of

subspaces in L2(R), which forms the MRA,

V_{j+1} ⊂ V_j ⊂ ... ⊂ V_0 ⊂ V_{−1} ⊂ ....

A signal f_0 ∈ V_0 (at scale 2^0) is split into a "blurred" version f_1 ∈ V_1 at the coarser scale 2^1 and a "detail" d_1 ∈ W_1 at scale 2^0.

Repeating this process gives a sequence f_0, f_1, f_2, ... of more and more blurred versions together with the details d_1, d_2, d_3, .... Each d_j can be represented in the wavelet basis using the "filter coefficients" (high-pass filters), while the f_j are given in the scaling-function basis via the low-pass filters. After J iterations the original signal can be exactly reconstructed: f_0 = f_J + d_1 + ... + d_J.


Basic ideas and history B. Khoromskij, Leipzig 2005(L3) 54

MRA is completely recursive and hence ideal for computation.

An important ingredient is the discrete wavelet transform (DWT). The DWT admits a fast implementation (FWT) with linear cost O(N), N = 2^J.

Orthogonal wavelets are generated by

– the scaling function (SF) ϕ(x) (the "father wavelet") and

– the wavelet ψ(x) (the "mother wavelet").

Sinc approximation method (cf. Lect. 2) can be inspected

within the wavelet concept: Sinc MRA, Sinc wavelet.

It is instructive to compare the Sinc and Haar MRA.

Basic ideas and history B. Khoromskij, Leipzig 2005(L3) 55

A wavelet ψ(x) is a function of zero average: ∫_R ψ(x) dx = 0.

Using dilated and translated versions of ψ defined by

ψ_{u,s}(x) = (1/√s) ψ((x − u)/s),

one can apply the continuous wavelet transform (cf. the continuous FT)

Wf(u, s) := ∫_R f(x) ψ^*_{u,s}(x) dx.

This provides a two-dimensional representation of a one-dimensional signal, which indicates some redundancy. This redundancy can be eliminated by constructing a basis of the signal space. Hence the next step is the discrete wavelet transform. The first example is given by the classical Haar wavelet (Haar 1910).


Multi–Resolution Analysis B. Khoromskij, Leipzig 2005(L3) 56

The SF ϕ(x) generates an orthogonal MRA if it satisfies the following conditions (i)-(iv):

(i) The integer translates

ϕ_k(x) = ϕ(x − k),   k ∈ Z,

are linearly independent and produce a Riesz basis of the subspace V_0 ⊂ L^2(R): there exist A, B > 0 such that for all f = Σ_{k=−∞}^{∞} a[k] ϕ_k(x) ∈ V_0 we have

A‖f‖^2 ≤ Σ_{k=−∞}^{∞} a[k]^2 ≤ B‖f‖^2.

In the case of an orthonormal basis, A = B = 1.

Multi–Resolution Analysis B. Khoromskij, Leipzig 2005(L3) 57

(ii) Dyadic dilates of these functions, ϕ_{j,k} = ϕ(2^{−j}x − k), j ∈ Z, generate a hierarchical set of subspaces V_j. Specifically, V_j contains all scaling functions on level j. This means that if a function f(x) ∈ V_j, its integer translates proportional to the scale 2^j have to be contained in the same space,

f(x) ∈ V_j ⇔ f(x − 2^j k) ∈ V_j,   k ∈ Z.

(iii) The scaling-function spaces satisfy V_{j+1} ⊂ V_j, i.e., an approximation at resolution 2^{−j} contains all the information needed to compute an approximation at the coarser resolution 2^{−j−1}. Moreover, if f(x) ∈ V_j, the dilated function f(x/2) has to be contained in the coarser-resolution space V_{j+1}:

f(x) ∈ V_j ⇔ f(x/2) ∈ V_{j+1},   j ∈ Z.


Multi–Resolution Analysis B. Khoromskij, Leipzig 2005(L3) 58

(iv) The scaling-function spaces also satisfy

(a) lim_{j→∞} V_j = ∩_{j=−∞}^{∞} V_j = {0},

(b) ∪_{j=−∞}^{∞} V_j is dense in L^2(R).

Specifically, (b) means

lim_{j→−∞} V_j = Closure( ∪_{j=−∞}^{∞} V_j ) = L^2(R).

Recall that 2^{−j} is the resolution and 2^j is the scale parameter.

Scaling (dilation) equation B. Khoromskij, Leipzig 2005(L3) 59

The set of functions {ϕ_{j,k}(x)} is supposed to be orthogonal. This means that for any k, k' ∈ Z:

∫_R ϕ_{j,k}(x) ϕ_{j,k'}(x) dx = δ_{kk'},   j ∈ Z.

Let {ϕ_n}_{n∈Z} be an orthogonal basis of V_0. Then the family {ϕ_{j,n}}_{n∈Z} is an orthogonal basis of V_j, j ∈ Z, where

ϕ_{j,n}(x) := (1/2^{j/2}) ϕ((x − 2^j n)/2^j).

The orthogonal projection of f onto V_j is given by

P_{V_j} f = Σ_{n=−∞}^{∞} a_j[n] ϕ_{j,n},   a_j[n] = ⟨f, ϕ_{j,n}⟩,

where the a_j[n] provide a discrete approximation at the scale 2^j.


Scaling (dilation) equation B. Khoromskij, Leipzig 2005(L3) 60

Scaling (dilation) equation.

Since 2^{−1/2} ϕ(x/2) ∈ V_1 ⊂ V_0, we can decompose

(1/√2) ϕ(x/2) = Σ_{n=−∞}^{∞} h[n] ϕ(x − n)   with   h[n] = (1/√2) ⟨ϕ(x/2), ϕ(x − n)⟩.   (24)

In signal processing, the sequence h[n] is interpreted as a discrete filter, usually called a conjugate mirror filter (Mallat, Meyer) or low-pass filter.

For scaling functions with compact support, h[n] is a finite sequence (cf. the Haar SF). If ϕ(x) has infinite support, h[n] may be an infinite sequence (cf. the Sinc SF).

Scaling equation B. Khoromskij, Leipzig 2005(L3) 61

The FT of (24) implies

ϕ̂(2ω) = (1/√2) ĥ(ω) ϕ̂(ω)   for   ĥ(ω) = Σ_{n=−∞}^{∞} h[n] e^{−inω}.

For any p ≥ 0, this implies

ϕ̂(2^{−p+1}ω) = (1/√2) ĥ(2^{−p}ω) ϕ̂(2^{−p}ω).

Thus, by substitution, we obtain (with arbitrary P ∈ N)

ϕ̂(ω) = ( Π_{p=1}^{P} ĥ(2^{−p}ω)/√2 ) ϕ̂(2^{−P}ω) = ( Π_{p=1}^{∞} ĥ(2^{−p}ω)/√2 ) ϕ̂(0)   (25)

(the latter if ϕ̂(ω) is continuous at ω = 0).


Haar and Sinc MRA: check cond. (i)-(iv) B. Khoromskij, Leipzig 2005(L3) 62

Ex. 3.1. Define the Haar scaling function

ϕ(x) = χ_{[0,1)}(x).

The Haar MRA corresponds to approximation by piecewise constant functions; conditions (i)-(iv) can be easily checked. Clearly, {ϕ_k} is an orthonormal basis (i.e., A = B = 1). V_j ⊂ L^2(R) consists of functions which are constant on the intervals [n2^j, (n + 1)2^j), n ∈ Z, so that V_j ⊂ V_{j−1}. The approximation at resolution 2^{−j} is the projection onto the set of piecewise constant functions on intervals of size 2^j.

The filter coefficients h[n] = (1/√2) ⟨ϕ(x/2), ϕ(x − n)⟩ are given by h[n] = 2^{−1/2} if n = 0, 1 and h[n] = 0 otherwise.

Haar and Sinc MRA: check cond. (i)-(iv) B. Khoromskij, Leipzig 2005(L3) 63

Ex. 3.2. To approximate smooth (analytic) data one makes use of the Sinc (Shannon) scaling function

ϕ(x) = sinc(x) := sin(πx)/(πx).

V_j ⊂ L^2(R) is defined as the set of functions whose FT has support included in [−2^{−j}π, 2^{−j}π].

Lem. 2.2 proves that {ϕ(x − n)}_{n∈Z} is an orthogonal basis of V_0 (band-limited functions). Moreover, it is an interpolating basis.

The FT of f = sinc(x) is the (shifted/dilated) Haar SF,

f̂(ω) = χ_{[−π,π]}(ω).

From (25) we derive for the filter coefficients

ĥ(ω) = √2 χ_{[−π/2,π/2]}(ω),   ω ∈ [−π, π].


Haar and Sinc orthogonal wavelets B. Khoromskij, Leipzig 2005(L3) 64

Figure 9: Haar and Sinc scaling functions/wavelets.

Wavelet spaces B. Khoromskij, Leipzig 2005(L3) 65

The wavelet spaces have the following properties:

(v) There is a wavelet function ψ(x) such that its integer translates ψ_k(x) = ψ(x − k) and dyadic dilates ψ_{j,k} = ψ(2^{−j}x − k) form subspaces W_j which are complementary to V_j in V_{j−1}:

V_{j−1} = V_j ⊕ W_j,   W_j ⊥ V_j.   (26)

(vi) From the above relations it follows that L^2(R) can be decomposed into the approximation space V_{j_0} and the sum of the detail spaces W_j of higher resolutions j ≤ j_0:

L^2(R) = V_{j_0} ⊕ (⊕_{j=−∞}^{j_0} W_j) = ⊕_{j=−∞}^{∞} W_j,   (27)

where j_0 ∈ Z is a chosen level of resolution.


Orthogonal wavelets B. Khoromskij, Leipzig 2005(L3) 66

Relation (26) means that the orthogonal projection of f onto V_{j−1} is the sum of the orthogonal projections onto V_j and W_j; hence the "detail" space W_j is the orthogonal complement of V_j in V_{j−1}:

P_{V_{j−1}} f = P_{V_j} f + P_{W_j} f.

P_{W_j} f gives the "details" of f that appear at the scale 2^{j−1} but disappear at the coarser scale 2^j.

Thm. 3.1. (Mallat, Meyer) Let ψ be the function whose FT is

ψ̂(2ω) = (1/√2) e^{−iω} ĥ^*(ω + π) ϕ̂(ω),

where ϕ is the SF and h is the corresponding conjugate mirror filter. Denote ψ_{j,k}(x) := (1/√(2^j)) ψ((x − 2^j k)/2^j). For any scale 2^j, {ψ_{j,k}}_{k∈Z} is an orthogonal basis of W_j. For all scales, {ψ_{j,k}}_{(j,k)∈Z^2} is an orthogonal basis of L^2(R).

High-pass filters B. Khoromskij, Leipzig 2005(L3) 67

Since ψ(x/2) ∈ W_1 ⊂ V_0, it can be decomposed in an orthogonal basis of V_0:

(1/√2) ψ(x/2) = Σ_{n=−∞}^{∞} g[n] ϕ(x − n)   (28)

with g[n] = (1/√2) ⟨ψ(x/2), ϕ(x − n)⟩. In (28), ϕ serves as a kind of "potential" for generating ψ.

The FT of (28) together with Thm. 3.1 yields

ψ̂(2ω) = (1/√2) ĝ(ω) ϕ̂(ω),   i.e.,   ĝ(ω) = e^{−iω} ĥ^*(ω + π).

Calculating the inverse FT of this relation leads to

g[n] = (−1)^{1−n} h[1 − n].

This is the so-called mirror filter (or high-pass filter), which is important for the FWT algorithm.


Discrete wavelet transform B. Khoromskij, Leipzig 2005(L3) 68

All in all, properties (i)-(vi) together with Thm. 3.1 mean that any function f ∈ L^2(R) can be represented as a sum of linear combinations of the scaling functions ϕ_{j_0} at a chosen resolution j = j_0 and the wavelet functions ψ_j at all finer resolutions j ≤ j_0:

f(x) = Σ_{k=−∞}^{∞} a_{j_0}[k] ϕ_{j_0,k}(x) + Σ_{j=−∞}^{j_0} Σ_{k=−∞}^{∞} d_j[k] ψ_{j,k}(x).   (29)

Here the coefficients a_{j_0}[k] and d_j[k] are obtained as scalar products with the appropriate basis functions,

a_j[k] = ∫_R f(x) ϕ_{j,k}(x) dx,   d_j[k] = ∫_R f(x) ψ_{j,k}(x) dx.   (30)

Equations (29), (30) define the Discrete Wavelet Transform (DWT).

Vanishing moments B. Khoromskij, Leipzig 2005(L3) 69

The wavelet ψ has p vanishing moments if

∫_R x^k ψ(x) dx = 0   for 0 ≤ k < p.

Then ψ is orthogonal to any polynomial of degree p − 1. If f is locally C^k, then for k < p the wavelets are orthogonal to the local polynomial approximant (say, the Taylor polynomial), yielding small-amplitude coefficients at fine scales.

ψ has p vanishing moments iff both ψ̂ and ĥ have vanishing derivatives up to order p − 1 at ω = 0 and at ω = π, respectively.

If ψ has p vanishing moments then its support is at least of size 2p − 1 (Daubechies).


Haar wavelet: check cond. (v)-(vi) B. Khoromskij, Leipzig 2005(L3) 70

Ex. 3.1′. Recall the filter coefficients for the Haar scaling function: h[n] = 2^{−1/2} if n = 0, 1 and h[n] = 0 otherwise. The Haar wavelet is thus given by

(1/√2) ψ(x/2) = Σ_{n=−∞}^{∞} (−1)^{1−n} h[1 − n] ϕ(x − n) = (1/√2) (ϕ(x − 1) − ϕ(x)).

Specifically, ψ(x) = −1 if 0 ≤ x < 1/2, ψ(x) = 1 if 1/2 ≤ x < 1, and ψ(x) = 0 otherwise.

Clearly, this is an orthogonal wavelet satisfying (v)-(vi).

The Haar wavelet has the shortest support among all orthogonal wavelets (p = 1). It is suitable only for approximating non-smooth functions (signals). However, it is a good example for educational purposes.

Sinc wavelet: check cond. (v)-(vi) B. Khoromskij, Leipzig 2005(L3) 71

Ex. 3.2′. The Sinc wavelet is constructed from the Sinc MRA with ϕ(x) = sinc(x), which approximates functions by their restriction to low-frequency intervals. Thm. 3.1 yields

ψ̂(ω) = (1/√2) e^{−iω/2} ĥ^*(ω/2 + π) ϕ̂(ω/2),

with ϕ̂(ω) = χ_{[−π,π]}(ω), ĥ(ω) = √2 χ_{[−π/2,π/2]}(ω). This implies

ψ̂(ω) = e^{−iω/2}   if ω ∈ [−2π, −π] ∪ [π, 2π]

and ψ̂(ω) = 0 otherwise. Hence

ψ(x) = 2ϕ(2x − 1) − ϕ(x − 1/2).

This is the analytic (C∞) wavelet with the decay O(|x|−1) as

|x| → ∞. It can be shown that ψ has an infinite number of

vanishing moments ???


Fast orthogonal wavelet transform B. Khoromskij, Leipzig 2005(L3) 72

Because V_j = V_{j+1} ⊕ W_{j+1}, a function f ∈ V_j may be represented either in the scaling-function basis

f = Σ_{k=−∞}^{∞} ⟨f, ϕ_{j,k}⟩ ϕ_{j,k} = Σ_{k=−∞}^{∞} a_j[k] ϕ_{j,k},

or with respect to orthogonal bases of V_{j+1} and W_{j+1},

f = Σ_{k=−∞}^{∞} a_{j+1}[k] ϕ_{j+1,k} + Σ_{k=−∞}^{∞} d_{j+1}[k] ψ_{j+1,k},   d_{j+1}[k] = ⟨f, ψ_{j+1,k}⟩.

Thm. 3.2. (Mallat) Decomposition:

a_{j+1}[n] = Σ_{k=−∞}^{∞} h[k − 2n] a_j[k];   d_{j+1}[n] = Σ_{k=−∞}^{∞} g[k − 2n] a_j[k].

Reconstruction:

a_j[n] = Σ_{k=−∞}^{∞} h[n − 2k] a_{j+1}[k] + Σ_{k=−∞}^{∞} g[n − 2k] d_{j+1}[k].
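A minimal one-level illustration of Thm. 3.2 with the Haar filters (a periodic signal is assumed; this is a sketch, not an optimized FWT):

  import numpy as np

  # Haar filters: h = (1/sqrt(2), 1/sqrt(2)); g[n] = (-1)^(1-n) h[1-n],
  # i.e. g[0] = -h[1], g[1] = h[0].
  h = np.array([1.0, 1.0]) / np.sqrt(2.0)      # low-pass (scaling) filter
  g = np.array([-1.0, 1.0]) / np.sqrt(2.0)     # high-pass (wavelet) filter

  def decompose(a):
      """a_j -> (a_{j+1}, d_{j+1}) by filtering and downsampling (Thm. 3.2)."""
      N = len(a)
      a_next = np.array([sum(h[k - 2 * n] * a[k % N]
                             for k in range(2 * n, 2 * n + 2)) for n in range(N // 2)])
      d_next = np.array([sum(g[k - 2 * n] * a[k % N]
                             for k in range(2 * n, 2 * n + 2)) for n in range(N // 2)])
      return a_next, d_next

  def reconstruct(a_next, d_next):
      """Inverse step: a_j[n] = sum_k h[n-2k] a_{j+1}[k] + g[n-2k] d_{j+1}[k]."""
      N = 2 * len(a_next)
      a = np.zeros(N)
      for k in range(len(a_next)):
          for n in (2 * k, 2 * k + 1):
              a[n] += h[n - 2 * k] * a_next[k] + g[n - 2 * k] * d_next[k]
      return a

  a0 = np.random.rand(16)
  a1, d1 = decompose(a0)
  print(np.allclose(reconstruct(a1, d1), a0))   # exact reconstruction (orthogonality)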

Fast orthogonal wavelet transform B. Khoromskij, Leipzig 2005(L3) 73

f_0 ∈ V_0 is split into f_1 ∈ V_1 at the coarser scale 2^1 and a "detail" d_1 ∈ W_1 at scale 2^0. Iterating this process gives a sequence f_0, f_1, f_2, ... of more and more blurred versions together with the details d_1, d_2, d_3, .... After J iterations the original signal can be exactly (by orthogonality) reconstructed: f_0 = f_J + d_1 + ... + d_J.

The decomposition scheme:

a_0 → a_1 → a_2 → ... → a_J
       d_1    d_2   ...    d_J

The reconstruction scheme:

a_0 ← a_1 ← a_2 ← ... ← a_J
       d_1    d_2   ...    d_J


Fast orthogonal wavelet transform B. Khoromskij, Leipzig 2005(L3) 74

Given f = f0 ∈ V0, both the decomposition and reconstruction

are nothing but representations w.r.t. changes of basis

functions

V0 → VJ ⊕W1 ⊕ · · · ⊕WJ .

Iterating the decomposition yields for given coefficients

a0 = [k], the coefficients D[l, k] := (aJ [k], dJ [k], dJ−1[k], ..., d1[k])

The translation a0[k] → D[l, k] is called the discrete

wavelet transform (DFT). The backward transform is

provided by the reconstruction D[l, k] → a0[k].In practice the signal a0 is 2J periodic hence we have N = 2J

coefficients. Then the DFT requires only O(mN) operations,

where m is the filter lenght.

Numerics I: Denoising B. Khoromskij, Leipzig 2005(L3) 75

We perform denoising of a randomly perturbed Mexican hat function. It can be rather accurately reconstructed with only a few wavelet coefficients (say, ∼10 out of N = 2048), up to a threshold proportional to the noise amplitude (about 10% of the signal amplitude).

Figure 10: Denoising by Daubechies (4) wavelets for the "Mexican hat".


Numerics II: Approximating smooth signals B. Khoromskij, Leipzig 2005(L3) 76

We approximate the Mexican hat with α = 0.5 by Daubechies (m) wavelets with filter length m (next table). Recall m = 2p − 1.

k_W(ε) is the number of (nonzero) wavelet coefficients which exceed the given threshold ε.

k_W(ε) for Daubechies (m) wavelets approximating the Mexican hat:

m\ε   0.1   0.01   0.001   1e-4   1e-5   1e-6   1e-7   1e-8
10    19    31     47      75     105    175    273    388
20    17    24     29      43     53     60     93     121
40    24    24     29      31     31     46     57     105

Numerics II: Approximating smooth signals B. Khoromskij, Leipzig 2005(L3) 77

The next table gives the Sinc-interpolation error vs. the Sinc-wavelet compressed representation, where the total number of wavelet coefficients k_W(ε) corresponds to the threshold ε. The compression is not efficient since there are no "details"! In fact, all the important coefficients are observed at the highest resolution level.

Sinc interpolation:

M    4      9      16     25     36     49     64     81     100
ε    0.005  0.003  0.001  2e-4   4e-5   4e-7   4e-8   6e-9   9e-10

Sinc wavelets for the Mexican hat:

m_F|N   16|32  36|64  36|64  50|128  70|128  100|256  128|256  160|256  –
ε       0.01   0.005  0.002  4e-4    6e-5    6e-6     6e-7     6e-6     –
k_W(ε)  20     29     42     54      85      131      179      116      –


Literature to Lecture 3 B. Khoromskij, Leipzig 2005(L3) 78

1. S.G. Mallat: A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1999.

2. I. Daubechies: Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.

3. G. Strang, T. Nguyen: Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.

4. R. Schneider: Wavelets and Signal Processing. Lecture Notes. Chemnitz, Sommersemester 2000.

URL: http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor3.ps

Lect. 4. Separable approximation to multi-variate functions B. Khoromskij, Leipzig 2005 79

Analytic methods of Kronecker-product representation of non-local operators and related tensors are mainly based on separable approximation of multi-variate functions in R^d.

I. Separation methods by tensor-product interpolation

• Polynomial interpolation
• Sinc interpolation
• Hyperbolic-cross approximation (Wavelet/FEM).

II. Approximation by exponential/trigonometric sums

• Sinc quadratures
• Exponential sums ∑ a_k e^{−b_k x}
• Trigonometric sums ∑ [a_k sin(b_k x) + a′_k cos(b′_k x)].

Item (II) applies to translation-invariant functions or to functions depending on the sum of the spatial variables.


Tensor-product interpolation B. Khoromskij, Leipzig 2005(L4) 80

Approximation problem: Given a multi-variate function F : Ω^d → R (d ≥ 2), approximate it by a separable expansion

F_r(ζ_1, ..., ζ_d) := ∑_{k=1}^{r} c_k Φ^1_k(ζ_1) · · · Φ^d_k(ζ_d) ≈ F,   Ω ∈ {R, R_+, (a, b)},

where the set of univariate functions Φ^ℓ_k(·) : Ω → R, 1 ≤ ℓ ≤ d, 1 ≤ k ≤ r, may be fixed or chosen adaptively, and c_k ∈ R. For numerical efficiency the so-called separation rank r ∈ N should be reasonably small.

Introduce the tensor-product interpolant I_M with respect to the first d − 1 variables (e.g., a polynomial or Sinc interpolant),

I_M F := I^1_M × · · · × I^{d−1}_M F,

where I^ℓ_M F, 1 ≤ ℓ ≤ d − 1, denotes the univariate interpolation applied to the variable ζ_ℓ ∈ I_ℓ = Ω, and I_ℓ is the ℓ-th factor in Ω^d = I_1 × ... × I_d.

Best polynomial approximation B. Khoromskij, Leipzig 2005(L4) 81

In the complex plane C, we introduce the circular ring (annulus)

R_ρ := {z ∈ C : 1/ρ < |z| < ρ}  with ρ > 1.

Thm. 4.1. (Laurent's Theorem). Let f : C → C be analytic and bounded by M > 0 in R_ρ with ρ > 1 (in the following we say f ∈ A_ρ), and set

C_n := (1/(2π)) ∫_0^{2π} f(e^{iθ}) e^{−inθ} dθ,   n = 0, ±1, ±2, . . . .   (31)

Then f(z) = ∑_{n=−∞}^{∞} C_n z^n, where the series converges to f(z) for all z ∈ R_ρ. Moreover, |C_n| ≤ M/ρ^{|n|}, and for all θ ∈ [0, 2π] and arbitrary integer m,

| f(e^{iθ}) − ∑_{n=−m}^{m} C_n e^{inθ} | ≤ (2M/(ρ − 1)) ρ^{−m}.   (32)


Chebyshev polynomials B. Khoromskij, Leipzig 2005(L4) 82

By E_ρ = E_ρ(B) with the reference interval B := [−1, 1] we denote Bernstein's regularity ellipse (with foci at w = ±1 and the sum of the semi-axes equal to ρ > 1),

E_ρ := {w ∈ C : |w − 1| + |w + 1| ≤ ρ + ρ^{−1}}.

Let T_n(w), n = 0, 1, 2, . . ., be the Chebyshev polynomials, which may be defined recursively by

T_0(w) = 1,  T_1(w) = w,  T_{n+1}(w) = 2w T_n(w) − T_{n−1}(w),  n = 1, 2, . . . .

Note that T_n(x) = cos(n arccos x), x ∈ [−1, 1], which implies T_n(1) = 1, T_n(−1) = (−1)^n. It can be seen that with w = ½(z + 1/z) there holds

T_n(w) = ½ (z^n + z^{−n}).   (33)

Best polynomial approximation by Chebyshev series B. Khoromskij, Leipzig 2005(L4) 83

Thm. 4.2. Let F be analytic and bounded by M in E_ρ (with ρ > 1). Then the expansion

F(w) = C_0 + 2 ∑_{n=1}^{∞} C_n T_n(w)   (34)

holds for all w ∈ E_ρ (Chebyshev series), with

C_n = (1/π) ∫_{−1}^{1} F(w) T_n(w) / √(1 − w²) dw.

Moreover, |C_n| ≤ M/ρ^n, and for w ∈ B and m = 1, 2, 3, . . .,

| F(w) − C_0 − 2 ∑_{n=1}^{m} C_n T_n(w) | ≤ (2M/(ρ − 1)) ρ^{−m},  w ∈ B.   (35)


Proof of the main theorem B. Khoromskij, Leipzig 2005(L4) 84

Let A_{ρ,s} := {f ∈ A_ρ : C_{−n} = C_n}; then each f ∈ A_{ρ,s} has a representation (cf. Thm. 4.1)

f(z) = C_0 + ∑_{n=1}^{∞} C_n (z^n + z^{−n}),  z ∈ R_ρ.   (36)

Furthermore, from (36) it follows that f(1/z) = f(z), z ∈ R_ρ.

Let us apply the mapping w = ½(z + 1/z), which satisfies w(1/z) = w(z). It is a conformal transform of {ξ ∈ R_ρ : |ξ| > 1} onto E_ρ as well as of {ξ ∈ R_ρ : |ξ| < 1} onto E_ρ (but not of R_ρ onto E_ρ!). It provides a one-to-one correspondence of functions F that are analytic and bounded by M in E_ρ with functions f in A_{ρ,s}.

Since under this mapping we have (33), it follows that if f defined by (36) is in A_{ρ,s}, then the corresponding transformed function F(w) = f(z(w)), which is analytic and bounded by M in E_ρ, is given by (34). Now the result follows directly from Thm. 4.1.

Lagrangian polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 85

Let P_N(B) be the set of polynomials of degree ≤ N on B. Define by [I_N F](x) ∈ P_N(B) the interpolation polynomial of F with respect to the Chebyshev-Gauss-Lobatto (CGL) nodes

ξ_j = cos(πj/N) ∈ B,  j = 0, 1, . . . , N,  with ξ_0 = 1, ξ_N = −1,

where the ξ_j are the zeroes of the polynomial (1 − x²) T′_N(x), x ∈ B. In turn, the Lagrangian interpolant I_N of F has the form

I_N F := ∑_{j=0}^{N} F(ξ_j) l_j(x) ∈ P_N(B),   (37)

i.e. I_N F(ξ_j) = F(ξ_j), j = 0, . . . , N, where the l_j(x) are the Lagrange basis polynomials

l_j := ∏_{k=0, k≠j}^{N} (x − ξ_k)/(ξ_j − ξ_k) ∈ P_N(B).

Clearly, l_j(ξ_j) = 1 and l_j(ξ_k) = 0 for all k ≠ j.
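A minimal numerical sketch of (37) (not from the notes; the test function and all names are illustrative assumptions), showing the geometric error decay predicted by Thm. 4.3 for a function analytic in a Bernstein ellipse:

import numpy as np

def cgl_nodes(N):
    """Chebyshev-Gauss-Lobatto nodes xi_j = cos(pi*j/N), j = 0..N."""
    return np.cos(np.pi * np.arange(N + 1) / N)

def lagrange_interpolant(F, N):
    """Return x -> (I_N F)(x) built from the Lagrange basis l_j of (37)."""
    xi = cgl_nodes(N)
    Fxi = F(xi)
    def I_N(x):
        x = np.atleast_1d(x).astype(float)
        vals = np.zeros_like(x)
        for j in range(N + 1):
            lj = np.ones_like(x)
            for k in range(N + 1):
                if k != j:
                    lj *= (x - xi[k]) / (xi[j] - xi[k])
            vals += Fxi[j] * lj
        return vals
    return I_N

F = lambda x: 1.0 / (1.0 + 25.0 * x**2)        # analytic in some E_rho (poles at +-i/5)
t = np.linspace(-1.0, 1.0, 2001)
for N in (8, 16, 32):
    print(N, np.max(np.abs(F(t) - lagrange_interpolant(F, N)(t))))   # decays roughly like rho^{-N}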


Lebesgue constant for Chebyshev interpolation B. Khoromskij, Leipzig 2005(L4) 86

Given the set {ξ_j}_{j=0}^{N} of interpolation points on [−1, 1] and the associated Lagrangian interpolation operator I_N, the standard approximation theory for polynomial interpolation involves the so-called Lebesgue constant Λ_N > 1 defined by

‖I_N u‖_{∞,B} ≤ Λ_N ‖u‖_{∞,B}  ∀ u ∈ C(B).   (38)

In the case of Chebyshev interpolation it can be shown that Λ_N grows at most logarithmically in N; more precisely,

Λ_N ≤ (2/π) log N + 1.

The interpolation points which produce the smallest value Λ*_N of all Λ_N are not known, but Bernstein '54 proved that

Λ*_N = (2/π) log N + O(1).

Error bound for polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 87

Thm. 4.3. Let u ∈ C^∞[−1, 1] have an analytic extension to E_ρ bounded by M > 0 in E_ρ (with ρ > 1). Then we have

‖u − I_N u‖_{∞,B} ≤ (1 + Λ_N) (2M/(ρ − 1)) ρ^{−N},  N ∈ N_{≥1}.   (39)

Proof. Due to (35) one obtains for the best polynomial approximation to u on [−1, 1],

min_{v∈P_N} ‖u − v‖_{∞,B} ≤ (2M/(ρ − 1)) ρ^{−N}.   (40)

Note that the interpolation operator I_N is a projection, that is, I_N v = v for all v ∈ P_N. Then applying the triangle inequality with the best v ∈ P_N,

‖u − I_N u‖_{∞,B} = ‖u − v − I_N(u − v)‖_{∞,B} ≤ (1 + Λ_N) ‖u − v‖_{∞,B}

completes the proof.


Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 88

Consider a multi-variate function f = f(x_1, . . . , x_d) : R^d → R, d ≥ 2, defined on a box B_1 × B_2 × . . . × B_d with B_k = [a_k, b_k]. We set B := B_k = [−1, 1], k = 1, . . . , d, thus f : B^d → R.

The corresponding N-th order tensor-product interpolation operator is defined by

I_N f = I^1_N × I^2_N × . . . × I^d_N f ∈ P_N[B^d],

where I^k_N f denotes the interpolation polynomial with respect to x_k, k = 1, . . . , d, at nodes ξ_k ∈ B_k.

We choose the CGL nodes, hence the interpolation points ξ_α ∈ B^d, α = (i_1, . . . , i_d) ∈ N_0^d, are obtained by the Cartesian product of the 1D nodes,

ξ_α := (cos(π i_1/N), . . . , cos(π i_d/N)).
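For d = 2 the tensor-product interpolant amounts to applying the same 1D interpolation matrix to each index of the sampled grid values; a small sketch (illustrative assumptions only, not the lecture's code):

import numpy as np

def cgl(N):
    return np.cos(np.pi * np.arange(N + 1) / N)

def lagrange_matrix(xi, x):
    """L[i, j] = l_j(x_i) for the nodes xi, evaluated at the points x."""
    L = np.ones((len(x), len(xi)))
    for j in range(len(xi)):
        for k in range(len(xi)):
            if k != j:
                L[:, j] *= (x - xi[k]) / (xi[j] - xi[k])
    return L

N = 12
xi = cgl(N)
X1, X2 = np.meshgrid(xi, xi, indexing="ij")
Fgrid = np.exp(-(X1**2 + X2**2))             # f sampled on the Cartesian product of CGL nodes

t = np.linspace(-1.0, 1.0, 101)
L = lagrange_matrix(xi, t)
F_interp = L @ Fgrid @ L.T                    # (I_N^1 x I_N^2) f evaluated on the t x t grid
T1, T2 = np.meshgrid(t, t, indexing="ij")
print(np.max(np.abs(F_interp - np.exp(-(T1**2 + T2**2)))))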

Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 89

Again, I_N is a projection map,

I_N : C(B^d) → P_N := {p_1 × . . . × p_d : p_i ∈ P_N, i = 1, . . . , d},

which implies the following estimate for the multivariate counterpart of the Lebesgue constant (stability of I_N in the multidimensional case; cf. (38)):

‖I_N f‖_{∞,B^d} ≤ Λ_N^d ‖f‖_{∞,B^d}  ∀ f ∈ C(B^d).   (41)

To derive an analogue of Thm. 4.3, we introduce the product domain

E_ρ^{(j)} := B_1 × . . . × B_{j−1} × E_ρ(B_j) × B_{j+1} × . . . × B_d,

and denote by X_{−j} the (d − 1)-dimensional set of variables x_1, . . . , x_{j−1}, x_{j+1}, . . . , x_d with x_ℓ ∈ B_ℓ (ℓ ≠ j), j = 1, ..., d.


Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 90

Assump. 4.1. Given f ∈ C^∞(B^d), assume there is ρ > 1 such that for all j = 1, . . . , d and each fixed ξ ∈ X_{−j} there exists an analytic extension f_j(x_j, ξ) of f(x_j, ξ) to E_ρ(B_j) ⊂ C with respect to x_j, bounded in E_ρ(B_j) by a certain M_j > 0 independent of ξ.

Thm. 4.4. For f ∈ C^∞(B^d), let Assump. 4.1 be satisfied. Then the interpolation error can be estimated by

‖f − I_N f‖_{∞,B^d} ≤ Λ_N^d (2 M_ρ(f)/(ρ − 1)) ρ^{−N},   (42)

where Λ_N is the Lebesgue constant of the one-dimensional interpolant I^k_N, and

M_ρ(f) := max_{1≤j≤d} max_{x∈E_ρ^{(j)}} |f_j(x, ξ)|.

Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 91

Proof. The multiple use of (38), (39) and the triangle inequality leads to

|f − I_N f| ≤ |f − I^1_N f| + |I^1_N (f − I^2_N × . . . × I^d_N f)|
  ≤ |f − I^1_N f| + |I^1_N (f − I^2_N f)| + |I^1_N I^2_N (f − I^3_N f)| + . . . + |I^1_N × . . . × I^{d−1}_N (f − I^d_N f)|
  ≤ [ (1 + Λ_N) max_{x∈E_ρ^{(1)}} |f_1(x, ξ)| + Λ_N (1 + Λ_N) max_{x∈E_ρ^{(2)}} |f_2(x, ξ)| + . . . + Λ_N^{d−1} (1 + Λ_N) max_{x∈E_ρ^{(d)}} |f_d(x, ξ)| ] (2/(ρ − 1)) ρ^{−N}
  ≤ (1 + Λ_N) (Λ_N^d − 1)/(Λ_N − 1) · (2 M_ρ/(ρ − 1)) ρ^{−N}.

Hence (42) follows, since (1 + x)(x^n − 1)/(x − 1) ≈ x^n for x → ∞ (cf. the analogous argument for the Sinc interpolant below).


Sinc-approximation of multi-variate functions B. Khoromskij, Leipzig 2005(L4) 92

Now consider the separable approximation in the case Ω = R. The extension to the cases Ω = R_+ or Ω = (a, b) is similar to that for the univariate Sinc approximation.

Introduce the tensor-product Sinc interpolant C_M with respect to the first d − 1 variables,

C_M f := C^1_M × ... × C^{d−1}_M f,

where C^ℓ_M f = C^ℓ_M(f, h), 1 ≤ ℓ ≤ d − 1, denotes the univariate Sinc interpolation applied to the variable ζ_ℓ ∈ I_ℓ = R, and I_ℓ is the ℓ-th factor in R^d = I_1 × ... × I_d.

Ex. 4.1. Examples of approximated functions:

f(x) = |x|^α,   f(x) = exp(−κ|x|)/|x|,   f(x, y) = sinc(|x| |y|),   with x, y ∈ R^d.

Sinc-approximation of multi-variate functions B. Khoromskij, Leipzig 2005(L4) 93

The estimation of the error f − C_M f requires the Lebesgue constant Λ_M ≥ 1 defined by

‖C_M(f, h)‖_∞ ≤ Λ_M ‖f‖_∞  for all f ∈ C(R).   (43)

Stenger '93 proves the inequality

Λ_M = max_{x∈R} ∑_{k=−M}^{M} |S_{k,h}(x)| ≤ (2/π)(3 + log M).   (44)

Note that we also have (orthogonality)

∑_{k=−∞}^{∞} |S_{k,h}(x)|² = 1  (x ∈ R),

which indicates Λ_M = 1 with respect to the L²-norm.


Sinc-approximation of multi-variate functions B. Khoromskij, Leipzig 2005(L4) 94

For each fixed ℓ ∈ {1, . . . , d − 1}, choose ζ_ℓ ∈ I_ℓ and define the remaining parameter set by Y_ℓ := I_1 × ... × I_{ℓ−1} × I_{ℓ+1} × ... × I_d ⊂ R^{d−1}. This introduces the univariate (parameter-dependent) function F_ℓ(·, y) : I_ℓ → R, which is the restriction of F onto the interval I_ℓ with y ∈ Y_ℓ.

Thm. 4.5. (Hackbusch, Khoromskij) For each ℓ = 1, ..., d − 1 we assume that for any fixed y ∈ Y_ℓ, F_ℓ(·, y) satisfies

(a) F_ℓ(·, y) ∈ H^1(D_δ) with N(F_ℓ, D_δ) ≤ N_ℓ < ∞ uniformly in y;

(b) F_ℓ(·, y) has hyper-exponential decay with a = 1 and C, b > 0 for all y ∈ Y_ℓ.

Then, for all y ∈ Y_ℓ, the optimal choice h := log M / M yields

|F(ζ_ℓ, y) − C_M(F, h)(ζ_ℓ)| ≤ (C/(2πδ)) Λ_M^{d−2} max_{ℓ=1,...,d−1} N_ℓ · e^{−πδM/ log M}   (45)

with Λ_M defined by (44).

Proof of the Sinc-interpolation error B. Khoromskij, Leipzig 2005(L4) 95

The multiple use of (43) and the triangle inequality leads to

|f − C_M f| ≤ |f − C^1_M f| + |C^1_M (f − C^2_M · · · C^{d−1}_M f)|
  ≤ |f − C^1_M f| + |C^1_M (f − C^2_M f)| + |C^1_M C^2_M (f − C^3_M f)| + . . . + |C^1_M · · · C^{d−2}_M (f − C^{d−1}_M f)|
  ≤ [ N_1 + Λ_M N_2 + . . . + Λ_M^{d−2} N_{d−1} ] (1/(2πδ)) e^{−πδM/ log M}
  ≤ ((1 + Λ_M + ... + Λ_M^{d−2})/(2πδ)) max_{ℓ=1,...,d−1} N_ℓ · e^{−πδM/ log M}.

Note that (Λ_M^{d−1} − 1)/(Λ_M − 1) ≈ Λ_M^{d−2} as Λ_M → ∞, hence (45) follows.


Separation by integration B. Khoromskij, Leipzig 2005(L4) 96

If a function of ρ = ∑_{i=1}^{d} x_i can be written as the integral

ϕ(ρ) = ∫_Ω e^{ρ F(t)} G(t) dt

over some Ω ⊂ R (say, Ω = R), and if a quadrature can be applied, one obtains the separable approximation

ϕ(x_1 + . . . + x_d) ≈ ∑_ν ω_ν e^{ρ F(t_ν)} G(t_ν) = ∑_ν c_ν ∏_{i=1}^{d} e^{x_i F(t_ν)},  with c_ν = ω_ν G(t_ν).

For this purpose we apply the Sinc quadratures (cf. Lect. 2, 6).

Typical examples of such functions ϕ(ρ) are

f(x) = 1/|x − y|,   f(x) = 1/(x_1 + ... + x_d),  x_i ≥ 0,

with x, y ∈ R^d.
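A minimal sketch of this separation-by-integration idea for ϕ(ρ) = 1/ρ (my own ad hoc step size and truncation under the substitution t = e^u, not the optimised Sinc rule of Lect. 2/6; all names are illustrative):

import numpy as np

def exp_sum_inverse(h=0.4, M=60):
    """Trapezoidal (Sinc-type) rule for 1/rho = int_0^inf exp(-rho*t) dt with t = exp(u):
    returns weights w_k and exponents t_k such that 1/rho ~ sum_k w_k exp(-t_k*rho)."""
    u = h * np.arange(-M, M + 1)
    t = np.exp(u)
    w = h * t                                   # dt = exp(u) du
    return w, t

w, t = exp_sum_inverse()

# separable (rank 2M+1) approximation of 1/(x1+x2+x3) on a 3D grid
x = np.linspace(0.1, 10.0, 40)
factors = np.exp(-np.outer(t, x))               # factors[k, i] = exp(-t_k * x_i)
X1, X2, X3 = np.meshgrid(x, x, x, indexing="ij")
exact = 1.0 / (X1 + X2 + X3)
approx = np.einsum("k,ki,kj,kl->ijl", w, factors, factors, factors)
print(np.max(np.abs(approx - exact) / exact))   # small relative error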

Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 97

Besides, the best approximation of ϕ(ρ) by exponential sums,

ϕ(ρ) ≈ ∑_{ν=1}^{r} ω_ν e^{−t_ν ρ},   (46)

(e.g., with respect to the L^∞- or L²-norm) leads to an approximation whose separation rank r is expected to be close to optimal.

For non-monotone functions ϕ(ρ) the approximation by trigonometric sums may do the job,

ϕ(ρ) ≈ ∑_{ν=1}^{r} c_ν e^{−i ω_ν ρ}.   (47)

Rem. 4.1. The approximation by exponential/trigonometric sums applies to the matrix-valued function ϕ(A) as well, with A = ∑_{i=1}^{d} A_i and pairwise commuting matrices A_i.


Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 98

For n ≥ 1, consider the set E^0_n of exponential sums:

E^0_n := { u = ∑_{ν=1}^{n} ω_ν e^{−t_ν x} : ω_ν, t_ν ∈ R }.

Now one can address the problem of finding the best approximation to f over the set E^0_n, characterised by the best approximation error d(f, E^0_n) := inf_{v∈E^0_n} ‖f − v‖_∞.

The existence of an approximation by exponentials is based on the fundamental Big Bernstein Theorem: if f is completely monotone for x ≥ 0, i.e.,

(−1)^n f^{(n)}(x) ≥ 0  for all n ≥ 0, x ≥ 0,

then it is the restriction of the Laplace transform of a measure to R_+:

f(z) = ∫_{R_+} e^{−tz} dµ(t).

Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 99

We recall the complete elliptic integral of the first kind with modulus κ,

K(κ) = ∫_0^1 dt / √((1 − t²)(1 − κ²t²))   (0 < κ < 1),

and define K′(κ) := K(κ′) with κ² + (κ′)² = 1.

Thm. 4.6. (Braess). Assume that f is completely monotone and analytic for Re z > 0, and let 0 < a < b. Then for the uniform approximation on the interval [a, b],

lim_{n→∞} d(f, E^0_n)^{1/n} ≤ 1/ω²,  where ω = exp(πK(κ)/K′(κ)) with κ = a/b.

In the cases f = ϕ(ρ) below, we have κ = 1/R for R ≫ 1.


Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 100

Now applying the asymptotics of the complete elliptic integrals,

K(κ′) = ln(4/κ) + C_1 κ + ...   for κ′ → 1,
K(κ) = (π/2)(1 + ¼κ² + C_1 κ⁴ + ...)   for κ → 0,

we obtain

1/ω² = exp(−2π K(κ)/K(κ′)) ≈ exp(−π²/ln(4R)) ≈ 1 − π²/ln(4R).

The latter expression indicates that the number n of different terms needed to achieve a tolerance ε is asymptotically

n ≈ |log ε| / |log ω^{−2}| ≈ |log ε| ln(4R) / π².

This result shows the same asymptotic convergence in n as that for the Sinc approximation (cf. Lect. 2).

Exponential approximations in L2-norm B. Khoromskij, Leipzig 2005(L4) 101

The best approximation to f(ρ), ρ ∈ [1, R], with respect to a weighted L²-norm is reduced to the minimisation of an explicitly given differentiable functional.

Given R > 1, N ≥ 1, find the 2N parameters α_1, ω_1, ..., α_N, ω_N ∈ R such that

F_W(R; α_1, ω_1, ..., α_N, ω_N) := ∫_1^R W(x) ( f(x) − ∑_{i=1}^{N} ω_i e^{−α_i x} )² dx = min.

In the important particular case f(x) = 1/x and W(x) = 1, the integral can be calculated in closed form:

F_1(R; α_1, ω_1, ..., α_N, ω_N) = 1 − 1/R − 2 ∑_{i=1}^{N} ω_i [Ei(−α_i) − Ei(−α_i R)]
  + ½ ∑_{i=1}^{N} (ω_i²/α_i) [e^{−2α_i} − e^{−2α_i R}]
  + 2 ∑_{1≤i<j≤N} (ω_i ω_j/(α_i + α_j)) [e^{−(α_i+α_j)} − e^{−(α_i+α_j)R}]

with the integral exponential function Ei(x) = −∫_{−∞}^{x} (e^t/t) dt.


Exponential approximations in L2-norm B. Khoromskij, Leipzig 2005(L4) 102

In the special case R = ∞, the expression for F_1(∞; . . .) simplifies even further.

Gradient or Newton-type methods with a proper choice of the initial guess can be used to obtain the minimiser of F_1. In general, the integral may be approximated by a suitable quadrature.

Optimisation with respect to the maximum norm leads to the nonlinear minimisation problem

inf_{v∈E^0_n} ‖f − v‖_{L^∞[1,R]}

involving the 2n parameters {ω_ν, t_ν}_{ν=1}^{n}. The numerical scheme follows the Remez algorithm.
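A rough sketch of the weighted-L² fit for f(x) = 1/x with a direct-search minimiser (SciPy's Nelder-Mead playing the role of the FMINS routine mentioned on the next slide; the quadrature, starting guess and all names are my own assumptions, and the achieved accuracy depends strongly on the initial guess):

import numpy as np
from scipy.optimize import minimize

def F_W(params, R, f=lambda x: 1.0 / x, W=lambda x: 1.0, nq=2000):
    """Weighted L2 misfit int_1^R W(x)(f(x) - sum_i w_i exp(-a_i x))^2 dx,
    approximated by a simple rectangle rule; exponents kept positive via abs()."""
    N = len(params) // 2
    a, w = np.abs(params[:N]), params[N:]
    x = np.linspace(1.0, R, nq)
    resid = f(x) - np.exp(-np.outer(x, a)) @ w
    return np.sum(W(x) * resid**2) * (x[1] - x[0])

R, N = 10.0, 4
x0 = np.concatenate([np.logspace(-1.0, 1.0, N), np.ones(N)])   # crude initial guess
res = minimize(F_W, x0, args=(R,), method="Nelder-Mead",
               options={"maxiter": 40000, "maxfev": 40000, "xatol": 1e-12, "fatol": 1e-16})
a, w = np.abs(res.x[:N]), res.x[N:]
x = np.linspace(1.0, R, 1000)
print(res.fun, np.max(np.abs(1.0 / x - np.exp(-np.outer(x, a)) @ w)))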

Exponential approximations in L2-norm B. Khoromskij, Leipzig 2005(L4) 103

The best approximation to 1/√ρ in the L^∞-norm is discussed in D. Braess and W. Hackbusch; a complete list of numerical data can be found at www.mis.mpg.de/scicomp/EXP SUM/1 x/tabelle.

All calculations using the weighted L²([1, R])-norm have been performed by the MATLAB subroutine FMINS, based on global minimisation by direct search.

Best approximation to 1/√ρ in the weighted L²([1, R])-norm:

        R = 10     R = 50     R = 100    R = 200    ‖·‖_{L∞}   W(ρ) = 1/√ρ
r = 4   3.7·10^-4  9.6·10^-4  1.5·10^-3  2.2·10^-3  1.9·10^-3  4.8·10^-3
r = 5   2.8·10^-4  2.8·10^-4  3.7·10^-4  5.8·10^-4  4.2·10^-4  1.2·10^-3
r = 6   8.0·10^-5  9.8·10^-5  1.1·10^-4  1.6·10^-4  9.5·10^-5  3.3·10^-4
r = 7   3.5·10^-5  3.8·10^-5  3.9·10^-5  4.7·10^-5  2.2·10^-5  8.1·10^-5


Why using trigonometric sums B. Khoromskij, Leipzig 2005(L4) 104

Prop. 4.7. (Beylkin, Mohlenkamp). Let d ≥ 2. The trigonometric identity

sin( ∑_{j=1}^{d} x_j ) = ∑_{j=1}^{d} sin(x_j) ∏_{k∈{1,...,d}\{j}} sin(x_k + α_k − α_j)/sin(α_k − α_j)   (48)

holds for all choices of α_k ∈ R such that sin(α_k − α_j) ≠ 0 for all j ≠ k.

In the case d = 2, the assertion (48) is easy to check. For d > 2 it can be proven by induction (nontrivial exercise!).

Expansion (48) shows the lack of uniqueness (ambiguity) of the best rank-d Kronecker representation. Hence, the convergence of algebraic separable approximations might be non-robust.

Approximation by trigonometric sums can be designed either using the quadrature method (cf. Lect. 7) and direct trigonometric interpolation, or by nonlinear optimisation.

Literature to Lect. 4 B. Khoromskij, Leipzig 2005(L4) 105

1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class

of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. D. Braess and W. Hackbusch: Approximation of 1/x by exponential sums in [1, ∞). To appear in IMA JNA.

3. G. Beylkin and M.J. Mohlenkamp: Numerical operator calculus in higher dimension.

Proc. Natl. Acad. Sci. USA, 99 (2002), 10246-10251.

4. B.N. Khoromskij: Data-sparse approximation of nonlocal operators. Lecture notes 17, MPI MIS,

Leipzig 2003.

http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor4.ps


Lect. 5. Data-Sparse Matrix/Tensor Formats. B. Khoromskij, Leipzig 2005 106

We focus on combinations of hierarchical and tensor-product formats:

(i) H-matrix format with standard admissibility (applies on

graded meshes)

(ii) coarsening of the hierarchical format using weaker

admissibility criteria;

(iii) blended H-matrix approximation (combine with Toeplitz,

circulant, Hankel);

(iv) wire-basket approximation for L-harmonic kernels;

(v) fully separated block representation (O(N) complexity);

(vi) uniform (U-) and H2-matrices;

(vii) Kronecker tensor-product format;

(viii) hierarchical Kronecker tensor-product representation.

Hierarchical matrices B. Khoromskij, Leipzig 2005(L5) 107

Hierarchical (H-) matrices

MH,k(TI×I ,P), the class of data-sparse H-matrices - Hackbusch ’99

Further developments - Hackbusch, BNK, Grasedyck, Bebendorf, Borm.

H-matrix technique is a direct descendant of panel clustering,

fast multipole and mosaic-skeleton approximation.

In addition, it allows data-sparse matrix-matrix operations.

Main features:

• Matrix arithmetic of O(N log^q N) complexity, where N := |I| is the cardinality of the index set I.

• Accurate approximation to a general class of nonlocal (integral) operators and operator-valued functions F(L), including the elliptic operator inverse L^{−1}, e^{−tL}, sign(L).

• Rigorous theoretical analysis.


H-Matrix Format B. Khoromskij, Leipzig 2005(L5) 108

H-matrix arithmetic is completely recursive and it is based on

the hierarchical data organisation → efficient implementation.

The H-matrix format is well suited for representation of

integral (nonlocal) operators in FEM/BEM applications.

Thm. 5.1. (complexity of the H-matrix arithmetic) Let k ∈ N denote the block-wise rank and T_{I×I} be an H-tree with depth L > 1. Then the arithmetic of N × N matrices belonging to M_{H,k}(T_{I×I}, P) has the complexity

N_{H,store} ≤ 2 C_sp k L N,   N_{H·v} ≤ 4 C_sp k L N,
N_{H⊕H} ≤ C_sp k² N (C_1 L + C_2 k),
N_{H⊙H} ≤ C_0 C_sp² k² L N max{k, L},   N_{Inv(H)} ≤ N_{H⊙H},

where C_sp is the sparsity constant.

H-Matrix Format B. Khoromskij, Leipzig 2005(L5) 109

Hierarchical Partitionings P1/2(I × I) and PW(I × I)

Figure 11: Standard- (left) and weak-admissible H-partitionings for d = 1.


General Kronecker-product format B. Khoromskij, Leipzig 2005(L5) 110

Def. 5.1. A q-th order tensor is given by

A := [a_{i_1...i_q}] ∈ R^{I^d},  d = pq,  p, q, n ∈ N,

where I^d = I_1 ⊗ ... ⊗ I_q, I_ℓ = I^1_ℓ ⊗ ... ⊗ I^p_ℓ, with multi-indices i_ℓ = (i_{ℓ,1}, ..., i_{ℓ,p}) ∈ I_ℓ, ℓ = 1, ..., q, where i_{ℓ,m} ∈ {1, ..., n} for m = 1, ..., p (p is supposed to be small).

The inner product of two tensors A and B is defined as

(A, B) := ∑_{(i_1...i_q)∈I^d} a_{i_1...i_q} b_{i_1...i_q},

while the norm of A is given by ‖A‖_F := √(A, A).

Ex. 5.1. Let A = a¹ ⊗ a², B = b¹ ⊗ b², with aⁱ, bⁱ ∈ Rⁿ (q = 2, p = 1). Then

(A, B) = (a¹, b¹)(a², b²),   ‖A‖_F = √((a¹, a¹)(a², a²)),

where the latter corresponds to the Frobenius norm.

General Kronecker-product format B. Khoromskij, Leipzig 2005(L5) 111

A tensor A of the form

A = V^1 ⊗ · · · ⊗ V^q,  V^ℓ ∈ R^{n^p},

is called a Kronecker product or decomposed tensor.

Probl. 1. Approximate A by a q-th order tensor A_r, a sum of Kronecker products (with possibly small Kronecker rank r):

A_r = ∑_{k=1}^{r} c_k V^1_k ⊗ · · · ⊗ V^q_k ≈ A,  c_k ∈ R,   (49)

where the low-dimensional components V^ℓ_k ∈ R^{n^p} can be further represented in a structured data-sparse form (say, in a wavelet-based, circulant or Toeplitz format).

Hence, A_r can be represented at the low cost qrn^p (at most), compared with n^{pq}. The tensor-product format (49) has plenty of other merits.


Excursus to the HKT-approximation of matrices B. Khoromskij, Leipzig 2005(L5) 112

Probl. 2. Given A ∈ C^{N×N} with N = n^d (here q = d, p = 2), we approximate A by a matrix A_r of the so-called HKT format

A_r = ∑_{k=1}^{r} s_k V^1_k ⊗ · · · ⊗ V^d_k ≈ A,  s_k ∈ R,  V^ℓ_k ∈ R^{n×n},   (50)

where V^ℓ_k ∈ M_{H,k} (alternative: wavelet representation of the V^ℓ_k).

Given a tolerance ε > 0, the Kronecker rank r = r(ε) can be estimated by

r = O(|log h|^{d−1} |log ε|^{d−1})   (Case a),
r = O(|log h| · |log ε|)             (Case b).

Case a. IOs with asymptotically smooth/analytic kernels g(x, y).
Case b. A class of analytic matrix-valued functions F(A); IOs with “off-diagonal analytic” translation-invariant kernels.

HKT-approximation of matrices B. Khoromskij, Leipzig 2005(L5) 113

Case (a): Analytic approximation methods are based on a separable representation of a certain multi-variate function F : R^d → R, d ≥ 2 (say, a holomorphic function with isolated singularities):

F_r(ζ_1, ..., ζ_d) := ∑_{i=1}^{r} s_i Φ^1_i(ζ_1) · · · Φ^d_i(ζ_d) ≈ F,   (51)

where Φ^ℓ_i(ζ_ℓ) is fixed or chosen adaptively.

Case (b): Making use of the r-term Sinc quadratures for the Laplace integral representation of F(A) or F(r):

F(r) = ∫_R f(t) e^{−tr} dt,   F(r) = ∫_R f(t) e^{−tr²} dt,

with the possible substitution of the matrix A for r.


Related references B. Khoromskij, Leipzig 2005(L5) 114

H-, KT-, HKT- constructive approximations:

H-Matrix techniques - Group by Hackbusch at MPI MIS, Leipzig

Sinc interpolation/quadratures for analytic funct. with point singularities -

(Kotelnikov ’33; Whittaker ’35; Shannon ’49) Stenger; M. Sugihara; Hackbusch, BNK ’02-’05

Appr. by exponential sums (classical rational approximations, Remez algorithm, minimisation) - Braess, Hackbusch, BNK ’04-’05

IOs in the HKT format - Hackbusch, BNK, Tyrtyshnikov ’03; BNK ’05

HKT approx. to matrix-valued functions - Gavrilyuk, Hackbusch, BNK ’03;

Hackbusch, BNK ’04-’05

Kronecker tensor-product representation - Van Loan, Pitsianis ’93; Golub ’98;

Beylkin, Mohlenkamp ’02; Hackbusch, BNK, Tyrtyshnikov ’03; Grasedyck ’03; ...

Tensor-product + wavelets + sparse grids:

H-matrices/wavelets in density matrix calculation - Flad, Hackbusch, Kolb, Luo,

Schneider ’03-’05; Hutter, Sauter, ...

Applications in FEM/BEM, quantum chemistry, finance, data mining -

Groups by W. Dahmen, M. Griebel, R. Schneider, C. Schwab, H. Yserentant

Properties of the Kronecker product B. Khoromskij, Leipzig 2005(L5) 115

The Kronecker product (KP) operation A⊗B of two matrices

A = [aij ] ∈ Rm×n, B ∈ Rh×g is an mh× ng matrix that has the

block-representation [aijB] (corresponds to p = 2).

1. Let C ∈ Rs×t, then the KP satisfies the associative law,

(A⊗B)⊗ C = A⊗ (B ⊗ C),

and therefore we do not use brackets in (50). The matrix

A⊗B ⊗ C := (A⊗B)⊗ C has (mhs) rows and (ngt) columns.

2. Let C ∈ Rn×r and D ∈ Rg×s, then the standard

matrix-matrix product in the Kronecker format takes the form

(A⊗B)(C ⊗D) = (AC)⊗ (BD).

The corresponding extension to q-th order tensors is

(A1 ⊗ ...⊗Aq)(B1 ⊗ ...⊗Bq) = (A1B1)⊗ ...⊗ (AqBq).

In the case p > 2 we have similar KP operations.
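Properties 1 and 2 are easy to check numerically with np.kron; a tiny sketch (illustrative sizes only, not part of the lecture):

import numpy as np

rng = np.random.default_rng(0)
A, C = rng.standard_normal((3, 4)), rng.standard_normal((4, 5))
B, D = rng.standard_normal((2, 6)), rng.standard_normal((6, 3))
E = rng.standard_normal((2, 2))

# associativity: (A x B) x E = A x (B x E)
print(np.allclose(np.kron(np.kron(A, B), E), np.kron(A, np.kron(B, E))))

# mixed-product rule: (A x B)(C x D) = (AC) x (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))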


Properties of the Kronecker product B. Khoromskij, Leipzig 2005(L5) 116

3. We have the distributive law

(A + B) ⊗ (C + D) = A ⊗ C + A ⊗ D + B ⊗ C + B ⊗ D.

4. Rank relation: rank(A ⊗ B) = rank(A) rank(B).

Ex. 5.1. In general A ⊗ B ≠ B ⊗ A. What is the condition on A and B that provides A ⊗ B = B ⊗ A?

Invariance of some matrix properties:

(1) If A and B are diagonal then A ⊗ B is also diagonal, and conversely (if A ⊗ B ≠ 0).
(2) Upper/lower triangular matrices are preserved.
(3) Let A and B be Hermitian/normal matrices (A* = A resp. A*A = AA*). Then A ⊗ B is of the corresponding type.
(4) Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then det(A ⊗ B) = (det A)^m (det B)^n.

Kronecker product: matrix operations B. Khoromskij, Leipzig 2005(L5) 117

Thm. 5.2. Let A ∈ R^{n×n} and B ∈ R^{m×m} be invertible matrices. Then

(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.

Proof. Since det(A) ≠ 0, det(B) ≠ 0, property (4) above gives det(A ⊗ B) ≠ 0. Thus (A ⊗ B)^{−1} exists, and

(A^{−1} ⊗ B^{−1})(A ⊗ B) = (A^{−1}A) ⊗ (B^{−1}B) = I_{nm}.

Lem. 5.2. Let A ∈ R^{n×n} and B ∈ R^{m×m} be unitary matrices. Then A ⊗ B is a unitary matrix.

Proof. Since A* = A^{−1}, B* = B^{−1}, we have

(A ⊗ B)* = A* ⊗ B* = A^{−1} ⊗ B^{−1} = (A ⊗ B)^{−1}.


Kronecker product: matrix operations B. Khoromskij, Leipzig 2005(L5) 118

Define the commutator [A, B] := AB − BA.

Lem. 5.3. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then

[A ⊗ I_m, I_n ⊗ B] = 0 ∈ R^{nm×nm}.

Proof.

[A ⊗ I_m, I_n ⊗ B] = (A ⊗ I_m)(I_n ⊗ B) − (I_n ⊗ B)(A ⊗ I_m) = A ⊗ B − A ⊗ B = 0.

Rem. 5.1. Let A, B ∈ R^{n×n}, C, D ∈ R^{m×m} with [A, B] = 0, [C, D] = 0. Then

[A ⊗ C, B ⊗ D] = 0.

Proof. Apply the identity (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

Kronecker product: matrix operations B. Khoromskij, Leipzig 2005(L5) 119

Lem. 5.4. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then tr(A ⊗ B) = tr(A) tr(B).

Proof. Since diag(a_ii B) = a_ii diag(B), we have

tr(A ⊗ B) = ∑_{i=1}^{n} ∑_{j=1}^{m} a_ii b_jj = (∑_{i=1}^{n} a_ii)(∑_{j=1}^{m} b_jj).

Thm. 5.3. Let A, B, I ∈ R^{n×n}. Then

exp(A ⊗ I + I ⊗ B) = (exp A) ⊗ (exp B).

Proof. Since [A ⊗ I, I ⊗ B] = 0, we have

exp(A ⊗ I + I ⊗ B) = exp(A ⊗ I) exp(I ⊗ B).


Kronecker product: matrix operations B. Khoromskij, Leipzig 2005(L5) 120

Furthermore, since

exp(A ⊗ I) = ∑_{k=0}^{∞} (A ⊗ I)^k / k!,   exp(I ⊗ B) = ∑_{m=0}^{∞} (I ⊗ B)^m / m!,

the generic term in exp(A ⊗ I) exp(I ⊗ B) is given by

(1/k!)(1/m!) (A ⊗ I)^k (I ⊗ B)^m.

Using

(A ⊗ I)^k (I ⊗ B)^m = (A^k ⊗ I^k)(I^m ⊗ B^m) = (A^k ⊗ I)(I ⊗ B^m) ≡ A^k ⊗ B^m,

we finally arrive at

(1/k!)(1/m!) (A ⊗ I)^k (I ⊗ B)^m = ((1/k!) A^k) ⊗ ((1/m!) B^m).
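Thm. 5.3 can be checked numerically with scipy.linalg.expm; a small sketch (random matrices, the sizes are arbitrary assumptions):

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

lhs = expm(np.kron(A, np.eye(m)) + np.kron(np.eye(n), B))
rhs = np.kron(expm(A), expm(B))
print(np.allclose(lhs, rhs))      # True: exp(A x I + I x B) = exp(A) x exp(B)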

Kronecker product: matrix operations B. Khoromskij, Leipzig 2005(L5) 121

Thm. 5.3 can be extended to the case of a many-term sum:

exp(A_1⊗I⊗...⊗I + I⊗A_2⊗...⊗I + ... + I⊗...⊗I⊗A_q) = (e^{A_1}) ⊗ ... ⊗ (e^{A_q}).

Rem. 5.2. Similar properties can be shown for other analytic functions, e.g.,

sin(I_n ⊗ A) = I_n ⊗ sin(A),
sin(A ⊗ I_m + I_n ⊗ B) = sin(A) ⊗ cos(B) + cos(A) ⊗ sin(B),
sin(A ⊗ I_m + I_n ⊗ B) = sin(A) ⊗ sin(B + (b − a)I)/sin(b − a) + sin(A + (a − b)I) ⊗ sin(B)/sin(a − b)

for all values a, b such that sin(a − b) ≠ 0. Analogously for the function cos.

Other simple properties:

(A ⊗ B)^T = A^T ⊗ B^T,   (A ⊗ B)* = A* ⊗ B*.


Eigenvalue problem B. Khoromskij, Leipzig 2005(L5) 122

Thm. 5.4. Let A ∈ R^{m×m} and B ∈ R^{n×n} have the eigen-data λ_1, ..., λ_m, u_1, ..., u_m and µ_1, ..., µ_n, v_1, ..., v_n, respectively. Then A ⊗ B has the eigenvalues λ_j µ_k with the corresponding eigenvectors u_j ⊗ v_k, 1 ≤ j ≤ m, 1 ≤ k ≤ n.

Thm. 5.5. Under the conditions of Thm. 5.4 the eigenvalues/eigenvectors of A ⊗ I_n + I_m ⊗ B are given by λ_j + µ_k and u_j ⊗ v_k, respectively.

Proof. Due to Thm. 5.4 we have

(A ⊗ I_n + I_m ⊗ B)(u_j ⊗ v_k) = (A ⊗ I_n)(u_j ⊗ v_k) + (I_m ⊗ B)(u_j ⊗ v_k)
  = (A u_j) ⊗ (I_n v_k) + (I_m u_j) ⊗ (B v_k)
  = (λ_j u_j) ⊗ v_k + u_j ⊗ (µ_k v_k)
  = (λ_j + µ_k)(u_j ⊗ v_k).
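Thm. 5.4 and 5.5 are likewise easy to verify; the sketch below uses symmetric matrices so that both spectra are real and can simply be sorted (an assumption made only to ease the comparison):

import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.standard_normal((m, m)); A = (A + A.T) / 2     # symmetric -> real spectrum
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

lam = np.linalg.eigvalsh(A)
mu = np.linalg.eigvalsh(B)

# Thm. 5.4 / 5.5: spectra of A x B and A x I_n + I_m x B
prod_spec = np.sort(np.linalg.eigvalsh(np.kron(A, B)))
sum_spec  = np.sort(np.linalg.eigvalsh(np.kron(A, np.eye(n)) + np.kron(np.eye(m), B)))
print(np.allclose(prod_spec, np.sort(np.outer(lam, mu).ravel())))      # {lambda_j * mu_k}
print(np.allclose(sum_spec,  np.sort(np.add.outer(lam, mu).ravel())))  # {lambda_j + mu_k}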

Lyapunov/Silvester equations B. Khoromskij, Leipzig 2005(L5) 123

For a matrix A ∈ R^{m×n} we use the vector representation A → vec(A) ∈ R^{mn}, where vec(A) is an mn × 1 vector obtained by “stacking” A's columns (the FORTRAN-style ordering),

vec(A) := [a_{11}, ..., a_{m1}, a_{12}, ..., a_{mn}]^T.

In this way, vec(A) is a rearranged version of A. For example, we have the relation

vec(AYB) = (B^T ⊗ A) vec(Y).

The matrix Sylvester equation for X ∈ R^{m×n},

A X + X B^T = G ∈ R^{m×n},   (52)

with A ∈ R^{m×m}, B ∈ R^{n×n}, can be written in the vector form

(I_n ⊗ A + B ⊗ I_m) vec(X) = vec(G).


Lyapunov/Silvester equations B. Khoromskij, Leipzig 2005(L5) 124

Now the solvability conditions and certain solution methods

can be derived (cf. the results for eigenvalue problems).

Equation (52) is uniquely solvable if λ_j(A) + µ_k(B) ≠ 0 for all j, k.

Moreover, since In ⊗A and B ⊗ Im commute, we can apply all

methods proposed below to represent the inverse

(In ⊗A + B ⊗ Im)−1.

In particular, if A and B correspond to the discrete elliptic

operators in Rd with separable coefficients, we obtain the

low-rank tensor-product decomposition to the Sylvester

solution operator (cf. Lect. 7).

In the case A = B we arrive at the Lyapunov equation.
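A minimal sketch of the vectorised form of (52) with column-wise (FORTRAN-style) stacking, i.e. order="F" in NumPy (sizes and names are illustrative assumptions; for large m, n one would of course not form the Kronecker matrix explicitly):

import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
G = rng.standard_normal((m, n))

# (I_n x A + B x I_m) vec(X) = vec(G)
K = np.kron(np.eye(n), A) + np.kron(B, np.eye(m))
x = np.linalg.solve(K, G.reshape(-1, order="F"))
X = x.reshape((m, n), order="F")
print(np.allclose(A @ X + X @ B.T, G))        # X solves A X + X B^T = G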

Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L5) 125

Def. 5.2. Define the Hadamard product C = A ⊙ B = [c_{i_1...i_q}]_{(i_1...i_q)∈I^d} of two tensors A, B ∈ R^{I^d} by the entry-wise multiplication

c_{i_1...i_q} = a_{i_1...i_q} · b_{i_1...i_q}.

The following lemma indicates a simple (but important) property of the Hadamard product.

Lem. 5.5. Let both A and B be represented in the form (49) with Kronecker ranks r_A, r_B and with V^ℓ_k substituted by A^ℓ_k ∈ R^{I_ℓ} and B^ℓ_k ∈ R^{I_ℓ}, respectively. Then A ⊙ B is a tensor with Kronecker rank r = r_A r_B given by

A ⊙ B = ∑_{k=1}^{r_A} ∑_{m=1}^{r_B} c_k c_m (A^1_k ⊙ B^1_m) ⊗ ... ⊗ (A^q_k ⊙ B^q_m).


Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L5) 126

Proof. It is easy to check that

(A¹ ⊗ B¹) ⊙ (A² ⊗ B²) = (A¹ ⊙ A²) ⊗ (B¹ ⊙ B²),

and similarly for q-term products. Applying the above relation, we obtain

A ⊙ B = ( ∑_{k=1}^{r_A} c_k ⊗_{ℓ=1}^{q} A^ℓ_k ) ⊙ ( ∑_{m=1}^{r_B} c_m ⊗_{ℓ=1}^{q} B^ℓ_m )
      = ∑_{k=1}^{r_A} ∑_{m=1}^{r_B} c_k c_m ( ⊗_{ℓ=1}^{q} A^ℓ_k ) ⊙ ( ⊗_{ℓ=1}^{q} B^ℓ_m ),

and the assertion follows.
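For q = 2 and p = 1, Lem. 5.5 can be checked directly: an outer-product sum of rank r_A multiplied entrywise by one of rank r_B gives at most rank r_A r_B with factors A^ℓ_k ⊙ B^ℓ_m. A small sketch (illustrative assumptions only):

import numpy as np

rng = np.random.default_rng(4)
n, rA, rB = 5, 2, 3
A1, A2 = rng.standard_normal((rA, n)), rng.standard_normal((rA, n))
B1, B2 = rng.standard_normal((rB, n)), rng.standard_normal((rB, n))

# q = 2, p = 1: A = sum_k A1[k] x A2[k] is the n x n matrix sum_k outer(A1[k], A2[k])
A = sum(np.outer(A1[k], A2[k]) for k in range(rA))
B = sum(np.outer(B1[m], B2[m]) for m in range(rB))

# Lem. 5.5: A o B = sum_{k,m} (A1[k] o B1[m]) x (A2[k] o B2[m]), rank <= rA*rB
C = sum(np.outer(A1[k] * B1[m], A2[k] * B2[m]) for k in range(rA) for m in range(rB))
print(np.allclose(A * B, C))                   # True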

Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L5) 127

Given tensors U ⊗ Y ∈ R^{I×J} with U ∈ R^I, Y ∈ R^J, and B ∈ R^{I×L}, let T : R^L → R^J be the linear operator (tensor) that maps tensors defined on the index set L into those defined on J.

Def. 5.3. The Hadamard “scalar” product [D, C]_I ∈ R^K of two tensors D := [D_{i,k}] ∈ R^{I×K} and C := [C_{i,k}] ∈ R^{I×K} with K ∈ {I, J, L} is defined by

[D, C]_I := ∑_{i∈I} [D_{i,K}] ⊙ [C_{i,K}],

where ⊙ denotes the Hadamard product on the index set K and [D_{i,K}] := [D_{i,k}]_{k∈K}.

Lem. 5.6. Let U, Y, B and T be given as above. Then, with K = J, the following identity is valid:

[U ⊗ Y, T · B]_I = Y ⊙ (T · [U, B]_I) ∈ R^J.   (53)


Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L5) 128

Proof. By the definition of the Hadamard scalar product we have

[U ⊗ Y, T · B]_I = ∑_{i∈I} [U ⊗ Y]_{i,J} ⊙ [T · B]_{i,J}
  = ∑_{i∈I} [ [U]_i · Y ]_J ⊙ [T · B]_{i,J}
  = Y ⊙ ( ∑_{i∈I} [U]_i [T · B]_{i,J} )
  = Y ⊙ ( T · ∑_{i∈I} [U]_i [B]_{i,L} ),

and the assertion follows.

Identity (53) is of great importance in the forthcoming applications, since on the right-hand side the operator T is removed from the scalar product and thus is applied only once.

Complexity of the HKT -matrix arithmetics B. Khoromskij, Leipzig 2005(L5) 129

Complexity issues

Let V^ℓ_k ∈ M_{H,k}(T_{I×I}, P) in (50) and let N = n^d.

• Data compression. The storage for A is only O(rdn) = O(rdN^{1/d}) with r = O(log^α N), α > 0. Hence we enjoy sub-linear complexity.

• Matrix-by-vector complexity of Ax, x ∈ C^N. For general x one has the linear cost O(rdkN log n). If x = x^1 × ... × x^d, x^i ∈ C^n, we again arrive at the sub-linear complexity O(rdkn log n) = O(rdkN^{1/d} log n).

• Matrix-by-matrix complexity of AB and A ⊙ B. The H-matrix structure of the Kronecker factors leads to O(r²dn log^q n) = O(r²dN^{1/d} log^q n) operations instead of O(N³).


How to construct a Kronecker product ? B. Khoromskij, Leipzig 2005(L5) 130

1. Singular-value decomposition (SVD) and ACA methods in the case of two-fold decompositions (q = 2).

2. Analytic approximation to the function-generated q-th order tensors (q ≥ 2), see Lect. 6.

Def. 5.4. Given the multi-variate function g : R^d → R with d = qp, p, q ∈ N, q ≥ 2, defined in the hypercube Ω = {(ζ^1, ..., ζ^q) ∈ R^d : ‖ζ^ℓ‖_∞ ≤ L, ℓ = 1, ..., q} ⊂ R^d, L > 0, where ‖·‖_∞ means the ∞-norm of ζ^ℓ ∈ R^p. On the index set I^d, we introduce the function-generated q-th order tensor

A ≡ A(g) := [a_{i_1...i_q}] ∈ R^{I^d}  with  a_{i_1...i_q} := g(ζ^1_{i_1}, ..., ζ^q_{i_q}).   (54)

3. Algebraic recompression methods: iterated SVD/ACA, iterated rank-r approximation to high order tensors (in general, the convergence theory is still an open question).

How to construct a Kronecker product ? B. Khoromskij, Leipzig 2005(L5) 131

The incremental rank-one approximation algorithm:

(a) fit the original tensor A by a rank-one tensor A_1;
(b) subtract A_1 from the original tensor A;
(c) approximate the residue A − A_1 by another rank-one tensor, and so on.

In each step of the algorithm one solves the minimisation problem: find V^ℓ ∈ R^{n^p} such that

½ ‖A − V^1 ⊗ · · · ⊗ V^q‖²_F → min.

It can be solved by the generalised Rayleigh-Newton iteration (a simple alternating sketch is given below).
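A minimal sketch of the incremental algorithm for a third-order tensor, with the rank-one fit done by a few ALS (alternating least-squares) sweeps instead of the Rayleigh-Newton iteration (all names, the test tensor and the iteration counts are my own assumptions):

import numpy as np

def rank_one_fit(A, rng, iters=50):
    """ALS fit of a rank-one tensor v1 x v2 x v3 to a 3rd-order tensor A
    (a simple substitute for the Rayleigh-Newton iteration mentioned above)."""
    v = [rng.standard_normal(s) for s in A.shape]
    for _ in range(iters):
        v[0] = np.einsum("ijk,j,k->i", A, v[1], v[2]) / ((v[1] @ v[1]) * (v[2] @ v[2]))
        v[1] = np.einsum("ijk,i,k->j", A, v[0], v[2]) / ((v[0] @ v[0]) * (v[2] @ v[2]))
        v[2] = np.einsum("ijk,i,j->k", A, v[0], v[1]) / ((v[0] @ v[0]) * (v[1] @ v[1]))
    return v

def greedy_kronecker(A, r, seed=0):
    """Incremental algorithm: fit a rank-one term, subtract, repeat r times."""
    rng = np.random.default_rng(seed)
    R = A.copy()
    for _ in range(r):
        v = rank_one_fit(R, rng)
        R = R - np.einsum("i,j,k->ijk", *v)
    return R

n = 10
x = np.linspace(1.0, 2.0, n)
A = 1.0 / (x[:, None, None] + x[None, :, None] + x[None, None, :])   # smooth generating function
for r in (1, 2, 4):
    print(r, np.linalg.norm(greedy_kronecker(A, r)) / np.linalg.norm(A))   # residual shrinks with r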

Def. 5.5. We say that a tensor A is orthogonally decomposable if it can be written as the sum (49) of r rank-one tensors such that for ℓ = 1, ..., q,

(V^ℓ_k, V^ℓ_{k′}) = 0  for k ≠ k′  (k, k′ = 1, ..., r).

Thm. 5.6. (Zhang, Golub) If a tensor of order q ≥ 3 is orthogonally decomposable, then this decomposition is unique, and the incremental rank-one approximation algorithm correctly computes it.


Some heuristic algorithms B. Khoromskij, Leipzig 2005(L5) 132

Given a q-th order tensor A having the Kronecker rank m, one

can try to find the best approximation of A by a tensor of

rank r < m. This can be reduced to solving the minimisation

problem: Find V k ∈ Rnp

s.t.

12||A−

r∑k=1

V 1k ⊗ · · · ⊗ V q

k ||2F → min .

It can be realized by using, say, the Newton iteration applied

to the corresponding Lagrange equation. Under certain

simplifications, the constraint minimisation algorithm can be

implemented in O(m2np + (rmq)3) operations.

There is not too much converg. theory behind this algorithm,

moreover the solution is not unique (cf. Prop. 4.7).

However, in most practically interesting cases this algorithm

does a job.

Some conclusions B. Khoromskij, Leipzig 2005(L5) 133

Summarise:

Basic linear algebra can be performed using one-dimensional

operations, thus avoiding the exponential scaling in the

dimension d.

Bottleneck:

Lack of tractable algebraic methods for the robust multi-fold

Kronecker decomposition of high order tensors (for d ≥ 3) as

well as for the HKT-recompression in matrix operations.

However, there are quite satisfactory heuristic algorithms.

Observation:

Analytic approximation methods are of principal importance.

Classical example: an approximation by Gaussian sums.

Recent proposals: Sinc meth., approximation by exponential

sums, wavelet recompression, “approximate approximation”.


Literature to Lecture 5 B. Khoromskij, Leipzig 2005(L5) 134

1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class

of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.

Preprint 35, MPI MIS, Leipzig 2003 (JNA, to appear).

3. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic

Boltzmann equation. Preprint 4, MPI MIS, Leipzig 2005.

4. C. Van Loan: The ubiquitous Kronecker product. J. of Comp. and Applied Math. 123 (2000) 85-100.

5. T. Zhang and G.H. Golub: Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl.

23 (2001), 534-550.

URL: http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor5.ps

Literature to Lecture 5 B. Khoromskij, Leipzig 2005(L5) 135

Everything is more simple than one thinks, but at the same time more complex than one can understand.
J. W. von Goethe (1749-1832)

An Introduction to Structured Tensor-Product

Representation of Discrete Nonlocal Operators

Part II: Approximation of Operators and Related Matrices

Boris N. Khoromskij

University of Leipzig/MPI MIS, summer 2005

http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij


Lect. 6. HKT representation to integral operators B. Khoromskij, Leipzig 2005 136

In this lecture, we collect some known algebraic properties of

q-th order (q > 2) decomposed tensors, especially, in

comparison with the case q = 2, and then discuss the analytic

approximation methods:

(i) properties of multi-way decompositions,

(ii) separation methods for function-generated tensors,

(iii) approximation to the Galerkin matrices,

(iv) examples of integral operators (IOs) and numerics.

Analytic approximation methods may provide the decomposed

tensors with relatively high Kronecker rank, which can be

then reduced by algebraic “recompression algorithms”.

We stress that, in spite of existing implementations (which are usually not in the public domain), robust algebraic methods of low-rank tensor decomposition still require further development.

Why is multi-factor analysis difficult? B. Khoromskij, Leipzig 2005(L6) 137

Def. 6.1. The minimal number r in the representation

A = ∑_{k=1}^{r} V^1_k ⊗ · · · ⊗ V^q_k,  V^ℓ_k ∈ R^{n^p},   (55)

is called the tensor rank of the q-th order tensor A. We suppose that V^ℓ_k ∈ R^n (i.e., p = 1).

Finding the tensor rank r and the corresponding decomposition(s) for a high-dimensional q-th order tensor is the main issue of multi-factor analysis!

For q = 2, Def. 6.1 coincides with the standard definition of rank(A), which can be calculated by a finite algorithm. The corresponding tensor decomposition can be computed by the SVD in O(n³) operations. Under the orthogonality requirement this decomposition is unique.


Little analogy between the cases q ≥ 3 and q = 2 B. Khoromskij, Leipzig 2005(L6) 138

If q > 2, the situation changes dramatically.

I. rank(A) depends on the number field (say, R or C).

II. We do not know any finite algorithm to compute r = rank(A), except for the simple bounds 0 ≤ rank(A) ≤ n^{q−1}.

III. For fixed q and n we do not know the exact value of max rank(A). J. Kruskal '75 proved that:
– for any 2 × 2 × 2 tensor we have max rank(A) = 3 < 4;
– for 3 × 3 × 3 tensors there holds max rank(A) = 5 < 9.

IV. “Probabilistic” properties of rank(A): in the set of 2 × 2 × 2 tensors there are about 79% rank-2 tensors and 21% rank-3 tensors, while rank-1 tensors appear with probability 0. Clearly, for n × n matrices we have P{rank(A) = n} = 1.

Little analogy between the cases q ≥ 3 and q = 2 B. Khoromskij, Leipzig 2005(L6) 139

V. However, it is possible to prove a very important uniqueness property within equivalence classes.

Two representations like (55) are considered equivalent (essential equivalence) if either

(a) they differ in the order of the terms, or
(b) for some set of parameters a^ℓ_k ∈ R such that ∏_{ℓ=1}^{q} a^ℓ_k = 1 (k = 1, ..., r), there is a transform V^ℓ_k → a^ℓ_k V^ℓ_k.

A simplified version of the general uniqueness result is the following (all factors have the same full rank r).

Prop. 1. (J. Kruskal, 1977) Let, for each ℓ = 1, ..., q, the vectors V^ℓ_k (k = 1, ..., r) with r = rank(A) be linearly independent. If

(q − 2) r ≥ q − 1,

then the decomposition (55) is uniquely determined up to the equivalence (a)-(b) above.


Function-generated tensors B. Khoromskij, Leipzig 2005(L6) 140

Def. 5.4. (cf. Lect. 5) Given the multi-variate function g : Ω → R with d = qp, p, q ∈ N, q ≥ 2, where Ω = {(ζ^1, ..., ζ^q) ∈ R^d : ‖ζ^ℓ‖_∞ ≤ L, ℓ = 1, ..., q} with L > 0, ζ^ℓ ∈ R^p. Let {ζ^1_{i_1}, ..., ζ^q_{i_q}} be the set of collocation points living on the tensor-product lattice in Ω and indexed by I^d. We recall the definition of the function-generated q-th order tensor:

A ≡ A(g) := [a_{i_1...i_q}] ∈ R^{I^d}  with  a_{i_1...i_q} := g(ζ^1_{i_1}, ..., ζ^q_{i_q}).   (56)

First, we introduce a low Kronecker rank approximation to the q-th order tensor A = A(g) ∈ R^{I^d} with |I^d| = n^{qp},

A_r := A(g_r),   g_r := ∑_{k=1}^{r} Φ^1_k(ζ^1) · · · Φ^q_k(ζ^q) ≈ g,

where g_r is a separable approximation to g.

Function-generated tensors B. Khoromskij, Leipzig 2005(L6) 141

We assume that the error g − g_r can be estimated in the L^∞(Ω)- or in the L²(Ω)-norm, ‖u‖_{L²} := √(∫_Ω u²(ζ) dζ).

In particular, this might correspond to the Nyström discretisation of IOs in R^d (with q = d, p = 2),

(Au)(x) := ∫_Ω g(x, y) u(y) dy,  x, y ∈ Ω ⊂ R^d.

In the latter case we have

g_r := ∑_{k=1}^{r} Φ^1_k(x_1, y_1) · · · Φ^d_k(x_d, y_d).

Furthermore, we denote I^d = I × J, where I, J are associated with x ∈ R^I and y ∈ R^J.


Function-generated tensors B. Khoromskij, Leipzig 2005(L6) 142

For the error analysis of the Kronecker approximand we make use of the Euclidean (Frobenius) and the ‖·‖_∞ tensor norms,

‖x‖_2 := √(∑_{i∈I} x_i²),   ‖x‖_∞ := max_{i∈I} |x_i|,   x ∈ R^I.

Let g − g_r be smooth enough. Then for a quasi-uniform distribution of collocation points we have

‖A(g) − A(g_r)‖_2 ≤ C (N_I^{1/2} N_J^{1/2} / L^{q/2}) ‖g − g_r‖_{L²}.   (57)

The next lemma describes the relations between the approximation error ‖g − g_r‖ evaluated in different norms and the corresponding error ‖A(g) − A(g_r)‖ of the Kronecker product representation.

Function-generated tensors B. Khoromskij, Leipzig 2005(L6) 143

Lem. 6.1. We have ‖A − A_r‖_∞ ≤ ‖g − g_r‖_{L^∞(Ω)}. For any x ∈ R^I, y ∈ R^J, the consistency error A − A_r can be bounded by

|⟨(A − A_r)x, y⟩| ≤ ‖g − g_r‖_{L^∞(Ω)} ‖x‖_1 ‖y‖_1 ≤ N_I^{1/2} N_J^{1/2} ‖g − g_r‖_{L^∞(Ω)} ‖x‖_2 ‖y‖_2,   (58)

|⟨(A − A_r)x, y⟩| ≤ C (N_I^{1/2} N_J^{1/2} / L^{q/2}) ‖g − g_r‖_{L²(Ω)} ‖x‖_2 ‖y‖_2.   (59)

Proof. The first assertion follows by the construction of A_r:

‖A − A_r‖_∞ = max_{(i_1,...,i_q)∈I^d} | g(ζ^1_{i_1}, ..., ζ^q_{i_q}) − ∑_{k=1}^{r} Φ^1_k(ζ^1_{i_1}) · · · Φ^q_k(ζ^q_{i_q}) | ≤ ‖g − g_r‖_{L^∞(Ω)}.

Now we readily obtain

|⟨(A − A_r)x, y⟩| ≤ ‖g − g_r‖_{L^∞(Ω)} ∑_{i∈I, j∈J} |x_i y_j| ≤ ‖g − g_r‖_{L^∞(Ω)} ‖x‖_1 ‖y‖_1,


Function-generated tensors B. Khoromskij, Leipzig 2005(L6) 144

which proves (58) since ‖x‖1 ≤ N1/2I ‖x‖2 and ‖y‖1 ≤ N

1/2J ‖y‖2.

Now, applying the Cauchy-Schwarz inequality we have

|〈(A−Ar)x, y〉| ≤∑

i∈I, j∈J|(aij − ar,ij)xiyj|

≤ ‖A−Ar‖2√ ∑

i∈I, j∈Jx2i y

2j ≤ ‖A−Ar‖2‖x‖2‖y‖2.

Then (59) follows from the first norm equivalence in (57).

In many applications the generating function g(ζ) depends

only on a few scalar variables which are functionals of ζ.

Ex. 6.1. Consider a function depending only on one scalar

parameter,

g(ζ) = G(ρ(ζ)) where G : [0, a] → R

with ρ : [−L, L]p → [0, a], a > 0.

Function-generated tensors B. Khoromskij, Leipzig 2005(L6) 145

In the case ρ(ζ) = ‖ζ‖_2, the separable approximation g_r(ζ) can be derived from an approximation G_r to the univariate function G(ρ), e.g. by exponential sums. It is easy to see that the approximation error g − g_r arising in Lem. 6.1 can be estimated via the corresponding error G − G_r.

Lem. 6.2. The following estimates are valid:

‖g − g_r‖_{L^∞} = ‖G − G_r‖_{L^∞},
‖g − g_r‖_{L²(Ω)} ≤ C L^{(q−1)/2} ‖G − G_r‖_{L²[0,a]}.

Proof. The first statement is trivial. The second bound is obtained by passing to integration in the q-dimensional spherical coordinates.


Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 146

Given the integral operator A : L²(Ω) → L²(Ω) in R^d, d ≥ 2,

(Au)(x) := ∫_Ω g(x, y) u(y) dy,  x, y ∈ Ω := [0, 1]^d,

with a shift-invariant kernel function g(x, y) = g(|x − y|). A principal ingredient in the HKT representation of Galerkin discretisations in R^d is a separable approximation of the multi-variate function representing the kernel of the IO.

Clearly, g(x, y) can be represented in the form

g(x, y) = G(ζ_1, ..., ζ_d) ≡ g(√(ζ_1² + ... + ζ_d²)),

where ζ_ℓ = |x_ℓ − y_ℓ| ∈ [0, 1], ℓ = 1, ..., d. With fixed 0 ≤ α_0 < 1, we introduce the auxiliary function

F(ζ_1, ..., ζ_d) := (ζ_1 · · · ζ_{d−1})^{α_0} G(ζ_1, ..., ζ_d).   (60)

Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 147

We suppose that the multi-variate function F : R^d → R can be well approximated by a separable expansion

F_r(ζ_1, ..., ζ_d) := ∑_{k=1}^{r} Φ^1_k(ζ_1) · · · Φ^d_k(ζ_d) ≈ F,   (61)

where the set of functions {Φ^ℓ_k : ℓ = 1, ..., d, k = 1, ..., r} with Φ^ℓ_k : [0, 1] → R is fixed or can be chosen adaptively.

We apply a Galerkin scheme with tensor-product test functions

φ_i(x_1, ..., x_d) = φ^{i_1}_1(x_1) · · · φ^{i_d}_d(x_d),  i = (i_1, ..., i_d),  i_ℓ ∈ I_n := {1, ..., n}.

Now we approximate the Galerkin stiffness matrix

A = [(Aφ_i, φ_j)_{L²}]_{i,j∈I^d_n} ∈ R^{N×N},  N = n^d,

by a matrix A^{(r)} of the form A^{(r)} = ∑_{k=1}^{r} V^1_k ⊗ · · · ⊗ V^d_k ≈ A.


Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 148

Here the V^ℓ_k, ℓ = 1, ..., d, are n × n matrices given by

V^ℓ_k = [ ∫_0^1 ∫_0^1 |x_ℓ − y_ℓ|^{−α_ℓ} Φ^ℓ_k(|x_ℓ − y_ℓ|) φ^{i_ℓ}_ℓ(x_ℓ) φ^{j_ℓ}_ℓ(y_ℓ) dx_ℓ dy_ℓ ]_{i_ℓ, j_ℓ = 1}^{n}   (62)

with α_ℓ = α_0 ≥ 0, ℓ = 1, ..., d − 1, and α_d = 0 (see (60)).

Def. 6.2. A function g(x, y), x, y ∈ R^d, is called asymptotically smooth if there exist γ ≥ 1 and p ∈ R such that for all x, y ∈ R^d, x ≠ y, and all multi-indices α, β with |α| + |β| > 0, |α| = α_1 + ... + α_d, we have

|∂^α_x ∂^β_y g(x, y)| ≤ C α! β! γ^{|α|+|β|} |x − y|^{−p−|α|−|β|}.

The next lemma shows that the error ‖A − A^{(r)}‖ with respect to the usual norms is directly related to the accuracy ‖F − F_r‖_∞ of the separable approximation (61) of F.

Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 149

Lem. 6.3. Let (61) be valid. Then for any i, j ∈ I^d_n we have, for the components of A − A^{(r)},

|a_{i,j} − a^r_{i,j}| ≤ ‖F − F_r‖_∞ ∏_{ℓ=1}^{d} ‖ |x_ℓ − y_ℓ|^{−α_ℓ} φ^{i_ℓ}_ℓ(x_ℓ) φ^{j_ℓ}_ℓ(y_ℓ) ‖_{L¹([0,1]×[0,1])}.

Let us further assume that the function

g_{ℓ,k}(u, v) := |u − v|^{−α_ℓ} Φ^ℓ_k(|u − v|),  (u, v) ∈ [0, 1]²,

is asymptotically smooth for ℓ = 1, ..., d, k = 1, ..., r. Then, for low-order piecewise polynomial Galerkin basis functions, V^ℓ_k can be approximated by a rank-m H-matrix Ṽ^ℓ_k with an error ‖V^ℓ_k − Ṽ^ℓ_k‖ ≤ C η^m for some η < 1.


Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 150

Proof. By construction we obtain

|a_{i,j} − a^r_{i,j}| = | ∫_{Ω×Ω} (F − F_r) ( ∏_{ℓ=1}^{d} |x_ℓ − y_ℓ|^{−α_ℓ} ) φ_i(x) φ_j(y) dx dy |
  ≤ ‖F − F_r‖_∞ ‖ ( ∏_{ℓ=1}^{d} |x_ℓ − y_ℓ|^{−α_ℓ} ) φ_i(x) φ_j(y) ‖_{L¹(Ω×Ω)}
  = ‖F − F_r‖_∞ ∏_{ℓ=1}^{d} ‖ |x_ℓ − y_ℓ|^{−α_ℓ} φ^{i_ℓ}_ℓ(x_ℓ) φ^{j_ℓ}_ℓ(y_ℓ) ‖_{L¹([0,1]×[0,1])},

where the last equality follows by inserting the tensor-product basis and separating the 2d-dimensional integral.

Second, V^ℓ_k given by (62) appears to be the exact Galerkin stiffness matrix for an IO with the kernel function g_{ℓ,k}(u, v), (u, v) ∈ [0, 1] × [0, 1]. Since g_{ℓ,k}(u, v) is supposed to be asymptotically smooth, the result follows by the conventional theory of the H-matrix approximation.

Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 151

Note that due to Lem. 6.3, ‖A − A^{(r)}‖ can easily be estimated in the Frobenius, l² or l^∞ matrix norms. In particular,

‖A − A^{(r)}‖_∞ ≤ n^d ‖F − F_r‖_∞ ∏_{ℓ=1}^{d} ‖ |x_ℓ − y_ℓ|^{−α_ℓ} φ^{i_ℓ}_ℓ(x_ℓ) φ^{j_ℓ}_ℓ(y_ℓ) ‖_{L¹([0,1]×[0,1])}.

Several methods of separable approximation to multi-variate functions are presented in Part I.

In general, the approximability property (61) can be validated by using the tensor-product Sinc interpolation, where the factor Φ^ℓ_k(|u − v|) can be proved to be asymptotically smooth. For the class of kernel functions approximated by quadrature-type methods, the factor Φ^ℓ_k(|u − v|) even appears to be globally smooth (indeed, it is an entire function).


Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 152

Lem. 6.4. For both the tensor-product Sinc interpolation and the quadrature methods, the function g_{ℓ,k}(u, v) (cf. Lem. 6.3) is asymptotically smooth (AS).

Proof. In the first case we have

g_{ℓ,k}(u, v) = |u − v|^{−α_ℓ} S_{k,h}(φ^{−1}(|u − v|)),  u, v ∈ [0, 1],

where S_{k,h} refers to the k-th Sinc function with step-size h, and φ^{−1}(x) = arsinh(arcosh(1/x)). Since S_{k,h}(x), x ∈ R, is holomorphic in x, and since the factor |u − v|^{−α_ℓ} is AS, we conclude that g_{ℓ,k}(u, v) has the same property.

Applying the quadrature method, we obtain the entire function

Φ^ℓ_k(|u − v|) = exp(−t_k |u − v|²),  t_k > 0.

Then the previous argument completes the proof.

Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 153

Lem. 6.3 and 6.4 prove the existence of a low Kronecker rank HKT approximation for the class of multi-dimensional integral operators.

Given a tolerance ε > 0, in general we have the bound

r = O( [ log(1/h) · log(1/ε) · log(log(1/ε)) ]^{d−1} ),

where h = O(n^{−1}) is the mesh-size of the FE discretisation. In the case of translation-invariant kernels we obtain the dimensionally independent bound

r = O( log n · log(1/ε) · log(log(1/ε)) ),

see the examples below.


Main examples B. Khoromskij, Leipzig 2005(L6) 154

Toward a separable approximation to the multi-variate functions

1/(x_1 + ... + x_d)  and  1/√(x_1² + ... + x_d²)   (x_i > 0, i = 1, ..., d).

Ex. 6.1. In the first case, to apply the Sinc method, we make use of the Laplace integral transform

1/ρ = ∫_{R_+} e^{−ρt} dt   (ρ > 0)   (63)

with the integrand f(t) = e^{−ρt}, assuming that ρ ∈ [1, R], R > 1. In order to apply the improved error estimate, we make use of the substitutions t = log(1 + e^u) and u = sinh(w) to obtain

1/ρ = ∫_R f_2(w) dw   with   f_2(w) = (cosh(w)/(1 + e^{−sinh(w)})) e^{−ρ log(1 + e^{sinh(w)})}.   (64)

Main examples B. Khoromskij, Leipzig 2005(L6) 155

The decay of f_2 on the real axis is

f_2(w) ≈ ½ e^{w − (ρ/2) e^w}  as w → ∞;   f_2(w) ≈ ½ e^{|w| − ½ e^{|w|}}  as w → −∞,

corresponding to C = ½, b = min{1, ρ/2}, a = 1 in Thm. 2.6.

Lem. 6.5. (Hackbusch, BNK) If ρ ∈ [1, R], the choice δ = δ(R) = O(1/log R), a = 1, b = 1/2 in Thm. 2.6 (with the corresponding value of h) implies the uniform quadrature error estimate

| 1/ρ − I_M(f_2, h) | ≤ C e^{−π²M/(log(3R) log(π²M))}.   (65)


Main examples B. Khoromskij, Leipzig 2005(L6) 156

In the case of 1/ρ = 1/(x_1 + ... + x_d), the estimate (65) implies that an approximation of accuracy ε is obtainable with

M ≤ O( log(1/ε) · log R ),   (66)

provided that 1 ≤ x_1 + ... + x_d ≤ R, which can be achieved by a proper scaling. The numerical results even support the better estimate M ≤ O( log(1/ε) + log R ) (see Figs. 12, 13).

Figure 12: The absolute quadrature error for (64) with 1 ≤ ρ ≤ 10³, and with M = 16 (left), M = 32 (middle), M = 64 (right).

Main examples B. Khoromskij, Leipzig 2005(L6) 157

Figure 13: The absolute quadrature error for (64) with 1 ≤ ρ ≤ 18000 and with M = 16 (left), M = 32 (middle), M = 64 (right).

Lem. 6.5 also shows that the separation rank r = 2M + 1 depends only linear-logarithmically on both the tolerance

ε > 0 and the upper bound R of ρ = x1 + ... + xd. Hence, there

is no dependence on the dimension d.
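For illustration, the following small Python/NumPy sketch (added here, not part of the original lecture) evaluates the (2M+1)-point Sinc quadrature obtained from (64); the step size h = log(4πM)/M is the choice quoted later in Lem. 9.2 and is an assumption of this sketch, and the printed errors should roughly match the magnitudes in Fig. 12.

import numpy as np

def sinc_quadrature_inv(rho, M):
    # quadrature for 1/rho from (64): 1/rho = int_R cosh(w) F(sinh(w); rho) dw
    h = np.log(4 * np.pi * M) / M              # step size as in Lem. 9.2
    w = np.arange(-M, M + 1) * h
    u = np.sinh(w)
    log1pexp = np.logaddexp(0.0, u)            # log(1 + e^u), evaluated stably
    weight = np.exp(-np.logaddexp(0.0, -u))    # 1 / (1 + e^{-u})
    F = np.exp(-np.multiply.outer(rho, log1pexp)) * weight
    return h * (F * np.cosh(w)).sum(axis=-1)   # approximates 1/rho

rho = np.linspace(1.0, 1e3, 2000)
for M in (16, 32, 64):
    print(M, np.max(np.abs(1.0 / rho - sinc_quadrature_inv(rho, M))))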


Main examples B. Khoromskij, Leipzig 2005(L6) 158

Ex. 6.2. In the case of the Newton potential 1/√(x₁² + ... + x_d²), we make use of the Gauss integral

1/ρ = (2/√π) ∫_{R₊} e^{−ρ²t²} dt   (ρ ∈ [1, R]).   (67)

To obtain robustness in ρ, we rewrite the Gauss integral (67) using the substitutions t = log(1 + e^u) and u = sinh(w),

1/ρ = ∫_R f(w) dw   with   f(w) := cosh(w) F(sinh(w))   (68)

with

F(u) := (2/√π) e^{−ρ² log²(1+e^u)} / (1 + e^{−u}).

Main examples B. Khoromskij, Leipzig 2005(L6) 159

Lem. 6.6. Let δ < π/2, ρ ≥ 1. Then for the function f from (68) we have f ∈ H¹(D_δ). In addition, Thm. 2.6 is satisfied with a = 1.

The improved (2M + 1)-point quadrature with the choice δ(ρ) = π/(C + log ρ) allows the error bound

|1/ρ − I_M(f, h)| ≤ C₁ exp( −π²M / ((C + log ρ) log M) ).   (69)

Proof. It is easy to check that f is holomorphic in D_δ and N(f, D_δ) < ∞ uniformly in ρ (with the choice δ = δ(ρ)). Now we check the double-exponential decay of the integrand as |w| → ∞ and then apply Thm. 2.6, where

δ = δ(ρ) = π/(C + log ρ).


Main examples B. Khoromskij, Leipzig 2005(L6) 160

We apply (69) and obtain the bound (70),

M ≤ O( log(1/ε) · log R ).   (70)

Hence again there is no dependence on the dimension d. Numerical examples for this quadrature with values ρ ∈ [1, R], R ≤ 5000, are presented in Fig. 14.

Figure 14: The absolute quadrature error for M = 64 with R = 200 (left), R = 1000 (middle), R = 5000 (right).

Further examples B. Khoromskij, Leipzig 2005(L6) 161

Again, we observe almost linear error growth in ρ. Similar

results were obtained in the case R > 5000 manifesting a

rather stable behaviour of the quadrature error with respect

to R.

Ex. 6.3. log(x + y)

In boundary element methods (BEM), one is interested in a low separation rank representation of the kernel function s(x, y) = log(x + y), x ∈ [0, 1], y ∈ [h, 1], with some small mesh-size parameter h > 0. A representation like

1/(x + y) = Σ_{m=1}^{k} Φ_m(x) Ψ_m(y) + δ_k   with |δ_k| ≤ ε   (71)

can be constructed by means of the quadrature applied to the integral (64) with ρ = x + y and k = 2M + 1.


Further examples B. Khoromskij, Leipzig 2005(L6) 162

Let ψ_m be the anti-derivative of the function Ψ_m. Integration of (71) yields

log(x + y) = ∫_{1−x}^{y} dt/(x + t) = ∫_{1−x}^{y} ( Σ_{m=1}^{k} Φ_m(x) Ψ_m(t) + δ_k ) dt

 = Σ_{m=1}^{k} Φ_m(x) [ψ_m(y) − ψ_m(1 − x)] + S_k

 = Φ₀(x) + Σ_{m=1}^{k} Φ_m(x) ψ_m(y) + S_k

with Φ₀(x) = −Σ_{m=1}^{k} Φ_m(x) ψ_m(1 − x) and |S_k| = | ∫_{1−x}^{y} δ_k dt | ≤ ε.

The resulting representation of log(x + y) has the separation rank k + 1 and the same accuracy ε as (71).
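The separable expansion (71) underlying this construction can be generated directly from the quadrature of Ex. 6.1. A minimal sketch follows; the scaling to 1 ≤ x + y ≤ R and the step size are assumptions of this illustration, not prescriptions from the lecture.

import numpy as np

M = 32
h = np.log(4 * np.pi * M) / M                              # assumed step size (cf. Lem. 9.2)
k = np.arange(-M, M + 1)
u = np.sinh(k * h)
b = np.logaddexp(0.0, u)                                   # exponents b_m = log(1 + e^{u_m})
a = h * np.cosh(k * h) * np.exp(-np.logaddexp(0.0, -u))    # weights a_m

# separable representation (71): 1/(x+y) ~ sum_m a_m e^{-b_m x} e^{-b_m y},
# after a scaling such that 1 <= x + y <= R
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(1.0, 2.0, 200)
Phi = a * np.exp(-np.outer(x, b))                          # Phi_m(x) = a_m e^{-b_m x}
Psi = np.exp(-np.outer(y, b))                              # Psi_m(y) = e^{-b_m y}
err = np.abs(Phi @ Psi.T - 1.0 / (x[:, None] + y[None, :]))
print(err.max())                                           # |delta_k| in (71), cf. Fig. 12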

Further examples B. Khoromskij, Leipzig 2005(L6) 163

Ex. 6.4. Helmholtz kernel in R^d

Given κ ∈ R, define the Helmholtz kernel function

g(x, y) := cos(κ|x − y|)/|x − y| = ℜe ( e^{iκ|x−y|} / |x − y| )   for (x, y) ∈ [0, 1]^d × [0, 1]^d

in Cartesian coordinates x = (x₁, ..., x_d), y = (y₁, ..., y_d) ∈ R^d. The Sinc approximation can be applied in the case of a weakly admissible block (in the H-matrix techniques) w.r.t. the transformed variables ζ₁, ..., ζ_d. For (ζ₁, ..., ζ_d) ∈ [0, 1]^d, define

G(ζ₁, ..., ζ_d) := g(x, y),   ζ_ℓ = |x_ℓ − y_ℓ|, ℓ = 1, ..., d,

which implies

G(ζ₁, ..., ζ_d) = cos( κ √(ζ₁² + ... + ζ_d²) ) / √(ζ₁² + ... + ζ_d²).

We approximate the modified function

F(ζ₁, ..., ζ_d) := (ζ₁ · ... · ζ_{d−1})^{α₀} G(ζ₁, ..., ζ_d),   0 < α₀ < 1,


Further examples B. Khoromskij, Leipzig 2005(L6) 164

on the domain Ω₁ := [0, 1]^{d−1} × [h, 1], where h > 0 is a small (mesh) parameter.

Now we apply Thm. 4.5 with δ = 1/|log h| to construct the approximation G_M via the interpolation of F and obtain

|G(ζ) − G_M(ζ)| ≤ ∏_{ℓ=1}^{d−1} ζ_ℓ^{−α₀} · |E_M(F, h)(φ^{−1}(ζ))|   (72)

 ≤ C h^{α₀(1−d)} |log h| Λ_M^{d−1} N₀(F, D_δ) e^{−πM/(|log h| log M)}

with ζ ∈ (0, 1]^d.

For this example N₀(F, D_δ) = O(e^κ), while the Kronecker rank is given by r = (2M + 1)^{d−1}. Clearly, for a large κ, the bound (72) does not provide a satisfactory complexity.

Literature to Lecture 6 B. Khoromskij, Leipzig 2005(L6) 165

1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class

of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. D. Braess and W. Hackbusch: Approximation of 1/x by exponential sums in [1, ∞). To appear in IMA JNA.

3. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.

Preprint 35, MPI MIS, Leipzig 2003 (JNA, to appear).

4. J. B. Kruskal: Three-way arrays: Rank and uniqueness of trilinear decompositions. Linear Algebra

Appl., 18 (1977), 95-138.

http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor6.ps


Lect. 7. Structured Representation to Matrix-Valued Functions B. Khoromskij, Leipzig 2005 166

The matrix-valued functions (MVF) of the discrete (elliptic) operator L arise in a wide range of applications. Structured tensor-product representations are developed for several classes of MVFs:

F₁(L) := L^{−α}, α > 0,
F₂(L) := e^{−tL},
F₃,k(L) := cos(t√L) L^{−k}, k ∈ N,
F₄(L) := ∫₀^∞ e^{−tL*} G e^{−tL} dt,
F₅(L) := sign(L).

Both the discrete elliptic inverse L−1 and the matrix

exponential e−tL play an important role in numerical PDEs.

Usually MVFs appear to be fully populated, hence data-sparse

formats are needed for their efficient representation.

Representation of Operators B. Khoromskij, Leipzig 2005(L7) 167

There are different methods to represent MVFs (set L = A):

• In the case of diagonalisable matrices, i.e., A = T^{−1}DT with D = diag{d₁, ..., d_n} diagonal, one defines
  F(A) = T^{−1} F(D) T,   F(D) = diag{F(d₁), ..., F(d_n)}.

• Dunford-Cauchy integral for analytic functions
  F(A) = (1/2πi) ∫_Γ F(z)(zI − A)^{−1} dz,   Γ ⊂ C.

• Laplace-type transform
  F(A) = ∫_R f(t) e^{−tA} dt.

• Transforms via trigonometric kernels
  F(A) = ∫_R [a(t) cos(tA) + b(t) sin(tA)] dt.

• Polynomial expansions and/or nonlinear iterations.


Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 168

Ex. 7.1. The solution operator to the initial value parabolic problem

∂u/∂t + Lu(t) = 0,   u(0) = u₀ ∈ X,   (73)

is given by

T(t; L) = e^{−tL} = ∫_Γ e^{−zt}(zI − L)^{−1} dz,

where L is an elliptic operator (say, L = −Δ) in a Hilbert space X and u(t) is a vector-valued function u : R₊ → X. Given the initial vector u₀, the solution of the initial value problem can be represented by u(t) = T(t; L)u₀.

A simple example of a parabolic PDE is the 1D heat equation

∂u/∂t − ∂²u/∂x² = 0,   u : R₊ × [0, 1] → R,

with the corresponding boundary and initial conditions.

Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 169

Ex. 7.2. The initial-value problem for the second order differential equation with an operator coefficient,

u''(t) + Lu(t) = 0,   u(0) = u₀,   u'(0) = 0,

has the solution operator

C(t; L) := cos(t√L) = ∫_Γ cos(t√z)(zI − L)^{−1} dz

(the hyperbolic operator cosine family), so that u(t) = C(t; L)u₀. It represents the function-to-operator map cos(t√·) → C(t; L).

An example of a hyperbolic PDE is the classical wave equation

∂²u/∂t² − ∂²u/∂x² = 0

subject to the corresponding boundary and initial conditions.


Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 170

Ex. 7.3. For the boundary value problem

d²u/dx² − Lu = 0,   u(0) = 0,   u(1) = u₁,   (74)

in a Hilbert space X, the solution operator is the normalised hyperbolic operator sine family

E(x; L) := (sinh(√L))^{−1} sinh(x√L) = ∫_Γ ( sinh(x√z)/sinh(√z) ) (zI − L)^{−1} dz,

so that u(x) = E(x; L)u₁.

The simplest PDE of the type (74) is the Laplace equation in a cylindric domain:

d²u/dx² + d²u/dy² = 0,   x ∈ [0, 1], y ∈ [c, d],
u(0, y) = 0,   u(1, y) = u₁(y).

Rem. 7.1 Constructions 7.1-7.3 are useful to avoid time

stepping and hence allow parallel (in time) computations.

Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 171

Ex. 7.4. For the Sylvester matrix equation

AX + XB = G   (A, B, G ∈ R^{n×n} given)

the solution X ∈ R^{n×n} is given by the integral

X = F(A, B)G := ∫₀^∞ e^{−tA} G e^{−tB} dt,

supposing that A, B provide existence of this integral (cf. Lect. 5). The (nonlinear) Riccati matrix equation

AX + XA + XFX = G,   (75)

where A, F, G ∈ R^{n×n} are given and X ∈ R^{n×n} is the unknown matrix, can be solved by Newton's iteration. At each iteration step the Lyapunov equation (X_k → X)

(A − FX_k)X_{k+1} + X_{k+1}(A − FX_k) = −X_k F X_k + G

has to be solved.


Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 172

Ex. 7.5. Let A ∈ R^{n×n} be a matrix whose spectrum σ(A) does not intersect the imaginary axis. The matrix function F(A) = sign(A) is defined by

sign(A) := (1/πi) ∫_{Γ₊} (zI − A)^{−1} dz − I   (76)

with Γ₊ being any simply connected closed curve in C whose interior contains all eigenvalues of A with positive real part.

The HKT representation to the MVF sign(A) is based on an efficient quadrature for the integral

sign(A) = (1/c_f) ∫_{R₊} ( f(tA)/t ) dt.

Efficient numerical implementation is possible for certain functions f having trigonometric structure (cf. Lect. 8).

Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 173

Ex. 7.6. A negative fractional power of A is represented by

A^{−σ} = (1/Γ(σ)) ∫₀^∞ t^{σ−1} e^{−tA} dt,   σ > 0,   (77)

provided that the integral exists.

With the choice A = −Δ, the representation (77) would be of particular interest in the cases:

(a) σ = 1 (inverse Laplacian),
(b) σ = 1/2 (preconditioning for the Laplace-Beltrami operator (−Δ)^{1/2}, and for the hypersingular integral operator, e.g., in BEM applications),
(c) σ = 2 (inverse biharmonic operator).

A positive fractional power of A, say A^{1/2}, can be represented by the simple factorisation

A^{1/2} = A · A^{−1/2}.


Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 174

Ex. 7.7. In some cases iterative schemes (with possible recompression at each iteration) can be applied.

(a) An approximation to A^{−1}: given X₀ ∈ R^{n×n}, the Newton-Schulz iteration

X_{k+1} = X_k(2I − AX_k),   k = 1, 2, ...   (78)

converges to A^{−1} locally quadratically (cf. analysis below). Iteration (78) is nothing but the Newton method

Ψ'(X_k)(X_{k+1} − X_k) = −Ψ(X_k)

for solving the nonlinear matrix equation

Ψ(X) := A − X^{−1} = 0.

In fact, Ψ(X + δ) − Ψ(X) = X^{−1} δ (X + δ)^{−1}, providing Ψ'(X_k)(δ) = X_k^{−1} δ X_k^{−1}. Now (78) follows from

X_{k+1} − X_k = −X_k(A − X_k^{−1})X_k.
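A minimal numerical sketch of iteration (78). The starting guess X₀ = Aᵀ/(‖A‖₁‖A‖_∞) used below is one standard way to obtain ρ(I − AX₀) < 1 for a general nonsingular A (for a symmetric positive definite matrix one may instead take X₀ = w*I as in Lem. 7.1 below); the test matrix and the starting guess are assumptions of this illustration.

import numpy as np

rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + rng.standard_normal((n, n)) / (4 * np.sqrt(n))   # nonsingular test matrix

X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))     # standard starting guess
for k in range(25):
    X = X @ (2 * np.eye(n) - A @ X)                              # Newton-Schulz step (78)
    res = np.linalg.norm(np.eye(n) - A @ X, 2)                   # residual E_k, cf. (82)-(83)
    print(k, res)
    if res < 1e-14:
        break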

Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 175

(b) Newton-Schulz iteration scheme to approximate sign(A):

X_{k+1} = X_k + ½ [ I − (X_k)² ] X_k,   X₀ = A/‖A‖₂.   (79)

For diagonalisable matrices we have locally quadratic convergence X_k → sign(A) (see the analysis below). This scheme was already successfully applied in many-particle calculations.

The above mentioned schemes (a) and (b) are especially efficient in the case q = 2, since the optimal SVD or ACA recompression in the H- and HKT-formats can be applied.

(c) Newton's method to calculate sign(A). The iteration

X₀ = A,   X_{k+1} = ½ ( X_k + X_k^{−1} )   (80)

converges (locally quadratically) to sign(A). This method is proved to be efficient in the H-matrix arithmetics.
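A minimal sketch of the Newton iteration (80) on a small diagonalisable test matrix with real spectrum; the transform T and the eigenvalue distribution are artificial choices for this illustration. The Newton-Schulz variant (79) works analogously after the scaling X₀ = A/‖A‖₂.

import numpy as np

rng = np.random.default_rng(1)
n = 40
T = np.eye(n) + rng.standard_normal((n, n)) / (4 * np.sqrt(n))   # assumed well-conditioned transform
d = np.concatenate([rng.uniform(0.5, 2.0, n // 2), rng.uniform(-2.0, -0.5, n - n // 2)])
A = T @ np.diag(d) @ np.linalg.inv(T)                # real spectrum away from the imaginary axis
S = T @ np.diag(np.sign(d)) @ np.linalg.inv(T)       # exact sign(A)

X = A.copy()
for k in range(25):
    X = 0.5 * (X + np.linalg.inv(X))                 # iteration (80)
    err = np.linalg.norm(X - S) / np.linalg.norm(S)
    if err < 1e-12:
        break
print(k, err)                                        # quadratic convergence near sign(A)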


Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 176

Ex. 7.8. The matrix exponential can be defined and then calculated by

exp(A) := Σ_{k=0}^{∞} (1/k!) A^k ≈ E_N := Σ_{k=0}^{N−1} (1/k!) A^k.   (81)

This approximation converges exponentially (if N is large enough, say, N ≥ e‖A‖),

‖E_N − exp(A)‖ ≤ Σ_{k=N}^{∞} (1/k!) ‖A‖^k ≤ C(‖A‖)/N! ≈ ( e‖A‖/N )^N.

The Horner scheme to calculate (81) requires only N − 1 matrix multiplications:

A_N := I;   for k = N − 1 downto 1 do A_k := (1/k) A_{k+1} A + I,

such that E_N = A₁.

Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 177

If ‖A‖ > 1 the algorithm (81) may produce very large terms for intermediate values of N!

Recall that for commuting matrices A, B we have exp(A + B) = exp(A) exp(B), in particular exp(A) = [exp(A/2)]².

Now, the algorithm (81) can be modified as follows:

(a) Choose n such that 2^{−n} ‖A‖ ≤ 1.
(b) Compute B = exp(A/2^n) by algorithm (81).
(c) Compute exp(A) = B^{2^n} in n ≈ log₂(‖A‖) matrix squarings.

If B = exp(A/2^n) can be represented in a certain data-sparse format (e.g., H-matrix or Kronecker product form) then truncating all the intermediate products B^{2^m}, m = 1, ..., n, into the fixed format leads to the desired representation of exp(A). In this case, the truncation error analysis is still an open question.
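The following sketch combines the Horner evaluation of E_N from (81) with the scaling-and-squaring steps (a)-(c); the truncation order N = 40 and the symmetric test matrix (so that an exact reference is available via the eigendecomposition) are assumptions of this illustration.

import numpy as np

def E_N(A, N=40):
    # Horner scheme for (81): A_N = I, A_k = (1/k) A_{k+1} A + I, E_N = A_1
    E = np.eye(A.shape[0])
    for k in range(N - 1, 0, -1):
        E = E @ A / k + np.eye(A.shape[0])
    return E

def expm_scaling_squaring(A, N=40):
    nrm = np.linalg.norm(A, 2)
    s = int(np.ceil(np.log2(nrm))) if nrm > 1 else 0   # (a) 2^{-s} ||A|| <= 1
    B = E_N(A / 2**s, N)                               # (b) Taylor sum on the scaled matrix
    for _ in range(s):                                 # (c) s ~ log2(||A||) matrix squarings
        B = B @ B
    return B

rng = np.random.default_rng(2)
S = rng.standard_normal((30, 30))
S = S + S.T                                            # symmetric test matrix with ||S|| > 1
lam, Q = np.linalg.eigh(S)
exact = (Q * np.exp(lam)) @ Q.T                        # reference exp(S)
print(np.linalg.norm(expm_scaling_squaring(S) - exact) / np.linalg.norm(exact))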


Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 178

Newton-Schulz iteration (78) to compute A^{−1}.

Denote the residual error by E_k = I − AX_k, k = 0, 1, 2, .... It is easy to see that

X_{k+1} = X_k(I + E_k),   k = 0, 1, 2, ...,

which implies (for k = 1, 2, ...)

E_k = I − AX_{k−1}(I + E_{k−1}) = I − (I − E_{k−1})(I + E_{k−1}) = E_{k−1}².   (82)

Applying (82) recursively, we find that

E_k = E₀^{2^k},   k = 1, 2, ....   (83)

It is also clear that

A^{−1} − X_k = A^{−1}E_k = A^{−1}E₀^{2^k} = X₀(I − E₀)^{−1}E₀^{2^k}.

Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 179

Under the assumption on the spectral radius of E₀,

ρ ≡ ρ[E₀] = max_j |λ_j| < 1,

where λ_j = λ_j(E₀) are the eigenvalues of E₀, we obtain that the error E_k in (83) vanishes like ρ^{2^k}.

Rem. 7.1. The iteration (78) can be applied to any

preconditioned matrix B = R0A, where R0 is a spectrally

equivalent preconditioner to A so that σ(B) is uniformly

bounded in n. Assuming that both R0 and R0A already have

the H-matrix representation, we then obtain the approximate

inverse of interest from

A−1 = (R0A)−1R0.

In some cases this approach provides the constructive proof

for the existence of the H-matrix inverse.


Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 180

Let E₀ = I − BX₀. The requirement ρ[E₀] < 1 can be achieved under the following conditions.

Lem. 7.1. Let B have real eigenvalues in the interval 0 < m ≤ λ_j ≤ M, j = 1, 2, ..., n. Let X₀(w) = wI; then ρ[E₀] < 1 for all w ∈ (0, 2/M). Moreover, if ρ(w) = ρ[E₀(w)], then there holds

ρ(w*) = min_{w∈(0, 2/M)} ρ(w) = (M − m)/(M + m) < 1,   w* = 2/(M + m).   (84)

Proof. This lemma is a reformulation of a standard convergence result for the Richardson iteration.

Implementing (78) in the formatted H-matrix arithmetics one can compute the H-matrix approximation X_k to A^{−1} within O(log log ε^{−1}) iterations, where ‖I − AX_k‖ ≤ ε.

Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 181

Newton-Schulz iteration (79) to compute sign(A).

Diagonalisable case. Let T be the unitary transform that diagonalises A, i.e., A = T*DT with d_i ∈ [−1, 1]; then it also diagonalises all X_k, k = 1, 2, .... Hence we have to show that the scalar iteration

x_{k+1} = f(x_k),   with x₀ ∈ [−1, 0) ∪ (0, 1]

and with f(x) := x + ½ x(1 − x²) ≡ x g(x), converges to sign(x₀) quadratically.

Clearly, f(x), x ∈ [−1, 1], is increasing and has the fixed points x = −1, 0, 1. Since on the interval (−1, 1) we have g(x) > 1, it implies 0 < x_k < x_{k+1} ≤ 1 if x₀ ∈ (0, 1] and −1 ≤ x_{k+1} < x_k < 0 if x₀ ∈ [−1, 0).

Hence, both x = −1 and x = 1 are stable fixed points.


Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 182

For example, consider the case of a small initial guess x₀ > 0. For x ∈ [−1/2, 1/2] we have g(x) ≥ q > 1 with q = 1 + 3/8, thus the number of iterations x_{k+1} = x_k g(x_k) needed to reach the value, say, x_k = 0.5 starting from x₀ > 0 is about O(log_q(1/x₀)).

For x_k ≥ 1/2, we enter the regime of quadratic convergence. In fact, we just have

1 − x_{k+1} = ½ (1 − x_k)²(x_k + 2),

which implies |1 − x_{k+1}| ≤ (3/2)(1 − x_k)². In this stage, to achieve precision ε > 0 one requires O(log₂ log₂ ε^{−1}) iterations.

For the initial guess we actually have x₀ = cond(A)^{−1}, which implies that the total number of iterations is bounded by

O(log₂ log₂ ε^{−1}) + O(log_q cond(A)).

Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 183

Note that iteration (79) can be written as X_k = Φ(X_{k−1}) with Φ(X) := X + ½(I − X²)X (see Lect. 8). Clearly, (79) ensures that all X_k (k = 1, 2, ...) are simultaneously diagonalised by the same matrix T, hence we have (with B = sign(A)):

Φ(X) − B = X − B + ½ (B² − X²)X
         = ½ (X − B)( B(B − X) + (B − X)(B + X) )
         = −(X − B)²(B + ½ X).   (85)

The analysis for algorithm (80) in the diagonalisable case is reduced to that for the Newton method applied to the equation

Ψ(x) := x² − 1 = 0,

that is, x_{k+1} = ½ (x_k + 1/x_k).


Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 184

The iterative calculation may not be so simple!

Newton iteration to compute the square root A^{1/2} of a symmetric positive definite matrix A: given X₀, the iteration

X_k Δ_k + Δ_k X_k = A − X_k²,   (86)

where Δ_k = X_{k+1} − X_k, converges to A^{1/2} quadratically (locally). It requires solving a matrix Lyapunov equation at each step.

This scheme can be considered as the Newton iteration to solve the nonlinear matrix equation

Ψ(X) := A − X² = 0.

Clearly,

Ψ(X + δ) − Ψ(X) = −Xδ − δX − δ²,   i.e. Ψ'(X)(δ) = −Xδ − δX,

so our iteration can be interpreted as the Newton method for solving Ψ(X) = 0 (see Lect. 8 for the analysis of truncated iterations).

Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 185

Iteration (86) can be written as X_k = Φ_k(X_{k−1}) corresponding to the choice Φ_k(X) := Φ(X), where Φ(X) solves the matrix equation

X(Φ(X) − X) + (Φ(X) − X)X = A − X².

A simple calculation shows that the latter equation implies (with the substitution A = B²)

X(Φ(X) − B) + XB − X² + (Φ(X) − B)X + BX − X² = B² − X²,

which leads to the matrix Lyapunov equation with respect to Y = Φ(X) − B:

XY + YX = (B − X)².


Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 186

Making use of the solution operator for the Lyapunov equation (assume that X = Xᵀ > 0), we arrive at the norm estimate

‖Φ(X) − B‖ ≤ ‖ ∫₀^∞ e^{−tX}(B − X)² e^{−tX} dt ‖ ≤ C‖B − X‖².

This proves relation (93) in Lem. 8.1 with α = 2. Hence, Thm. 8.1 ensures the convergence of the truncated version of the nonlinear iteration (86).

Note that the simpler iteration

X₀ = a₀A,   X_k := X_{k−1} − ½ (X_{k−1} − X_{k−1}^{−1}A)   (k = 1, 2, ...),   (87)

where a₀ > 0 is a given constant, does not guarantee, in general, the convergence of truncated iterations.

Truncated Newton iteration to compute A−1 B. Khoromskij, Leipzig 2005(L7) 187

We analyse the case of second order tensors (q = 2),

A_r = Σ_{k=1}^{r} U_k ⊗ V_k,   U_k ∈ R^{m×m}, V_k ∈ R^{n×n}.

Recall that for a matrix A ∈ R^{m×n} we use the vector representation A → vec(A) ∈ R^{mn}, where vec(A) is the mn × 1 vector obtained by “stacking” A's columns,

vec(A) := [a_{11}, ..., a_{m1}, a_{12}, ..., a_{mn}]^T,

so vec(A) is a rearranged version of A. Introduce the linear invertible operator L : R^{mn×mn} → R^{m²×n²} by

L(A_r) ≡ Ã_r := Σ_{k=1}^{r} vec(U_k) ⊗ vec(V_k)^T.

L is unitary with respect to the spectral or Frobenius norm, but there is no permutation matrix P with Ã_r = P A_r P^T.


Truncated Newton iteration to compute A−1 B. Khoromskij, Leipzig 2005(L7) 188

Making use of the transform L allows us to reduce the low Kronecker rank approximation of A to the low-rank approximation of Ã. For fixed r one may apply a truncation operator R of the form

R(A) := L^{−1}( Π_r( L(A) ) ),

where Π_r(Ã) is the best rank-r approximation to Ã in the given norm (say, the spectral or Frobenius norm).
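A small sketch of the truncation R = L^{−1} ∘ Π_r ∘ L for q = 2. The rearrangement below maps U ⊗ V to a rank-one matrix, so the Kronecker rank of A equals the rank of L(A); the row-major index ordering is an implementation choice of this sketch, and Π_r is the best rank-r truncation via the SVD.

import numpy as np

def L_map(M, m, n):
    # rearrangement: U (x) V  ->  vec(U) vec(V)^T (one concrete realisation of L)
    return M.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)

def L_inv(R, m, n):
    return R.reshape(m, m, n, n).transpose(0, 2, 1, 3).reshape(m * n, m * n)

def R_trunc(M, m, n, r):
    # R(M) = L^{-1}(Pi_r(L(M))): best Kronecker-rank-r approximation in the Frobenius norm
    U, s, Vt = np.linalg.svd(L_map(M, m, n), full_matrices=False)
    return L_inv((U[:, :r] * s[:r]) @ Vt[:r, :], m, n)

rng = np.random.default_rng(3)
m = n = 6
M = sum(np.kron(rng.standard_normal((m, m)), rng.standard_normal((n, n))) for _ in range(2))
print(np.linalg.norm(M - R_trunc(M, m, n, 2)))   # ~ 1e-13: Kronecker rank 2 is reproduced
print(np.linalg.norm(M - R_trunc(M, m, n, 1)))   # nonzero truncation error for r = 1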

We formulate the general statement. Let B = F(A) be defined by the given matrix-valued function F and let R be the truncation operator that satisfies

‖X − R X‖ ≤ C_R ‖X − B‖   (88)

for all X in a “small” neighbourhood S(B) of B.

In particular, we consider F(A) = A^{−1}, F(A) = √A and F(A) = sign(A).

Truncated Newton iteration to compute A−1 B. Khoromskij, Leipzig 2005(L7) 189

Consider the case (78). Introduce the modified (truncated) Newton-Schulz iteration

Z_{k+1} = X_k(2I − AX_k),   X_{k+1} = R(Z_{k+1}),   k = 1, 2, ...   (89)

Thm. 7.1. Let (88) be satisfied. Then for any initial guess X₀ = R(X₀) ∈ S(B), the truncated Newton-Schulz iteration (89) converges to A^{−1} quadratically:

‖A^{−1} − X_{k+1}‖ ≤ (1 + C_R)‖A‖ ‖A^{−1} − X_k‖²,   k = 1, 2, ...

Proof. Note that (88) leads to

B ≡ A^{−1} = R(A^{−1}).

Now equation (89) implies

A^{−1} − Z_{k+1} = (A^{−1} − X_k)A(A^{−1} − X_k),   which yields

‖A^{−1} − Z_{k+1}‖ ≤ ‖A‖ ‖A^{−1} − X_k‖².   (90)


Truncated Newton iteration to compute A−1 B. Khoromskij, Leipzig 2005(L7) 190

On the other hand, (88) implies

‖X_k − Z_k‖ = ‖R(Z_k) − Z_k‖ ≤ C_R ‖A^{−1} − Z_k‖,

hence the triangle inequality leads to

‖A^{−1} − X_k‖ ≤ ‖A^{−1} − Z_k‖ + ‖Z_k − X_k‖ ≤ (1 + C_R)‖A^{−1} − Z_k‖.

Combining this bound with (90) completes the proof.
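A sketch of the truncated iteration (89) with the Kronecker-rank truncation R from the previous sketch. The test matrix (identity plus a small Kronecker-rank-one perturbation) and the rank r are assumptions of this illustration; since here A^{−1} has only approximately Kronecker rank r, the residual decreases quadratically until it levels off near the best rank-r error of A^{−1} (exact quadratic convergence in Thm. 7.1 presumes B = R(B)).

import numpy as np

def R_trunc(M, m, n, r):
    # R = L^{-1} o Pi_r o L, same rearrangement as in the previous sketch
    Rm = M.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)
    U, s, Vt = np.linalg.svd(Rm, full_matrices=False)
    Rr = (U[:, :r] * s[:r]) @ Vt[:r, :]
    return Rr.reshape(m, m, n, n).transpose(0, 2, 1, 3).reshape(m * n, m * n)

rng = np.random.default_rng(4)
m = n = 6
P = np.kron(rng.standard_normal((m, m)), rng.standard_normal((n, n)))
A = np.eye(m * n) + 0.1 * P / np.linalg.norm(P, 2)   # Kronecker rank 2; A^{-1} close to low rank

r = 4
X = np.eye(m * n) / np.linalg.norm(A, 2)             # X0 = R(X0), and rho(I - A X0) < 1
for k in range(12):
    Z = X @ (2 * np.eye(m * n) - A @ X)              # exact Newton-Schulz step
    X = R_trunc(Z, m, n, r)                          # truncation step of (89)
    print(k, np.linalg.norm(np.eye(m * n) - A @ X, 2))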

Let us check (88) for the choice R(A) = L^{−1}(Π_r(L(A))). We denote Y = L(X) and Y_B = L(B) and note that B = R(B) yields Π_r Y_B = Y_B.

In the following proof we make use of the standard stability estimates for the singular values of a perturbed matrix (Wielandt, Hoffman '55). Now we estimate in the Frobenius norm:

Truncated Newton iteration to compute A−1 B. Khoromskij, Leipzig 2005(L7) 191

‖L^{−1}‖^{−1} ‖X − RX‖ ≤ ‖(I − Π_r)Y‖
 = √( Σ_{k=r+1}^{n} σ_k(Y)² )
 = √( Σ_{k=r+1}^{n} (σ_k(Y) − σ_k(Y_B))² )
 ≤ Σ_{k=r+1}^{n} |σ_k(Y) − σ_k(Y_B)|
 ≤ Σ_{k=1}^{n−r} σ_k(Y − Y_B)
 ≤ √(n − r) ‖L(X − B)‖.

Estimate (88) now follows with C_R = √(n − r) ‖L^{−1}‖ ‖L‖.


Few remarks B. Khoromskij, Leipzig 2005(L7) 192

1. A similar result holds in the spectral norm. The factor √(n − r) can be omitted due to the Mirsky theorem.

2. The error estimate above allows the straightforward local

analysis for algorithm (86) with the truncation operator R.

3. The truncated Newton-Schulz iterations (89) and (86) can

be analysed in the H-matrix format as well using the similar

techniques (but applied block-wise).

4. In the case of three (or more) factors (q ≥ 3) we can

analyse the sub-optimal truncation operator R via Tucker’s

decomposition.

Literature to Lecture 7 B. Khoromskij, Leipzig 2005(L7) 193

1. W. Hackbusch and B.N. Khoromskij: Low-Rank Kronecker-Product Approximation to Multi-Dimensional

Nonlocal Operators . Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iteration for Structured Matrices.

Preprint MPI MIS 2005.

URL: http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/khor7.ps


Lect. 8 Truncated iterations. Approximating a matrix exp(A) B. Khoromskij, Leipzig 2005(L8) 194

Let V be a normed space (e.g., of n × n matrices) and consider a function f : V → V. Assume that A ∈ V and B := f(A) can be obtained by the locally convergent fixed-point iteration

Given X₀ ∈ V,   X_k = Φ(X_{k−1}),   k = 1, 2, ...,   (91)

where Φ : V → V is a one-step operator,

lim_{k→∞} X_k = B = Φ(B).   (92)

Lem. 8.1. Assume that there are constants c_Φ, ε_Φ > 0 s.t.

‖Φ(X) − B‖ ≤ c_Φ ‖X − B‖²   ∀ X with ‖X − B‖ ≤ ε_Φ,   (93)

and set ε := min(ε_Φ, 1/c_Φ). Then (92) holds for any X₀ satisfying ‖X₀ − B‖ < ε, and, moreover,

‖X_k − B‖ ≤ c_Φ^{−1} ( c_Φ ‖X₀ − B‖ )^{2^k}   (k = 0, 1, 2, ...).   (94)

Truncated iterations. B. Khoromskij, Leipzig 2005(L8) 195

Proof: Let e_k := ‖X_k − B‖. Then, due to (93),

e_k ≤ c_Φ e_{k−1}²,   provided that e_{k−1} ≤ ε_Φ.   (95)

By (95), e_{k−1} ≤ ε ≤ ε_Φ implies e_k ≤ c_Φ ε² = ε (c_Φ ε) ≤ ε. Hence, all iterates stay in the ε-neighbourhood of B.

(94) is proved by induction:

e_k ≤ c_Φ e_{k−1}²   (by (95))
    = c_Φ · ( c_Φ^{−1} (c_Φ e₀)^{2^{k−1}} )²   (induction hypothesis)
    = c_Φ^{−1} (c_Φ e₀)^{2^k}.

Whenever e₀ < ε, (94) shows e_k → 0.

Rem. 8.1. (94) together with e₀ ≤ ε implies monotonicity:

‖X_k − B‖ ≤ ‖X_{k−1} − B‖.   (96)

Rem. 8.2. Condition (93) is valid for the Newton iteration.


Truncated iterations. B. Khoromskij, Leipzig 2005(L8) 196

Let S ⊂ V be a subset (not necessarily a subspace) considered as a class of certain structured elements (e.g. structured matrices) and suppose that R : V → S is an operator mapping elements from V onto suitable structured approximants in S. We call R a truncation operator.

Define a truncated iterative process as follows:

Y₀ := R(X₀),   Y_k := R(Φ(Y_{k−1}))   (k = 1, 2, ...).   (97)

Thm. 8.1. Under the premises of Lem. 8.1, assume that

‖X − R(X)‖ ≤ c_R ‖X − B‖   ∀ X with ‖X − B‖ ≤ ε_Φ.   (98)

Then there exists δ > 0 such that the truncated iteration (97) converges to B so that for k = 1, 2, ...

‖Y_k − B‖ ≤ c_{RΦ} ‖Y_{k−1} − B‖²   with c_{RΦ} := (c_R + 1)c_Φ   (99)

for any starting value Y₀ = R(Y₀) satisfying ‖Y₀ − B‖ < δ.

Truncated iterations. B. Khoromskij, Leipzig 2005(L8) 197

Proof: Let ε := min(ε_Φ, 1/c_Φ) and define Z_k = Φ(Y_{k−1}). By (96) we have

‖Z_k − B‖ ≤ ‖Y_{k−1} − B‖,

provided that ‖Y_{k−1} − B‖ ≤ ε. Then

‖Y_k − B‖ = ‖R(Z_k) − Z_k + Z_k − B‖ ≤ (c_R + 1) ‖Z_k − B‖.   (100a)

Assuming ‖Y_{k−1} − B‖ ≤ ε, the bounds ε ≤ ε_Φ and (93) ensure

‖Z_k − B‖ = ‖Φ(Y_{k−1}) − B‖ ≤ c_Φ ‖Y_{k−1} − B‖².   (100b)

Combining (100a) and (100b), we obtain (99) for any k, provided that ‖Y_{k−1} − B‖ ≤ ε.

Similarly to the proof of Lem. 8.1, the choice δ := min(ε, 1/c_{RΦ}) guarantees that ‖Y₀ − B‖ ≤ δ implies ‖Y_k − B‖ ≤ δ ≤ ε, k ∈ N.


Truncated iterations. B. Khoromskij, Leipzig 2005(L8) 198

Cor. 8.1. Under the assumptions of Thm. 8.1, any starting value Y₀ with ‖Y₀ − B‖ ≤ δ leads to

‖Y_k − B‖ ≤ c_{RΦ}^{−1} ( c_{RΦ} ‖Y₀ − B‖ )^{2^k}   (k = 1, 2, ...),   (101)

where c_{RΦ} and δ are defined as above.

The condition (98) has a clear geometrical meaning. If

R(X) := argmin { ‖X − Y‖ : Y ∈ S }

is the best approximation to X in the given norm, inequality (98) holds with c_R = 1, since B ∈ S. Therefore, (98) with c_R ≥ 1 can be viewed as a quasi-optimality condition.

If the norm is defined by a scalar product, S is a subspace and R(X) is the orthogonal projection onto S, then (98) is obviously fulfilled with c_R = 1.

Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 199

The next lemma is easy to prove.

Lem. 8.2. Let B = R(B) be fixed and assume that R is Lipschitz at B or R is a bounded linear operator. Then the inequality (98) holds.

Let V = R^{I×I} be the space of matrices and S ⊂ V a subspace with a prescribed sparsity pattern P ⊂ I × I, i.e., X ∈ S if and only if X_ij = 0 for all (i, j) ∉ P. A familiar example of a truncation in this case is R(X) defined entry-wise by

R(X)_ij = X_ij for (i, j) ∈ P,   R(X)_ij = 0 for (i, j) ∉ P.   (102)

Since R is linear, it satisfies the hypotheses of Lem. 8.2.


Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 200

Rem. 8.3. Usually, the subset S as above is not helpful since a sparse argument A ∈ S yields a fully populated result f(A). However, it is well-known that after a DWT

X → L(X) := W^{−1} X W

one can apply a matrix compression.

Figure 15: Wavelet transform of a matrix: “finger”-like structure.

Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 201

Such a matrix compression is of the form (102) and will be denoted by Π. Then the truncation R applied to X is the composition of the DWT L, the pattern projection Π and the back-transformation L^{−1}:

R := L^{−1} ∘ Π ∘ L.   (103)

The same form of R is typical as well for many other choices of L and Π.

Next, we give a characterisation of Π that ensures the property (98) for R.


Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 202

Lem. 8.3. Let V and W be normed spaces and L : V → W a bounded linear operator with a bounded inverse. Given B ∈ V, assume that Π : W → W satisfies

‖Z − Π(Z)‖ ≤ c_Π ‖Z − L(B)‖   ∀ Z ∈ W with ‖L^{−1}(Z) − B‖ ≤ ε_Φ.   (104)

Then the truncation operator R of the form (103) satisfies condition (98) with c_R := c_Π ‖L‖ ‖L^{−1}‖.

Proof: Let Z = L(X). Then, obviously,

‖R(X) − X‖ = ‖L^{−1}(Π(Z) − Z)‖ ≤ c_Π ‖L^{−1}‖ ‖Z − L(B)‖,

and it remains to observe that

‖Z − L(B)‖ = ‖L(X) − L(B)‖ ≤ ‖L‖ ‖X − B‖.

Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 203

Applications of Lem. 8.3 (in the case of H-matrices) are facilitated by the following construction. Define a suitable system of normed spaces W₁, ..., W_N and set

W := W₁ × ... × W_N = { H = (H₁, ..., H_N) : H_i ∈ W_i }   (105)

with ‖H‖ = √( Σ_{i=1}^{N} ‖H_i‖² ).

Let each W_i be associated with a truncation operator Π_i : W_i → W_i satisfying (for some fixed Z_i ∈ W_i)

‖H_i − Π_i(H_i)‖ ≤ c_i ‖H_i − Z_i‖   ∀ H_i ∈ W_i and 1 ≤ i ≤ N.   (106)


Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 204

Lem. 8.4. Let W be the normed space from (105) and let the truncation operators Π_i satisfy (106), where the elements Z_i ∈ W_i are defined by

L(B) = (Z₁, ..., Z_N).

Suppose that the product of the truncation operators Π_i defines Π : W → W via

Π(H) := (Π₁(H₁), ..., Π_N(H_N))   for H = (H₁, ..., H_N), H_i ∈ W_i.

Then R from (103) satisfies (98).

Proof: Let L(X) = H = (H₁, ..., H_N). Then, according to the definitions of L and Π,

‖H − Π(H)‖ ≤ √( Σ_{i=1}^{N} c_i² ‖H_i − Z_i‖² ) ≤ max_{1≤i≤N} c_i · √( Σ_{i=1}^{N} ‖H_i − Z_i‖² ),

which proves (104) and allows us to use Lem. 8.3.

Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 205

An important example of Π in the case of a matrix space W is given by optimal low-rank approximations.

Lem. 8.5. Let W be a normed space of all matrices of a fixed size and let S ⊂ W consist of all matrices whose rank does not exceed r. Then for any H ∈ W there exists a matrix T ∈ S such that

‖H − T‖ = min_{rank Z ≤ r} ‖H − Z‖.

Proof: Consider a minimising sequence Z_k ∈ S, i.e., lim_{k→∞} ‖H − Z_k‖ = ρ := inf_{rank Z ≤ r} ‖H − Z‖. Since the sequence Z_k is bounded, a convergent subsequence Z_{k_i} → T exists. Its limit satisfies ‖H − T‖ = ρ. The assertion T ∈ S is due to the fact that a matrix of rank equal to p > r possesses a vicinity wherein any matrix is of rank ≥ p (use the continuity of the determinant and the existence of a nonzero minor of order p for a matrix of rank p).


Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 206

Matrix theory provides well-developed tools for the construction of low-rank approximations in the case of any unitarily invariant norm. For the most familiar unitarily invariant norms, such as the spectral and the Frobenius norm, it can be established through simple arguments: it is well known that

min_{rank Z ≤ r} ‖H − Z‖₂ = σ_{r+1}(H),   min_{rank Z ≤ r} ‖H − Z‖_F = √( Σ_{i≥r+1} σ_i²(H) ).

Thus, the truncation property (98) is easy to achieve (with c_R = 1) when we are aware of the existence of the best approximation element.
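A quick numerical check of these identities (random test matrix; the best rank-r approximant is the truncated SVD, as in (107) below):

import numpy as np

rng = np.random.default_rng(5)
H = rng.standard_normal((40, 25))
r = 5
U, s, Vt = np.linalg.svd(H, full_matrices=False)
T = (U[:, :r] * s[:r]) @ Vt[:r, :]                               # best rank-r approximant
print(np.linalg.norm(H - T, 2), s[r])                            # spectral error = sigma_{r+1}
print(np.linalg.norm(H - T, 'fro'), np.sqrt(np.sum(s[r:] ** 2))) # Frobenius error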

Sometimes (e.g., for three-way approximations of bounded tensor rank) this is not the case. However, all cases are covered by an extension of Thm. 8.1, as we can always capitalise on a quasi-optimal construction: let ρ(H) = inf_{T∈S} ‖H − T‖. For a given fixed ε > 0, we can adopt an ε-optimal approximation Π(H) to H in the sense that

ρ(H) ≤ ‖H − Π(H)‖ ≤ ρ(H) + ε.

Application to hierarchical block matrices B. Khoromskij, Leipzig 2005(L8) 207

Let V = R^{n×n} be the space of n × n matrices, and consider each matrix as a union of N disjoint blocks of possibly different sizes, where each matrix block belongs to the matrix space W_i (1 ≤ i ≤ N). Given X ∈ V, let L_i(X) ∈ W_i be the i-th block of X and define the space W according to (105).

Figure 16: Standard- (left) and Weak-admissible H-partitionings.


Application to hierarchical block matrices B. Khoromskij, Leipzig 2005(L8) 208

The above-considered operator L : V → W reads

L(X) := (L1(X), . . . , LN (X)) (block-tracing operator).

If the Frobenius norm is used on the spaces V and

W1, . . . , WN , the norm induced on W is again the Frobenius

norm. Since the blocks are disjoint, L is isometric. Hence,

the inverse L−1 exists and satisfies

‖L‖ = ‖L−1‖ = 1.

Fix a positive integer r and let Si ⊂ Wi be the subset of

matrices of rank ≤ r. Define S as the Cartesian product

S = S1 × . . .× SN ⊂ W.

Application to hierarchical block matrices B. Khoromskij, Leipzig 2005(L8) 209

Let H = Q₁Σ(H)Q₂ be the SVD of H (with unitary factors Q₁ and Q₂) and let Σ_r(H) be the corresponding r-term truncation. Besides, let Π_i : W_i → S_i be of the form

Π_i(H) := Q₁Σ_r(H)Q₂,   (107)

providing the best possible approximant to H in the set S_i of matrices of rank ≤ r, in the Frobenius norm. This involves the SVD of the corresponding matrix block. Defining Π : W → S as in Lem. 8.4 and using Lem. 8.5, we can apply Thm. 8.1 to

R = L^{−1} ∘ Π ∘ L.

Note that exactly this kind of truncation is used in the theory of H-matrices. The typical block partitioning in the construction of hierarchical matrices is presented in Fig. 16.


Application to tensor approximations B. Khoromskij, Leipzig 2005(L8) 210

Let V₁ = R^{p×q} and V₂ = R^{r×s}, while V = R^{pr×qs} for some p, q, r, s ∈ N. The Kronecker product is a mapping from V₁ ⊗ V₂ into V: for A ∈ V₁ and B ∈ V₂, the Kronecker product A ⊗ B is given by the block matrix

[ a₁₁B  a₁₂B  ...
  a₂₁B  a₂₂B  ...
   ...   ...  ⋱ ]  ∈ V.

We say that a matrix M ∈ V has Kronecker rank ≤ k if there is a representation

M = Σ_{ν=1}^{ℓ} A_ν ⊗ B_ν   with A_ν ∈ V₁, B_ν ∈ V₂ and ℓ ≤ k.   (108)

We define the subset of structured matrices S as the set of all matrices of Kronecker rank ≤ k. If k is not too large, this is an interesting representation since matrices of the large size pr × qs can be described by matrices A_ν, B_ν of small size.

Application to tensor approximations B. Khoromskij, Leipzig 2005(L8) 211

As described in Lect. 6 (cf. the operation vec(A)), there is a simple isomorphism L from V = R^{pr×qs} to R^{pq×rs} such that the representation (108) of M ∈ S ⊂ V = R^{pr×qs} is equivalent to rank(L(M)) ≤ k. Hence, we obtain the situation of Lem. 8.5 with W := L(V) = R^{pq×rs}.

The truncation operator is again of the form R = L^{−1} ∘ Π ∘ L, where Π : W → W is the optimal SVD-based truncation or an appropriate substitute.

Our framework can be applied also to tensor (multi-linear)

representation (108) where the number of factors is greater

than 2. In this case the truncation procedures are not so well

developed; however, some are available and claimed to be

efficient in particular applications (mostly for data analysis in

chemometrics, physicometrics, etc.)


Application to tensor approximations B. Khoromskij, Leipzig 2005(L8) 212

To summarise (analysis of the truncated iterations):

Initially, the main purpose of this truncation was the reduction of storage and of the matrix-by-vector complexity for a given matrix in V.

In the sequel, the same truncation is used for computing various matrix functions f(A) of A ∈ S ⊂ V, where B := f(A) is known to be close to S (e.g., for f(A) = A^{−1}, f(A) = √A and for f(A) = sign(A)).

The above results suggest a general framework for a rigorous analysis of the basic truncated iterative algorithms.

Finally we remark that the optimal truncation is often replaced by an approximate one which is cheaper to compute (e.g., by cross approximation techniques (ACA), multi-way decomposition algorithms, wavelet truncation).

Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 213

The elliptic operator A : V → V′ with V = H¹₀(Ω), V′ = H^{−1}(Ω),

A = Σ_{j=1}^{d} [ −(∂/∂x_j) a_j(x_j) (∂/∂x_j) + b_j(x_j) (∂/∂x_j) + c_j(x_j) ],

is supposed to have “separable” coefficients. The associated bilinear form (with c(x) = Σ c_j(x_j)),

a(u, v) = ∫_Ω ( Σ_{j=1}^{d} a_j(x) (∂u/∂x_j)(∂v/∂x_j) + Σ_{j=1}^{d} b_j(x) (∂u/∂x_j) v + c(x) u v ) dx

with a : V × V → R, is assumed to be continuous and V-elliptic:

|a(u, v)| ≤ C ‖u‖_V ‖v‖_V,   ℜe a(v, v) ≥ δ₀ ‖v‖²_V,   δ₀ > 0.

In the tensor-product setting we have (x₁, ..., x_d) ∈ Ω := (0, 1)^d ⊂ R^d.


Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 214

Let X = L₂(Ω); then the corresponding elliptic operator A and its discrete counterpart A (say, A is the FEM/FD stiffness matrix corresponding to A) satisfy

‖(zI − A)^{−1}‖_{X←X} ≤ 1/( |z| sin(θ₁ − θ) )   ∀ z ∈ C : θ₁ ≤ |arg z| ≤ π,   (109)

for any θ₁ ∈ (θ, π), where cos θ = δ₀/C.

In the case of discrete elliptic operators A, the bound (109) on the matrix resolvent is valid uniformly in the mesh-size h (cf. example below).

The H-matrix and HKT formats are well suited to represent the following MVFs:

exp(−tA),   A^{−1},   √A,   sign(A).

Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 215

Ex. 8.1. Consider the elliptic operator of divergence type,

A := −Σ_{j=1}^{d} ∂_j a_j(x_j) ∂_j,   x ∈ Ω := (0, 1)^d,

defined on V. We assume that a_j ≥ a₀ > 0 and introduce a uniform grid with step size h and N = n^d interior nodes. Using the (2d + 1)-point stencil, we obtain the FD discretisation

(A_h z)_{i₁...i_d} := Σ_{j=1}^{d} ( 2a^j_{i_j} z_{i₁...i_d} − b^j_{i_j−1} z_{i₁...(i_j−1)...i_d} − c^j_{i_j+1} z_{i₁...(i_j+1)...i_d} ) / h²,

1 ≤ i_j ≤ n, where z denotes the vector corresponding to [z_{i₁...i_d}]_{i_j=1}^{n} ∈ R^N given in the tensor-product numbering.

As usual, we can regard d-dimensional n × ... × n arrays (tensors) also as one-dimensional ones (vectors) with n^d components.


Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 216

The matrix A = A_h from Ex. 8.1 takes the form A = Σ_{j=1}^{d} A_j with

A₁ = V¹ ⊗ I ⊗ ... ⊗ I,   A₂ = I ⊗ V² ⊗ ... ⊗ I,   ...,   A_d = I ⊗ ... ⊗ I ⊗ V^d,

where V^j = (1/h²) · tridiag{ −b^j_i, 2a^j_i, −c^j_i } ∈ R^{n×n} is tridiagonal and I is the n × n identity.

It is easy to see that A_j > 0 for all j = 1, ..., d. Moreover, the A_j commute pairwise, i.e., A_j A_m = A_m A_j, hence (cf. Thm. 5.3 in Lect. 5)

exp(A) = Π_{j=1}^{d} exp(A_j) = ⊗_{j=1}^{d} exp(V^j).   (110)
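A quick numerical check of the factorisation (110), stated here for exp(−A), which is the form that enters the parabolic solution operator of Ex. 8.2 below; the constant coefficients a_j ≡ 1 and the small sizes are assumptions of this illustration.

import numpy as np

def expm_sym(S):
    lam, Q = np.linalg.eigh(S)             # matrix exponential of a symmetric matrix
    return (Q * np.exp(lam)) @ Q.T

def kron_list(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

n, d = 6, 3
h = 1.0 / (n + 1)
V = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2    # 1D stencil with a_j = 1
I = np.eye(n)
A = sum(kron_list([V if j == i else I for j in range(d)]) for i in range(d))  # A = sum_j A_j

lhs = expm_sym(-A)                          # exp(-A) on the full N x N matrix, N = n^d
rhs = kron_list([expm_sym(-V)] * d)         # tensor product of d small exponentials, cf. (110)
print(np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs))           # ~ machine precision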

Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 217

Ex. 8.2. In the situation of Example 8.1, we consider an application to parabolic problems in R^d posed in the semi-discrete form. Using the semigroup theory, the solution of the first order evolution equation

du/dt + Au = f,   u(0) = u₀ ∈ R^N,

with a given initial vector u₀ and with a given right-hand side f ∈ L₂(Q_T), Q_T := (0, T) × R^N, can be represented as

u(t) = exp(−tA)u₀ + ∫₀^t exp(−(t − s)A) f(s) ds,   t ∈ (0, T].

Assume that our input data can be represented in the tensor-product form


Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 218

u₀ ≈ Σ_{k=1}^{r} u^k₁(x₁) ⊗ ... ⊗ u^k_d(x_d),

f(s) ≈ Σ_{k=1}^{r} f^k₁(s; x₁) ⊗ ... ⊗ f^k_d(s; x_d)

with u^k_i, f^k_i ∈ R^n, i = 1, ..., d, and with r = O(|log ε|^q). Then we obtain the tensor-product approximation u(t) ≈ ũ(t) by

ũ(t) = Σ_{k=1}^{r} { ⊗_{j=1}^{d} exp(−tV^j) u^k_j(x_j) + ⊗_{j=1}^{d} ∫₀^t exp((s − t)V^j) f^k_j(s; x_j) ds },

which can be implemented with complexity O(r d n log^p n).

Probl. 1. Represent A^{−1} in the HKT-format.

Probl. 2. Approximate sign(A) in the HKT-format.

Approximating matrix-valued functions by exponential sums B. Khoromskij, Leipzig 2005(L8) 219

Assume that for the given f(ρ), ρ ∈ [1, R], there is an accurate r-term approximation f_r(ρ) by exponential sums,

|f(ρ) − f_r(ρ)| ≤ ε_R,   ρ ∈ [1, R],   (111)

with f_r(ρ) := Σ_{k=1}^{r} a_k e^{−b_k ρ}. The question is how accurately the ansatz f_r(A) represents the matrix-valued function f(A).

We consider two cases:

(A) Real-diagonalisable matrix A, i.e., A = T^{−1}DT with a diagonal D = diag{d₁, ..., d_n}, where d_i ∈ [1, R].

(B) There is the Dunford-Cauchy integral representation for the analytic function f:

f(A) = (1/2πi) ∫_Γ f(z)(zI − A)^{−1} dz.


Approximating matrix-valued functions by exponential sums B. Khoromskij, Leipzig 2005(L8) 220

Lem. 8.6. In Case (A) we have

‖f(A) − f_r(A)‖ ≤ ‖T‖ ‖T^{−1}‖ ε_R.

In Case (B), let (111) hold with ε_R = g(z)ε_Γ, at least for ρ = z such that z ∈ Γ. Then we have

‖f(A) − f_r(A)‖ ≤ (ε_Γ/2π) max_{z∈Γ} |g(z)| ∫_Γ ‖(zI − A)^{−1}‖ d|z|.

In the case of a discrete elliptic operator A, we have

∫_Γ ‖(zI − A)^{−1}‖ d|z| ≤ C ∫_Γ d|z|/|z|,

where the constant depends on the coefficients of the related operator A and Γ contains σ(A).

Approximating matrix-valued functions by exponential sums B. Khoromskij, Leipzig 2005(L8) 221

Proof: In the first case we readily obtain

‖f(A) − f_r(A)‖ = ‖T^{−1} diag{f₁, ..., f_n} T‖

with f_i = f(d_i) − f_r(d_i), which proves the statement. If T is a unitary transform then ‖T‖ = ‖T^{−1}‖ = 1.

In the second case we obtain

‖f(A) − f_r(A)‖ = (1/2π) ‖ ∫_Γ [ f(z) − Σ_{k=1}^{r} a_k e^{−b_k z} ] (zI − A)^{−1} dz ‖
 ≤ ε_Γ ∫_Γ |g(z)| ‖(zI − A)^{−1}‖ d|z|,

which proves the main assertion. Finally, in the case of discrete elliptic operators we apply the resolvent estimate

‖(zI − A)^{−1}‖ ≤ C/|z|.


Literature to Lecture 8 B. Khoromskij, Leipzig 2005(L8) 222

1. W. Hackbusch and B.N. Khoromskij: Low-Rank Kronecker-Product Approximation to Multi-Dimensional

Nonlocal Operators . Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iterations for Structured Matrices.

Preprint MPI MIS, Leipzig 2005.

URL: http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor8.ps

Lect. 9. Kronecker-prod. representation to A−1 and sign(A) B. Khoromskij, Leipzig 2005 223

Outlook

1. Solution operator exp(−tA) for the linear parabolic eq.:

– well parallelisable;

– avoids time stepping !

2. Representing f(A) via approximation to f(z), z ∈ C, by exponential sums Σ a_k e^{−b_k z}.

3. Robust and asymptotically optimal Sinc-quadrature to represent 1/ρ^α, ρ ∈ [1, R], α > 0 (cf. Ex. 7.6).

4. HKT representation to f(A) = A−1 and numerics.

5. Robust and asymptotically almost optimal Sinc-quadrature

to represent sign(ρ), |ρ| > a > 0.

6. Generalised HKT representation to f(A) = sign(A).


exp(−tA) as the solution operator for parabolic PDEs B. Khoromskij, Leipzig 2005(L9) 224

Ex. 9.1. In the situation of Example 8.1, we consider an application to parabolic problems in R^d posed in the semi-discrete form (A ∈ R^{N×N}, f ∈ R^N). The solution of the first order evolution equation

du/dt + Au = f,   u(0) = u₀ ∈ R^N,

with a given initial vector u₀ and with a given right-hand side f ∈ L₂(Q_T), Q_T := (0, T) × R^N, can be represented as

u(t) = exp(−tA)u₀ + ∫₀^t exp(−(t − s)A) f(s) ds,   t ∈ (0, T].

Assume that our input data can be represented in the

tensor-product form as follows

exp(−tA) as the solution operator for parabolic PDEs B. Khoromskij, Leipzig 2005(L9) 225

u₀ ≈ Σ_{k=1}^{r} u^k₁(x₁) ⊗ ... ⊗ u^k_d(x_d),

f(s) ≈ Σ_{k=1}^{r} f^k₁(s; x₁) ⊗ ... ⊗ f^k_d(s; x_d)

with u^k_i, f^k_i ∈ R^n, i = 1, ..., d, and with r = O(|log ε|^q). Then we obtain the tensor-product approximation u(t) ≈ ũ(t) by

ũ(t) := Σ_{k=1}^{r} { ⊗_{j=1}^{d} exp(−tV^j) u^k_j(x_j) + ⊗_{j=1}^{d} ∫₀^t exp((s − t)V^j) f^k_j(s; x_j) ds },

which can be implemented with complexity O(r d n log^p n).

Probl. 1. Represent f(A) = A^{−1} in the HKT-format.

Probl. 2. Approximate f(A) = sign(A) in the HKT-format.


Approximating MVFs by exponential sums B. Khoromskij, Leipzig 2005(L9) 226

Assume that for the given f(ρ), ρ ∈ [1, R], there is an accurate r-term approximation f_r(ρ) by exponential sums,

|f(ρ) − f_r(ρ)| ≤ ε_R,   ρ ∈ [1, R],   (112)

with f_r(ρ) := Σ_{k=1}^{r} a_k e^{−b_k ρ}. The question is how accurately the ansatz f_r(A) represents the matrix-valued function f(A).

We consider two cases:

(A) Real-diagonalisable matrix A, i.e., A = T^{−1}DT with a diagonal D = diag{d₁, ..., d_n}, where d_i ∈ [1, R].

(B) The analytic function f has the Dunford-Cauchy integral representation

f(A) = (1/2πi) ∫_Γ f(z)(zI − A)^{−1} dz,

where Γ “envelopes” σ(A).

Approximating MVFs by exponential sums B. Khoromskij, Leipzig 2005(L9) 227

Lem. 9.1. In Case (A) we have

‖f(A) − f_r(A)‖ ≤ ‖T‖ ‖T^{−1}‖ ε_R.

In Case (B), let (112) hold with ε_R = g(ρ)ε_Γ, at least for ρ = z ∈ Γ. Then we have

‖f(A) − f_r(A)‖ ≤ (ε_Γ/2π) max_{z∈Γ} |g(z)| ∫_Γ ‖(zI − A)^{−1}‖ d|z|.

In the case of a discrete elliptic operator A, we have

∫_Γ ‖(zI − A)^{−1}‖ d|z| ≤ C log( |λ_max|/|λ_min| ),   λ_max, λ_min ∈ σ(A),

where C depends on the ellipticity and continuity constants of the related operator A.


Approximating MVFs by exponential sums B. Khoromskij, Leipzig 2005(L9) 228

Proof: In the first case we readily obtain

‖f(A) − f_r(A)‖ = ‖T^{−1} diag{f₁, ..., f_n} T‖

with f_i = f(d_i) − f_r(d_i), which proves the statement. If T is a unitary transform then ‖T‖ = ‖T^{−1}‖ = 1.

In Case (B), we derive

‖f(A) − f_r(A)‖ = (1/2π) ‖ ∫_Γ [ f(z) − Σ_{k=1}^{r} a_k e^{−b_k z} ] (zI − A)^{−1} dz ‖
 ≤ ε_Γ ∫_Γ |g(z)| ‖(zI − A)^{−1}‖ d|z|,

which proves the general assertion. Finally, in the case of discrete elliptic operators we choose Γ in such a way that

‖(zI − A)^{−1}‖ ≤ C/|z|   (cf. Lect. 8),

to obtain

∫_Γ ‖(zI − A)^{−1}‖ d|z| ≤ C ∫_Γ d|z|/|z|.

sinc-quadrature for the Laplace integral transform B. Khoromskij, Leipzig 2005(L9) 229

The change of variables ξ = log(1 + e^{sinh(w)}) in the Laplace integral transform

1/ρ = ∫₀^∞ e^{−ρξ} dξ   (ρ > 0),   (113)

leads to

1/ρ = ∫_R cosh(w) F(sinh(w); ρ) dw,   with F(u; ρ) := e^{−ρ log(1+e^u)} / (1 + e^{−u}).

Lem. 9.2. Let ρ ∈ [1, R] and define the quadrature

I_M := h Σ_{k=−M}^{M} cosh(kh) F(sinh(kh); ρ) ≈ ∫_R f₂(w; ρ) dw = 1/ρ.

Then the choice h = log(4πM)/M implies

‖1/ρ − I_M‖_{L^∞[1,R]} ≤ C e^{−π²M / (√2 log(3R) log(4πM))}.   (114)


sinc-quadrature for the Laplace integral transform B. Khoromskij, Leipzig 2005(L9) 230

Proof. Choose δ(ρ) = π/(2√2 log(3ρ)) (this does not affect the quadrature itself!). Then for ρ ∈ (1, ∞), f₂(w; ρ) = cosh(w) F(sinh(w); ρ), w ∈ R, can be analytically extended to D_δ := {z ∈ C : |ℑm z| ≤ δ} with δ < π/2, s.t.

∫_{∂D_δ} |f₂(z; ρ)| |dz| ≤ const < ∞   (115)

independently of ρ. Hence f₂ ∈ H¹(D_δ), while δ ∈ (0, δ(ρ)], ρ ≥ 1, ensures the finite norm N(f₂, D_δ) ≤ const < ∞, uniformly in ρ.

The decay of f₂ on the real axis is

f₂(w) ≈ ½ e^{w − (ρ/2)e^{w}} as w → ∞,   f₂(w) ≈ ½ e^{|w| − (1/2)e^{|w|}} as w → −∞,

corresponding to C = 1/2, b = 1/2, a = 1 in Thm. 2.6.

If ρ ∈ [1, R], the choice δ = δ(R) in Thm. 2.6 implies (114):

‖1/ρ − I_M‖_{L^∞[1,R]} ≤ C e^{−2πδ(R)M / log(4πM)}.

A HKT-representation to A−1 B. Khoromskij, Leipzig 2005(L9) 231

Rem. 9.1. Recall that the matrix exponential of a discrete elliptic operator can be represented in the H-matrix format with linear-logarithmic cost in view of

exp(−tA) = (1/2πi) ∫_Γ e^{−tz}(zI − A)^{−1} dz ≈ Σ_k a_k e^{−tz_k}(z_k I − A)^{−1}.

Lem. 9.3. Suppose A = TDT^{−1} with ℜe σ(D) ⊂ R_{>0} and let A = Σ_{j=1}^{d} A_j as above. Given M ∈ N, there is an HKT-approximand A_M^{−1} of Kronecker rank r = 2M + 1 that provides exponential convergence,

‖A^{−1} − A_M^{−1}‖ ≤ C e^{−sM/log(4πM)},   s = π² / ( √2 log[3 cond(D)] ).

Proof. First, construct the sinc-quadrature f_r(ρ) ≈ f(ρ) = 1/ρ (cf. Lem. 9.2) and then apply the corresponding matrix approximant f_r(A) (cf. Lem. 9.1):


A HKT-representation to A−1 B. Khoromskij, Leipzig 2005(L9) 232

Choose h = C log M/M, z_k = sinh(kh) and define

A^{−1} ≈ h Σ_{k=−M}^{M} cosh(kh) F(z_k; A) = h Σ_{k=−M}^{M} ( cosh(kh)/(1 + e^{−z_k}) ) ⊗_{j=1}^{d} e^{−log(1+e^{z_k}) V^j}.

Second, apply the H-matrix approximation to each individual exponential exp(−α_k V^j) to obtain

A^{−1} ≈ h Σ_{k=−M}^{M} ( cosh(kh)/(1 + e^{−z_k}) ) ⊗_{j=1}^{d} Σ_{m=−M₁}^{M₁} κ_{m,j}(z_k)(ζ_{m,j}I − V^j)^{−1} =: A_M^{−1}.   (116)

Note that each sum in the tensor product can be converted into an H-matrix of rank r₁ ≤ (2M₁ + 1) · rank(ζ_{m,j}I − V^j)^{−1} with M₁ = O(|log ε|). However, since ζI − V^j is a tridiagonal matrix, the whole sum can be implemented exactly with O(2M₁n) operations.
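The first step of this construction (the quadrature before the H-matrix approximation of the 1D exponentials in (116)) can be tested directly. The sketch below forms the Kronecker-rank-(2M+1) approximant for the FD Laplacian of Ex. 8.1 with constant coefficients and a small n, so that the exact inverse fits in memory; computing the 1D exponentials by eigendecomposition instead of the resolvent sum is an assumption of this illustration. The relative error decays exponentially in M, compare Numerics II below.

import numpy as np

def kron_list(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

n, d, M = 8, 3, 32
h = 1.0 / (n + 1)
V = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2    # 1D factor of Ex. 8.1
lam, Q = np.linalg.eigh(V)
I_n = np.eye(n)
A = sum(kron_list([V if j == i else I_n for j in range(d)]) for i in range(d))

hq = np.log(4 * np.pi * M) / M                    # quadrature step, cf. Lem. 9.2
AinvM = np.zeros_like(A)
for k in range(-M, M + 1):
    zk = np.sinh(k * hq)
    alpha = np.logaddexp(0.0, zk)                 # log(1 + e^{z_k})
    ck = hq * np.cosh(k * hq) / (1.0 + np.exp(-zk))
    expV = (Q * np.exp(-alpha * lam)) @ Q.T       # exp(-alpha V) for the small 1D factor
    AinvM += ck * kron_list([expV] * d)           # Kronecker rank 2M+1 in total

Ainv = np.linalg.inv(A)
print(np.linalg.norm(AinvM - Ainv) / np.linalg.norm(Ainv))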

Numerics I: 1/(x₁+...+x_d) - function generated tensor B. Khoromskij, Leipzig 2005(L9) 233

Robust exponentially convergent sinc-quadrature, ρ = x₁ + ... + x_d ∈ [1, R], x_i > 0:

1/ρ = ∫_R cosh(w) F(sinh(w)) dw ≈ h Σ_{k=−M}^{M} cosh(kh) F(sinh(kh)),

F(u) = e^{−ρ log(1+e^u)} / (1 + e^{−u}),   M = O(log ε^{−1} log R),   h = log M / M,   r = 2M + 1.

Figure 17: The absolute quadrature error for R = 10³ with M = 16 (left), M = 32 (middle), M = 64 (right). Similar results are observed for R = 32 · 10³.


Numerics II: Elliptic inverse A−1 B. Khoromskij, Leipzig 2005(L9) 234

HKT-approximation to (−Δ_h)^{−1} in R^d. Apply the Sinc-quadrature of Lem. 9.3.

Kronecker approximation to (−Δ_h)^{−1} in [0, 1]^d with N = n^d, n = 128:

M      |  4        9        16       25       36        49        64
d = 3  | 2.4·10⁻²  3.8·10⁻²  5.6·10⁻²  9.9·10⁻⁵  2.6·10⁻⁶  8.2·10⁻¹⁰  7.0·10⁻¹²
d = 6  | 1.9·10⁻²  1.5·10⁻³  3.7·10⁻⁴  7.7·10⁻⁷  4.5·10⁻⁹  8.2·10⁻¹²  1.1·10⁻¹⁴
d = 9  | 3.0·10⁻³  3.0·10⁻³  1.0·10⁻⁵  1.6·10⁻⁷  1.0·10⁻⁹  1.4·10⁻¹²  1.7·10⁻¹⁵
d = 12 | 3.0·10⁻⁷  3.9·10⁻⁵  1.0·10⁻⁸  7.8·10⁻⁹  1.8·10⁻¹⁰ 5.0·10⁻¹³  5.6·10⁻¹⁶

Approximation to (−Δ_h)^{−1} in [0, 1]^d with d = 3, M = 25:

n  |  4        8        16       32       64       128
ε  | 2.5·10⁻⁸  7.7·10⁻⁸  4.2·10⁻⁸  5.7·10⁻⁷  8.5·10⁻⁶  3.5·10⁻⁶

Observations.
1. The method applies on non-uniform grids and for variable coefficients (a generalisation of FFT).
2. We ensure the complexity O(dn log^q n) with fixed q ≥ 1.
3. The implementation of the matrix-vector multiplication depends on the sparsity of the argument.

HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 235

Each term in the Kronecker-product representation

A_(r) = Σ_{k=1}^{r} c_k V¹_k ⊗ ... ⊗ V^d_k   (117)

can be amplified by an extra factor S_k ∈ R^{N×N}. Hence, we introduce the generalised tensor-product format (GHKT)

A_(r) = Σ_{k=1}^{r} S_k · ( V¹_k ⊗ ... ⊗ V^d_k ) ≈ A   (118)

with a matrix S_k ∈ HKT(r_S) of O(d r_S n log^q n)-complexity, where asymptotically r_S ≪ n. We denote A_(r) ∈ GHKT(r, r_S).

The format (118) will be applied to the MVF F(A) = sign(A). In the following, we suppose that A = T D T^{−1}, d_i ∈ R.


HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 236

Lem. 9.4. Let A ∈ R^{N×N} be such that 0 ∉ ℜe σ(A), and let the function f : R → R satisfy the following assumptions:

(A1) f(t) = −f(−t), t ∈ R,
(A2) c_f := ∫₀^∞ ( f(t)/t ) dt ∈ (0, ∞) exists as an improper integral.

Then we have

sign(A) = (1/c_f) ∫_{R₊} ( f(tA)/t ) dt ≡ I(A).   (119)

Proof. First we note that for a ∈ R \ {0}, the assumptions (A1)-(A2) imply (119) with A substituted by a,

sign(a) = (1/c_f) ∫_{R₊} ( f(ta)/t ) dt.   (120)

Since A = T D T^{−1}, we obtain

f(tA) = T f(tD) T^{−1}.   (121)

HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 237

Moreover, sign(A) = T sign(D) T^{-1} holds and (120) implies the desired relation:

(1/c_f) ∫_{R_+} f(tA)/t dt = T ( (1/c_f) ∫_{R_+} f(tD)/t dt ) T^{-1} = T sign(D) T^{-1} = sign(A).

Choice of f(t). We consider the following examples of f:

f_n(t) := j_n(t)/t^{n-1},   n = 1, 2, . . . ,

where j_n(t) are the spherical Bessel functions of the first kind. In particular, we have j_0(t) = sin(t)/t and

j_1(t) = (sin(t) − t cos(t))/t^2,   j_2(t) = (3/t^3 − 1/t) sin(t) − (3/t^2) cos(t).

The functions j_n(z) have the asymptotical property

z^{-n} j_n(z) → 1/(1·3·5···(2n+1))   as z → 0   (n = 0, 1, 2, . . .).


HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 238

We also make use of the integral representation

j_n(z) = [z^n/(2^{n+1} n!)] ∫_0^π cos(z cos θ) sin^{2n+1} θ dθ   (n = 0, 1, 2, . . .).    (122)

Since the matrix A is diagonalisable, the error analysis of the quadrature rule is reduced to the scalar case (cf. Lem. 9.1).

An exponentially convergent quadrature for (120) with f = f_1 and a ∈ R. In general, one can expect a ∈ [1, Λ] with 1 ≪ Λ, so we deal with the integration of the highly oscillatory function

f_1(at)/t = (sin(at) − at cos(at))/(a^2 t^3)

with a smooth weight. Hence, we have

f_1(at)/t = j_1(at)/t ≤ C/(a t^2),   t → ∞.    (123)

HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 239

The latter implies

| ∫_R^∞ f_1(at)/t dt | ≤ C/(aR),   R > 0.

Given a tolerance ε > 0, we choose R > 0 such that R^{-1} = aε, i.e., R = (aε)^{-1} ≤ ε^{-1}, and then construct a quadrature on the finite interval [0, R] (recall that a^{-1} ∈ [Λ^{-1}, 1]). We can assume without loss of generality that ε = 2^{-K_1}, Λ = 2^{K_0} with some K_0, K_1 ∈ N, so that a^{-1} ∈ [2^{-K_0}, 1].

We split [0, R] into the two parts [0, 2^{-K_0}] and ω := [2^{-K_0}, R], where we set R = 2^{K_1}.

We now decompose the integration interval ω = ⋃_{k=-K_0}^{K_1} [b_k, b_{k+1}] by the points b_k = 2^k, k = −K_0, . . . , 0, . . . , K_1.

Note that the coefficients q_1 = z^{-3} and q_2 = z^{-2} can be approximated on each interval δ_k = [b_k, b_{k+1}] by a polynomial


HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 240

P_{p,k} of degree ≤ p such that, say,

max_{t∈δ_k} |q_1(t) − P_{p,k}(t)| ≤ C e^{-cp}   (k = −K_0, ..., K_1).    (124)

Next we use the integrals

∫_0^x t^m sin(at) dt = −∑_{k=0}^{m} k!·C(m,k) [x^{m−k}/a^{k+1}] cos(ax + kπ/2),

∫_0^x t^m cos(at) dt = ∑_{k=0}^{m} k!·C(m,k) [x^{m−k}/a^{k+1}] sin(ax + kπ/2)

to obtain the following approximation on the interval ω:

(1/c_f) ∫_ω f_1(at)/t dt ≈ ∑_{k=-K_0}^{K_1} ∑_{ℓ=0}^{p} [γ_{kℓ}(1/a) sin(a s_k) + µ_{kℓ}(1/a) cos(a c_k)],

providing an exponential convergence of the order O(e^{-cp}).

HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 241

Due to (122), the integrand f_1(az)/z is an entire function and, in particular, holomorphic in the Bernstein ellipse E_ρ with ρ > 1/(2a), corresponding to the interval [0, a^{-1}] (cf. Lect. 4). Furthermore, max_{z∈E_ρ} |f_1(az)/z| can be estimated by a constant not depending on a. Therefore, the Gauss quadrature on [0, Λ^{-1}] has exponential convergence. This yields the approximation

sign(λ) ∼ sign_M(λ) := ∑_{k=1}^{M} a_k(1/λ) sin(s_k λ) + b_k(1/λ) cos(c_k λ)    (125)

(with a_k, b_k polynomials of degree ≤ p), such that for λ ∈ [1, Λ]

| sign(λ) − sign_M(λ) | ≤ C (K_0 + K_1) e^{-cp}

with

K_1 = |log ε|,   K_0 = log(cond(D)),   M := (K_0 + K_1) p.    (126)


HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 242

Rem. 9.2. The matrices A^{-l} (l = 1, ..., p) can be represented by (117) via the fixed set of tensor-skeletons Φ_k := V_k^1 ⊗ ... ⊗ V_k^d, k = 1, ..., k_{A^{-1}} (uniform tensor-basis). We make use of Φ_k in (118).

Lem. 9.5. Let A be symmetric with min_{λ∈σ_+(A)} λ = O(1). Then, given ε > 0, the quadrature points and weights from (125) and (126) fulfil

‖ sign(A) − ∑_{k=1}^{M} [a_k(A^{-1}) sin(s_k A) + b_k(A^{-1}) cos(c_k A)] ‖_2 ≤ C c(T) (K_0 + K_1) e^{-cp},    (127)

where a_k(A^{-1}), b_k(A^{-1}) are polynomials of degree p as defined in (124), M, K_0, K_1 are explained in (126) and c(T) = ‖T‖ ‖T^{-1}‖.

Proof. Since A = T D T^{-1}, we use the representation (121),

HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 243

where D has real entries. The estimate (123) implies that we can restrict the integration onto the interval [0, R] and derive

‖ (1/c_f) ∫_0^R f_1(tA)/t dt − ∑_{k=1}^{M} [a_k(1/A) sin(s_k A) + b_k(1/A) cos(c_k A)] ‖_2

= ‖ T ( (1/c_f) ∫_0^R f_1(tD)/t dt − ∑_{k=1}^{M} [a_k sin(s_k D) + b_k cos(c_k D)] ) T^{-1} ‖_2

≤ c(T) max_{λ∈σ_+(A)} | (1/c_f) ∫_0^R f_1(tλ)/t dt − ∑_{k=1}^{M} [a_k sin(s_k λ) + b_k cos(c_k λ)] |

≤ C c(T) [K_0 + K_1] e^{-cp}.

Choosing M = p(K_0 + K_1) (cf. (126)) completes the proof.

Finally, we derive tensor-product representations of the

matrices sin(skA) and cos(ckA) involved in (127). For this

purpose, we apply Prop. 9.1 (cf. Lect.5).


HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 244

Prop. 9.1. Let d ≥ 2. The trigonometric identity

sin( ∑_{j=1}^{d} x_j ) = ∑_{j=1}^{d} sin(x_j) ∏_{k∈{1,...,d}\{j}} sin(x_k + α_k − α_j)/sin(α_k − α_j)    (128)

holds for all real α_1, . . . , α_d s.t. sin(α_k − α_j) ≠ 0 for all j ≠ k.

The following statement extends the trigonometric identity

(128) to the case of matrix-valued functions sin(A) and cos(A).

Lem. 9.6. Let A = ∑_{j=1}^{d} A_j ∈ R^{N×N} with matrices A_j of the form as in Lect. 8, where V^j ∈ R^{n×n} (j = 1, . . . , d) and N = n^d. Suppose that {α_1, . . . , α_d} ⊂ R are chosen in such a way that the representation (128) is valid. Then the following

HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 245

tensor-product representation with exactly d terms holds:

sin(A) = ∑_{j=1}^{d} ⊗_{k=1}^{d} β_{kj} sin(V^k + δ_{kj} I),   β_{kj} = { 1/sin(δ_{kj}), k ≠ j;  1, k = j },    (129)

with δ_{kj} = α_k − α_j. A similar result holds for cos(A).

To guarantee the stability of the representation (129) we have to control the condition |α_k − α_j − mπ| > δ > 0 for m ∈ Z, k ≠ j.
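A small numerical check of (129) (and hence of (128)), assuming the Kronecker-sum structure A_j = I ⊗ ... ⊗ V^j ⊗ ... ⊗ I from Lect. 8; the symmetric factors V^j and the shifts α_k below are arbitrary test data subject to sin(α_k − α_j) ≠ 0:

import numpy as np
from scipy.linalg import sinm

def kron_list(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

rng = np.random.default_rng(0)
d, n = 3, 4
V = [0.5 * (W + W.T) for W in (rng.standard_normal((n, n)) for _ in range(d))]
I = np.eye(n)
A = sum(kron_list([V[j] if j == m else I for j in range(d)]) for m in range(d))

alpha = np.array([0.0, 1.0, 2.0])
sinA = np.zeros_like(A)
for j in range(d):
    factors = []
    for k in range(d):
        if k == j:
            factors.append(sinm(V[k]))                                 # beta_jj = 1
        else:
            delta = alpha[k] - alpha[j]
            factors.append(sinm(V[k] + delta * I) / np.sin(delta))     # beta_kj = 1/sin(delta_kj)
    sinA += kron_list(factors)

print(np.linalg.norm(sinA - sinm(A)))   # agrees with the exact matrix sine up to round-off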

Lem. 9.5 and Lem. 9.6 lead to the GHKT-representation of the matrix sign(A) with A^{-1} ∈ HKT(r_{A^{-1}}), sin(s_k A) ∈ HKT(d). Setting r_S = dM, r = r_{A^{-1}}, we get the complexity O(d^2 M r_{A^{-1}} n log^q n) provided that each V^j (j = 1, . . . , d) can be diagonalised at the cost O(n log^q n); otherwise the cost is O(n^2 log^q n).


HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 246

If some of the assumptions above are not satisfied, one can apply the integral representation of the matrix sign-function of A ∈ R^{N×N},

sign(A) := (1/(πi)) ∫_{Γ_+} (zI − A)^{-1} dz − I.    (130)

The exponentially convergent quadrature

sign(A) ≈ ∑_{k=1}^{r} c_k (z_k I − A)^{-1} − I,   r = O(log^2 ε + log^2 cond(A)),

for the integral (130) provides a direct approximation of F(A) = sign(A) by a sum of matrix resolvents. The quadrature points and weights can be chosen symmetrically w.r.t. the real axis. Using the standard results for the elliptic inverse, we are led to the overall cost O(r d^2 n^2 log^q n), which is quadratic in both d and n.

Literature to Lect. 9 B. Khoromskij, Leipzig 2005(L9) 247

1. W. Hackbusch and B.N. Khoromskij: Low-Rank Kronecker-Product Approximation to Multi-Dimensional Nonlocal Operators. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

URL: http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor9.ps


Lect. 10. HKT repr. to the Hartree-Fock and Boltzmann eq. B. Khoromskij, Leipzig 2005 248

Outlook

1. Density Functional Theory (DFT) via the Hartree-Fock eq.

(A) Reduction to the density matrix eq. via sign-matrices

(B) Representation of the Fock matr. in tensor-product form.

(C) Truncated nonlinear iteration to compute sign(F − µI).

The proper formats: diagonally dominant, tensor-product data-sparse.

2. Boltzmann eq.

(A) Boltzmann collision integral in the HKT-representation

(B) Hadamard tensor-product operations

3. Ornstein-Zernike (OZ) integral eq. (brief survey).

4. Other directions.

Schrödinger and Hartree-Fock eq.    B. Khoromskij, Leipzig 2005(L10) 249

The multi-dimensional Schrödinger eq. leads to a challenging numerical problem.

The Schrödinger eq. for a many-particle system reads as

HΨ = ΛΨ

with the Hamiltonian H = H[r_1, ..., r_Ne],

H := −(1/2) ∑_{i=1}^{Ne} ∆_i − ∑_{a=1}^{K} ∑_{i=1}^{Ne} Z_a/|r_i − R_a| + ∑_{i<j≤Ne} 1/|r_i − r_j| + ∑_{a<b≤K} Z_a Z_b/|R_a − R_b|,

where Z_a, R_a are the charges and positions of the nuclei, r_i ∈ R^3. Hence the problem is posed in R^d with high dimension d = 3Ne.

Desired size of the system: Ne = 10^q, q = 1, 2, 3, 4, ...?

Focusing on density matrix computation: structured tensor representation of the density matrix D in DFT to approximate the ground state of the Schrödinger eq.


Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 250

In DFT the many-particle problem is mapped onto a system of noninteracting particles, resulting in a significant simplification of the computational process. The so-called density matrices play the key role in achieving linear (sub-linear) scaling in Hartree-Fock/DFT methods.

The Hartree-Fock equation (in R^3 !) reads as

F φ_i = ε_i φ_i,   i = 1, ..., Ne/2,

with the Hartree-Fock operator

F φ(x) := −(1/2) ∆φ(x) + V_c(x) φ(x) + 2 ∫ d^3y [ρ(y, y)/|x − y|] φ(x) − ∫ d^3y [ρ(x, y)/|x − y|] φ(y),

x, y ∈ R^3. Here the density function ρ(x, y) is defined by

ρ(x, y) := ∑_{k≤p} φ_k(x) φ_k(y),   p = Ne/2.

Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 251

Nonlocal operators related to the Hartree-Fock eq.

1. Integral operators with the Newton potential

(Nu)(y) = ∫_Ω [1/|x − y|] u(x) dx,   y ∈ Ω ⊂ R^3.

2. IOs with product kernels in R^3: J - Hartree potential, K - exchange potential.

3. sign(·) - to represent the spectral projection D (density matrix) formed from the "occupied orbitals",

D = (1/2)[I − sign(F[D] − µI)],   D ∈ R^{M×M},   M = O(3Ne).

4. 1/(x+y+z) - generated energy tensor in R^{n^2×n^2×n^2}, x, y, z ∈ R^2. Tensor decomposition of the "orbital energy denominators".


Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 252

Suppose that ϕ_i(x), i = 1, ..., M, is the set of tensor-product orthogonal basis funct. defined in a bounded hypercube in R^3. Let D = {d_kl} ∈ R^{M×M} be the corresponding matrix representation to the DF, such that

ρ(x, y) ≈ ∑_{k,l=1}^{M} d_kl ϕ_k(x) ϕ_l(y) =: ρ(x, y).

We define a "Galerkin type" approximation to the Fock oper.

F = K_0 + 2J − K,

where K_0 is the Galerkin representation of the "local" component of the Fock operator and J = {J_ij}, K = {K_ij} with

K_ij = ∫∫ [ρ(x, y)/|x − y|] ϕ_i(x) ϕ_j(y) dx dy,   J_ij = ∫∫ [ρ(y, y)/|x − y|] ϕ_i(x) ϕ_j(x) dx dy,

being the discrete exchange and Hartree potentials, respectively.

Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 253

Given D = {d_kℓ}, the matrices K = K[D], J = J[D] can be calculated as the tensor-matrix products

K = T · D,   J = T^T(i,ℓ) · D,    (131)

where the tensor T = {T^{kℓ}_{ij}} is given by

T^{kℓ}_{ij} = ∫∫ [ϕ_k(x) ϕ_i(x) ϕ_ℓ(y) ϕ_j(y)/|x − y|] dx dy,

so that K_ij = ∑_{k,ℓ} d_kℓ T^{kℓ}_{ij} and J_ij = ∑_{k,ℓ} d_kℓ T^{ki}_{ℓj}.

Now we obtain F[D] = K_0 − K[D] + 2J[D].

Rem. 10.1. Let K_N be the Nyström discr. of N. Then (131) simplifies by making use of the Hadamard prod.,

K = K_N ⊙ D,   J = diag^∧[K_N · diag^∨(D)],

where diag^∧ and diag^∨ are the operators converting a vector into a diagonal matrix and vice versa.
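In matrix terms, the two operations of Rem. 10.1 are just an entrywise product and a matrix-vector product with the diagonal; a tiny numpy illustration (K_N and D are random placeholders):

import numpy as np

rng = np.random.default_rng(1)
m = 6
KN = rng.standard_normal((m, m))           # Nystrom discretisation of the Newton potential (placeholder)
D = rng.standard_normal((m, m))            # density matrix (placeholder)

K = KN * D                                 # K = KN (Hadamard) D            -- exchange part
J = np.diag(KN @ np.diag(D))               # J = diag^[ KN . diag_v(D) ]    -- Hartree part (diagonal matrix)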


Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 254

Given F, the spectral projection D[k, l] formed from the occupied orbitals can be computed via the solution of the eigenvalue problem

F Ψ_j = λ_j Ψ_j,   j = 1, ..., p;   λ_1 ≤ ... ≤ λ_p ≤ ...,

by

D[k, l] = ∑_{j≤p} Ψ_j[k]^T Ψ_j[l].

The complexity scales cubically in M.

Rem. 10.2. The idempotency (proj.) prop. holds: D^2 = D.

To avoid the solution of an eigenvalue problem, it is possible to represent D directly using the matrix sign function.

Lem. 10.1. Let us choose µ ∈ (λ_p, λ_{p+1}); then

D = (1/2)[I − sign(F − µI)].

Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 255

Proof. Since the Ψ_j are orthogonal, F is unitarily diagonalisable,

sign(F − µI) = ∑_{j=1}^{M} Ψ_j^T sign(λ_j − µ) Ψ_j,

hence

D = ∑_{λ_j<µ} Ψ_j^T Ψ_j = (1/2)[I − ∑_{j=1}^{M} Ψ_j^T sign(λ_j − µ) Ψ_j].

We can implement the corresponding matrix operations in the tensor-product arithmetics.
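A quick numerical check of Lem. 10.1 on a random symmetric matrix (sign(F − µI) is evaluated here by diagonalisation, only to verify the identity; the sizes are illustrative):

import numpy as np

rng = np.random.default_rng(2)
Mdim, p = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((Mdim, Mdim)))
lam = np.sort(rng.standard_normal(Mdim))
F = Q @ np.diag(lam) @ Q.T                          # symmetric "Fock" matrix with eigenpairs (lam_j, Q[:, j])
mu = 0.5 * (lam[p - 1] + lam[p])                    # mu in the gap (lam_p, lam_{p+1})

signF = Q @ np.diag(np.sign(lam - mu)) @ Q.T        # sign(F - mu I)
D = 0.5 * (np.eye(Mdim) - signF)                    # Lem. 10.1
P = Q[:, :p] @ Q[:, :p].T                           # projector onto the p lowest eigenvectors

print(np.linalg.norm(D - P), np.linalg.norm(D @ D - D))   # both vanish up to round-off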

Assume that the density matrix D is already represented in the Kronecker product form (with M = n^3)

D = ∑_{s=1}^{r_D} D_s^1 ⊗ D_s^2 ⊗ D_s^3,   D_s^m ∈ R^{n×n},

where D_s^m is associated with the pair of variables (x_m, y_m), m = 1, 2, 3.


Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 256

Let ϕ_i(x) = ϕ_{i1}(x_1) ϕ_{i2}(x_2) ϕ_{i3}(x_3), i = (i_1, i_2, i_3), i_m = 1, ..., n, and suppose that the Newton potential can be represented by

1/|x − y| ≈ ∑_{s=1}^{r_N} N_s^1(x_1, y_1) N_s^2(x_2, y_2) N_s^3(x_3, y_3).

Due to the "separability" results for the Newton potential and exploiting the tensor-product structure of the basis, we derive

T = ∑_{s=1}^{r_N} T_s^1 ⊗ T_s^2 ⊗ T_s^3,   T_s^m ∈ R^{n^2×n^2},

where (for m = 1, 2, 3)

[T_s^m]^{k_m ℓ_m}_{i_m j_m} = ∫ N_s^m(x_m, y_m) ϕ_{k_m}(x_m) ϕ_{i_m}(x_m) ϕ_{ℓ_m}(y_m) ϕ_{j_m}(y_m) dx_m dy_m.

Both T_s^m and D require O(n^4) and O(n^2) memory units, respectively, while the "MVM" T · D now costs O(n^4) (compare with O(n^12)).
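The O(n^4) count comes from contracting T and D mode by mode. A sketch with rank-1 Kronecker factors (r_N = r_D = 1 for brevity; the general case simply sums over the rank indices), where each T^m is stored with axes (k_m, ℓ_m, i_m, j_m):

import numpy as np

rng = np.random.default_rng(3)
n = 3
T = [rng.standard_normal((n, n, n, n)) for _ in range(3)]   # axes (k_m, l_m, i_m, j_m)
D = [rng.standard_normal((n, n)) for _ in range(3)]         # axes (k_m, l_m)

# mode-wise contraction, O(n^4) per mode: G^m[i, j] = sum_{k,l} T^m[k, l, i, j] D^m[k, l]
G = [np.einsum('klij,kl->ij', Tm, Dm) for Tm, Dm in zip(T, D)]
K_str = np.kron(np.kron(G[0], G[1]), G[2])

# brute-force contraction over the full multi-indices, O(n^12) work
K_def = np.einsum('aexy,bfuv,cgwz,ae,bf,cg->xuwyvz',
                  T[0], T[1], T[2], D[0], D[1], D[2]).reshape(n**3, n**3)
print(np.linalg.norm(K_str - K_def))                        # zero up to round-off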

Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 257

Hierarchical/Wavelet formats for low-dim. components

There are two principal cases:

(A) The FEM-Galerkin approximation.

(B) The wavelet basis {ϕ_i}.

Note that the kernel functions N_s^m(x_m, y_m) are proved to be asymptotically smooth. Hence, in case (A), the H-matrix representation of the matrices T_s^m does the job. In turn, in case (B), the wavelet representation of the kernels N_s^m(x_m, y_m) can be applied.

Thus, the storage and MVM-complexity related to T_s^m is reduced from O(n^4) to the linear cost O(n^3).


Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 258

For the Nyström representation (cf. Rem. 10.1), we enjoy the sublinear cost O(r_D^2 n^2) for the basic matrix-tensor operations due to (assume that r_D = r_K)

K_N ⊙ D = ∑_{s,t=1}^{r_D} (K_t^1 ⊙ D_s^1) ⊗ (K_t^2 ⊙ D_s^2) ⊗ (K_t^3 ⊙ D_s^3),

where each Hadamard product is implemented in O(n^2) oper.
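The identity behind this count is that the Hadamard product of Kronecker products factorises, (A^1 ⊗ A^2 ⊗ A^3) ⊙ (B^1 ⊗ B^2 ⊗ B^3) = (A^1 ⊙ B^1) ⊗ (A^2 ⊙ B^2) ⊗ (A^3 ⊙ B^3). A small numpy check with illustrative ranks and sizes:

import numpy as np

rng = np.random.default_rng(4)
n, rD = 3, 2
K = [[rng.standard_normal((n, n)) for _ in range(3)] for _ in range(rD)]   # factors K^m_t
D = [[rng.standard_normal((n, n)) for _ in range(3)] for _ in range(rD)]   # factors D^m_s

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

KN = sum(kron3(*K[t]) for t in range(rD))
Dfull = sum(kron3(*D[s]) for s in range(rD))

lhs = KN * Dfull                                       # Hadamard product of the assembled matrices
rhs = sum(kron3(K[t][0] * D[s][0], K[t][1] * D[s][1], K[t][2] * D[s][2])
          for t in range(rD) for s in range(rD))       # r_D^2 terms built from n x n blocks only
print(np.linalg.norm(lhs - rhs))                       # zero up to round-off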

Concerning the matrix J, we arrive at the optimal complexity O(r_D^2 n^2) again, due to

diag^∨(D) = ∑_{s=1}^{r_D} diag^∨(D_s^1) ⊗ diag^∨(D_s^2) ⊗ diag^∨(D_s^3).

Now apply the H-matrix format (rank r_H) to represent D_s^m and K_t^m. The Hadamard product of two H-matrices, K_t^m ⊙ D_s^m, requires only O(r_H^2 n log n) op., hence we arrive at the O(n log^q n) complexity HKT-arithmetics.

WKT is also applicable.

The deterministic Boltzmann eq. in R3 B. Khoromskij, Leipzig 2005(L10) 259

The particle density f(t, x, v), x ∈ Ω ⊂ R^3, of a dilute gas satisfies the Boltzmann eq.

f_t + (v, grad_x f) = Q(f, f),

which describes the time evolution of f : R_+ × Ω × R^3 → R_+.

With fixed t, x, the Boltzmann collision integral can be split as

Q(f, f) = Q_+(f, f)(v) + Q_-(f, f)(v),

where the loss part Q_- has a simple form,

Q_-(f, f)(v) = f(v) ∫_{R^3} B_tot(‖u‖) f(w) dw,

with u = v − w being the relative velocity.

The integral Q_- can be approximated by a block-Toeplitz matrix at linear-logarithmic cost in N = n^3.


The deterministic Boltzmann eq. in R3 B. Khoromskij, Leipzig 2005(L10) 260

The gain part can be represented by a double integral

Q_+(f, f)(v) = ∫_{R^3} ∫_{S^2} B(‖u‖, µ) f(v′) f(w′) de dw,    (132)

v′ = (1/2)(v + w + ‖u‖e) ∈ R^3, w′ = (1/2)(v + w − ‖u‖e) ∈ R^3; e ∈ S^2 ⊂ R^3 is the unit vector.

In the case of an inverse power cut-off potential, we have

B(‖u‖, µ) = ‖u‖^{1−4/ν} g_ν(µ),   ν > 1,   µ = cos(θ) = ⟨u, e⟩/‖u‖,

with g_ν being a given function of the scattering angle only, s.t. g_ν ∈ L_1([−1, 1]).

⟨·, ·⟩ denotes the L_2 scalar product in R^p, ‖·‖ ≡ ‖·‖_2 := √⟨·, ·⟩ (with p = 3).

The deterministic Boltzmann eq. in R3 B. Khoromskij, Leipzig 2005(L10) 261

Key point: the efficient calculation of the gain part.

Let F be the p-dimensional Fourier transform; then

Q_+(f, f)(v) = F_{y→v} [ ∫_{R^3} g(u, y) F^{-1}_{z→y}[f(z − u) f(z + u)](u, y) du ](v)

with

g(u, y) = g(‖u‖, ‖y‖, |⟨u, y⟩|),

which depends only on the three scalar variables ‖u‖, ‖y‖, ⟨u, y⟩. Indeed, up to a scaling factor,

g(u, y) = ∫_0^π g_ν(cos θ) e^{−i⟨u,y⟩ cos θ} J_0(√(‖u‖^2 ‖y‖^2 − ⟨u, y⟩^2) sin θ) sin θ dθ,

where J_0(z) is the Bessel function J_0(z) = (1/2π) ∫_0^{2π} e^{iz cos ψ} dψ.


Choice of the Kernel Function B. Khoromskij, Leipzig 2005(L10) 262

Ex. 1. The variable hard spheres (p = 3):

g_{1,λ}(u, y) := ‖u‖^λ sinc(‖u‖‖y‖/π),   u, y ∈ R^p,   λ ∈ (−3, 1],    (133)

where the sinc-function (Cardinal function) is defined by

sinc(z) = sin(πz)/(πz),   z ∈ C.

This model corresponds to the case of second order tensors (q = 2) with V_k ∈ R^{n×n×n} (cf. Lect. 5, 6).

Ex. 2. The general kernel function

g_{2,λ}(u, y) := ‖u − y‖^λ / √(‖u‖^2 + ‖y‖^2 + 2|⟨u, y⟩|),   u, y ∈ R^p.    (134)

The presence of |⟨u, y⟩| in the arguments of g_{2,λ}(u, y) makes the approximation process much more involved.

Choice of the Kernel Function B. Khoromskij, Leipzig 2005(L10) 263

Main result: Reduce the complexity from O(n^6 log n) to O(n^4 log n) in the case (133), and to O(n^5 log n) in the case (134).


Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L10) 264

Given tensors U ⊗ Y ∈ R^{I×J} with U ∈ R^I, Y ∈ R^J, and B ∈ R^{I×L}. Let T : R^L → R^J be the linear operator (tensor) that maps tensors defined on the index set L into those defined on J.

Def. 10.1. (cf. Def. 5.3) The Hadamard "scalar" product [D, C]_I ∈ R^K of two tensors D := [D_{i,k}] ∈ R^{I×K} and C := [C_{i,k}] ∈ R^{I×K} with K ∈ {I, J, L} is defined by

[D, C]_I := ∑_{i∈I} [D_{i,K}] ⊙ [C_{i,K}],

where ⊙ denotes the Hadamard product on the index set K and [D_{i,K}] := [D_{i,k}]_{k∈K}.

Lem. 10.2. (cf. Lem. 5.2) Let U, Y, B and T be given as above. Then, with K = J, the following identity is valid:

[U ⊗ Y, T · B]_I = Y ⊙ (T · [U, B]_I) ∈ R^J.    (135)

Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L10) 265

Proof. By definition of the Hadamard scalar product we have

[U ⊗ Y, T · B]_I = ∑_{i∈I} [U ⊗ Y]_{i,J} ⊙ [T · B]_{i,J}

= ∑_{i∈I} [U]_i Y ⊙ [T · B]_{i,J}

= Y ⊙ ( ∑_{i∈I} [U]_i [T · B]_{i,J} )

= Y ⊙ ( T · ∑_{i∈I} [U]_i [B]_{i,L} ),

and the assertion follows.

Identity (135) is of great importance in the current applications since on the right-hand side the operator T is removed from the scalar product and thus is applied only once.
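For matrices (index sets I, J, L realised as plain ranges, T as a |J| × |L| matrix) identity (135) can be checked directly; all sizes below are illustrative:

import numpy as np

rng = np.random.default_rng(5)
nI, nJ, nL = 5, 4, 6
U = rng.standard_normal(nI)            # U in R^I
Y = rng.standard_normal(nJ)            # Y in R^J
B = rng.standard_normal((nI, nL))      # B in R^{I x L}
T = rng.standard_normal((nJ, nL))      # T : R^L -> R^J

TB = B @ T.T                           # (T . B)[i, :] = T B[i, :]
lhs = sum(U[i] * Y * TB[i, :] for i in range(nI))      # [U (x) Y, T.B]_I
rhs = Y * (T @ (U @ B))                                # Y (Hadamard) (T . [U, B]_I): T applied only once
print(np.linalg.norm(lhs - rhs))       # zero up to round-off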


Ornstein-Zernike eq. in R3 B. Khoromskij, Leipzig 2005(L10) 266

In numerical modelling of a mono-atomic isotropic liquid with spherically symmetric Lennard-Jones interaction potential U(r) = 4ε[(σ/r)^12 − (σ/r)^6] between the particles (σ and ε are the resp. size and energy parameters), the Ornstein-Zernike equation relates the total correlation function h(r) to the direct correlation function c(r) (with density ρ) by

h(r) = c(r) + ρ ∫_{R^3} c(|r − r′|) h(r′) dr′.    (136)

The "closure" relation is

h(r) = exp[−βU(r) + h(r) − c(r) + B(r)] − 1.    (137)

Key point: FFT vs. structured matrices in the wavelet basis.

Ornstein-Zernike eq. in R3 B. Khoromskij, Leipzig 2005(L10) 267

Figure 18: Radial parts of the correlation functions h(r) (top) and c(r) of a simple mono-atomic liquid with Lennard–Jones potential, param. ρ = 0.7, ε = 0.7.


Numerics I: 1/(x_1+...+x_d) - function generated tensor    B. Khoromskij, Leipzig 2005(L10) 268

Robust exponentially convergent sinc-quadrature, ρ = x_1 + ... + x_d ∈ [1, R], x_i > 0,

1/ρ = ∫_ℝ cosh(w) F(sinh(w)) dw ≈ h ∑_{k=-M}^{M} cosh(kh) F(sinh(kh)),

F(u) = e^{-ρ log(1+e^u)}/(1 + e^{-u}),   M = O(log ε^{-1} log R),   r = 2M + 1,   h = C_int log M/M.

Figure 19: The absolute quadrature error for R = 10^3 with M = 16 (left), M = 32 (middle), M = 64 (right). Similar results are observed for R = 32·10^3.

Numerics I: 1/(x_1+...+x_d) - function generated tensor    B. Khoromskij, Leipzig 2005(L10) 269

Figure 20: Quadrature error vs. the number of quadrature points M for ρ = 1.0 with different C_int (panels: F = exp(t − ρ exp(t)), ρ = 1.0, C_int = 1.0, 1.1, 1.2).

Application in QC. Arithmetics with the function-generated energy matrix

E_jk = 1/(e_{j1} + e_{j2} + e_{j3} + e_{k1} + e_{k2} + e_{k3})   (e_{jℓ}, e_{kℓ} > 0),   E_jk ∈ R^{J×K},

with j = (j_1, j_2, j_3) ∈ J, k = (k_1, k_2, k_3) ∈ K, j_ℓ = 1, ..., N_J, k_ℓ = 1, ..., N_K, for ℓ = 1, 2, 3. Construct a low Kronecker rank separable approximation to 1/(x_1+...+x_d), ∑_{i=1}^{d} x_i ∈ [1, R], via the sinc-quadrature/appr. by exp. sums.

For experimental data in quantum chemistry: N_J, N_K ∈ [10^2, 10^3], R ∈ [10^3, 10^4].
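A sketch of such a separable (exponential-sum) approximation of the energy denominator, obtained directly from the sinc-quadrature above; the orbital energies e_{jℓ}, e_{kℓ} and the sizes N_J, N_K are random placeholders chosen so that the denominator stays in [1, R]:

import numpy as np

M = 32
h = np.log(M) / M
k = np.arange(-M, M + 1)
z = np.sinh(k * h)
t = np.log1p(np.exp(z))                          # exponents t_q
w = h * np.cosh(k * h) / (1.0 + np.exp(-z))      # weights w_q, so 1/rho ~ sum_q w_q exp(-t_q rho)

rng = np.random.default_rng(6)
NJ = NK = 40
ej = rng.uniform(0.2, 150.0, size=(NJ, 3))       # placeholder orbital energies e_{j,l}
ek = rng.uniform(0.2, 150.0, size=(NK, 3))
sj, sk = ej.sum(axis=1), ek.sum(axis=1)

E_exact = 1.0 / (sj[:, None] + sk[None, :])
# rank-(2M+1) separable approximation: each term is an outer product of row and column factors
E_sep = sum(wq * np.outer(np.exp(-tq * sj), np.exp(-tq * sk)) for wq, tq in zip(w, t))
print(np.max(np.abs(E_exact - E_sep)))           # uniformly small on [1, R]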


Numerics II: Newton potential (symmetric quadrature) B. Khoromskij, Leipzig 2005(L10) 270

Approximating the Gauss integral 1/ρ = ∫_ℝ F(t; ρ) dt with ρ = |x − y|, x, y ∈ R^d,

h ∑_{k=-M}^{M} cosh(kh) F(sinh(kh); ρ) ≈ ∫_ℝ F(t; ρ) dt,   F(t; ρ) = (1/√π) e^{-ρ^2 t^2}.    (138)

Rank r = M + 1 (symmetric) quadrature (138), ρ = 1.0:

  M   4          9          16         25          36
  ε   1.1·10^-4  1.5·10^-6  2.3·10^-9  2.0·10^-12  < 1.0·10^-15

The Gaussian int. with ρ = 0.2, 1, 10; C_int = 1.0; applies for ρ ∈ [0.2, 10].

Plots: quadrature error vs. the number of quadrature points M for F = exp(−ρ^2 t^2) with ρ = 0.2 (left), ρ = 1 (middle), ρ = 10 (right), C_int = 1.0.

Numerics III: Newton potential (robust quadrature) B. Khoromskij, Leipzig 2005(L10) 271

Robust nonsymmetric quadrature with

1/ρ = ∫_ℝ F(u; ρ) du;   F(u; ρ) := (2/√π) e^{-ρ^2 log^2(1+e^u)}/(1 + e^{-u}),   ρ ∈ [1, R].

Figure 21: The absolute quadrature error for M = 64 with R = 200 (left), R = 10^3 (middle), R = 5·10^3 (right). Similar results are observed in the case R > 5·10^3.


Numerics IV: Boltzmann equation B. Khoromskij, Leipzig 2005(L10) 272

Figure 22: Function g_{2,λ}(u, y) for λ = 0 (left) and g_{1,λ}(u, y) for λ = 1 (right), where

g_{1,λ}(u, y) := ‖u‖^λ sinc(‖u‖‖y‖/π),   u, y ∈ R^p,   λ ∈ (−3, 1],

g_{2,λ}(u, y) := ‖u − y‖^λ / √(‖u‖^2 + ‖y‖^2 + 2|⟨u, y⟩|),   u, y ∈ R^p.

Numerics IV: Boltzmann equation B. Khoromskij, Leipzig 2005(L10) 273

Figure 23: L_∞-error of the sinc-interpolation to |x|^λ sinc(|x|y), x ∈ [−1, 1], λ = 1, for y = 16, 25, 36.

Best r-term approx. to 1/√ρ by ∑ a_i e^{-b_i ρ} (W. Hackbusch '05), L_∞- and weighted L_2([1, R])-norm:

  R       10         50         100        200        ‖·‖_L∞     W(ρ) = 1/√ρ
  r = 4   3.7·10^-4  9.6·10^-4  1.5·10^-3  2.2·10^-3  1.9·10^-3  4.8·10^-3
  r = 5   2.8·10^-4  2.8·10^-4  3.7·10^-4  5.8·10^-4  4.2·10^-4  1.2·10^-3
  r = 6   8.0·10^-5  9.8·10^-5  1.1·10^-4  1.6·10^-4  9.5·10^-5  3.3·10^-4
  r = 7   3.5·10^-5  3.8·10^-5  3.9·10^-5  4.7·10^-5  2.2·10^-5  8.1·10^-5


Approximating sign(A) B. Khoromskij, Leipzig 2005(L10) 274

General definition: Given A ∈ R^{M×M}, M = n^d,

sign(A) := (1/(πi)) ∫_{Γ_+} (zI − A)^{-1} dz − I ∈ R^{M×M}

with Γ_+ ⊂ C being any simply closed curve that contains σ_+(A) = {λ ∈ σ(A) : ℜe λ > 0}.

Iterative evaluation:

The Newton-Schulz iteration X_k → sign(A),

X_k = X_{k-1} + (1/2)[I − (X_{k-1})^2] X_{k-1},   k = 1, 2, ...,

with X_0 = A/‖A‖_2, has locally quadratic convergence.
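A minimal dense sketch of the Newton-Schulz iteration (no tensor truncation) on a symmetric test matrix with a spectral gap around zero; the matrix and the iteration count are illustrative:

import numpy as np

rng = np.random.default_rng(7)
n = 50
lam = np.concatenate([rng.uniform(-3.0, -0.5, 20), rng.uniform(0.5, 3.0, 30)])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T                      # symmetric, eigenvalues bounded away from 0

X = A / np.linalg.norm(A, 2)                    # X_0 = A / ||A||_2
for _ in range(30):
    X = X + 0.5 * (np.eye(n) - X @ X) @ X       # X_k = X_{k-1} + 1/2 [I - (X_{k-1})^2] X_{k-1}

sign_exact = Q @ np.diag(np.sign(lam)) @ Q.T
print(np.linalg.norm(X - sign_exact))           # converges to sign(A) up to round-off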

NSI - Convergence theory: Lem. 8.1 applies with α = 2. Thm. 8.1 applies with α = 2, but under a restrictive "nearly commutativity" condition.

Numerics V: T-NSI to compute sign(∆h − µI) B. Khoromskij, Leipzig 2005(L10) 275

Figure 24: Exact/trunc. NSIs on 16×16- and 32×32-grids (r = 7, r = 10).


Comments on Numerics V. B. Khoromskij, Leipzig 2005(L10) 276

Numerics demonstrate robust and asymptotically optimal convergence of T-NSI provided that the Kronecker rank is chosen properly; otherwise X_k → I (M. Espig, MPI MIS).

Bound on the Kronecker rank: r = O(d(|log ε| + log cond(A)) |log ε|).

Complexity of T-NSI: O(d r^4 n^2 + r^6) + O(r^2 r^4 d^3) + ...

T-NSI to compute sign(A − µI) with A = ∆_h:

  Grid      t(SVD)   t(NSI)   t(T-NSI)   r(sign(A − µI))
  4 × 4     0.02     0.0      0.02       4
  8 × 8     0.03     0.03     0.15       6
  16 × 16   0.74     0.85     0.64       7
  32 × 32   108.5    56.5     17.4       10
  64 × 64   6400.    4000.    210.       13

Here t(SVD), t(NSI), t(T-NSI) denote the CPU time (sec.) required for the SVD, the exact NSI and the truncated NSI, respectively.

Concluding Remarks B. Khoromskij, Leipzig 2005(L10) 277

1. HKT-approximation (for d ≥ 3) is a subtle concept mostly based on analytic tools with possible algebraic recompression. It offers the low-Kronecker-rank data-sparse representation of

(a) Integral operators in R^d, e.g., with the Newton, Yukawa and Helmholtz kernels

1/|x − y|;   e^{-µ|x−y|}/|x − y|, µ ∈ R_+;   e^{-iκ|x−y|}/|x − y|, κ^2 ∈ R,

(b) A^{-1}, A being the discrete elliptic op. in [a, b]^d, e.g., A = −∆ − κ^2,

(c) A certain class of matrix-valued functions F(A), e.g.,

sign(A),   exp(A),   ∫_{R_+} e^{-tA} G e^{-tB} dt.


Concluding Remarks B. Khoromskij, Leipzig 2005(L10) 278

2. We enjoy the sub-linear cost O(d^p n log^q N), p, q = 1, 2, with N = n^d.

3. Applications: FEM/BEM in elliptic and parabolic problems in R^d, many-particle modelling based on DFT for the Hartree-Fock eq., Boltzmann eq., Ornstein-Zernike eq., linear algebra, complexity theory, control theory.

4. By-product: O(N log^q N) - O(N^{1/d} log^q N) complexity (approximate) direct elliptic problem solver on non-uniform tensor grids in R^d and for variable ("separable") coefficients (generalisation of FFT). Sub-linear cost O(N^{1/d} log^q N) in the case of a tensor rhs.

5. Other directions: chemometrics, statistics, signal processing (in biology).

Literature to Lecture 10 B. Khoromskij, Leipzig 2005(L10) 279

1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic Boltzmann equation. Preprint 4, MPI MIS, Leipzig 2005.

3. M. Fedorov, H.-J. Flad, L. Grasedyck, and B.N. Khoromskij: Low-rank wavelet solver for the Ornstein-Zernike integral equation. Preprint 59, MPI MIS, Leipzig 2005.

4. W. Hackbusch, B.N. Khoromskij, E. Tyrtyshnikov: Approximate iteration for structured matrices. MPI MIS, Leipzig 2005.

5. H.-J. Flad, W. Hackbusch, B.N. Khoromskij and R. Schneider: Concept of data-sparse tensor-product approximation in many-particle modelling. Leipzig 2005, in progress.

http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor10.ps