low-rank methods for analysis of high-dimensional data (siam cse talk 2017)

Low-rank tensor methods for analysis of high-dimensional data

Alexander Litvinenko and Mike Espig

Center for Uncertainty Quantification



http://sri-uq.kaust.edu.sa/

Extreme Computing Research Center, KAUST


KAUST

I gained extensive collaboration experience as a co-organizer of:

- 3 UQ workshops,
- 2 Scalable Hierarchical Algorithms for eXtreme Computing (SHAXC) workshops,
- 1 HPC conference (www.hpcsaudi.org, 2017).


My previous work

After applying the stochastic Galerkin method, we obtain Ku = f, where all ingredients are represented in a tensor format.

- Compute max{u}, var(u), level sets of u, sign(u).
  [1] Efficient Analysis of High Dimensional Data in Tensor Formats, Espig, Hackbusch, A.L., Matthies and Zander, 2012.
- Investigate which ingredients influence the tensor rank of K.
  [2] Efficient low-rank approximation of the stochastic Galerkin matrix in tensor formats, Wahnert, Espig, Hackbusch, A.L., Matthies, 2013.
- Approximate κ(x, ω) and the stochastic Galerkin operator K in the Tensor Train (TT) format, solve for u, postprocess.
  [3] Polynomial Chaos Expansion of random coefficients and the solution of stochastic partial differential equations in the Tensor Train format, Dolgov, Litvinenko, Khoromskij, Matthies, 2016.





Typical quantities of interest

Keeping all input and intermediate data in a tensor representation, one wants to perform different tasks:

- evaluation for specific parameters $(\omega_1, \ldots, \omega_M)$,
- finding maxima and minima,
- finding 'level sets' (needed for histograms and probability density).

Example of a level set: all elements of a high-dimensional tensor from the interval [0.7, 0.8].





Canonical and Tucker tensor formats

Definition and Examples of tensors





Canonical and Tucker tensor formats

[Pictures are taken from B. Khoromskij and A. Auer lecture course]

Storage: $O(n^d) \to O(dRn)$ (canonical) and $O(R^d + dRn)$ (Tucker).





Definition of tensor of order d

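A tensor of order d is a multidimensional array over a d-tuple index set $\mathcal{I} = I_1 \times \cdots \times I_d$,

\[ A = [a_{i_1 \ldots i_d} : i_\ell \in I_\ell] \in \mathbb{R}^{\mathcal{I}}, \quad I_\ell = \{1, \ldots, n_\ell\}, \quad \ell = 1, \ldots, d. \]

A is an element of the linear space

\[ \mathbb{V}_n = \bigotimes_{\ell=1}^{d} \mathbb{V}_\ell, \quad \mathbb{V}_\ell = \mathbb{R}^{I_\ell}, \]

equipped with the Euclidean scalar product $\langle \cdot, \cdot \rangle : \mathbb{V}_n \times \mathbb{V}_n \to \mathbb{R}$, defined as

\[ \langle A, B \rangle := \sum_{(i_1 \ldots i_d) \in \mathcal{I}} a_{i_1 \ldots i_d} \, b_{i_1 \ldots i_d}, \quad \text{for } A, B \in \mathbb{V}_n. \]

Let $\mathcal{T} := \bigotimes_{\mu=1}^{d} \mathbb{R}^{n_\mu}$ and

\[ \mathcal{R}_R(\mathcal{T}) := \left\{ \sum_{i=1}^{R} \bigotimes_{\mu=1}^{d} v_{i\mu} \in \mathcal{T} : v_{i\mu} \in \mathbb{R}^{n_\mu} \right\}. \]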
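The scalar product above can be evaluated directly on the factors of two tensors from $\mathcal{R}_R(\mathcal{T})$, without assembling the full arrays. The following is a minimal NumPy sketch under assumptions that are not part of the talk: each direction µ is stored as a factor matrix of shape (n_µ, R), and the helper names `cp_to_full` and `cp_inner` are hypothetical.

```python
import numpy as np

def cp_to_full(factors):
    """Assemble the full tensor from CP factors; factors[mu] has shape (n_mu, R)."""
    rank = factors[0].shape[1]
    full = 0.0
    for j in range(rank):
        term = factors[0][:, j]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[:, j])  # tensor product of vectors
        full = full + term
    return full

def cp_inner(fa, fb):
    """Scalar product of two CP tensors computed from the factors only."""
    gram = np.ones((fa[0].shape[1], fb[0].shape[1]))
    for a, b in zip(fa, fb):
        gram *= a.T @ b  # all pairwise products <a_i, b_j> in one direction
    return float(gram.sum())

rng = np.random.default_rng(0)
dims = (4, 3, 5)
fa = [rng.standard_normal((n, 2)) for n in dims]  # representation rank 2
fb = [rng.standard_normal((n, 3)) for n in dims]  # representation rank 3

# Dense reference value, feasible only for small tensors.
dense_inner = float(np.sum(cp_to_full(fa) * cp_to_full(fb)))
```

The factor-based version touches only the factor matrices, which is what makes the later mean and variance formulas cheap.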




Examples of rank-1 and rank-2 tensors

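Rank-1: $f(x_1, \ldots, x_d) = \exp(f_1(x_1) + \cdots + f_d(x_d)) = \prod_{j=1}^{d} \exp(f_j(x_j))$.

Rank-2 (with complex factors): $f(x_1, \ldots, x_d) = \sin\big(\sum_{j=1}^{d} x_j\big)$, since

\[ 2i \cdot \sin\Big(\sum_{j=1}^{d} x_j\Big) = e^{i \sum_{j=1}^{d} x_j} - e^{-i \sum_{j=1}^{d} x_j}. \]

The rank-d function $f(x_1, \ldots, x_d) = x_1 + x_2 + \cdots + x_d$ can be approximated by a rank-2 function with any prescribed accuracy:

\[ f = \frac{\prod_{j=1}^{d} (1 + \varepsilon x_j)}{\varepsilon} - \frac{\prod_{j=1}^{d} 1}{\varepsilon} + O(\varepsilon), \quad \text{as } \varepsilon \to 0. \]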
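The rank-2 approximation of the sum can be checked numerically. This is a small illustrative sketch; the function name `rank2_sum_approx` and the sample point are assumptions chosen for the example.

```python
import numpy as np

def rank2_sum_approx(x, eps):
    """Rank-2 approximation of x_1 + ... + x_d: (prod_j (1 + eps*x_j) - 1) / eps."""
    return (np.prod(1.0 + eps * x) - 1.0) / eps

x = np.array([0.3, -1.2, 0.5, 2.0])
exact = x.sum()

# The error should shrink roughly linearly with eps.
errors = [abs(rank2_sum_approx(x, eps) - exact) for eps in (1e-1, 1e-2, 1e-3)]
```

In floating-point arithmetic eps cannot be taken arbitrarily small, because the subtraction of two nearly equal products loses digits.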





Tensor and Matrices

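Rank-1 tensor:

\[ A = u_1 \otimes u_2 \otimes \cdots \otimes u_d =: \bigotimes_{\mu=1}^{d} u_\mu, \qquad A_{i_1, \ldots, i_d} = (u_1)_{i_1} \cdots (u_d)_{i_d}. \]

For d = 2, the rank-1 tensor $A = u \otimes v$ corresponds to the matrix $A = u v^T$ (or $A = v u^T$), with $u \in \mathbb{R}^n$, $v \in \mathbb{R}^m$.

The rank-k tensor $A = \sum_{i=1}^{k} u_i \otimes v_i$ corresponds to the matrix $A = \sum_{i=1}^{k} u_i v_i^T$.

The Kronecker product of an $n \times n$ matrix A and an $m \times m$ matrix B is a new block matrix $A \otimes B \in \mathbb{R}^{nm \times nm}$, whose ij-th block is $[A_{ij} B]$.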
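These correspondences can be illustrated with standard NumPy calls (`np.einsum`, `np.outer`, `np.kron`). The sizes and random data below are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal(3), rng.standard_normal(4), rng.standard_normal(2)

# Rank-1 tensor of order 3: T[i, j, k] = u[i] * v[j] * w[k]
T = np.einsum("i,j,k->ijk", u, v, w)

# For d = 2 the rank-1 tensor u (x) v is the matrix u v^T
M = np.outer(u, v)

# Kronecker product: block (i, j) of K equals A[i, j] * B
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))
K = np.kron(A, B)
```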





Computing QoI in low-rank tensor format

Now we consider how to find maxima in a high-dimensional tensor.


Maximum norm and corresponding index

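Let $u = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu} \in \mathcal{R}_r$, compute

\[ \|u\|_\infty := \max_{i := (i_1, \ldots, i_d) \in \mathcal{I}} |u_i| = \max_{i := (i_1, \ldots, i_d) \in \mathcal{I}} \left| \sum_{j=1}^{r} \prod_{\mu=1}^{d} (u_{j\mu})_{i_\mu} \right|. \]

Computing $\|u\|_\infty$ is equivalent to the following eigenvalue problem.

Let $i^* := (i_1^*, \ldots, i_d^*) \in \mathcal{I}$, $\#\mathcal{I} = \prod_{\mu=1}^{d} n_\mu$, be an index at which the maximum is attained, so that

\[ \|u\|_\infty = |u_{i^*}| = \left| \sum_{j=1}^{r} \prod_{\mu=1}^{d} (u_{j\mu})_{i^*_\mu} \right| \quad \text{and} \quad e^{(i^*)} := \bigotimes_{\mu=1}^{d} e_{i^*_\mu}, \]

where $e_{i^*_\mu} \in \mathbb{R}^{n_\mu}$ is the $i^*_\mu$-th canonical vector in $\mathbb{R}^{n_\mu}$ ($\mu \in \mathbb{N}_{\le d}$).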





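Then, with $\odot$ denoting the Hadamard (entrywise) product,

\[
\begin{aligned}
u \odot e^{(i^*)}
&= \left( \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu} \right) \odot \left( \bigotimes_{\mu=1}^{d} e_{i^*_\mu} \right)
 = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu} \odot e_{i^*_\mu} \\
&= \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} \left[ (u_{j\mu})_{i^*_\mu} \, e_{i^*_\mu} \right]
 = \underbrace{\left( \sum_{j=1}^{r} \prod_{\mu=1}^{d} (u_{j\mu})_{i^*_\mu} \right)}_{u_{i^*}} \bigotimes_{\mu=1}^{d} e_{i^*_\mu}
 = u_{i^*} \, e^{(i^*)}.
\end{aligned}
\]

Thus, we obtained an "eigenvalue problem":

\[ u \odot e^{(i^*)} = u_{i^*} \, e^{(i^*)}. \]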





Computing $\|u\|_\infty$, $u \in \mathcal{R}_r$, by vector iteration

By defining the following diagonal matrix

\[ D(u) := \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} \operatorname{diag}\Big( (u_{j\mu})_{\ell_\mu} \Big)_{\ell_\mu \in \mathbb{N}_{\le n_\mu}} \tag{1} \]

with representation rank r, we obtain $D(u) v = u \odot v$.

Now apply the well-known vector iteration method (with rank truncation) to

\[ D(u) \, e^{(i^*)} = u_{i^*} \, e^{(i^*)} \]

to obtain $\|u\|_\infty$.

[Approximate iteration, Khoromskij, Hackbusch, Tyrtyshnikov 05], and [Espig, Hackbusch 2010]
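The idea of the vector iteration can be illustrated on a small tensor. The sketch below is a simplification and not the method of the cited papers: it works on the dense array, so no rank truncation is involved, and it assumes the entry of largest modulus is unique. The names `cp_to_full` and `max_norm_vector_iteration` and the example factors are hypothetical.

```python
import numpy as np

def cp_to_full(factors):
    """Assemble the full tensor from CP factors; factors[mu] has shape (n_mu, r)."""
    full = 0.0
    for j in range(factors[0].shape[1]):
        term = factors[0][:, j]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[:, j])
        full = full + term
    return full

def max_norm_vector_iteration(u, iters=100):
    """Power iteration for D(u) v = u * v on a dense tensor (no truncation)."""
    v = np.full(u.shape, 1.0 / np.sqrt(u.size))  # normalized start vector
    for _ in range(iters):
        v = u * v                       # apply D(u): Hadamard product with u
        v = v / np.linalg.norm(v)
    eigenvalue = float(np.sum(v * (u * v)))      # Rayleigh quotient <v, D(u) v>
    flat_index = np.argmax(np.abs(v))            # v concentrates on e^(i*)
    index = tuple(int(k) for k in np.unravel_index(flat_index, u.shape))
    return abs(eigenvalue), index

# Rank-2 example with n = (3, 2, 2); its entry of largest modulus is -6.
factors = [
    np.array([[1.0, 1.0], [2.0, 0.0], [3.0, -1.0]]),
    np.array([[1.0, 1.0], [2.0, 1.0]]),
    np.array([[1.0, 2.0], [-1.0, 0.0]]),
]
u = cp_to_full(factors)
norm_inf, i_star = max_norm_vector_iteration(u)
```

In the low-rank setting every Hadamard product multiplies the representation ranks, which is why the talk pairs the iteration with rank truncation.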





How to compute the mean value in CP format

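Let $u = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu} \in \mathcal{R}_r$, then the mean value $\bar{u}$ can be computed as a scalar product

\[ \bar{u} = \left\langle \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu}, \; \bigotimes_{\mu=1}^{d} \frac{1}{n_\mu} \mathbf{1}_\mu \right\rangle = \sum_{j=1}^{r} \prod_{\mu=1}^{d} \frac{\langle u_{j\mu}, \mathbf{1}_\mu \rangle}{n_\mu} \tag{2} \]

\[ = \sum_{j=1}^{r} \prod_{\mu=1}^{d} \frac{1}{n_\mu} \left( \sum_{k=1}^{n_\mu} (u_{j\mu})_k \right), \tag{3} \]

where $\mathbf{1}_\mu := (1, \ldots, 1)^T \in \mathbb{R}^{n_\mu}$. The numerical cost is $O\big(r \cdot \sum_{\mu=1}^{d} n_\mu\big)$.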
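Formula (3) translates into a few lines of NumPy. This is a sketch under the assumption that direction µ is stored as a factor matrix of shape (n_µ, r); the name `cp_mean` is hypothetical, and the dense tensor is built only as a reference for a small example.

```python
import numpy as np

def cp_mean(factors):
    """Mean of all entries of a CP tensor; factors[mu] has shape (n_mu, r)."""
    # Row mu holds (1/n_mu) * sum_k (u_{j,mu})_k for j = 1..r
    column_means = np.array([f.mean(axis=0) for f in factors])  # shape (d, r)
    return float(np.sum(np.prod(column_means, axis=0)))

rng = np.random.default_rng(1)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]

# Dense reference for d = 3, feasible only for small sizes.
full = np.einsum("ia,ja,ka->ijk", *factors)
mean_cp = cp_mean(factors)
```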





How to compute the variance in CP format

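Let $u \in \mathcal{R}_r$ with mean value $\bar{u}$, and

\[ \tilde{u} := u - \bar{u} \bigotimes_{\mu=1}^{d} \mathbf{1}_\mu = \sum_{j=1}^{r+1} \bigotimes_{\mu=1}^{d} \tilde{u}_{j\mu} \in \mathcal{R}_{r+1}, \tag{4} \]

where the first r terms are those of u and the additional (r+1)-th term is the rank-1 tensor $-\bar{u} \bigotimes_{\mu=1}^{d} \mathbf{1}_\mu$. Then the variance var(u) of u can be computed as follows:

\[
\operatorname{var}(u) = \frac{\langle \tilde{u}, \tilde{u} \rangle}{\prod_{\mu=1}^{d} n_\mu}
= \frac{1}{\prod_{\mu=1}^{d} n_\mu} \left\langle \sum_{i=1}^{r+1} \bigotimes_{\mu=1}^{d} \tilde{u}_{i\mu}, \; \sum_{j=1}^{r+1} \bigotimes_{\nu=1}^{d} \tilde{u}_{j\nu} \right\rangle
= \sum_{i=1}^{r+1} \sum_{j=1}^{r+1} \prod_{\mu=1}^{d} \frac{1}{n_\mu} \left\langle \tilde{u}_{i\mu}, \tilde{u}_{j\mu} \right\rangle.
\]

The numerical cost is $O\big((r+1)^2 \cdot \sum_{\mu=1}^{d} n_\mu\big)$.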
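The variance formula can be sketched in the same way. The code assumes factor matrices of shape (n_µ, r), centers the tensor by appending one rank-1 term (minus the mean times the all-ones tensor), and sums the products of the scaled Gram matrices. The name `cp_variance` is hypothetical.

```python
import numpy as np

def cp_variance(factors):
    """Variance of all entries of a CP tensor; factors[mu] has shape (n_mu, r)."""
    column_means = np.array([f.mean(axis=0) for f in factors])
    mean = float(np.sum(np.prod(column_means, axis=0)))
    # Append the rank-1 term -mean * (1 x ... x 1); the scalar goes into direction 0.
    centered = []
    for mu, f in enumerate(factors):
        ones = np.ones((f.shape[0], 1))
        extra = -mean * ones if mu == 0 else ones
        centered.append(np.hstack([f, extra]))
    rank = centered[0].shape[1]  # r + 1
    gram = np.ones((rank, rank))
    for f in centered:
        gram *= (f.T @ f) / f.shape[0]  # (1/n_mu) <u_i_mu, u_j_mu> for all i, j
    return float(gram.sum())

rng = np.random.default_rng(2)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]
full = np.einsum("ia,ja,ka->ijk", *factors)  # dense reference, small sizes only
var_cp = cp_variance(factors)
```

The reference value `full.var()` uses NumPy's default population variance (division by the number of entries), which matches the normalization by $\prod_\mu n_\mu$ above.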


Computing QoI in low-rank tensor format

Now we consider how to find ‘level sets’,

for instance, all entries of tensor u from interval [a,b].


Definitions of characteristic and sign functions

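1. To compute level sets and frequencies we need the characteristic function.
2. To compute the characteristic function we need the sign function.

The characteristic $\chi_I(u) \in \mathcal{T}$ of $u \in \mathcal{T}$ in $I \subset \mathbb{R}$ is for every multi-index $i \in \mathcal{I}$ pointwise defined as

\[ (\chi_I(u))_i := \begin{cases} 1, & u_i \in I, \\ 0, & u_i \notin I. \end{cases} \]

Furthermore, $\operatorname{sign}(u) \in \mathcal{T}$ is for all $i \in \mathcal{I}$ pointwise defined by

\[ (\operatorname{sign}(u))_i := \begin{cases} 1, & u_i > 0, \\ -1, & u_i < 0, \\ 0, & u_i = 0. \end{cases} \]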





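sign(u) is needed for computing $\chi_I(u)$

Lemma. Let $u \in \mathcal{T}$, $a, b \in \mathbb{R}$, and $\mathbf{1} = \bigotimes_{\mu=1}^{d} \mathbf{1}_\mu$, where $\mathbf{1}_\mu := (1, \ldots, 1)^T \in \mathbb{R}^{n_\mu}$.

(i) If $I = \mathbb{R}_{<b}$, then we have $\chi_I(u) = \frac{1}{2} (\mathbf{1} + \operatorname{sign}(b\mathbf{1} - u))$.

(ii) If $I = \mathbb{R}_{>a}$, then we have $\chi_I(u) = \frac{1}{2} (\mathbf{1} - \operatorname{sign}(a\mathbf{1} - u))$.

(iii) If $I = (a, b)$, then we have $\chi_I(u) = \frac{1}{2} (\operatorname{sign}(b\mathbf{1} - u) - \operatorname{sign}(a\mathbf{1} - u))$.

The sign function $\operatorname{sign}(u)$, $u \in \mathcal{R}_r$, is computed via a hybrid Newton-Schulz iteration with rank truncation after each iteration.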
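To show what the iteration does, here is a dense, entrywise sketch of the plain Newton-Schulz recursion $x_{k+1} = \frac{1}{2} x_k \odot (3 \cdot \mathbf{1} - x_k \odot x_k)$. It is not the hybrid variant with rank truncation mentioned above; the scaling by the maximum norm, the iteration count, and the name `sign_newton_schulz` are assumptions made for this example.

```python
import numpy as np

def sign_newton_schulz(u, iters=60):
    """Entrywise sign via Newton-Schulz; only Hadamard products and sums are used."""
    x = u / np.max(np.abs(u))        # scale so that |x_i| <= 1 < sqrt(3)
    for _ in range(iters):
        x = 0.5 * x * (3.0 - x * x)  # x <- x (3 - x^2) / 2, entrywise
    return x

u = np.array([[0.9, -0.2], [0.05, -1.5]])
s = sign_newton_schulz(u)
```

Because only sums and Hadamard products appear, the same recursion can be carried out on low-rank representations, at the price of rank growth in every step. Entries very close to zero need many iterations, since they grow only by a factor of about 1.5 per step.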





Level Set, Frequency

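Definition (Level Set, Frequency). Let $I \subset \mathbb{R}$ and $u \in \mathcal{T}$. The level set $L_I(u) \in \mathcal{T}$ of u with respect to I is pointwise defined by

\[ (L_I(u))_i := \begin{cases} u_i, & u_i \in I, \\ 0, & u_i \notin I, \end{cases} \]

for all $i \in \mathcal{I}$. The frequency $F_I(u) \in \mathbb{N}$ of u with respect to I is defined as

\[ F_I(u) := \# \operatorname{supp} \chi_I(u). \]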





Computation of level sets and frequency

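Proposition. Let $I \subset \mathbb{R}$, $u \in \mathcal{T}$, and $\chi_I(u)$ its characteristic. We have

\[ L_I(u) = \chi_I(u) \odot u \]

and $\operatorname{rank}(L_I(u)) \le \operatorname{rank}(\chi_I(u)) \operatorname{rank}(u)$. The frequency $F_I(u) \in \mathbb{N}$ of u with respect to I is

\[ F_I(u) = \langle \chi_I(u), \mathbf{1} \rangle, \]

where $\mathbf{1} = \bigotimes_{\mu=1}^{d} \mathbf{1}_\mu$, $\mathbf{1}_\mu := (1, \ldots, 1)^T \in \mathbb{R}^{n_\mu}$.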
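The Lemma and the Proposition combine into a short dense illustration for the open interval (0.7, 0.8). The sketch uses `np.sign` in place of the iterative sign computation and assumes that no entry equals an interval endpoint exactly; the helper name `characteristic_open_interval` is hypothetical.

```python
import numpy as np

def characteristic_open_interval(u, a, b):
    """chi_(a,b)(u) = (sign(b*1 - u) - sign(a*1 - u)) / 2, entrywise."""
    ones = np.ones_like(u)
    return 0.5 * (np.sign(b * ones - u) - np.sign(a * ones - u))

u = np.array([[0.75, 0.10, 0.79],
              [0.95, 0.71, 0.30]])

chi = characteristic_open_interval(u, 0.7, 0.8)
level_set = chi * u            # L_I(u) = chi_I(u) Hadamard u
frequency = int(np.sum(chi))   # F_I(u) = <chi_I(u), 1>
```

Here three entries (0.75, 0.79 and 0.71) lie in the interval, so the frequency is 3 and the level set keeps exactly those entries.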





Numerical Experiments

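2D L-shape domain, N = 557 dofs. The total stochastic dimension is $M_u = M_k + M_f = 20$, and there are $|\mathcal{J}| = 231$ PCE coefficients:

\[ u = \sum_{j=1}^{231} u_{j,0} \otimes \bigotimes_{\mu=1}^{20} u_{j\mu} \in \mathbb{R}^{557} \otimes \bigotimes_{\mu=1}^{20} \mathbb{R}^{3}. \]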





Level sets

Now we compute level sets

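\[ \operatorname{sign}(b \|u\|_\infty \mathbf{1} - u) \quad \text{for } b \in \{0.2, 0.4, 0.6, 0.8\}. \]

- The tensor u has $3^{20} \cdot 557 \approx 2 \cdot 10^{12}$ entries, i.e. ≈ 16 TB of memory.
- The computing time of one level set was 10 minutes.
- The intermediate ranks of $\operatorname{sign}(b \|u\|_\infty \mathbf{1} - u)$ and of the iterates $u_k$ were less than 24.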
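The storage figures can be reproduced with a few lines of arithmetic. The sketch assumes 8-byte double-precision entries, which the slide does not state; the variable names are illustrative.

```python
n_space, n_stoch, dim_stoch, rank = 557, 3, 20, 231

# Full tensor: 557 * 3^20 entries
full_entries = n_space * n_stoch**dim_stoch
full_terabytes = full_entries * 8 / 1e12   # assumption: 8 bytes per entry

# CP representation: 231 terms, each with one vector per direction
cp_entries = rank * (n_space + dim_stoch * n_stoch)
cp_megabytes = cp_entries * 8 / 1e6
```

Under this assumption the full tensor needs about 15.5 TB, consistent with the rounded figure above, while the rank-231 representation stores 142,527 numbers, roughly 1 MB.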





Example: Canonical rank d , whereas TT rank 2

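Consider the d-Laplacian over a uniform tensor grid. It is known to have the Kronecker rank-d representation

\[ \Delta_d = A \otimes I_N \otimes \cdots \otimes I_N + I_N \otimes A \otimes \cdots \otimes I_N + \cdots + I_N \otimes I_N \otimes \cdots \otimes A \in \mathbb{R}^{I^{\otimes d} \otimes I^{\otimes d}} \tag{5} \]

with $A = \Delta_1 = \operatorname{tridiag}\{-1, 2, -1\} \in \mathbb{R}^{N \times N}$, and $I_N$ being the $N \times N$ identity. Notice that for the canonical rank we have $\operatorname{rank}_C(\Delta_d) = d$, while the TT-rank of $\Delta_d$ is equal to 2 for any dimension due to the explicit representation

\[ \Delta_d = \begin{pmatrix} \Delta_1 & I \end{pmatrix} \times \begin{pmatrix} I & 0 \\ \Delta_1 & I \end{pmatrix} \times \cdots \times \begin{pmatrix} I & 0 \\ \Delta_1 & I \end{pmatrix} \times \begin{pmatrix} I \\ \Delta_1 \end{pmatrix}, \tag{6} \]

where the rank product operation "×" is defined as a regular matrix product of the two corresponding core matrices, their blocks being multiplied by means of the tensor product. A similar bound holds for the Tucker rank: $\operatorname{rank}_{Tuck}(\Delta_d) = 2$.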
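Representations (5) and (6) can be compared numerically for small N and d. The sketch below stores each TT core as a nested list of N × N blocks and implements the rank product "×" as a block-matrix product whose blocks are combined with `np.kron`. All function names are hypothetical, and d ≥ 2 is assumed.

```python
import numpy as np
from functools import reduce

def laplace_1d(n):
    """tridiag{-1, 2, -1} of size n x n."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def laplace_kron_sum(n, d):
    """Representation (5): sum of d Kronecker products."""
    A, I = laplace_1d(n), np.eye(n)
    total = np.zeros((n**d, n**d))
    for pos in range(d):
        total += reduce(np.kron, [A if k == pos else I for k in range(d)])
    return total

def laplace_tt(n, d):
    """Representation (6): product of block cores with inner rank 2 (d >= 2)."""
    A, I, Z = laplace_1d(n), np.eye(n), np.zeros((n, n))
    cores = [[[A, I]]] + [[[I, Z], [A, I]]] * (d - 2) + [[[I], [A]]]
    result = cores[0]
    for core in cores[1:]:
        # Regular block-matrix product; blocks are multiplied via np.kron.
        result = [[sum(np.kron(row[k], core[k][j]) for k in range(len(core)))
                   for j in range(len(core[0]))] for row in result]
    return result[0][0]

L_sum = laplace_kron_sum(3, 3)
L_tt = laplace_tt(3, 3)
```

Both constructions yield the same 27 × 27 matrix for N = 3, d = 3, while the core sequence in (6) never uses more than two blocks per inner index.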


Advantages and disadvantages

Denote k = rank, d = dimension, n = number of dofs in 1D:

1. CP: ill-posed approximation algorithm, storage $O(dnk)$, approximations are hard to compute.
2. Tucker: reliable arithmetic based on SVD, $O(dnk + k^d)$.
3. Hierarchical Tucker: based on SVD, storage $O(dnk + dk^3)$, truncation $O(dnk^2 + dk^4)$.
4. TT: based on SVD, $O(dnk^2)$ or $O(dnk^3)$, stable.
5. Quantics-TT: $O(n^d) \to O(d \log_q n)$.
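To get a feeling for these orders of magnitude, the leading-order storage counts can be tabulated for sample values. The sketch ignores constants and lower-order terms, uses the $O(dnk^2)$ figure for TT, and the values n = 100, d = 10, k = 5 are arbitrary assumptions.

```python
def storage_counts(n, d, k):
    """Leading-order number of stored entries per format (constants ignored)."""
    return {
        "full": n**d,
        "CP": d * n * k,
        "Tucker": d * n * k + k**d,
        "HT": d * n * k + d * k**3,
        "TT": d * n * k**2,
    }

counts = storage_counts(n=100, d=10, k=5)
```

For these values the Tucker core term $k^d$ already dominates its storage, while CP, Hierarchical Tucker and TT stay within a few tens of thousands of entries.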
