Low-rank methods for analysis of high-dimensional data (SIAM CSE talk 2017)
TRANSCRIPT
Low-rank tensor methods for analysis of high-dimensional data
Alexander Litvinenko and Mike Espig
Center for Uncertainty Quantification
http://sri-uq.kaust.edu.sa/
Extreme Computing Research Center, KAUST
KAUST
I gained rich collaboration experience as a co-organizer of:
- 3 UQ workshops,
- 2 Scalable Hierarchical Algorithms for eXtreme Computing (SHAXC) workshops,
- 1 HPC Conference (www.hpcsaudi.org, 2017).
My previous work
After applying the stochastic Galerkin method, we obtain $Ku = f$, where all ingredients are represented in a tensor format.
- Compute max{u}, var(u), level sets of u, and sign(u).
  [1] Efficient Analysis of High Dimensional Data in Tensor Formats, Espig, Hackbusch, A.L., Matthies and Zander, 2012.
- Investigate which ingredients influence the tensor rank of K.
  [2] Efficient low-rank approximation of the stochastic Galerkin matrix in tensor formats, Wahnert, Espig, Hackbusch, A.L., Matthies, 2013.
- Approximate $\kappa(x, \omega)$ and the stochastic Galerkin operator K in Tensor Train (TT) format, solve for u, and postprocess.
  [3] Polynomial Chaos Expansion of random coefficients and the solution of stochastic partial differential equations in the Tensor Train format, Dolgov, Litvinenko, Khoromskij, Matthies, 2016.
Typical quantities of interest
Keeping all input and intermediate data in a tensor representation, one wants to perform different tasks:
- evaluation for specific parameters $(\omega_1, \ldots, \omega_M)$,
- finding maxima and minima,
- finding 'level sets' (needed for histograms and probability densities).
Example of a level set: all elements of a high-dimensional tensor from the interval $[0.7, 0.8]$.
Canonical and Tucker tensor formats
Definition and Examples of tensors
Canonical and Tucker tensor formats
[Pictures are taken from B. Khoromskij and A. Auer lecture course]
Storage: $O(n^d) \to O(dRn)$ (canonical) and $O(R^d + dRn)$ (Tucker).
Definition of tensor of order d
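A tensor of order $d$ is a multidimensional array over a $d$-tuple index set $\mathcal{I} = I_1 \times \cdots \times I_d$,
\[ A = [a_{i_1 \ldots i_d} : i_\ell \in I_\ell] \in \mathbb{R}^{\mathcal{I}}, \quad I_\ell = \{1, \ldots, n_\ell\}, \quad \ell = 1, \ldots, d. \]
$A$ is an element of the linear space
\[ \mathbb{V}_n = \bigotimes_{\ell=1}^{d} V_\ell, \quad V_\ell = \mathbb{R}^{I_\ell}, \]
equipped with the Euclidean scalar product $\langle \cdot, \cdot \rangle : \mathbb{V}_n \times \mathbb{V}_n \to \mathbb{R}$, defined as
\[ \langle A, B \rangle := \sum_{(i_1 \ldots i_d) \in \mathcal{I}} a_{i_1 \ldots i_d} \, b_{i_1 \ldots i_d}, \quad \text{for } A, B \in \mathbb{V}_n. \]
Let $\mathcal{T} := \bigotimes_{\mu=1}^{d} \mathbb{R}^{n_\mu}$ and
\[ \mathcal{R}_R(\mathcal{T}) := \left\{ \sum_{i=1}^{R} \bigotimes_{\mu=1}^{d} v_{i\mu} \in \mathcal{T} : v_{i\mu} \in \mathbb{R}^{n_\mu} \right\}. \]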
Examples of rank-1 and rank-2 tensors
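Rank-1:
\[ f(x_1, \ldots, x_d) = \exp(f_1(x_1) + \ldots + f_d(x_d)) = \prod_{j=1}^{d} \exp(f_j(x_j)). \]
Rank-2: $f(x_1, \ldots, x_d) = \sin\left(\sum_{j=1}^{d} x_j\right)$, since
\[ 2i \cdot \sin\left(\sum_{j=1}^{d} x_j\right) = e^{i \sum_{j=1}^{d} x_j} - e^{-i \sum_{j=1}^{d} x_j}. \]
The rank-$d$ function $f(x_1, \ldots, x_d) = x_1 + x_2 + \ldots + x_d$ can be approximated by a rank-2 function with any prescribed accuracy:
\[ f \approx \frac{\prod_{j=1}^{d} (1 + \varepsilon x_j)}{\varepsilon} - \frac{\prod_{j=1}^{d} 1}{\varepsilon} + O(\varepsilon), \quad \text{as } \varepsilon \to 0. \]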
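The rank-2 approximation of the sum can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `rank2_sum_approx` and the sample point `x` are hypothetical and chosen only for illustration.

```python
import numpy as np

def rank2_sum_approx(x, eps):
    """Approximate x_1 + ... + x_d by (prod_j (1 + eps * x_j) - 1) / eps."""
    x = np.asarray(x, dtype=float)
    # Difference of two separable (rank-1) functions, scaled by 1 / eps.
    return (np.prod(1.0 + eps * x) - 1.0) / eps

x = np.array([0.3, -1.2, 2.0, 0.5])  # hypothetical sample point, d = 4
exact = x.sum()
# The error should shrink roughly linearly with eps.
errors = [abs(rank2_sum_approx(x, eps) - exact) for eps in (1e-1, 1e-2, 1e-3)]
```

In floating-point arithmetic, eps cannot be made arbitrarily small, because the subtraction of two nearly equal products loses digits.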
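Tensors and Matrices
Rank-1 tensor:
\[ A = u_1 \otimes u_2 \otimes \ldots \otimes u_d =: \bigotimes_{\mu=1}^{d} u_\mu, \qquad A_{i_1, \ldots, i_d} = (u_1)_{i_1} \cdot \ldots \cdot (u_d)_{i_d}. \]
Rank-1 tensor $A = u \otimes v$, matrix $A = uv^T$, $A = vu^T$, with $u \in \mathbb{R}^n$, $v \in \mathbb{R}^m$.
Rank-$k$ tensor $A = \sum_{i=1}^{k} u_i \otimes v_i$, matrix $A = \sum_{i=1}^{k} u_i v_i^T$.
The Kronecker product of $n \times n$ and $m \times m$ matrices is a new block matrix $A \otimes B \in \mathbb{R}^{nm \times nm}$, whose $ij$-th block is $[A_{ij} B]$.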
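These definitions map directly onto array operations. A small sketch, assuming NumPy; the vectors and matrices are arbitrary illustrative values.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0, 5.0])
w = np.array([1.0, -1.0])

# Rank-1 matrix u v^T and rank-1 tensor u (x) v (x) w of order 3.
rank1_matrix = np.outer(u, v)
rank1_tensor = np.einsum("i,j,k->ijk", u, v, w)

# Kronecker product: the (i, j)-th block of A (x) B equals A[i, j] * B.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
K = np.kron(A, B)
block_01 = K[0:2, 2:4]  # block in block-row 0, block-column 1
```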
Computing QoI in low-rank tensor format
Now, we consider how to find maxima in a high-dimensional tensor.
Maximum norm and corresponding index
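Let $u = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu} \in \mathcal{R}_r$, compute
\[ \|u\|_\infty := \max_{i := (i_1, \ldots, i_d) \in \mathcal{I}} |u_i| = \max_{i := (i_1, \ldots, i_d) \in \mathcal{I}} \left| \sum_{j=1}^{r} \prod_{\mu=1}^{d} (u_{j\mu})_{i_\mu} \right|. \]
Computing $\|u\|_\infty$ is equivalent to the following eigenvalue problem.
Let $i^* := (i_1^*, \ldots, i_d^*) \in \mathcal{I}$, $\#\mathcal{I} = \prod_{\mu=1}^{d} n_\mu$, be an index at which the maximum is attained:
\[ \|u\|_\infty = |u_{i^*}| = \left| \sum_{j=1}^{r} \prod_{\mu=1}^{d} (u_{j\mu})_{i^*_\mu} \right| \quad \text{and} \quad e^{(i^*)} := \bigotimes_{\mu=1}^{d} e_{i^*_\mu}, \]
where $e_{i^*_\mu} \in \mathbb{R}^{n_\mu}$ is the $i^*_\mu$-th canonical vector in $\mathbb{R}^{n_\mu}$ ($\mu \in \mathbb{N}_{\le d}$).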
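Then, with $\odot$ denoting the Hadamard (entrywise) product,
\[ \begin{aligned}
u \odot e^{(i^*)} &= \left( \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu} \right) \odot \left( \bigotimes_{\mu=1}^{d} e_{i^*_\mu} \right) = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} \left( u_{j\mu} \odot e_{i^*_\mu} \right) \\
&= \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} \left[ (u_{j\mu})_{i^*_\mu} e_{i^*_\mu} \right] = \underbrace{\left( \sum_{j=1}^{r} \prod_{\mu=1}^{d} (u_{j\mu})_{i^*_\mu} \right)}_{u_{i^*}} \bigotimes_{\mu=1}^{d} e_{i^*_\mu} = u_{i^*} e^{(i^*)}.
\end{aligned} \]
Thus, we obtain an "eigenvalue problem":
\[ u \odot e^{(i^*)} = u_{i^*} e^{(i^*)}. \]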
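The identity can be verified on a small example. The sketch below assumes NumPy and stores the CP factors as a list of arrays of shape (n_mu, r); this storage layout and the helper names `cp_entry` and `cp_to_full` are assumptions made for illustration. The full tensor is formed only to check the result, which is feasible only for tiny sizes.

```python
import numpy as np

def cp_entry(factors, index):
    """Evaluate u_i = sum_j prod_mu (u_{j,mu})_{i_mu} directly from the CP factors."""
    # factors[mu] has shape (n_mu, r); row i_mu holds (u_{j,mu})_{i_mu} for j = 1..r.
    rows = [U[i_mu, :] for U, i_mu in zip(factors, index)]
    return np.prod(rows, axis=0).sum()

def cp_to_full(factors):
    """Assemble the full tensor (for verification only)."""
    full = 0
    for j in range(factors[0].shape[1]):
        term = factors[0][:, j]
        for U in factors[1:]:
            term = np.multiply.outer(term, U[:, j])
        full = full + term
    return full

rng = np.random.default_rng(0)
shape, r = (3, 4, 5), 2
factors = [rng.standard_normal((n, r)) for n in shape]
u = cp_to_full(factors)

# Index of the entry with the largest modulus and the tensor e^(i*).
i_star = np.unravel_index(np.argmax(np.abs(u)), shape)
e = np.zeros(shape)
e[i_star] = 1.0

lhs = u * e                           # Hadamard product u (.) e^(i*)
rhs = cp_entry(factors, i_star) * e   # u_{i*} e^(i*)
```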
Computing $\|u\|_\infty$, $u \in \mathcal{R}_r$, by vector iteration
By defining the following diagonal matrix
\[ D(u) := \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} \operatorname{diag}\left( (u_{j\mu})_{\ell_\mu} \right)_{\ell_\mu \in \mathbb{N}_{\le n_\mu}} \tag{1} \]
with representation rank $r$, we obtain $D(u) v = u \odot v$.
Now apply the well-known vector iteration method (with rank truncation) to
\[ D(u) e^{(i^*)} = u_{i^*} e^{(i^*)} \]
to obtain $\|u\|_\infty$.
[Approximate iteration, Khoromskij, Hackbusch, Tyrtyshnikov 05], and [Espig, Hackbusch 2010]
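To illustrate the idea, here is a minimal sketch of the vector iteration, assuming NumPy. It is not the method from the cited papers: it works on the full tensor and omits the rank truncation, so it is usable only for very small examples. It also assumes that the entry of largest modulus is unique. The function name `max_norm_power_iteration` and the test tensor are hypothetical.

```python
import numpy as np

def max_norm_power_iteration(u, iterations=200):
    """Power iteration for D(u) v = u (.) v on a full tensor (no truncation)."""
    v = np.ones_like(u, dtype=float)
    v /= np.linalg.norm(v)
    for _ in range(iterations):
        v = u * v                    # apply D(u): Hadamard product with u
        v /= np.linalg.norm(v)       # normalize to avoid overflow
    eigenvalue = np.sum(v * (u * v))  # Rayleigh quotient (v has unit norm)
    index = np.unravel_index(np.argmax(np.abs(v)), u.shape)
    return eigenvalue, index

# Rank-2 test tensor a(x)a(x)a + b(x)b(x)b; its largest entry is 4 * 4 * 4 = 64.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 0.0, 0.0, 0.0])
u = np.einsum("i,j,k->ijk", a, a, a) + np.einsum("i,j,k->ijk", b, b, b)
value, index = max_norm_power_iteration(u)
```

The iterate converges to the canonical tensor $e^{(i^*)}$, so the position of its largest component gives the index $i^*$ and the Rayleigh quotient gives $u_{i^*}$. The convergence rate depends on the ratio between the second-largest and the largest modulus.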
How to compute the mean value in CP format
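Let $u = \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu} \in \mathcal{R}_r$, then the mean value $\bar{u}$ can be computed as a scalar product
\[ \bar{u} = \left\langle \sum_{j=1}^{r} \bigotimes_{\mu=1}^{d} u_{j\mu}, \; \bigotimes_{\mu=1}^{d} \frac{1}{n_\mu} 1_\mu \right\rangle = \sum_{j=1}^{r} \prod_{\mu=1}^{d} \frac{\langle u_{j\mu}, 1_\mu \rangle}{n_\mu} \tag{2} \]
\[ = \sum_{j=1}^{r} \prod_{\mu=1}^{d} \frac{1}{n_\mu} \left( \sum_{k=1}^{n_\mu} (u_{j\mu})_k \right), \tag{3} \]
where $1_\mu := (1, \ldots, 1)^T \in \mathbb{R}^{n_\mu}$.
The numerical cost is $O\left( r \cdot \sum_{\mu=1}^{d} n_\mu \right)$.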
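Formula (3) translates into a few lines of code. The sketch below assumes NumPy and the same hypothetical storage layout as before: a list of factor matrices of shape (n_mu, r), one per dimension. The full tensor is built only to compare against.

```python
import numpy as np

def cp_mean(factors):
    """Mean of a CP tensor: sum_j prod_mu mean(u_{j,mu}), cost O(r * sum_mu n_mu)."""
    per_mode_means = [U.mean(axis=0) for U in factors]  # each of shape (r,)
    return np.prod(per_mode_means, axis=0).sum()

rng = np.random.default_rng(0)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]  # d = 3, r = 3
mean_cp = cp_mean(factors)

# Reference value from the full tensor (feasible only for small sizes).
full = np.einsum("ir,jr,kr->ijk", *factors)
```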
How to compute the variance in CP format
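Let $u \in \mathcal{R}_r$ and
\[ \tilde{u} := u - \bar{u} \bigotimes_{\mu=1}^{d} 1_\mu = \sum_{j=1}^{r+1} \bigotimes_{\mu=1}^{d} \tilde{u}_{j\mu} \in \mathcal{R}_{r+1}, \tag{4} \]
then the variance $\operatorname{var}(u)$ of $u$ can be computed as follows:
\[ \operatorname{var}(u) = \frac{\langle \tilde{u}, \tilde{u} \rangle}{\prod_{\mu=1}^{d} n_\mu} = \frac{1}{\prod_{\mu=1}^{d} n_\mu} \left\langle \sum_{i=1}^{r+1} \bigotimes_{\mu=1}^{d} \tilde{u}_{i\mu}, \; \sum_{j=1}^{r+1} \bigotimes_{\nu=1}^{d} \tilde{u}_{j\nu} \right\rangle = \sum_{i=1}^{r+1} \sum_{j=1}^{r+1} \prod_{\mu=1}^{d} \frac{1}{n_\mu} \left\langle \tilde{u}_{i\mu}, \tilde{u}_{j\mu} \right\rangle. \]
The numerical cost is $O\left( (r+1)^2 \cdot \sum_{\mu=1}^{d} n_\mu \right)$.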
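A minimal sketch of the variance computation, assuming NumPy and a hypothetical storage of the CP factors as matrices of shape (n_mu, r). The centered tensor is represented by appending one extra rank-1 term, and the double sum is evaluated through per-mode Gram matrices.

```python
import numpy as np

def cp_variance(factors):
    """Variance of a CP tensor via the centered rank-(r+1) representation."""
    r = factors[0].shape[1]
    mean = np.prod([U.mean(axis=0) for U in factors], axis=0).sum()

    # Append the term -mean * (1 (x) ... (x) 1); the scalar goes into the first mode.
    centered = []
    for mu, U in enumerate(factors):
        extra = np.ones((U.shape[0], 1))
        if mu == 0:
            extra = -mean * extra
        centered.append(np.hstack([U, extra]))

    # Entry (i, j) accumulates prod_mu <u~_{i,mu}, u~_{j,mu}> / n_mu.
    gram = np.ones((r + 1, r + 1))
    for U in centered:
        gram *= (U.T @ U) / U.shape[0]
    return gram.sum()

rng = np.random.default_rng(1)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]
variance = cp_variance(factors)
full = np.einsum("ir,jr,kr->ijk", *factors)  # reference, small sizes only
```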
Computing QoI in low-rank tensor format
Now, we consider how to find 'level sets',
for instance, all entries of tensor $u$ from the interval $[a, b]$.
Definitions of characteristic and sign functions
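1. To compute level sets and frequencies we need the characteristic function.
2. To compute the characteristic function we need the sign function.
The characteristic $\chi_I(u) \in \mathcal{T}$ of $u \in \mathcal{T}$ in $I \subset \mathbb{R}$ is for every multi-index $i \in \mathcal{I}$ pointwise defined as
\[ (\chi_I(u))_i := \begin{cases} 1, & u_i \in I, \\ 0, & u_i \notin I. \end{cases} \]
Furthermore, $\operatorname{sign}(u) \in \mathcal{T}$ is for all $i \in \mathcal{I}$ pointwise defined by
\[ (\operatorname{sign}(u))_i := \begin{cases} 1, & u_i > 0; \\ -1, & u_i < 0; \\ 0, & u_i = 0. \end{cases} \]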
sign(u) is needed for computing χI(u)
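Lemma. Let $u \in \mathcal{T}$, $a, b \in \mathbb{R}$, and $\mathbf{1} = \bigotimes_{\mu=1}^{d} 1_\mu$, where $1_\mu := (1, \ldots, 1)^T \in \mathbb{R}^{n_\mu}$.
(i) If $I = \mathbb{R}_{<b}$, then we have $\chi_I(u) = \frac{1}{2} (\mathbf{1} + \operatorname{sign}(b\mathbf{1} - u))$.
(ii) If $I = \mathbb{R}_{>a}$, then we have $\chi_I(u) = \frac{1}{2} (\mathbf{1} - \operatorname{sign}(a\mathbf{1} - u))$.
(iii) If $I = (a, b)$, then we have $\chi_I(u) = \frac{1}{2} (\operatorname{sign}(b\mathbf{1} - u) - \operatorname{sign}(a\mathbf{1} - u))$.
$\operatorname{sign}(u)$, $u \in \mathcal{R}_r$, is computed via a hybrid Newton-Schulz iteration with rank truncation after each iteration.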
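The following sketch illustrates case (iii) of the lemma with a plain entrywise Newton-Schulz iteration $y \leftarrow \frac{1}{2} y (3 - y \odot y)$, assuming NumPy. It is not the hybrid variant mentioned above: it operates on a full array and performs no rank truncation. The function names and the sample data are hypothetical.

```python
import numpy as np

def sign_newton_schulz(x, iterations=60):
    """Entrywise sign via Newton-Schulz: y <- 0.5 * y * (3 - y * y)."""
    x = np.asarray(x, dtype=float)
    scale = np.max(np.abs(x))
    if scale == 0.0:
        return np.zeros_like(x)
    y = x / scale  # scale into [-1, 1], where the iteration converges
    for _ in range(iterations):
        y = 0.5 * y * (3.0 - y * y)
    return y

def characteristic_open_interval(u, a, b):
    """chi_(a,b)(u) = 0.5 * (sign(b*1 - u) - sign(a*1 - u))."""
    return 0.5 * (sign_newton_schulz(b - u) - sign_newton_schulz(a - u))

u = np.array([[0.10, 0.75, 0.90],
              [0.72, 0.30, 0.79]])
chi = characteristic_open_interval(u, 0.7, 0.8)
```

Entries close to an interval endpoint make the argument of the sign function small, and the iteration then needs more steps, since small values grow only by a factor of about 1.5 per step.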
Level Set, Frequency
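Definition (Level Set, Frequency). Let $I \subset \mathbb{R}$ and $u \in \mathcal{T}$. The level set $L_I(u) \in \mathcal{T}$ of $u$ with respect to $I$ is pointwise defined by
\[ (L_I(u))_i := \begin{cases} u_i, & u_i \in I; \\ 0, & u_i \notin I, \end{cases} \]
for all $i \in \mathcal{I}$.
The frequency $F_I(u) \in \mathbb{N}$ of $u$ with respect to $I$ is defined as
\[ F_I(u) := \# \operatorname{supp} \chi_I(u). \]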
Computation of level sets and frequency
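Proposition. Let $I \subset \mathbb{R}$, $u \in \mathcal{T}$, and $\chi_I(u)$ its characteristic. We have
\[ L_I(u) = \chi_I(u) \odot u \]
and $\operatorname{rank}(L_I(u)) \le \operatorname{rank}(\chi_I(u)) \operatorname{rank}(u)$.
The frequency $F_I(u) \in \mathbb{N}$ of $u$ with respect to $I$ is
\[ F_I(u) = \langle \chi_I(u), \mathbf{1} \rangle, \]
where $\mathbf{1} = \bigotimes_{\mu=1}^{d} 1_\mu$, $1_\mu := (1, \ldots, 1)^T \in \mathbb{R}^{n_\mu}$.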
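The rank bound follows from multiplying the CP terms pairwise, which the sketch below makes explicit. It assumes NumPy and factor matrices of shape (n_mu, r). The 0/1 tensor `chi` is a hypothetical rank-1 stand-in for a characteristic tensor; it is not derived from an actual interval.

```python
import numpy as np

def cp_hadamard(factors_a, factors_b):
    """Hadamard product of two CP tensors; the result has r_a * r_b terms."""
    result = []
    for A, B in zip(factors_a, factors_b):
        n = A.shape[0]
        # Column (i, j) of the result is the entrywise product A[:, i] * B[:, j].
        result.append((A[:, :, None] * B[:, None, :]).reshape(n, -1))
    return result

def cp_sum_of_entries(factors):
    """<u, 1> = sum_j prod_mu sum_k (u_{j,mu})_k, without forming the tensor."""
    return np.prod([U.sum(axis=0) for U in factors], axis=0).sum()

def cp_to_full(factors):
    """Full order-3 tensor, for verification only."""
    return np.einsum("ir,jr,kr->ijk", *factors)

rng = np.random.default_rng(2)
u = [rng.standard_normal((n, 2)) for n in (3, 4, 5)]  # rank-2 tensor

# Hypothetical rank-1 indicator tensor (0/1 entries) standing in for chi_I(u).
chi = [np.array([[1.0], [1.0], [0.0]]),
       np.array([[0.0], [1.0], [1.0], [0.0]]),
       np.array([[1.0], [0.0], [0.0], [1.0], [1.0]])]

level_set = cp_hadamard(chi, u)       # rank at most 1 * 2 = 2
frequency = cp_sum_of_entries(chi)    # number of nonzero entries of chi
```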
Numerical Experiments
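2D L-shape domain, $N = 557$ dofs.
Total stochastic dimension is $M_u = M_k + M_f = 20$, there are $|\mathcal{J}| = 231$ PCE coefficients:
\[ u = \sum_{j=1}^{231} u_{j,0} \otimes \bigotimes_{\mu=1}^{20} u_{j\mu} \in \mathbb{R}^{557} \otimes \bigotimes_{\mu=1}^{20} \mathbb{R}^{3}. \]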
Level sets
Now we compute level sets
\[ \operatorname{sign}(b \|u\|_\infty \mathbf{1} - u) \]
for $b \in \{0.2, 0.4, 0.6, 0.8\}$.
- Tensor $u$ has $3^{20} \cdot 557 \approx 2 \cdot 10^{12}$ entries, i.e. $\approx 16$ TB of memory.
- The computing time of one level set was 10 minutes.
- Intermediate ranks of $\operatorname{sign}(b \|u\|_\infty \mathbf{1} - u)$ and of $u_k$ were less than 24.
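The storage figures can be reproduced with a short calculation. This sketch assumes 8-byte double-precision entries and takes the representation rank 231 from the expansion of $u$ given for this experiment; the variable names are illustrative.

```python
n_space, n_stoch, d_stoch, rank = 557, 3, 20, 231

# Full tensor: 557 * 3^20 entries at 8 bytes each.
full_entries = n_space * n_stoch**d_stoch
full_terabytes = 8 * full_entries / 1e12

# Canonical (CP) format: rank * (sum of mode sizes) entries.
cp_entries = rank * (n_space + d_stoch * n_stoch)
cp_megabytes = 8 * cp_entries / 1e6
```

Under these assumptions the full tensor needs about 15.5 TB, while the canonical representation needs about 1.1 MB.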
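Example: canonical rank $d$, whereas TT rank 2
The $d$-Laplacian over a uniform tensor grid is known to have the Kronecker rank-$d$ representation
\[ \Delta_d = A \otimes I_N \otimes \ldots \otimes I_N + I_N \otimes A \otimes \ldots \otimes I_N + \ldots + I_N \otimes I_N \otimes \ldots \otimes A \in \mathbb{R}^{I^{\otimes d} \otimes I^{\otimes d}}, \tag{5} \]
with $A = \Delta_1 = \operatorname{tridiag}\{-1, 2, -1\} \in \mathbb{R}^{N \times N}$, and $I_N$ being the $N \times N$ identity. Notice that for the canonical rank we have $\operatorname{rank}_C(\Delta_d) = d$, while the TT-rank of $\Delta_d$ is equal to 2 for any dimension due to the explicit representation
\[ \Delta_d = \begin{pmatrix} \Delta_1 & I \end{pmatrix} \times \begin{pmatrix} I & 0 \\ \Delta_1 & I \end{pmatrix} \times \ldots \times \begin{pmatrix} I & 0 \\ \Delta_1 & I \end{pmatrix} \times \begin{pmatrix} I \\ \Delta_1 \end{pmatrix}, \tag{6} \]
where the rank product operation "$\times$" is defined as a regular matrix product of the two corresponding core matrices, their blocks being multiplied by means of the tensor product. A similar bound holds for the Tucker rank: $\operatorname{rank}_{Tuck}(\Delta_d) = 2$.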
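Representations (5) and (6) can be compared numerically for small $d$ and $N$. The sketch below assumes NumPy, stores each core as a nested list of blocks, and implements the rank product with `np.kron` on the blocks. All function names are hypothetical, and the TT construction as written requires $d \ge 2$.

```python
import numpy as np
from functools import reduce

def laplace_1d(n):
    """tridiag{-1, 2, -1} of size n x n."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def laplace_canonical(d, n):
    """Sum of d Kronecker products, as in the rank-d representation (5)."""
    A, I = laplace_1d(n), np.eye(n)
    total = np.zeros((n**d, n**d))
    for pos in range(d):
        total += reduce(np.kron, [A if mu == pos else I for mu in range(d)])
    return total

def rank_product(X, Y):
    """Block matrix product in which blocks are multiplied by the Kronecker product."""
    return [[sum(np.kron(X[i][k], Y[k][j]) for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def laplace_tt(d, n):
    """Evaluate the rank-2 representation (6); requires d >= 2."""
    A, I, Z = laplace_1d(n), np.eye(n), np.zeros((n, n))
    first = [[A, I]]
    middle = [[I, Z], [A, I]]
    last = [[I], [A]]
    cores = [first] + [middle] * (d - 2) + [last]
    return reduce(rank_product, cores)[0][0]

same_d3 = np.allclose(laplace_canonical(3, 4), laplace_tt(3, 4))
```

Every core has at most two block rows and two block columns, independently of $d$, which is what the TT-rank 2 statement expresses.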
Advantages and disadvantages
Denote by $k$ the rank, $d$ the dimension, $n$ = # dofs in 1D:
1. CP: ill-posed approximation algorithm, $O(dnk)$, hard to compute approximations.
2. Tucker: reliable arithmetic based on SVD, $O(dnk + k^d)$.
3. Hierarchical Tucker: based on SVD, storage $O(dnk + dk^3)$, truncation $O(dnk^2 + dk^4)$.
4. TT: based on SVD, $O(dnk^2)$ or $O(dnk^3)$, stable.
5. Quantics-TT: $O(n^d) \to O(d \log_q n)$.